Data Sets for Quantitative Research: Public Use Datasets

Finding Datasets on the Internet

There are many research organizations making data available on the web, but still no perfect mechanism for searching the content of all these collections. The links below will take you to data search portals which seem to be among the best available. Note that these portals point to both free and pay sources for data, and to both raw data and processed statistics. Pay sources are marked with an asterisk.

Transform web information into machine-readable data for analysis

Have you found useful numeric information in a less-than-ideal format, such as PDF or HTML? Here are some software products that may help you transform those formats into numbers that you can read into a spreadsheet or statistical software program. Some of these are free or offer limited-time free trials:

  • Convert PDF charts and tables into machine-readable, numeric datasets
    • PDFTables : PDF to Excel Converter
    • Tabula : Extract tables from PDFs
    • ABBYY FineReader : Access and modify information locked in paper-based documents and PDF files
  • Web scraping tools
    • ParseHub : Data mining tool for data scientists and journalists
    • Webhose : Turn unstructured web content into machine-readable data feeds
    • Data Streamer : Index weblogs, mainstream news, and social media
    • Outwit : Turn websites into structured data
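
For a simple HTML table, a short script can sometimes do the same job as the tools listed above. What follows is a minimal sketch, not a replacement for those products: it uses only the Python standard library, assumes the page contains a plain table with no merged or nested cells, and the names TableExtractor, html_table_to_csv, and SAMPLE_HTML (along with the sample figures) are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every table row (<tr>) as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row being built; None when outside a <tr>
        self._cell = None  # text pieces of the open cell; None outside <td>/<th>

    def _close_cell(self):
        # Join the text pieces and collapse runs of whitespace to one space.
        self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            if self._cell is not None:  # previous cell was never closed
                self._close_cell()
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            if self._cell is not None:  # cell left open at end of row
                self._close_cell()
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample data, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>State</th><th>Population</th></tr>
  <tr><td>Ohio</td><td>100</td></tr>
  <tr><td>Texas</td><td>250</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML))
```

The resulting CSV text can be saved to a file and opened in a spreadsheet or statistical package. Pages with several tables, merged cells, or content loaded by JavaScript will likely need one of the dedicated tools instead.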


Feeling intrigued, but unsure how to leverage web-based data for your own research?  Here are some how-to guides:

Selected datasets on the Internet, arranged by topic

These are some of the most significant datasets available on the internet, arranged by topic. Almost everything here is freely available. The few that do involve fees are marked with asterisks (*). Note that some of the listings below are also available in ICPSR.

Political Science/Public Policy