
Data Sets for Quantitative Research: Public Use Datasets

Finding Datasets on the Internet

Many research organizations make data available on the web, but there is still no perfect mechanism for searching the content of all these collections. The links below will take you to data search portals that seem to be among the best available. Note that these portals point to both free and paid sources of data, and to both raw data and processed statistics.

 * Resources that are not entirely free are marked with an asterisk.

Transform web information into machine-readable data for analysis

Have you found fantastic numeric information in a less-than-ideal format, such as PDF or HTML? Here are some software products that may help you transform those formats into numbers that you can read into a spreadsheet or statistical software program. Some of these are free, and others offer limited-time free trials:

  • Convert PDF charts and tables into machine-readable, numeric datasets
    • Spark OCR: Find tables in images, visually identify rows and columns, and extract data from cells into data frames. Turn scans from financial disclosures, academic papers, lab results and more into usable data. 
    • PDFTables: PDF to Excel converter
    • Tabula: Extract tables from PDFs
    • table-ocr: For those who know Python
    • Abbyy FineReader: Access and modify information locked in paper-based documents and PDF files
    • OCR Space: This free service transforms PDFs into plain text files directly in your browser.  Rows and columns are preserved, making it easier to import the file into Excel using the Import Text Wizard.  See further explanation and instructions here: Table recognition with OCR.
       
  • Web scraping tools
    • Parsehub: Data mining tool for data scientists and journalists
    • Webhose: Turn unstructured web content into machine-readable data feeds
    • Data Streamer: Index weblogs, mainstream news, and social media
    • Outwit: Turn websites into structured data
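
For readers comfortable with Python, a simple HTML table can often be converted without any of the products above. The following is a minimal sketch, not a replacement for those tools: it uses only Python's standard library (`html.parser` and `csv`), and the names `TableParser`, `html_table_to_csv`, and `SAMPLE_HTML` are hypothetical, chosen for illustration. The sample table contains made-up numbers. It assumes a plain table with no merged or nested cells.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None when outside <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample data, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Year</th><th>Population</th></tr>
  <tr><td>2010</td><td>1,200</td></tr>
  <tr><td>2020</td><td>1,350</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML))
```

The resulting CSV text can be saved to a file and opened in a spreadsheet or statistical package. Values that contain commas, such as "1,200", are quoted automatically so they stay in a single column; you may still need to strip the thousands separators before treating them as numbers.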

 

Feeling intrigued, but unsure how to leverage web-based data for your own research?  Here are some how-to guides:

Selected datasets on the Internet, arranged by topic

These are some of the most significant datasets available on the internet, arranged by topic. Almost everything here is freely available. The few that involve fees are marked with an asterisk (*). Note that some of the listings below are also available through ICPSR.

Political Science/Public Policy

Demographics

Business and Economics

Health

Science

Sociology

Education

Miscellaneous
