Data, Datasets, and Statistical Resources

When You're Not Sure Where to Start

  • Stop by the Research/IT Desk or make an appointment with your liaison librarian.
  • Consult one of the subject guides to data.
  • Take a look at the Data Reference Worksheet to brainstorm ideas for searching.

Tracking Data Sources in Research Literature

At the very first stages of your literature review, start taking notes on potential data sources. Make a habit of jotting down the data used in each study you read; this will save time when you return to these studies later in your search for data. The practice can also help you see and articulate how your contribution is unique. You might want to keep these notes in a table like the following for easy reference.

Author(s) and Year of Publication | Claim | Data | Dependent Variable/Estimation Technique | Significant Findings

For an editable table like the one above, open this spreadsheet, save a copy for yourself, then use it to track your literature.
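If you prefer to keep the tracking table as a plain file rather than a spreadsheet, the sketch below shows one way to do it in Python. It is only an illustration: the column boundaries are an assumed reading of the table headings above, and the function name and the placeholder row are hypothetical.

```python
import csv
import io

# Column headings from the tracking table; the exact column
# boundaries are an assumption based on the headings in this guide.
COLUMNS = [
    "Author(s) and Year of Publication",
    "Claim",
    "Data",
    "Dependent Variable/Estimation Technique",
    "Significant Findings",
]


def write_literature_table(rows, out):
    """Write the header row plus one row per study to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Hypothetical placeholder entry; replace with notes from your own reading.
example_rows = [
    {
        "Author(s) and Year of Publication": "Author (Year)",
        "Claim": "Main claim of the study",
        "Data": "Name of dataset used",
        "Dependent Variable/Estimation Technique": "Outcome / method",
        "Significant Findings": "Key result",
    }
]

# Write to an in-memory buffer here; pass an open file to save to disk.
buffer = io.StringIO()
write_literature_table(example_rows, buffer)
csv_text = buffer.getvalue()
```

To save the table to disk instead, open a file with `open("literature.csv", "w", newline="")` (the file name is just an example) and pass it in place of the buffer. The resulting file opens in any spreadsheet program.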

Searching at the Variable Level

Most data catalogs describe datasets at the "study level." This means that when you search by subject or keyword, your search terms are matched against words used to describe the whole study broadly, not all the specific topics addressed within it. This practice is similar to the way books are described in the library's catalog.

Often, however, you will want to search at the "variable level." For example, you might like to search for all surveys that ask a question about happiness. What is a researcher to do? Because of improvements in data description, some datasets are searchable by variable. Below are some places that allow variable searching. Be aware, though, that you cannot search all data by variable.

Comparing Variables Across Datasets

In your search for data you may need to compare similar questions from several surveys. Below is a grid to help you keep track of this process.

Searching for Raw Data

The term "raw data" is being used increasingly to mean many things, but in the world of social science research data, it means something much more specific. If you are looking for microdata (data at the individual level, analyzable at the level at which it was collected, e.g., persons, households, etc.) formatted for use with statistical software, then you're in the right place.

Data in ASCII format, or formatted for use with statistical packages (e.g., SPSS, Stata, R), can often be found most efficiently by searching collections and archives of data.

Here are just a few prime candidates. For more specific recommendations and ideas, contact a librarian.

 

Finding Replication Data

Many researchers make their data available so that others can replicate their findings, to provide transparency, and to encourage secondary research using their data. Here are some places to consider when looking for replication data.

  • Search engines for replication data
    • DataCite's Metadata Search allows you to search by keyword across a growing collection of replication datasets.
    • WorldWideScience.org Advanced Search allows a search by keyword across multiple data archives, data repositories and open science portals. Once you search, limit to Type:Dataset to focus your results. 
  • Disciplinary data repositories
    • Dryad is a repository of nearly 10,000 scientific replication datasets
    • re3data.org is a registry of data repositories that you can browse by discipline
  • Non-disciplinary data repositories
    • DataVerse is a place where researchers can host datasets associated with research publications.
    • FigShare is a place where anyone can post research data and publications. Beware: there is no editorial or curatorial process to ensure quality.
  • Authors' academic web sites
    • Pippa Norris is my favorite example.
    • Search Google for the researcher's name and 'site:edu' (without quotes)
    • Look for areas of their page called data, datasets, data sets, or replication data
  • Journal web sites
    • Identify journals that publish a large amount of applied empirical research. Ask a librarian or your professor for help.
    • The Annals of Applied Statistics journal maintains a large collection of replication data
    • More examples: Journal of Peace Research
    • Go directly to their web sites and look for sections labeled data, datasets, data sets, or replication data
    • Gary King at Harvard maintains a list of journals with replication data policies
    • Many more journals require that data associated with articles be shared in trusted repositories. To find data associated with an article, open up an article and look near the end or beginning for a statement on data availability. Examples: American Naturalist, Evolution, Journal of Evolutionary Biology, Molecular Ecology, and Heredity.
  • Scholarly association web sites
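
The author web site tip above (search for the researcher's name plus site:edu) can be scripted if you need to check many authors. The following is a minimal sketch; the function names are hypothetical, and it only builds the query text and a standard Google search URL without fetching any results.

```python
from urllib.parse import urlencode


def build_author_site_query(researcher_name, domain="edu"):
    """Combine a researcher's name with a site: restriction, e.g. site:edu."""
    return f"{researcher_name} site:{domain}"


def build_search_url(query):
    """Return a Google search URL for the query (URL-encoded)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Example using the researcher named in this guide.
query = build_author_site_query("Pippa Norris")
url = build_search_url(query)
print(query)
print(url)
```

Paste the printed URL into a browser, then look on the resulting pages for sections called data, datasets, data sets, or replication data.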

Finding Statistical Resources in Print

Web sites with data almost certainly have the most current numbers, but don't always include data collected in previous years. The library's paper collections afford a wealth of places to find statistics and data beyond what is available from databases online. Books often contain appendixes with tables. Government and non-government agency reports and statistical compendia are likely to contain -- or consist nearly entirely of -- statistical tables and charts.

Search

Most statistical publications are cataloged with a subject heading that contains the word "statistics," for example: Crime - Economic aspects - United States - Statistics.

Perform an Advanced Search with one dropdown set to "Subject contains" and type the word "statistics." Example catalog search for crime statistics.

Browse

Another way to find older statistics is to browse the stacks. Browsing gets around the problem of having to guess the right words for an effective catalog search. The oversize stacks on the second floor contain a rich collection of statistical compendia.

Chase Citations

Use the books and articles you already have in hand and skim the bibliographies and "Data" sections of the text for the names of studies, datasets, or collection agencies. Search for these names in the catalog.

For example, you might see the IMF's "Direction of Trade Statistics" cited frequently in the literature. The IMF's web site provides access to the current data. A search in Catalyst shows that the annual index is available in its entirety back to 1962 on the second floor of the library.

Data on CD-ROM

Some datasets are distributed on CD-ROM and are housed in the Library.

To find more data on CD-ROM, search the library catalog, Catalyst. Try an Advanced Search with "CD-ROM" as a keyword and a publishing agency as the author. For example:

Any Keyword: cd-rom
and
Author Contains: "bureau of the census"

You could also start by browsing the catalog for data in the following topics:

To Librarians:

Please feel free to reuse content from this page with acknowledgment.