STAT 399

Course guide for STAT 399 - Comps - Taught by Katie St. Clair

Explore Publications for Datasets

Datasets are often associated with the articles that use them in their analysis and claims. One goal of modern research is to let others see the work you have done and confirm it. This is part of the open science movement, which aims to make more research replicable and to allow other scientists and scholars to build confidently on reported findings. Whether you are reusing a dataset collected by other scholars or using one you generated yourself, one of the objectives when you publish the research is to lead readers to the dataset. As a result, readers can often locate the data associated with an article, and sometimes it is even in a format suitable for reuse without specialized software. On this page, we'll discuss how to find these types of data through publications. We'll divide this into two categories and provide recommendations on searching for each type.

  • Papers reporting data the authors have collected (sometimes referred to as replication datasets)
  • Papers reporting datasets the authors have reused

Papers Reporting Data the Authors have Collected

Finding data associated with a paper is easier than ever. Some publications require that the data be posted with the paper upon publication, and others require a data availability statement indicating how to obtain the data. In a perfect world, the required elements of a data availability statement would be clearly defined and standardized. However, the current data landscape is messy, and there is a long way to go before every published article has easy-to-use data right along with it. Part of this is due to the publishing landscape being highly diverse.

Because data is associated with publications, let's start with publishers that have good practices for requiring data to be posted with a published work. PLoS is one example. To search its articles for shared data:

1. Enter a search term like Dryad or Figshare into the search box, and make sure Data Availability is selected from the All Fields drop-down.

2. Add another search term by clicking the plus icon.

3. Hit search and browse through articles that have published data shared in a repository.

As you search using this method and browse through articles, you may notice a Data Accessible tag on article records. These articles were published after 2016 and have data shared from a known data repository.

Many more journals require that data associated with articles be shared in trusted repositories. To find data associated with an article, open the article and look near the beginning or end for a statement on data availability. Examples: American Naturalist, Evolution, Journal of Evolutionary Biology, Molecular Ecology, and Heredity.
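As a rough illustration of what to look for in a data availability statement, the sketch below scans a statement's text for repository names and DOIs. The sample statement, the short repository list, and the function name are assumptions made up for this example, and the DOI in the sample is a placeholder rather than a real dataset.

```python
import re

# Repository names to look for; an illustrative list, not an exhaustive one.
KNOWN_REPOSITORIES = ["Dryad", "Figshare", "ICPSR"]

# Loose pattern for DOIs: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def scan_statement(statement):
    """Return repository names and DOIs mentioned in a data availability statement."""
    lowered = statement.lower()
    repositories = [name for name in KNOWN_REPOSITORIES if name.lower() in lowered]
    # Strip trailing punctuation that belongs to the sentence, not the DOI.
    dois = [match.rstrip(".,;)") for match in DOI_PATTERN.findall(statement)]
    return {"repositories": repositories, "dois": dois}


# Hypothetical statement with a placeholder DOI.
sample = (
    "Data Availability: All data are available from the Dryad repository "
    "at https://doi.org/10.1234/example.abc123."
)
result = scan_statement(sample)
print(result)
```

A statement that names a repository but gives no DOI still tells you where to search, so the two lists are returned separately.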

Search in Common Repositories for Data Associated with Publications

Another route to finding data related to articles is to search directly in common repositories that hold data related to publications. Several examples are: 

 

Papers Reporting Data that the Authors have Reused

Papers reporting data that the authors have reused should list that data in the cited references of the work. These data will often be from:

  • Repositories like ICPSR (Inter-university Consortium for Political and Social Research)
  • NGOs or think tanks like World DataBank or Pew
  • Government sources like the census

It's best practice to cite every data source you use in a paper or analysis but did not create. Citing a dataset in a reference list is very similar to citing other resources: the citation should lead the reader back to the original source. Frequently, this means it includes a link to the original dataset and often a digital object identifier (DOI), which should be a persistent link. However, authors do not always cite datasets properly, and datasets can change URLs or be removed from their original place of publication.
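To make the idea of a persistent link concrete, here is a minimal sketch that takes a digital object identifier written in a few common ways and returns a resolver link on https://doi.org/. The function name and the sample identifier are hypothetical; the identifier is a placeholder and does not point to a real dataset.

```python
# Prefixes people commonly put in front of a DOI in reference lists.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi):
    """Normalize a DOI string and return its https://doi.org/ resolver link."""
    cleaned = doi.strip()
    for prefix in DOI_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    # Every DOI starts with the directory indicator "10."
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


# Placeholder DOI written two different ways.
print(doi_to_url("doi:10.1234/example.abc123"))
print(doi_to_url("https://doi.org/10.1234/example.abc123"))
```

Both calls print the same link, which is the point: however the identifier is written in a reference list, it resolves through one address even if the dataset's own URL later changes.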

If you've found a resource discussing a source you aren't sure how to find, please contact Hannah, your librarian.