STAT 310: Spatial Statistics

For Prof. Claire Kelling

Strategies for Searching for Spatial Data

Step 1, Know What You're Looking For

Take a moment to define what kind of data you are hoping to find. You may want data about a topic, for a specific kind of place, or from a specific time period. Consider which of these qualities matter most to you: subject matter, place, time, etc.

Step 2, Consider Who Might Produce These Data

Think about how data are distributed and shared. Some data, like educational data, are collected by schools and distributed through governmental agencies. Environmental data are collected by scientists and shared alongside research articles or by agencies across varying levels of government. GIS data across a wide array of topics are shared by GIS practitioners in repositories. You might not have a clear sense of who would collect the data you want, but it helps to keep the question in mind as you search.

Step 3, Decide Where to Start Searching

There is no single way to look for data, and the qualities most important to you will determine where it is most fruitful to search. If you are looking for shapes within a US state, consider starting with a state data repository. If you want to see what is available in shapefile format, start with a GIS repository. This guide offers several different but overlapping ways to find data.

1. Could your data be found in a location-based collection of data? If so, then use the Explore by Place tab.

2. Could your data be found in a format-specific collection of data? If so, then start with the Explore by Format tab.

3. Could your data be found in a topical collection of data? If yes, then try the Explore by Topic tab.

4. Can you go directly to the source and bypass collections? Are your data likely to be part of a specific project or process you already know about? If so, a Google search will likely be sufficient.

5. Check the "other data-searching tools" tab to help you answer some of these questions. Some of these tools let you search by keyword and then sort by source or type. 

Terms to Watch For

Data Catalog: an organized listing of data that can be searched to discover what is available. The content of a data catalog is often descriptive, with links to where the data can be accessed. Data catalogs are usually built around collections that reflect the interests of the collecting organization. Brainstorming organizations that might collect and distribute the data you need (e.g., universities, non-governmental organizations) is an important part of searching for data. When searching a data catalog, use keywords that are likely to appear in the study-level description of a dataset. Example: World Bank Microdata Catalog

Data Portal: a web site that allows users to search for data and allows data producers to make their data more easily discoverable. "Portal" is a broader concept and can take the form of a data catalog, a simple listing, or even a data repository. Like catalogs, portals usually point to where the data can be downloaded. When using a portal, expect to hop across various web sites. Example: Data.gov

Data Repository: in the context of research, a data repository is a centralized place (usually a database infrastructure accessible via a website) to store, preserve, organize, and provide access to data of interest to researchers. Data repositories are created around communities, which might be defined by place (e.g., a state data repository), topic of interest (e.g., a snow and ice data repository), disciplinary focus (e.g., health sciences), governmental or organizational mission (e.g., meteorological data), or format (e.g., GIS data). Brainstorming and identifying potential repositories that might hold data of interest to you is a major part of searching for data.

Metadata: data require documentation to be usable. Metadata are data about data: structured descriptions of datasets. They tell you what each variable means, what units are used, and who created the dataset and under what conditions. Metadata are key to determining whether you will be able to use a dataset accurately and ethically.
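
To make the role of metadata concrete, here is a minimal Python sketch of checking a downloaded file's columns against its data dictionary. Everything in it is hypothetical and for illustration only: the variable names, descriptions, units, and the helper functions do not come from any real dataset or tool.

```python
# Hypothetical data dictionary (metadata) for an illustrative dataset.
# Each entry records what a variable means and what units it uses.
data_dictionary = {
    "tract_id": {"description": "Census tract identifier", "units": None},
    "median_income": {"description": "Median household income", "units": "US dollars"},
    "pop_density": {"description": "Population density", "units": "people per square km"},
}

# Hypothetical column names found in the downloaded file.
dataset_columns = ["tract_id", "median_income", "pop_density", "var_17"]


def undocumented_columns(columns, dictionary):
    """Return the columns that have no entry in the data dictionary."""
    return [name for name in columns if name not in dictionary]


def describe(column, dictionary):
    """Return a one-line, human-readable description of a documented column."""
    entry = dictionary[column]
    units = entry["units"] or "no units"
    return f"{column}: {entry['description']} ({units})"


# Columns you cannot interpret without asking the data producer.
missing = undocumented_columns(dataset_columns, data_dictionary)
print("Undocumented columns:", missing)

# Print what the metadata says about every documented column.
for name in dataset_columns:
    if name in data_dictionary:
        print(describe(name, data_dictionary))
```

A column such as the made-up `var_17`, which has no dictionary entry, is a signal to look for a codebook or contact the data producer before using it in an analysis.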

Query Tools: Many data websites let you build your own subset or your own table of data instead of requiring you to download the whole thing or scour through premade tables. This feature goes by many names. Look for "custom tables," "online analysis," "interactive data," "data query,"