STAT 310: Spatial Statistics

Professor Claire Kelling

Strategies for Searching for Spatial Data

Searching for data rarely happens as a step-by-step process; it is more circular and iterative than that. However, having a process in mind can help you notice the decisions you're making and the possibilities you haven't yet pursued.

Step 1: Decide What You're Looking For

Take a moment to decide what kind of data you are hoping to find. You may want data about a specific subject or topic, data for a specific geographic place, or data from a specific time period. Which of these (topic, place, or time) is most important to you?

Step 2: Consider Who Might Produce These Data

Think about how data are distributed and shared. Some data, such as educational data, are collected by schools and distributed through governmental agencies. Environmental data are collected by scientists and shared alongside research articles or by agencies at different levels of government. GIS data on a wide range of topics are shared by GIS practitioners in repositories. You might not have a clear sense of who would collect the data you want, but it helps to keep the question in mind as you search, especially when you reach the next step.

Step 3: Decide How and Where to Start Searching

There isn't just one way to look for data, and whatever is most important to you (see Step 1) will affect where you start your search. If you are looking for shapes within a US state, maybe start with a state data repository. If you want to see what is available in shapefile format, start with a GIS repository. This guide offers several different but overlapping ways to find data. The following questions help you decide where to start.

1. Could your data be found in a location-based collection of data? If so, then use the Explore by Place tab.

2. Could your data be found in a format-specific collection of data? If so, then start with the Explore by Format tab.

3. Could your data be found in a topical collection of data? If so, then try the Explore by Topic tab.

4. Can you go directly to the source and bypass collections? Are your data likely to be part of a specific project or process you already know about? If so, a Google search may be sufficient.

5. Check the "Other Data-Finding Tools" tab to help you answer some of these questions. Some of these tools let you search by keyword and then sort by source or type. 

Step 4: Document Your Progress and Seek Assistance

Keep notes for yourself of where you look and what you find, so you can retrace your steps. Something you rule out early might lead you to a dataset you need later. 

When you feel you could benefit from some guidance in your searching, get in touch with the librarian.

Formats to Look For

Formatted for GIS Software

.shp - shapefiles - These are formatted for GIS software and will be easy for you to work with. When you see a shapefile, be sure to download all accompanying files, too (typically .shx, .dbf, and .prj files with the same base name). They work together, so you need all of them. Keep an eye out for metadata files (usually .xml or .gml).

.gdb - geodatabase - An Esri format that stores one or more layers and their associated tables together in a single .gdb folder, which is more portable than a shapefile's separate files.

.geojson or .json - geographic JavaScript object notation - vector points, lines, and polygons and tabular information can all be saved in GeoJSON format. You're most likely to find this format if you download from a web-based map. 

.kml or .kmz - Keyhole Markup Language - Google Earth and Google Maps export in this format, which is readable in ArcGIS Online and can be converted into a shapefile.
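The advice above about downloading every file that travels with a .shp can be checked before you open anything in GIS software. The sketch below is only an illustration: it assumes the usual shapefile layout, in which the .shx (geometry index) and .dbf (attribute table) files are required and the .prj (coordinate reference system) file is strongly recommended, all sharing the .shp file's base name. The function name and the example path are hypothetical.

```python
from pathlib import Path

# Companion extensions that normally travel with a .shp file.
REQUIRED_PARTS = (".shx", ".dbf")   # geometry index and attribute table
RECOMMENDED_PARTS = (".prj",)       # coordinate reference system


def missing_shapefile_parts(shp_path):
    """Return the companion extensions that are absent next to shp_path."""
    shp = Path(shp_path)
    missing = []
    for ext in REQUIRED_PARTS + RECOMMENDED_PARTS:
        # Companion files share the .shp file's base name; accept either case.
        lower = shp.with_suffix(ext)
        upper = shp.with_suffix(ext.upper())
        if not (lower.exists() or upper.exists()):
            missing.append(ext)
    return missing
```

Calling missing_shapefile_parts("downloads/counties.shp") (a made-up path) returns an empty list when the download is complete, or a list such as [".prj"] naming what still needs to be fetched.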

Tabular Data with Shapes and Points

.csv - comma-separated values - The most flexible option for tabular data.

.xls or .xlsx - Excel format, which can be converted to CSV.

.tab - tab-separated values - Similar to CSV, but with tabs between values instead of commas.

When collecting point data in these formats, make sure the columns include latitude and longitude.
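As a rough illustration of the latitude/longitude check, the sketch below reads CSV text, confirms that coordinate columns are present and within valid ranges, and turns each row into a GeoJSON point. The column names "latitude" and "longitude", the function name, and the two sample rows are assumptions made for the example; real files often use other headers such as "lat" and "lon".

```python
import csv
import io
import json

# Assumed column names; adjust to match the file you actually download.
LAT_COL = "latitude"
LON_COL = "longitude"


def csv_points_to_geojson(csv_text):
    """Convert CSV rows with latitude/longitude columns to a GeoJSON dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    if LAT_COL not in fields or LON_COL not in fields:
        raise ValueError(
            f"expected columns {LAT_COL!r} and {LON_COL!r}, found {fields}"
        )

    features = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        lat = float(row[LAT_COL])
        lon = float(row[LON_COL])
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"row {row_number}: coordinates out of range")
        # Every column other than the coordinates is kept as an attribute.
        properties = {
            key: value for key, value in row.items()
            if key not in (LAT_COL, LON_COL)
        }
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample rows, for illustration only.
sample = "name,latitude,longitude\nStation A,40.0,-75.5\nStation B,41.25,-76.0\n"
collection = csv_points_to_geojson(sample)
print(json.dumps(collection["features"][0]["geometry"]))
# prints {"type": "Point", "coordinates": [-75.5, 40.0]}
```

Note the coordinate order: tables usually list latitude first, while GeoJSON expects longitude first, which is a common source of points landing in the wrong place.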

 

Terms to Watch For

Data Catalog: an organized listing of data that can be searched to discover what is available. The content of a data catalog is often descriptive, with links to where the data can be accessed. Data catalogs are usually built around collections that reflect the interests of the collecting organization. Brainstorming organizations that might collect and distribute the data you need (e.g., universities, non-governmental organizations) is an important part of searching for data. When searching a data catalog, use keywords that are likely to appear in the study-level description of a dataset. Example: World Bank Microdata Catalog

Data Portal: a website that lets users search for data and lets data producers make their data more easily discoverable. "Portal" is a broader concept and can take the form of a data catalog, a simple listing, or even a data repository. Like catalogs, portals usually point to where the data can be downloaded. When using a portal, expect to hop across various websites. Example: Data.gov

Data Repository: in the context of research, a data repository is a centralized place (usually a database infrastructure accessible via a website) to store, preserve, organize, and provide access to data of interest to researchers. Data repositories are created around communities, which might be defined by place (e.g., a state data repository), topic of interest (e.g., a snow and ice data repository), disciplinary focus (e.g., health sciences), governmental or organizational mission (e.g., meteorological data), or format (e.g., GIS data). Brainstorming and identifying potential repositories that might hold data of interest to you is a major part of searching for data.

Metadata: data require documentation to be usable. Metadata are data about data -- structured descriptions of datasets. These help you by telling you what variables mean, what units are being employed, who created the dataset and under what conditions. Metadata are key to determining whether you will be able to use a dataset accurately and ethically. 

Query Tools: Many data websites let you build your own subset or your own table of data instead of requiring you to download the whole thing or scour premade tables. This feature goes by many names. Look for "custom tables," "online analysis," "interactive data," "data query," etc.