Digital Text Analysis and Text Mining

Finding word trends across large bodies of text.

Where do I find text to analyze?

The Internet is full of content that anyone can access, but just because you can see it doesn't mean you can download it and turn it into a dataset. This kind of activity is considered "reuse." Always check the copyright status and license (when applicable) of any text you want to mine, and make sure you understand any restrictions on reuse. Librarians are great people to consult when you are looking for sources to use for text mining.

Government Documents

Primary Source Collections

Database APIs & CONTENTdm

DO NOT SCRAPE LIBRARY DATABASES

Why? Scraping takes protected information without permission, and that violation of a database's Terms of Service can get access turned off for the entire campus.

But there are options!

If you don't know how to access the underlying data in a library database, always ask your library liaison. We can help you do it legally! Depending on the database vendor, we may be able to get hard drives containing all the underlying data, or we can help set up API access for you.
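
To give a sense of what API access can look like once it has been set up, here is a minimal Python sketch. Everything specific in it is an assumption for illustration: the endpoint `https://api.example-vendor.org/v1/articles`, the `YOUR-API-KEY` placeholder, the query parameter names, and the shape of the JSON response are all hypothetical. Each vendor's API is different, so use the documentation and credentials your library liaison provides.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical values: replace with the endpoint and key your liaison provides.
BASE_URL = "https://api.example-vendor.org/v1/articles"
API_KEY = "YOUR-API-KEY"


def build_request(query, page, page_size=100):
    """Build one authenticated request for a single page of results."""
    params = urllib.parse.urlencode(
        {"q": query, "page": page, "page_size": page_size}
    )
    return urllib.request.Request(
        f"{BASE_URL}?{params}",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Accept": "application/json",
        },
    )


def extract_texts(payload):
    """Collect the non-empty full-text fields from one decoded response.

    Assumes a response shaped like
    {"results": [{"id": ..., "full_text": "..."}]}.
    """
    return [
        item["full_text"]
        for item in payload.get("results", [])
        if item.get("full_text")
    ]


def fetch_corpus(query, max_pages=5, delay_seconds=1.0):
    """Fetch up to max_pages of results, pausing between requests."""
    texts = []
    for page in range(1, max_pages + 1):
        with urllib.request.urlopen(build_request(query, page)) as response:
            payload = json.load(response)
        page_texts = extract_texts(payload)
        if not page_texts:
            break  # stop at the first page that returns no text
        texts.extend(page_texts)
        time.sleep(delay_seconds)  # stay well within the vendor's rate limits
    return texts
```

Unlike scraping, a request like this identifies you with a key the vendor has issued and stays within the limits the vendor has agreed to. Calling `fetch_corpus("climate change")` would only work against a real endpoint, so treat the function as a template and not as something to run unchanged.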