Typically, your goal should be to obtain or create a plain text (UTF-8) version of the work(s) you will be analyzing. This may involve:
Many free tools exist, but pay attention to their privacy and security practices before giving them your texts. Here are a few that function well with Carleton infrastructure.
Most plain text will need some cleaning and manipulation before it's useful for analysis. This can also be an iterative process as you move through your project. Here are common reasons for cleaning and manipulation:
You may find that the options here are not powerful enough for you. For example, you may want to remove a list of language- or domain-specific stop words from your texts. Your librarian can help you identify lists of stop words, and Paula Lackie (plackie@carleton.edu) or the Data Squad can provide data-manipulation support.
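If you are comfortable with a little scripting, stop-word removal can be sketched in a few lines of Python. This is a minimal illustration, not a recommended workflow: the function names, the one-word-per-line file layout, and the simple tokenizing rule (lowercase everything, keep runs of letters and internal apostrophes) are all assumptions made for the example, and your project may need different choices.

```python
import re


def load_stop_words(path):
    """Read a stop-word list from a UTF-8 text file, one word per line.

    The one-word-per-line layout is an assumption for this sketch.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def remove_stop_words(text, stop_words):
    """Lowercase the text, split it into words, and drop the stop words.

    Returns the remaining words as a list, in their original order.
    """
    stop = {word.lower() for word in stop_words}
    # Keep runs of word characters, allowing internal apostrophes ("don't").
    tokens = re.findall(r"\w+(?:'\w+)*", text.lower())
    return [token for token in tokens if token not in stop]


if __name__ == "__main__":
    # Hypothetical domain-specific list for an early modern English text.
    custom_stop_words = ["the", "and", "of", "thou"]
    sample = "The whale and the sea, of which thou speakest"
    print(remove_stop_words(sample, custom_stop_words))
```

Because the stop words are passed in as an ordinary list, you can swap in any list your librarian helps you find without changing the rest of the script.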
Questions? Contact reference@carleton.edu