

Gould Data Knowledge Base

Getting Started

Voyant is a web-based tool. Just go to voyant-tools.org and get started.

From the search box on Voyant's home page, you can enter a URL to analyze the text on that page, or just paste in text. Alternatively, you can upload files from your computer. 
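The step-by-step tour later in this guide notes that several files can be zipped together before upload to form one corpus. A minimal Python sketch of that preparation step follows. The function name `zip_corpus`, the folder layout, and the list of file extensions are assumptions made for illustration; they are not part of Voyant itself.

```python
from pathlib import Path
import zipfile

# Hypothetical extension list, loosely based on the formats this guide
# says Voyant accepts. Adjust it to match your own files.
SUPPORTED = (".txt", ".html", ".xml", ".pdf", ".rtf", ".docx")


def zip_corpus(folder, archive_path, suffixes=SUPPORTED):
    """Bundle every supported file in `folder` into one zip archive.

    Returns the list of file names that were added, in sorted order.
    """
    folder = Path(folder)
    files = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store only the file name so the archive has no nested folders.
            archive.write(path, arcname=path.name)
    return [p.name for p in files]
```

Calling `zip_corpus("my_texts", "corpus.zip")` would produce a single archive to pick in the upload dialog. Giving each file a name that differs within its first few characters makes the documents easier to tell apart in the Trends panel.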

Analyzing Tool: Voyant

Acceptable text formats:

  • Plain text
  • HTML
  • XML
  • PDF
  • RTF
  • Microsoft Word

Limitations:

  • Using large texts/multiple texts can obscure trends
  • Can be a little sluggish, which makes it harder to learn
  • If using PDFs, analysis is only as good as the optical character recognition

Getting Oriented

Each section of the results page is called a panel. Each panel can be customized and changed, and many more tools are available than are visible at first. To export, change, or learn more about a panel, hover your mouse over the grey line at the top and menu items will appear.

  1. Export options for the current view
  2. Change the panel to a different analysis tool
  3. Change options (not always available)
  4. Hover over the question mark and a brief description of the current tool will appear.

Tutorial

Step by Step Tour

  1. (not shown above) From the home page, enter a URL, paste in text, or upload files. To upload, click the upload button. Browse your computer and pick one file, or hold Ctrl while clicking to select several. You can also zip your files together in this step to create one corpus. Take note of what you upload (see step 3).

  2. Cirrus: The word cloud tool. Hovering over words shows a frequency count of that word.

    1. Use the terms slider to increase or decrease the number of terms that appear in the word cloud.

    2. Use the export button to export the cloud as a direct link, embed code, or save as an image.

    3. The options on/off toggle allows you to exclude words.

    4. Scale allows you to choose between looking at the whole corpus or an individual document.

  3. Clicking on a word in the word cloud displays that word in the Trends panel to the right.

    1. The Trends panel shows the relative frequencies of the words. It is useful for spotting patterns or anomalies. Notice that the graph labels show only the beginning of each document name. Make sure your document names can be distinguished by their first characters or you will have a hard time using this feature.

  4. Clicking on the Terms button of the word cloud panel gives a table view of term counts from the entire corpus.

  5. Clicking on the Links button shows keywords in green and words in proximity in red. Hovering over the green keywords shows frequency. Hovering over the red words shows how often they are in proximity to the main words.

  6. The Summary panel below gives an overview of the words used in the corpus, including document lengths, vocabulary density, and the most frequent words by document or over the entire corpus. Clicking on a document here will display it in the Reader panel. Clicking on a word will display it in the Trends panel and highlight it in the Reader panel.

  7. Each search box searches within its own panel. Press the help icon to view a menu explaining the syntax of complex searches. For example, search* matches terms that start with search, such as search, searches, and searching; coat|jacket matches either term and counts them together as a single term, e.g. to get a count of how many times coats or jackets were mentioned.

  8. The Contexts panel shows each occurrence of a word together with the words surrounding it. Within this window, use the search box to search by terms, add or remove context with the slider, or use the plus sign to see more of the text in context.

  9. The Bubblelines tool shows bubbles that indicate how frequently, and where in each document, words appear.
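To make the numbers in the tour more concrete, here is a rough Python sketch of the kinds of calculations behind several panels: term counts (Cirrus and Terms), relative frequency (Trends), vocabulary density (Summary), wildcard and pipe searches, and words in context (Contexts). It illustrates the ideas only and is not Voyant's own code. The function names, the simple tokenizer, and the sample sentences are all assumptions, and Voyant's exact figures may differ, for example because of its stopword list or the way it scales frequencies.

```python
import re
from collections import Counter
from fnmatch import fnmatchcase


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplification)."""
    return re.findall(r"\w+", text.lower())


def term_counts(tokens):
    """Raw count of each term, like the numbers shown on hover in Cirrus."""
    return Counter(tokens)


def relative_frequency(term, tokens):
    """Occurrences of `term` divided by the total number of tokens."""
    return tokens.count(term) / len(tokens) if tokens else 0.0


def vocabulary_density(tokens):
    """Number of unique words divided by the total number of words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def match_count(query, tokens):
    """Count tokens matching a query such as 'search*' or 'coat|jacket'."""
    patterns = query.lower().split("|")
    return sum(1 for t in tokens if any(fnmatchcase(t, p) for p in patterns))


def contexts(term, tokens, width=3):
    """Return (left, term, right) for each occurrence, `width` words per side."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok == term:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


# Made-up sample documents, used only to exercise the functions above.
docs = {
    "letter_1": "The coat was warm. The jacket was not.",
    "letter_2": "Searching for a coat, she searched every shop.",
}
for name, text in docs.items():
    tokens = tokenize(text)
    print(name, term_counts(tokens).most_common(2),
          round(vocabulary_density(tokens), 2),
          match_count("coat|jacket", tokens))
```

In this sketch a query like coat|jacket matches only those exact words; to count plurals as well, combine the two syntaxes, as in coat*|jacket*.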