You can use the Twitter API to retrieve recent tweets from particular users and to retrieve tweets containing certain hashtags. This guide walks through the basic process of using Python to retrieve information from the Twitter API.
By the end of this guide you will be able to set up the python-twitter library, retrieve recent tweets from a user, retrieve tweets containing a hashtag, and save the results to a CSV file.
To use the Twitter API with Python you will need a few things, which are explained below.
There are several limitations to the Twitter API unless you are willing to pay for data.
It is important to note that some Twitter datasets that people or organizations purchased have since been made publicly available. These are less common, but they may fulfill your needs, so it is worth doing a Google search.
To retrieve an API key:
We are now going to use the command line to install the python-twitter library, which lets us access the Twitter API from Python.
If you are using Windows, open Command Prompt. If you are using macOS or Linux, open Terminal (on macOS you can open Terminal by typing “Terminal” into Spotlight). Now type:
> pip install python-twitter
This will install the python-twitter library that creates an interface between the API and your Python platform.
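To check that the installation worked, you can ask Python to import the library from the same command line (this assumes the python command on your system points to the same Python that pip installed into):
> python -c "import twitter"
If the command finishes without an ImportError, the library is ready to use.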
It's necessary to install Jupyter Notebook as an interface for running Python. The easiest way to get it is to install the Anaconda distribution, which includes Jupyter.
To install Anaconda, go to the website and follow the installation instructions for your operating system. Unless you have a specific reason not to, you should install the version that includes the latest version of Python. Also, make sure you are installing the full Anaconda package and not the “Miniconda” version, which doesn't include the libraries you'll need. Once everything is installed, you should be able to open a Jupyter Notebook. In the command line window you already have open, type:
> jupyter notebook
and press Enter. Some startup messages should appear in the command line window, and then your default web browser should open and begin loading what appears to be a website. This “website” is actually the Jupyter interface.
Note: if the browser opens to a page requesting a token, go back to your command line window and find the line that says “Copy/paste this URL into your browser when you connect for the first time, to login with a token:” followed by a strange-looking URL. Right-click this URL and select “copy link address”, then simply paste it into your browser’s address bar and hit Enter.
You should see a list of your files and folders at the “home” location on your computer. Using this interface, you can browse through your files and open existing Python files or create new ones. We are interested in creating a notebook that we can use to request and work with API data, so navigate to a place where you want to store your notebook and then click “New” -> “Python 3”. This should open a new tab with your new notebook loaded.
This section walks through setting up a Jupyter Notebook to work with the API, accessing the API, and pulling some basic information.
First, open a new Jupyter Notebook and type the following lines into one cell, then press Shift + Enter to execute (i.e. run) all of them at once:
import sys
import operator
import requests
import json
import twitter
These first few lines give you access to the packages you need: twitter is the python-twitter library you installed earlier, and sys, operator, requests, and json are general-purpose tools for working with data and web requests.
To gain access to the Twitter API, you'll need those Twitter credentials you retrieved earlier. They are probably hard to remember, so give them a recognizable name. In a new cell, enter the following lines (with your own information filled in where indicated by brackets), then execute:
twitter_consumer_key = '[your consumer key]'
twitter_consumer_secret = '[your consumer secret]'
twitter_access_token = '[your access token]'
twitter_access_secret = '[your access secret]'
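As an aside: if you plan to share your notebook, you may prefer not to paste the keys directly into a cell. One alternative (a minimal sketch; the environment variable names here are just examples) is to set the values as environment variables before launching Jupyter and read them in Python:
import os

# Example only: read credentials from environment variables you set yourself
# (e.g. TWITTER_CONSUMER_KEY) instead of typing them into the notebook.
twitter_consumer_key = os.environ['TWITTER_CONSUMER_KEY']
twitter_consumer_secret = os.environ['TWITTER_CONSUMER_SECRET']
twitter_access_token = os.environ['TWITTER_ACCESS_TOKEN']
twitter_access_secret = os.environ['TWITTER_ACCESS_SECRET']
Either way, the rest of the guide works the same.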
Now send this information along to Twitter to gain access to the API. Copy and paste the following line into a new cell, then execute:
twitter_api = twitter.Api(consumer_key=twitter_consumer_key, consumer_secret=twitter_consumer_secret, access_token_key=twitter_access_token, access_token_secret=twitter_access_secret)
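If you want to confirm that the credentials were accepted, python-twitter's VerifyCredentials() method asks Twitter for the account your keys belong to. In a new cell:
# Prints information about your own account; an error here usually means
# one of the four credential values above is wrong.
print(twitter_api.VerifyCredentials())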
Now that you have access, you can pull various information from Twitter. Try accessing the last 200 tweets from a particular user. First, choose a handle from which to pull the tweets and then enter/execute the following line in a new cell:
handle = '[Twitter handle of the user you want to retrieve tweets from, without @]'
Now use the API to retrieve the tweets:
statuses = twitter_api.GetUserTimeline(screen_name=handle, count=200, include_rts=False)
The arguments inside the parentheses limit the retrieval to 200 tweets and ensure that retweets are not included. You can set the count to whatever number you want, up to a maximum of 200 per request; keep in mind that the more data you retrieve, the slower the program will run.
Finally, enter the following lines to display the results (note that if you are using Jupyter Notebook, the indentation should be inserted automatically if you end the first line with a colon and press Enter):
for status in statuses:
    print(status.text)
You should get up to 200 of the most recent tweets (excluding retweets) from the handle you entered.
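The API returns at most 200 tweets per request. If you want to go further back in a user's timeline, one common approach (sketched below; not required for the rest of this guide) is to make repeated requests, each time passing max_id so that you only receive tweets older than the ones you already have:
# Sketch: page backwards through the timeline, up to 200 tweets per request.
# Twitter only serves roughly the most recent 3,200 tweets this way, and
# making many requests quickly can run into rate limits.
all_statuses = list(statuses)
while all_statuses:
    oldest_id = all_statuses[-1].id
    older = twitter_api.GetUserTimeline(screen_name=handle, count=200,
                                        include_rts=False, max_id=oldest_id - 1)
    if not older:
        break  # no older tweets available
    all_statuses.extend(older)
print(len(all_statuses), 'tweets collected')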
To retrieve tweets with a certain hashtag:
hashtags_to_track = ['[Some hashtag]']   # a list, even if you only track one hashtag
stream = twitter_api.GetStreamFilter(track=hashtags_to_track)
for line in stream:
    if 'in_reply_to_status_id' in line:
        tweet = twitter.Status.NewFromJsonDict(line)
        user = tweet.user.screen_name
        tweet_text = tweet.text
        print('User: ' + user + '\t Tweet: ' + tweet_text + '\n')
(Note: this assumes you have already set up API access as described above.)
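Because GetStreamFilter returns a live stream, the loop above keeps printing new tweets until you stop it (in Jupyter, use Kernel -> Interrupt). If you would rather stop automatically after a fixed number of tweets, here is a minimal sketch:
# Sketch: collect a fixed number of matching tweets, then stop.
max_tweets = 20   # example limit; change as needed
collected = []
for line in twitter_api.GetStreamFilter(track=hashtags_to_track):
    if 'in_reply_to_status_id' in line:
        collected.append(twitter.Status.NewFromJsonDict(line))
        if len(collected) >= max_tweets:
            break
print(len(collected), 'tweets collected')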
To save the tweets you collect with the stream above to a CSV file:
import csv

# This loop also runs until you interrupt it (Kernel -> Interrupt in Jupyter).
# newline='' prevents blank rows appearing in the CSV on Windows.
with open('tweets.csv', 'w', newline='') as csv_file:
    csv_writer = csv.writer(csv_file)
    for line in stream:
        if 'in_reply_to_status_id' in line:
            tweet = twitter.Status.NewFromJsonDict(line)
            print(tweet.id)   # prints each tweet's ID so you can see progress
            row = [tweet.id, tweet.user.screen_name, tweet.text]
            csv_writer.writerow(row)
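Once the file exists, you can read it back into Python for analysis. Anaconda includes the pandas library, so one option (a sketch; the column names are just labels chosen here, since the file was written without a header row) is:
import pandas as pd

# Supply column names because the CSV was written without a header row.
tweets = pd.read_csv('tweets.csv', names=['id', 'screen_name', 'text'])
print(tweets.head())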
To open the CSV file in Excel:
Questions? Contact reference@carleton.edu