You can use the Twitter API to retrieve recent tweets from particular users and to retrieve tweets containing certain hashtags. This guide walks through the basic process of using Python to retrieve information from the Twitter API.
By the end of this guide you will be able to set up the python-twitter library, retrieve recent tweets from a user, retrieve tweets containing a hashtag, and save the results to a CSV file.
To use the Twitter API with Python you will need a few things, which are explained below.
There are several limitations to the Twitter API unless you are willing to pay for data.
It is important to note that some Twitter datasets that people or organizations purchased have since been made publicly available. These are less common, but they may fulfill your needs, so it is worth doing a Google search.
To retrieve an API key:
We are now going to use the command line to install the python-twitter library, which lets us access the Twitter API from Python.
If you are using Windows, open Command Prompt. If you are using macOS or Linux, open Terminal (on macOS you can open Terminal by typing “Terminal” into Spotlight). Now type:
> pip install python-twitter
This will install the python-twitter library that creates an interface between the API and your Python platform.
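To check that the installation worked, you can ask Python to import the library from the same command line (this assumes the python command on your system points to the same Python that pip installed into):
> python -c "import twitter"
If the command finishes without an ImportError, the library is ready to use.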
It's necessary to install Jupyter Notebook as an interface for running Python. The easiest way to get it is to install the Anaconda distribution, which includes Jupyter.
To install Anaconda, go to the website and follow the installation instructions for your operating system. Unless you have a specific reason not to, you should install the version that includes the latest version of Python. Also, make sure you are installing the full Anaconda package and not the “Miniconda” version, which doesn't include the libraries you'll need. Once everything is installed, you should be able to open a Jupyter Notebook. In the command line window you already have open, type:
> jupyter notebook
and press Enter. Some startup messages should appear in the command line window, and then your default web browser should open and begin loading what appears to be a website. This “website” is actually the Jupyter interface.
Note: if the browser opens to a page requesting a token, go back to your command line window and find the line that says “Copy/paste this URL into your browser when you connect for the first time, to login with a token:” followed by a strange-looking URL. Right-click this URL and select “copy link address”, then simply paste it into your browser’s address bar and hit Enter.
You should see a list of your files and folders at the “home” location on your computer. Using this interface, you can browse through your files and open existing Python files or create new ones. We are interested in creating a notebook that we can use to request and work with API data, so navigate to a place where you want to store your notebook and then click “New” -> “Python 3”. This should open a new tab with your new notebook loaded.
This section walks through setting up a Jupyter Notebook to work with the API, accessing the API, and pulling some basic information.
First, open a new Jupyter Notebook and type the following lines into one cell, then press Shift + Enter to execute (i.e. run) all of them at once:
import sys
import operator
import requests
import json
import twitter
These first few lines give you access to the packages you need: twitter is the python-twitter library you installed earlier, and sys, operator, requests, and json are general-purpose tools for working with data and web requests.
To gain access to the Twitter API, you'll need those Twitter credentials you retrieved earlier. They are probably hard to remember, so give them a recognizable name. In a new cell, enter the following lines (with your own information filled in where indicated by brackets), then execute:
twitter_consumer_key = '[your consumer key]'
twitter_consumer_secret = '[your consumer secret]'
twitter_access_token = '[your access token]'
twitter_access_secret = '[your access secret]'
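As an aside: if you plan to share your notebook, you may prefer not to paste the keys directly into a cell. One alternative (a minimal sketch; the environment variable names here are just examples) is to set the values as environment variables before launching Jupyter and read them in Python:
import os

# Example only: read credentials from environment variables you set yourself
# (e.g. TWITTER_CONSUMER_KEY) instead of typing them into the notebook.
twitter_consumer_key = os.environ['TWITTER_CONSUMER_KEY']
twitter_consumer_secret = os.environ['TWITTER_CONSUMER_SECRET']
twitter_access_token = os.environ['TWITTER_ACCESS_TOKEN']
twitter_access_secret = os.environ['TWITTER_ACCESS_SECRET']
Either way, the rest of the guide works the same.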
Now send this information along to Twitter to gain access to the API. Copy and paste the following line into a new cell, then execute:
twitter_api = twitter.Api(consumer_key=twitter_consumer_key, consumer_secret=twitter_consumer_secret, access_token_key=twitter_access_token, access_token_secret=twitter_access_secret)
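If you want to confirm that the credentials were accepted, python-twitter's VerifyCredentials() method asks Twitter for the account your keys belong to. In a new cell:
# Prints information about your own account; an error here usually means
# one of the four credential values above is wrong.
print(twitter_api.VerifyCredentials())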
Now that you have access, you can pull various information from Twitter. Try accessing the last 200 tweets from a particular user. First, choose a handle from which to pull the tweets and then enter/execute the following line in a new cell:
handle = '[Twitter handle of the user you want to retrieve tweets from, without @]'
Now use the API to retrieve the tweets:
statuses = twitter_api.GetUserTimeline(screen_name=handle, count=200, include_rts=False)
The arguments inside the parentheses limit the retrieval to 200 tweets and ensure that retweets are not included. You can set the count to whatever number you want, up to a maximum of 200 per request; keep in mind that the more data you retrieve, the slower the program will run.
Finally, enter the following lines to display the results (note that if you are using Jupyter Notebook, the indentation should be inserted automatically if you end the first line with a colon and press Enter):
for status in statuses:
    print(status.text)
You should get up to 200 of the most recent tweets (excluding retweets) from the handle you entered.
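The API returns at most 200 tweets per request. If you want to go further back in a user's timeline, one common approach (sketched below; not required for the rest of this guide) is to make repeated requests, each time passing max_id so that you only receive tweets older than the ones you already have:
# Sketch: page backwards through the timeline, up to 200 tweets per request.
# Twitter only serves roughly the most recent 3,200 tweets this way, and
# making many requests quickly can run into rate limits.
all_statuses = list(statuses)
while all_statuses:
    oldest_id = all_statuses[-1].id
    older = twitter_api.GetUserTimeline(screen_name=handle, count=200,
                                        include_rts=False, max_id=oldest_id - 1)
    if not older:
        break  # no older tweets available
    all_statuses.extend(older)
print(len(all_statuses), 'tweets collected')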
To retrieve tweets with a certain hashtag:
hashtags_to_track = ['[Some hashtag]']   # a list, even if you only track one hashtag
stream = twitter_api.GetStreamFilter(track=hashtags_to_track)
for line in stream:
    if 'in_reply_to_status_id' in line:
        tweet = twitter.Status.NewFromJsonDict(line)
        user = tweet.user.screen_name
        tweet_text = tweet.text
        print('User: ' + user + '\t Tweet: ' + tweet_text + '\n')
(Note: this assumes you have already set up API access as described above.)
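Because GetStreamFilter returns a live stream, the loop above keeps printing new tweets until you stop it (in Jupyter, use Kernel -> Interrupt). If you would rather stop automatically after a fixed number of tweets, here is a minimal sketch:
# Sketch: collect a fixed number of matching tweets, then stop.
max_tweets = 20   # example limit; change as needed
collected = []
for line in twitter_api.GetStreamFilter(track=hashtags_to_track):
    if 'in_reply_to_status_id' in line:
        collected.append(twitter.Status.NewFromJsonDict(line))
        if len(collected) >= max_tweets:
            break
print(len(collected), 'tweets collected')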
To save the tweets you collect with the stream above to a CSV file:
import csv

# This loop also runs until you interrupt it (Kernel -> Interrupt in Jupyter).
# newline='' prevents blank rows appearing in the CSV on Windows.
with open('tweets.csv', 'w', newline='') as csv_file:
    csv_writer = csv.writer(csv_file)
    for line in stream:
        if 'in_reply_to_status_id' in line:
            tweet = twitter.Status.NewFromJsonDict(line)
            print(tweet.id)   # prints each tweet's ID so you can see progress
            row = [tweet.id, tweet.user.screen_name, tweet.text]
            csv_writer.writerow(row)
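Once the file exists, you can read it back into Python for analysis. Anaconda includes the pandas library, so one option (a sketch; the column names are just labels chosen here, since the file was written without a header row) is:
import pandas as pd

# Supply column names because the CSV was written without a header row.
tweets = pd.read_csv('tweets.csv', names=['id', 'screen_name', 'text'])
print(tweets.head())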
To open the CSV file in Excel:
Questions? Contact reference@carleton.edu