The purpose of this blog post is to provide information and examples for collecting Twitter data for academic research. Specifically, I describe how to collect tweets matching specific criteria via the Twitter API using the Python library Tweepy. The goal is to use Tweepy to retrieve tweets and their properties (such as length, number of retweets, number of likes) for a specific hashtag. A comprehensive overview of the Tweepy package and all of its functionality is given in the Tweepy documentation. Finally, a few examples in an economic framework illustrate applications of this kind of analysis.
Access to Twitter API
Twitter’s API (short for “Application Programming Interface”) allows free and convenient access for external applications in various programming languages. First, it is necessary to apply for access to the Twitter API and tools and to create a developer account. Getting access to the Search API, however, is quite easy:
- Go to: http://apps.twitter.com
- Log in with your developer account
- Create a new application (for the search queries)
- Fill out the form and create an access token
After setting up an application of your choice, the next step is to access the Search API using Tweepy. One important point to mention is that I use the Standard API access, which provides limited access to just a random sample of tweets from the past seven days. The Premium and Enterprise search tiers, by contrast, allow access to all tweets since the beginning of Twitter, but these API versions are not free. Moreover, the Standard Search API has a rate limit on the number of queries within a 15-minute interval: 450 queries every 15 minutes with application-only authentication, or 180 with user authentication.
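Tweepy can handle these rate limits automatically by sleeping until the 15-minute window resets. A minimal sketch, assuming Tweepy 3.x (the helper name `make_api` and its parameters are placeholders, not part of the code discussed below):

```python
# Sketch of an API factory with rate-limit handling (assumes Tweepy 3.x;
# make_api and its parameter names are placeholders chosen for this example)
def make_api(consumer_key, consumer_secret, access_token=None, access_token_secret=None):
    """Return a Tweepy API client; app-only auth when no user tokens are given."""
    import tweepy  # third-party: pip install tweepy

    if access_token and access_token_secret:
        # user authentication: 180 search queries per 15-minute window
        auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret)
    else:
        # application-only authentication: 450 search queries per 15-minute window
        auth = tweepy.AppAuthHandler(consumer_key, consumer_secret)

    # wait_on_rate_limit makes Tweepy sleep until the window resets instead of failing
    return tweepy.API(auth, wait_on_rate_limit=True)
```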
For every platform, the latest version of Python can be downloaded from python.org. A convenient way to install Python libraries is pip, Python’s package management system, which comes preinstalled with Python 2.7.9+ and 3.4+. Tweepy can then be installed from the Mac terminal or the Windows command line:
pip install tweepy
Tweepy may pull in some additional dependencies; these can also be installed via pip.
Using the created Twitter application in combination with the Tweepy package allows simple search queries via the Twitter Search API. Twitter’s documentation provides a good overview of what search queries look like and how to access the API. The following code, however, uses the search function directly. Of course, the code should be adapted to your requirements. But first, a few explanations.
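For illustration, a query string for the Standard Search API might be built like this (the hashtag and date are arbitrary examples; the operators `-filter:retweets` and `since:` are part of Twitter’s documented query syntax):

```python
# Build a Standard-search query string (operator syntax per Twitter's docs)
hashtag = "#bitcoin"  # example hashtag, not taken from the post
query = f"{hashtag} -filter:retweets since:2020-01-01"  # drop retweets, set a start date
# The string is then passed as the q parameter of the search, e.g. (Tweepy 3.x):
#   tweepy.Cursor(api.search, q=query, lang="en").items(10)
```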
The code collects 10 tweets (items) matching your search query (targettwitterhashtag). These tweets, together with other data such as the tweet length (dbtweetlen), the tweet date (dbtweetdate), and the numbers of likes and retweets (dbtweetretweets), are stored in a SQL database. Please note that the authentication details consumer_key, consumer_secret, access_token and access_token_secret must be filled in by yourself. Finally, TextBlob is used for a simple application of natural language processing (NLP): a sentiment analysis is run on the tweet texts, and the resulting polarity value is also saved in the SQL database (polarity). Code for this project is provided in the gist below:
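As a rough sketch of what such a script might look like: the identifiers dbtweetlen, dbtweetdate, dbtweetretweets and polarity follow the description above, while the table name `tweets` and the likes column `dbtweetlikes` are placeholders; the Tweepy calls assume version 3.x, where the search method is `api.search` (in Tweepy 4 it is `api.search_tweets`).

```python
# Sketch of the collection script described above. sqlite3 is from the
# standard library; tweepy and textblob are third-party (pip install tweepy textblob).
import sqlite3


def init_db(path="tweets.db"):
    """Create the tweets table; the table name and dbtweetlikes are placeholders."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               text            TEXT,
               dbtweetlen      INTEGER,
               dbtweetdate     TEXT,
               dbtweetlikes    INTEGER,
               dbtweetretweets INTEGER,
               polarity        REAL)"""
    )
    return conn


def collect(targettwitterhashtag, n_items=10, db_path="tweets.db"):
    """Fetch tweets for the hashtag and store them with a TextBlob polarity score."""
    import tweepy                   # Tweepy 3.x; in Tweepy 4 use api.search_tweets
    from textblob import TextBlob

    auth = tweepy.OAuthHandler("consumer_key", "consumer_secret")    # fill in your keys
    auth.set_access_token("access_token", "access_token_secret")     # fill in your tokens
    api = tweepy.API(auth, wait_on_rate_limit=True)

    conn = init_db(db_path)
    for tweet in tweepy.Cursor(api.search, q=targettwitterhashtag,
                               lang="en").items(n_items):
        polarity = TextBlob(tweet.text).sentiment.polarity  # sentiment score in [-1, 1]
        conn.execute(
            "INSERT INTO tweets VALUES (?, ?, ?, ?, ?, ?)",
            (tweet.text, len(tweet.text), str(tweet.created_at),
             tweet.favorite_count, tweet.retweet_count, polarity),
        )
    conn.commit()
    conn.close()
```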
Please feel free to contribute to this project and to reach out to me, for example with ideas for applications in an economic context or ways to make the code more efficient.
Conclusion and application examples
This guide describes how to set up a Twitter application and how to install and use Tweepy to access the Twitter Search API. There is, of course, much room for other and/or more advanced applications, for which the code should be altered or enhanced. In an economic context, there are already quite a few papers taking advantage of Twitter or other internet data. For example, Gans et al. (2020) investigate Twitter interactions between airlines and customers under different market forms. Bailey et al. (2018) use Facebook data to construct a social connectedness index. Shen et al. (2019) use the number of tweets as a measure of investor attention in Bitcoin trading. These are just a few recent examples of a rather new strand of literature utilising internet data in economic research. For an overview of using internet data for economic research, see e.g. Edelman (2012). Hopefully this guide has provided some new information and useful guidance.
- Bailey, M., Cao, R., Kuchler, T., Stroebel, J., & Wong, A. (2018). Social connectedness: Measurement, determinants, and effects. Journal of Economic Perspectives, 32(3), 259-80.
- Edelman, B. (2012). Using internet data for economic research. Journal of Economic Perspectives, 26(2), 189-206.
- Gans, J. S., Goldfarb, A., & Lederman, M. (2020). Exit, Tweets and Loyalty. American Economic Journal: Microeconomics, forthcoming.
- Shen, D., Urquhart, A., & Wang, P. (2019). Does twitter predict Bitcoin? Economics Letters, 174, 118-122.