The Twitter Capture and Analysis Tool offers a combination of data extraction through the Twitter API and various options for data treatment, visualization and export.

Last modified: 13.04.2015

Developed by the Digital Methods Initiative in Amsterdam and the Center for the Study of Invention and Social Process (CSISP) at Goldsmiths, TCAT is an application built for extracting data from Twitter through its publicly available application programming interfaces (the Streaming API and the REST API). It requires setup on a server, and we host a local instance through our participation in the Copenhagen Association for Digital Methods (CADM). Because of the high load that TCAT operations put on the server, it is unfortunately not possible for us to offer public access to our setup. The code, however, is open source, and you can set up a version of your own.


Get the code


TCAT allows you to set up harvests that query Twitter for data. Most often we use a keyword query, where we ask TCAT to return all tweets that include specific words, combinations of words or whole phrases. TCAT cannot grab data backwards in time: it collects data in real time, from the moment you press go until you stop the harvest.
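To make the three kinds of keyword query concrete, here is a minimal Python sketch of what matching on a word, a combination of words and a whole phrase could look like. It is an illustration only and not TCAT's own matching code: the function names (`tokenize`, `contains_word`, `contains_all`, `contains_phrase`) and the sample tweets are assumptions made for this example.

```python
import re

def tokenize(text):
    """Lowercase a tweet and split it into word tokens ('#Minecraft' becomes 'minecraft')."""
    return re.findall(r"\w+", text.lower())

def contains_word(text, word):
    """Single keyword: the word occurs somewhere in the tweet."""
    return word.lower() in tokenize(text)

def contains_all(text, words):
    """Combination of words: every word occurs, in any order."""
    tokens = set(tokenize(text))
    return all(w.lower() in tokens for w in words)

def contains_phrase(text, phrase):
    """Whole phrase: the words occur next to each other, in the given order."""
    tokens = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))

# Hypothetical sample tweets, invented for this sketch.
tweets = [
    "Minecraft in the classroom is a great idea",
    "Great idea: teach programming in schools",
    "The classroom needs more #Minecraft",
]

for tweet in tweets:
    print(
        contains_word(tweet, "minecraft"),
        contains_all(tweet, ["minecraft", "classroom"]),
        contains_phrase(tweet, "great idea"),
        tweet,
    )
```

A combination query is looser than a phrase query: the third tweet matches "minecraft" and "classroom" together, but it would not match the phrase "minecraft in the classroom".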

TCAT provides a range of options for reducing and analyzing the collected data. First, it has a data-selection interface that allows you to carve out a subset of the data that is of specific interest. The illustration below shows that you can, for instance, decide to work with tweets from a specific time interval, tweets written by a specific user or tweets that contain a specific URL. This is an important feature for working with focused subsets of your overall dataset.

The data-selection interface in TCAT
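The same kind of selection can be reproduced outside TCAT on exported tweets. The Python sketch below filters by time interval, user and URL. The field names (`created_at`, `from_user`, `text`), the timestamp format and the sample records are assumptions made for this example and may not match the columns of an actual TCAT export.

```python
from datetime import datetime

# Hypothetical records; the field names are assumptions, not TCAT's export schema.
tweets = [
    {"created_at": "2015-03-01 09:15:00", "from_user": "alice",
     "text": "Minecraft lessons http://example.org/a"},
    {"created_at": "2015-03-02 14:30:00", "from_user": "bob",
     "text": "Programming in schools"},
    {"created_at": "2015-03-05 08:00:00", "from_user": "alice",
     "text": "More on coding http://example.org/b"},
]

def select(tweets, start=None, end=None, user=None, url=None):
    """Return the tweets that satisfy every criterion that was given."""
    selected = []
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], "%Y-%m-%d %H:%M:%S")
        if start is not None and created < start:
            continue  # before the time interval
        if end is not None and created > end:
            continue  # after the time interval
        if user is not None and tweet["from_user"].lower() != user.lower():
            continue  # written by someone else
        if url is not None and url not in tweet["text"]:
            continue  # does not contain the URL
        selected.append(tweet)
    return selected

# Tweets by "alice" between 1 March and 3 March 2015.
subset = select(tweets, start=datetime(2015, 3, 1), end=datetime(2015, 3, 3), user="alice")
print(len(subset))  # prints 1
```

Criteria that are left out are simply not applied, so `select(tweets)` returns the whole dataset and each added criterion narrows it further.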


Secondly, TCAT provides you with a series of data files that you can work with in other tools such as Excel and Gephi. One output is a set of statistical overviews that, for instance, allow you to see how many tweets your data contains and who is tweeting the most. Another output is the actual tweets, which enable you to do a proper qualitative analysis of your material. A third output is network files to be analyzed in Gephi. For instance, you can extract a network showing user-post interaction and a co-hashtag network that illustrates how hashtags occur together in the tweets. Finally, TCAT offers two visualizations, the cascade and the associational profile, that allow you to explore temporal features in your data.
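To illustrate what two of these outputs contain, the Python sketch below counts tweets per user and builds a co-hashtag network in which two hashtags are linked each time they appear in the same tweet. It is a simplified illustration under stated assumptions, not TCAT's export code: the sample tweets and the helper `hashtags` are invented for this example. The result is written as an edge list with Source, Target and Weight columns, a layout that Gephi's spreadsheet import accepts.

```python
import csv
import io
import re
from collections import Counter
from itertools import combinations

# Hypothetical (user, text) pairs, invented for this sketch.
tweets = [
    ("alice", "Teaching with #Minecraft in #school #edtech"),
    ("bob", "#edtech and #programming belong in #school"),
    ("alice", "#minecraft builds #programming skills"),
]

def hashtags(text):
    """Return the distinct lowercase hashtags of a tweet, sorted so pairs are stable."""
    return sorted({tag.lower() for tag in re.findall(r"#(\w+)", text)})

# Statistical overview: how many tweets each user has written.
tweets_per_user = Counter(user for user, _ in tweets)

# Co-hashtag network: one undirected edge per hashtag pair, weighted by co-occurrence.
edge_weights = Counter()
for _, text in tweets:
    for pair in combinations(hashtags(text), 2):
        edge_weights[pair] += 1

# Serialize the edges as CSV text (written to a string here instead of a file).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Source", "Target", "Weight"])
for (source, target), weight in sorted(edge_weights.items()):
    writer.writerow([source, target, weight])
edge_list_csv = buffer.getvalue()

print(tweets_per_user.most_common(1))
print(edge_list_csv)
```

Sorting the hashtags inside each tweet means a pair is always stored in the same order, so "#school #edtech" and "#edtech #school" add weight to the same edge.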

This is a screenshot of the cascade visualization. Each dot represents a tweet and each link represents a retweet. The position on the x-axis shows the day of the tweet, whereas the position on the y-axis shows who tweeted it.



A big advantage of TCAT is that it returns a lot of metadata, much more than, for instance, Netvizz. At TANT-Lab we have taken advantage of this data in, for instance, a project on the Eurovision Song Contest and a project on visions for the future of the schools in the municipality of Aalborg.

The screenshot below is from the latter project and shows a network of people expressing their views about IT and digitalization in the schools. For instance, you can see that there is one group dedicated to discussing the importance of programming competencies and another group exchanging ideas about potential uses of Minecraft as a learning tool. The visualization illustrates how we have used TCAT to get a sense of current initiatives in a specific area, as well as insight into the people who are promoting these initiatives.

Network of people engaged in the Twitter dialogue about the role of IT and digitalization in Danish schools.