Can Twitter predict the elections?

Peaks and Persistence: Modeling the Shape of Microblog Conversations, by Shamma et al., presents different methods for finding temporal topics in Twitter streams. In particular, the paper presents two key metrics: one for peaky topics, which show highly localized, momentary terms of interest, and one for persistent conversational topics, which show less salient terms that are sustained over a longer duration. The paper shows that the textual content of tweets can reveal a great deal about the structure and content of an event, as well as the relative level of interest that individual moments generate.

But can the temporal evolution of the textual content of tweets be used to predict the upcoming election? Twitter has released an interesting application called the Twindex, or the Twitter Political Index, on their homepage.

The Twindex shows Barack Obama’s and Mitt Romney’s daily tweet indices on a historical timeline.

The details behind the Twindex are rather opaque, but speculation says the index is composed of a candidate’s daily tweet volume and the textual sentiment of those tweets.

A simple approach for a student project might be to compute Σ[tweet sentiment] over all tweets about a particular candidate, where [tweet sentiment] is 1 if the tweet is positive and -1 if it is negative, and then compare the sums across candidates. The tweets could also be normalized by user so that a single spammy user can’t greatly affect the index, for example by applying diminishing returns to each tweet from a single user after that user’s first tweet of the day: the first tweet counts with weight 1, the second with weight 0.5, the third with weight 0.25, and so on.
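The student-project index described above could be sketched in Python roughly as follows. This is only an illustration of the idea, not how the Twindex actually works: the function name `daily_index`, the `(user, candidate, sentiment)` tuple layout, and the choice to apply the diminishing returns per user and per candidate are all assumptions, and the sentiment labels (+1 or -1) are assumed to come from some separate classifier.

```python
from collections import defaultdict


def daily_index(tweets):
    """Compute a per-candidate sentiment index for one day of tweets.

    `tweets` is an iterable of (user, candidate, sentiment) tuples in
    chronological order, where sentiment is +1 (positive) or -1 (negative).
    This layout is hypothetical and chosen only for illustration.
    """
    # How many tweets each user has already posted about each candidate today.
    seen = defaultdict(int)
    index = defaultdict(float)
    for user, candidate, sentiment in tweets:
        # Diminishing returns: weights 1, 0.5, 0.25, ... for repeat tweets.
        weight = 0.5 ** seen[(user, candidate)]
        index[candidate] += sentiment * weight
        seen[(user, candidate)] += 1
    return dict(index)


# Made-up sample data: one prolific user and two occasional ones.
sample = [
    ("alice", "obama", 1),
    ("alice", "obama", 1),
    ("alice", "obama", 1),
    ("bob", "obama", -1),
    ("carol", "romney", 1),
]
scores = daily_index(sample)
print(scores)  # {'obama': 0.75, 'romney': 1.0}
```

With this weighting, a single user can contribute at most just under 2 points to a candidate’s index in a day, no matter how many tweets that user posts.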