Many of our customers receive thousands of mentions per day — far more than can be read and understood in aggregate. The recently launched Trends Report was created to provide an easily consumable digest of this data. The report displays who is talking about your brand and what they are talking about.
To process the incoming Twitter mention data, we created a Finagle/Thrift service that extracts topics and supporting words from tweet text with a natural language processing algorithm. To aggregate the data, we used our Hadoop cluster and wrote several Pig scripts that run each night. To serve the data, we added functionality to our existing Django API engine.
Extracting Topics
The core of the project was parsing tweet text into topics and the words that support them. If you tweeted “@SproutSocial, Your software is awesome,” we wanted “software” to be a topic.
Further, we wanted to know what people were saying about the topics and hashtags, so we wanted to capture “awesome” as a “cloud word,” meaning it appears in a word cloud around the topic. It was important that this algorithm extend to any line of business, so if someone is discussing a menu item at a restaurant, …
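To make the topic and cloud-word split concrete, here is a minimal Python sketch. It is not the natural language processing algorithm used in the service, which this excerpt does not describe. It assumes a toy approach: hard-coded word lists stand in for a real part-of-speech tagger, and the names `STOPWORDS`, `DESCRIPTORS`, and `extract_topics` are hypothetical.

```python
import re

# Hypothetical word lists, for illustration only. A real system would
# likely rely on a part-of-speech tagger instead of hard-coded sets.
STOPWORDS = {"your", "is", "the", "a", "an", "my", "our", "was", "are", "and"}
DESCRIPTORS = {"awesome", "great", "slow", "delicious", "terrible"}

# Match words, optionally prefixed with "@" (mention) or "#" (hashtag).
TOKEN_RE = re.compile(r"[@#]?\w+")


def extract_topics(tweet):
    """Return a dict mapping each candidate topic to its cloud words."""
    tokens = [token.lower() for token in TOKEN_RE.findall(tweet)]
    # Mentions identify who is being addressed, not what is discussed.
    words = [token for token in tokens if not token.startswith("@")]
    # Descriptive words become cloud words for every topic in the tweet.
    cloud_words = [word for word in words if word in DESCRIPTORS]
    # Anything that is neither a stopword nor a descriptor is a topic.
    topics = [
        word for word in words
        if word not in STOPWORDS and word not in DESCRIPTORS
    ]
    return {topic: list(cloud_words) for topic in topics}


print(extract_topics("@SproutSocial, Your software is awesome"))
# prints {'software': ['awesome']}
```

Because hashtags keep their `#` prefix and are not filtered out, this sketch also treats them as topics. Fixed word lists would not extend to arbitrary lines of business, which is one reason a production system would need a more general language model.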
Source: Sprout Social