Twitter Sentiment Extraction

In Testing - Analyzes tweets for positive and negative customer service experiences

##What is it?##

Given a tweet, the Twitter Sentiment Extraction tool tells you whether its sentiment is positive or negative. Most of my testing has focused on tweets about positive and negative customer service experiences.

One useful application would be for a company to monitor tweets that mention its name and raise an alert on negative customer service experiences. Someone from the company could then investigate, track down the user, and try to fix the problem.
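A minimal sketch of that alerting idea, assuming a classifier that returns the label `"neg"` or `"pos"` for a tweet. The function `negative_mentions`, the `toy_classify` keyword matcher, and the company name `AcmeAir` are all hypothetical illustrations; a trained sentiment model would take the place of `toy_classify`, and fetching the tweets themselves is left out.

```python
from typing import Callable, Iterable, List


def negative_mentions(
    tweets: Iterable[str],
    company: str,
    classify: Callable[[str], str],
) -> List[str]:
    """Return tweets that mention `company` and are classified as negative."""
    needle = company.lower()
    alerts = []
    for tweet in tweets:
        # Case-insensitive substring match on the company name, then classify.
        if needle in tweet.lower() and classify(tweet) == "neg":
            alerts.append(tweet)
    return alerts


def toy_classify(tweet: str) -> str:
    # Hypothetical stand-in classifier for demonstration only:
    # flags a tweet as negative if it contains any of a few negative words.
    negative_words = {"worst", "terrible", "awful", "rude"}
    words = set(tweet.lower().split())
    return "neg" if words & negative_words else "pos"


tweets = [
    "AcmeAir lost my bag again, worst service ever",
    "Thanks AcmeAir, the crew was great today",
    "Traffic is terrible this morning",
]
print(negative_mentions(tweets, "acmeair", toy_classify))
# prints ['AcmeAir lost my bag again, worst service ever']
```

The third tweet is negative but never mentions the company, so it is not flagged; only tweets passing both checks would reach a person.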

##When##

In late 2010 I was researching Natural Language Processing (NLP) and how I could apply it. I had been reading the book Natural Language Processing with Python, and it gave me some great ideas.

Sentiment classification is one of the parts of NLP I find most interesting, and I wanted to know how accurately sentiment could be classified.

##Technology##

NLTK (Natural Language Toolkit)
Python for the glue
The Twitter Search API

##Details##

Most of my work went into the feature extractors: functions that take a raw tweet and output a set of features describing it. I tried several combinations of feature extractors, drawing on these ideas:

n-grams
stemmers
feature limiting
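The three ideas above can be combined into one feature extractor. The sketch below is an illustration, not the extractors I actually used: `crude_stem` is a deliberately rough suffix stripper standing in for a real stemmer such as NLTK's `PorterStemmer`, "n-grams" here means stemmed unigrams plus bigrams, and "feature limiting" means keeping only the `limit` most frequent n-grams from the training tweets. All function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List


def tokenize(tweet: str) -> List[str]:
    # Lowercase, then keep runs of ASCII letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", tweet.lower())


def crude_stem(word: str) -> str:
    # Very rough suffix stripping; a stand-in for a real stemmer such as
    # NLTK's PorterStemmer, not equivalent to it.
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def ngram_tokens(tweet: str) -> List[str]:
    # Stemmed unigrams plus adjacent-word bigrams.
    words = [crude_stem(w) for w in tokenize(tweet)]
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def build_vocabulary(tweets: Iterable[str], limit: int) -> List[str]:
    # Feature limiting: keep only the `limit` most frequent n-grams
    # across the training tweets (ties keep first-seen order).
    counts = Counter(tok for t in tweets for tok in ngram_tokens(t))
    return [tok for tok, _ in counts.most_common(limit)]


def extract_features(tweet: str, vocabulary: List[str]) -> Dict[str, bool]:
    # One boolean feature per vocabulary entry: is it present in this tweet?
    present = set(ngram_tokens(tweet))
    return {f"contains({tok})": tok in present for tok in vocabulary}


vocab = build_vocabulary(["waiting on hold", "waited on hold again"], limit=3)
print(vocab)  # ['wait', 'on', 'hold']
print(extract_features("Still waiting!", vocab))
# {'contains(wait)': True, 'contains(on)': False, 'contains(hold)': False}
```

Stemming lets "waiting" and "waited" share the feature `wait`. The output is a plain dict of feature names to values, which is the featureset shape NLTK's `NaiveBayesClassifier.train` accepts when given a list of `(featureset, label)` pairs.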

##Success?##

So far, with only 300 tweets of training data, I have reached about 81% accuracy in classifying sentiment. I expect accuracy to improve with more training data.
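An accuracy figure like this is the fraction of held-out tweets whose predicted label matches the hand-assigned one. As a self-contained illustration, the sketch below trains a small Bernoulli naive Bayes classifier with Laplace smoothing and measures accuracy on a separate test set. It is a simplified stand-in written for this page, not NLTK's `NaiveBayesClassifier` (whose smoothing differs); with NLTK, `nltk.classify.accuracy(classifier, test_set)` plays the role of `accuracy` here. The toy data is made up and says nothing about the real results.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Example = Tuple[Set[str], str]  # (tokens present in a tweet, label)
Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train(examples: List[Example]) -> Model:
    label_counts: Counter = Counter(label for _, label in examples)
    token_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for tokens, label in examples:
        # Count how many tweets with this label contain each token.
        token_counts[label].update(tokens)
        vocab |= tokens
    return label_counts, token_counts, vocab


def classify(model: Model, tokens: Set[str]) -> str:
    label_counts, token_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = "", -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior P(label)
        # Tokens never seen in training are ignored; every vocabulary token
        # contributes P(present | label) or P(absent | label).
        for tok in vocab:
            p = (token_counts[label][tok] + 1) / (n + 2)  # Laplace smoothing
            score += math.log(p if tok in tokens else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model: Model, test: List[Example]) -> float:
    correct = sum(classify(model, tokens) == label for tokens, label in test)
    return correct / len(test)


train_set = [
    ({"great", "thanks"}, "pos"),
    ({"fast", "help", "thanks"}, "pos"),
    ({"worst", "wait"}, "neg"),
    ({"rude", "wait"}, "neg"),
]
test_set = [({"thanks"}, "pos"), ({"wait"}, "neg"), ({"thanks", "wait"}, "pos")]
model = train(train_set)
print(accuracy(model, test_set))
```

Here the mixed tweet `{"thanks", "wait"}` is predicted negative, so two of three test tweets are correct. Keeping the test tweets out of training matters: scoring on the training data would overstate accuracy, especially with only a few hundred examples.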

##Improvements##

I think the biggest limitation of my tests was the small amount of training data. I had to classify every training tweet by hand, and 300 was as many as my patience allowed.

My next step is to gather much more training data, perhaps by using Amazon’s Mechanical Turk.
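If several Mechanical Turk workers label each tweet, their answers need to be combined into one training label. One common approach, assumed here rather than taken from my setup, is a strict majority vote that discards tweets without a clear winner. The tweet IDs, the `"pos"`/`"neg"` labels, and the functions `majority_label` and `aggregate` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def majority_label(votes: List[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the most common label if its share of votes exceeds
    `min_agreement`; otherwise None (no clear majority)."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_agreement else None


def aggregate(worker_labels: Dict[str, List[str]]) -> Dict[str, str]:
    # Keep only tweets whose workers reached a clear majority.
    result = {}
    for tweet_id, votes in worker_labels.items():
        label = majority_label(votes)
        if label is not None:
            result[tweet_id] = label
    return result


votes = {
    "t1": ["neg", "neg", "pos"],
    "t2": ["pos", "neg"],
    "t3": ["pos", "pos", "pos"],
}
print(aggregate(votes))  # {'t1': 'neg', 't3': 'pos'}
```

Tweet `t2` is dropped because its two workers disagree. Using an odd number of workers per tweet avoids most ties, and raising `min_agreement` trades quantity of training data for label quality.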