Stack Overflow Word Trends by Day

Here’s a quick hack to see how different words are used over time on Stack Overflow.

Inspired by Google’s Ngram Viewer, I hacked together this page to show how the use of different words on Stack Overflow varies over time.

The y-axis shows how often the word was used on that day, as a percentage. You can compare multiple words by separating them with commas.
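As a rough illustration of what the y-axis could mean, here is a minimal Python sketch that computes a word's share of all word occurrences on each day. The denominator (every word posted that day), the whitespace tokenization, and the names `daily_percentages` and `posts_by_day` are assumptions made for this example, not details taken from the actual site.

```python
from collections import Counter

def daily_percentages(posts_by_day, query):
    """Return {word: {day: percent}} for a comma-separated query string."""
    # Split the query on commas, the way the page lets you compare words.
    words = [w.strip().lower() for w in query.split(",") if w.strip()]
    result = {w: {} for w in words}
    for day, posts in posts_by_day.items():
        counts = Counter()
        for text in posts:
            counts.update(text.lower().split())
        total = sum(counts.values())
        for w in words:
            # Assumed definition: share of all word occurrences on this day.
            result[w][day] = 100.0 * counts[w] / total if total else 0.0
    return result

# Hypothetical sample data: two days of post text.
posts_by_day = {
    "2011-01-01": ["java is fine", "python is fun"],
    "2011-01-02": ["python python java"],
}
trends = daily_percentages(posts_by_day, "java, python")
```

Normalizing by the day's total keeps busy days from dominating the graph, which is why a percentage is more useful here than a raw count.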

[Interactive form: graph the occurrences of the given words, with a selectable amount of smoothing]
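The page does not say which smoothing method it uses. One common choice for daily series like this is a trailing moving average, sketched below in Python; the function name `moving_average` and the `window` parameter are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average; early points average over what is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed

# Hypothetical daily percentages for one word.
daily = [0.0, 3.0, 0.0, 3.0, 6.0]
smooth = moving_average(daily, 3)
```

A larger window hides day-to-day noise (weekends, for instance) at the cost of blurring short spikes.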

 

The Backend

The backend is a Java app that uses Google’s LevelDB to store every word and its frequency per day. It’s barely optimized, and the stored data is definitely too verbose, but it’s a quick hack.
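LevelDB keeps its keys sorted (bytewise by default), so one plausible layout is a key made of the word followed by the date, which lets a single range scan return a word's whole time series. The sketch below is in Python rather than Java and uses a small in-memory stand-in instead of a real LevelDB binding; the key format and the names `make_key`, `SortedStore`, and `series_for` are assumptions, not the schema the backend actually uses.

```python
import bisect

def make_key(word, day):
    # "word\x00YYYY-MM-DD": ISO dates sort chronologically as bytes.
    return f"{word}\x00{day}".encode("utf-8")

class SortedStore:
    """Tiny in-memory stand-in for an ordered key-value store."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, stop):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            yield self._keys[i], self._values[self._keys[i]]
            i += 1

def series_for(store, word):
    """Return [(day, count)] for one word, oldest day first."""
    prefix = word.encode("utf-8") + b"\x00"
    stop = word.encode("utf-8") + b"\x01"
    return [(k[len(prefix):].decode("utf-8"), v)
            for k, v in store.scan(prefix, stop)]

store = SortedStore()
store.put(make_key("java", "2011-01-02"), 7)
store.put(make_key("java", "2011-01-01"), 5)
store.put(make_key("javascript", "2011-01-01"), 9)
series = series_for(store, "java")
```

The `\x00` separator keeps `java` from matching `javascript` during the scan. Repeating the full word in every key is also one way the data could end up "too verbose".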

I used the data from the Stack Overflow Data Dump. I wrote some quick Python scripts to parse the data and extract all of the words used in posts (questions and answers).
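A parsing script along these lines might look like the following sketch. It assumes the dump's posts file is XML made of `row` elements with `CreationDate` and `Body` attributes, and it uses a deliberately crude tag-stripping regex and tokenizer; the function name `count_words_by_day` and the sample rows are made up for illustration and are not the original scripts.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

TAG_RE = re.compile(r"<[^>]+>")          # crude HTML tag stripper
WORD_RE = re.compile(r"[a-z0-9#+']+")    # keeps tokens like "c#" and "c++"

def count_words_by_day(xml_file):
    """Stream the posts file and return {day: Counter of words}."""
    counts = defaultdict(Counter)
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag == "row":
            day = elem.get("CreationDate", "")[:10]   # keep YYYY-MM-DD
            body = TAG_RE.sub(" ", elem.get("Body", ""))
            counts[day].update(WORD_RE.findall(body.lower()))
            elem.clear()  # release the row's data once it is counted
    return counts

# Hypothetical two-post sample in the assumed format.
sample = io.BytesIO(b"""<posts>
  <row Id="1" PostTypeId="1" CreationDate="2011-01-01T10:00:00.000" Body="&lt;p&gt;Java or Python?&lt;/p&gt;" />
  <row Id="2" PostTypeId="2" CreationDate="2011-01-01T11:00:00.000" Body="&lt;p&gt;Python.&lt;/p&gt;" />
</posts>""")
counts = count_words_by_day(sample)
```

Streaming with `iterparse` matters because the real dump is far too large to load into memory in one go.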

I have data for 2-, 3-, and 4-grams, but I haven’t loaded it into the server yet because I want to clean up the server and the client first. While this site itself is hosted on Amazon’s S3, I’m a student and my server is a single-core Celeron with 2 GB of RAM.
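For reference, an n-gram here is a run of n consecutive words. A minimal Python sketch of extracting 2-, 3-, and 4-grams from a tokenized post follows; the helper name `ngrams` and the sample sentence are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "how do i parse json in java".split()
gram_counts = Counter()
for n in (2, 3, 4):
    gram_counts.update(ngrams(tokens, n))
```

A post with k words yields k - n + 1 n-grams for each n, and far more of them are distinct than single words are, so this data is much larger than the single-word data. That matters on a small server.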

Thoughts?

If you have any thoughts or suggestions, comment in the thread on Hacker News or send me an email ([email protected]).
