##What is it?## HighPi is my live and working implementation of a URL shortening service.
The URL shortener takes a long URL like http://www.google.com/search?q=stephen+holiday&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a and turns it into a short URL like http://highpi.com/1o.
This is useful when you give a link to a friend over the phone or write it down. URL shorteners are also popular on Twitter, where they let users fit links into the 140-character limit.
##Why## I wanted to experiment with key-value stores and collect some real data for an upcoming analytics project.
A URL shortener poses an interesting yet simple problem: take in some arbitrary data and return a key that can later be used to retrieve that data. URL shorteners also give the operator a lot of interesting information about which links are currently popular on the internet.
##When## July 2010, during my 1B school term
##Technology## The main technology used was Cassandra as the key-value store. Since each URL is only ever accessed by its key, and no operations are performed on sets of URLs, a hashtable-style lookup database is the perfect solution.
I chose Cassandra because it is fast and I wanted to use it in a future project, so this project was a chance to get my feet wet. It is also distributed, which allows multiple servers to serve the redirects.
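A minimal sketch of why a plain lookup table is enough, assuming an in-memory Python dict as a stand-in for Cassandra. The `InMemoryURLStore` class and `redirect` function are hypothetical names, not HighPi's code, and returning HTTP 301 for a redirect is an assumption. Both the write and the redirect touch exactly one key:

```python
from typing import Dict, Optional, Tuple


class InMemoryURLStore:
    """Hypothetical stand-in for the Cassandra key-value store.

    Every operation is a single-key put or get; nothing scans or
    aggregates over a set of URLs.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def put(self, key: str, long_url: str) -> None:
        # Write path: store the long URL under its short key.
        self._rows[key] = long_url

    def get(self, key: str) -> Optional[str]:
        # Read path: one lookup by key, or None if the key is unknown.
        return self._rows.get(key)


def redirect(store: InMemoryURLStore, key: str) -> Tuple[int, Optional[str]]:
    """Return an (HTTP status, Location) pair for a short-URL request.

    Assumption: 301 for a known key, 404 for an unknown one.
    """
    long_url = store.get(key)
    if long_url is None:
        return 404, None
    return 301, long_url
```

Because no request ever needs more than one row, the same access pattern carries over directly to a distributed key-value store, which is what lets several servers share the redirect work.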
One main issue with this type of project is generating a unique but short key for each URL. I could easily have hashed the URL with MD5 or something similar, but that would have produced long URLs: an MD5 digest is 128 bits, or 32 hexadecimal characters.
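To make the size problem concrete, here is a short Python sketch that uses the hex digest of the URL as its key (the `md5_key` helper is hypothetical). The key is 32 characters long no matter how many URLs have been shortened:

```python
import hashlib


def md5_key(long_url: str) -> str:
    # Hex digest of the URL: always 128 bits = 32 hex characters.
    return hashlib.md5(long_url.encode("utf-8")).hexdigest()


url = "http://www.google.com/search?q=stephen+holiday&ie=utf-8&oe=utf-8"
key = md5_key(url)
print(len(key))  # 32
print("http://highpi.com/" + key)
```

Even the very first URL shortened this way would get a 32-character key, whereas a counter-based key can start at a single character.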
Instead, I wanted to create the smallest URLs possible, with their length growing only as more URLs were needed. For this I implemented a ticketing system similar to the one Flickr uses to generate unique keys: I created a table in MySQL holding all of the URLs and their associated unique keys, where the key was simply an auto-increment counter, so the number grew with the number of URLs. Since far fewer people create URLs at any given time than access them through the Cassandra lookup, the MySQL counter only has to handle the light write traffic, so both redirects and URL generation stay fast.
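A minimal sketch of the counter-based scheme, under stated assumptions: SQLite's in-memory database stands in for the MySQL table, and the integer ticket is turned into a key with a base-36 alphabet (`0-9a-z`). The text above does not say which alphabet HighPi uses, and `encode`, `decode`, and `issue_key` are hypothetical names:

```python
import sqlite3
import string

# Assumed alphabet: digits then lowercase letters (base 36).
ALPHABET = string.digits + string.ascii_lowercase


def encode(n: int) -> str:
    """Encode a non-negative integer ticket as a short base-36 key."""
    if n == 0:
        return ALPHABET[0]
    base = len(ALPHABET)
    digits = []
    while n > 0:
        n, rem = divmod(n, base)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode(key: str) -> int:
    """Inverse of encode: turn a short key back into its ticket number."""
    n = 0
    for ch in key:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n


# SQLite stands in for the MySQL table; the auto-increment id is the ticket.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE urls (id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT NOT NULL)"
)


def issue_key(long_url: str) -> str:
    """Record the URL, take the next counter value, and encode it."""
    cur = db.execute("INSERT INTO urls (url) VALUES (?)", (long_url,))
    db.commit()
    return encode(cur.lastrowid)
```

With this assumed alphabet the first ticket becomes `1`, keys stay one character long up to ticket 35 and two characters up to ticket 1295, so the URL only grows as the counter does. After `issue_key` returns, the key/URL pair would be written to the key-value store for the redirect path, which is not shown here.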