Mining JobMine - Fall 2010 - Part 1

JobMine is the tool we use at the University of Waterloo for our co-op. In the system, there are thousands of job postings for various positions. Having access to a big database always gets my brain churning with possible alternative uses for the data contained within.

First, I should credit Lisa Zhang with the inspiration for this idea (and the title). She did some really cool stuff with JobMine and you should check her blog out.

The first thing we (students) look at when browsing JobMine is the job title. So naturally the titles affect which jobs we click through to read and then apply to. The job titles carry a lot of weight in our application process.

I should note that I only analyzed the jobs in my field (jobs in the Computer Engineering or Software Engineering categories). I also had some additional filters on, so this is not exactly a perfect sample, but I think it’s interesting nonetheless.

So I went through the jobs, grabbed each title and ran a custom tokenizer to extract the 1-grams (single words). Here is a chart of the most popular words in job titles.
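For anyone who wants to try something similar, here is a rough sketch of what the tokenize-and-count step could look like. This is not my actual tokenizer, just a minimal stand-in: the function names (`tokenize`, `top_unigrams`), the regular expression and the sample titles are all made up for illustration.

```python
import re
from collections import Counter

# Assumption: a token is any run of lowercase letters or digits.
# A real tokenizer would likely special-case things like "C++" or ".NET".
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(title):
    """Split a job title into lowercase 1-grams."""
    return TOKEN_RE.findall(title.lower())


def top_unigrams(titles, n=25):
    """Count 1-grams across all titles and return the n most frequent."""
    counts = Counter()
    for title in titles:
        counts.update(tokenize(title))
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical titles, not real JobMine data.
    sample_titles = [
        "Software Developer",
        "Software Engineering Intern",
        "QA Analyst",
    ]
    for word, count in top_unigrams(sample_titles, n=5):
        print(word, count)
```

Lowercasing before counting means "Software" and "software" land in the same bucket, which is what you want for a frequency chart like the ones below.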

1-gram Frequency in Titles (Top 25)

And here are the top 100:

1-gram Frequency in Titles (Top 100)


So this is interesting but not terribly surprising. It’s logical that the top word is “software” and the next “developer”. Something I found interesting was the choice of “engineer” versus “engineering”. “Engineering” occurs more frequently than “engineer” by a factor of almost 20. Technically in Ontario, having a job title of “something engineer” is against PEO (Professional Engineers Ontario) rules and you could be prosecuted for it (according to my Engineering Ethics and Law class). Perhaps this is the reason for the difference. On a side note, “Software Engineering” has always looked awkward on a resume to me.

I was surprised by the low number of QA jobs, but then again, I think I had a filter on that removed junior jobs, and those tend to be more QA oriented.

There are definitely some interesting things that can be taken from this analysis, but there is still much more to look at. My next article continues the analysis of titles with different n-gram sizes.