Update: Following up on some useful feedback from Hacker News users, I tweaked the spying experiment today to make it a bit more effective. The first criticism was that stating the model’s percent accuracy is meaningless because I reported no baseline. In other words, if 67% of the sample records I gave the model belonged to one gender, a 67% accuracy would mean the model was simply reproducing the distribution of records in the sample set rather than finding any real statistical pattern. I went back and checked my training data and found that it was roughly even but did favor males by about 4%, meaning my real accuracy is closer to 63%, which is only 13 percentage points better than flipping a coin.
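The baseline argument is easy to check mechanically: the accuracy a model must beat is the accuracy of always guessing the most common label. The sketch below is a minimal illustration, not the procedure used in the experiment; the helper names and the 67/33 sample are hypothetical, chosen to mirror the example above.

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy of a 'model' that always guesses the most common label."""
    _, top_count = Counter(labels).most_common(1)[0]
    return top_count / len(labels)

def lift_over_baseline(model_accuracy, labels):
    """How far (as a fraction) a model's accuracy sits above the majority baseline."""
    return model_accuracy - majority_baseline(labels)

# Hypothetical sample: 67 records from one gender, 33 from the other.
sample = ["male"] * 67 + ["female"] * 33
print(majority_baseline(sample))                  # 0.67
print(round(lift_over_baseline(0.67, sample), 2))  # 0.0: no better than the baseline
```

With a skewed sample like this one, a reported 67% accuracy carries no information at all; only the lift over the baseline says anything about a learned pattern.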

The second criticism focused on the fact that I didn’t do enough analysis of how good my model actually is, given the limited features I trained it on. User quchen suggested verifying the model by shrinking and growing the training dataset and by feeding it deliberately false data. When I did this, one thing stood out: the timestamp of the call did very little to affect the accuracy of the model. The output from the Prediction API’s “analyze” method showed me why: Google was treating each timestamp as a single token, and because two calls or text messages rarely arrive at exactly the same time, it found almost no connections between the items in this category.
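To see why raw timestamps are useless as tokens, it helps to count how many distinct values they produce. This is a minimal sketch on a hypothetical, evenly spaced call log (not the author’s data): every raw timestamp is its own token, while the hour of the call can take at most 24 values, so a model can actually see it repeat.

```python
from datetime import datetime, timedelta

def distinct_token_ratio(tokens):
    """Fraction of tokens that are distinct; near 1.0 means almost every value is unique."""
    return len(set(tokens)) / len(tokens)

# Hypothetical call log: 200 calls spaced 97 minutes apart.
start = datetime(2013, 6, 1, 8, 0)
timestamps = [start + timedelta(minutes=97 * i) for i in range(200)]

raw_tokens = [ts.isoformat() for ts in timestamps]  # whole timestamp as one token
hour_tokens = [ts.hour for ts in timestamps]        # just the hour of the call

print(distinct_token_ratio(raw_tokens))   # 1.0: no raw token ever repeats
print(distinct_token_ratio(hour_tokens))  # at most 24 / 200 = 0.12
```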

In retrospect, this was an obvious mistake. Even without understanding the nuances of the algorithms behind these models (and honestly, I really don’t), it makes little intuitive sense to look for a pattern in the exact timestamp of a call. Where there may be a pattern, however, is in the times of day, days of the week or month, or even minutes within the hour when each gender prefers to call. To fix this, I split the timestamp into four separate tokens: the day of the week, the day of the month, the hour, and the minute of the call.
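The split described above takes only a few lines with Python’s standard `datetime` module. This is a sketch under assumptions: the source doesn’t show the input format sent to the Prediction API, so the field names here are hypothetical.

```python
from datetime import datetime

# Indexed by datetime.weekday(), where Monday == 0 (avoids locale-dependent strftime).
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")

def timestamp_features(ts: datetime) -> dict:
    """Split one call timestamp into four coarser, repeatable tokens."""
    return {
        "day_of_week": DAY_NAMES[ts.weekday()],  # e.g. "Friday"
        "day_of_month": ts.day,                  # 1-31
        "hour": ts.hour,                         # 0-23
        "minute": ts.minute,                     # 0-59
    }

print(timestamp_features(datetime(2013, 6, 14, 21, 37)))
# {'day_of_week': 'Friday', 'day_of_month': 14, 'hour': 21, 'minute': 37}
```

Each of these tokens recurs across many calls, which is what gives a model something to correlate with gender.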

This time, when the model came back, it found lots of connections between gender and the times people prefer to call or text, and the accuracy improved too: my model can now predict the gender of a caller with 80% accuracy. That’s still not anything you’d want to deploy in a production system, but it is enough to suggest that there’s a pattern, at least among the people unfortunate enough to interact with me via voice and SMS.

Last week, we learned that the NSA has been secretly collecting billions of phone records from major U.S. providers and mining the data, ostensibly to look for terrorists and other threats to national security. To justify these programs, the government is pointing to the fact that they don’t collect the contents of these calls and text messages, just “metadata,” and that to associate this data with real people, they need a warrant.

Here’s the catch: there appears to be nothing that says the government can’t use full, non-anonymous datasets to mine this metadata for pure gold. We’ve been covering data science in business at Co.Labs, but if you need a refresher, here’s how basic data-mining typically works: you take a set of data that contains examples of the types of patterns you’re looking for, and use it to train a computer to look for similar patterns in another set of data.
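The train-then-predict loop described above can be sketched in plain Python. The source doesn’t say which algorithm Google’s Prediction API uses, so this swaps in a hand-rolled categorical naive Bayes classifier as a stand-in; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count labels and (label, feature, value) co-occurrences.

    `examples` is a list of (features, label) pairs, where `features` is a dict
    of categorical tokens such as {"hour": 21, "day_of_week": "Friday"}.
    """
    label_counts = Counter()
    value_counts = defaultdict(Counter)  # (label, feature name) -> Counter of values
    vocab = defaultdict(set)             # feature name -> distinct values seen
    for features, label in examples:
        label_counts[label] += 1
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            vocab[name].add(value)
    return label_counts, value_counts, vocab

def predict(model, features):
    """Return the label with the highest naive Bayes log-probability."""
    label_counts, value_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior P(label)
        for name, value in features.items():
            seen = value_counts[(label, name)][value]
            # Add-one smoothing: an unseen value lowers the score instead of zeroing it.
            score += math.log((seen + 1) / (count + len(vocab[name]) + 1))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, examples):
    """Fraction of held-out examples whose label is predicted correctly."""
    hits = sum(predict(model, features) == label for features, label in examples)
    return hits / len(examples)

# Hypothetical toy data: label "A" tends to call in the morning, "B" at night.
training = [({"hour": 9}, "A"), ({"hour": 10}, "A"), ({"hour": 9}, "A"),
            ({"hour": 22}, "B"), ({"hour": 23}, "B"), ({"hour": 22}, "B")]
held_out = [({"hour": 9}, "A"), ({"hour": 22}, "B")]

model = train(training)
print(accuracy(model, held_out))  # 1.0
```

Scoring on a separate held-out set, rather than on the training examples, is what keeps the reported accuracy honest.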

These techniques are now so widespread that simple data-mining on an individual level is becoming much easier, thanks to numerous prediction libraries available in just about any programming language and powerful cloud-based tools like Google’s Prediction API. To understand exactly what the government can do with this metadata, I decided to beat the NSA at its own game by spying on my own data.