Using Twitter, a pair of London-based academics have created a map of the city’s linguistic diversity. Over a six-month period, the study analyzed around 3.3 million GPS-located tweets, running them through Google Translate to find out which languages are most prevalent in which areas. The home language dominated, with more than nine in every 10 tweets in English, followed by Spanish, French, and Turkish. Most visible were the pockets of French-language Twitter users.
One of the authors, student Ed Manley, who thought up the idea with Professor James Cheshire as a distraction from his PhD, reminds readers of just how skewed Twitter data can be. “In total, 92.5% of tweets are detected as English, far above existing estimations (60%) of English speakers in London.”
One amusing anecdote does emerge from the study. Tagalog, a language of the Philippines, was originally identified as one of the most tweeted languages. On closer inspection, the researchers discovered that English internet slang such as "lolololol" and "hahaha" was being misclassified as Tagalog.
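For readers curious how such a misclassification might be avoided, here is a minimal Python sketch that strips laughter-like tokens from a tweet before it is handed to a language detector. The article does not say how the researchers actually corrected the problem, so the pattern and the function name `strip_laughter` are illustrative assumptions, not their method.

```python
import re

# Hypothetical pattern (not from the study): matches tokens made up only of
# repeated laughter syllables, such as "lol", "lolololol", "hahaha" or "hehe".
# At least two "ha"/"he" syllables are required so that ordinary words like
# "he" are left alone.
LAUGHTER = re.compile(r"^(?:l+(?:o+l+)+|(?:ha|he){2,}h?)$", re.IGNORECASE)


def strip_laughter(text: str) -> str:
    """Drop laughter-like tokens so they cannot sway language detection."""
    kept = [
        token
        for token in text.split()
        # Ignore trailing or leading punctuation when testing a token.
        if not LAUGHTER.match(token.strip(".,!?"))
    ]
    return " ".join(kept)


if __name__ == "__main__":
    print(strip_laughter("lolololol that was funny hahaha"))
```

A tweet that is empty after filtering could simply be left out of the language tally rather than assigned a guess.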