2013-12-12

DataSift aggregates social data from sources like Twitter, Facebook, and blogs and lets you query it via a single API, but making effective use of that data is still a big problem for many businesses–until today.

“A lot of the trouble that you have when you try to integrate social data into an actual business,” says DataSift’s CTO and founder, Nick Halstead, “is what is the pivot point to connect the social data to the business behavior? So you need to know what the discussion actually means rather than just finding that somebody is talking about you.”

Every business is different and may therefore use the same data in a different way. Are you an airline that wants to automatically classify new customer conversations as queries, complaints, or urgent requests? Or a B2B business that wants to determine the profession and seniority level of Twitter followers in order to identify leads? DataSift’s professional services team often builds custom tools like Dell’s Social Net Advocacy for this very reason. Companies with in-house data scientists might do the same using DataSift’s data.

DataSift’s new intelligence engine, Vedo, allows developers with no data science background to build custom models of customers and conversations and use them to automatically classify the real-time social data available via DataSift’s API. Vedo can even build those models for you using machine learning.

A classification model automatically sorts new examples of an object, like a tweet or Twitter profile, into a number of classes. DataSift created 30 classification models and released the code so that developers can use them as examples. One of DataSift’s sample models takes every new message sent to an airline’s customer service account and categorizes it as a “query,” “rant,” or “rave” based on the words used in the tweet.

Another example distinguishes Twitter bots from humans. “Tweets always come from a source like TweetDeck,” says Halstead. “There’s about 80,000 of them. If you see a tweet coming from TweetDeck you know for sure that a human sent it. There’s no way to fake that easily. TweetDeck users, on average, follow a daily 24-hour volume curve. We built a model to identify bots by knowing when they are tweeting at high volumes out of sync with that 24-hour cycle.”
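To make the idea of "out of sync with the 24-hour cycle" concrete, here is a minimal Python sketch. It is not DataSift's model: the reference curve, the thresholds, and the function names are all hypothetical, and it simply flags an account as bot-like when it tweets at high volume and its hourly pattern correlates poorly with a typical human curve.

```python
from math import sqrt
from typing import Sequence

# Hypothetical reference curve: relative tweet volume per hour (0-23) for
# human users. Invented for illustration, not taken from DataSift.
HUMAN_CURVE = [1, 1, 1, 1, 1, 2, 4, 6, 8, 9, 9, 10,
               10, 9, 9, 9, 10, 11, 12, 12, 10, 7, 4, 2]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def looks_like_bot(hourly_counts: Sequence[int],
                   reference: Sequence[float] = HUMAN_CURVE,
                   min_daily_volume: int = 100,    # assumed threshold
                   min_correlation: float = 0.5    # assumed threshold
                   ) -> bool:
    """Flag high-volume accounts whose hourly pattern ignores the daily curve."""
    if len(hourly_counts) != 24 or len(reference) != 24:
        raise ValueError("expected 24 hourly buckets")
    if sum(hourly_counts) < min_daily_volume:
        return False  # low-volume accounts are not judged by this rule
    return pearson(hourly_counts, reference) < min_correlation
```

An account that posts 20 tweets every hour around the clock would be flagged, while one whose activity rises and falls with the reference curve would not. A real model would presumably need far more signal than a single correlation.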

DataSift’s query language, CSDL (Curated Stream Definition Language), is used to develop custom classification models. For example, you can create a rule like “tweets which contain the word ‘help.’” You can then attach a tag to that rule indicating that every tweet matching it is a query. Tweets which match a different rule might be classed as complaints.
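
The rule-and-tag idea described above can be sketched in a few lines of Python. This is not CSDL and does not use DataSift's API: the `Rule` class, the helper names, and the "delayed" keyword for complaints are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """A tag plus a predicate deciding whether a tweet's text matches."""
    tag: str
    matches: Callable[[str], bool]


def contains_word(word: str) -> Callable[[str], bool]:
    """Build a predicate that is true when `word` appears as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return lambda text: pattern.search(text) is not None


# Hypothetical rules: "help" marks a query (as in the article's example);
# "delayed" marking a complaint is an invented example.
RULES = [
    Rule("query", contains_word("help")),
    Rule("complaint", contains_word("delayed")),
]


def classify(text: str, rules: List[Rule] = RULES) -> List[str]:
    """Return the tags of every rule the text matches, in rule order."""
    return [rule.tag for rule in rules if rule.matches(text)]
```

Calling `classify("Can anyone help me rebook?")` yields `["query"]`, and a tweet that matches both rules receives both tags. Matching on whole words keeps "helpful" from being tagged as a query.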