Date: 2023-05-06

A new paper suggests that the field of bot detection is based on a flawed premise due to poor-quality original data.

The research, presented this week at the Web Conference (where it was awarded best paper), found that bot detection tools can rely on funky, flawed data sets that replicate one another’s mistakes, rather than accurately identifying bots. Zachary Schutzman, a researcher at the Massachusetts Institute of Technology, and his colleagues first investigated the accuracy of bot detection tools when they wanted to analyze conversations on Twitter but needed to strip their data set of bot-generated content. They found the existing range of bot detection tools weren’t very good. “We downloaded one of the [bot] datasets from a website, trained a simple model in Python, and got, like, 99% accuracy,” says Schutzman. The team’s first reaction was that they had done something wrong: Their simple model couldn’t possibly be as accurate as the complicated neural networks their peers had deployed. “It turned out we didn’t make a silly mistake: A very simple model did work very well on this data,” he says.

They thought the problem might lie with the specific training data they had downloaded, which had been used for one bot-detection model. So they tried another data set and got the same result. “Right off the shelf, we were getting a model that was getting a 95 or 98% accuracy,” he says. Their super-simple attempts at bot classification should not have come anywhere close to matching the accuracy of the highly complicated machine learning methods that are in vogue for detecting bots.

They set out to understand why, and found numerous glitches in how the data sets were collected and labeled. In favoring the whizz-bang of new AI technology, past researchers had in fact made the decisions their models needed to make fiendishly simple. It turned out that if an account had ever liked a tweet, it was labeled as a human in the data set. If it had never liked a tweet, it was labeled as a bot. “We realized that this was a systemic issue in the data sets that are commonly used for bot detection,” Schutzman says. The models developed to detect bots may have been very complicated, but the underlying data they were trained on was very simple. And because of the way academic research works, those data sets are reused across multiple models, replicating the errors along the way.
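To see why a labeling shortcut like the "ever liked a tweet" rule makes a trivial model look impressive, consider the rough Python sketch below. It is not the researchers' code, and it uses no real data: the `Account` class, the `one_rule_classifier` function, and the ten toy accounts are all hypothetical, invented purely for illustration. It only shows that when labels track a single feature almost perfectly, a one-line rule scores nearly as well as anything more elaborate could.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical record for one account; the fields are illustrative only."""
    handle: str
    like_count: int
    label: str  # "human" or "bot", as assigned in the (made-up) data set


def one_rule_classifier(account: Account) -> str:
    """Predict 'human' if the account has ever liked a tweet, else 'bot'."""
    return "human" if account.like_count > 0 else "bot"


def accuracy(accounts: list[Account]) -> float:
    """Fraction of accounts whose predicted label matches the data set label."""
    correct = sum(one_rule_classifier(a) == a.label for a in accounts)
    return correct / len(accounts)


# Invented toy data: nine accounts follow the like-based pattern, and one
# bot that has liked tweets breaks it.
TOY_ACCOUNTS = [
    Account("user_a", 120, "human"),
    Account("user_b", 4, "human"),
    Account("user_c", 37, "human"),
    Account("user_d", 980, "human"),
    Account("user_e", 15, "human"),
    Account("bot_a", 0, "bot"),
    Account("bot_b", 0, "bot"),
    Account("bot_c", 0, "bot"),
    Account("bot_d", 0, "bot"),
    Account("bot_e", 12, "bot"),  # the single exception to the rule
]

if __name__ == "__main__":
    print(f"One-rule accuracy on toy data: {accuracy(TOY_ACCOUNTS):.0%}")
```

On this toy set the rule misclassifies only the one bot that has liked tweets. The high score says more about how the labels were assigned than about the rule's ability to spot bots in the wild.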

