Twitter wants your help fighting falsehoods. It’s risky, but it might just work

With Birdwatch, Twitter is trying an old weapon against misinformation—the crowd. But as any grizzled internet user knows, that won’t be easy.

[Source photo: mostafa meraji/Unsplash]

Twitter has spent years and millions of dollars fighting falsehoods with a now well-worn arsenal—fact-checks and warning labels and context labels and algorithmic tweaks and bans and bans on the President of the United States—but this week the company unveiled a totally new weapon: us.


Select Twitter users who participate in the new program, called Birdwatch, can identify tweets they believe are misleading, write notes that provide context to the tweet, and rate the quality of other participants’ notes, the company says. The eventual goal is to add community-written notes directly beneath Tweets, through what Twitter calls “consensus from a broad and diverse set of contributors.” (The company is starting with 1,000 qualified users, a Twitter spokesperson said, and aims to eventually expand the program to 100,000 people; you can apply here, and see tweets that have recently received notes, and read some of their notes, here.)

It’s actually an old idea, harnessing the wisdom of the crowd that powers Wikipedia and a range of decades-old digital forums. But Twitter is not Wikipedia, and within minutes of the debut of Birdwatch many people, well, took to Twitter.

“Unlike Wikipedia, Twitter is not one cohesive community, and users are not dedicated to a common purpose of sharing knowledge,” contended Tiffany C. Li, a professor at Boston University School of Law. “Imagine the harassment and disinfo you already see in replies and QTs, but transposed to a ‘fact check’ context!” Others pointed to the obvious: “Free labor,” wrote Jennifer Grygiel, an assistant professor at the Newhouse School of Public Communications at Syracuse University. “Twitter is a Corp . . . this is crowdsourced labor and people should be paid.”


It will be messy, even Twitter admits. But given how existing interventions have fared, it might be worth trying. And in some cases, it might just work.

In a sense, Birdwatch is leveraging typical behavior on Twitter, where users already routinely fact-check viral tweets. Countering information with more information is a response that CEO Jack Dorsey, like Facebook’s Mark Zuckerberg, has long insisted is one of the best ways to deal with viral lies. Over the past year, however, Twitter has ramped up its interventions, placing labels, contextual notices, or penalties on highly trafficked tweets, with the help of Twitter employees, “trusted partners,” and algorithms. The focus is on manipulated media, COVID-19 misinformation, and content that violates its civic integrity policy, said the Twitter spokesperson.

But Twitter’s response to the historic, chaotic information storm of 2020 showed that fact-checks and labels are only so effective, and are usually implemented too late, after a piece of content has already spread a lie. At the same time, researchers and journalists have demonstrated how Wikipedia and other crowdsourcing communities are surprisingly effective at correcting misinformation quickly.


This dynamic exists on Twitter, too: In one recent set of experiments examined by Twitter’s Birdwatch team, sociologists at New York University and Georgia Tech surveyed more than 150,000 tweets related to COVID-19, a sizable portion of which contained falsehoods, and found a “quick response and a corresponding increase in tweets that refute such misinformation.”


In another study awaiting peer review, researchers at the Massachusetts Institute of Technology found that the crowd was as reliable at correcting misinformation as the professional fact-checkers that Facebook pays millions of dollars every year. (They also found that “layperson cognitive reflection, political knowledge, and Democratic Party preference are positively related to agreement with fact-checker ratings.”) Facebook’s third-party program has had its share of problems, including meddling by Facebook itself on behalf of powerful publishers and politicians. But on a more basic level, a community-based system boasts something that a Facebook-like system doesn’t: It can scale.

A community-driven system could also allow for more transparency and trust—valued commodities at a time when Big Tech is trying hard not to look like Big Brother. Twitter is building Birdwatch “in the open,” releasing its code and data on GitHub, and says it has consulted with reporters and researchers, including the University of Chicago’s Radical Innovation for Social Change center, and conducted interviews with more than 100 politically diverse users.


“We’ve heard again and again that many people feel context on tweets would be more impactful if it came from the community rather than Twitter or any singular institution,” the company says. “So if this works, we believe it could have a real impact.”

How Birdwatch works

At first, Birdwatch notes will not automatically lead to labels on tweets. Twitter says they are only meant to complement its existing interventions against misinformation, providing additional context to tweets that might not break Twitter’s rules or to those that don’t receive as much widespread attention.

That means that initially, Birdwatch notes will not be directly visible on Twitter for users outside of a pilot group of users, though they will be visible on the Birdwatch site. For now, Twitter says each set of notes will also have its own permalink, which users can include in replies to tweets.


Users who are part of Birdwatch will see a new option in any given tweet’s drop-down menu, which will allow them to add a note.

After being reminded of Birdwatch values, such as building understanding and acting in good faith, users are required to answer multiple-choice questions that are “designed to understand, in a structured way, why a Tweet might or might not be misleading.”


In a text field, Birdwatchers can explain their reasoning. “Citing sources is helpful,” Twitter explains on a how-to page. The company insists Birdwatch is “not a place for quick dunks, personal opinions, or insults.” Once published, a note can be upvoted by other Birdwatchers.

[Screenshot: Twitter]
That data, which Twitter also hopes to make public, will be used to power future reputation models that can recognize users “whose contributions are consistently found helpful by a diverse set of people.”


The risk of bias

Diverse is the key word. Birdwatch is admitting users who meet certain criteria (you can’t have recently violated Twitter’s rules, for instance) and who also represent a broad array of the platform’s users. The “more diverse the community,” says Twitter, “the better Birdwatch will be at effectively addressing misinformation.”

And the less biased it might be. If the validity of a tweet is measured by the community’s favorite appended note, what if the Birdwatch community happens to be made up of, say, mostly coastal Twitter users with blue checkmarks?

“There are so many fundamental problems with [Birdwatch],” warned tech entrepreneur Shireen Mitchell, a member of the Facebook accountability group the Real Facebook Oversight Board. Having the people “decide what’s a fact or not,” she wrote in a tweet, is like “W[estern] academics that believe they have more ‘truth’ about slavery than Black Academics who are descendants.”


Bias is already a serious problem on Wikipedia, where a predominantly white Western male editor community has built an incredibly detailed encyclopedia that’s full of giant blind spots. The Wikipedia community has spent years trying to fix itself, which suggests that Twitter will need to work hard to prevent bias from creeping in from the beginning. Even the appearance of bias could imperil the project: just see the tweets of the early critics. “Anyone think Twitter will actually use this feature fairly?” House Judiciary Committee Republicans tweeted. Representative Ken Buck, Republican of Colorado, tweeted: “Crowd-sourced censorship . . . what could go wrong?”

J. Nathan Matias, an assistant professor of communication at Cornell University who has studied what he calls “civic labor” on digital platforms, underscores the value of diversity. “The question of who participates will determine the success/failure of Birdwatch,” he tweeted. On “citizen science” platforms like Wikipedia, “if you just let people self-select, you will end up with an non-representative community that creates long-term problems. . . . Creating diverse, equitable communities of volunteers takes a lot of listening, organizing, & ongoing support.”

The threat of brigading

If the Birdwatch communities are too homogeneous, political or ideological bias won’t be the only problem; the system will also be vulnerable to coordinated manipulation campaigns, or brigading.


“Say one extremist forum REALLY hates one true tweet by a specific user,” tweeted NBC News’s Ben Collins, who coauthored a story about Birdwatch.

“They all sign up en masse and drown out good info. As this rolls out to more people, I didn’t see defense against that.”

In the same week that hedge funds were gamed by armies of Redditors, Twitter says it is bracing for these attacks. “We expect such attempts to occur,” it says in the project notes.


To fend off takeovers, the company says it will focus on populating Birdwatch with recently active users and “those that tend to follow and engage with different tweets than existing participants do—so as to reduce the likelihood that participants would be predominantly from one ideology, background, or interest space.”

Twitter has been vague about how it will manage the process of choosing participants or preventing brigading, but according to its FAQ, it will be “experimenting with different mechanics and incentives.” For example, on a page describing “Challenges,” it said: “Birdwatch can factor in not just a count of how many people said a note is helpful but also the diversity of those inputs. Additionally, we plan for Birdwatch to have a reputation system in which one earns reputation for contributions that people from a wide range of perspectives find helpful.”
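
To make the idea of weighing "the diversity of those inputs" concrete, here is a minimal sketch of one way such a score could work. It is not Twitter's method, which the company has not published in detail. The function name `diversity_weighted_score`, the notion of a "perspective group" label for each rater, and the choice of entropy as the diversity measure are all assumptions made for illustration.

```python
import math
from collections import Counter


def diversity_weighted_score(helpful_rater_groups, total_groups):
    """Hypothetical score: helpful votes scaled by how evenly they spread across groups.

    helpful_rater_groups: one (assumed) perspective-group label per rater
        who marked the note helpful.
    total_groups: number of distinct groups in the rater population.
    """
    if not helpful_rater_groups or total_groups < 2:
        return 0.0
    n = len(helpful_rater_groups)
    counts = Counter(helpful_rater_groups)
    # Shannon entropy of the group distribution among helpful raters.
    entropy = sum((c / n) * math.log(n / c) for c in counts.values())
    # Normalize to the range 0..1 (1 means votes are spread evenly over all groups).
    diversity = entropy / math.log(total_groups)
    return n * diversity


# Ten helpful votes from a single group (a brigade) versus six spread over three groups.
brigaded = diversity_weighted_score(["a"] * 10, total_groups=3)
broad = diversity_weighted_score(["a", "a", "b", "b", "c", "c"], total_groups=3)
print(brigaded < broad)
```

Under these assumptions, a note upvoted only by one like-minded bloc scores zero no matter how many votes it collects, while a note with fewer votes from a wider mix of raters scores higher. A real system would also have to decide how raters are assigned to groups, which is the hard part this sketch leaves out.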

The risk of harassment

Twitter says it will monitor the Birdwatch community closely and remain in regular communication with participants, in part through a dedicated community manager. But it’s not clear yet how Birdwatch will address another of its biggest immediate risks: assholes.


“How will Twitter protect those who are targeted by people who are unhappy with being fact-checked?” asks Molly White, a software engineer and longtime Wikipedia editor.

The concern is particularly acute on Birdwatch, where a user’s Twitter handle appears above their note. If names are tied to these fact-checks, White foresees harassment problems. “We all saw Donald Trump level attacks at Twitter and Jack Dorsey when they began to mark his tweets as containing misinformation; can you imagine if instead the person to mark the tweet had just been some average person using her real name?”



The Twitter spokesperson told me in an email that the company is aware of the risks. Harassment “is something to which we will be paying close attention throughout the pilot,” they said, and as the pilot progresses, the company aims to build unspecified “safeguards and protocols.”

Twitter says that linking Birdwatch notes to their authors’ Twitter accounts is helpful for building trust. “In concept tests, people consistently told us that they found notes more helpful when they can see who wrote the note (vs. it being anonymous).” Anyway, it says, Birdwatch participants can use any display name on their Twitter page, and are welcome to “use secondary accounts that use a pseudonym.”

White isn’t convinced. Over 13 years of editing and moderating on Wikipedia, she’s been harassed, doxxed, and threatened with lawsuits and violence. Wikipedia also doesn’t require real names, but White—who attributes the harassment to her power on the platform, her gender, and her focus on controversial topics like the Boogaloo movement—gave up trying to hide her real identity from the trolls years ago.

“My own experiences with Twitter’s handling of instances where I’ve been targeted and abused on their platform makes me extremely skeptical they can do this well,” she says.

Free labor

Even without harassment, Twitter faces a more fundamental quandary: How do you get Twitter users to give their free labor to begin with? Will Twitter users display the same motivation as editors of Wikipedia or Reddit moderators or the people who answer questions on Stack Overflow or Quora?

“I’m perfectly willing to provide my labor for free on Wikipedia because the Wikimedia Foundation is a nonprofit organization with a noble purpose,” White told me last year, when the Birdwatch idea was first floated. “If Google asked me to do something for them, I’d better be getting a paycheck.”

Of course, as evidenced by endless threads of Twitter replies, people’s impulse to correct each other can be strong, White says. “‘Someone’s wrong on the Internet’ is a powerful instinct.”

Still, she adds, “if Twitter is hoping that people will actively comb through a large portion of tweets besides the ones they are already seeing on their feed, or if they are hoping that they will do detailed fact-checks of less-obvious falsehoods, I think they may have some trouble achieving that.”

White can think of a few factors that motivate people to do free labor in online communities, such as personal interest, connection to a topic or group, or a sense of power.


An ideological goal can be powerful too: White points to Parler, the free-speech-friendly, conservative-backed platform that until last year recruited some dedicated users to help moderate content. “They may have felt that Parler’s existence was threatened by content going unmoderated, and that they were helping to keep it online by providing their help,” she says. (These volunteers may have also thought Parler couldn’t afford to pay them; in November, The Wall Street Journal reported that Republican megadonor Rebekah Mercer was bankrolling the project.)

But among Twitter users, White says, “I don’t think there are many people who are ideologically behind the idea of Twitter in the way that there are with Parler.”

Reputation-building can also be a strong motivator, thanks to point systems like Reddit karma, Stack Overflow answers, or Wikipedia edit count. Twitter could further amplify these factors using Birdwatch’s reputation system.

But as many have pointed out, the Wikipedia community is motivated by a goal that is harder to find on Twitter—a goal White says could be “the strongest draw of Wikipedia’s community, and what makes people willing to do enormous amounts of work for free.” That is, “the desire to make the world a better place by providing free knowledge.”

Then again, perhaps Twitter users will be motivated to pitch in because they know just how damaging misinformation can be.

Moderators for a corporate platform ought to be paid, Grygiel, at Syracuse, told me in an email. But, they conceded, “I think the public understands that they have to pick up the slack re: content moderation” from platforms, and that internet users are “doing so more and more as these toxic environments spill over into society—they are not simply contained on the internet.”


About the author

Alex is a contributing editor at Fast Company, the founding editor and editor at large of Motherboard at Vice, and a freelance writer and producer with a focus on the intersections of science, technology, media, politics, and culture.