The study found that with a group of just eight laypeople, there was no statistically significant difference between the crowd performance and a given fact checker. Once the groups got up to 22 people, they actually started significantly outperforming the fact checkers. (These numbers describe the results when the laypeople were told the source of the article. When they didn’t know the source, the crowd did slightly worse.) Perhaps most important, the lay crowds outperformed the fact checkers most dramatically for stories categorized as “political,” because those stories are where the fact checkers were most likely to disagree with each other. Political fact-checking is really hard.
It might seem impossible that random groups of people could surpass the work of trained fact checkers—especially based on nothing more than knowing the headline, first sentence, and publication. But that’s the whole idea behind the wisdom of the crowd: get enough people together, acting independently, and their results will beat the experts’.
“Our sense of what is happening is people are reading this and asking themselves, ‘How well does this line up with everything else I know?’” said Rand. “This is where the wisdom of crowds comes in. You don’t need all the people to know what’s up. By averaging the ratings, the noise cancels out and you get a much higher resolution signal than you would for any individual person.”
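Rand's point about averaging can be illustrated with a small simulation. This is a minimal sketch rather than the study's method: it assumes each rater reports a story's true accuracy plus independent, unbiased Gaussian noise, and `true_score`, `noise_sd`, and `crowd_mean_error` are hypothetical names and values chosen only for illustration.

```python
import random
import statistics


def crowd_mean_error(crowd_size, true_score=0.7, noise_sd=0.3, trials=2000, seed=0):
    """Average absolute error of the crowd's mean rating across simulated trials.

    Assumption (not from the study): every rating is the true score plus
    independent, zero-mean Gaussian noise.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    errors = []
    for _ in range(trials):
        ratings = [rng.gauss(true_score, noise_sd) for _ in range(crowd_size)]
        errors.append(abs(statistics.fmean(ratings) - true_score))
    return statistics.fmean(errors)


# Compare a lone rater with crowds of 8 and 22, the group sizes mentioned above.
for size in (1, 8, 22):
    print(size, round(crowd_mean_error(size), 3))
```

Under these assumptions the error shrinks roughly with the square root of the crowd size. If the raters' mistakes were correlated, for instance because they could see one another's answers, the noise would not cancel in the same way.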
This isn’t the same thing as a Reddit-style system of upvotes and downvotes, nor is it the Wikipedia model of citizen-editors. In those cases, small, nonrepresentative subsets of users self-select to curate material, and each one can see what the others are doing. The wisdom of crowds only materializes when groups are diverse and the individuals are making their judgments independently. And relying on randomly assembled, politically balanced groups, rather than a corps of volunteers, makes the researchers’ approach much harder to game. (This also explains why the experiment’s approach is different from Twitter’s Birdwatch, a pilot program that enlists users to write notes explaining why a given tweet is misleading.)
The paper’s main conclusion is straightforward: Social media platforms like Facebook and Twitter could use a crowd-based system to dramatically and cheaply scale up their fact-checking operations without sacrificing accuracy. (The laypeople in the study were paid $9 per hour, which translated to a cost of about $0.90 per article.) The crowdsourcing approach, the researchers argue, would also help increase trust in the process, since it’s easy to assemble groups of laypeople that are politically balanced and thus harder to accuse of partisan bias. (According to a 2019 Pew survey, Republicans overwhelmingly believe fact checkers “tend to favor one side.”) Facebook has already debuted something similar, paying groups of users to “work as researchers to find information that can contradict the most obvious online hoaxes or corroborate other claims.” But that effort is designed to inform the work of the official fact-checking partners, not augment it.
Scaled-up fact-checking is one thing. The far more interesting question is how platforms should use it. Should stories labeled false be banned? What about stories that might not have any objectively false information in them, but that are nonetheless misleading or manipulative?
The researchers argue that platforms should move away from both the true/false binary and the leave-it-alone/flag-it binary. Instead, they suggest that platforms incorporate “continuous crowdsourced accuracy ratings” into their ranking algorithms. Rather than setting a single true/false cutoff and treating everything above it one way and everything below it another, platforms would weigh the crowd-assigned score proportionally when determining how prominently a given link is featured in user feeds. In other words, the less accurate the crowd judges a story to be, the more the algorithm downranks it.
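To make the idea of proportional downranking concrete, here is one way it could look in code. This is a hypothetical sketch, not the researchers' formula or any platform's actual algorithm: the `rank_feed` function, the `engagement` score, and the `weight` parameter are all assumptions introduced for illustration.

```python
def rank_feed(items, weight=1.0):
    """Order feed items by an engagement score scaled down by crowd-judged inaccuracy.

    items: list of (link, engagement, accuracy) tuples, where accuracy is a
           crowd rating between 0.0 (judged inaccurate) and 1.0 (judged accurate).
    weight: hypothetical knob from 0.0 (ignore the crowd) to 1.0 (full penalty).
    """
    def adjusted_score(item):
        _, engagement, accuracy = item
        # The penalty grows continuously as accuracy falls; there is no cutoff.
        return engagement * (1.0 - weight * (1.0 - accuracy))

    return sorted(items, key=adjusted_score, reverse=True)


feed = [
    ("story-a", 10.0, 0.2),  # highly engaging, judged mostly inaccurate
    ("story-b", 8.0, 0.9),   # less engaging, judged accurate
    ("story-c", 9.0, 0.5),   # in between
]
print([link for link, _, _ in rank_feed(feed)])
```

With the full penalty applied, the accurate story moves ahead of the more engaging but less accurate ones. Setting `weight` to 0 reproduces a ranking based on engagement alone.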