The COVID-19 pandemic has forever altered our lives. But even in the face of unprecedented disruption and tragedy, the scientific community is experiencing a foundational, and hopeful, shift. As researchers race to better understand the virus, they’re disseminating their findings more rapidly, in greater quantities, and more publicly than ever before. By prompting a mad dash for knowledge, COVID-19 has placed scientific inquiry firmly in the public domain, and expedited the movement toward open science.
In large part, this shift has been due to the rise of preprints: scientific manuscripts that are made available in advance of their formal peer review and publication. As the coronavirus death toll surpasses 625,000 worldwide, preprints are accelerating scientific discovery at a time when it’s needed most. Whereas the traditional publishing process can easily take months or years to complete, researchers can submit a preprint and have it posted within days. And while scientific journals often sit behind paywalls, preprints are being made freely accessible to the public and researchers alike.
As popular as they’ve become recently, preprints are not a new tool. In the 1960s, the National Institutes of Health ran an experiment circulating preprints so scientists could share information more quickly. The rise of technology has only made this practice more common. Mathematicians, physicists, and computer scientists use preprints through arXiv, a server launched in 1991—the project that pioneered preprints in the digital era. BioRxiv, a server focused on the biological sciences, was founded in 2013, and medRxiv, a server focused on health and medicine, launched in 2019.
In March, as the World Health Organization declared COVID-19 a pandemic, 8,830 biomedical preprints were published, a 142% increase from last year. Over the past few months, approximately half of all available scientific work on COVID-19 has been published through preprint servers, amounting to more than 18,000 preprints as of July 2020. Traffic to these servers has jumped substantially too. MedRxiv’s page views have spiked to 15 million a month, compared to 1 million a month before the pandemic began.
Compare this to the 2014 Ebola and 2015 Zika outbreaks, when only 5% of published papers were shared in the form of a preprint. That was a missed opportunity, because those preprints, which usually appeared around 100 days before the journal articles later published on the same topic, had a better chance of making an impact. Or consider the 2003 SARS epidemic: a recent analysis found that 93% of the papers about the spread of SARS in Hong Kong and Toronto were published after the epidemic period had ended.
Too often, discussions about preprints cast them as a less polished, even rushed or rash, version of a scientific manuscript. But preprints are something entirely different, and they are laying the groundwork for a new model of scientific communication. They break disciplinary barriers, encourage collaboration, speed up dissemination, and dramatically broaden the audience of scientists who can review, critique, or corroborate a study’s findings, all in real time.
The pandemic has shone a light on how the scientific community can self-organize and make the critique and evaluation of new results more scalable. The traditional peer-review process is opaque, and errors that slip through after a manuscript is reviewed by only a handful of reviewers can take months or years to be identified, corrected, or retracted. In contrast, when one preprint’s speculation on the sequence of SARS-CoV-2 fueled conspiracy theories that the virus was man-made, a host of scientists swiftly and publicly pointed out errors in the study’s methods. Within 48 hours, the authors had formally withdrawn their findings. The coronavirus pandemic has demonstrated that preprints shared in the open, paired with tools and platforms for critiquing and analyzing their results, can lead to a much faster and more robust quality-control process that involves dozens or hundreds of scientists. This development has strong similarities with the trajectory of collaborative knowledge production systems such as Wikipedia and open-source code repositories, and mimics the nonlinear, iterative structure of scientific discourse.
Rapid, community-driven open peer review of results shared through preprints has been the norm, not the exception, during the first months of the pandemic, and researchers worldwide have hailed it as a model for science in general. Combined with social media platforms and the convergence of attention of many scientists in the biomedical community, preprints have become the most significant experiment in large-scale collaboration in modern science. Flawed results are quickly identified and withdrawn; methods are discussed, reused, and repurposed for similar studies; incomplete manuscripts are revised and improved through collective feedback; new research questions are generated in real time. This kind of self-regulation is a clear benefit of opening up academic findings to a broader community of scientists.
However, these benefits do not come for free. While preprints are on the rise, they still represent a fraction of biomedical literature, and the incentives for scientists to contribute preprints and engage in open peer review are not always as clear as they are during the current pandemic emergency. Preprint servers are also struggling to come up with effective strategies for being transparent about the preliminary status of their results. MedRxiv features a banner on its website emphasizing that preprints aren’t meant to guide clinical practice or to be reported as fact in the news.
Additionally, some preprint servers have screening processes in place, where submissions are reviewed by subject matter experts, checked for plagiarism, and rejected if they make a health claim that might change human behavior—for example, by asserting that doing something might increase one’s risk of infection.
Of course, these screening processes and notices must be matched by efforts to educate the public about what preprints are and how to use them. New best practices need to be developed to help reporters evaluate what they find in preprints and report on them responsibly. And so do new support networks. Preprint advocates have called, for example, for the creation of rapid-response review venues to connect reporters with independent scientists and offer on-demand, expert perspectives on new preprints of interest. This kind of collaboration, similar to past experiments in expert engagement, could play out through any number of platforms and partnerships, and ensure that preprint findings are reported accurately and within context.
More broadly, this shift requires us all to think beyond formal papers and manuscripts as the only units of scientific knowledge, and to be more critical consumers of research. Right now, the production, vetting, and editing of a manuscript still happen behind closed doors, locked inside journals’ editing and screening processes. We envision a world in which all of that takes place out in the open, and a marketplace of tools, services, and community-driven initiatives helps make science better by engaging more people to participate in the process.
In this world, we might have an off-the-shelf service that translates a public scientific paper for a general audience. We could have a community of scientists to help ensure that gene-variant names in another study aren’t misspelled. There could be new technology that would help analyze the connection between results, methods, data, and resources. And, for sure, there would be more novel collaborations that bring important breakthroughs to the forefront when they’re needed most. These “overlay services” built on top of preprints to deliver value to scientists are what made other examples of online collaboration and peer production successful.
At the Chan Zuckerberg Initiative, we’ve been thinking about these possibilities, and about how preprints might pave the way for a more democratic and rigorous model of science. That’s why we support the preprint servers bioRxiv and medRxiv, and organizations such as ASAPbio that promote the uptake of preprints. It’s why we’re one of a growing number of science organizations that make funding contingent on whether or not grantees share their results on preprint servers. And it’s why we’re working with a number of partners, and on the lookout for more, to determine what the future of preprints might look like.
We believe that the recent popularity of preprints is just the beginning, and that, with continued work, it will point us to a better model of open science: one that fuels greater collaboration, accelerates scientific progress, and allows us all, scientist or not, to have the full force of scientific research at our fingertips.
Dario Taraborelli is the Science Program Officer for Open Science at the Chan Zuckerberg Initiative. Dario is a social computing researcher and an open knowledge advocate. As the Science Program Officer for Open Science at CZI, his goal is to build programs and technology to support open, reproducible, and accessible research. Prior to joining CZI, he served as the director, head of research at the Wikimedia Foundation, the nonprofit that operates Wikipedia and its sister projects. As a coauthor of the “Altmetrics Manifesto,” a cofounder of the Initiative for Open Citations, and a long-standing open access advocate, he has been designing systems and programs to accelerate the discoverability and reuse of scientific knowledge by scholars, policy makers, and the general public alike.