The peer-reviewed journal article, perhaps the single most important device behind the expansion of scientific knowledge in the last century, is an extraordinarily expensive way to share novelty: expensive in time, in labor, in infrastructure, and in the cognitive resources of the people who write, review, edit, and read it. No wonder it serves as the central currency of academic life: what you need to produce to get hired, promoted, funded, and recognized.

The article’s dominance is inseparable from the publish-or-perish culture that took hold in most research institutions and evaluation agencies over the last 60 or 70 years. As institutions scaled up, as grant funding became more competitive, and as researchers came to be evaluated in ever more quantified ways, productivity (often measured through blunt instruments like raw publication counts and journal impact factors) became the dominant metric of scholarly worth. The consequences are visible in the data: the average number of publications per researcher per year has more than doubled over the past fifty years. Assistant professor job listings now routinely carry publication expectations that would have looked like a mid-career record a generation ago (I’m being generous here). We have built an elaborate institutional machine for producing papers, and the machine has been running hot.
The oil that lubricates this machine is almost entirely our (uncompensated) labor. Peer review, the epistemic backbone of scientific communication today, is performed by researchers who receive nothing for their time beyond the vague promise of reciprocity and the diffuse good of a functioning knowledge commons. This arrangement suits a very particular, almost parasitic set of actors that has grown alongside modern science: the for-profit publishers who control much of the publishing infrastructure. Scientific publishing is a remarkably lucrative business: Elsevier routinely reports profit margins above 30%, financed largely by the unpaid labor of the same academics whose institutions then pay eye-watering subscription fees to access the results (sure, the lucky ones are “compensated” through our salaries; but our salaries are not paid, in any way or form, out of Elsevier’s 30% profit margins). “Publish or perish” means something quite different for the executives of these global corporations: there is simply no incentive in this structure to reduce volume, or to stem the stream of submissions that relentlessly reaches the desks of editors and reviewers. More submissions means more journals, more journals means more prestige differentiation, more prestige differentiation means more institutional subscriptions, and the cycle continues.
The reviewer pool, meanwhile, has not grown at anything like the rate of the ever-increasing volume of new papers. The system is stretched, and has been for some time. Response times lengthen. Quality declines. Desk rejections proliferate (and, paradoxically, remain insufficient) as editorial bandwidth runs thin. In 2020 alone, reviewers spent an estimated 100 million hours on peer review, worth approximately $1.5 billion in time for US-based reviewers alone, all of it contributed for free and all of it still chronically insufficient for the demand. This is the context into which AI arrived, not as a solution but as dry wood thrown on the fire, with some jet fuel sprinkled on top.
Per Engzell and Nathan Wilmers’s The Paper Factory elicits much reflection, not because it resolves any of these tensions but because it makes them concrete. Engzell and Wilmers built a multi-agent LLM workflow capable of producing a full quantitative social science paper from an initial prompt. Over a week, the system generated 34 papers. In their assessment, many of these papers would be sent out for review, though none was immediately publishable. The papers contained empirical findings of the kind that “build up social science” but fell short on coherence, framing, and the exercise of judgment under competing criteria.
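To make the idea concrete, here is a minimal sketch of what a multi-agent pipeline of this general shape might look like. To be clear, this is my own illustration, not Engzell and Wilmers’s actual system: the stage names, prompts, and control flow are assumptions, and the LLM client is abstracted as a plain prompt-to-text function so that no particular provider’s API is implied.

```python
# A minimal sketch of a multi-agent "paper factory" pipeline. NOT the
# authors' system: stage names, prompts, and control flow are my own
# illustrative assumptions.
from dataclasses import dataclass, field
from typing import Callable

LLM = Callable[[str], str]  # any function mapping a prompt to a completion

@dataclass
class Draft:
    idea: str
    analysis_plan: str = ""
    manuscript: str = ""
    reviews: list[str] = field(default_factory=list)

def run_pipeline(llm: LLM, initial_prompt: str, review_rounds: int = 2) -> Draft:
    """Chain role-conditioned 'agents' from seed prompt to revised manuscript."""
    # Agent 1: ideation -- turn the seed prompt into a testable question.
    draft = Draft(idea=llm(
        f"As a social scientist, propose one testable research question: {initial_prompt}"))
    # Agent 2: methods -- design the quantitative analysis.
    draft.analysis_plan = llm(
        f"As a methodologist, design a quantitative analysis for: {draft.idea}")
    # Agent 3: writing -- draft the manuscript. (A real system would run
    # actual analysis code on real data at this step; the sketch only
    # shows the agent-chaining control flow.)
    draft.manuscript = llm(
        "As an academic writer, draft a full paper.\n"
        f"Question: {draft.idea}\nAnalysis plan: {draft.analysis_plan}")
    # Agent 4: internal referee -- critique, then have the writer revise.
    for _ in range(review_rounds):
        review = llm(f"As a sceptical referee, critique this draft:\n{draft.manuscript}")
        draft.reviews.append(review)
        draft.manuscript = llm(
            "Revise the draft to address the review.\n"
            f"Draft:\n{draft.manuscript}\nReview:\n{review}")
    return draft
```

Run a few dozen of these in a loop and you have, mechanically speaking, a week at the factory.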
Engzell and Wilmers’s system is presented as a contribution to research practice: lower the cost of execution and you shift effort toward what actually matters, namely problem selection, ambition, interpretation, and the kind of human judgment that is difficult to automate (at this time; who knows what may come). The authors even suggest that the paper factory might push science toward more ambitious questions, since the grind and sunk costs of the “procedural labor” of data management and project setup are what often induce self-censorship in research design. If you can test a risky idea cheaply, you’re more likely to try.
This argument makes sense, and multi-agent LLMs may indeed lead to more efficient (and hence lower-barrier) research workflows. But this view is somewhat optimistic about the incentive structure that would be needed for such a shift to actually occur. The paper factory doesn’t break publish-or-perish; it dramatically lowers the cost of meeting ever-increasing expectations of productivity. And the output it produces, while empirically competent and moderately interesting, is more or less exactly what the current monetized institutional machinery rewards. The concern isn’t that AI will produce obviously bad papers. It’s that it will produce, at scale and at speed, the kind of papers that are already crowding the journals: technically adequate, marginally novel, epistemically forgettable, those “modal” papers that gather a handful of citations over their lifetimes. Engzell and Wilmers acknowledge this directly: the most immediate risk is “mediocrity at scale,” a flood of work that isn’t wrong enough to dismiss but isn’t interesting enough to advance anything. SocArXiv recently had to institute an emergency moratorium on AI-topic papers and new identity-verification requirements for submitting authors, citing record submission rates and a large volume of what it called, pointedly, “AI slop.” Agentic AI may already be straining much of the competitive funding system. The flood is not coming; it’s here, and we are knee-deep in rising waters.
This matters doubly in light of what the metascience literature has been telling us about the trajectory of research quality independent of AI. Park, Leahey, and Funk’s Nature paper documented a clear and consistent decline in the disruptiveness of scientific and technological output: papers are increasingly less likely to break with the past in ways that push science in new directions, a pattern that holds across fields. A separate recent analysis in Science adds a neat demographic dimension: the probability of a researcher producing highly disruptive work decreases with academic age, and as the scientific workforce ages, the entire system is shifting toward consolidation of existing ideas rather than disruption of them. The workforce producing the papers is increasingly generating mountains of normal science: competent, careful, incremental, but rarely surprising. I wonder when this mountain will become the foundation of Lakatosian “degenerating” research programs in various domains. We may soon find out!
The defunding of the US research enterprise (which is currently being dismantled at a speed that is difficult to track in real time) makes these structural dynamics all the more worrying. Less funding means fewer new investigators, a more conservative risk profile across research programs, and a more demographically senior field, all of which compound the age-and-disruption dynamics just described. The paper factory arrives at the worst possible moment: a tool optimized for the efficient production of incremental work, deployed into a system that was already producing too much incremental work relative to the attention and review resources available to evaluate it.
Put these things together and you have a picture that does not flatter the peer-reviewed journal article as an institution. We have a publishing system optimized for volume, financed by uncompensated labor, controlled by profit-seeking entities, producing increasingly incremental work, and now facing an AI-driven capacity to generate competent manuscripts at near-zero marginal cost, just as the broader research enterprise comes under severe fiscal constraint. What exactly is this enormous expenditure of collective intelligence through peer review buying us? What is it safeguarding?
The honest answer is: not as much as we tell ourselves. Indeed, in its current form the peer-reviewed article may simply have outlived its primary function as a mechanism for certifying and communicating novel scientific advances.
This obviously doesn’t mean peer review is worthless: to a considerable degree, it’s central to the way we structure communities around knowledge, practices, puzzles, and instruments. Yet this fact should not stop us from rethinking the architecture of the system rather than continuing to patch a structure designed for a different era of science.
What might that look like?
One path is a revaluation and valorization of editorial labor. Journals, whether for-profit or not, need to either compensate reviewers or dramatically reduce the volume they process. Engzell and Wilmers suggest that as AI-generated papers become more technically polished and better organized, the editorial burden may actually shift from assessing whether something has the superficial look of publishable work to engaging directly with its substance. That’s a reasonable hope. But it requires editorial infrastructure, and editorial time, that most journals don’t and can’t currently invest in, because doing so eats into Elsevier/Springer et al.’s profits. It could also imply a deeper shift from volume processing to curation, with editors taking a much more central role: operating with fewer rounds of review and assuming direct command over their journals and their contents.
A second is a more serious and systematic use of open repositories, not as pre-publication venues but as primary publication destinations for findings that are real and reproducible but, let us be honest, not necessarily paradigm-shifting. Although this might terrify some, in some fields it is already common practice for advances to be discussed years before their formal peer-reviewed publication, as preprints that make the rounds and spark conversation. Not everything that gets communicated needs to pass through the multiple rounds of review reserved for work thought to genuinely move a field. Indeed, the infrastructure for this tiered communication of scientific results is already here; what’s lacking are the incentive structures to support it.
A third, and the most structurally consequential, is a change in how academic institutions evaluate researchers. Hiring, promotion, and review processes that count papers rather than assessing demonstrable intellectual contributions to a field are part of how we got here. The same goes for funders that emphasize numbers over quality. The paper factory makes this particularly urgent. If the execution of a competent empirical study can be automated, then the execution of competent empirical studies is not, and probably never was, the thing we should have been rewarding (not least because productivity is a proxy for resource access, and there’s a guy called Matthew, with an effect named after him, who might have a thought or two about what this says about quality versus rewards).
The AI paper factory is not going to wait for institutional reform. What will probably happen, in the near term, is that the existing system becomes increasingly strained, contested, and inefficient. Status signals (which are also some of the best signals of quality and innovation we have) may further strengthen their role in our evaluations. And the science that is out there will be noisier, more uncertain, and concerningly less legitimate in the public’s eyes. Sometimes institutions have to fail visibly before change becomes possible, and failure is what we may see very soon. The peer-reviewed article has had a good run. Whether what replaces it is better depends almost entirely on whether we use this moment of disruption to actually think about what scientific communication is for. Just something more to work on during this polycrisis.