In a recent post on BSky, Kieran Healy reminded us of the very practical realities of academics colliding with point-and-click large language models. A paper published in Surfaces and Interfaces opens with the tell-tale sign of AI-generated text: “Certainly, here is a possible introduction for your topic”. While more evidence is needed to state anything with certainty, it is a safe bet that the remainder of the article—a riveting discussion of something or other connected to battery design—is likewise the product of artificial hands. The future is here, for better or for worse (thermodynamics would tell you: probably for worse).

Who’s to blame? Certainly, LLMs have lowered the cost of producing reasonably passable text, leading to a rapid proliferation in their use. Need to write a quick end-of-the-year email for your team? Need some text for a brochure that very few will actually read? Need ideas for how to tell someone they did not get the job? Large Language Models are there to ‘help’. Where this help becomes more problematic, of course, is in areas where text matters more substantively. Sure, brochures and emails are important, but they are not necessarily sites for the production of factual, public knowledge. That’s the realm of newspapers and magazines and, similarly, scientific publications.

In all cases, the use of LLMs seems to respond to the same problem: they are a simple means of creating supply to meet an increased demand for text, whether real or perceived. In science, we have known this demand for a while. We call it “Publish or Perish”. Standards for incoming assistant professors today are considerably higher than they were two or three decades ago; on average, researchers publish more than they did a generation ago; even hyper-prolific authors are more prolific than ever (hundreds of articles per year!). In 2008, about 1.8 million scientific papers were published across the world. That number is now closer to 5 million, according to some estimates. The volume only increases, with more papers published per journal and more journals hitting the shelves, year after year.

Who’s to blame? There is nothing essential in how knowledge is made that would explain these rather large numbers. And they are large. Throughout his entire career, Niels Bohr wrote fewer than 120 papers. Albert Einstein, around 300. These are rather pitiful numbers when compared to the more than 2,300 articles published by some of the most prolific particle physicists today. Sure, scientists want to share what they discover about the world, but how much they share and where they do so isn’t prescribed in some cosmic epistemic constitution. On the contrary, and seemingly breaking some undiscovered conservation principle, more and more gets published.

As in any other social institution, the way scientists behave is modulated by distinct organizational incentives. We know this all too well: every time we are up for review in our institutions, we count the beans, whether we like it or not. I’m not entirely sure where this incentive came from. It’s not mandated by legislatures and states. Lawmakers may be encroaching increasingly on what people write about, but as far as I know they tend not to get into questions about how much academics should write. Some of it may well be self-imposed, as I’ve argued elsewhere. But wherever it comes from, it’s clearly part of a globalized habitus: quantification in various forms is a mainstay of academic life anywhere you go.

Under these incentives, we have become well-tuned Turing machines. We write. A lot. Because we are rewarded for writing. Quality matters, of course. But so does volume. Like an LLM. Individual outputs may be inadequate, false, perhaps even icky, but taken together they form a corpus that seems relatively reasonable. A considerable proportion of the science we publish is similar, at least in spirit, to these AI-produced texts. It’s not merely normal science, in Kuhn’s sense, slowly exploring the limits and re-establishing the internal logic of some particular paradigm. Connecting income and careers to volume has resulted in something we might better call “Dime a Dozen Science”. Scientific? Yes, I guess. But not entirely original. Not entirely false. Not entirely relevant. Not entirely useful, aside from adding another line to the CV.

Who’s to blame? Fraudulent authors and bad editors are certainly part of the problem. Publishers like Elsevier, which charge $2,500 to publish AI-generated text in their journal Surfaces and Interfaces, are also to blame. But when looking for culprits, let’s also recognize the problem of volume: the 5 million articles published every year, the ever-expanding CVs, and the paradoxically decreasing originality and innovativeness of science, as measured in various studies.

So long as we incentivize scholarly careers on the basis of how much people write, and so long as we build scholarly infrastructures geared towards maximizing the number and prominence of specific outputs, we will merely be reflecting in our craft much of what LLMs already do quite well. Unsurprisingly, writing like an algorithm is a wonderful way to quickly and efficiently become quite irrelevant.