AI didn’t break science. It exposed it

On May 7, 2026, The Lancet published an interesting study. A group led by Maxim Topaz of Columbia University School of Nursing developed an automated system for verifying bibliographic references and applied it to nearly two and a half million open-access papers hosted on PubMed Central, published between January 1, 2023 and February 18, 2026. They examined 125.6 million references, 97.1 million of which had an identifier that could be checked against PubMed, Crossref, OpenAlex, and Google Scholar. The result is what people are starting to call the fabrication rate: one citation in 2,828 in 2023, one in 458 in 2025, and one in 277 in the first seven weeks of 2026.
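To make the scale of the change easier to see, the "one in N" figures above can be converted into percentages and compared with the 2023 baseline. What follows is a minimal Python sketch: the names are illustrative, and the only inputs are the three figures quoted above.

```python
# "One fabricated citation in N" figures quoted from the study.
one_in = {"2023": 2828, "2025": 458, "2026 (first seven weeks)": 277}


def rate_percent(n: int) -> float:
    """Convert a 'one in n' figure into a percentage of citations."""
    return 100.0 / n


baseline = one_in["2023"]
for period, n in one_in.items():
    # Fold increase relative to 2023: (1/n) / (1/baseline) = baseline / n
    fold = baseline / n
    print(f"{period}: {rate_percent(n):.3f}% of citations, {fold:.1f}x the 2023 rate")
```

On these figures, the rate rose roughly sixfold between 2023 and 2025, and roughly tenfold by early 2026.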

The temptation, when faced with this figure, is to argue that AI is ruining scientific literature, and I imagine this is how many newspapers will soon frame it. But that is a mistaken and ineffective reading. What the study shows, if one reads it with even a minimum of intellectual honesty, is much more uncomfortable.

To identify false citations, Topaz’s group used artificial intelligence. They used Claude Haiku, Anthropic’s lighter-weight model, to scan two and a half million papers. AI was used in particular to distinguish fabricated references from simple formatting discrepancies, such as informally abbreviated titles or typographical errors.
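The distinction between a fabricated reference and a mere formatting discrepancy can be illustrated with a small sketch. This is not the study's method, which relied on a language model: the code below swaps in plain string similarity from Python's standard library, and `KNOWN_TITLES`, `classify_reference`, the two thresholds, and the placeholder titles are all assumptions invented for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical stand-in for a bibliographic index; the titles are made up.
KNOWN_TITLES = [
    "Effects of example intervention on sample outcomes: a randomized trial",
    "A survey of placeholder methods for illustrative purposes",
]


def normalize(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def classify_reference(title: str, exact: float = 0.97, near: float = 0.75) -> str:
    """Return 'verified', 'formatting discrepancy', or 'not found'.

    The thresholds are arbitrary illustrative values, not calibrated ones.
    """
    target = normalize(title)
    best = max(
        (SequenceMatcher(None, target, normalize(known)).ratio() for known in KNOWN_TITLES),
        default=0.0,
    )
    if best >= exact:
        return "verified"
    if best >= near:
        return "formatting discrepancy"
    return "not found"


print(classify_reference(KNOWN_TITLES[0]))
print(classify_reference("Effects of example intervention on sample outcomes"))
print(classify_reference("Quantum gravity in zebrafish kitchens"))
```

The first title matches the index exactly, the second is an informally shortened version of a real entry, and the third matches nothing. Only the third would be flagged for a human to look at. String similarity alone would misjudge many real cases, which is presumably why a language model was a better fit for the job.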

The tool accused of polluting scientific literature is exactly the same tool that made it possible to measure that pollution. So why is it not being used by the people writing papers to run the same checks? The best-performing systems are, moreover, far more effective than the one used by the researchers.

Language models in 2026 hallucinate much less than those of 2023. So the rate of fabricated citations per query is falling, even as the rate of fabricated citations per paper rises. If errors per use are decreasing but the total number of errors is increasing, then use of these tools in academia must be growing much faster than their quality and than awareness of how to use them. And the errors are avoidable: if you ask a recent model to check a citation it has just produced, in most cases it notices the problem and corrects it. Humanity: still undefeated in finding the laziest possible workflow.
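The reasoning about errors per use versus total errors can be made concrete with a toy calculation. Every number below is invented purely for illustration and none comes from the study. The point is only that a falling error rate multiplied by a much larger volume still yields more errors.

```python
# All figures are hypothetical, chosen only to illustrate the argument.
def fabricated_total(generated_citations: int, one_error_in: int) -> int:
    """Expected fabricated citations: volume divided by the 'one in N' error rate."""
    return generated_citations // one_error_in


# Earlier: few users, weaker model (one fabricated citation in 10).
earlier = fabricated_total(generated_citations=100_000, one_error_in=10)

# Later: many more users, better model (one fabricated citation in 50).
later = fabricated_total(generated_citations=2_000_000, one_error_in=50)

print(earlier, later)
```

In this invented scenario the model becomes five times more reliable, usage grows twentyfold, and the total number of fabricated citations quadruples.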

Mohammad Hosseini, a researcher at Northwestern University, puts this well in the STAT News interview that accompanied the publication of one of his studies. The presence of hallucinated citations, Hosseini argues, suggests that there are people who do not even want to spend half an hour checking the references in a paper. This rush to publish indicates that the academic evaluation model is flawed and places too much emphasis on peer-reviewed publications: the famous publish or perish.

I find it hard to believe that the researchers who today publish papers with dozens of fabricated references were, two or three years ago, scrupulous bibliographic fact-checkers. AI has provided a faster tool to people who were already working badly, and has exposed, almost grotesquely, a practice that already existed in various forms.

A meta-analysis published in 2015 by Hannah Jergas and Christopher Baethge in the journal PeerJ, which examined twenty-eight studies and more than seven thousand references in medical literature between 1985 and 2013, estimated a total citation error rate of 25.4%: one in four. Of these, about half were “major” errors, meaning that the cited source did not support the authors’ claim at all. There is also a complementary observation, cited in the same work: an analysis of the physics literature by Simkin and Roychowdhury estimated that between 70 and 90% of citations are copied from other people’s bibliographies without the authors having actually read the papers they cite. The figure is inferential, derived from the propagation of typographical errors in bibliographies, but robust enough to have been taken up in numerous subsequent studies. Put differently, the phenomenon measured by The Lancet has deep roots in academic practice.

How often is a citation truly important, and how often is it an act of courtesy, bibliographic padding, or a form of social positioning? The crisis of fabricated citations is an opportunity to recognize that many of the citations we place in articles have never performed the evidentiary function that academic rhetoric attributes to them. They are added instead to signal that something has been read, to thank a colleague, to avoid a vindictive reviewer, or to bulk up a bibliography.

NeurIPS, the world’s leading machine learning conference, received 21,575 submissions in 2025, compared with around 18,000 in 2024 and fewer than ten thousand in 2020. The acceptance rate was about 24.5%. Translation: more than five thousand papers accepted in 2025 at that conference alone, each with an average of ten or fifteen authors, each reviewed by at least three experts. The global editorial apparatus is drowning in a volume of submissions that is growing exponentially, while the pool of competent reviewers and editors is growing, at best, linearly.

When GPTZero analyzed the papers accepted at NeurIPS 2025, it found at least one hundred hallucinated citations across fifty-three papers, each reviewed by at least three experts. If not even a top-tier conference, with its double-blind review system and all the prestige attached to it, can catch fabricated citations, it is reasonable to suspect that the problem does not lie with individual reviewers, many of whom, I imagine, did conscientious work, but with the scale of the task they are being asked to perform.

Publish or perish is an incentive structure that measures academic value in quantitative terms: number of publications, h-index, impact factor, presence at prestigious conferences. A young researcher who wants to secure a stable position must, on average, produce far more than their predecessors did at the same age, and must do so in less time, with fewer resources, and in an academic job market that has often shrunk. The logical consequence is that the attention devoted to each individual paper decreases. If all language models disappeared tomorrow, publish or perish would continue to generate mediocre papers. Without AI, it would simply do so more slowly and with different forms of sloppiness.

This is where what seems to me the most interesting recent proposal on the topic enters the scene. Luciano Floridi, in a paper published on SSRN on April 28, 2026 and destined for Philosophy & Technology, the journal he edits, proposes a complete restructuring of the scientific editorial apparatus. The title is already a declaration of intent: The Editor’s Signature: A Proposal for AI-Born Journals.

The distinctive function of a scientific journal, Floridi argues, is the editorial signature on a credible claim: the act by which one or more responsible editors, embedded in a research community and supported by an institution, accept responsibility for what they publish under conditions of evidentiary uncertainty. Everything else, formatting, forms, integrity checks, reviewer matching, bibliographic conversion, is infrastructure. And AI, Floridi argues, is the first technology since the birth of the modern editorial system capable of absorbing this infrastructure without ruining the signature.

From this premise, Floridi derives a distinction between two categories of journals. AI-assisted journals are those that add AI tools to a workflow designed for a pre-AI era. AI-born journals are those designed from the outset around what AI agents can reliably do, and around the distinction between what is procedural, and therefore automatable, and what is substantive, and therefore irreducibly human.

Among the necessary features of an AI-born journal, Floridi lists an integrity agent that performs, before peer review, a series of automated checks that must be non-blocking, meaning that they produce a report for the editor without triggering an automatic rejection. The decision remains human, but the mechanical work that precedes it is absorbed by the machine.

This is exactly what was missing in the case of NeurIPS. An automated citation check against a database would have caught the one hundred hallucinated references before human reviewers ever saw them. It would not have replaced their judgment, but it would have relieved them of a mechanical and extremely boring task that human beings are incapable of performing at scale. The authors of the Lancet paper themselves argue that automated reference-verification tools already exist and could be incorporated into journal submission workflows before peer review.
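What "non-blocking" means in practice can be sketched in a few lines. This is a minimal illustration, not Floridi's design or any journal's actual system: `IntegrityReport`, `run_integrity_checks`, the `exists` lookup, and the identifiers are all hypothetical. The check records what it could not verify and hands a report to the editor. It never returns an accept or reject decision.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class IntegrityReport:
    """Findings for the editor; deliberately carries no accept/reject field."""
    checked: int = 0
    unverified: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)


def run_integrity_checks(references: List[str], exists: Callable[[str], bool]) -> IntegrityReport:
    """Check each reference with the supplied lookup and collect the findings."""
    report = IntegrityReport()
    for ref in references:
        report.checked += 1
        try:
            if not exists(ref):
                report.unverified.append(ref)
        except Exception as exc:
            # A failed lookup is noted for the editor, never turned into a rejection.
            report.errors.append(f"{ref}: {exc}")
    return report


# Made-up identifiers and an in-memory set standing in for a real database.
known = {"10.1000/example.1"}
report = run_integrity_checks(
    ["10.1000/example.1", "10.1000/example.404"],
    exists=lambda identifier: identifier in known,
)
print(report)
```

In a real workflow, the `exists` callable would query a bibliographic database. The design choice that matters is that the function's only output is the report, which leaves the decision with the editor.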

It would be too easy, however, to conclude that all we need is to add a citation-verification agent and the problem is solved. Automated verification can catch the most blatant cases, but it still cannot easily distinguish between a relevant citation and a courtesy citation, between a reference that genuinely supports a claim and one that has been slipped into the text for reasons that have little to do with its evidentiary function.

One of the reasons fabricated citations proliferate, moreover, is that most users of generative AI do not know how the tools they use actually work. Without basic training, we will continue to have researchers who use AI imprecisely, and the stigma around the tool does not help, since it pushes people to use it in secret.

A verification pipeline like Topaz’s, applied systematically at the point of article submission, could drastically reduce the fabrication rate. An integrity agent like the one imagined by Floridi, integrated into the editorial architecture, could relieve human reviewers of a task they cannot perform. For this to happen, we need to stop treating AI as either an enemy or a shortcut and start treating it as what it is: a powerful, fallible, and very useful tool that requires training and supervision.

Francesco D’Isa