The Most Famous Metaphor About AI Is No Longer Valid

Five years have passed since the publication of "On the Dangers of Stochastic Parrots," and Emily Bender has returned to the phrase she coined with a post titled "Stochastic Parrots: Frequently Unasked Questions." The text presents itself as a philological clarification, a kind of public service for an audience that has, in the meantime, turned the expression into a slogan for a certain kind of critique of artificial intelligence.

Reading it calmly, however, I have the impression that it is not exactly a clarification; it reads rather like an exercise in rhetorical maintenance. It is worth studying, not in order to criticize Bender, some of whose positions I find solid and defensible, but because it shows how a phrase born in an academic paper can become difficult to control even for the person who coined it.

Bender reiterates that the target of her critique was never language models as such but rather the industrial practices surrounding them (although, as we shall see, the distinction does not always hold): the exploitation of labor in annotation and reinforcement learning, the systemic opacity of datasets, the environmental impact. It is a position I share and one that, on the political level, remains one of the most rigorous contributions to the debate. I think it is right to acknowledge this, because what follows is meant to be an analysis of the distance between this defensible core and the rest.

The central move of the post is a reclassification. Bender argues that "stochastic parrots" was never an empirical hypothesis but a metaphor: "an attempt to make vivid what these systems do." From this she concludes that the phrase is not falsifiable and that all objections of the kind "this no longer applies in 2026" are therefore badly framed, because they apply an empirical criterion to a rhetorical device. Paraphrasing Giorgio Gilestro, we might call this move an unfalsifiable retreat. A thesis that functioned as a description of how a system works is removed from the domain of true and false and relocated to the more protected territory of the rhetorical figure. It preserves its evocative power but loses its exposure to refutation. The move works; the problem is that the person making it continues to draw on the descriptive force of the phrase as well.

Immediately after saying that "stochastic parrots" is only a figurative image, Bender reapplies it as though it were a description. She states that the stochastic parrots framing is still "extremely relevant" even for multimodal models, and that we need to maintain a clear-eyed view of how these systems actually work. But "actually work" is an empirical expression, not a metaphorical one. When I say that a pump actually works like a valve, I am making a claim about its mechanism, not offering a metaphor. The result is a constant oscillation between two mutually supporting positions: a minimal, defensive one, according to which it is only a metaphor, and another according to which the metaphor accurately describes what these systems do.

This movement between description and metaphor is well exemplified by a passage in her text: "stochastic parrots (in my writing at least) isn't an argument. It's a description or a metaphor."

But if it is a description, and therefore falsifiable, it cannot also be a metaphor, and therefore unfalsifiable; and vice versa.

The argumentative logic this oscillation recalls is what scholars of public controversies call the motte and bailey, after Nicholas Shackel's famous 2005 essay. The motte is the keep, the reduced but unassailable defensive position to which one retreats when under siege. The bailey is the surrounding courtyard, where one lives and operates when the defenses are not under threat. Anyone using this logic oscillates between the two positions: when attacking, they occupy the bailey; when attacked, they fall back into the motte; and by quickly returning to the bailey as soon as the pressure eases, they force the interlocutor to fight on ground that keeps shifting under their feet.

Bender’s post can be read, in several decisive passages, as a sequence of transitions between these two levels. When the objection arises that the phrase no longer describes current systems, there is a retreat into the motte: it is only a metaphor, not an empirical hypothesis. When it is time to maintain polemical pressure on the models, she moves out into the bailey: it continues to be relevant, and it is important not to lose sight of how they actually work. The movement is so fluid that a good-faith reader may end up not noticing it.

In fairness, credit is due on the one point where Bender acknowledges a real complication: multimodal models. She recognizes, in a significant concession, that text-image systems "could be argued to meet the Bender & Koller 2020 definition of understanding, though in an extremely thin way." I would not call this a capitulation; on the contrary, it seems consistent with her 2020 definition, in which understanding required an anchoring in something outside language, and in which systems trained only on linguistic form could not possess it precisely because of how they are built. Multimodal models do have that outside, at least in part: they see images, associate them with linguistic strings, and build mappings that cross the boundary between what is inside and what is outside language. Bender is rigorously applying her own theory, not renouncing it. The interesting point, if anything, is what this application implies for the debate. If understanding is a property that admits degrees and depends on the type of grounding available, then the discussion is no longer binary (do they understand or not?) but quantitative: to what extent, of what kind, and with what practical consequences. That is exactly the terrain the slogan had, over time, served to avoid.

There are two other passages in the post that deserve close attention, because they illuminate its overall rhetorical structure. The first concerns Bender's defense against the accusation that she coined an insult. The response proceeds in three stages. First stage: models cannot be offended, so technically it is not an insult. Second stage: one can still insult a product, but the target of the critique was not the product itself; it was the industry that produces it. Third stage, presented as a neutral philological clarification: the English verb "to parrot" means "to repeat without understanding," and this is the semantic nuance the phrase is meant to activate. The problem is that this final step reformulates the expression into a proposition that, when applied to an artifact sold on the market as cognitive technology, rightly or wrongly, constitutes by definition a devaluing judgment. The defense against the accusation of having coined an insult consists, in effect, in reiterating that the phrase means exactly the thing that, in the public context in which it circulates, sounds like an insult. They repeat without understanding, then; and this is precisely the substance of the metaphorical field that Bender has built around the phrase over five years of public interventions, from the "synthetic text extruding machines" of the 2024 UCLA conference to the Magic 8 Balls to which she compared AI that same year, all the way to the papier-mâché that returns in the book with Alex Hanna and in the very post I am reading. The same holds for her claim that "what is currently being developed as 'AI' does not work, nor is it helpful, for an overwhelmingly large portion of people living on the earth today, especially people in the Majority World."

These are not insults directed at the models, obviously; they are descriptions that Bender repeatedly proposes and that, in the public context in which they circulate, amount to a devaluing judgment.

Yet in some parts of the text she seems to want to circumscribe that very judgment. In the section devoted to "just," Bender conducts a small lexical analysis: the adverb "just" evokes a scale, a position on a hierarchy of capacities, and those who attribute this move to her misunderstand her phrase because, she argues, she is not measuring anything:

“I am not invested in the project of ‘AI’, do not see it as a goal that is worthwhile (nor feasible) to work towards, and am not measuring large language models against some scale of progress towards that goal.”

The reasoning has its own internal coherence: if you refuse to participate in the project, you cannot be accused of placing an object incorrectly on a scale you do not recognize. The problem is that this position is difficult to reconcile with the way Bender has described models, in terms that imply a very precise judgment of capacity: "bullshit machines," "nothing more than souped-up autocomplete," "garbage in, garbage out." These are assessments of how the models themselves function, and they work rhetorically because they evoke exactly the scale that Bender now claims not to be using. To claim that one is not measuring anything on any hierarchy, after years of expressions that place these systems very low on any hierarchy of cognitive capacity, is a move that requires an explanation the post does not provide.

The second passage worth noting is the response to the objection "it was true in 2021, it is no longer true in 2026." Bender classifies it as a recurring tic of hype supporters, who, with every new model, renew the announcement of an epochal breakthrough, the "real AI" that has finally arrived. As a sociological observation, this is apt; I too have witnessed many of these advertising-style proclamations, and I expect more. As an argument, however, it is insufficient, because it merges two things that should be kept separate. One is Big Tech's promotional media cycle; the other, entirely different, is the fact that the measurable capabilities of 2026 models bear no comparison to those of 2020, and that certain tasks that were impossible five years ago are now routine. Treating the second observation as a variant of the first erases the difference between advertising and facts. On this point, in my view, the post leaves the question unanswered, and it is here that the clarification shows its limit. It seems difficult to maintain today that a multimodal model with extended reasoning capabilities is adequately described by the image of a parrot rearranging pieces of language without understanding them; certainly not because it is conscious, but because the phrase is too coarse for what it tries to describe.

So what remains after this close reading? What remains is the critique of power, data, labor, and the environment: the part of the project that Bender herself identifies as the heart of her work, and which is indeed the most solid. Much less remains of the semantic thesis in its strong form; today that thesis is protected by a metaphorical shield that makes it invulnerable and almost inert, because anything that is at once unassailable and unverifiable ends up not meaning very much.

There is, however, a constructive conclusion to draw about the current state of public criticism of artificial intelligence. Bender's post is valuable also because it signals that the time has come to abandon the absolutist posture. Two things can be held together, and we must learn to do so without treating them as contradictory: current systems raise social, economic, and environmental problems, and they are at the same time powerful, useful, and in many cases extraordinary tools. The opposite fiction, according to which something that is a product of capitalism cannot also be a major technical achievement, is a shortcut that no longer holds. Everything in this system is a product of capitalism; so is the press that disseminates critiques of capitalism, so is electricity, and so, in part, is academia itself, whose role in the birth of LLMs is as important as that of tech companies. A mature critique mitigates, regulates, redistributes, and reappropriates; it does not pretend that the tool does not work in order to avoid engaging with a nuanced judgment, because in the long run that fiction weakens the critique precisely where it is valid.

Bender's parrots have flown far enough that critics can no longer hit them, but also far enough that we hear their voice less and less. We can now turn to the problems that have remained unchanged, combining enthusiasm for a new cognitive technology with criticism of the social context and of the way in which it is being disseminated.

Francesco D’Isa