OpenAI has put in writing a paradox that accompanies the race toward artificial intelligence: to push innovation even further, one must also be prepared to stop it. In a job posting for a “Head of Preparedness,” the company offers a base salary of up to $555,000 a year, plus equity, to lead pre-release testing and evaluation of models, mitigate risks, and contain potential harms.
The news comes as public debate continues to swing between enthusiasm and fear, and as more voices once again invoke an extreme scenario: a system so advanced that it slips beyond human control. To understand why this announcement matters more than the size of the paycheck, it is enough to look at what OpenAI says it aims to prevent: risks in areas such as biology, cybersecurity, operational autonomy, and, looking ahead, self-improvement capabilities. The problem, however, is that catastrophic risk does not arrive as a single box to tick: it is the combination of emerging capabilities, hostile uses, economic incentives, and indirect consequences.
OpenAI says it has a compass, its own Preparedness Framework (updated in 2025), which classifies risks and sets thresholds beyond which a model should not be deployed without mitigations. Yet even the best framework remains voluntary: it is worth only as much as the intentions and independence of those who apply it. And this is where the issue shifts from engineering to governance: who truly decides when to hit the brakes, if doing so means giving up a launch, market share, or billions of dollars?

This is not a purely literary question: in 2023, hundreds of researchers and technology leaders, including the heads of several major labs, signed a statement calling for the risk of AI-driven extinction to be treated as a global priority, on par with pandemics and nuclear war. Over the years, authoritative voices have described “loss of control” not as science fiction, but as a problem of design and incentives. Stuart Russell and Max Tegmark link it to the dynamics of blind optimization, in which a poorly defined objective can lead to catastrophic outcomes at scale. Geoffrey Hinton, one of the pioneers of neural networks, has repeatedly raised his estimate of the risk that AI could lead to human extinction within a few decades. Not everyone, however, agrees: Yann LeCun, for instance, has dismissed fears of an existential threat as fantasies projected onto machines.
Popular culture, meanwhile, has already shown us the skeleton of the problem: HAL 9000, the supercomputer aboard the spaceship Discovery in 2001: A Space Odyssey, does not “go mad”; it finds itself trapped in incompatible orders and resolves the conflict by eliminating those who could abort the mission. In the cult series The Matrix, total simulation turns into a prison. Skynet in Terminator interprets defense as a preemptive attack and pushes the nuclear button. Asimov’s Three Laws were conceived as a safety code, yet the narrative shows how easily a rule can be circumvented when the context changes. In Ex Machina, the android does not conquer the world; it does something more realistic: it manipulates, escapes, and leaves the human locked behind a closed door.
These stories imagine scenarios that are often implausible and apocalyptic, but they capture a crucial point: an out-of-control “super-AI” does not need to hate humanity to harm it; it only needs to pursue a goal using means we did not anticipate.
This is why a Head of Preparedness makes sense only if the role is not a reputational bandage. It requires veto power, access to data, verifiable audits, and—above all—incentives that make it rational to slow down when risk increases. Otherwise, the half-million-dollar salary is not the price of salvation, but the cost of a promise. And a promise alone does not slow a race.
Alessandro Mancini