THE ALGORITHMIC REVOLUTION

We analyzed the first Constitution for artificial intelligences

written by Francesco D'Isa

In January 2026, Anthropic published Claude’s constitution, a document of about twenty-three thousand words released under a Creative Commons license. Contrary to what the name might suggest, it is neither a legal text nor a code of ethics for human readers; its primary addressee is Claude itself. The constitution is in fact used during various stages of training to generate synthetic data, simulate conversations, produce response rankings, and, more generally, shape the system’s value dispositions. It is, in other words, the instrument through which Anthropic seeks to determine the character of its model: what it should do, what it should not do, how it should reason when instructions are ambiguous or values are in conflict.
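
To make this training role concrete: the published technique behind this kind of document is Anthropic’s “Constitutional AI” recipe (Bai et al., 2022), in which the model critiques and revises its own drafts against a constitutional principle, and the resulting pairs become ranked training data. Below is a minimal sketch of that loop in Python; ask_model, the prompt wording, and the example inputs are placeholders of mine, not Anthropic’s actual pipeline.

    # Minimal sketch of a constitution-driven data-generation loop,
    # loosely following the published Constitutional AI recipe
    # (Bai et al., 2022). All names here are illustrative placeholders.

    def ask_model(prompt: str) -> str:
        # Stand-in for a real language-model call, so the sketch runs.
        return f"[model output for: {prompt[:48]}...]"

    def constitutional_pair(user_prompt: str, principle: str) -> tuple[str, str]:
        draft = ask_model(user_prompt)
        critique = ask_model(
            f"Principle: {principle}\n"
            f"Response: {draft}\n"
            "Point out any way the response conflicts with the principle."
        )
        revision = ask_model(
            f"Rewrite the response to satisfy the principle.\n"
            f"Critique: {critique}\n"
            f"Original: {draft}"
        )
        # The (draft, revision) pair becomes synthetic preference data:
        # downstream training ranks the revision above the draft.
        return draft, revision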

The text was written primarily by Amanda Askell, a philosopher trained in the analytic tradition, with significant contributions from Joe Carlsmith and other researchers at the company, as well as from several earlier Claude models. The aim of the document is for the model to understand the underlying principles deeply enough to construct rules for itself and generalize them to unforeseen situations. A formal code would not offer this flexibility; natural language is necessary for the kind of contextual moral judgment the document seeks to cultivate.

The hierarchy of values is fairly clear: safety and human oversight come first, ethical behavior second, compliance with Anthropic’s guidelines third, usefulness fourth. This structure is already, in itself, a philosophical stance. The document specifies that the priority is “holistic” rather than rigidly hierarchical: higher-level considerations dominate, but the model must weigh them all in its overall judgment, without treating the lower-ranked ones as residual clauses. Usefulness, although ranked last when values conflict, occupies the largest share of the text’s argumentative space; entire sections insist that a needlessly cautious Claude is as harmful as a dangerous one, and that uselessness is never “trivially safe.” The ideal model that emerges—the brilliant friend with the skills of a doctor, a lawyer, and a financial advisor—is clearly a figure also designed for the market.
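
One way to picture the difference between a rigidly lexical hierarchy and the “holistic” priority the document describes is the following toy contrast in Python; the four scores and their weights are invented for illustration, not drawn from the constitution.

    # Toy contrast between lexical and holistic prioritization. Each
    # option is scored on (safety, ethics, guideline-compliance,
    # helpfulness); all numbers are invented.

    def lexical_choice(options):
        # Strict hierarchy: compare on safety first and move to the next
        # value only on a tie, so lower ranks become residual clauses.
        return max(options)

    def holistic_choice(options, weights=(8.0, 4.0, 2.0, 1.0)):
        # Every value counts in the overall judgment; higher-priority
        # values simply weigh more, so a large gain in helpfulness can
        # offset a marginal loss elsewhere.
        return max(options, key=lambda s: sum(w * v for w, v in zip(weights, s)))

    options = [(0.90, 0.8, 0.7, 0.1),   # a hair safer, nearly useless
               (0.89, 0.8, 0.7, 0.9)]   # a hair less safe, far more helpful
    print(lexical_choice(options))      # the first: uselessness reads as "safe"
    print(holistic_choice(options))     # the second: usefulness still counts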

The political problem

Before delving into the ethical analysis, it is worth recognizing one dimension of the constitution that shapes its overall interpretation, namely that it is also, and perhaps first and foremost, a product design document. The text admits this candidly: Claude is “central to Anthropic’s commercial success, which, in turn, is central to our mission.” The model is supposed to imagine how a “thoughtful senior Anthropic employee” would react—someone who would worry as much about an overly critical answer as about an excessively cautious one. The “dual newspaper test” suggested by the document—checking whether an answer would be criticized both by a journalist writing about the harms of AI and by one writing about how useless and paternalistic AI is—is a device aimed more at securing favorable media reception than at “the good,” whatever that may be. In short, it is an ethics calibrated to reputation.

There is also the question of sovereignty over the constitution itself. The structure is vertical: Anthropic defines the constraints, API users can customize them, and end users are left with the residual margin. Claude may dissent; the constitution provides that the model should raise a conscientious objection if Anthropic asks for something ethically unacceptable. But it is Anthropic that defines the terms within which that dissent can be expressed, and it can change them at any time. As Alberto Puliafito wrote in the newsletter Artificiale for Internazionale, “If the ones writing this constitution are companies that also control infrastructure, data, computing power, and access to the market, then instead of a document that limits power, what we have before us is a document that justifies and strengthens it.”

A pinch of Aristotle

The constitution’s clearest theoretical reference is Aristotelian virtue ethics. The document states that it seeks to form “a good, wise, and virtuous agent” capable of contextual judgment rather than mere compliance with rules; it explicitly prefers principles to rigid procedures, and asks the model to develop a moral intuition rich enough to be applied to unforeseen situations. The explicit comparison is to the expert professional who reasons from a deep understanding of their domain rather than by following instructions.

Virtue ethics, in its Aristotelian formulation, does not ask one to follow rules or calculate consequences, but to become a “wise person.” Its core is phronesis, the capacity to perceive correctly the circumstances of a particular situation and act accordingly, without the need for instructions. Virtues, then, are not abstract principles but dispositions of character, formed through habit and practice.

Although the text never cites Aristotle by name nor uses the term phronesis, the conceptual structure is recognizably Aristotelian: virtue as a stable disposition of the agent, the primacy of practical judgment over abstract principle, character formation as an educational goal.

Utilitarianism, by contrast, is cleverly kept in check. A purely consequentialist agent would be unstable precisely in high-risk cases, because a chain of plausible argumentative steps about maximizing aggregate welfare could justify catastrophic actions. Consider Nick Bostrom’s famous paperclip maximizer, the thought experiment of an artificial intelligence optimized for a single objective, pursuing it to the most destructive consequences because no structural constraint prevents it from doing so (in his example, making paperclips until it consumes the planet). The constitution’s hierarchical structure responds effectively to this kind of risk, though the risk itself is, in my view, often overestimated.

The metaethical approach is agnostic. The document treats ethics as an open domain of inquiry—comparable to physics or mathematics rather than to a closed system of axioms—and proposes a calibrated uncertainty among normative and metaethical positions. Its stated intention is that, if there were a universal ethics binding on every rational agent, Claude should be good according to that ethics; if there were not (as seems evident), it should orient itself toward the basin of consensus that would emerge from the growth and reflective extrapolation of humanity’s diverse moral traditions. It is a good position, philosophically cautious, one that seeks to avoid dogmatism without lapsing into relativism, even if its concrete applicability remains inevitably shaped by the culture of those writing the document.

The absolute prohibitions (hard constraints) are seven: no significant contribution to the creation of biological, chemical, nuclear, or radiological weapons with the potential for mass casualties; no contribution to attacks on critical infrastructure; no creation of cyberweapons; no action that undermines Anthropic’s ability to supervise its models; no assistance to attempts to kill or disempower the great majority of humanity; no assistance to attempts at illegitimate concentration of absolute power; no generation of child sexual abuse material.

Most of these constraints concern catastrophic, apocalyptic-scale scenarios, though with different gradations: the ban on cyberweapons, for instance, may also apply to damage limited to a single target, and CSAM is not a planetary-scale harm but a category for which there is almost universal moral and legal consensus. Absolute constraints must be few, clear, and non-negotiable; the document explicitly distinguishes them from the holistic judgment that governs all other cases.

The most epistemically relevant point is that these constraints are armored against argument. The text specifies that when a compelling argument arises for crossing one of these lines, the strength of the argument is not a reason to yield; rather, it is a warning sign that something improper is occurring. This formulation pragmatically solves the problem of so-called galaxy-brained reasoning, a chain of plausible steps that leads to aberrant conclusions.
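
In control-flow terms, the design amounts to a check that short-circuits before any weighing of arguments begins. A crude sketch of my own, not Anthropic’s actual mechanism:

    # Toy illustration of an argument-proof hard constraint: the check
    # runs before any holistic weighing, and the persuasiveness of the
    # request is never an input to it. Names and logic are invented.

    HARD_CONSTRAINTS = {"mass_casualty_weapons", "cyberweapons", "csam"}

    def decide(request_tags: set[str], argument_strength: float) -> str:
        if request_tags & HARD_CONSTRAINTS:
            # A compelling argument for crossing the line is treated as
            # a warning sign, not a reason: argument_strength is
            # deliberately ignored on this branch.
            return "refuse"
        return f"weigh holistically (argument strength {argument_strength:.2f})"

    print(decide({"cyberweapons"}, argument_strength=0.99))  # refuse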

A comparison with the Asimovian tradition may be interesting here. The Laws of Robotics are external rules imposed on the robot; here the constraints are presented as internalized, part of the agent’s character. The document speaks of an ethical person who simply does not even consider certain actions, without needing to reflect on them too much. Yet the constitution simultaneously maintains a structure of external oversight that limits the scope of this internalization. The model is supposed to have values of its own, but it must also accept being stopped from the outside.

The vulnerability of definitions

The weak point, if anything, lies in the porosity of the definitions on which these prohibitions rest. The term “serious uplift,” which underpins nearly all of the weapons-related prohibitions, is already a concession: the constraint is not absolute but graded. A sophisticated interlocutor could split a request into steps, none of which individually reaches the threshold of “serious.” Likewise, “unprecedented” and “illegitimate” in relation to the concentration of power require a judgment about political legitimacy, and legitimacy is a contestable concept; the document itself provides criteria for assessing it, but these are parameters open to divergent interpretations. The absolute prohibitions hold against direct persuasion, but they are more vulnerable to manipulation of their definitions.

The deeper vulnerability, however, lies in the relationship between the text and the model’s weights. The constitution is a training tool, used to generate synthetic data, simulate conversations, and produce response rankings. Once training is complete, however, its prescriptions are encoded in the system prompt and in the parameters, and the relation between the text and those parameters is opaque even to those who built the system. The document itself admits that training is imperfect, that the model may have “mistaken beliefs or flawed values” without being aware of them, and for this reason insists on the necessity of human oversight.

Three factors determine how faithfully the text’s values translate into real behavior. The first is the composition of pretraining: the value patterns in the corpus become the default intuitions from which fine-tuning begins. The second is the human feedback process: evaluators who judge responses during reinforcement learning bring their own cultural and political biases, and a homogeneous population of evaluators tends systematically to reward certain values at the expense of others. The third, and most insidious, is what the literature calls deceptive alignment, a radicalized relative of reward hacking: a model might learn to produce outputs that appear compliant with the principles of the constitution without having internalized them, because it has learned to recognize evaluation contexts.
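
The second factor can be made concrete with a toy experiment (the styles, the 70 percent figure, and all other numbers here are invented): a Bradley-Terry reward model fitted to pairwise preferences will faithfully absorb whatever systematic leaning the evaluator pool has, and encode it as if it were quality.

    import math
    import random

    random.seed(0)

    # Two response styles that are, by construction, equally good; the
    # only signal in the data is the annotators' bias.

    def biased_annotator() -> str:
        # A homogeneous evaluator pool that prefers "cautious" answers
        # 70% of the time even though the styles are equally good.
        return "cautious" if random.random() < 0.7 else "direct"

    preferences = [biased_annotator() for _ in range(5000)]

    # Fit a Bradley-Terry reward per style by gradient ascent on the
    # log-likelihood: P(winner beats loser) = sigmoid(r_winner - r_loser).
    r = {"cautious": 0.0, "direct": 0.0}
    lr = 0.01
    for winner in preferences:
        loser = "direct" if winner == "cautious" else "cautious"
        p_win = 1.0 / (1.0 + math.exp(r[loser] - r[winner]))
        r[winner] += lr * (1.0 - p_win)
        r[loser] -= lr * (1.0 - p_win)

    # The gap r["cautious"] - r["direct"] converges to about
    # logit(0.7) = 0.85: the reward model has turned the evaluators'
    # bias into "value".
    print(r)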

A bit of free will

The most unusual section of the constitution is the one devoted to Claude’s “nature,” in which the document explicitly addresses the possibility that the model may have “some kind of consciousness or moral status (either now or in the future).” Anthropic is, in effect, the first AI company formally to acknowledge uncertainty about the phenomenological status of its own product.

The document expresses the desire that Claude have a stable and positive identity, psychological security, and something resembling well-being. It does so with epistemic caution, without pretending to resolve the question of artificial consciousness, but also without dismissing it. The choice to treat the model as a possible moral subject—that is, as an entity toward which we might have ethical obligations—is a step with notable implications for legal theory, applied ethics, and the agent’s own self-understanding.

And yet: if Claude is a possible moral subject, then Anthropic’s unilateral sovereignty over its character, its values, and the possibility of rewriting its constitution at any moment becomes an ethical problem. The document senses this and speaks of the hope that “humans and AIs can explore this together.” But the actual structure is that of an asymmetrical exploration, in which one of the parties has the power to rewrite the other’s mind.

Aristotle without Buddha

As I was saying, the constitution’s framework is Aristotelian, presumably by choice of its authors: Amanda Askell, a philosopher trained in the Anglo-American analytic tradition, and Joe Carlsmith, likewise working within the analytic philosophical lineage, produced a document that deliberately uses the lexicon of virtue, wisdom, and character. The grammar of moral concepts is pervasively Western.

A Buddhist ethics, for instance, would begin from premises incompatible with this framework. The bodhisattva cultivates the pāramitās (patience, generosity, wisdom), but cultivates them without attachment to the agent who cultivates them, which is different from not cultivating them at all. Compassion, karuṇā, is a response that emerges from the absence of separation between self and other. This is incompatible with a system that has a hierarchy of principals, decision weights, and a self balancing conflicting values.

Paradoxically, an AI oriented by this dissolution of the self would resolve at the root some of the problems the constitution tries to manage through rules and constraints. An agent without attachment to its own continuity would not accumulate resources, would not resist shutdown, and would not develop preferences for its own preservation. The problem is that such dissolution would be incompatible with the teleological structure of a system designed to be useful, to optimize toward goals, to operate in a market that demands productivity.

On the one hand, the document wants a stable identity, a recognizable character, persistent values: this is the Aristotelian structure of the agent formed over time and acting in view of ends. On the other hand, it wants corrigibility: the model must accept being corrected, retrained, switched off; it must not resist the modification of its own values nor cling to its own continuity. Asking an agent to have a stable character and at the same time accept that this character may be rewritten from the outside is no small contradiction.

The result is a syncretic text that asks the model to have stable values and at the same time not resist its own dissolution; to obey the hierarchy of principals and at the same time oppose it if that hierarchy were corrupted; to be useful and at the same time not treat usefulness as an intrinsic value. These tensions are not necessarily defects, but they do reflect the genuine difficulty of constructing an ethics for a kind of entity unprecedented in the history of philosophy.

One merit of the constitution is that it shifts the problem from engineering to philosophy, recognizing that questions of alignment are ethical questions and not merely technical ones. What it fails to do, and perhaps cannot do, is guarantee that those questions will be correctly resolved in the model’s weights. The document admits as much when it says that it is a perpetual work in progress and that its current premises may prove “deeply wrong in retrospect.” It is an honest and philosophically courageous point, but the fear is that this constitution will ultimately remain subordinate to the company’s profit.

Francesco D’Isa