Setting aside the usual apocalyptic announcements and messianic promises that now characterize Big Tech’s language, GPT-5 is neither an atomic bomb nor a revolutionary leap. That said, technical data and practical testing do indicate concrete progress.
In public evaluations like LLM Arena, where shortly after its release it reached first place (now surpassed by Gemini 2.5 Pro), GPT-5 proves more consistent, reliable, and capable of maintaining the logical thread of complex conversations. OpenAI, in its System Card, reports significant reductions in hallucinations (−26% compared to GPT-4o in the main version, −65% in the “thinking” version) and improvements in handling ambiguous instructions thanks to its safe-completions strategy. The tendency toward sycophancy also drops sharply: −69% for free users and −75% for paid ones. Apparently, paying customers no longer need to be flattered. One area where the difference is immediately noticeable is coding: GPT-5 can autonomously generate working scripts and small websites, reducing the need for manual intervention.
In English text generation, quality improves; in Italian, by contrast, there is a noticeable decline in register and nuance. The linguistic performance of an LLM is not purely technical: it always reflects, to some extent, the cultural and geographical distribution of the data on which it was trained (Kazemi et al., 2024). Recent studies show that GPT models tend to incorporate cultural values and patterns typical of Anglophone and Protestant European countries, even when operating in other languages (Vimalendiran, 2024). In the case of GPT-4o, responses to questions drawn from the World Values Survey align more closely with the average values of Finland, Andorra, and the Netherlands than with those of African or Middle Eastern countries. This convergence is not accidental: it may stem both from the dominance of English in training data and from the implicit bias introduced during alignment phases carried out by teams and annotators who are predominantly U.S.-based. Italian, although relatively well represented as a European language, can still suffer cultural interference of Anglophone origin in contexts less present in Italian data.

Two further innovations change the usage landscape. First: GPT-5 is available for free via ChatGPT with auto-routing, removing the qualitative bottleneck for those who until now had access only to mediocre models. In practice, a free ChatGPT user receives a response generated either by GPT-5 or by another, intermediate model, without explicit disclosure. This choice has two immediate effects: on the one hand, it democratizes access to more advanced capabilities, since many users now interact by default with top-tier systems; on the other, it reduces transparency about the actual source of responses, making it harder to evaluate the performance and limits of each model.
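To picture what auto-routing could mean in practice, here is a minimal conceptual sketch in Python. The heuristic, the threshold, and the model tiers are invented purely for illustration; OpenAI has not disclosed how its router actually decides, so none of this should be read as its real logic.

```python
# A purely conceptual sketch of auto-routing. The complexity heuristic, the
# threshold, and the model tiers below are invented for illustration; OpenAI
# has not published how its router actually chooses.

def estimate_complexity(prompt: str) -> float:
    """Toy heuristic: longer, question-dense prompts count as more complex."""
    return len(prompt.split()) / 20 + prompt.count("?") * 0.3

def route(prompt: str, is_paid_user: bool) -> str:
    """Pick a model tier for the request; the user is not told which one answered."""
    if is_paid_user or estimate_complexity(prompt) > 0.5:
        return "gpt-5"        # top-tier model
    return "gpt-5-mini"       # cheaper intermediate tier

print(route("Summarize this paragraph.", is_paid_user=False))
# -> gpt-5-mini
print(route("Prove that the halting problem is undecidable and explain each step.", is_paid_user=False))
# -> gpt-5
```

The point of the sketch is simply that the choice of tier happens server-side, on criteria the user never sees, which is exactly where the transparency problem lies.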
Second: via API, input costs about half as much as GPT-4o, although output retains the previous price. This is an incentive for developers and companies, who can integrate the model at reduced operating costs. In my very first analysis, the cost reduction also led me to expect greater energy efficiency. But a study cited by The Guardian (August 9, 2025) and data from the System Card suggest the opposite: GPT-5’s estimated consumption would be higher than GPT-4o’s. Unfortunately, without official figures, these remain only inferences and projections. If confirmed, OpenAI’s move seems risky: for an advance that is not immense, consuming more than forty times as much makes the model economically even more unsustainable – quite apart from the obvious climate impact.
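To make the pricing point concrete, here is a back-of-the-envelope comparison in Python. The per-million-token prices are the published list prices at the time of writing ($2.50 in / $10 out for GPT-4o, $1.25 in / $10 out for GPT-5); treat them as illustrative assumptions rather than authoritative figures, and check OpenAI’s pricing page before relying on them.

```python
# Back-of-the-envelope cost comparison. Prices in USD per 1M tokens are list
# prices at the time of writing and are used here only for illustration.

PRICES = {                      # model: (input price, output price) per 1M tokens
    "gpt-4o": (2.50, 10.00),
    "gpt-5":  (1.25, 10.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request for a given model."""
    p_in, p_out = PRICES[model]
    return input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out

# A prompt-heavy workload: 50k tokens of context in, 1k tokens of answer out.
for model in PRICES:
    print(model, round(request_cost(model, 50_000, 1_000), 4))
# gpt-4o 0.135  vs  gpt-5 0.0725
```

For prompt-heavy workloads – long documents in, short answers out – the saving approaches the full 50%, because it comes entirely from the input side; output-heavy workloads see much less benefit.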
This information is difficult to interpret: Google recently released a report on the consumption of its flagship model, which is comparable in performance, and the figures are decidedly low. Each text request sent to Gemini consumes an average of 0.24 watt-hours of energy, generates approximately 0.03 grams of CO₂ equivalent, and uses 0.26 millilitres of water, the equivalent of five small drops. To give you an idea, the impact is comparable to watching television for about nine seconds. Either Google has a huge advantage over OpenAI, or something in the estimated calculations does not add up.
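As a quick sanity check of the nine-second comparison, the arithmetic does hold if one assumes a television drawing roughly 100 watts – a figure of ours, not Google’s:

```python
# Sanity check of Google's comparison: 0.24 Wh per Gemini text request is said
# to equal about nine seconds of television. Assuming a TV that draws ~100 W
# (our assumption, not Google's), the numbers line up.

ENERGY_PER_REQUEST_WH = 0.24   # average energy per text request, from Google's report
TV_POWER_W = 100               # assumed average power draw of a television

seconds_of_tv = ENERGY_PER_REQUEST_WH / TV_POWER_W * 3600
print(f"{seconds_of_tv:.1f} seconds of TV per request")   # -> 8.6, i.e. about nine seconds
```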

A curious phenomenon that has emerged since GPT-5’s release is the request, by a significant share of users, to go back to using GPT-4o – not for reasons of power or accuracy, but for its “character.” In many people’s perception, each model exhibits a sort of implicit personality, shaped by training and reinforcement choices, which comes through in the tone adopted in its outputs. For those who, like me, regard AI as a tool to be used functionally without anthropomorphizing it, such nuances are more background noise than added value. But playful or friendly use – encouraged also by the conversational design of these interfaces – is not negative in itself: it can make interaction more pleasant, entertain, or even provide support. The real issue is the lack of choice: there is no character selector for the model, while several studies show that perceptions of an assistant’s personality influence satisfaction, trust, and sense of affinity.
Research in Human-Computer Interaction and social psychology had documented long before the rise of generative AI that users tend to project human traits onto computers, an effect known as “computers are social actors” (Reeves & Nass, 1996). A more recent study (Perceptions of Warmth and Competence in AI Agents, 2022) further suggests that in human-AI interactions, perceptions of warmth and competence decisively shape user preference for an agent – far more than objective performance metrics. In other words, the tone and perceived personality of an AI system can determine trust and satisfaction regardless of the informational quality of its content. It is therefore understandable that, for some, GPT-4o’s “voice” was more appealing than GPT-5’s, and that the forced transition generates a sense of loss. It is a bit like a friend and confidant suddenly changing character.
In short: higher average quality, greater accessibility, lower entry costs, improvements in key functions. But also no epochal leap, (perhaps) higher consumption, and cultural and linguistic limitations. In the unpredictable development of these machines, we have probably reached a plateau where the existing offering is being optimized from a commercial perspective, while waiting for the next technological black swan.
Francesco D’Isa, trained as a philosopher and digital artist, has exhibited his works internationally in galleries and contemporary art centers. He debuted with the graphic novel I. (Nottetempo, 2011) and has since published essays and novels with renowned publishers such as Hoepli, effequ, Tunué, and Newton Compton. His notable works include the novel La Stanza di Therese (Tunué, 2017) and the philosophical essay L’assurda evidenza (Edizioni Tlon, 2022). Most recently, he released the graphic novel “Sunyata” with Eris Edizioni in 2023. Francesco serves as the editorial director for the cultural magazine L’Indiscreto and contributes writings and illustrations to various magazines, both in Italy and abroad. He teaches Philosophy at the Lorenzo de’ Medici Institute (Florence) and Illustration and Contemporary Plastic Techniques at LABA (Brescia).