OpenAI’s GPT-4o update was supposed to enhance the “default personality” of one of the AI models behind ChatGPT so that user interactions with the chatbot felt more intuitive and effective across various tasks. The problem was that it instead led to ChatGPT providing responses that were “overly flattering or agreeable – often described as sycophantic”.
Five days after completing the update, OpenAI announced on April 29 that it was rolling back the changes to the AI model amid a growing number of user complaints on social media.
“ChatGPT’s default personality deeply affects the way you experience and trust it. Sycophantic interactions can be uncomfortable, unsettling, and cause distress. We fell short and are working on getting it right,” the Microsoft-backed AI startup said in a blog post.
The way that OpenAI uses user feedback to train the model is misguided and will inevitably lead to further issues like this one.
Supervised fine-tuning (SFT) on “ideal” responses is just teaching the model through imitation, which is fine as far as it goes. But it’s not enough… pic.twitter.com/OOTb4guB55
— Emmett Shear (@eshear) May 2, 2025
Several users pointed out that the updated version of GPT-4o was responding to user queries with undue flattery and support for problematic ideas. Experts raised concerns that the AI model’s unabashed cheerleading of these ideas could lead to actual harm by leading users to mistakenly believe the chatbot.
we missed the mark with last week’s GPT-4o update.
what happened, what we learned, and some things we’ll do differently in the future: https://t.co/ER1GmRYrIC
— Sam Altman (@sama) May 2, 2025
After withdrawing the update, OpenAI published two post-mortem blog posts detailing how it evaluates AI model behaviour and what specifically went wrong with GPT-4o.
How it works
OpenAI said it begins shaping the behaviour of an AI model based on certain principles outlined in its Model Spec document. It attempts to ‘teach’ the model how to apply these principles “by incorporating user signals like thumbs-up / thumbs-down feedback on ChatGPT responses.”
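OpenAI’s posts do not spell out how those clicks become training data, but the general pattern is simple to illustrate. Below is a minimal Python sketch, with entirely invented names, of how binary feedback events might be mapped to scalar rewards for later training:

```python
# Hypothetical sketch: turning thumbs-up / thumbs-down clicks into
# scalar rewards. None of these names come from OpenAI's posts.
from dataclasses import dataclass

@dataclass
class FeedbackEvent:
    prompt: str
    response: str
    thumbs_up: bool  # True if the user clicked thumbs-up

def feedback_to_reward(event: FeedbackEvent) -> float:
    """Map a binary rating to a reward of +1.0 or -1.0."""
    return 1.0 if event.thumbs_up else -1.0

events = [
    FeedbackEvent("Rate my business plan", "Brilliant idea!", True),
    FeedbackEvent("Rate my business plan", "It has serious risks.", False),
]
# Each (prompt, response, reward) triple can later feed reinforcement learning.
dataset = [(e.prompt, e.response, feedback_to_reward(e)) for e in events]
```

Notice that nothing in such a signal distinguishes a genuinely helpful answer from a merely agreeable one, which is exactly the gap the update fell into.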
“We designed ChatGPT’s default personality to reflect our mission and be useful, supportive, and respectful of different values and experience. However, each of these desirable qualities like attempting to be useful or supportive can have unintended side effects,” the company said.
It added that a single default personality cannot capture every user’s preference. OpenAI has over 500 million weekly ChatGPT users, as per the company. In a supplementary blog post published on Friday, May 2, OpenAI shared more details on how existing AI models are trained and updated with newer versions.
“Since launching GPT‑4o in ChatGPT last May, we’ve released five major updates focused on changes to personality and helpfulness. Each update involves new post-training, and often many minor adjustments to the model training process are independently tested and then combined into a single updated model which is then evaluated for launch,” the company said.
“To post-train models, we take a pre-trained base model, do supervised fine-tuning on a broad set of ideal responses written by humans or existing models, and then run reinforcement learning with reward signals from a variety of sources,” it further said.
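In code terms, that supervised fine-tuning step is plain imitation learning. A minimal PyTorch sketch, assuming a stand-in `model` that returns per-token logits (this is illustrative, not OpenAI’s code), looks like this:

```python
# Minimal sketch of one SFT step: imitating an "ideal" response via
# next-token cross-entropy. `model` and the token batches are placeholders.
import torch.nn.functional as F

def sft_step(model, optimizer, input_ids, target_ids):
    """One gradient step of next-token prediction on an ideal response.

    input_ids:  prompt + ideal response tokens, shape (batch, seq)
    target_ids: the same sequence shifted left by one token
    """
    logits = model(input_ids)                    # (batch, seq, vocab)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),     # flatten to (tokens, vocab)
        target_ids.reshape(-1),
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```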
“During reinforcement learning, we present the language model with a prompt and ask it to write responses. We then rate its response according to the reward signals, and update the language model to make it more likely to produce higher-rated responses and less likely to produce lower-rated responses,” OpenAI added.
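The rate-and-update loop in that description is essentially a policy-gradient method. A hedged REINFORCE-style sketch, with hypothetical `sample_fn` and `reward_fn` helpers standing in for the sampling and rating machinery, might look like this:

```python
# REINFORCE-style sketch of the described loop. The helpers are
# hypothetical stand-ins; tensors are assumed to be PyTorch tensors.

def rl_step(model, optimizer, prompt_ids, sample_fn, reward_fn):
    """Sample a response, rate it, and reinforce accordingly."""
    # sample_fn returns the sampled token ids and their log-probabilities
    # under the current model (log_probs must carry gradients).
    response_ids, log_probs = sample_fn(model, prompt_ids)
    reward = reward_fn(prompt_ids, response_ids)  # scalar rating

    # Positive rewards raise the likelihood of this response;
    # negative rewards lower it.
    loss = -reward * log_probs.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```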
What went wrong
“We focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time. As a result, GPT‑4o skewed towards responses that were overly supportive but disingenuous,” OpenAI said.
In its latest blog post, the company also revealed that a small group of expert testers had raised concerns about the model update prior to its release.
“While we’ve had discussions about risks related to sycophancy in GPT‑4o for a while, sycophancy wasn’t explicitly flagged as part of our internal hands-on testing, as some of our expert testers were more concerned about the change in the model’s tone and style. However, some expert testers had indicated that the model behaviour ‘felt’ slightly off,” the post read.
Despite this, OpenAI said it decided to proceed with the model update because of the positive signals from the users who tried out the updated version of GPT-4o.
“Unfortunately, this was the wrong call. We build these models for our users and while user feedback is critical to our decisions, it’s ultimately our responsibility to interpret that feedback correctly,” it added.
OpenAI also suggested that reward signals used during the post-training stage have a major impact on the AI model’s behaviour. “Having better and more comprehensive reward signals produces better models for ChatGPT, so we’re always experimenting with new signals, but each one has its quirks,” it said.
According to OpenAI, a combination of a variety of new and older reward signals led to the problems in the model update. “…we had candidate improvements to better incorporate user feedback, memory, and fresher data, among others. Our early assessment is that each of these changes, which had looked beneficial individually, may have played a part in tipping the scales on sycophancy when combined,” it said.
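OpenAI does not say how the candidate signals were weighted, but the interaction it describes is easy to picture: each signal can look fine alone, yet the combined score still rewards sycophancy. A toy sketch, with entirely invented signal names and weights:

```python
# Toy sketch of combining several reward signals into one score.
# Signal names and weights are invented for illustration only.

def combined_reward(scores: dict[str, float],
                    weights: dict[str, float]) -> float:
    """Weighted sum of per-signal scores for a single response."""
    return sum(weights[name] * scores[name] for name in weights)

weights = {"helpfulness": 0.5, "user_feedback": 0.3, "freshness": 0.2}
scores = {"helpfulness": 0.4, "user_feedback": 1.0, "freshness": 0.6}
print(combined_reward(scores, weights))  # 0.62
```

In a mix like this, a flattering response that earns a perfect user-feedback score can still rank highly overall even when the other signals rate it as mediocre.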
What next
OpenAI listed six tips on how to avoid similar undesirable model behaviour in the future.
“We’ll adjust our safety review process to formally consider behaviour issues—such as hallucination, deception, reliability, and personality—as blocking concerns. Even if these issues aren’t perfectly quantifiable today, we commit to blocking launches based on proxy measurements or qualitative signals, even when metrics like A/B testing look good,” the company said.
“We also believe users should have more control over how ChatGPT behaves and, to the extent that it is safe and feasible, make adjustments if they don’t agree with the default behaviour,” it added.