The value of AI for many users today lies in its ability to generate coherent, conversational language by applying probability theory to massive datasets. However, a future where AI models drive advances in fields like cryptography and space exploration by solving complex, multi-step mathematical problems is now one step closer to reality.
OpenAI on Saturday, July 19, announced that its experimental AI reasoning model earned enough points on this year’s International Math Olympiad (IMO) to win a gold medal.
Started in 1959 in Romania, the IMO is widely considered to be one of the hardest, most prestigious math competitions in the world for high-school students. It is held over two days. Participants take two exams, in which they are expected to solve three math problems in each session within four-and-a-half hours.
OpenAI’s unreleased AI model took the IMO 2025 under these same conditions, with no access to the internet or external tools. It read the official math problem statements and generated natural-language proofs. The model solved five out of a total of six problems, achieving a gold medal-worthy score of 35/42 (each problem is worth seven points, for a maximum of 42), according to Alexander Wei, a member of OpenAI’s technical staff.
“This underscores how fast AI has advanced in recent years. In 2021, my PhD advisor @JacobSteinhardt had me forecast AI math progress by July 2025. I predicted 30% on the MATH benchmark (and thought everyone else was too optimistic). Instead, we have IMO gold,” Wei wrote in a post on X.
This is not the first time a company has claimed that its AI model can match the performance of IMO gold medallists. Earlier this year, Google DeepMind unveiled AlphaGeometry 2, a model specifically designed to solve complex geometry problems at a level comparable to that of a human Olympiad gold medallist.
However, the performance of OpenAI’s experimental model is seen as a step forward for general intelligence, not just task-specific AI systems. “We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling,” Wei said.
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO). pic.twitter.com/SG3k6EknaC
— Alexander Wei (@alexwei_) July 19, 2025
The model’s success marks progress beyond traditional reinforcement learning (RL), a process used to train AI models through a system of clear, verifiable rewards and penalties. Instead, the model likely demonstrates more flexible, general problem-solving abilities, as it “can craft intricate, watertight arguments at the level of human mathematicians.”
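To illustrate the distinction in broad strokes, the sketch below contrasts a mechanically verifiable reward with free-form proof grading. It is a minimal, hypothetical example (the function names and checks are assumptions for illustration, not OpenAI's actual training setup):

```python
# Minimal sketch of the "verifiable reward" idea behind traditional RL for math.
# Hypothetical illustration only; this is not OpenAI's training code.

def verifiable_reward(model_answer: str, ground_truth: str) -> float:
    """A short final answer can be checked mechanically: reward 1.0 if it matches."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

def proof_reward(model_proof: str) -> float:
    """A multi-page natural-language proof has no such mechanical check;
    grading it requires expert judgement (left unimplemented here)."""
    raise NotImplementedError("free-form proofs cannot be scored by simple matching")

# The verifiable case is trivial to automate...
print(verifiable_reward("42", "42"))  # 1.0
print(verifiable_reward("41", "42"))  # 0.0
# ...which is exactly what an IMO-style, multi-page proof is not.
```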
Wei also acknowledged that “IMO submissions are hard-to-verify, multi-page proofs.” Math proofs are built from smaller, intermediate results known as lemmas. OpenAI said that the AI-generated proofs to the problems were independently graded by three former IMO medallists, who finalised the model’s score unanimously.
Hot take on OpenAI’s IMO gold
What does it mean? I don’t know (yet).
The fact that no tools, coding or internet was used is genuinely impressive.
That said, my overall impression is that OpenAI has told us the result, but not how it was achieved.
That leaves me with many…
— Gary Marcus (@GaryMarcus) July 19, 2025
However, Gary Marcus, a professor at New York University (NYU) and well-known critic of OpenAI, pointed out that the results have not yet been independently verified by the organisers of the IMO.
OpenAI’s claims also come months after the US Defense Advanced Research Projects Agency (DARPA) launched a new initiative that seeks to enlist researchers to find ways to conduct high-level mathematics research with an AI “co-author.” In the past, DARPA was responsible for driving research that led to the creation of ARPANET, the precursor to the internet.
An AI model that could reliably check proofs would save enormous amounts of time for mathematicians and help them be more creative. While some of these models may seem equipped to solve complex problems, they can be prone to stumbling on simple questions, such as whether 9.11 is greater than 9.9. Hence, they are said to have ‘jagged intelligence’, a term coined by OpenAI co-founder Andrej Karpathy.
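One hypothetical way to see why that particular comparison is slippery (an illustrative sketch only, not a trace of any model's reasoning): read as decimal numbers, 9.11 is smaller than 9.9, but read like software version numbers it comes later.

```python
# Hypothetical illustration of the "9.11 vs 9.9" ambiguity; not model internals.

# As decimal numbers, 9.11 is smaller than 9.9.
print(9.11 > 9.9)  # False

# Parsed like version numbers (compared component by component), 9.11 comes after 9.9.
version_a = [int(part) for part in "9.11".split(".")]  # [9, 11]
version_b = [int(part) for part in "9.9".split(".")]   # [9, 9]
print(version_a > version_b)  # True
```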
Reacting to the model’s gold medal-worthy IMO score, OpenAI CEO Sam Altman said, “This is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence.”
we achieved gold medal level performance on the 2025 IMO competition with a general-purpose reasoning system! to emphasize, this is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence.
when we first started openai,… https://t.co/X46rspI4l6
— Sam Altman (@sama) July 19, 2025
However, the ChatGPT-maker does not plan on releasing the experimental research model for at least the next several months, despite its math capabilities.

