
Models from OpenAI and DeepMind achieved gold-medal scores in the International Mathematical Olympiad. Credit: MoiraM/Alamy
Google DeepMind announced on 21 July that its software had cracked a set of maths problems at the level of the world’s top high-school students, achieving a gold-medal score on questions from the International Mathematical Olympiad. At first sight, this marked only a marginal improvement over the previous year’s performance. The company’s system had performed in the upper range of the silver-medal standard at the 2024 Olympiad, whereas this year it scored in the lower range for a human gold medallist.
But this year’s grades hide a “big paradigm shift,” says Thang Luong, a computer scientist at DeepMind in Mountain View, California. The company achieved its previous feats using two artificial intelligence (AI) tools, called AlphaGeometry and AlphaProof, that were specifically designed to carry out rigorous logical steps in mathematical proofs. The process required human experts to first translate the problems’ statements into something similar to a programming language, and then to translate the AI’s solutions back into English.
“This year, everything is natural language, end to end,” says Luong. The team employed a large language model (LLM) called DeepThink, which is based on its Gemini system but with some additional developments that made it better and faster at producing mathematical arguments, such as handling multiple chains of thought in parallel. “For a long time, I didn’t think we could go that far with LLMs,” Luong adds.
DeepThink scored 35 out of 42 points on the six problems that had been given to participants in this year’s Olympiad. Under an agreement with the organizers, the computer’s solutions were marked by the same judges who evaluated the human participants.
Separately, OpenAI, the San Francisco, California-based creator of ChatGPT, said its own LLM had solved the same Olympiad problems at gold-medal level, although its solutions were evaluated independently rather than by the official judges.
Impressive performance
For years, many AI researchers have fallen into one of two camps. Until 2012, the leading approach was to code the rules of logical thinking into the machine by hand. Since then, neural networks, which learn automatically from vast troves of data, have made a series of sensational breakthroughs, and tools such as OpenAI’s ChatGPT have now entered mainstream use.
Gary Marcus, a neuroscientist at New York University (NYU) in New York City, called the results from DeepMind and OpenAI “awfully impressive”. Marcus is an advocate of the ‘coding logic by hand’ approach, also known as neurosymbolic AI, and a frequent critic of what he sees as hype surrounding LLMs. Still, writing on Substack with NYU computer scientist Ernest Davis, he commented that “to be able to solve math problems at the level of the top 67 high school students in the world is to have really good math problem solving chops”.
It remains to be seen whether LLM superiority on Olympiad problems is here to stay, or whether neurosymbolic AI will claw its way back to the top. “At this point the two camps still keep developing,” says Luong, who works on both approaches. “They could converge together.”