AI Is Now Tackling High-Level Math Challenges

Neel Somani, a software engineer, former quant researcher, and startup founder, was testing the math capabilities of OpenAI’s new model over the weekend when he made an unexpected discovery. After he pasted a problem into ChatGPT and gave it fifteen minutes to work, it returned a complete solution. He evaluated the proof and formalized it using a tool from Harmonic, and everything checked out.

“I was interested in establishing a baseline for when LLMs are effectively able to solve open math problems compared to where they struggle,” Somani said. The surprise was that, with the most recent model, the frontier began to move forward.

Even more impressive is ChatGPT’s chain of reasoning, which rattles off mathematical results such as Legendre’s formula, Bertrand’s postulate, and the Star of David theorem. The model eventually found a 2013 MathOverflow post in which Harvard mathematician Noam Elkies gave an elegant solution to a similar problem. However, ChatGPT’s final proof differed from Elkies’ work in important ways and gave a more complete solution to a version of the problem posed by renowned mathematician Paul Erdős, whose vast collection of unsolved problems has become a proving ground for artificial intelligence.

For anyone skeptical of artificial intelligence, it is a surprising result, and it is not the only one. AI tools have become commonplace in mathematics, from formalization-focused LLMs like Harmonic’s Aristotle to literature review tools like OpenAI’s deep research. But since the release of GPT 5.2, which Somani describes as “anecdotally more skilled at mathematical reasoning than previous iterations,” the sheer number of solved problems has become hard to ignore, raising new questions about the capacity of large language models to advance human knowledge.

Somani was looking at the Erdős problems, a collection of more than 1,000 conjectures by the Hungarian mathematician that is maintained online. The problems, which vary widely in both subject matter and difficulty, have become an alluring target for AI-driven mathematics. The first batch of autonomous solutions came in November from a Gemini-powered model named AlphaEvolve; more recently, Somani and others have found GPT 5.2 to be remarkably adept at high-level math.

Since Christmas, 15 problems on the Erdős website have been moved from “open” to “solved”; 11 of those solutions explicitly credit AI models as having played a role.

On his GitHub page, renowned mathematician Terence Tao gives a more nuanced account of the progress. He lists eight problems on which AI models made meaningful autonomous progress, along with six more cases where progress came from locating and building on earlier research. AI systems are still a long way from doing mathematics without human intervention, but it is clear that large models have a significant role to play.

Writing on Mastodon, Tao conjectured that the scalable nature of AI systems makes them “better suited for being systematically applied to the ‘long tail’ of obscure Erdős problems, many of which actually have straightforward solutions.”

Tao went on, “As a result, many of these simpler Erdős problems are now more likely to be solved by purely AI-based methods than by human or hybrid means.”

Another driving force is the recent shift toward formalization, a labor-intensive process that makes mathematical reasoning easier to verify and extend. Formalization does not require AI or even computers, but a new generation of automated tools has made the process far easier. Lean, an open source “proof assistant” created at Microsoft Research in 2013, is now widely used in the field to formalize proofs, and AI tools such as Harmonic’s Aristotle promise to automate a large portion of the formalization work.
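
To give a sense of what a formalized statement looks like, here is a minimal sketch in Lean 4. It is purely illustrative: the theorem name `add_comm_example` is made up for this example, it assumes only Lean's core library (no Mathlib), and it is far simpler than anything on the Erdős list.

```lean
-- A toy formalized statement: addition of natural numbers is commutative.
-- `add_comm_example` is a hypothetical name chosen for this illustration.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b  -- appeal to the existing core-library lemma

-- A concrete instance that Lean checks by direct computation.
example : 2 + 3 = 3 + 2 := rfl
```

If Lean accepts the file, every step has been checked mechanically. That kind of checking is what makes a formalized result easier to trust and to build on.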

 

For Tudor Achim, the founder of Harmonic, the sudden jump in solved Erdős problems matters less than the fact that the world’s greatest mathematicians are beginning to take these tools seriously. “The use of [AI tools] by math and computer science professors is more important to me,” Achim stated. “When these people claim to use Aristotle or ChatGPT, that’s actual proof because they have reputations to defend.”