Google DeepMind’s AlphaProof Nexus Cracks Open Decades-Old Maths Problems

AlphaProof Nexus is significant not because it “does math,” but because it produces formally verified proofs that can be trusted with compiler-level certainty. If this approach continues to scale, it could fundamentally change scientific research by turning AI into a reliable engine for exploring and validating new ideas rather than simply generating plausible explanations.

By I F · 5 min read

TL;DR

Google DeepMind’s AlphaProof Nexus uses AI plus Lean 4 verification to produce machine-checked mathematical proofs.
The system reportedly solved 9 open Erdős problems and proved 44 OEIS conjectures.
Two of the Erdős problems had remained unsolved for 56 years.
Unlike standard AI output, every reported proof was formally verified, so hallucinated or logically flawed steps are rejected rather than published.
The breakthrough points toward AI becoming a scalable research tool for mathematics and other formal sciences.

Executive Summary

Google DeepMind has announced AlphaProof Nexus, an AI system that searches for mathematical proofs and verifies them in Lean 4, so each result is machine-checked. In its latest evaluation, the system reportedly solved 9 of 353 open Erdős problems, including two that had remained open for 56 years, and proved 44 of 492 OEIS conjectures. The headline is not just that the model found answers, but that it produced proofs a compiler could verify, which removes the usual risk that an AI-generated argument hides a hallucinated step.

The result matters because it points to a new model of scientific discovery: not a chatbot guessing at answers, but an AI system that explores huge proof spaces, tests ideas against formal logic, and iterates until the compiler accepts the result.

What Happened

The work, published on arXiv in late May 2026, describes an agentic framework that combines frontier language models with formal proof checking. DeepMind applied the system to a large benchmark of open problems, then released the proofs and supporting material publicly.

Among the claims:

9 Erdős problems solved out of 353 attempted
44 OEIS conjectures proved out of 492 attempted
Breakthroughs spanning combinatorics, algebraic geometry, optimisation, graph theory, and quantum optics
Proofs checked in Lean 4, meaning every successful result was mechanically verified

That last point is crucial. In maths, a plausible explanation is not enough; the logic must hold step by step. AlphaProof Nexus is designed precisely for that environment.

Why This Matters

AI has been impressive at writing explanations of mathematics, but explanations are not the same as proofs. One incorrect lemma can invalidate an entire argument.

AlphaProof Nexus matters because it suggests a route around that weakness:

Generate candidate proof ideas.
Test them inside a formal system.
Use compiler feedback to correct the next attempt.
Repeat until the proof type-checks.

If the approach scales, it could change how mathematicians explore open questions. Instead of spending weeks checking dead ends manually, researchers could use AI to sweep through a wider search space and concentrate human attention on the most promising routes.

How AlphaProof Nexus Works

At a high level, the system couples large language model reasoning with formal verification. The model proposes proof steps, but Lean 4 is the final judge.

The paper describes multiple agent configurations, ranging from a basic loop to more advanced multi-agent search. The core pattern is the same:

Start with a theorem statement and a partial proof
Ask the model to propose changes
Run the proof through the Lean compiler
Feed any errors back into the next iteration
Continue until the proof is accepted or the search is exhausted

This is a powerful design choice. It removes the need to trust the model’s prose. Instead, trust is delegated to the proof assistant.
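The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `proof_search`, `propose`, `check`, `toy_check`, and `make_toy_proposer` are hypothetical names, the "checker" is a toy function rather than the Lean compiler, and the "model" simply tries a fixed list of candidates.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces for illustration only (not from the paper):
#   propose(theorem, proof, feedback) -> a revised candidate proof
#   check(theorem, proof)             -> "" if accepted, otherwise an error message
Proposer = Callable[[str, str, str], str]
Checker = Callable[[str, str], str]


def proof_search(theorem: str, partial_proof: str, propose: Proposer,
                 check: Checker, max_iters: int = 10) -> Optional[str]:
    """Return an accepted proof, or None if the iteration budget runs out."""
    proof = partial_proof
    feedback = check(theorem, proof)               # check the starting point
    for _ in range(max_iters):
        if feedback == "":                         # checker accepted: stop
            return proof
        proof = propose(theorem, proof, feedback)  # revise using the errors
        feedback = check(theorem, proof)           # re-check the new candidate
    return proof if feedback == "" else None


def toy_check(theorem: str, proof: str) -> str:
    """Toy stand-in for a proof checker: accepts only the string 'rfl'."""
    return "" if proof == "rfl" else f"error: '{proof}' does not close the goal"


def make_toy_proposer(candidates: List[str]) -> Proposer:
    """Toy stand-in for a model: tries a fixed list of candidates in order.

    A real proposer would read `feedback`; this one ignores it.
    """
    remaining = list(candidates)

    def propose(theorem: str, proof: str, feedback: str) -> str:
        return remaining.pop(0) if remaining else proof

    return propose


result = proof_search("n + 0 = n", "sorry",
                      make_toy_proposer(["wrong_guess", "rfl"]), toy_check)
print(result)  # rfl
```

The point of the structure is that `proof_search` never inspects the proposer's reasoning. It returns a proof only when the checker reports no errors, which is where the trust comes from.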

The Results

The headline result is the set of 9 solved Erdős problems. They include long-standing questions dating from the 1970s and 1990s, two of which had been open for 56 years.

The broader benchmark also showed results beyond Erdős:

44 OEIS conjectures proved
Progress on algebraic geometry questions involving Hilbert functions
An improved convergence result in optimisation theory
Contributions in additive combinatorics
Collaboration on problems in quantum optics and graph theory

What makes this notable is not just the raw count, but the breadth. The system is not limited to one narrow puzzle class; it is being used as a general proof-search engine across several mathematical domains.

Costs and Performance

One of the most important practical findings is cost.

The report suggests that a successful attempt costs roughly a few hundred dollars per problem in inference and compute, with some estimates putting the AlphaProof portion at around $60 per problem in TPU time.

That is still expensive in absolute terms, but it is cheap relative to the labour cost of sustained expert research effort — especially when multiplied across hundreds of candidate problems.
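To see how a per-problem figure scales across a whole benchmark, here is a rough back-of-the-envelope sketch. The $300 value is a placeholder for "a few hundred dollars", not a number from the report, and the sketch assumes failed attempts cost about as much as successful ones, which the report does not state.

```python
# Placeholder assumption, NOT a reported figure: stands in for "a few hundred dollars".
ASSUMED_COST_PER_ATTEMPT_USD = 300


def sweep_cost(num_problems: int, cost_per_attempt_usd: float) -> float:
    """Total cost of attempting every problem once at a flat per-attempt cost."""
    return num_problems * cost_per_attempt_usd


def cost_per_solved(num_problems: int, num_solved: int,
                    cost_per_attempt_usd: float) -> float:
    """Total sweep cost divided by the number of problems actually solved."""
    return sweep_cost(num_problems, cost_per_attempt_usd) / num_solved


# Counts from the article: 353 Erdős problems attempted, 9 solved.
total = sweep_cost(353, ASSUMED_COST_PER_ATTEMPT_USD)
per_solved = cost_per_solved(353, 9, ASSUMED_COST_PER_ATTEMPT_USD)
print(f"Sweep: ${total:,.0f}; per solved problem: ${per_solved:,.0f}")
```

Under these assumptions, the cost per solved problem is far higher than the cost per attempt, because most attempts fail. Changing the placeholder changes the totals proportionally.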

There is also an interesting systems insight: the basic agent reportedly solved all 9 problems that the more advanced system solved. The richer multi-agent machinery helped on some harder cases and reduced costs in those cases, but the result suggests a future in which simpler proof loops, paired with stronger foundation models, may become enough for many tasks.

Competitive Context

The announcement arrives in the middle of a broader race in AI mathematics.

OpenAI has also claimed progress on hard mathematical problems, but the strategic difference is clear:

OpenAI-style approach: more general reasoning, often natural-language-first, still requiring human validation
DeepMind-style approach: formal proof search, with compiler-level verification built in

Both approaches are significant. The open question is which will become the more scalable research workflow.

DeepMind’s approach has the advantage of certainty: if the proof compiles, it is correct within the formal system. That makes it especially attractive for domains where reliability matters more than fluent explanation.
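As a small illustration of what "if the proof compiles" means, here are three toy Lean 4 declarations. They are textbook facts about natural numbers, unrelated to the problems in the paper, and the theorem names are made up for this example.

```lean
-- Accepted: both sides are equal by definition, so `rfl` closes the goal.
theorem add_zero_example (n : Nat) : n + 0 = n := rfl

-- Accepted: reuses a lemma from Lean's core library.
theorem zero_add_example (n : Nat) : 0 + n = n := Nat.zero_add n

-- Not a finished proof: `sorry` is a placeholder, and Lean warns that
-- the declaration uses it, so this would not count as a verified result.
theorem unfinished_example (a b : Nat) : a + b = b + a := by
  sorry
```

The guarantee covers the formal statement exactly as written. Whether that statement faithfully captures the informal problem is a separate question that still needs human review.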

Limitations

This is an important milestone, but it is not a universal solver.

Current limits include:

Success concentrated in domains where formal libraries are mature
Heavy dependence on the quality and completeness of the underlying theorem corpus
Large variance in compute cost across problems
Many failures still involve dead ends, circular lemmas, or overconfident reasoning paths
Most open problems remain unsolved

In other words, this is a major step forward — not a finish line.
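To put "most open problems remain unsolved" in numbers, the short sketch below computes the success rates from the counts reported above. Only the four counts come from the article; the function name is illustrative.

```python
def success_rate(solved: int, attempted: int) -> float:
    """Share of attempted problems that were solved, as a percentage."""
    return 100.0 * solved / attempted


# Counts as reported in the article.
erdos_rate = success_rate(9, 353)
oeis_rate = success_rate(44, 492)

print(f"Erdős: {erdos_rate:.1f}% solved, {353 - 9} still open")
print(f"OEIS:  {oeis_rate:.1f}% proved, {492 - 44} not proved")
# Erdős: 2.5% solved, 344 still open
# OEIS:  8.9% proved, 448 not proved
```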

The Bigger Picture

The real significance of AlphaProof Nexus may be less about any single theorem and more about the workflow it introduces.

If AI can reliably generate formally verified reasoning, the role of the machine shifts from “assistant that explains things” to “research engine that explores and checks possibilities.” That is a much more useful capability in mathematics, physics, and any domain that can be formalised.

It also changes the economics of curiosity. Problems that once required years of careful manual work may now be attacked at scale, with human experts focusing on interpretation, strategy, and framing rather than brute-force proof checking.

That said, the system still depends on human-defined problem selection and human judgment about significance. The machine can help find a proof; it cannot yet decide which ideas matter most for mathematics as a field.

Sources

DeepMind research preprint on AI-driven formal proof search, arXiv (May 2026)
Google DeepMind results repository on GitHub
Reporting on the AlphaProof Nexus benchmark and related Erdős problem breakthroughs
Public tracking pages and community discussion on AI contributions to Erdős problems