
AlphaGeometry2: A Deep Dive into a Gold-Medalist AI Geometry Solver
Last Updated on February 20, 2025 by Editorial Team
Author(s): Jesus Rodriguez
Originally published on Towards AI.
I recently started an AI-focused educational newsletter that already has over 175,000 subscribers. TheSequence is a no-BS (meaning no hype, no news, etc.) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:
TheSequence | Jesus Rodriguez | Substack
The best source to stay up-to-date with the developments in the machine learning, artificial intelligence, and data…
thesequence.substack.com
DeepMind's journey toward mathematical AI dominance took a major leap last year when AlphaProof and AlphaGeometry nearly clinched gold at the International Math Olympiad (IMO). Now, with the latest upgrade, AlphaGeometry2 (AG2) has officially surpassed top human competitors in geometry, marking a milestone in AI-driven mathematical reasoning. The general consensus among IMO competitors is that the geometry problems are among the toughest on each day of the Olympiad.
AG2 represents a significant advancement in AI-driven mathematical reasoning, particularly in solving Olympiad geometry problems. Building on its predecessor, AlphaGeometry, AG2 surpasses the performance of an average IMO gold medalist. This essay provides a technical overview of AG2's architecture, key improvements, and broader contributions to AI.
To get a sense of the complexity of the geometry problems at the IMO, look at the example below from the paper:
Core Architecture: A Neuro-Symbolic Hybrid
AG2 employs a neuro-symbolic approach that combines a neural language model with a symbolic deduction engine. Its architecture consists of three primary components; a sketch of how they interact follows the list:
- Language Model (LM): Based on the Gemini architecture, the LM interprets problem statements, generates auxiliary constructions, and proposes proof steps.
- Symbolic Engine (DDAR): The Deductive Database Arithmetic Reasoning (DDAR) engine verifies proof steps using predefined rules and axioms.
- Search Algorithm: The Shared Knowledge Ensemble of Search Trees (SKEST) algorithm runs multiple beam searches in parallel, sharing knowledge to improve the search process.
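To make the division of labor concrete, here is a minimal sketch of the propose-and-verify loop in Python. AG2 itself is not open source, so every class and method name below is an illustrative assumption based on the loop described in the paper:

```python
# Illustrative sketch of AG2's propose-and-verify loop. All object,
# class, and method names here are hypothetical.

def solve(problem, lm, ddar, search_budget=100):
    """Alternate symbolic deduction with LM-proposed constructions."""
    state = problem.initial_state()
    for _ in range(search_budget):
        # 1. The symbolic engine (DDAR) deduces every fact it can derive
        #    from the current points, lines, and circles.
        state = ddar.saturate(state)
        if state.proves(problem.goal):
            return state.extract_proof()
        # 2. When deduction stalls, the language model proposes an
        #    auxiliary construction (e.g., "take the midpoint of AB"),
        #    the diagram is extended, and deduction restarts.
        state = state.add(lm.propose_construction(state))
    return None  # no proof found within the budget
```

The key design choice is that the LM is never trusted to be correct: it only suggests constructions, while the symbolic engine does all of the actual deduction and verification.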
Key Improvements and Contributions
Expanded Domain Language
AG2 extends the original domain language of AlphaGeometry, enhancing its coverage of IMO geometry problems from 66% to 88%. Key additions include (see the mock-up after this list):
- "Find x" predicates: acompute and rcompute solve for specific angles or ratios.
- Linear equations: distmeq, distseq, and angeq predicates express linear equations involving geometric quantities.
- Locus problems: The * token represents fixed-point placeholders.
- Diagram checks: sameclock, noverlap, and lessthan predicates ensure valid deductions.
- Non-constructive problems: Allows points defined by three or more predicates.
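As a rough illustration of what the expanded language buys, here is a mock-up of statements using the new predicate families. The predicate names come from the paper, but the argument layouts below are my own assumptions, not AG2's actual syntax:

```python
# Mock-up of statements in AG2's extended domain language.
# Predicate names are real; the argument layouts are assumed.
fragments = [
    "acompute a b c d",         # "find x" goal: compute the angle between AB and CD
    "rcompute a b c d",         # "find x" goal: compute the ratio AB : CD
    "distmeq a b c d t1 t2 y",  # a linear equation over distances
    "sameclock a b c d e f",    # diagram check: matching orientation
    "noverlap a b",             # diagram check: A and B do not coincide
]
```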
Stronger and Faster Symbolic Engine
AG2's DDAR engine incorporates several optimizations (a toy sketch of double-point handling follows the list):
- Double points: Handles points with identical coordinates but different names.
- Faster algorithm: The DDAR2 algorithm uses hard-coded searches and hashing techniques for efficiency.
- Faster implementation: Core computations use C++, achieving a 300x speed improvement.
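To illustrate what "double points" means in practice, here is a toy sketch of detecting two differently named points that occupy the same coordinates in the numerical diagram. This is my own illustration, not DeepMind's code (the real engine is implemented in C++):

```python
from collections import defaultdict

def find_double_points(points, tol=1e-9):
    """Group point names whose coordinates agree up to a tolerance."""
    buckets = defaultdict(list)
    for name, (x, y) in points.items():
        # Quantize coordinates so numerically equal points hash together.
        buckets[(round(x / tol), round(y / tol))].append(name)
    # Any bucket with 2+ names is a set of "double points": distinct
    # labels the engine can treat as one geometric point, so facts
    # proved about either name apply to both.
    return [names for names in buckets.values() if len(names) > 1]

# Example: M1 is constructed as a midpoint and M2 as a perpendicular
# foot; if their coordinates agree, the engine can merge them.
points = {"A": (0.0, 0.0), "M1": (0.5, 0.0), "M2": (0.5, 0.0)}
print(find_double_points(points))  # [['M1', 'M2']]
```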
Enhanced Language Model
The LM, based on the Gemini architecture, is trained on a diverse dataset of 300 million synthetic theorems. Improvements include:
- Larger and more complex diagrams
- Theorems with greater complexity and longer proofs
- More balanced data distribution across problem types
- New "locus"-type theorems
During inference, AG2 uses top-k sampling with a temperature of 1.0 and k = 32, generating diverse auxiliary construction proposals.
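The sampling scheme itself is standard. Here is a generic NumPy implementation of top-k sampling with the quoted settings; this is textbook decoding code, not AG2's:

```python
import numpy as np

def sample_top_k(logits, k=32, temperature=1.0, rng=None):
    """Sample one token id from the k highest-scoring logits."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    top = np.argpartition(logits, -k)[-k:]           # indices of the k best tokens
    probs = np.exp(logits[top] - logits[top].max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))

# At temperature 1.0, repeated calls yield diverse proposals, which is
# exactly what the downstream proof search needs.
token = sample_top_k(np.random.randn(50_000))
```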
Novel Search Algorithm
SKEST runs multiple beam searches in parallel with different configurations, sharing knowledge through a central database. Search tree variations include:
- Classic search tree (similar to AG1)
- Trees predicting multiple auxiliary points
- Deep, narrow trees
- Shallow, wide trees
This approach accelerates the search process and improves overall performance.
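A conceptual sketch of the knowledge-sharing idea follows, with threads and a shared set standing in for the paper's parallel workers and central database. Everything here is illustrative; each "tree" would really be a beam search driven by the LM and checked by DDAR:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

shared_facts = set()   # the central database of proved facts
lock = threading.Lock()

def run_tree(config):
    """One search tree: read others' facts, publish what it proves."""
    with lock:
        known = set(shared_facts)      # reuse deductions from other trees
        new_fact = f"fact proved by {config} tree"  # stub for a real deduction
        shared_facts.add(new_fact)     # share new deductions with everyone
    return known

configs = ["classic", "multi-aux-point", "deep-narrow", "shallow-wide"]
with ThreadPoolExecutor() as pool:
    list(pool.map(run_tree, configs))
print(shared_facts)  # facts from all four tree shapes, usable by any of them
```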
Results and Performance
AG2 achieves an 84% solve rate on IMO geometry problems (2000β2024), surpassing the average gold medalist. It also solves 20 of the 30 hardest IMO shortlist problems.
Ablation Studies
- Increasing model size reduces perplexity loss.
- High temperature and multiple samples are essential for success.
- Optimal search tree configuration: beam size 128, depth 4, and 32 samples (spelled out in the config sketch below).
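Written out as a configuration sketch for clarity (the field names are mine; the values are the ones reported in the ablation):

```python
# Best-performing search settings from the ablation, as a config dict.
search_config = {
    "beam_size": 128,        # candidate proof states kept per search level
    "search_depth": 4,       # levels of auxiliary-construction proposals
    "samples_per_node": 32,  # LM samples drawn at each expansion
}
```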
Broader Impact and Future Directions
AG2's success highlights the potential of neuro-symbolic approaches in complex reasoning tasks. Future research may explore:
- Expanding domain language for more advanced geometric concepts
- Using reinforcement learning for subproblem decomposition
- Automating geometry problem-solving with natural language input
Is Multi-modality the Future?
Multi-modal training using Gemini 1.5 (combining text and diagrams) did not improve solve rates, likely due to the complexity of IMO diagrams and limitations in image tokenization. However, combining the multi-modal model with other models in an ensemble still enhances overall performance.
Creative Solutions and Human-Like Reasoning
AG2's solutions often exhibit superhuman creativity, discovering unconventional auxiliary constructions and elegant solutions. One notable example is its solution to IMO 2024 Problem 4.
Tokenizers and Domain-Specific Languages
Performance remains consistent across different tokenizers. Translating AG2's data into natural language yields comparable results, suggesting modern LLM tokenizers are flexible enough for mathematical tasks.
Generating Full Proofs with LMs
While AG2 relies on the symbolic engine for proof verification, its LM can generate partial solutions independently, hinting at future LMs becoming more self-sufficient in mathematical reasoning.
Conclusion
AlphaGeometry2 marks a significant milestone in AI-driven mathematical reasoning. Its neuro-symbolic architecture, enhanced domain language, optimized symbolic engine, and novel search algorithm collectively enable gold-medalist performance in solving IMO geometry problems. As research continues, AI's capabilities in mathematics and other domains are expected to grow, unlocking new possibilities for collaboration between AI and humans.
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI