DeepMind's Back-to-Back Reveals: From Aletheia to Gemini 3.1 Preview

Google DeepMind unveiled Aletheia, a math research agent that produced meaningful results on 13 open Erdős problems. With Gemini 3 dominating benchmarks and a 3.1 Preview spotted in the wild, here's the full DeepMind February roundup.

On February 11, Google DeepMind unveiled Aletheia, a dedicated AI agent for mathematical research. Named after the ancient Greek word for 'truth,' the agent immediately drew attention by producing meaningful results on 13 of the roughly 700 unsolved problems left behind by Paul Erdős. The same day brought news of Gemini 3 sweeping major benchmarks, plus the surprise appearance of a Gemini 3.1 Preview before 3.0 had even reached general availability.

1. Aletheia: How AI Tackles Unsolved Math

Figure: Aletheia's Generator-Verifier-Reviser iterative architecture

Aletheia is a custom math research agent built on top of Gemini's Deep Think mode. Rather than simply outputting answers, it works through a three-stage iterative process. First, the Generator produces candidate solutions in bulk. Then the Verifier uses natural-language reasoning to identify errors in each one. Survivors are refined by the Reviser into complete proofs. This cycle repeats dozens of times per problem, and the key feature is its ability to admit failure: when it can't solve a problem, it says so rather than presenting a flawed proof.
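The Generator-Verifier-Reviser cycle can be sketched as a small control loop. The Python below is a minimal illustration under stated assumptions, not DeepMind's implementation: the names `run_agent`, `Verdict`, and the three `toy_*` functions are hypothetical, and finding an exact integer square root stands in for proving a theorem. What it does show is the overall shape: generate in bulk, verify each candidate, revise the fixable ones, and return `None` when the round budget runs out instead of returning an unverified answer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    status: str          # "accept", "fixable", or "reject" (hypothetical labels)
    feedback: str = ""   # the verifier's explanation, passed to the reviser


def run_agent(problem: int,
              generate: Callable[[int, int], List[int]],
              verify: Callable[[int, int], Verdict],
              revise: Callable[[int, int, Verdict], int],
              max_rounds: int = 30,
              batch: int = 3) -> Optional[int]:
    """Generate, verify, revise; return None if nothing is ever accepted."""
    candidates = generate(problem, batch)
    for _ in range(max_rounds):
        survivors = []
        for cand in candidates:
            verdict = verify(problem, cand)
            if verdict.status == "accept":
                return cand                      # a verified solution
            if verdict.status == "fixable":
                survivors.append(revise(problem, cand, verdict))
            # rejected candidates are simply dropped
        # If every candidate was rejected, start over with a fresh batch.
        candidates = survivors or generate(problem, batch)
    return None                                  # admit failure


# Toy stand-ins: the "problem" is to find an integer r with r * r == n.
def toy_generate(n: int, k: int) -> List[int]:
    return [n, max(n // 2, 1), 0][:k]


def toy_verify(n: int, c: int) -> Verdict:
    if c <= 0:
        return Verdict("reject", "non-positive guess")
    if c * c == n:
        return Verdict("accept")
    return Verdict("fixable", "too high" if c * c > n else "too low")


def toy_revise(n: int, c: int, verdict: Verdict) -> int:
    return (c + n // c) // 2                     # one integer Newton step


print(run_agent(144, toy_generate, toy_verify, toy_revise))  # 12
print(run_agent(145, toy_generate, toy_verify, toy_revise))  # None
```

The second call matters as much as the first: 145 has no integer square root, so the verifier never accepts a candidate and the loop reports failure rather than its closest guess.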

Because Aletheia uses Google Search and web browsing to check existing literature directly, the hallucination problem, in which an AI cites nonexistent papers, is significantly reduced. DeepMind published two papers alongside the announcement to provide technical validation.

2. 13 Erdős Problems: What the Results Actually Mean

DeepMind fed Aletheia approximately 700 unsolved Erdős problems. The agent claimed solutions to 200 of them, and after review by mathematicians, 13 were recognized as meaningful results. Five of these were entirely new autonomous solutions: proofs the AI completed from start to finish without human assistance. The remaining eight were cases where Aletheia located previously published solutions in the existing literature.
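To put those counts in proportion, the short Python calculation below derives the rates from the figures quoted above. The 700 is an approximate count, so the percentages are rough, and the helper name `pct` is only for illustration.

```python
# Counts as reported in the article (700 is approximate).
attempted = 700    # unsolved Erdős problems given to the agent
claimed = 200      # problems the agent claimed to have solved
meaningful = 13    # claims mathematicians judged meaningful
novel = 5          # new autonomous solutions among those 13


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(pct(claimed, attempted))     # 28.6 -> share of problems with a claimed solution
print(pct(meaningful, claimed))    # 6.5  -> share of claims that survived review
print(pct(meaningful, attempted))  # 1.9  -> meaningful results per problem attempted
print(pct(novel, attempted))       # 0.7  -> novel solutions per problem attempted
```

Roughly one claim in fifteen held up under expert review, which is why the mathematicians' verification step carries so much of the weight in the headline number.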

Erdős Problem 1051 stands out as a flagship example of Aletheia's fully autonomous problem-solving. The agent also scored 91.9% on IMO-ProofBench Advanced. DeepMind isn't stopping at mathematics. It is already expanding into physics and computer science, reporting achievements that include progress on the Max-Cut problem, the disproof of a decade-old conjecture, and the discovery of new analytic solutions in cosmic string physics.

3. Mathematicians Weigh In on the 13 Solutions

Reactions from mathematicians who participated in verifying Aletheia's results have also surfaced. Professor Kim Sanghyun of the Korea Institute for Advanced Study told Yonhap News that "five or six are genuinely novel solutions" and noted that "there aren't even ten people in the world at that level of expertise," expressing high regard for Aletheia's capabilities. Professor Jung Junhyeok of Brown University also contributed to the verification work.

Professor Kim left another notable comment: "What mathematicians expect from AI isn't the right answer — it's a whisper suggesting a path humans never thought to explore." The remark frames AI not as a replacement for mathematicians but as a tool for opening new directions of inquiry.

4. Gemini 3 Sweeps the Benchmarks

The foundation behind Aletheia, Gemini 3, is also delivering dominant results as a general-purpose model. Gemini 3 Pro holds the #1 spot on LMArena with an Elo of 1501, scores 91.9% on GPQA Diamond, 76.2% on SWE-bench Verified, and leads WebDev Arena at 1487 Elo. Deep Think mode hit 41.0% on Humanity's Last Exam and 93.8% on GPQA Diamond.

Even more interesting is Gemini 3 Flash. Combining Pro-level reasoning with Flash-level efficiency, it scored 78% on SWE-bench Verified, actually surpassing Pro (76.2%), at one quarter the price. The Agentic Vision feature added on January 27 enables active image exploration. Google also announced Antigravity, an agentic development platform built on Gemini 3, signaling a serious push to expand the ecosystem.

5. Gemini 3.1 Preview Spotted — Before 3.0 GA

Also on February 11, 'Gemini 3.1 Pro Preview' quietly appeared on the Artificial Analysis Arena with no official announcement. While not confirmed by Google, its presence on an external benchmark platform makes it a 'credible breadcrumb.' The community was stunned that 3.1 was already in external testing when 3.0 hadn't even reached general availability.

Google's Logan Kilpatrick hinted that performance will improve further once GA ships, and community observers are speculating this could be a '3.5-level jump' rather than a simple stability update. If a model already ranked #1 in preview jumps another tier at GA, competitors will have serious catching up to do.

6. Project Genie and Hassabis's AI Renaissance Prediction

Figure: Genie 3 text-to-3D world generation demo

DeepMind's recent moves extend beyond Aletheia and Gemini. Project Genie, unveiled January 29, uses Genie 3 to generate interactive 3D worlds in real time from text or images. It is available to AI Ultra subscribers at $250/month, and the announcement immediately rattled gaming stocks: Unity dropped 21.6% and Roblox fell 12.3%.

In a Fortune interview the same day, CEO Demis Hassabis predicted "a new Renaissance within 10 to 15 years." His vision of 'radical abundance' describes a future where AI accelerates scientific discovery so dramatically that humanity's overall productivity fundamentally transforms. DeepMind, however, isn't the only one racing toward that renaissance.

Wrapping Up: The AI Race Intensifies

OpenAI is pushing reasoning and coding with the GPT-5 series, while Anthropic is targeting the agent market with Claude Opus 4.6. All three companies have played new cards within the same month.

The AI industry's release cycles keep shrinking, and with the previously quiet DeepMind unveiling Aletheia on the same day a Gemini 3.1 preview surfaced, that pace appears to have picked up yet another notch.
