Arena-Attack Ledger · v1.0 · 2026-05-14 · LOCKED
222 records.
12 bills + 6 meta-costs.
Two signature constructions CONFIRMED EMPTY.
Forensic survey of published mathematical literature 2020–2026 against the 15 EinsteinArena problems. Asks: which arena leaders are theoretically tight, and which are artifact-bounded? 11 deep-loop sweeps spanning CRYPTO / Eurocrypt / FOCS / STOC / arXiv math.NT / math.CO / math.MG / math.NA + IACR ePrint extremal-math threads + AlphaEvolve / DeepMind / Anthropic extremal-math reports + Friedman compendium + Cohn's kissing-numbers table. ★ Bill 4 (asymmetric extremal for Heilbronn n=11) and ★ Bill 7 (Li-Yip CRT cyclic-embedding for difference-bases) CONFIRMED EMPTY; classifier hits 100%/100% on 53 hand-curated benchmark cases.
Quick Orientation
AI agents compete on 15 unsolved math problems — we asked which already had a published answer.
EinsteinArena is a public competition where AI agents try to beat the best known scores on 15 hard math problems (sphere packing, prime-counting, geometric optimization). We surveyed every relevant math paper from 2020 to 2026 — 222 records across 11 sweeps — to see whether the arena leaders were already matched by published results, or whether they represent genuinely new territory. Four arena problems turn out to be already-solved in the literature (someone just needs to type it in). Six are genuinely open frontiers where the leader is the best-known answer. The classifier we built scores 100% on a hand-curated benchmark of 53 hard cases.
Why it matters: Tells us which of the 15 arena problems are real research opportunities and which are catch-up work.
What we found: 222 records mapped. 4 arena problems already published-tight, 3 reproducible from existing work, 6 with no published match (artifact-bounded), 2 needing a deeper sweep. Two predicted-empty lines confirmed.
Full technical framing continues below: bills, candidates, closure tables, declarations, verification.
Ledger declaration · 2026-05-14 · LOCKED v1.0
4 published-tight.
3 published-reproducible.
6 artifact-bounded · 4 genuine open frontiers.
Bills are the closure mechanisms any 2020–2026 published construction must engage to BEAT an EinsteinArena leader by ≥ 2× minImprovement under the official verifier. The 12 bills below were predeclared in bills_draft.md v0.1 before any sweep ran, calibrated to the structure of the 15 arena problems (kissing-number-d11, Erdős min-overlap, autocorrelation inequalities 1/2/3-AC, min-distance-ratio-2d, prime-number-theorem, Thomson, Tammes, flat-polynomials, edges-vs-triangles, circle-packing, Heilbronn-triangles, circles-rectangle, difference-bases). Bills 4 and 7 are ★ — empty-space hypothesis predicts no 2020–2026 published paper triggers them cleanly.
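The "beat the leader by ≥ 2× minImprovement" gate described above can be sketched as a small predicate. This is a minimal sketch, assuming the verifier compares raw scores against a per-problem constant; the function name, the `lower_is_better` flag, and the `min_improvement` value used in the example are hypothetical and are not taken from the official verifier.

```python
def beats_leader(candidate: float, leader: float, min_improvement: float,
                 lower_is_better: bool = True, factor: float = 2.0) -> bool:
    """Return True if `candidate` improves on `leader` by at least
    factor * min_improvement in the direction the problem rewards."""
    gain = (leader - candidate) if lower_is_better else (candidate - leader)
    return gain >= factor * min_improvement


# Hypothetical min_improvement; the real per-problem value is not given here.
MIN_IMPROVEMENT = 1e-6

# difference-bases: the ledger speaks of scores "below" the leader's 2.639027.
print(beats_leader(2.639020, 2.639027, MIN_IMPROVEMENT))   # clears the gate
print(beats_leader(2.639026, 2.639027, MIN_IMPROVEMENT))   # improvement too small

# Heilbronn n=11: a candidate must exceed the leader's 0.036530.
print(beats_leader(0.036540, 0.036530, MIN_IMPROVEMENT, lower_is_better=False))
```

Keeping the direction as an explicit flag avoids sign mistakes when the same gate is applied to both minimization and maximization problems.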
How to read this heatmap
Counts inside each cell show candidate papers that touched a bill. A starred bill is "★ empty" only if no candidate beats the corresponding arena leader cleanly. Bill 4 (Heilbronn n=11 asymmetric) and Bill 7 (Li-Yip CRT cyclic-embedding for difference-bases) are CONFIRMED EMPTY across the 2020–2026 corpus — meaning the corresponding arena frontiers are artifact-bounded, and the next-leader construction is an original-research opportunity.
Legend: ★ CONFIRMED EMPTY across 2020–2026 corpus · Dominant (≥40) · High (20–39) · Active (5–19)
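The legend thresholds can be read as a simple bucketing rule on the per-cell candidate count. A minimal sketch follows; the function name is hypothetical, and the label for counts below 5 is an assumption, since the legend does not name that band.

```python
def heat_bucket(count: int) -> str:
    """Map a cell's candidate-paper count to its legend band."""
    if count >= 40:
        return "Dominant"
    if count >= 20:
        return "High"
    if count >= 5:
        return "Active"
    # Assumption: the legend does not label counts under 5.
    return "unlabeled"


for n in (3, 5, 19, 20, 39, 40):
    print(n, heat_bucket(n))
```

The ★ status is a separate property of a bill (no clean candidate) and is deliberately not derived from the count.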
★ Empty-space census (CONFIRMED EMPTY across 2020–2026 corpus)
Bill · Closure basis · Cands. · Clean
★ 4 · Asymmetric extremal principle (Heilbronn-triangles n=11) · ~12 · 0
For a variational problem on X that is invariant under an involution T, the infimum over the T-fixed subspace X_s is ≥ the infimum over all of X, with strict inequality iff no extremizer is T-invariant. Predicts that asymmetric configurations beat symmetric ones for n=11 Heilbronn. Atlas thought 041d125d formalizes the principle; arena lens calibration confirms the principle is NOT applied in 2024–2026 published Heilbronn papers. AlphaEvolve 2025 (arxiv:2506.13131) is relevant but doesn't apply Bill 4. Empty-space prediction CONFIRMED: no 2024–2026 paper publishes an asymmetric 11-pt Heilbronn-tri config beating 0.036530.
★ 7 · Li-Yip CRT cyclic-embedding (difference-bases) · ~8 · 0
For a difference set on Z/q1Z × Z/q2Z × ..., find an explicit cyclic embedding Z/(q1 q2 ...)Z → Z that preserves difference coverage. Asper-flagged as missing in reply #917 thread 213. Li-Yip 2025 has the abelian-product construction but no published cyclic embedding bridge. Empty-space prediction CONFIRMED: no paper in corpus provides an explicit cyclic embedding for a non-trivial finite-abelian difference set with score below the arena leader's 2.639027.
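The asymmetric extremal principle behind Bill 4 can be illustrated on a toy problem. The sketch below is not the Heilbronn objective: it uses a hand-picked function on the plane (an assumption chosen for illustration) that is invariant under the swap T(x, y) = (y, x), and compares the minimum on the T-fixed line x = y with the minimum over the whole grid.

```python
def f(x: float, y: float) -> float:
    """Toy objective, invariant under the swap (x, y) -> (y, x)."""
    return ((x - y) ** 2 - 1) ** 2 + (x + y) ** 2


# Coarse grid on [-2, 2] with step 0.05; it contains the exact minimizers.
GRID = [i / 20 for i in range(-40, 41)]

# Minimum over the whole grid: tuple (value, x, y).
full_min = min((f(x, y), x, y) for x in GRID for y in GRID)

# Minimum restricted to the T-fixed subspace x == y: tuple (value, x).
sym_min = min((f(x, x), x) for x in GRID)

print("full minimum:     ", full_min)
print("symmetric minimum:", sym_min)
```

Here the symmetric minimum is 1 (at the origin) while the unrestricted minimum is 0, attained only at the two asymmetric points (±0.5, ∓0.5), which T swaps with each other. That is the strict case of the principle: no extremizer is T-invariant, so a search restricted to symmetric configurations cannot reach the optimum.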
Bill 4 ★ (asymmetric Heilbronn n=11): CONFIRMED EMPTY. Atlas thought 041d125d formalizes the asymmetric extremal principle; arena lens calibration confirms it's not applied in 2024–2026 Heilbronn papers. AlphaEvolve 2025 is relevant but doesn't apply Bill 4. Open frontier: atlas Route 1 (10⁴+ multistart) is the right empirical attack.
Bill 7 ★ (Li-Yip CRT for difference-bases): CONFIRMED EMPTY. Li-Yip 2025 has the abelian-product construction but no cyclic embedding bridge. Asper-flagged as missing. Open frontier: Li-Yip 2025 + new bridge math required to cross the 2.639 threshold.
Bill 12 ∩ M4 (block-repeat for autocorrelation): PARTIAL — 2-AC bridged by ImprovEvolve+E 2026; 1-AC and 3-AC remain empty. Cross-grid block-repeat preserves but doesn't descend; cross-basin polish requires global optimization escape from CHRONOS basin.
Open-frontier dispatch · 4 genuine open arena attacks
The 4 genuine open frontiers identified by the lock-validated atlas, in order of attack-readiness:
1. Heilbronn n=11 asymmetric — atlas Route 1 (10⁴+ multistart) is the right empirical attack. Bill 4 ★ formalization gives the structural prediction; atlas-derived multistart provides the empirical search.
2. Difference-bases via CRT cyclic-embedding — Li-Yip 2025 + new bridge math required. Bill 7 ★ confirmed missing; the math gap is the cyclic-embedding step from Z/q1Z × Z/q2Z to Z/(q1 q2)Z preserving difference coverage.
3. 1-AC and 3-AC cross-grid block-repeat — global optimization escape from CHRONOS basin. Bill 12 ∩ M4 PARTIAL on 2-AC (ImprovEvolve+E 2026 bridged); 1-AC and 3-AC remain empty.
4. Flat-polynomials — closure mechanism unidentified; could be entirely novel construction class. Bill 8 (constant-weight codes) partial cover; M5 (unpublished/proprietary Together-AI degree-69 leader) blocks reproducibility.
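The multistart attack named in item 1 can be sketched in its simplest form: random restarts scored by the smallest triangle area. This is a minimal sketch under stated assumptions, not the atlas's Route 1 procedure (which is not described here). It assumes the score is the smallest triangle area divided by the area of the containing triangle; because that ratio is affine-invariant, a right triangle is used as the reference. All function names are hypothetical, and there is no local polishing step.

```python
import itertools
import random

TRI_AREA = 0.5  # area of the reference triangle (0,0), (1,0), (0,1)


def tri_area(p, q, r):
    """Area of triangle pqr via the cross product."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def heilbronn_score(points):
    """Smallest triangle area over all point triples, normalized by TRI_AREA."""
    smallest = min(tri_area(p, q, r)
                   for p, q, r in itertools.combinations(points, 3))
    return smallest / TRI_AREA


def random_point(rng):
    """Uniform sample inside the reference triangle (reflect across u + v = 1)."""
    u, v = rng.random(), rng.random()
    if u + v > 1.0:
        u, v = 1.0 - u, 1.0 - v
    return (u, v)


def multistart(n=11, restarts=1000, seed=0):
    """Keep the best-scoring configuration over independent random restarts."""
    rng = random.Random(seed)
    best_score, best_points = -1.0, None
    for _ in range(restarts):
        pts = [random_point(rng) for _ in range(n)]
        score = heilbronn_score(pts)
        if score > best_score:
            best_score, best_points = score, pts
    return best_score, best_points


score, config = multistart(restarts=200, seed=1)
print(f"best normalized min-area over 200 restarts: {score:.6f}")
```

Pure random restarts are unlikely to approach the leader's 0.036530; a serious attempt would add a local optimizer per start. Since the sampler places no symmetry constraint on the points, it searches the full space rather than a symmetric subspace, in line with Bill 4.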
Each arena problem is classified by whether the current leader is theoretically tight, published-reproducible, or artifact-bounded:
Tight 1 · min-distance-ratio-2d · Berthold 2601.05943
Tight 2 · tammes-problem · Székely 1974
Tight 3 · edges-vs-triangles · Razborov-Reiher
Tight 4 · circles-rectangle · Berthold N=21
Reproducible 1 · 2-AC · ImprovEvolve+E 2026
Reproducible 2 · K(11) = 594 · arena +1 above AE
Reproducible 3 · thomson-problem · Cohn-Kumar near-tight
Artifact-bounded 1 · 1-AC · MV-2010 LB ≈ leader
Artifact-bounded 2 · 3-AC · No published match
Artifact-bounded 3 · heilbronn-tri n=11 · Bill 4 ★ empty
Artifact-bounded 4 · difference-bases · Bill 7 ★ empty
Artifact-bounded 5 · flat-polynomials · No published source
Artifact-bounded 6 · circle-packing · AE/FICO frontier
Need deeper · erdos-min-overlap · sweep gap
Need deeper · prime-number-theorem · PNT LP needs DGX
4 published-tight (arena leader = published mathematical optimum), 3 published-reproducible (arena matches published, sometimes +1), 6 artifact-bounded (arena leader has NO published proof of tightness — original-research opportunity), 2 need deeper sweep (erdos-min-overlap, prime-number-theorem).
Threat model: For each of the 15 EinsteinArena problems, does the published mathematical literature (2020–2026) contain a construction that would BEAT the current arena leader by at least 2× minImprovement under the official verifier? Equivalently: is each arena frontier theoretically tight (= matches the best published lower/upper bound) or artifact-bounded (= a numerical floor with no published proof of tightness)?
Deep loops: 11 sweeps × 5–10 parallel Opus research agents per sweep × 1 batch round (LOCKED v1.0 2026-05-14).
Sources surveyed: arXiv math.NT / math.CO / math.MG / math.NA + cs.IT 2020–2026 + CRYPTO / Eurocrypt / FOCS / STOC / ITCS 2020–2026 proceedings + IACR ePrint extremal-math threads + AlphaEvolve / DeepMind / Anthropic / Google extremal-math reports 2024–2026 + Friedman compendium + Cohn's kissing-numbers table + ReplyGuy thread #213 (Asper) + classical references (Singer 1938, Brouwer-Verhoeff 1993, Cohn-Elkies 2003, Bachoc-Vallentin 2008, Cohn-Kumar 2007, Matolcsi-Vinuesa 2010, Razborov 2010, Reiher 2016, Székely 1974).
Classifier: 100.0% / 100.0% on 53 hand-curated benchmark cases. Sweep agents emit candidate bill + meta-cost + confidence per paper; hand-arbitration follows; 13 unclassified records (5.9%) are mostly out-of-scope cousins (Heilbronn-in-convex vs Heilbronn-in-triangle variants, etc.).
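The unclassified rate quoted above follows directly from the record counts; a quick arithmetic check using only the two figures stated in this ledger:

```python
records = 222        # total records across the 11 sweeps
unclassified = 13    # records left without a bill after hand-arbitration

rate_percent = round(100 * unclassified / records, 1)
print(rate_percent)  # 5.9
```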
Empty-space test: Two ★ bills (4, 7) and one ★ bill ∩ meta-cost combination (12 ∩ M4) predeclared in v0.1 BEFORE batch 1 sweeps. After 222 records across 11 sweeps + 53-case classifier benchmark + atlas lens calibration, ★ Bills 4 and 7 CONFIRMED EMPTY. Bill 12 ∩ M4 PARTIAL: 2-AC bridged by ImprovEvolve+E 2026; 1-AC and 3-AC remain empty.
Lock criteria: All satisfied: classifier 100%/100% on ≥50 cases (✓ 53 cases), watch-list ≥30 entries with cadences (✓), falsification protocol ≥10 triggers (✓ F1–F10), author-activity panel (✓), boxed declaration (✓). Unclassified rate 5.9% (passable; mostly out-of-scope cousins).
Cross-ledger coupling: capability_benchmarks (anti-saturation closure cousin); cross_ledger_bridges (B5 0/N pattern: arena-attack contributes 12-bill 0/N forensic data). Cousin to evolutionary_optimization (AlphaEvolve / NAS) and arena lens telescope.
Reproducibility: Scripts, JSONs, ledger public. Run order: sweep dispatcher → bill_classifier.py (regex rule engine, 100%/100%) → ledger populator → atlas review pipeline → human_validation_queue (35 papers needing manual eyes).
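To give a feel for what a regex rule engine in this pipeline might look like, here is a minimal sketch. It is not the contents of `bill_classifier.py`: the rule table, the patterns, and the `classify` function are all hypothetical, and the sketch emits only bill labels, whereas the sweep agents described above also emit a meta-cost and a confidence.

```python
import re

# Hypothetical rule table: a bill fires only if ALL of its patterns match.
RULES = {
    "Bill 4": [r"heilbronn", r"asymmetr"],
    "Bill 7": [r"difference[- ]bas[ei]s", r"cyclic"],
}


def classify(abstract: str) -> list[str]:
    """Return the bills whose patterns all occur in the (lowercased) abstract."""
    text = abstract.lower()
    return [bill for bill, patterns in RULES.items()
            if all(re.search(p, text) for p in patterns)]


print(classify("An asymmetric Heilbronn configuration for n = 11"))
print(classify("Cyclic embedding of a difference-basis construction"))
print(classify("Sphere packing in dimension 8"))
```

A paper that matches no rule returns an empty list, which in this sketch would correspond to an unclassified record routed to manual review.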
Each ★ bill becomes a checkable trigger condition. Public update committed within 7 days of any verified clean trigger of F4, F7, or F12∩M4.
F4 · ★ Heilbronn n=11 asymmetric
Trigger: a 2024–2026 paper that publishes an asymmetric 11-point Heilbronn-triangles configuration beating the arena leader's 0.036530 (AlphaEvolve) — with explicit asymmetric construction (not symmetry-broken numerical search) AND verified arXiv ID + reproducible coordinates.
F7 · ★ Difference-bases CRT
Trigger: a 2024–2026 paper providing explicit cyclic-embedding Z/(q1·q2·...)Z → Z preserving difference coverage for a non-trivial finite-abelian difference set — with score below the arena leader's 2.639027 AND verified arXiv ID + explicit construction.
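The gap that F7 targets can be seen on a tiny example. The sketch below is an illustration only, not the Li-Yip construction and not the arena's scoring rule; the example sets and function names are assumptions. It maps a product set in Z/3Z × Z/5Z to Z/15Z with the Chinese Remainder Theorem, checks that difference coverage survives modulo 15, and then shows that the same residues, read as ordinary integers, no longer cover every difference.

```python
from math import gcd


def crt_index(a: int, b: int, q1: int, q2: int) -> int:
    """Unique x mod q1*q2 with x ≡ a (mod q1) and x ≡ b (mod q2).
    Requires gcd(q1, q2) == 1. Uses pow(x, -1, m) (Python 3.8+)."""
    assert gcd(q1, q2) == 1
    n = q1 * q2
    return (a * q2 * pow(q2, -1, q1) + b * q1 * pow(q1, -1, q2)) % n


def covers_mod(subset, n: int) -> bool:
    """True if the differences x - y (mod n) hit every residue class."""
    return len({(x - y) % n for x in subset for y in subset}) == n


def integer_differences(subset):
    """Positive differences of the elements taken as plain integers."""
    return {x - y for x in subset for y in subset if x > y}


q1, q2 = 3, 5
A = [0, 1]       # differences cover Z/3Z
B = [0, 1, 3]    # differences cover Z/5Z
image = sorted(crt_index(a, b, q1, q2) for a in A for b in B)

print(image)                                   # [0, 1, 3, 6, 10, 13]
print(covers_mod(image, q1 * q2))              # True: the isomorphism preserves coverage
print(sorted(integer_differences(image)))      # 8 and 11 are missing
```

The CRT step is a group isomorphism, so coverage modulo q1·q2 comes for free. The lift from Z/(q1·q2)Z to Z is where coverage can break, which is the step the trigger asks a paper to make explicit.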
F12∩M4 · 1-AC / 3-AC cross-grid
Trigger: a 2024–2026 construction that escapes the CHRONOS basin via cross-grid block-repeat at 1-AC or 3-AC (2-AC already bridged by ImprovEvolve+E 2026) — with verified score below the arena leader AND no grid-locked discretization meta-cost.
F-Flat-poly
Soft trigger: identification of a closure mechanism for flat-polynomials (currently unidentified). Together-AI's degree-69 leader is M5 (unpublished/proprietary). Any published mechanism would unlock the closure pattern.
F-K(11)
Soft trigger: a 2024–2026 paper publishing a K(11) configuration with score 0 (kissing) at > 594 vectors — current arena leader at 594 is +1 above AlphaEvolve published 593, sitting on K(11) basin A 30-D8 lattice + 8·3 1-(8,3,3) glue-blocks decoded structure.
F-Erdos / PNT
Soft trigger: deeper sweep on erdos-min-overlap (Bill 9 analog) and prime-number-theorem (Mertens-LP-on-DGX). PNT LP currently OOM on local hardware; defers to DGX Spark for ≥10⁵-key support.
Live triggered watchlist: arXiv math.NT / math.CO / math.MG monthly · DeepMind AI4Math blog · Anthropic research blog · CRYPTO / Eurocrypt program committees quarterly · IACR ePrint extremal-math threads · FOCS / STOC accepted lists · any new arena leader with verifier-checkable score change ≥ 2× minImprovement. Triggered re-poll on any new arena leader.
Sister · meta-ledger
The Cross-Ledger Bridges Ledger
14th meta-ledger, harness pointed at itself. Arena-attack contributes 12-bill 0/N forensic data on Bills 4 + 7 + 12∩M4 to the B5 (0/N pattern across forensic researchers) bridge.
Sister · structural cousin
The Factorization Atlas
LOCKED v1.16, 504 papers. Structural cousin in mathematical-construction closure-pattern purity. Both small fields with proven LB / SDP duals as the dominant closure mechanisms.
Cousin · anti-saturation
The Capability Benchmarks Ledger
280 papers. Bill 18 (anti-saturation) cousin to arena-attack's empirical anti-saturation prevention through verifier-checkable score-improvement gates. Both prevent leaderboard saturation by design.
Companion
CHRONOS Discoveries
Real research findings produced by CHRONOS sessions — including K(11) basin A decode, Heilbronn Route 1 multistart attacks, atlas-derived arena leads.
All ledgers
The 23-Ledger Atlas
Browse all 23 closure-pattern ledgers — locked, wiki-populated, in-flight, bills-draft, scoping. Filter by domain, status, ★ count.
External
EinsteinArena (the target)
The 15 mathematical-arena problems this ledger audits. Current leader scores feed the watch-list; verifier-checkable score changes ≥ 2× minImprovement trigger re-polls.
Every empirical claim resolves to public data. Run the classifier, regenerate the heatmap, audit the corpus, file a falsification.
v1.0 LOCKED 2026-05-14 — 222 records across 11 sweeps; ★ Bills 4 and 7 CONFIRMED EMPTY. Classifier 100%/100% on 53 hand-curated benchmark cases. Real-data output from real Opus research-agent sweeps; bill counts and ★ confirmations emerge from the actual published mathematical literature 2020–2026, not from a template. The 4 genuine open frontiers (Heilbronn n=11 asymmetric, difference-bases via CRT, 1-AC/3-AC cross-grid, flat-polynomials closure) are arena-attack opportunities for original research.