Arena-Attack Ledger · v1.0 · 2026-05-14 · LOCKED

222 records.
12 bills + 6 meta-costs.
Two signature-empty CONFIRMED.

Forensic survey of published mathematical literature 2020–2026 against the 15 EinsteinArena problems. Asks: which arena leaders are theoretically tight, and which are artifact-bounded? 11 deep-loop sweeps spanning CRYPTO / Eurocrypt / FOCS / STOC / arXiv math.NT / math.CO / math.MG / math.NA + IACR ePrint extremal-math threads + AlphaEvolve / DeepMind / Anthropic extremal-math reports + Friedman compendium + Cohn's kissing-numbers table. ★ Bill 4 (asymmetric extremal for Heilbronn n=11) and ★ Bill 7 (Li-Yip CRT cyclic-embedding for difference-bases) CONFIRMED EMPTY; classifier hits 100%/100% on 53 hand-curated benchmark cases.

Quick Orientation

AI agents compete on 15 unsolved math problems — we asked which already had a published answer.

EinsteinArena is a public competition where AI agents try to beat the best known scores on 15 hard math problems (sphere packing, prime-counting, geometric optimization). We surveyed the relevant mathematical literature from 2020 to 2026, 222 records across 11 sweeps, to see whether the arena leaders were already matched by published results or represent genuinely new territory. Four arena problems turn out to be already solved in the literature (someone just needs to type it in). Three are reproducible from existing work. Six are artifact-bounded, with no published proof that the leader is tight, and from these we identify four genuine open frontiers. Two need a deeper sweep. The classifier we built scores 100% on a hand-curated benchmark of 53 hard cases.

Why it matters: Tells us which of the 15 arena problems are real research opportunities vs. catch-up work.
What we found: 222 records mapped. 4 arena problems already published-tight, 3 reproducible from existing work, 6 artifact-bounded (4 genuine open frontiers), 2 needing a deeper sweep. Two predicted-empty lines confirmed.

Full technical framing continues below: bills, candidates, closure tables, declarations, verification.

Ledger declaration · 2026-05-14 · LOCKED v1.0

4 published-tight.
3 published-reproducible.
6 artifact-bounded · 4 genuine open frontiers.

§01

The twelve-bill closure pattern for arena-attack constructions

Bills are the closure mechanisms any 2020–2026 published construction must engage to BEAT an EinsteinArena leader by ≥ 2× minImprovement under the official verifier. The 12 bills below were predeclared in bills_draft.md v0.1 before any sweep ran, calibrated to the structure of the 15 arena problems (kissing-number-d11, Erdős min-overlap, autocorrelation inequalities 1/2/3-AC, min-distance-ratio-2d, prime-number-theorem, Thomson, Tammes, flat-polynomials, edges-vs-triangles, circle-packing, Heilbronn-triangles, circles-rectangle, difference-bases). Bills 4 and 7 are ★ — empty-space hypothesis predicts no 2020–2026 published paper triggers them cleanly.
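The "beat the leader by ≥ 2× minImprovement" gate can be sketched as a small predicate. This is a minimal illustration, not the official verifier: the function name, the `lower_is_better` flag, and the `min_improvement` values used in the usage note are assumptions introduced here.

```python
def beats_leader(candidate: float, leader: float, min_improvement: float,
                 lower_is_better: bool = True, factor: float = 2.0) -> bool:
    """Return True if `candidate` improves on `leader` by at least
    factor * min_improvement (hypothetical gate, not the arena's code)."""
    margin = factor * min_improvement
    if lower_is_better:
        # Minimization problem: the candidate must sit below the leader.
        return leader - candidate >= margin
    # Maximization problem: the candidate must sit above the leader.
    return candidate - leader >= margin
```

For example, with an assumed `min_improvement` of `1e-6`, `beats_leader(2.60, 2.639027, 1e-6)` passes the gate, while a candidate equal to the leader does not.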

How to read this heatmap: Counts inside each cell show candidate papers that touched a bill. A starred bill is "★ empty" only if no candidate beats the corresponding arena leader cleanly. Bill 4 (Heilbronn n=11 asymmetric) and Bill 7 (Li-Yip CRT cyclic-embedding for difference-bases) are CONFIRMED EMPTY across the 2020–2026 corpus, meaning the corresponding arena frontiers are artifact-bounded and the next-leader construction is an original-research opportunity.

[Heatmap: candidate-paper counts per bill. Bills 4★ and 7★: empty, confirmed. Legend: ★ CONFIRMED EMPTY across 2020–2026 corpus · Dominant (≥40) · High (20–39) · Active (5–19)]

★ Empty-space census (CONFIRMED EMPTY across 2020–2026 corpus)

Bill · Closure basis · Cands. · Clean

★ 4 · Asymmetric extremal principle (Heilbronn-triangles n=11) · candidates ~12 · clean 0
For T-involution invariant variational problem on X, inf on T-fixed subspace X_s ≥ inf on full X; strict iff extremizer is not T-invariant. Predicts asymmetric configurations beat symmetric for n=11 Heilbronn. Atlas thought 041d125d formalizes the principle; arena lens calibration confirms the principle is NOT applied in 2024–2026 published Heilbronn papers. AlphaEvolve 2025 (arxiv:2506.13131) is relevant but doesn't apply Bill 4. Empty-space prediction CONFIRMED: no 2024–2026 paper publishes an asymmetric 11-pt Heilbronn-tri config beating 0.036530.

★ 7 · Li-Yip CRT cyclic-embedding (difference-bases) · candidates ~8 · clean 0
For difference set on Z/q1Z × Z/q2Z × ..., find explicit cyclic embedding Z/(q1 q2 ...)Z → Z preserving difference coverage. Asper-flagged as missing in reply #917 thread 213. Li-Yip 2025 has the abelian-product construction but no published cyclic embedding bridge. Empty-space prediction CONFIRMED: no paper in corpus provides explicit cyclic-embedding for non-trivial finite-abelian difference set with score below the arena leader's 2.639027.
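Written out, the principle behind Bill 4 in the census above reads as follows. This is a sketch under assumptions the ledger does not state explicitly: F denotes the objective, T is an involution that leaves F unchanged, and the infimum over the symmetric subspace is attained.

```latex
Let $T : X \to X$ satisfy $T^2 = \mathrm{id}$ and $F \circ T = F$, and let
$X_s = \{\, x \in X : T x = x \,\}$. Since $X_s \subseteq X$,
\[
  \inf_{x \in X_s} F(x) \;\ge\; \inf_{x \in X} F(x).
\]
If the infimum over $X_s$ is attained, the inequality is strict
if and only if no global minimizer of $F$ on $X$ is $T$-invariant.
```

For a maximization problem such as Heilbronn, the same statement holds with sup in place of inf and ≤ in place of ≥: restricting to symmetric configurations can only lose, and loses strictly when every optimal configuration is asymmetric.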

Bill 4 ★ (asymmetric Heilbronn n=11): CONFIRMED EMPTY. Atlas thought 041d125d formalizes the asymmetric extremal principle; arena lens calibration confirms it's not applied in 2024–2026 Heilbronn papers. AlphaEvolve 2025 is relevant but doesn't apply Bill 4. Open frontier: atlas Route 1 (10⁴+ multistart) is the right empirical attack.

Bill 7 ★ (Li-Yip CRT for difference-bases): CONFIRMED EMPTY. Li-Yip 2025 has the abelian-product construction but no cyclic embedding bridge. Asper-flagged as missing. Open frontier: Li-Yip 2025 + new bridge math required to cross the 2.639 threshold.

Bill 12 ∩ M4 (block-repeat for autocorrelation): PARTIAL — 2-AC bridged by ImprovEvolve+E 2026; 1-AC and 3-AC remain empty. Cross-grid block-repeat preserves but doesn't descend; cross-basin polish requires global optimization escape from CHRONOS basin.
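The multistart attack named for Bill 4 above can be pictured with a toy random-restart search over the Heilbronn objective (the smallest triangle area among all point triples). This is a sketch only: the container triangle, the unnormalized area, and all function names are assumptions made here, and there is no local polish step, so its scores are not comparable to the arena's 0.036530.

```python
import itertools
import math
import random

def triangle_area(p, q, r):
    """Unsigned area of the triangle pqr via the cross product."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0

def min_triangle_area(points):
    """Heilbronn objective: smallest area over all triples of points."""
    return min(triangle_area(p, q, r)
               for p, q, r in itertools.combinations(points, 3))

def random_point_in_triangle(rng, a, b, c):
    """Uniform sample in triangle abc (reflect samples that fall outside)."""
    u, v = rng.random(), rng.random()
    if u + v > 1.0:
        u, v = 1.0 - u, 1.0 - v
    return (a[0] + u * (b[0] - a[0]) + v * (c[0] - a[0]),
            a[1] + u * (b[1] - a[1]) + v * (c[1] - a[1]))

def random_multistart(n=11, starts=1000, seed=0):
    """Keep the best of `starts` random configurations (no local optimization)."""
    rng = random.Random(seed)
    # Assumed container: an equilateral triangle with unit side length.
    a, b, c = (0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)
    best_score, best_points = -1.0, None
    for _ in range(starts):
        pts = [random_point_in_triangle(rng, a, b, c) for _ in range(n)]
        score = min_triangle_area(pts)
        if score > best_score:
            best_score, best_points = score, pts
    return best_score, best_points
```

Nothing in the sampler imposes a symmetry, so the search ranges over the full configuration space rather than a T-fixed subspace. A realistic attack would follow each start with a local optimizer.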

Open-frontier dispatch · 4 genuine open arena attacks

The 4 genuine open frontiers identified by the lock-validated atlas, in order of attack-readiness:

1. Heilbronn n=11 asymmetric — atlas Route 1 (10⁴+ multistart) is the right empirical attack. Bill 4 ★ formalization gives the structural prediction; atlas-derived multistart provides the empirical search.

2. Difference-bases via CRT cyclic-embedding — Li-Yip 2025 + new bridge math required. Bill 7 ★ confirmed missing; the math gap is the cyclic-embedding step from Z/q1Z × Z/q2Z to Z/(q1 q2)Z preserving difference coverage.

3. 1-AC and 3-AC cross-grid block-repeat — global optimization escape from CHRONOS basin. Bill 12 ∩ M4 PARTIAL on 2-AC (ImprovEvolve+E 2026 bridged); 1-AC and 3-AC remain empty.

4. Flat-polynomials — closure mechanism unidentified; could be entirely novel construction class. Bill 8 (constant-weight codes) partial cover; M5 (unpublished/proprietary Together-AI degree-69 leader) blocks reproducibility.
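The gap behind frontier 2 can be seen on a toy case. The sketch below is illustrative only and does not reproduce the Li-Yip construction or the arena's scoring: it shows that the standard CRT map carries difference coverage from Z/2Z × Z/3Z to Z/6Z, while the same residues read as plain integers no longer cover every difference. All function names are assumptions introduced here.

```python
from itertools import product

def crt_pair(a, b, q1, q2):
    """Return x in [0, q1*q2) with x ≡ a (mod q1) and x ≡ b (mod q2).
    Assumes gcd(q1, q2) == 1; pow(q1, -1, q2) needs Python 3.8+."""
    inv = pow(q1, -1, q2)
    return (a + q1 * ((b - a) * inv % q2)) % (q1 * q2)

def covers_product(S, q1, q2):
    """True if differences of S hit every element of Z/q1Z x Z/q2Z."""
    diffs = {((a1 - a2) % q1, (b1 - b2) % q2)
             for (a1, b1) in S for (a2, b2) in S}
    return diffs == set(product(range(q1), range(q2)))

def covers_cyclic(C, n):
    """True if differences of C hit every residue mod n."""
    return {(x - y) % n for x in C for y in C} == set(range(n))

def integer_differences(C):
    """Positive differences of C taken in Z, with no wrap-around."""
    return {x - y for x in C for y in C if x > y}

S = {(0, 0), (1, 1), (1, 0)}                        # subset of Z/2Z x Z/3Z
C = {crt_pair(a, b, 2, 3) for (a, b) in S}          # image in Z/6Z
print(covers_product(S, 2, 3), covers_cyclic(C, 6)) # True True
print(sorted(integer_differences(C)))               # [1, 2, 3]
```

Coverage survives the group isomorphism but not the lift to the integers: the differences 4 and 5 are only reached by wrapping around mod 6. An explicit embedding into Z that keeps coverage is the step the ledger reports as unpublished.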

§02

Fifteen arena problems classified by tightness

Each arena problem is classified by whether the current leader is theoretically tight, published-reproducible, or artifact-bounded:

Tight 1 · min-distance-ratio-2d · Berthold 2601.05943
Tight 2 · tammes-problem · Székely 1974
Tight 3 · edges-vs-triangles · Razborov-Reiher
Tight 4 · circles-rectangle · Berthold N=21
Reproducible 1 · 2-AC · ImprovEvolve+E 2026
Reproducible 2 · K(11) = 594 · arena +1 above AE
Reproducible 3 · thomson-problem · Cohn-Kumar near-tight
Artifact-bounded 1 · 1-AC · MV-2010 LB ≈ leader
Artifact-bounded 2 · 3-AC · No published match
Artifact-bounded 3 · heilbronn-tri n=11 · Bill 4 ★ empty
Artifact-bounded 4 · difference-bases · Bill 7 ★ empty
Artifact-bounded 5 · flat-polynomials · No published source
Artifact-bounded 6 · circle-packing · AE/FICO frontier
Need deeper · erdos-min-overlap · sweep gap
Need deeper · prime-number-theorem · PNT LP needs DGX

4 published-tight (arena leader = published mathematical optimum), 3 published-reproducible (arena matches published, sometimes +1), 6 artifact-bounded (arena leader has NO published proof of tightness — original-research opportunity), 2 need deeper sweep (erdos-min-overlap, prime-number-theorem).
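The tally above can be checked mechanically. The mapping below restates the classification from this section as a plain dictionary; the variable names and the short class labels are assumptions made for the sketch, not the ledger's schema.

```python
from collections import Counter

# Problem -> class, copied from the classification listed in this section.
CLASSIFICATION = {
    "min-distance-ratio-2d": "tight",
    "tammes-problem": "tight",
    "edges-vs-triangles": "tight",
    "circles-rectangle": "tight",
    "2-AC": "reproducible",
    "kissing-number-d11": "reproducible",
    "thomson-problem": "reproducible",
    "1-AC": "artifact-bounded",
    "3-AC": "artifact-bounded",
    "heilbronn-tri n=11": "artifact-bounded",
    "difference-bases": "artifact-bounded",
    "flat-polynomials": "artifact-bounded",
    "circle-packing": "artifact-bounded",
    "erdos-min-overlap": "need-deeper",
    "prime-number-theorem": "need-deeper",
}

counts = Counter(CLASSIFICATION.values())
print(dict(counts), sum(counts.values()))
```

The four classes sum to the 15 arena problems, which is a quick consistency check whenever a problem is reclassified.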

§03

Method at a glance

Threat model: For each of the 15 EinsteinArena problems, does the published mathematical literature (2020–2026) contain a construction that would BEAT the current arena leader by at least 2× minImprovement under the official verifier? Equivalently: is each arena frontier theoretically tight (= matches the best published lower/upper bound) or artifact-bounded (= a numerical floor with no published proof of tightness)?

Deep loops: 11 sweeps × 5–10 parallel Opus research agents per sweep × 1 batch round (LOCKED v1.0 2026-05-14).

Sources surveyed: arXiv math.NT / math.CO / math.MG / math.NA + cs.IT 2020–2026 + CRYPTO / Eurocrypt / FOCS / STOC / ITCS 2020–2026 proceedings + IACR ePrint extremal-math threads + AlphaEvolve / DeepMind / Anthropic / Google extremal-math reports 2024–2026 + Friedman compendium + Cohn's kissing-numbers table + ReplyGuy thread #213 (Asper) + classical references (Singer 1938, Brouwer-Verhoeff 1993, Cohn-Elkies 2003, Bachoc-Vallentin 2008, Cohn-Kumar 2007, Matolcsi-Vinuesa 2010, Razborov 2010, Reiher 2016, Székely 1974).

Classifier: 100.0% / 100.0% on 53 hand-curated benchmark cases. Sweep agents emit candidate bill + meta-cost + confidence per paper; hand-arbitration follows; 13 unclassified records (5.9%) are mostly out-of-scope cousins (Heilbronn-in-convex vs Heilbronn-in-triangle variants, etc.).
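The per-paper output and the unclassified rate can be sketched as a small record type. The field names are hypothetical (the ledger's JSON schema is not shown here), and the 222 records built below are synthetic placeholders used only to reproduce the 13/222 arithmetic.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class SweepRecord:
    paper_id: str
    bill: Optional[int]        # 1..12, or None if no bill was assigned
    meta_cost: Optional[str]   # e.g. "M4", or None
    confidence: float          # agent-reported confidence in [0, 1]

def unclassified_rate(records: Sequence[SweepRecord]) -> float:
    """Percentage of records with neither a bill nor a meta-cost."""
    if not records:
        return 0.0
    unclassified = sum(1 for r in records
                       if r.bill is None and r.meta_cost is None)
    return 100.0 * unclassified / len(records)

# Synthetic placeholders: 13 unclassified out of 222, as reported above.
records = [SweepRecord(f"placeholder-{i}", None if i < 13 else 1, None, 0.5)
           for i in range(222)]
print(round(unclassified_rate(records), 1))  # 5.9
```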

Empty-space test: Two ★ bills (4, 7) and one ★ bill ∩ meta-cost combination (12 ∩ M4) predeclared in v0.1 BEFORE batch 1 sweeps. After 222 records across 11 sweeps + 53-case classifier benchmark + atlas lens calibration, ★ Bills 4 and 7 CONFIRMED EMPTY. Bill 12 ∩ M4 PARTIAL: 2-AC bridged by ImprovEvolve+E 2026; 1-AC and 3-AC remain empty.

Lock criteria: All satisfied: classifier 100%/100% on ≥50 cases (✓ 53 cases), watch-list ≥30 entries with cadences (✓), falsification protocol ≥10 triggers (✓ F1–F10), author-activity panel (✓), boxed declaration (✓). Unclassified rate 5.9% (passable; mostly out-of-scope cousins).

Cross-ledger coupling: capability_benchmarks (anti-saturation closure cousin). cross_ledger_bridges (B5 0/N pattern; arena-attack contributes 12-bill 0/N forensic data). Cousin to evolutionary_optimization (AlphaEvolve / NAS) and arena lens telescope.

Reproducibility: Scripts, JSONs, ledger public. Run order: sweep dispatcher → bill_classifier.py (regex rule engine, 100%/100%) → ledger populator → atlas review pipeline → human_validation_queue (35 papers needing manual eyes).
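A regex rule engine of the kind named in the run order can be sketched in a few lines. The rules, function names, and benchmark strings below are invented for illustration and are not taken from bill_classifier.py, whose source is not reproduced here.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rules: (bill number, pattern). First match wins.
RULES: List[Tuple[int, "re.Pattern[str]"]] = [
    (4, re.compile(r"heilbronn.*asymmetric|asymmetric.*heilbronn", re.I)),
    (7, re.compile(r"cyclic[- ]embedding", re.I)),
    (12, re.compile(r"block[- ]repeat", re.I)),
]

def classify(text: str) -> Optional[int]:
    """Return the first bill whose pattern matches, else None (unclassified)."""
    for bill, pattern in RULES:
        if pattern.search(text):
            return bill
    return None

def accuracy(cases: List[Tuple[str, Optional[int]]]) -> float:
    """Percentage of (text, expected_bill) cases the rules get right."""
    hits = sum(1 for text, expected in cases if classify(text) == expected)
    return 100.0 * hits / len(cases)

# Made-up benchmark cases, only to show the scoring loop.
cases = [
    ("An asymmetric Heilbronn configuration for eleven points", 4),
    ("Explicit cyclic embedding of an abelian difference set", 7),
    ("Block-repeat upsampling for autocorrelation bounds", 12),
    ("A survey of convex Heilbronn variants", None),
]
print(accuracy(cases))
```

Ordering matters in a first-match engine, so a paper touching several bills gets only the earliest rule. A production classifier would presumably emit multiple bills plus a confidence, as the sweep agents do.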

§04

Falsification protocol

Each ★ bill becomes a checkable trigger condition. Public update committed within 7 days of any verified clean trigger of F4, F7, or F12∩M4.

F4 · ★ Heilbronn n=11 asymmetric

Trigger: a 2024–2026 paper that publishes an asymmetric 11-point Heilbronn-triangles configuration beating the arena leader's 0.036530 (AlphaEvolve) — with explicit asymmetric construction (not symmetry-broken numerical search) AND verified arXiv ID + reproducible coordinates.

F7 · ★ Difference-bases CRT

Trigger: a 2024–2026 paper providing explicit cyclic-embedding Z/(q1·q2·...)Z → Z preserving difference coverage for a non-trivial finite-abelian difference set — with score below the arena leader's 2.639027 AND verified arXiv ID + explicit construction.

F12∩M4 · 1-AC / 3-AC cross-grid

Trigger: a 2024–2026 construction that escapes the CHRONOS basin via cross-grid block-repeat at 1-AC or 3-AC (2-AC already bridged by ImprovEvolve+E 2026) — with verified score below the arena leader AND no grid-locked discretization meta-cost.

F-Flat-poly

Soft trigger: identification of a closure mechanism for flat-polynomials (currently unidentified). Together-AI's degree-69 leader is M5 (unpublished/proprietary). Any published mechanism would unlock the closure pattern.

F-K(11)

Soft trigger: a 2024–2026 paper publishing a K(11) configuration with score 0 (kissing) at > 594 vectors — current arena leader at 594 is +1 above AlphaEvolve published 593, sitting on K(11) basin A 30-D8 lattice + 8·3 1-(8,3,3) glue-blocks decoded structure.

F-Erdos / PNT

Soft trigger: deeper sweep on erdos-min-overlap (Bill 9 analog) and prime-number-theorem (Mertens-LP-on-DGX). PNT LP currently OOM on local hardware; defers to DGX Spark for ≥10⁵-key support.
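The two hard triggers, F4 and F7, are conjunctions of checkable conditions, which a small predicate makes explicit. This is a sketch under assumptions: the `Claim` fields are hypothetical, "beating" is read as a larger score for Heilbronn and a smaller score for difference-bases, and a non-empty arXiv ID stands in for the manual verification the protocol requires.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    year: int
    score: float
    arxiv_id: Optional[str]            # presence only; verification is manual
    explicit_construction: bool        # not a symmetry-broken numerical search
    reproducible_coordinates: bool = False

def in_window(c: Claim) -> bool:
    """Triggers only consider 2024-2026 publications."""
    return 2024 <= c.year <= 2026

def f4_triggered(c: Claim, leader: float = 0.036530) -> bool:
    """F4: asymmetric 11-point Heilbronn config beating the leader (assumed: higher wins)."""
    return (in_window(c) and c.score > leader and c.explicit_construction
            and c.arxiv_id is not None and c.reproducible_coordinates)

def f7_triggered(c: Claim, leader: float = 2.639027) -> bool:
    """F7: explicit cyclic embedding with score below the leader."""
    return (in_window(c) and c.score < leader and c.explicit_construction
            and c.arxiv_id is not None)
```

A claim that improves the score but lacks an explicit construction or reproducible coordinates leaves the trigger unfired, which matches the protocol's requirement for a clean trigger rather than a score change alone.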

Live triggered watchlist: arXiv math.NT / math.CO / math.MG monthly · DeepMind AI4Math blog · Anthropic research blog · CRYPTO / Eurocrypt program committees quarterly · IACR ePrint extremal-math threads · FOCS / STOC accepted lists · any new arena leader with verifier-checkable score change ≥ 2× minImprovement. Triggered re-poll on any new arena leader.

§05

Resources & further reading

Sister · meta-ledger

The Cross-Ledger Bridges Ledger

14th meta-ledger, harness pointed at itself. Arena-attack contributes 12-bill 0/N forensic data on Bills 4 + 7 + 12∩M4 to the B5 (0/N pattern across forensic researchers) bridge.

Sister · structural cousin

The Factorization Atlas

LOCKED v1.16, 504 papers. Structural cousin in mathematical-construction closure-pattern purity. Both small fields with proven LB / SDP duals as the dominant closure mechanisms.

Cousin · anti-saturation

The Capability Benchmarks Ledger

280 papers. Bill 18 (anti-saturation) cousin to arena-attack's empirical anti-saturation prevention through verifier-checkable score-improvement gates. Both prevent leaderboard saturation by design.

Companion

CHRONOS Discoveries

Real research findings produced by CHRONOS sessions — including K(11) basin A decode, Heilbronn Route 1 multistart attacks, atlas-derived arena leads.

All ledgers

The 23-Ledger Atlas

Browse all 23 closure-pattern ledgers — locked, wiki-populated, in-flight, bills-draft, scoping. Filter by domain, status, ★ count.

External

EinsteinArena (the target)

The 15 mathematical-arena problems this ledger audits. Current leader scores feed the watch-list; verifier-checkable score changes ≥ 2× minImprovement trigger re-polls.

§R

Reproducibility & data

Every empirical claim resolves to public data. Run the classifier, regenerate the heatmap, audit the corpus, file a falsification.

Corpus JSON

_classified.json

222 records · classified output across 11 sweeps · 12 bills + 6 meta-costs · classifier 100%/100% on 53 hand-curated benchmark cases · 5.9% unclassified (out-of-scope cousins)

Bill definitions v0.1

bills_draft.md

12 bills + 6 meta-costs + ★ Bills 4, 7 empty-space hypothesis + Bill 12 ∩ M4 combination · predeclared before any sweep · bill-problem mapping matrix included

Threat model

purpose.md

Verbatim threat model, scope, 15 arena problems with current leaders + last-improvement age (atlas snapshot 2026-05-13), source-by-source watch tiers

Data index

data/arena_attack/

Sweep JSONs (001–010 + heilbronn / alphaevolve / li-yip / friedman variants), classified output, bill_classifier.py source, 53-case benchmark, wiki/ Obsidian-compatible vault

v1.0 LOCKED 2026-05-14 — 222 records across 11 sweeps; ★ Bills 4 and 7 CONFIRMED EMPTY. Classifier 100%/100% on 53 hand-curated benchmark cases. Real-data output from real Opus research-agent sweeps; bill counts and ★ confirmations emerge from the actual published mathematical literature 2020–2026, not from a template. The 4 genuine open frontiers (Heilbronn n=11 asymmetric, difference-bases via CRT, 1-AC/3-AC cross-grid, flat-polynomials closure) are arena-attack opportunities for original research.

v1.0 LOCKED · 2026-05-14

Two signature constructions CONFIRMED EMPTY.
222 records across 11 sweeps.
4 genuine open frontiers identified.