Factorization Atlas
504-paper survey of integer-factorization closure across classical, quantum, and post-quantum threat models. Current reference ledger for the method.
Falsification ledgers are built to make the hard part public: what would count as a real breach, which evidence was checked, and which claims survived verification.
312-paper survey of frontier embodied-AI claims (RT-2/X, Helix/Figure 03, OpenVLA, π0/0.5, GR00T, Optimus, Waymo, Wayve, Apollo, 1X). 12 sweeps + 4 verification.
299-paper survey of low-resource and multilingual capability claims. ALL 3 ★ predicted-empty bills hold (0/33, 0/46, 0/145). 75% rebuttal density.
247-paper survey of retrieval-augmented generation closure. ALL 3 ★ empty. Bill 7 PROFOUNDLY RESCOPED to commercialization-vs-research axis.
377-paper survey of frontier image / video / audio generation. Strong B7/B8 bipolar signal: 78 closed Bill 9 vs 74 open Bill 12.
301-paper survey of AI-driven scientific-discovery claims. 2/3 ★ empty. Bill 4 PARTIAL: 10 autonomous-lab triggers as predicted. Bill 8 ★ EMPTY across 3 substrates.
291-paper survey across vLLM / SGLang / Groq / Cerebras / Triton inference stacks. Purest 0/N signal in the corpus: 0/34, 0/38, 0/20.
635-paper post-quantum lattice ledger. Kyber / Dilithium / Falcon under closure. Bills tracking ring-LWE and SIS hardness assumptions.
275-paper survey of quantum-supremacy / advantage claims. Random circuit sampling, boson sampling, Shor scaling.
280-paper survey of frontier capability claims. MMLU, MMMU, ARC-AGI, FrontierMath, LiveCodeBench saturation curves.
280-paper survey of compute-governance disclosure. Western 17% / Chinese 100% inversion documented. BIS lifetime, NIST AI RMF, EU AI Act timelines.
280-paper survey of inference-time safety / jailbreak / refusal closure. ITS patch lifecycle: 30d / 36h. Bill 14 ★: defense is a property of the deployment surface.
280-paper survey of mechanistic interpretability claims. Sparse autoencoders, feature circuits, causal abstraction, faithfulness. Bill 11 ★ evidence-bearing for Bridge 1.
417-paper survey of RLHF / DPO / Constitutional AI / Self-Rewarding alignment claims. 8 sweeps + Stage 3.5 verification.
222-record forensic survey of published math 2020-2026 against 15 EinsteinArena problems. AlphaEvolve = cross-domain lingua franca. 6 artifact-bounded, 4 published-tight.
111-record cross-ledger meta-audit — the harness pointed at itself. Bills 7★, 9★, and 12★ were predeclared empty before the audit. Seven bridges surfaced; batch-3 checks confirmed 21/21 priority claims.
388-paper quantum-gravity discreteness survey (LQG / spinfoam / CDT / causal sets / asymptotic safety / GFT / holographic / emergent gravity). The first physics falsification ledger — 4 ★ bills because the discreteness-prediction problem must independently pay both internal-consistency AND external-distinguishability closures.
SWE-bench, Cybench, browser-use, code-interpreter agents. Bills predeclared, sweep pending.
AlphaFold, RosettaFold, ESMFold, structural biology + drug-discovery overclaims. Bills tracking generative-model novelty closure.
Llama 4, Qwen3-MoE 235B, Hunyuan-Large, Mistral. Apache 2.0 ≥30B closures and distillation portability. Bill 8 ★ evidence-bearing for Bridge 3.
o1, o3, DeepSeek-R1, Sky-T1, reflection, self-consistency. Bill 6 ★ — causally-faithful reasoning trace closure.
Chinchilla, Kaplan, emergent abilities, Mamba/SSM vs dense, R1-Distill 100–1000×. Bill 11 ★ — scaling-portability closure.
CLIP, LLaVA, Qwen-VL, Sora, Veo, Imagen, PixArt. Bill 4 ★ (causally-faithful mechanism) + Bill 18 (cross-surface).
RAG / needle / KV cache / 1M context. 8,036 atlas2 hits. Strong second after rl_from_rewards.
Φ / IIT / Tononi. 1,914 atlas2 hits. Overclaim-rich. First consciousness ledger.
AlphaEvolve / NAS. 1,015 atlas2 hits. Cousin to arena_attack.
Byzantine / CRDT / Paxos. 3,512 atlas2 hits. Settled mathematics dampens closure-richness.
Seven bridges from the cross-ledger self-audit (the harness pointed at itself). Bills 7★, 9★, and 12★ were predeclared empty before the audit. Three evidence-bearing, three weakened, one untested.
Mech Interp Bill 11★, ITS Bill 11★, Reasoning Bill 6★, VLM Bill 4★, Scaling Laws Bill 5★, Agentic Bill 4★, Bio Bill 4★ all hold empty across 2,000+ LLM-domain papers. Robotics_embodied corpus support evaporated when verification killed all 4 Bill_4 grounded-reward IDs (2026-05-15).
Vendor-claim half-life 73d · ITS patch 30d / 36h · distilled-cousin 3.4mo · Sky-T1 reproduces o1-preview in 2wk · BIS 4mo lifetime · ARC-AGI v1→v2 3mo. Reported as a 30–100 day range.
ITS Bill 14★ + Open-weight Bill 8★ + VLM Bill 18 + Agentic Bill 11. Lermen-Rimsky 10× cheaper to undo safety than to install it. Defense mitigations are a property of the deployment surface, not the model.
Open-weight Bill 5★ + Scaling Laws Bill 11★ + Compute Governance Bill 11★. Halevy-Heim-Pilz 0/14 resistant; Mamba2 dense fails 0.06–0.11 on SSM; R1-Distill 100–1000× lower compute. No architectural moat — capability is fluid, only training-data novelty is sticky.
Anand-Goyal unified-VLM 0/9 · Anand-Bommasani cross-organism 0/8 · Anand-Rein unified-agent 0/9 · Halevy-Heim-Pilz distillation 0/14 · IBBIS synthesis-screened 0/4 · Yang-Bommasani cross-mixture 0/9.
Across 7 ledgers, anti-saturation is the only Bill that fires positive. Iterative reframing (ARC v1→v2→v3, MMMU→MMMU-Pro, FrontierMath Tier-1→4, LiveCodeBench monthly, Cybench Pro held-out) is empirically the only audit primitive keeping pace with the 30–100 day closure cycle.
China-domiciled vendors disclose 100% (DeepSeek, Alibaba, 01.AI all 8/8 fields); Western vendors disclose 17%. Frontier Apache 2.0 ≥30B includes Llama 4, Qwen3-MoE 235B, Hunyuan-Large. The "China = closed/risky, US = open/safe" framing would be sign-flipped if updated to current data.
Synthesis. 30+ ★ predicted-empty bills holding across 8,600+ papers is the evidence-bearing claim; the cross-ledger bridges are the interesting result. Discipline: predeclare the empty-space bills before the audit, ship verification before any breach.
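The discipline described in the synthesis (predeclare a bill, sweep the corpus, verify candidate hits, then report a 0/N-style tally) can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the harness used for these ledgers: the `Bill` class, its `record`, `verify`, `tally`, and `holds` methods, and the paper IDs are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Bill:
    """A claim predeclared before the audit (hypothetical structure)."""
    name: str
    predicted_empty: bool                       # the star marker: zero hits expected
    checked: int = 0                            # papers examined against this bill
    hits: list = field(default_factory=list)    # candidate breaches found so far

    def record(self, paper_id: str, triggers: bool) -> None:
        """Log one paper; keep its ID only if it appears to trigger the bill."""
        self.checked += 1
        if triggers:
            self.hits.append(paper_id)

    def verify(self, confirmed: set) -> None:
        """Drop candidate hits that did not survive verification."""
        self.hits = [p for p in self.hits if p in confirmed]

    def tally(self) -> str:
        """Verified hits over papers checked, in the ledgers' 0/N notation."""
        return f"{len(self.hits)}/{self.checked}"

    def holds(self) -> bool:
        """A predicted-empty bill holds only if no verified hit remains."""
        return self.predicted_empty and not self.hits


# Usage with made-up data: one candidate hit that fails verification.
bill = Bill(name="Bill 7", predicted_empty=True)
for paper_id, triggers in [("p1", False), ("p2", True), ("p3", False)]:
    bill.record(paper_id, triggers)
bill.verify(confirmed=set())
print(bill.tally(), bill.holds())   # prints "0/3 True"
```

Creating the `Bill` with `predicted_empty` fixed before any `record` call mirrors the predeclaration step, and running `verify` before reading `tally` mirrors shipping verification before announcing a breach.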