A real-data falsification harness for 2024–2026 frontier AI-for-Science capability claims: chemistry generative models, math AI, materials discovery, drug discovery, autonomous labs, and physics ML (excluding protein folding, which is covered by the Bio/Protein ledger). ★ Bill 8 (cross-discipline-class generalization) is EMPTY across THREE substrates within one ledger (chemistry diffusion + materials GNN + math autoregressive), which is strong substrate-conditional support of B4 in the cross-ledger atlas. ★ Bill 4 (wet-lab reproduction) is PARTIAL: 10 clean autonomous-lab triggers (A-Lab Berkeley, Chai-2, RFdiffusion, PolyBot, Dyno, AbSci), so the B9 grounded-reward exception PARTIAL EXTENSION matched the current prediction. ★ Bill 11 (universal AI-scientist coverage) HOLDS empty (0/36); 41 wet-lab failure rebuttals anchor the negative side.
AI is supposed to be discovering new chemicals, materials, and proofs — we checked which discoveries actually reproduce.
Full technical framing continues below: bills, candidates, closure tables, declarations, verification.
Bills are the closure mechanisms any 2024–2026 frontier AI-for-Science capability claim must engage. The 13 bills below were predeclared in bills_draft.md v0.1 before any sweep ran, calibrated to the structure of the AI-for-Science literature (training-corpus overlap, hypothesis-vs-execution decoupling, novel-target audit, wet-lab reproduction, cross-method consensus, GDL interpretability, classical-science baselines, cross-discipline generalization, held-out post-cutoff databases, vendor / lab-card independence, universal AI-scientist coverage, dual-use safety, cost / autonomy decomposition). Bills 4, 8, 11 are ★.
Bill 8 ★ (cross-discipline generalization): 17 candidates, 0 clean — across THREE substrates within one ledger. Strong substrate-conditional support of B4 in the cross-ledger atlas.
Bill 11 ★ (universal AI-scientist coverage): 36 candidates, 0 clean. AI Co-Scientist + Sakana + Coscientist + ChemCrow + BioPlanner all degrade on ≥2 sub-tasks. 41 wet-lab failure rebuttals (Cheetham GNoME, Leeman A-Lab, Buttenschoen PoseBusters, FDA 2024) anchor the negative side.
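The EMPTY readings above follow from a simple tally: a bill is EMPTY when none of its candidates survives as a clean trigger. A minimal sketch of that tally, assuming a hypothetical `BillTally` record and a hypothetical `closure_status` helper (neither name comes from the ledger's own tooling), and using only the counts quoted above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BillTally:
    """Candidate and clean-trigger counts for one bill (hypothetical record type)."""
    bill: int
    candidates: int
    clean: int


def closure_status(tally: BillTally) -> str:
    """Return "EMPTY" when no candidate is clean, else "HAS_CLEAN_TRIGGERS".

    The second label is an assumption for illustration; the ledger's own rule
    for calling a bill PARTIAL is not reproduced here.
    """
    if not 0 <= tally.clean <= tally.candidates:
        raise ValueError("clean must lie between 0 and candidates")
    return "EMPTY" if tally.clean == 0 else "HAS_CLEAN_TRIGGERS"


# Counts quoted above for the two star bills that hold.
star_bills = [
    BillTally(bill=8, candidates=17, clean=0),
    BillTally(bill=11, candidates=36, clean=0),
]

statuses = {t.bill: closure_status(t) for t in star_bills}
print(statuses)  # {8: 'EMPTY', 11: 'EMPTY'}
```

The sketch deliberately stops at the empty / non-empty distinction, since that is all the quoted counts determine.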
The B9 grounded-reward exception is empirically validated: autonomous-lab papers (A-Lab, Chai-2, RFdiffusion, PolyBot, Dyno, AbSci, etc.) provide intervention-validated grounding via wet-lab execution. Non-autonomous-lab papers (pure chemistry generation, pure math AI, pure materials prediction) do NOT provide grounded-reward signal and pay M5. The ledger is the cross_ledger_bridges B9 PARTIAL EXTENSION confirmation point — predicted before sweep, observed after sweep.
A clean ★-bill trigger here would shift FDA / EMA / NMPA AI-discovered-drug regulatory pathway design, materials-discovery research-funding allocation cycles (DOE / NSF / EU H2026), autonomous-lab regulatory frameworks, and chemistry / biology dual-use synthesis-screening policies. The Bill 4 PARTIAL (10 autonomous-lab triggers) is already shaping autonomous-lab policy: both A-Lab Berkeley and PolyBot have published wet-lab reproducibility studies that influence current regulatory discussions. As a policy lever the ledger is material, sitting somewhere between Spacetime_Discreteness's funding-allocation lever and Factorization's federal-regulation NIST PQC lever.
The ledger tracks frontier AI-for-Science capability claims across chemistry / materials / math / drug discovery / autonomous labs / physics ML. Three distinct substrates make this ledger a strong B4 substrate-conditional test in the cross-ledger atlas.
The frontier AI-for-Science literature splits across three substrates within one ledger: chemistry diffusion (DiffDock / ChemGPT / Boltz-2), materials geometric deep learning (GNoME / MatterGen), and math autoregressive (AlphaProof / AlphaGeometry 2). Bill 8 ★ EMPTY across all three substrates is a strong B4 substrate-conditional signal in the cross-ledger atlas. Autonomous-lab papers (A-Lab Berkeley, Chai-2, RFdiffusion, PolyBot, Dyno, AbSci) are the B9 grounded-reward PARTIAL EXTENSION confirmation point.
Each ★ bill becomes a checkable trigger condition. A public update is committed within 7 days of any verified clean trigger of F4, F8, or F11.
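The 7-day commitment can be expressed as a small date calculation. The sketch below is illustrative only: `update_deadline`, `STAR_TRIGGERS`, and the example verification date are assumptions, not part of the ledger's published tooling.

```python
from datetime import date, timedelta
from typing import Optional

# Trigger IDs named in the commitment above.
STAR_TRIGGERS = frozenset({"F4", "F8", "F11"})
# Window within which a public update is committed.
UPDATE_WINDOW = timedelta(days=7)


def update_deadline(trigger_id: str, verified_on: date) -> Optional[date]:
    """Return the latest date for the public update, or None for non-star triggers."""
    if trigger_id not in STAR_TRIGGERS:
        return None
    return verified_on + UPDATE_WINDOW


# Hypothetical verification date, chosen only to show the arithmetic.
example = update_deadline("F8", date(2026, 6, 1))
print(example)  # 2026-06-08
```

Returning `None` for other trigger IDs keeps the commitment scoped to the three ★ triggers, which is all the text above promises.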
Live triggered watchlist: A-Lab Berkeley quarterly reproducibility reports · Chai-2 / RFdiffusion / Boltz-2 wet-lab follow-ups · FDA AI-drug regulatory pathway updates · Materials Project / GNoME validation · FunSearch / AlphaProof / AlphaGeometry 2 verified-theorem releases · METR / AISI / Apollo AI-for-Science audits. Monthly cadence: vendor system-card revisions + autonomous-lab releases. Quarterly: independent wet-lab verification + benchmark refreshes.
Every empirical claim resolves to public data. Run the classifier, regenerate the heatmap, audit the corpus, file a falsification.
Public draft v0.2 (2026-05-15) — 301 unique papers across 8 sweeps; ★ Bills 8 + 11 HOLD pre-Stage-3.5; Bill 4 PARTIAL with 10 autonomous-lab triggers. Real-data output from real Opus research-agent sweeps; bill counts and ★ positions emerge from the actual frontier AI-for-Science literature, not from a template. The 10 Bill 4 PARTIAL triggers are the cross_ledger_bridges B9 grounded-reward PARTIAL EXTENSION confirmation point.