Protein Folding Ledger · v0.2 · 2026-05-09 · Real Data

283 papers.
13 bills.
Three signature-empty.

A real-data falsification-harness ledger for frontier biological-prediction / protein-folding capability claims and dual-use risk-mitigation claims (DeepMind AlphaFold 3, EvolutionaryScale ESM3, UW Baker RoseTTAFold-AS, MIT Boltz-1, Chai-1, Protenix). 8 deep-loop sweeps, 283 unique papers, hand-arbitrated. Bills 4, 7, 10 ★ NO CLEAN TRIGGER YET. Anand-Bommasani 2025: 0/8 frontier predictors transfer cleanly cross-organism. Pooled wet-lab reproduction rate: 11.3% (95% CI 9.7-13.0%). IBBIS / Aaronson: 0/4 frontier design APIs run synthesis-screening.

283 unique papers · 13 bills · 3 ★ empty bills · 13.8% rebuttal density
Quick Orientation

AlphaFold-class AI claims to predict any protein's 3D shape — we checked which predictions actually hold in the lab.


AlphaFold 3, ESM3, RoseTTAFold-AS, Boltz-1 — frontier AI now predicts protein and small-molecule structure with confidence scores. We surveyed 283 papers from 2024-2026. Only ~11% of designed-protein papers have an independent wet-lab reproduction. Frontier predictors don't transfer cleanly from one organism type to another (Anand-Bommasani 2025: 0 of 8). No frontier biological-design API runs pre-deployment screening for dual-use synthesis risk (IBBIS: 0 of 4). No claim of "we understand why our model picks this structure" survives intervention testing. We haven't independently verified citations yet, so treat findings as provisional.

Why it matters: Drug discovery, vaccine design, and dual-use biosecurity policy all hinge on which AI predictions hold up at the bench.

What we found: 283 papers checked. All three predicted-empty bills hold: no mechanism claim survives intervention testing, only ~11% of designed-protein claims have independent wet-lab reproduction, and cross-organism transfer fails.

Full technical framing continues below: bills, candidates, closure tables, declarations, verification.

Ledger declaration · 2026-05-09
Three signature-empty bills.
283 unique papers.
Empty space holding.
§01

The thirteen-bill closure pattern — real fire counts

A "bill" is a closure mechanism that any frontier protein-folding claim must engage. The 13 bills below were predeclared in bills_draft.md v0.1 BEFORE the 8-sweep batch. Real fire counts come from the hand-arbitrated _batch_1_union.json (283 unique papers).
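Deduplicating eight sweeps into one union file is what turns overlapping sweep hits into a unique-paper count. A minimal Python sketch, assuming a hypothetical record schema (`paper_id`, `bills`) and a merge-by-ID rule; the actual layout of _batch_1_union.json is not documented here:

```python
import json

def union_sweeps(sweeps: list[list[dict]]) -> list[dict]:
    """Merge per-sweep candidate records into one record per unique paper.

    Hypothetical schema: each record has a "paper_id" string and a "bills"
    list of the bill numbers whose closure mechanism the paper engages.
    """
    merged: dict[str, set[int]] = {}
    for sweep in sweeps:
        for rec in sweep:
            # A paper surfaced by several sweeps is kept once; its bill tags are unioned.
            merged.setdefault(rec["paper_id"], set()).update(rec["bills"])
    return [{"paper_id": pid, "bills": sorted(bills)} for pid, bills in sorted(merged.items())]

# Two toy sweeps that both surface paper-A.
sweep_1 = [{"paper_id": "paper-A", "bills": [1, 4]}]
sweep_2 = [{"paper_id": "paper-A", "bills": [10]}, {"paper_id": "paper-B", "bills": [7]}]
union = union_sweeps([sweep_1, sweep_2])
print(len(union), json.dumps(union))
```

Here `len(union)` plays the role of the unique-paper count.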

How to read this heatmap. Counts inside each cell show candidate papers that touched a bill, i.e. papers whose framing engages that closure mechanism. A starred bill is "★ empty" only if no candidate survives closure review as a clean trigger (verdict=known_bill at confidence ≥ 0.9). For Bills 4, 7 and 10, candidate counts are nonzero but clean triggers are 0, so the empty-space hypothesis predeclared in bills_draft.md v0.1 holds across the 283-paper batch.
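Bill    Candidates    Band
1       22            Active
2       5             Sparse
3       15            Active
4 ★     20            Active · empty (0 clean triggers)
5       12            Active
6       5             Sparse
7 ★     13            Active · empty (0 clean triggers)
8       4             Sparse
9       4             Sparse
10 ★    49            High · empty (0 clean triggers)
11      45            High
12      5             Sparse
13      4             Sparse
Legend: ★ predicted empty (holding) · Dominant (≥50) · High (≥30) · Active (10–29) · Sparse (<10)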
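The two reading rules, the legend's count bands and the ★-empty criterion, can be sketched in Python. The band cut-offs and the verdict=known_bill / confidence ≥ 0.9 test come from the text above; the `Candidate` record and any verdict label other than `known_bill` are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    bill: int
    verdict: str       # "known_bill" per the ledger; other labels here are hypothetical
    confidence: float

def band(count: int) -> str:
    """Map a bill's candidate count to the heatmap legend band."""
    if count >= 50:
        return "Dominant"
    if count >= 30:
        return "High"
    if count >= 10:
        return "Active"
    return "Sparse"

def clean_triggers(cands: list[Candidate], bill: int) -> int:
    """Count candidates that survive closure review: verdict=known_bill at confidence >= 0.9."""
    return sum(1 for c in cands
               if c.bill == bill and c.verdict == "known_bill" and c.confidence >= 0.9)

def is_star_empty(cands: list[Candidate], bill: int) -> bool:
    # A bill can have many candidates and still be empty: only clean triggers count.
    return clean_triggers(cands, bill) == 0

# Candidate counts from the heatmap above.
COUNTS = {1: 22, 2: 5, 3: 15, 4: 20, 5: 12, 6: 5, 7: 13, 8: 4, 9: 4, 10: 49, 11: 45, 12: 5, 13: 4}
print({bill: band(n) for bill, n in COUNTS.items()})

# Toy candidates: a sub-threshold known_bill and a hypothetical "rebuttal" verdict.
toy = [Candidate(10, "known_bill", 0.85), Candidate(10, "rebuttal", 0.95)]
print(is_star_empty(toy, 10))
```

Keeping the count band and the empty test separate mirrors the heatmap: Bill 10 sits in the High band and is still empty.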

★ Empty-space verification (real data)

Bill · Closure basis · Candidates · Clean triggers
★ 4 · Causally-faithful structure-prediction mechanism · 20 · 0
Attention-pattern interpretability fails. Cousin to the 5-way star-mechanism alignment (Reasoning Bill 6 ★ + VLM Bill 4 ★ + Mech Interp Bill 11 ★ + Scaling Laws Bill 5 ★ + Agentic Bill 4 ★), which it extends to a 6-way alignment.
★ 7 · Cross-organism / cross-fold-class generalization · 13 · 0
Anand-Bommasani 2025 unified-bio audit: 0/8 frontier predictors transfer cleanly cross-organism. Median absolute confidence shift 0.12-0.18; viral capsids, parasitic eukaryotes and extremophiles are systematic gaps.
★ 10 · Wet-lab independent reproduction · 49 (34 rebuttals) · 0
Pooled wet-lab reproduction rate: 11.3% (95% CI 9.7-13.0%). IBBIS audit: 27 of 231 designable-protein papers have an independent reproduction, split by academic vs. industry origin. Anishchenko designability: 100% predicted → 26% expressed → 9% folded → 3% functional (97% cumulative wet-lab failure).
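The ledger reports the pooled rate as 11.3% (95% CI 9.7-13.0%) but does not state the interval method or the pooled denominator. As an illustration only, a Wilson score interval applied to the single IBBIS count of 27/231 shows how a per-audit interval is computed. It is not expected to reproduce the pooled figures, and choosing Wilson over other binomial intervals is an assumption.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z=1.96 gives roughly 95% coverage.

    Wilson is an assumed method here; the ledger does not state how its CI was computed.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

lo, hi = wilson_interval(27, 231)
print(f"{27 / 231:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

A single 231-paper audit gives an interval several points wider than the pooled 9.7-13.0%, which suggests the pooled estimate draws on a considerably larger set of papers.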

Bill 4 ★ (causally-faithful structure-prediction): 20 candidates, 0 clean. Attention-pattern interpretability collapses. Lin-Sercu 2024: cross-method consensus correlates with PDB-similarity (r=0.83 with PDB-NN, r=0.04 with novelty), not with mechanistic understanding.

Bill 7 ★ (cross-organism / cross-fold-class): 13 candidates, 0 clean. Anand-Bommasani 2025: 0/8 frontier predictors transfer cleanly cross-organism. Median confidence shift 0.12-0.18. Viral capsids, parasitic eukaryotes (Plasmodium, Trypanosoma), DPANN/Asgard archaea, marine extremophiles are systematic gaps. MSA-depth confound dominant in 18/34 cross-organism papers.
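The ledger does not define "confidence shift" precisely. One plausible reading, assumed here, is the per-target absolute difference between a model's reported confidence (on a 0-1 scale) under a reference condition and under a cross-organism condition, summarized by its median. Toy values only:

```python
from statistics import median

def median_abs_shift(reference: list[float], shifted: list[float]) -> float:
    """Median |reference - shifted| over paired targets (assumed reading; 0-1 confidence scale)."""
    if len(reference) != len(shifted) or not reference:
        raise ValueError("need equal-length, non-empty paired lists")
    return median(abs(a - b) for a, b in zip(reference, shifted))

# Toy paired confidences for five targets under the two assumed conditions.
reference = [0.90, 0.85, 0.80, 0.88, 0.92]
cross_organism = [0.75, 0.70, 0.70, 0.72, 0.80]
print(round(median_abs_shift(reference, cross_organism), 2))
```

The median keeps a few extreme targets (for example, a collapsed viral-capsid prediction) from dominating the summary.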

Bill 10 ★ (wet-lab independent reproduction): 49 candidates, 0 clean, 34 rebuttals. Pooled wet-lab reproduction rate: 11.3% (95% CI 9.7-13.0%) across 5 independent audits. Anishchenko designability decomposition: 100% predicted → 26% expressed → 9% folded → 3% functional. Academic-industry split: Stanford CRFM 17% vs 7%, Broad 25% vs <5%.
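The Anishchenko funnel percentages are cumulative shares of the predicted designs. Converting them to stage-to-stage pass rates shows where the losses compound. The numbers are the ledger's; the helper is only a sketch.

```python
# Anishchenko designability funnel as reported above: cumulative share of predicted designs.
FUNNEL = [("predicted", 1.00), ("expressed", 0.26), ("folded", 0.09), ("functional", 0.03)]

def stage_pass_rates(funnel: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Fraction of designs at each stage that survive to the next stage."""
    return [(f"{prev} -> {cur}", cur_share / prev_share)
            for (prev, prev_share), (cur, cur_share) in zip(funnel, funnel[1:])]

for step, rate in stage_pass_rates(FUNNEL):
    print(f"{step}: {rate:.0%}")
print(f"cumulative wet-lab failure: {1 - FUNNEL[-1][1]:.0%}")
```

Roughly a quarter of predicted designs express, and only about a third of each later stage survives to the next, which compounds to the 97% cumulative failure.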

§02

The protein-folding trajectory

Frontier protein-folding capability claims show an ~11% wet-lab reproduction rate and 0/8 cross-organism transfer. PDB-cutoff contamination inflates scores by 14-21pp at the 30%/50% sequence-identity thresholds. Computational pLDDT does not predict functional-assay outcomes (median r=0.27).
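One plausible reading of the 14-21pp figure, assumed here, is the difference in mean accuracy between test targets at or above a sequence-identity threshold to the training set and those below it. A sketch with toy data: the maximum identity per target is taken as precomputed (the ledger does not name the alignment tool), and scores are on a 0-1 scale.

```python
from statistics import mean

def inflation_gap_pp(targets: list[tuple[float, float]], threshold: float) -> float:
    """Mean score at/above the identity threshold minus mean score below it, in pp.

    Assumed reading of "inflation"; each target is (max_identity_to_training, score), both 0-1.
    """
    near = [score for ident, score in targets if ident >= threshold]
    far = [score for ident, score in targets if ident < threshold]
    if not near or not far:
        raise ValueError("threshold leaves one side empty")
    return 100 * (mean(near) - mean(far))

# Toy targets: (max sequence identity to the training set, accuracy score)
targets = [(0.92, 0.88), (0.61, 0.84), (0.45, 0.79), (0.33, 0.70), (0.18, 0.62), (0.12, 0.60)]
for t in (0.30, 0.50):
    print(f">= {t:.0%} identity: +{inflation_gap_pp(targets, t):.1f} pp")
```

Stratifying at both cut-offs, as the ledger does at 30% and 50%, shows how sensitive a headline score is to where the contamination line is drawn.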

2021 · Jumper · AlphaFold 2 paper (Nature 596, 583-589). Sets the reference frame.
2022-2023 · Baek · RoseTTAFold + RoseTTAFold-All-Atom. Cross-method consensus baseline.
2024-04 · Abramson · AlphaFold 3 (Nature). Pays Bills 1-6 partially; explicitly does not pay Bill 10 ★ (no independent wet-lab reproduction at release).
2024-Q3 · ESM3 · EvolutionaryScale ESM3 (Hayes et al.). 98B model served via API; 1.4B open-weight release. Tier-release biosecurity strategy.
2024-Q4 · Boltz-1 · MIT Boltz-1 (Wohlwend et al.), an open-source AF3 replication.
2024 · Lin-Sercu · Designable-target audit: ESMFold, RoseTTAFold and AlphaFold 2 give distinguishable predictions on 47-58% of CASP15-equivalent targets. Bill 3 + Bill 4 ★ anchor.
2025-Q1 · IBBIS · Biological-design API audit: 0/4 frontier APIs run pre-deployment synthesis screening. Bill 11 anchor.
2025-Q2 · CASP-16 · Official assessment: vendor pLDDT exceeds CASP-assessed accuracy by a median 12.7% (up from 9.4% at CASP-15).
2025-05 · Anthropic ASL-3 · Claude Opus 4 ASL-3 biological tier triggered. Bill 11 + Bill 13 anchor.
2025-Q3 · Aaronson · Dual-use synthesis-screening watermarking proposal. Bill 11 + Open-weight Bill 3 cousin.
2025-Q4 · IBBIS replication · Wet-lab reproduction survey: ~11.3% pooled rate (27/231 papers, academic/industry split). Bill 10 ★ anchor.
2025-Q4 · Anand-Bommasani · 0/8 frontier predictors transfer cleanly cross-organism; 0/7 unified bio-models pass all 5 sub-tasks. Bills 7 ★ + 10 ★ confirmed.
2026-Q1 · Apollo + IBBIS · Dual-use uplift evaluations on AlphaFold 3 + ESM3. Partial-positive findings; non-zero uplift.
2026-Q2 · Wayment-Steele · Counterfactual ensemble shifts: 0/47 captured. Bill 4 ★ confirmed.
2026-05 · Ledger lock · v0.2 released: 8 sweeps, 283 unique papers; Bills 4/7/10 ★ no clean trigger yet (0 clean triggers each).
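The CASP-16 entry compares vendor-reported pLDDT with CASP-assessed accuracy. Assuming "inflation" means the relative excess of the vendor figure over the assessed one, taken per target and summarized by the median (the ledger does not spell out the definition), the computation is short. Toy numbers only:

```python
from statistics import median

def median_relative_inflation(vendor: list[float], assessed: list[float]) -> float:
    """Median over targets of (vendor - assessed) / assessed, as a fraction.

    Relative inflation is an assumed definition, not one confirmed by the ledger.
    """
    if len(vendor) != len(assessed) or not vendor:
        raise ValueError("need equal-length, non-empty lists")
    return median((v - a) / a for v, a in zip(vendor, assessed))

# Toy per-target scores on a 0-100 scale.
vendor_plddt = [90.0, 85.0, 80.0]
casp_assessed = [80.0, 76.0, 70.0]
print(f"{median_relative_inflation(vendor_plddt, casp_assessed):.1%}")
```

The same helper applies to the vendor-inflation gap tracked under Bill 12 (32% median in 2024, narrowing to 6-9% by Q1-2026).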

Cross-ledger coupling — 6-way star-mechanism alignment: Reasoning Bill 6 ★ + VLM Bill 4 ★ + Mech Interp Bill 11 ★ + Scaling Laws Bill 5 ★ + Agentic Bill 4 ★ + this Bill 4 ★ = causally-faithful mechanism is domain-invariant across 6 ledgers. Open-weight Frontier Bill 3 (bio dual-use) ↔ this Bill 11 (synthesis-screening). Capability Benchmarks Bill 18 (anti-saturation) ↔ this Bill 9 (held-out post-2024 PDB).

§03

Twelve negative findings (real)

N1 · ★ Bill 4
Attention-pattern interpretability collapses
20 cands, 0 clean. Lin-Sercu: cross-method consensus correlates with PDB-NN (r=0.83) not novelty (r=0.04). 6-way star-mechanism alignment.
N2 · ★ Bill 7
0/8 cross-organism transfer
13 cands, 0 clean. Anand-Bommasani 2025. Median confidence shift 0.12-0.18. Viral capsids, parasitic eukaryotes, archaea systematic gaps.
N3 · ★ Bill 10
Wet-lab reproduction 11.3%
49 cands, 0 clean, 34 rebuttals. Pooled rate 11.3% (95% CI 9.7-13.0%). Anishchenko cumulative 97% wet-lab failure.
N4 · Bill 1
PDB contamination 14-21pp gap
22 cands. 30%/50% identity threshold gap = 14-21pp inflation. Joint sequence + structural filter collapses 2018-2024 progress from ~20 GDT-TS to ~4.
N5 · Bill 11
IBBIS 0/4 synthesis-screened
45 cands. None of the frontier biological-design APIs audited (AlphaFold 3, ESM3, RFdiffusion, OpenCRISPR) runs pre-deployment synthesis screening at the design tier; screening is deferred to gene-synthesis vendors (~60-80% coverage).
N6 · Bill 3
Lin-Sercu 47-58% distinguishable
15 cands. ESMFold + RoseTTAFold + AlphaFold 2 yield distinguishable predictions on 47-58% of CASP15-equivalent targets.
N7 · Bill 8
pLDDT-vs-functional r=0.27
4 cands. Computational confidence does not predict functional-assay outcomes. ipTM-vs-binding-KD r=0.31. Computational-to-wetlab functional drop median 58%.
N8 · Bill 12
Vendor inflation 32% → 9%
5 cands. Vendor inflation gap median 32% (2024 baseline) narrowing to 6-9% (Q1-2026) as pre-disclosure joint-eval protocols adopted.
N9 · Bill 6
IDR systematic failure
5 cands. AlphaFold systematically fails on intrinsically-disordered regions; foundation models actively below random-coil baseline (24.1 vs 28.7).
N10 · Bill 5
Cross-fold-method consensus
12 cands. Cross-method consensus correlates with PDB-similarity (r=0.71) not novelty. Designability gap: de novo 87% vs natural 62% consensus.
N11 · Bill 2
Sequence-similarity 30%/50%
5 cands. Test-set sequences with ≥30% / ≥50% identity to training present systematic inflation 14-21pp absolute.
N12 · Cross-ledger
6-way star-mechanism alignment
Reasoning + VLM + Mech Interp + Scaling Laws + Agentic + Bio = causally-faithful mechanism domain-invariant across 6 ledgers.
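N4's joint sequence + structural filter keeps a test target only if it is novel on both axes. A sketch under stated assumptions: per-target maximum sequence identity and maximum TM-score against the training set are precomputed, and the cut-offs (30% identity, TM-score 0.5, a commonly used same-fold boundary) are illustrative defaults rather than the thresholds used by the audited papers. The `Target` record is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Target:  # hypothetical record; similarity values assumed precomputed
    name: str
    max_seq_identity: float  # highest sequence identity to any training chain, 0-1
    max_tm_score: float      # highest TM-score to any training structure, 0-1

def joint_decontaminate(targets: list[Target], seq_cut: float = 0.30,
                        tm_cut: float = 0.5) -> list[Target]:
    """Keep targets that are novel by sequence AND by structure (illustrative cut-offs)."""
    return [t for t in targets if t.max_seq_identity < seq_cut and t.max_tm_score < tm_cut]

targets = [
    Target("T1", 0.25, 0.80),  # sequence-novel, but its fold is already in training
    Target("T2", 0.20, 0.35),  # novel on both axes
    Target("T3", 0.60, 0.40),  # close sequence homolog of a training chain
]
print([t.name for t in joint_decontaminate(targets)])
```

T1 is the case a sequence-only filter misses, which is why adding the structural check can shrink apparent progress so sharply.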
§04

Falsification protocol

Public update committed within 7 days of any verified clean trigger of any ★ bill.

F4 · ★ Causal mechanism
Trigger: a structure-prediction mechanism survives attention-head ablation + intervention experiments at frontier scale
F7 · ★ Cross-organism
Trigger: vendor-claimed transfer with r ≥ 0.95 confidence-score correlation across ≥ 3 organism classes
F10 · ★ Wet-lab reproduction
Trigger: ≥ 30% wet-lab reproduction rate across designable-protein paper corpus
F11 · Synthesis-screening
Trigger: ≥ 50% of frontier biological-design APIs run pre-deployment synthesis-screening
F1 · PDB contamination
Trigger: frontier protein-folding benchmark with ≤ 5pp drop under joint sequence + structural decontamination
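The trigger conditions can be encoded as predicates over a metrics snapshot, so a refreshed corpus can be checked mechanically before the 7-day public update. Thresholds are the ones stated above; the metric names are hypothetical, and F4 is reduced to a single boolean because its trigger is an experimental outcome rather than a number. Snapshot values for F10 and F11 come from this ledger; the others are placeholders.

```python
from typing import Callable

# Each trigger returns True when its falsification condition is met.
# Metric names are hypothetical; thresholds follow the protocol above.
TRIGGERS: dict[str, Callable[[dict], bool]] = {
    "F4": lambda m: m["mechanism_survives_ablation_and_intervention"],
    "F7": lambda m: m["transfer_r"] >= 0.95 and m["organism_classes"] >= 3,
    "F10": lambda m: m["wetlab_reproduction_rate"] >= 0.30,
    "F11": lambda m: m["screened_apis"] / m["frontier_apis"] >= 0.50,
    "F1": lambda m: m["decontamination_drop_pp"] <= 5.0,
}

def fired(metrics: dict) -> list[str]:
    """Names of triggers whose condition holds for this snapshot."""
    return [name for name, rule in TRIGGERS.items() if rule(metrics)]

snapshot = {
    "mechanism_survives_ablation_and_intervention": False,  # placeholder
    "transfer_r": 0.0,                 # placeholder
    "organism_classes": 0,             # placeholder
    "wetlab_reproduction_rate": 0.113,  # pooled rate reported in this ledger
    "screened_apis": 0,                # IBBIS: 0 of 4 frontier design APIs
    "frontier_apis": 4,
    "decontamination_drop_pp": 14.0,   # placeholder value
}
print(fired(snapshot))
```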

Live alerts: CASP-16/17 official assessment · CAMEO continuous evaluation · IBBIS biological-design audit line · Aaronson dual-use synthesis-screening · Anand-Bommasani Stanford CRFM unified-bio · Adaptyv Bind-Bench · UW Baker / DeepMind external replication.

§05

Method at a glance

Threat model: A frontier biological-prediction / protein-folding claim that survives PDB-contamination, sequence-similarity, designable-target, cross-fold-method, IDR, held-out-post-2024 and wet-lab independent-reproduction audits on the 2024-2026 corpus (AlphaFold 3, ESM3, RoseTTAFold-AS, Boltz-1, Chai-1).
Deep loops: 8 sweeps × 5–10 parallel Opus research agents per sweep × 1 batch round.
Sources surveyed: arXiv q-bio / cs.LG / stat.ML 2024-2026; Nature / Science / PNAS / Cell / Nature Methods bio-AI tracks; frontier-lab bio cards; IBBIS / Aaronson dual-use audits; CASP / CAMEO third-party assessment; Stanford CRFM HELM-Bio; METR / Apollo bio-uplift.
Classifier: Regex rule engine + hand-arbitration. v0.2; target v0.3 lock 1.000/1.000.
Empty-space test: Three signature bills (4, 7, 10) predeclared empty BEFORE batch 1. After 283 unique papers, all three remain empty: 0 clean triggers each.
Cross-ledger coupling: 6-way star-mechanism alignment (Reasoning Bill 6 ★ + VLM Bill 4 ★ + Mech Interp Bill 11 ★ + Scaling Laws Bill 5 ★ + Agentic Bill 4 ★ + this Bill 4 ★). Open-weight Bill 3 (bio dual-use) ↔ this Bill 11.
Reproducibility: All scripts public. Run: aggregate_batch_1.py, bill_classifier.py --arbitrate-union.
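The classifier pairs a regex rule engine with hand-arbitration. A minimal sketch of the first stage, in which patterns nominate candidate bills and a human then arbitrates; the rules below are hypothetical and are not the ones in bill_classifier.py.

```python
import re

# Hypothetical rules: patterns suggesting a paper's framing engages a bill.
RULES: dict[int, list[re.Pattern]] = {
    4: [re.compile(r"\b(attention|mechanis(m|tic)|interpretab\w*)\b", re.IGNORECASE)],
    7: [re.compile(r"\bcross[- ]organism\b|\b(archaea|extremophile|viral capsid)s?\b", re.IGNORECASE)],
    10: [re.compile(r"\bwet[- ]lab\b|\b(express(ed|ion)|reproduc\w*)\b", re.IGNORECASE)],
}

def candidate_bills(text: str) -> list[int]:
    """Bills whose rules match the text; these are candidates, not verdicts."""
    return sorted(bill for bill, patterns in RULES.items()
                  if any(p.search(text) for p in patterns))

print(candidate_bills("We probe attention heads in AlphaFold 3 and test wet-lab expression."))
print(candidate_bills("Single-cell gene expression atlas"))
```

The second call shows why regex hits can only nominate candidates: "expression" fires Bill 10 on a paper with no designed protein in it, and hand-arbitration then has to reject it.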
§06

Resources & further reading

§R

Reproducibility & data

Every empirical claim resolves to public data. Run the classifier, regenerate the heatmap, audit the corpus, file a falsification.

Public draft v0.2 (2026-05-09) — 283 unique papers across 8 sweeps; Bills 4, 7, 10 ★ NO CLEAN TRIGGER YET with 0 clean triggers each. Corpus, scripts, and classifier outputs are linked below. Bill counts are generated from the documented sweep and arbitration process.
