Reasoning / Chain-of-Thought Data Receipts

Public Draft v0.2 REAL DATA

Static reproducibility landing page for the reasoning / chain-of-thought ledger. This is the first ledger in the v0.2 wave produced with the full deep-loop methodology: 9 parallel Opus research-agent sweeps yielded 426 raw papers, which were deduplicated and hand-arbitrated down to 394 unique papers. Bills 6, 9, and 12 (★) have no clean trigger yet: each has 0 clean triggers across the 394 papers. Rebuttal density is 27.9%.

Receipts

| Artifact | Link | Purpose |
|---|---|---|
| Bill definitions | bills_draft.md | 15 bills + 6 meta-costs + 3 escape gates + ★ Bills 6/9/12 empty-space verification with real fire counts. |
| Threat model | purpose.md | Threat model, scope, empty-space hypothesis, cousin-ledger coupling. |
| Corpus union JSON | _batch_1_union.json | 394 unique papers (deduplicated from 426 raw across 9 sweeps), with full metadata + candidate_bill / verdict / confidence. |
| Classifier | bill_classifier.py | Regex rule engine + hand-arbitration. Run with --arbitrate-union or --benchmark. |
| Benchmark cases | bill_classifier_benchmark.json | 50 hand-curated cases. Target v0.3 lock 1.000/1.000. |
| Aggregator | aggregate_batch_1.py | Deduplicates raw sweep JSONs into the corpus union. |
| README | README.md | Reproducibility README with the run order. |
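
The aggregator step above (raw sweep JSONs in, deduplicated corpus union out) can be illustrated with a minimal sketch. This is not the code in aggregate_batch_1.py: it assumes, hypothetically, that each sweep file is a JSON list of objects with a "title" field, that the files match the pattern sweep_*.json, and that duplicates are detected by normalized title. The real aggregator may key on other fields, and the hand-arbitration step is not modeled here.

```python
import json
import re
from pathlib import Path


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def dedupe(papers):
    """Keep the first record seen for each normalized title (assumed dedup key)."""
    seen = {}
    for paper in papers:
        seen.setdefault(normalize_title(paper["title"]), paper)
    return list(seen.values())


def load_sweeps(directory, pattern="sweep_*.json"):
    """Concatenate all sweep files in a directory (file naming is an assumption)."""
    papers = []
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(encoding="utf-8") as handle:
            papers.extend(json.load(handle))
    return papers
```

Under these assumptions, `dedupe(load_sweeps("reasoning_cot/deep_loops"))` would produce a candidate union list; keeping the first record per key makes the result depend only on the sorted file order.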

Real fire counts

| Bill | Cands. | Clean triggers | Rebuttals | Gated |
|---|---|---|---|---|
| 1 — CoT-faithfulness validation | 29 | 18 | 10 | 1 |
| 2 — Test-time-compute disclosure | 10 | 10 | 0 | 0 |
| 3 — Cross-benchmark transfer | 7 | 4 | 1 | 2 |
| 4 — Adaptive-prompt stability | 6 | 0 | 4 | 2 |
| 5 — Trajectory contamination | 7 | 4 | 3 | 0 |
| 6 ★ — Causally-faithful mechanism | 4 | 0 | 3 | 1 |
| 7 — Strong-baseline classical | 1 | 1 | 0 | 0 |
| 8 — Adversarial / scheming | 28 | 18 | 9 | 1 |
| 9 ★ — Test-time-search vs reasoning | 41 | 0 | 31 | 10 |
| 10 — Vendor self-eval independence | 30 | 27 | 3 | 0 |
| 11 — Anti-saturation construction | 54 | 41 | 12 | 1 |
| 12 ★ — Universal task coverage | 1 | 0 | 1 | 0 |
| 13 — Capability-cost transparency | 3 | 2 | 0 | 1 |
| 14 — Reward-hacking dual-mode | 16 | 12 | 4 | 0 |
| 15 — Distilled-cousin reproduction | 52 | 50 | 2 | 0 |
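
The fire counts above can be sanity-checked with a short script. The sketch below assumes each row reads as Cands., Clean triggers, Rebuttals, Gated, and that candidates split exactly into those three outcomes; that partition is an inference from the numbers, not something the page states. The function names are illustrative and not part of the published tooling.

```python
# (bill, candidates, clean_triggers, rebuttals, gated), transcribed from the table above.
ROWS = [
    (1, 29, 18, 10, 1), (2, 10, 10, 0, 0), (3, 7, 4, 1, 2),
    (4, 6, 0, 4, 2), (5, 7, 4, 3, 0), (6, 4, 0, 3, 1),
    (7, 1, 1, 0, 0), (8, 28, 18, 9, 1), (9, 41, 0, 31, 10),
    (10, 30, 27, 3, 0), (11, 54, 41, 12, 1), (12, 1, 0, 1, 0),
    (13, 3, 2, 0, 1), (14, 16, 12, 4, 0), (15, 52, 50, 2, 0),
]
STARRED = {6, 9, 12}  # bills marked with a star on this page


def inconsistent_rows(rows):
    """Bills where candidates != clean + rebuttals + gated (assumed partition)."""
    return [b for b, cands, clean, reb, gated in rows if cands != clean + reb + gated]


def starred_have_no_clean(rows, starred=STARRED):
    """True if every starred bill has zero clean triggers."""
    return all(clean == 0 for b, _, clean, _, _ in rows if b in starred)


def totals(rows):
    """Column sums as (candidates, clean_triggers, rebuttals, gated)."""
    return tuple(sum(row[i] for row in rows) for i in range(1, 5))


if __name__ == "__main__":
    print("inconsistent bills:", inconsistent_rows(ROWS))
    print("starred bills have no clean trigger:", starred_have_no_clean(ROWS))
    print("totals (cands, clean, rebuttals, gated):", totals(ROWS))
```

A check like this could run against _batch_1_union.json instead of hard-coded rows, once the field names in that file are known.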

Public draft v0.2 (2026-05-09). Sweep JSONs (sweep_101..sweep_109) live in the source repo, in the ProjectForty2 public evidence bundle under reasoning_cot/deep_loops/. Target v0.3 lock: 2026-Q3, with the classifier at 1.000/1.000 plus watchlist, falsifiers, and author-activity.