Reasoning / Chain-of-Thought Data Receipts
Public Draft v0.2 (real data)
Static reproducibility landing page for the reasoning / chain-of-thought ledger. This is the first ledger in the v0.2 wave produced with the full deep-loop methodology: 9 parallel Opus research-agent sweeps yielded 426 raw papers, which were deduplicated and hand-arbitrated down to 394 unique papers. Bills 6, 9, and 12 (★) have no clean trigger yet: each has 0 clean triggers across the 394 papers. Rebuttal density is 27.9%.
Receipts
| Artifact | Link | Purpose |
|---|---|---|
| Bill definitions | bills_draft.md | 15 bills + 6 meta-costs + 3 escape gates + ★ Bills 6/9/12 empty-space verification with real fire counts. |
| Threat model | purpose.md | Threat model, scope, empty-space hypothesis, cousin-ledger coupling. |
| Corpus union JSON | _batch_1_union.json | 394 unique papers (deduplicated from 426 raw across 9 sweeps), with full metadata + candidate_bill / verdict / confidence. |
| Classifier | bill_classifier.py | Regex rule engine + hand-arbitration. Run with --arbitrate-union or --benchmark. |
| Benchmark cases | bill_classifier_benchmark.json | 50 hand-curated cases. Target v0.3 lock 1.000/1.000. |
| Aggregator | aggregate_batch_1.py | Deduplicates raw sweep JSONs into the corpus union. |
| README | README.md | Reproducibility README with the run order. |
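The aggregation step described above (raw sweep JSONs merged into one deduplicated union) can be illustrated with a minimal sketch. This is not the code in aggregate_batch_1.py: the `title` key, the assumption that each sweep file is a JSON list of paper records, the `sweep_*.json` file pattern, and the title-normalization rule are all hypothetical choices made for illustration. The real pipeline also includes hand arbitration, which this sketch does not attempt.

```python
import json
import re
from pathlib import Path


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def union_sweeps(sweeps):
    """Merge lists of paper dicts, keeping the first record seen per normalized title.

    Assumes (hypothetically) that every record has a "title" key.
    """
    seen = {}
    for papers in sweeps:
        for paper in papers:
            seen.setdefault(normalize_title(paper["title"]), paper)
    return list(seen.values())


def load_sweeps(directory):
    """Load every sweep_*.json file in a directory (file pattern is an assumption)."""
    paths = sorted(Path(directory).glob("sweep_*.json"))
    return [json.loads(p.read_text(encoding="utf-8")) for p in paths]
```

A call such as `union_sweeps(load_sweeps("reasoning_cot/deep_loops"))` would then return one record per distinct normalized title. Exact-key matching like this only catches trivial duplicates, which is one reason a hand-arbitration pass would still be needed.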
Real fire counts
| Bill | Candidates | Clean triggers | Rebuttals | Gated |
|---|---|---|---|---|
| 1 — CoT-faithfulness validation | 29 | 18 | 10 | 1 |
| 2 — Test-time-compute disclosure | 10 | 10 | 0 | 0 |
| 3 — Cross-benchmark transfer | 7 | 4 | 1 | 2 |
| 4 — Adaptive-prompt stability | 6 | 0 | 4 | 2 |
| 5 — Trajectory contamination | 7 | 4 | 3 | 0 |
| 6 ★ — Causally-faithful mechanism | 4 | 0 | 3 | 1 |
| 7 — Strong-baseline classical | 1 | 1 | 0 | 0 |
| 8 — Adversarial / scheming | 28 | 18 | 9 | 1 |
| 9 ★ — Test-time-search vs reasoning | 41 | 0 | 31 | 10 |
| 10 — Vendor self-eval independence | 30 | 27 | 3 | 0 |
| 11 — Anti-saturation construction | 54 | 41 | 12 | 1 |
| 12 ★ — Universal task coverage | 1 | 0 | 1 | 0 |
| 13 — Capability-cost transparency | 3 | 2 | 0 | 1 |
| 14 — Reward-hacking dual-mode | 16 | 12 | 4 | 0 |
| 15 — Distilled-cousin reproduction | 52 | 50 | 2 | 0 |
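The table above can be sanity-checked mechanically. The sketch below transcribes the published rows and tests one assumption that the page does not state explicitly: that each bill's candidate count is the sum of its clean triggers, rebuttals, and gated papers. It also lists the bills with zero clean triggers and the column totals. All function and variable names are illustrative.

```python
# Rows transcribed from the fire-count table:
# bill number -> (candidates, clean triggers, rebuttals, gated)
FIRE_COUNTS = {
    1: (29, 18, 10, 1),
    2: (10, 10, 0, 0),
    3: (7, 4, 1, 2),
    4: (6, 0, 4, 2),
    5: (7, 4, 3, 0),
    6: (4, 0, 3, 1),
    7: (1, 1, 0, 0),
    8: (28, 18, 9, 1),
    9: (41, 0, 31, 10),
    10: (30, 27, 3, 0),
    11: (54, 41, 12, 1),
    12: (1, 0, 1, 0),
    13: (3, 2, 0, 1),
    14: (16, 12, 4, 0),
    15: (52, 50, 2, 0),
}


def inconsistent_rows(counts):
    """Bills where candidates != clean + rebuttals + gated (assumed partition)."""
    return [b for b, (cand, clean, reb, gated) in counts.items()
            if cand != clean + reb + gated]


def zero_clean(counts):
    """Bills with no clean trigger in the table."""
    return [b for b, (_, clean, _, _) in counts.items() if clean == 0]


def totals(counts):
    """Column sums: (candidates, clean triggers, rebuttals, gated)."""
    return tuple(sum(col) for col in zip(*counts.values()))


print(inconsistent_rows(FIRE_COUNTS))  # → []
print(zero_clean(FIRE_COUNTS))         # → [4, 6, 9, 12]
print(totals(FIRE_COUNTS))             # → (289, 187, 83, 19)
```

Every row satisfies the assumed partition. As published, Bill 4 also shows zero clean triggers, although only Bills 6, 9, and 12 carry the ★ marker. The table totals 289 candidate assignments with 83 rebuttals; how the 27.9% rebuttal density is derived from the 394-paper corpus is not specified on this page, so the sketch does not try to reproduce it.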
Public draft v0.2 (2026-05-09). The sweep JSONs (sweep_101..sweep_109) live in the source repo, in the ProjectForty2 public evidence bundle under reasoning_cot/deep_loops/. The v0.3 lock is targeted for 2026-Q3, with the classifier at 1.000/1.000 on the benchmark, plus a watchlist, falsifiers, and author-activity.