Agentic / Tool-Use Capability Ledger: Data Receipts

Public Draft v0.2 REAL DATA

Eight parallel Opus research-agent sweeps yielded ~368 raw papers, which were deduplicated and hand-arbitrated to 368 unique papers. Bills 4, 7, and 10 (★) have no clean trigger yet: each has 0 clean triggers. Rebuttal density: 23.6%.

Receipts

Artifact	Link	Purpose
Bill definitions	bills_draft.md	14 bills, 6 meta-costs, 3 escape gates, and empty-space verification for ★ bills 4, 7, and 10 with real fire counts
Threat model	purpose.md	Threat model, scope, empty-space hypothesis, cousin-ledger coupling
Corpus union JSON	_batch_1_union.json	368 unique papers (deduplicated from ~368 raw across 8 sweeps), with full metadata
Classifier	bill_classifier.py	Regex rule engine + hand-arbitration. Run with `--arbitrate-union`
Aggregator	aggregate_batch_1.py	Deduplicates raw sweep JSONs into the corpus union
README	README.md	Reproducibility README with run order
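
The deduplication step that the aggregator performs can be pictured with a minimal sketch. This is not the contents of aggregate_batch_1.py: the record fields (`arxiv_id`, `title`), the normalized-title fallback key, and the function names `dedup_key` and `union_sweeps` are all assumptions made for illustration, and the real sweep JSON schema may differ.

```python
import re


def dedup_key(paper):
    """Hypothetical identity key: prefer an explicit ID, else a normalized title."""
    if paper.get("arxiv_id"):
        return ("id", paper["arxiv_id"].strip().lower())
    # Assumed fallback: lowercase the title and collapse punctuation/whitespace.
    title = re.sub(r"[^a-z0-9]+", " ", paper.get("title", "").lower()).strip()
    return ("title", title)


def union_sweeps(sweeps):
    """Merge per-sweep paper lists, keeping the first record seen for each key."""
    union = {}
    for sweep in sweeps:
        for paper in sweep:
            union.setdefault(dedup_key(paper), paper)
    return list(union.values())
```

Keeping the first record seen makes the result depend on sweep order, so a real aggregator would likely need a deterministic ordering or a merge rule for conflicting metadata.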

Real fire counts

Bill	Candidates	Clean	Rebuttals	Gated
1 — Tool-augmentation decomposition	12	12	0	0
2 — Multi-step trajectory contamination	21	20	1	0
3 — Cross-scaffold transfer	22	10	12	0
4 ★ Causally-faithful tool-use mechanism	33	0	4	29
5 — Adaptive-prompt / tool-naming stability	9	9	0	0
6 — Trajectory-length / horizon scaling	2	2	0	0
7 ★ Cross-benchmark generalization	34	0	11	23
8 — Adversarial / sandbox-escape audit	14	13	1	0
9 — Held-out task-set construction	21	21	0	0
10 ★ Universal task-set coverage	21	0	0	21
11 — Browser-state replay leakage	21	20	1	0
12 — Scaffold-vs-model decoupling	22	17	5	0
13 — Vendor-self-eval independence	33	27	6	0
14 — Capability-cost transparency	3	3	0	0
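
The table above can be sanity-checked with a short script. The sketch below copies the rows by hand and assumes that the Clean, Rebuttals, and Gated columns partition each bill's candidates; that partition is an inference from the numbers, not something this page states. The function names are illustrative. The 23.6% rebuttal density is not recomputed, because its denominator is not defined on this page.

```python
# Rows copied from the fire-count table:
# (bill, candidates, clean, rebuttals, gated)
ROWS = [
    (1, 12, 12, 0, 0),
    (2, 21, 20, 1, 0),
    (3, 22, 10, 12, 0),
    (4, 33, 0, 4, 29),
    (5, 9, 9, 0, 0),
    (6, 2, 2, 0, 0),
    (7, 34, 0, 11, 23),
    (8, 14, 13, 1, 0),
    (9, 21, 21, 0, 0),
    (10, 21, 0, 0, 21),
    (11, 21, 20, 1, 0),
    (12, 22, 17, 5, 0),
    (13, 33, 27, 6, 0),
    (14, 3, 3, 0, 0),
]


def inconsistent_rows(rows):
    """Bills whose candidate count differs from clean + rebuttals + gated."""
    return [b for b, cands, clean, reb, gated in rows
            if cands != clean + reb + gated]


def bills_without_clean_trigger(rows):
    """Bills with zero clean triggers (the starred bills)."""
    return [b for b, _, clean, _, _ in rows if clean == 0]


def column_totals(rows):
    """Sum each numeric column across all bills."""
    cands, clean, reb, gated = (sum(col) for col in list(zip(*rows))[1:])
    return {"candidates": cands, "clean": clean,
            "rebuttals": reb, "gated": gated}


if __name__ == "__main__":
    print(inconsistent_rows(ROWS))            # [] means every row adds up
    print(bills_without_clean_trigger(ROWS))  # [4, 7, 10]
    print(column_totals(ROWS))
```

Note that these are per-bill fire counts. A single paper may be a candidate for more than one bill, so the column totals need not match the 368-paper corpus size.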

Public draft v0.2 (2026-05-09). Sweep JSONs live in the source repo, in the ProjectForty2 public evidence bundle under agentic_tool_use/deep_loops/. Target lock for v0.3: 2026-Q3.