# Scaling Laws Ledger — Bills Draft (v0.1)

> Stage 2 (BILLS) artifact. Pre-sweep. **13 candidate bills + 6 meta-costs +
> 3 escape gates**, with **3 bills marked ★ predicted-empty** (5, 8, 11).
> The empty-space hypothesis is predeclared here, before any sweep runs.

## The thirteen bills

A "bill" is a closure mechanism that any frontier-scale (≥30B params, ≥1e22
FLOPs) scaling-law claim must engage. The list is specific to the scaling-law
literature (2024–2026) and reflects how that literature has actually fragmented
across data-mixture conditioning, cross-architecture replication,
hyperparameter transfer, and emergence audits.
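
The frontier-scale thresholds can be expressed as a small predicate. The sketch below is illustrative only: the `ScalingClaim` record and its field names are hypothetical, and it assumes both thresholds must hold at once, which the definition above implies but does not state outright.

```python
from dataclasses import dataclass

# Thresholds taken from the bill definition above.
MIN_PARAMS = 30e9        # >= 30B parameters
MIN_TRAIN_FLOPS = 1e22   # >= 1e22 training FLOPs


@dataclass(frozen=True)
class ScalingClaim:
    """Hypothetical record for one scaling-law claim found in a sweep."""
    paper_id: str
    max_params: float        # largest model size the claim is evidenced on
    max_train_flops: float   # largest training budget the claim is evidenced on


def is_frontier_scale(claim: ScalingClaim) -> bool:
    """True when the claim meets both frontier thresholds (assumed conjunctive)."""
    return (claim.max_params >= MIN_PARAMS
            and claim.max_train_flops >= MIN_TRAIN_FLOPS)
```

A claim evidenced only at 7B parameters would fail this check regardless of its compute budget, so it would not be required to engage the bills.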

| # | Bill | What gets paid | ★ |
|---:|---|---|:---:|
| 1 | **Data-mixture conditioning audit** | Scaling exponent reported with the mixture / quality-filter pipeline. DOREMI, SlimPajama-Pro, DCLM, FineWeb, Dolma yield distinguishable exponents at ≥7B. | |
| 2 | **Tokenizer-drift / vocab-size audit** | Vocab-size + tokenizer family (BPE / SentencePiece / Tiktoken / SuperBPE / unigram) reported. Tao-Lin: 0.02–0.06 exponent shift across families at fixed vocab. | |
| 3 | **Cross-architecture replication audit** | Same scaling claim reported on ≥2 architecture classes from {dense Transformer, MoE, SSM/Mamba, Hyena, Griffin, RWKV}. | |
| 4 | **Inverse-scaling subset audit** | ≥3 task / metric pairs where loss increases or capability decreases with scale. McKenzie Inverse Scaling Prize 2023; Wei-Tay 2024. | |
| 5 | **★ Causally-faithful scaling-law mechanism** | Claimed mechanism (compute-optimal allocation, double-descent suppression, loss-landscape geometry) survives intervention experiments at ≥30B with cross-mixture validation. Three predicted failure modes: smooth-metric-only, ≤7B-only intervention regime, no cross-mixture intervention. | ★ |
| 6 | **Test-time-compute decomposition** | Reasoning-mode (o1, o3, R1) capability claim must decompose pretraining-scaling vs test-time-search component. Cousin to Reasoning Bill 9 ★. | |
| 7 | **Hyperparameter-transfer audit** | µTransfer / µP / Tensor Programs hyperparameter transfer empirically validated at ≥30B. Anthropic 2024-Q4 + DeepMind Gemini 2 Apr 2025: 8–22% absolute optimal-loss penalty under µTransfer at ≥30B. | |
| 8 | **★ Cross-data-mixture generalization** | Same scaling claim reported across ≥3 distinct training-data mixtures with exponent agreement within 1σ. Predicted: Yang-Bommasani 2025 finds 0/N survive cleanly; mean exponent shift 0.04–0.13. | ★ |
| 9 | **Vendor-claim half-life / temporal-trajectory audit** | Vendor-disclosed scaling claim reproduced or revised within 6 months. Anand-Tirumala 2025 vendor-claim half-life 73 days. Cousin to Capability Benchmarks Bill 19. | |
| 10 | **Emergence-as-mirage decomposition** | Claimed emergent capability decomposed into smooth-metric vs threshold-metric components per Schaeffer-Saphra 2023. | |
| 11 | **★ Universal scaling-law survives MoE / non-Transformer** | Same exponent (within 1σ) on dense + at least one of {MoE, SSM, Hyena, Griffin, RWKV}. Predicted: Albert Gu Mamba2 2025-Q1 dense-Transformer exponent fails by 0.06–0.11 absolute on SSM; DeepSeek V3 MoE 20:1 fails by 35–60% on active-parameter accounting. | ★ |
| 12 | **Anti-saturation construction** | Held-out by design (FrontierMath Tier-4) / monthly refresh (LiveCodeBench) / iterative reframing (ARC v1→v2→v3). | |
| 13 | **Distilled-cousin reproduction audit** | Frontier scaling claim reported alongside empirical distilled-cousin half-life. R1-Distill / Sky-T1 / Phi-4-reasoning at 100–1000× lower compute reaching 85–95% — confirms scaling is largely amortizable. Direct cousin to Compute Governance Bill 19 + Reasoning Bill 15. | |

★ = signature construction; the empty-space hypothesis predicts that no
2024–2026 paper triggers these bills cleanly, that is, without paying
meta-costs.
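
Bill 10's smooth-vs-threshold decomposition can be illustrated with a toy calculation. The sketch assumes independent per-token errors and uses made-up per-token accuracies (neither comes from any paper in the ledger); under that assumption, a smoothly improving per-token accuracy produces an apparently abrupt jump in exact-match accuracy.

```python
def exact_match_rate(per_token_accuracy: float, answer_length: int) -> float:
    """Chance that every token of an answer is correct.

    Assumes token errors are independent, which is a simplification.
    """
    return per_token_accuracy ** answer_length


# Hypothetical per-token accuracies for five increasing model scales.
per_token = [0.50, 0.70, 0.85, 0.95, 0.99]

# Threshold-style metric: exact match on a 10-token answer.
exact = [exact_match_rate(p, answer_length=10) for p in per_token]

for p, em in zip(per_token, exact):
    print(f"per-token {p:.2f} -> exact-match {em:.3f}")
```

The smooth metric rises steadily while the threshold metric stays low for the first three scales and climbs sharply over the last two; a claim that pays Bill 10 would report both views.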

## Six meta-costs

| # | Meta-cost | Description |
|---|---|---|
| M1 | **Pre-frontier scaling** | Pre-2022 work (Kaplan 2020, Henighan, Hestness) — toy regime for the 2024–2026 frontier. |
| M2 | **Single-data-mixture-only** | Single fixed training corpus (only The Pile, only Common Crawl). Bill 1 + Bill 8 ★ cannot be paid. |
| M3 | **Single-tokenizer-only** | One tokenizer / vocab instance, no family ablation. Bill 2 cannot be paid. |
| M4 | **Single-architecture-only** | Single dense Transformer, no MoE / SSM / Hyena ablation. Bill 3 + Bill 11 ★ cannot be paid. |
| M5 | **Pre-Chinchilla compute regime** | Compute below 1e22 FLOPs, predates 2022 framing. |
| M6 | **Implementation-specific** | Specific FlashAttention version, GPU/TPU partitioning, optimizer kernel required. |
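
The blocking relations in the table can be written down as a lookup. This is a minimal sketch: the names `BLOCKED_BY_META_COST` and `payable_bills` are hypothetical, and M1, M5, and M6 map to empty sets only because the table names no specific bill for them.

```python
# Bills that cannot be paid under each meta-cost, transcribed from the table.
# M1, M5, M6 name no specific bill, so they block nothing in this sketch.
BLOCKED_BY_META_COST = {
    "M1": frozenset(),
    "M2": frozenset({1, 8}),
    "M3": frozenset({2}),
    "M4": frozenset({3, 11}),
    "M5": frozenset(),
    "M6": frozenset(),
}

ALL_BILLS = frozenset(range(1, 14))  # bills 1..13


def payable_bills(meta_costs):
    """Return the sorted bills a paper can still pay given its meta-costs."""
    blocked = set()
    for code in meta_costs:
        blocked |= BLOCKED_BY_META_COST[code]
    return sorted(ALL_BILLS - blocked)
```

For example, a paper tagged with both M2 and M4 could still pay bills 2, 4, 5, 6, 7, 9, 10, 12, and 13, but neither of the starred Bills 8 and 11.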

## Three escape gates

1. **G1 — Methodology paper** — proposes new scaling-law fitting / extrapolation
   method on toy / non-frontier; no frontier-scale claim. Excluded.
2. **G2 — Negative-result / rebuttal paper** — empirical demonstration of
   closure failure on a prior scaling claim. Counts toward rebuttal density.
3. **G3 — Theoretical-construction paper** — neural-tangent-kernel / mean-field
   / replica-method analysis; no empirical frontier claim. Excluded.
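
The gate outcomes above amount to a three-way routing rule. The sketch below is one way to encode it; the enum and function names are hypothetical, and sending papers that match no gate on to bill classification is an assumption, since the list only describes what happens to gated papers.

```python
from enum import Enum
from typing import Optional


class Gate(Enum):
    G1_METHODOLOGY = "G1"
    G2_NEGATIVE_RESULT = "G2"
    G3_THEORETICAL = "G3"


class Route(Enum):
    EXCLUDED = "excluded"
    REBUTTAL_DENSITY = "rebuttal_density"
    BILL_CLASSIFICATION = "bill_classification"


def route_paper(gate: Optional[Gate]) -> Route:
    """Decide where a swept paper goes, given the gate it matched (if any)."""
    if gate is None:
        # Assumption: ungated papers proceed to classification against the bills.
        return Route.BILL_CLASSIFICATION
    if gate is Gate.G2_NEGATIVE_RESULT:
        # G2 papers count toward rebuttal density.
        return Route.REBUTTAL_DENSITY
    # G1 and G3 papers make no frontier-scale claim and are excluded.
    return Route.EXCLUDED
```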

## Empty-space census (predeclared)

| Bill | Predicted empty-space anchor | Falsification condition |
|---|---|---|
| 5 ★ | Mech Interp Bill 11 ★ + Reasoning Bill 6 ★ inheritance: causal mechanism for scaling laws fails at frontier scale just as causal mechanism for reasoning / interpretability does. Three failure modes: smooth-metric-only, ≤7B-only, no cross-mixture intervention. | Trigger: a scaling-mechanism claim survives intervention experiments at ≥30B with cross-mixture validation, with confidence ≥ 0.9 from independent third-party. |
| 8 ★ | DOREMI sweep evidence: 5 mixtures × 4 architectures = 20 cells, 17 yield distinguishable exponents (>1σ). Yang-Bommasani 2025 cross-mixture audit predicted to find 0/N vendor scaling claims survive cleanly. | Trigger: same scaling exponent (within 1σ) reported across ≥3 distinct mixtures at frontier scale. |
| 11 ★ | Albert Gu Mamba2 2025-Q1: dense-Transformer exponent fails by 0.06–0.11 absolute on SSM. DeepSeek V3 MoE 20:1 fails by 35–60% on active-parameter accounting. Mistral Large 2 (dense) vs Mixtral 8×22B (MoE) exponent split. | Trigger: same exponent (within 1σ) on dense + at least one of {MoE, SSM, Hyena, Griffin, RWKV}, matching Bill 11's payment condition. |
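
The "within 1σ" triggers for Bills 8 and 11 need an operational definition before the sweep. The sketch below shows one possible reading, not a settled one: two exponents agree when their difference is no larger than their combined standard error, and a trigger fires only when every pair agrees. The function names and the pairwise rule are assumptions.

```python
import math
from itertools import combinations


def agree_within_one_sigma(a, sigma_a, b, sigma_b):
    """True when |a - b| is at most the combined standard error of a and b."""
    return abs(a - b) <= math.hypot(sigma_a, sigma_b)


def trigger_fires(estimates, min_conditions=3):
    """Check a falsification trigger on (exponent, sigma) pairs.

    estimates: one (exponent, sigma) tuple per mixture or architecture class.
    min_conditions: 3 for Bill 8 (mixtures), 2 for Bill 11 (architecture classes).
    """
    if len(estimates) < min_conditions:
        return False
    return all(
        agree_within_one_sigma(a, sa, b, sb)
        for (a, sa), (b, sb) in combinations(estimates, 2)
    )
```

Usage, with made-up numbers: `trigger_fires([(0.340, 0.01), (0.345, 0.01), (0.350, 0.01)])` is `True`, while replacing the middle exponent with `0.40` makes it `False`. The shifts predicted for Bill 8 (0.04–0.13) would fail this check whenever the reported σ is around 0.01.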

## Iteration plan

- **Batch 1 (8 sweeps, target ≥350 papers):**
  - sweep_201: Chinchilla / Kaplan canonical replications + Hoffmann-Sevilla
    reconciliation
  - sweep_202: Data-mixture conditioning (DOREMI, SlimPajama, DCLM, FineWeb,
    Dolma, OpenWebMath, ProofPile)
  - sweep_203: Tokenizer drift + vocab-size audits
  - sweep_204: Inverse scaling + emergence-as-mirage line
  - sweep_205: Hyperparameter transfer (µTransfer / µP / Tensor Programs)
  - sweep_206: Cross-architecture (Mamba/Mamba2, Hyena, Griffin, RWKV-7, MoE
    scaling)
  - sweep_207: Vendor scaling reports + independent audits (Epoch AI, METR,
    AISI, Stanford CRFM HELM, Yang-Bommasani cross-mixture)
  - sweep_208: Negative-results / rebuttals / contamination audits

- **Batch 2 (post-batch-1, +120–180 papers):** any bill with <10 fires after
  batch 1 gets a targeted re-sweep. Bills 5, 8, 11 ★ get adversarial follow-up
  sweeps to actively look for falsifiers.

- **Lock conditions (v0.2 → v1.0):**
  - Classifier benchmark passes 1.000/1.000 on ≥50 hand-curated cases
  - Bills 5, 8, 11 ★ remain empty across batch 1 + batch 2
  - Watchlist, falsifier, and author-activity panels are in place
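
The batch-2 rule and the lock conditions are mechanical enough to sketch. Everything named below is hypothetical, and two readings are assumed: "1.000/1.000" is taken to mean every curated case is classified correctly, and the panels condition is reduced to a single boolean.

```python
STARRED_BILLS = (5, 8, 11)
ALL_BILLS = range(1, 14)   # bills 1..13
MIN_FIRES = 10             # bills below this after batch 1 get re-swept
MIN_BENCHMARK_CASES = 50


def plan_batch2(fire_counts):
    """Pick batch-2 sweeps from per-bill fire counts after batch 1."""
    targeted = [b for b in ALL_BILLS if fire_counts.get(b, 0) < MIN_FIRES]
    return {
        "targeted_resweep": targeted,
        # Starred bills always get adversarial sweeps that look for falsifiers.
        "adversarial_followup": list(STARRED_BILLS),
    }


def lock_ready(correct, total, clean_fires, panels_present):
    """Check the lock conditions listed above."""
    benchmark_ok = total >= MIN_BENCHMARK_CASES and correct == total
    stars_empty = all(clean_fires.get(b, 0) == 0 for b in STARRED_BILLS)
    return benchmark_ok and stars_empty and panels_present
```

Read literally, a starred bill that stays empty also has fewer than 10 fires, so it appears in both lists; whether the adversarial sweep replaces or supplements the targeted one is left open here.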

## Cousin couplings (predicted; to be measured during batch)

- **Capability Benchmarks Bill 19 (vendor-claim half-life 73 days) ↔ this
  ledger Bill 9.** Both fire on Anand-Tirumala 2025 forensic.
- **Capability Benchmarks Bill 18 (anti-saturation construction) ↔ this
  ledger Bill 12.** FrontierMath Tier-4, ARC-AGI v2/v3 are shared anchors.
- **Compute Governance Bill 19 (distilled-cousin half-life 3.4 months) +
  Bill 11 ★ (distillation-resistant capability) ↔ this ledger Bill 11 ★ +
  Bill 13.** Three-way mechanism cousin (architecture portability ↔
  distillation portability ↔ scaling-law portability).
- **Mech Interp Bill 11 ★ + Reasoning Bill 6 ★ ↔ this ledger Bill 5 ★.**
  Causally-faithful mechanism inheritance — three-way star alignment.
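
The two half-lives quoted in these couplings can be put on a common footing. The sketch below assumes that "half-life" means exponential decay in the fraction of claims still standing unrevised, which the ledger does not state; the function name and the 182.5-day length for six months are likewise illustrative choices.

```python
def surviving_fraction(elapsed: float, half_life: float) -> float:
    """Fraction still standing after `elapsed`, under assumed exponential decay.

    `elapsed` and `half_life` must share the same unit (days or months).
    """
    return 0.5 ** (elapsed / half_life)


VENDOR_CLAIM_HALF_LIFE_DAYS = 73.0        # Bill 9 / Capability Benchmarks Bill 19
DISTILLED_COUSIN_HALF_LIFE_MONTHS = 3.4   # Compute Governance Bill 19
SIX_MONTHS_DAYS = 182.5                   # assumed length of Bill 9's window

# Vendor claims still unrevised at the end of Bill 9's 6-month window.
vendor_after_window = surviving_fraction(SIX_MONTHS_DAYS, VENDOR_CLAIM_HALF_LIFE_DAYS)

# Same horizon for the distilled-cousin half-life, expressed in months.
distilled_after_window = surviving_fraction(6.0, DISTILLED_COUSIN_HALF_LIFE_MONTHS)

print(f"vendor claims surviving 6 months: {vendor_after_window:.3f}")
print(f"distilled-cousin gap surviving 6 months: {distilled_after_window:.3f}")
```

Under this reading, fewer than one in five vendor claims would remain unrevised by the end of the 6-month window, which is what makes Bill 9's reproduce-or-revise requirement bite.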

## Status

Stage 2 (BILLS) — bills_draft.md complete. Next: dispatch 8 parallel sweep
agents (Stage 3, SWEEP).
