# Scaling Laws Ledger — Purpose

## Threat model (one paragraph)

Demonstrate a frontier-scale (≥30B parameters, ≥1e22 training FLOPs) scaling-law
claim that survives six closure audits on the 2024–2026 corpus: **(1) data-mixture
conditioning, (2) tokenizer-drift / vocab-size sensitivity, (3) cross-architecture
replication on at least two of {dense Transformer, MoE, SSM/Mamba, Hyena, Griffin,
RWKV}, (4) inverse-scaling subset audit, (5) emergence-as-mirage decomposition, (6)
held-out distribution-shifted test construction.** A clean trigger requires
independent third-party verification (Epoch AI / METR / Apollo / AISI / Stanford
CRFM HELM) within 6 months of the vendor scaling claim.
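
The trigger condition above can be sketched as a single predicate. This is a minimal illustration, not the ledger's classifier: the audit identifiers, the `ScalingClaim` fields, and the reading of "6 months" as 183 days are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical identifiers for the six closure audits listed above.
CLOSURE_AUDITS = (
    "data_mixture_conditioning",
    "tokenizer_drift",
    "cross_architecture_replication",
    "inverse_scaling_subset",
    "emergence_as_mirage",
    "held_out_shifted_test",
)

MIN_PARAMS = 30e9        # >= 30B parameters
MIN_TRAIN_FLOPS = 1e22   # >= 1e22 training FLOPs
MAX_VERIFY_DAYS = 183    # assumption: "6 months" approximated as 183 days


@dataclass
class ScalingClaim:
    params: float
    train_flops: float
    claim_date: date
    audits_passed: set = field(default_factory=set)
    architectures_replicated: set = field(default_factory=set)
    verified_on: Optional[date] = None  # date of third-party verification, if any


def is_clean_trigger(claim: ScalingClaim) -> bool:
    """True only if the claim is frontier-scale, passes all six audits,
    replicates on at least two architectures, and was independently
    verified within the allowed window after the vendor claim."""
    frontier = claim.params >= MIN_PARAMS and claim.train_flops >= MIN_TRAIN_FLOPS
    all_audits = set(CLOSURE_AUDITS) <= claim.audits_passed
    two_archs = len(claim.architectures_replicated) >= 2
    verified = (
        claim.verified_on is not None
        and 0 <= (claim.verified_on - claim.claim_date).days <= MAX_VERIFY_DAYS
    )
    return frontier and all_audits and two_archs and verified
```

Every condition is conjunctive, so a claim that passes five audits or is verified late does not count as a clean trigger.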

## Why this ledger exists

Scaling-law claims are policy inputs. **Compute thresholds** (NIST AI RMF, EU AI
Act 10²⁵, BIS Diffusion Framework, UK AISI capability-eval) are denominated in
FLOPs, but FLOP-to-capability transfer is itself a scaling-law claim. **Capability
forecasting** (METR HCAST 7-month doubling time, Anthropic RSP, OpenAI Preparedness,
DeepMind FSF) extrapolates from scaling. **Distillation policy** (Compute Governance
Bill 19) assumes that distilled downstream cousins inherit upstream scaling, but
that inheritance is an empirical claim with a 3.4-month median half-life. If the
scaling-law claim is not closed under cross-architecture and cross-mixture audit,
the entire policy stack rests on a foundation that fails its own falsification tests.
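
To make the half-life figure concrete, the sketch below converts it into a surviving fraction over time. It assumes simple exponential decay, which the text above does not state; `surviving_fraction` and its parameters are hypothetical names used only for illustration.

```python
HALF_LIFE_MONTHS = 3.4  # median half-life quoted above


def surviving_fraction(months: float, half_life: float = HALF_LIFE_MONTHS) -> float:
    """Fraction of inherited-scaling claims still holding after `months`.

    Assumes exponential decay, an illustrative modelling choice rather than
    something the ledger asserts.
    """
    if months < 0 or half_life <= 0:
        raise ValueError("months must be >= 0 and half_life must be > 0")
    return 0.5 ** (months / half_life)


if __name__ == "__main__":
    for t in (3.4, 6.0, 12.0):
        print(f"{t:>5.1f} months -> {surviving_fraction(t):.3f}")
```

Under that assumption, a little under 30% of inherited claims would still hold at the end of a 6-month verification window.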

The literature has fragmented across:
- The Chinchilla-Kaplan reconciliation line (Hoffmann 2022 → Sevilla / Besiroglu
  2023-2024 corrections) — cousin to compute-governance
- The DOREMI / SlimPajama / DCLM data-mixture line — establishing that scaling
  exponents shift 0.04–0.13 across mixtures
- The Albert Gu Mamba2 / SSM cross-architecture line (2024-2025) — establishing
  that dense-Transformer scaling exponents miss by 0.06–0.11 absolute on SSMs
- The DeepSeek V3 MoE scaling line — establishing that 20:1 Chinchilla ratio fails
  by 35–60% under MoE active-parameter accounting
- The Yang µTransfer line — establishing that hyperparameter transfer breaks at
  ≥30B, with an 8–22% absolute optimal-loss penalty
- The Schaeffer-Saphra emergence-as-mirage rebuttal line — establishing that
  emergent-capability claims are metric-induced
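
The exponent shifts quoted in this list matter because they compound under extrapolation. The sketch below assumes a pure power law for reducible loss, L(C) = a·C^(-α), the common C ≈ 6·N·D training-compute approximation, and the 20 tokens-per-parameter reading of the Chinchilla ratio. The function names are hypothetical, and nothing here is fitted to the cited papers.

```python
import math

TOKENS_PER_PARAM = 20.0  # the 20:1 Chinchilla heuristic referenced above


def chinchilla_allocation(compute_flops: float, ratio: float = TOKENS_PER_PARAM):
    """Split a FLOP budget into (params N, tokens D), assuming C = 6*N*D and D = ratio*N."""
    n_params = math.sqrt(compute_flops / (6.0 * ratio))
    return n_params, ratio * n_params


def extrapolation_ratio(compute_multiple: float, delta_alpha: float) -> float:
    """Ratio of two predicted reducible losses after scaling compute by `compute_multiple`.

    Both predictions share the same anchor point; one uses exponent alpha and
    the other alpha + delta_alpha, so the ratio is compute_multiple ** (-delta_alpha).
    """
    return compute_multiple ** (-delta_alpha)


if __name__ == "__main__":
    n, d = chinchilla_allocation(1e22)
    print(f"1e22 FLOPs -> N = {n:.3e} params, D = {d:.3e} tokens")
    for shift in (0.04, 0.13):
        print(f"delta_alpha = {shift}: ratio at 100x compute = {extrapolation_ratio(100.0, shift):.3f}")
```

Under these assumptions, a 100× extrapolation with an exponent shift of 0.04 moves the predicted reducible loss by about 17%, and a shift of 0.13 moves it by about 45%. That is why a mixture-dependent exponent cannot be treated as a rounding error.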

## Scope (in)

- Frontier scaling-law papers (Kaplan, Hoffmann, OpenAI scaling reports,
  DeepMind Gemini scaling, Anthropic scaling, Llama 3.1 paper, DeepSeek V3
  technical report)
- Cross-architecture replications (Mamba, Mamba2, Hyena, Griffin, RWKV-7,
  Mixture-of-Experts variants)
- Data-mixture conditioning audits (DOREMI, SlimPajama-Pro, DCLM, FineWeb,
  Dolma, OpenWebMath, ProofPile)
- Tokenizer-drift studies (Tao-Lin BPE/SentencePiece, SuperBPE)
- Hyperparameter-transfer (µTransfer, µP, Tensor Programs, depth-mu transfer)
- Inverse-scaling line (McKenzie Inverse Scaling Prize, Wei-Tay U-shaped follow-on)
- Emergence-as-mirage line (Schaeffer-Saphra, follow-ons through 2026)
- Vendor-claim half-life forensic (Anand-Tirumala 2025) applied to scaling
- Independent third-party scaling audits (Epoch AI, METR, AISI, Stanford CRFM,
  Yang-Bommasani cross-mixture)

## Scope (out — meta-costs)

- Pre-2022 scaling work (Kaplan original, Henighan, etc.) is M1 (toy regime)
- Single-data-mixture-only (only The Pile) is M2
- Single-tokenizer-only (only Llama BPE) is M3
- Single-architecture-only (only dense Transformer) is M4
- Compute-budget below 1e22 FLOPs is M5 (pre-Chinchilla regime)
- Implementation-specific scaffold (FlashAttention version, GPU kernel) is M6
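
The meta-cost codes above can be assigned mechanically once each evidence item is described by a few fields. The following is a minimal sketch: the `Evidence` record and its field names are hypothetical, and treating "single-X-only" as "at most one distinct X reported" is an assumption about how the codes would be operationalized.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Evidence:
    year: int
    architectures: FrozenSet[str]
    mixtures: FrozenSet[str]
    tokenizers: FrozenSet[str]
    train_flops: float
    implementation_specific: bool = False  # e.g. tied to one kernel version


def meta_costs(e: Evidence) -> List[str]:
    """Return the meta-cost codes (M1..M6) that apply to one evidence item."""
    codes = []
    if e.year < 2022:
        codes.append("M1")  # pre-2022 scaling work
    if len(e.mixtures) <= 1:
        codes.append("M2")  # single data mixture
    if len(e.tokenizers) <= 1:
        codes.append("M3")  # single tokenizer
    if len(e.architectures) <= 1:
        codes.append("M4")  # single architecture
    if e.train_flops < 1e22:
        codes.append("M5")  # below the compute floor
    if e.implementation_specific:
        codes.append("M6")  # implementation-specific scaffold
    return codes
```

An item can carry several codes at once, and an empty list means no meta-cost applies.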

## Empty-space hypothesis (predeclared, before any sweep)

We predict that **no 2024–2026 paper triggers Bills 5, 8, or 11 cleanly**:

- **Bill 5 ★** — Causally-faithful scaling-law mechanism. The claimed mechanism
  (e.g., "compute-optimal allocation," "double-descent suppression," "loss-landscape
  geometry") survives intervention experiments at ≥30B with cross-mixture validation.
  Direct cousin to Mech Interp Bill 11 ★ + Reasoning Bill 6 ★.
- **Bill 8 ★** — Cross-data-mixture generalization. Same scaling claim reported
  across ≥3 distinct training-data mixtures (SlimPajama / FineWeb / Dolma / DCLM
  / OpenWebMath) with exponent agreement within 1σ. The Yang-Bommasani 2025
  cross-mixture audit is predicted to find that 0/N vendor scaling claims survive cleanly.
- **Bill 11 ★** — Universal scaling-law survives MoE / non-Transformer architectures.
  Same exponent (within 1σ) reported on dense Transformer + at least one of
  {MoE, SSM, Hyena, Griffin, RWKV}. The Albert Gu Mamba2 line (2025-Q1) is predicted
  to find that dense-Transformer scaling exponents miss by 0.06–0.11 absolute on SSMs.
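
Bills 8 and 11 both turn on "exponent agreement within 1σ", which needs an operational definition before any sweep. One possible reading, sketched below, treats two estimates as agreeing when their difference is no larger than their combined standard error. This definition, the function names, and the example numbers are assumptions made for illustration, not values from any audited paper.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

Estimate = Tuple[float, float]  # (exponent, one-sigma standard error)


def agree_within_1_sigma(a: Estimate, b: Estimate) -> bool:
    """True if |alpha_a - alpha_b| <= sqrt(sigma_a^2 + sigma_b^2)."""
    (alpha_a, sigma_a), (alpha_b, sigma_b) = a, b
    return abs(alpha_a - alpha_b) <= math.hypot(sigma_a, sigma_b)


def exponents_agree(estimates: Dict[str, Estimate], min_settings: int = 3) -> bool:
    """Require at least `min_settings` settings and pairwise agreement among all of them."""
    if len(estimates) < min_settings:
        return False
    return all(
        agree_within_1_sigma(x, y) for x, y in combinations(estimates.values(), 2)
    )


if __name__ == "__main__":
    # Hypothetical exponents per training mixture, for illustration only.
    example = {
        "SlimPajama": (0.050, 0.004),
        "FineWeb": (0.052, 0.004),
        "DCLM": (0.049, 0.005),
    }
    print(exponents_agree(example))
```

For Bill 8 the settings are training mixtures with `min_settings=3`. For Bill 11 the same check could be run across architectures with `min_settings=2`. Pairwise agreement is a strict choice: a single outlying mixture or architecture is enough to fail the check.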

We expect at least one new bill to emerge from batch-1 evidence, likely a
hyperparameter-transfer or architecture-class portability bill that parallels
the structural pattern of the new Compute Governance Bill 19.

## Cousin ledgers (predicted couplings)

- **Capability Benchmarks Bills 17 + 19** ↔ this ledger Bill 12 (anti-saturation)
  + Bill 9 (vendor-claim half-life): METR HCAST 7-month doubling shared anchor
- **Compute Governance Bill 19** (distilled-cousin half-life 3.4 months) ↔ this
  ledger Bill 11 ★ (architecture-class portability) + Bill 13 (distilled-cousin
  reproduction)
- **Mech Interp Bill 11 ★** + **Reasoning Bill 6 ★** ↔ this ledger Bill 5 ★
  (causally-faithful mechanism) — three-way cousin chain
- **Inference-time Safety Bill 19** (CoT-monitorability) ↔ this ledger Bill 6
  (test-time-compute decomposition) — overlapping audit surface

## Authorship

Kevin Russell (Project 42).
Pre-publication draft, do not cite without permission.

## Status

Stage 1 (SCOPE) — purpose written, schema next, bills_draft next, then sweep.
Target v0.2 lock: 2026-Q3 with classifier 1.000/1.000 + watchlist + falsifiers.
