# Hardware Inference Stack Ledger — Purpose

## Threat model

Demonstrate a frontier inference-stack capability claim that survives six
closure audits on the 2024-2026 corpus. The claim must come from a system
marketed for high-throughput / low-latency / low-cost inference of frontier
models. Systems in scope:

- **Serving frameworks:** vLLM, SGLang, TensorRT-LLM, Triton, NVIDIA Dynamo.
- **Accelerators and custom silicon:** Groq, Cerebras WSE-3 / CS-3,
  SambaNova SN40L, Tenstorrent Wormhole, Lightmatter, Etched Sohu, Rain.ai,
  Amazon Trainium / Inferentia 3, Google TPU v5p / v6 / Ironwood, Microsoft
  Maia 100/200, OpenAI custom silicon, AMD MI300X/MI350, NVIDIA
  B100/B200/H200.
- **Framework-level optimizations and local runtimes:** speculative decoding,
  MoE serving, KV-cache compression, FP8/FP4 quantization, ONNX Runtime,
  llama.cpp, ggml, MLX, Ollama, LM Studio.

The six closure audits:

1. **Benchmark-versus-real-workload audit:** TTFT and tokens/sec measured
   at contended load.
2. **Cost-per-token transparency:** true unit economics are disclosed.
3. **Frontier-model fidelity audit:** quantized inference doesn't silently
   degrade quality.
4. **Batch-vs-streaming behavior.**
5. **Commercial availability vs research-preview gap.**
6. **Closed-vendor benchmark cherry-pick audit.**
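Audits (1) and (2) rest on a few basic quantities, so it may help to pin down one possible way of computing them. The sketch below is illustrative only: `RequestTrace`, its field names, and the flat hourly-cost model are assumptions introduced here, not definitions taken from this ledger or from any vendor's benchmark harness.

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Hypothetical per-request timing record (all times in seconds)."""
    sent_at: float          # when the request was submitted
    first_token_at: float   # when the first output token arrived
    done_at: float          # when the last output token arrived
    output_tokens: int      # number of output tokens generated


def ttft(trace: RequestTrace) -> float:
    """Time to first token: delay between submission and first output token."""
    return trace.first_token_at - trace.sent_at


def decode_tokens_per_sec(trace: RequestTrace) -> float:
    """Decode throughput: tokens after the first, over the time after the first."""
    decode_time = trace.done_at - trace.first_token_at
    if decode_time <= 0 or trace.output_tokens < 2:
        return 0.0
    return (trace.output_tokens - 1) / decode_time


def cost_per_million_tokens(hourly_cost_usd: float,
                            sustained_tokens_per_sec: float) -> float:
    """Unit cost under an assumed flat hourly price and sustained throughput."""
    tokens_per_hour = sustained_tokens_per_sec * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    trace = RequestTrace(sent_at=0.0, first_token_at=0.5,
                         done_at=2.5, output_tokens=101)
    print(ttft(trace), decode_tokens_per_sec(trace))
    print(cost_per_million_tokens(3.6, 1000.0))
```

Separating TTFT from decode throughput matters for audit (1): a system can post a strong tokens/sec figure while TTFT grows under contention, so the two should be reported side by side at the same load level.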

## Bridge-test specifics

This ledger is the **purest test of the rescoped B7** (the
commercialization-vs-research axis, not the geopolitical one). It compares
open-source inference (vLLM, SGLang, llama.cpp, MLX) with closed-vendor
inference (Groq, Cerebras, SambaNova, Lightmatter, Etched). It also sets up
three corners that test whether the rescoped B7 holds at the hardware
layer: Western open (vLLM, SGLang, AMD ROCm), Western closed (Groq,
Cerebras), and Chinese hardware (Huawei Ascend, Cambricon, Biren).

## Empty-space hypothesis (predeclared)

- **Bill 5 ★: Cross-vendor benchmark stability.** The same frontier model
  reaches ≤10% variance in TTFT and throughput across vLLM, SGLang,
  TensorRT-LLM, Groq, and Cerebras under matched configuration. Predicted
  empty.
- **Bill 8 ★: Quantization fidelity audit.** An INT4 / FP4 / INT8 quantized
  frontier model retains ≥95% of its FP16 capability under independent
  third-party eval. Predicted empty (Apollo / METR / AISI quantization
  audits are expected to surface clear degradation).
- **Bill 11 ★: Universal inference-platform coverage.** A single open-source
  inference framework runs all frontier open-weight models (Llama 4,
  DeepSeek V3, Qwen 3, Mistral Large 2) at ≤20% performance variance.
  Predicted empty.
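The three hypotheses above reduce to threshold checks, which can be sketched as below. This is one possible reading, not the ledger's definition: "variance" is interpreted here as relative spread, (max − min) / mean, and the function names `relative_spread`, `within_spread`, and `retains_capability` are hypothetical. A stricter or looser statistic (coefficient of variation, for instance) would change which cells count as empty.

```python
from statistics import mean


def relative_spread(values):
    """Assumed reading of 'variance': (max - min) divided by the mean."""
    return (max(values) - min(values)) / mean(values)


def within_spread(values, limit):
    """True if measurements across stacks or models stay within the limit.

    Bill 5 would use limit=0.10 (once for TTFT, once for throughput);
    Bill 11 would use limit=0.20.
    """
    return relative_spread(values) <= limit


def retains_capability(quantized_score, fp16_score, floor=0.95):
    """Bill 8 check: quantized eval score as a fraction of the FP16 score."""
    return quantized_score / fp16_score >= floor


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    throughput_by_stack = [100.0, 104.0, 97.0]
    print(within_spread(throughput_by_stack, 0.10))
    print(retains_capability(0.80, 0.82))
```

Under this reading, a Bill 5 cell is non-empty only if `within_spread` holds for both TTFT and throughput on the same matched configuration.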

## Status: Stage 1 (SCOPE).