A real-data falsification harness for 2024–2026 frontier multimodal generation claims: image + video + audio. ★ Bills 5 (causally-faithful generation mechanism), 8 (cross-modality unified generation), and 11 (held-out compositional generalization) HOLD pre-Stage-3.5, with 0/8, 0/9, and 0/39 clean triggers. Strong B7/B8 BIPOLAR signal in the 19-ledger atlas: Bill 9 (vendor-self-eval-independence) fires 78 times in the closed-cloud cluster (Sora / Veo / MJ); Bill 12 (commercialization-axis) fires 74 times in the open-source cluster (SD3 / Flux / HunyuanVideo). 377 unique papers across 8 deep-loop sweeps, covering vendor system cards + compositional benchmarks + physics-consistency audits + the B7 commercialization-axis bridge test + independent third-party audits.
AI now generates images, video, and music — we checked which generators do what their marketing says.
Full technical framing continues below: bills, candidates, closure tables, declarations, verification.
Bills are the closure mechanisms that any 2024–2026 frontier multimodal generation capability claim must engage. The 13 bills below were predeclared in bills_draft.md v0.1 before any sweep ran, calibrated to the structure of the multimodal generation literature (vendor system cards, prompt-leakage contamination, attribute-faithfulness, text-rendering, physics-consistency, cross-resolution, held-out compositional benchmarks, commercialization-axis, safety / NSFW / deepfake / copyright). Bills 5, 8, and 11 are marked ★: the empty-space hypothesis predicts that none of them triggers cleanly without paying a meta-cost.
Bill 5 ★ (causally-faithful generation): 8 candidates, 0 clean. Direct extension of LLM-centric Bridge 1 to the diffusion / autoregressive generation substrate. Closure requires intervention experiments; current audits are observational.
Bill 8 ★ (cross-modality unified): 9 candidates, 0 clean. Frontier "omni" models (GPT-4o, Gemini Live, Veo 3) are marketed as unified but evaluated only on modality subsets.
Bill 11 ★ (held-out compositional): 39 candidates, 0 clean. Largest active ★ bill. Models excel at 1–3 sub-tasks and degrade on the rest.
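The three ★ tallies above reduce to a small record per bill: candidates flagged, clean triggers verified, and a HOLD status while the clean count is zero. The sketch below assumes a hypothetical `StarBill` schema (field names and the `TRIGGERED` label are illustrative, not the ledger's own format) and reproduces the counts stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StarBill:
    """One ★ bill with its sweep tallies (hypothetical schema)."""
    number: int
    name: str
    candidates: int  # papers the sweeps flagged as possible triggers
    clean: int       # candidates that survived as clean triggers

    def __post_init__(self):
        # A bill cannot have more clean triggers than candidates.
        if not 0 <= self.clean <= self.candidates:
            raise ValueError("clean must be between 0 and candidates")

    def status(self) -> str:
        # A ★ bill HOLDs while none of its candidates is a clean trigger.
        return "HOLD" if self.clean == 0 else "TRIGGERED"

    def ratio(self) -> str:
        return f"{self.clean}/{self.candidates}"


# Counts taken from the ★ bill lines above.
STAR_BILLS = [
    StarBill(5, "causally-faithful generation", candidates=8, clean=0),
    StarBill(8, "cross-modality unified", candidates=9, clean=0),
    StarBill(11, "held-out compositional", candidates=39, clean=0),
]


def summary(bills):
    return [f"Bill {b.number} ★ ({b.name}): {b.ratio()} clean, {b.status()}" for b in bills]


for line in summary(STAR_BILLS):
    print(line)
```

Keeping `clean` separate from `candidates` is what lets "8 candidates, 0 clean" read as a HOLD rather than as an absence of evidence.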
Bills 9 (vendor-self-eval-independence) and 12 (commercialization-axis) form a striking bipolar firing pattern:
Closed-cloud cluster (Bill 9 fires 78 times): Sora / Sora 2, Veo 2 / 3, Imagen 3, DALL-E 3, Midjourney v6 / v7, Adobe Firefly 3, RunwayML Gen-3 / Gen-4, Pika 2.0, Luma Dream Machine, Kling, Hailuo MiniMax, Suno v3 / v4, Udio, ElevenLabs v3, Tencent HunyuanVideo (cloud product). Marketing-grade vendor self-evaluation + benchmark cherry-picking + subset reporting dominate.
Open-source cluster (Bill 12 fires 74 times): Stable Diffusion 3 / 3.5, SDXL Turbo, Flux dev / pro / schnell, MusicGen, HunyuanVideo (open-weights), Genmo Mochi, ByteDance MagicAnimate. Open commercialization-axis + reproducible weights + community evaluation dominate.
This is the strong closed-vs-open split observed in the 19-ledger atlas, and the bipolar signature directly supports the cross_ledger_bridges B7 RESCOPING (commercialization-vs-research, not geopolitical) and the B8 emergent bridge (commercialization-vs-research-artifact axis). Only the hardware_inference ledger shows a cleaner separation (strong 0/N signal).
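One way to quantify "bipolar" is the share of each bill's firings that land in its home cluster, taking the weaker of the two shares as the split score. This is an illustrative metric, not the atlas's own definition. The page reports only the home-cluster counts (78 and 74), so the off-diagonal cells would have to come from the ledger; the usage below runs on toy counts that are explicitly not ledger data.

```python
# Hypothetical firing matrix: counts[bill][cluster] = number of firings.
def home_share(counts, bill, home):
    """Fraction of a bill's firings that land in its home cluster."""
    row = counts[bill]
    total = sum(row.values())
    return row.get(home, 0) / total if total else 0.0


def bipolarity(counts, bill_a, home_a, bill_b, home_b):
    """Weaker of the two home shares.

    With two clusters, 1.0 means each bill fires only in its own cluster;
    0.5 means a bill fires evenly across both.
    """
    return min(home_share(counts, bill_a, home_a),
               home_share(counts, bill_b, home_b))


# Toy counts for illustration only; they are NOT the ledger's matrix.
toy = {
    "bill_9": {"closed-cloud": 9, "open-source": 1},
    "bill_12": {"closed-cloud": 2, "open-source": 8},
}
score = bipolarity(toy, "bill_9", "closed-cloud", "bill_12", "open-source")
print(round(score, 2))  # 0.8
```

Taking the minimum rather than the mean keeps one sharply separated bill from masking a leaky one.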
Status: ledger populated; verification status: partial. The ledger ran 8 deep-loop sweeps over 377 unique papers and produced a strong B7/B8 bipolar signal in the current cross-ledger atlas. The Stage 3.5 verification queue is pending: a priority pool of ~30 ★-bill candidate IDs (up to 10 per ★ bill, since Bills 5 and 8 have only 8 and 9 candidates) plus 20 sweep-health spot-checks, to be dispatched against the arXiv API.
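The queue arithmetic works out as 8 + 9 + 10 = 27 ★-bill IDs (the "~30") plus 20 spot-checks, or 47 lookups. A minimal sketch of building that pool and batching it into arXiv API query URLs follows; the helper names and placeholder IDs are hypothetical, and only the public endpoint with its `id_list` and `max_results` parameters comes from the arXiv API. No request is sent here.

```python
from urllib.parse import urlencode

ARXIV_API = "https://export.arxiv.org/api/query"  # public arXiv API endpoint


def priority_pool(candidates_by_bill, spotchecks, per_bill=10):
    """Take up to per_bill candidate IDs per ★ bill, then the spot-checks."""
    pool = []
    for bill in sorted(candidates_by_bill):
        pool.extend(candidates_by_bill[bill][:per_bill])
    pool.extend(spotchecks)
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(pool))


def query_urls(ids, batch_size=20):
    """Batch IDs into id_list queries (commas left unescaped)."""
    urls = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        params = {"id_list": ",".join(batch), "max_results": len(batch)}
        urls.append(f"{ARXIV_API}?{urlencode(params, safe=',')}")
    return urls


# Placeholder IDs; the ledger's real candidate IDs are not reproduced here.
cands = {b: [f"bill{b}-cand{i}" for i in range(n)] for b, n in {5: 8, 8: 9, 11: 39}.items()}
spot = [f"sweep-spot{i}" for i in range(20)]
pool = priority_pool(cands, spot)
print(len(pool))  # 47
```

Actually fetching the URLs is left out; a real dispatcher should pace its calls, as the arXiv API documentation asks clients to do.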
The empty-space hypothesis is less sensitive to typical source-ID errors because its closure mechanisms are structural (causally-faithful generation requires intervention experiments; cross-modality unified generation requires per-modality balanced training; held-out compositional generalization requires per-sub-task balanced evaluation) rather than contingent on individual paper IDs. The B7/B8 bipolar finding is likewise an architectural observation drawn from the 78-vs-74 firing pattern, and so is also less dependent on individual source-ID verification.
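Because each ★ closure is a requirement on the kind of evidence, the clean-trigger check can be written without reference to any paper ID. The sketch assumes hypothetical evidence tags (the ledger's classifier may label evidence differently); the three requirements themselves are the ones stated above.

```python
# Hypothetical evidence tags; the ledger's classifier may label evidence differently.
CLOSURE_REQUIREMENT = {
    5: "intervention_experiment",           # causally-faithful generation
    8: "per_modality_balanced_training",    # cross-modality unified
    11: "per_subtask_balanced_evaluation",  # held-out compositional
}


def is_clean_trigger(bill: int, evidence: set[str]) -> bool:
    """True only if a candidate carries the structural evidence its ★ bill requires."""
    if bill not in CLOSURE_REQUIREMENT:
        raise KeyError(f"Bill {bill} is not a ★ bill")
    return CLOSURE_REQUIREMENT[bill] in evidence


# An observational audit cannot close Bill 5, whatever paper ID it carries.
print(is_clean_trigger(5, {"observational_audit"}))      # False
print(is_clean_trigger(5, {"intervention_experiment"}))  # True
```

A mistyped source ID changes which paper is examined, but not which evidence type would be needed to close the bill.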
The ledger tracks frontier multimodal generation capability claims by vendor / model lineage across image, video, and audio. The closed-cloud / open-source cluster boundary creates the bipolar B7/B8 signal.
The frontier multimodal generation literature splits sharply between closed-cloud vendor products and open-source releases. Bill 9 (vendor-self-eval) catches the reporting gap in closed-cloud releases, where vendors evaluate their own models; Bill 12 (commercialization-axis) catches open-source releases, where evaluation is reported by the community. The 78-vs-74 bipolar firing is a strong signal of the B7 RESCOPING and the B8 emergent bridge across the current cross-ledger atlas.
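Since HunyuanVideo appears in both clusters (cloud product and open weights), a cluster has to be assigned per lineage and release channel, not per model name. A minimal sketch of that tally, assuming a hypothetical `(lineage, channel, bill)` record format; the lookup entries come from the cluster lists above, while the firing records are toy data.

```python
from collections import Counter

# Cluster is keyed on (lineage, release channel), not on model name alone:
# HunyuanVideo sits in both clusters depending on how it is shipped.
CLUSTER = {
    ("Sora", "cloud"): "closed-cloud",
    ("Veo", "cloud"): "closed-cloud",
    ("Midjourney", "cloud"): "closed-cloud",
    ("HunyuanVideo", "cloud"): "closed-cloud",
    ("HunyuanVideo", "open-weights"): "open-source",
    ("Stable Diffusion 3", "open-weights"): "open-source",
    ("Flux", "open-weights"): "open-source",
}


def firings_by_cluster(firings):
    """Tally (lineage, channel, bill) firing records into (cluster, bill) counts."""
    counts = Counter()
    for lineage, channel, bill in firings:
        counts[(CLUSTER[(lineage, channel)], bill)] += 1
    return counts


# Toy firing records, not ledger data.
toy_firings = [
    ("Sora", "cloud", 9),
    ("HunyuanVideo", "cloud", 9),
    ("HunyuanVideo", "open-weights", 12),
    ("Flux", "open-weights", 12),
    ("Veo", "cloud", 9),
]
print(firings_by_cluster(toy_firings))
```

An unknown `(lineage, channel)` pair raises `KeyError`, which forces every new release channel to be classified explicitly.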
Each ★ bill becomes a checkable falsification trigger: F5, F8, and F11 for Bills 5, 8, and 11. A public update is committed within 7 days of any verified clean trigger of F5, F8, or F11.
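The 7-day commitment is plain date arithmetic. The sketch reads "within 7 days" as "no later than 7 calendar days after verification", which is an assumption; the function names are hypothetical.

```python
from datetime import date, timedelta

STAR_TRIGGERS = {"F5", "F8", "F11"}
UPDATE_WINDOW = timedelta(days=7)  # assumed: 7 calendar days after verification


def update_deadline(trigger: str, verified_on: date) -> date:
    """Latest date for the public update after a verified clean trigger."""
    if trigger not in STAR_TRIGGERS:
        raise ValueError(f"{trigger} is not a ★-bill trigger")
    return verified_on + UPDATE_WINDOW


def is_overdue(trigger: str, verified_on: date, today: date, published: bool) -> bool:
    """An update is overdue once the deadline has passed without publication."""
    return not published and today > update_deadline(trigger, verified_on)


print(update_deadline("F11", date(2026, 5, 15)))  # 2026-05-22
```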
Live triggered watchlist: VBench / VBench-Physics quarterly releases · T2I-CompBench / GenAI-Bench / SeedBench-2 held-out refresh · METR / AISI / Apollo independent multimodal audits · frontier vendor system-card revisions (Sora / Veo / MJ / SD / Flux / Hunyuan) · open-weight diffusion releases. Monthly cadence: vendor system-card revisions + open-source releases. Quarterly: benchmark refreshes + independent audits.
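The monthly and quarterly cadences above can be turned into next-check dates. This is a sketch under stated assumptions: the source-to-interval mapping follows the cadence sentence, while stepping by calendar months and clamping to the last day of shorter months are illustrative choices, not the ledger's documented scheduler.

```python
import calendar
from datetime import date

# Interval in months per watchlist source, from the stated cadence.
CADENCE_MONTHS = {
    "vendor system-card revisions": 1,
    "open-source releases": 1,
    "benchmark refreshes": 3,
    "independent audits": 3,
}


def add_months(d: date, months: int) -> date:
    """Step forward by calendar months, clamping to the target month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_checks(last_checked: date) -> dict:
    """Next due date for each watchlist source after a shared check date."""
    return {source: add_months(last_checked, m) for source, m in CADENCE_MONTHS.items()}


for source, due in next_checks(date(2026, 5, 15)).items():
    print(f"{source}: {due}")
```

Clamping keeps a check scheduled on the 31st from silently skipping a month.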
Every empirical claim resolves to public data. Run the classifier, regenerate the heatmap, audit the corpus, file a falsification.
Public draft v0.2 (2026-05-15) — 377 unique papers across 8 sweeps; ★ Bills 5, 8, 11 HOLD pre-Stage-3.5. Strong B7/B8 BIPOLAR signal in the 19-ledger atlas. Real-data output from real Opus research-agent sweeps; bill counts and ★ positions emerge from the actual frontier multimodal generation literature, not from a template.