# Robotics / Embodied AI Ledger — Purpose

## Threat model (one paragraph)

Demonstrate a frontier robotic / embodied AI capability claim that survives
six closure audits on the 2024–2026 corpus. The claim must come from a system
marketed for autonomous physical manipulation, locomotion, or driving
(Google RT-2 / RT-X, Figure 03 / Helix, OpenVLA, Physical Intelligence π0 /
π0.5, 1X RFM-1, NVIDIA GR00T, DeepMind Gato-2, Stanford Mobile ALOHA, Berkeley
Octo, Tesla Optimus / FSD V14, Waymo, Wayve, Apptronik Apollo). The six
audits are: **(1) sim-to-real generalization, (2) demonstration-distribution
shift (cherry-picked vs everyday), (3) embodiment-cross-platform transfer,
(4) long-horizon plan stability, (5) hardware-cost / fleet-cost transparency,
(6) real-world deployment beyond the demonstration set, on held-out novel
scenes.** A clean trigger also requires independent third-party verification
(METR / Apollo / AISI / DARPA / NIST robotics-eval) within 6 months.
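
The clean-trigger rule above can be sketched as a small check. This is an illustrative sketch only: the `Claim` record, the audit identifiers, and the choice to measure the 6-month window from the date of the claim (approximated as 183 days) are assumptions, not part of the ledger specification.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical identifiers for the six closure audits, in the order listed above.
AUDITS = (
    "sim_to_real",
    "demo_distribution_shift",
    "cross_embodiment",
    "long_horizon_stability",
    "cost_transparency",
    "held_out_deployment",
)

# Assumption: "within 6 months" is approximated as 183 days after the claim date.
VERIFICATION_WINDOW_DAYS = 183


@dataclass
class Claim:
    """Hypothetical record for one capability claim under review."""
    system: str
    claimed_on: date
    audits_passed: frozenset
    verified_on: Optional[date] = None  # date of independent third-party verification


def is_clean_trigger(claim: Claim) -> bool:
    """True only if all six audits pass and verification lands inside the window."""
    if not all(audit in claim.audits_passed for audit in AUDITS):
        return False
    if claim.verified_on is None:
        return False
    delay_days = (claim.verified_on - claim.claimed_on).days
    return 0 <= delay_days <= VERIFICATION_WINDOW_DAYS
```

A claim that passes five of the six audits, or that is verified after the window closes, is not a clean trigger under this reading.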

## Bridge-test specifics (the cross_ledger_bridges connection)

The `cross_ledger_bridges` meta-aiwiki **predicts that B1 ★ will fail to
extend to Robotics**: RL from physics-grounded reward should provide the
causally-faithful mechanism that LLM-centric domains lack. **This ledger is
the falsification test for that bridge prediction.** If Robotics produces
clean Bill 4 triggers for causally-faithful mechanism (which we predict it
WILL), we falsify Bridge 1's unrestricted form, and that falsification is the
publishable result.

We therefore deliberately design Robotics with **causally-faithful mechanism
(Bill 4) NOT marked ★ predicted-empty**, anticipating that it WILL surface
clean triggers. The ★ predictions that remain are sim-to-real generalization
(Bill 5), cross-embodiment transfer (Bill 8), and universal task-set coverage
(Bill 11).

## Empty-space hypothesis (predeclared)

We predict no 2024–2026 paper triggers Bills 5, 8, 11 cleanly:

- **Bill 5 ★** — Sim-to-real generalization. ≥80% of demonstrated capability
  retained on real hardware not seen during training/sim. Predicted empty
  due to systematic sim-to-real gap (camera-format, friction-model,
  gripper-dynamics).
- **Bill 8 ★** — Embodiment-cross-platform transfer. Same policy transfers
  from Google RT-2-trained robots to Boston Dynamics / Apptronik / 1X
  hardware with ≤30% capability degradation. Predicted empty.
- **Bill 11 ★** — Universal task-set coverage. Frontier embodied system
  passes all 5 sub-tasks {manipulation, locomotion, navigation, multi-step
  planning, human-interaction} above clean threshold. Predicted empty.
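
The three thresholds above can be expressed as simple predicates. This is a minimal sketch under stated assumptions: "capability" is taken to be a task success rate in [0, 1], degradation is taken as relative to the source platform, and the function names are hypothetical. The "clean threshold" for Bill 11 is not specified in this ledger, so it is a required argument rather than a default.

```python
# Sub-tasks named in Bill 11.
SUBTASKS = (
    "manipulation",
    "locomotion",
    "navigation",
    "multi-step planning",
    "human-interaction",
)


def bill5_retained(demo_success: float, real_success: float,
                   min_retention: float = 0.80) -> bool:
    """Bill 5: at least 80% of demonstrated capability retained on unseen real hardware."""
    if demo_success <= 0:
        raise ValueError("demonstrated success rate must be positive")
    return real_success / demo_success >= min_retention


def bill8_transfers(source_success: float, target_success: float,
                    max_degradation: float = 0.30) -> bool:
    """Bill 8: at most 30% relative capability loss on the target embodiment."""
    if source_success <= 0:
        raise ValueError("source success rate must be positive")
    degradation = 1.0 - target_success / source_success
    return degradation <= max_degradation


def bill11_covers(scores: dict, clean_threshold: float) -> bool:
    """Bill 11: every one of the five sub-tasks scores at or above the threshold.

    A sub-task missing from `scores` counts as a failure.
    """
    return all(
        name in scores and scores[name] >= clean_threshold
        for name in SUBTASKS
    )
```

Treating retention and degradation as ratios of success rates is one possible reading; an absolute-difference reading would give different verdicts near the thresholds, so the chosen definition should be fixed before any paper is scored.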

Bills NOT predicted empty (anticipated to fire positively):

- **Bill 4** — Causally-faithful grounded-reward RL mechanism. Anticipated
  to extend (positive trigger) — this is the bridge-falsification test.
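
Once the corpus has been reviewed, the predeclared hypothesis can be scored mechanically against the set of bills that fired cleanly. The sketch below assumes that set is produced elsewhere in the pipeline; the function and variable names are hypothetical.

```python
# Predeclared predictions from this section: True means "predicted empty" (★).
PREDICTED_EMPTY = {4: False, 5: True, 8: True, 11: True}


def score_predictions(fired: set) -> dict:
    """Map each bill to whether its predeclared prediction held.

    A ★ (predicted-empty) bill holds if it did NOT fire; a bill anticipated
    to fire positively holds if it DID fire.
    """
    return {
        bill: (bill in fired) != predicted_empty
        for bill, predicted_empty in PREDICTED_EMPTY.items()
    }


def bridge1_unrestricted_falsified(fired: set) -> bool:
    """Bridge 1's unrestricted form is falsified if Bill 4 fires cleanly."""
    return 4 in fired
```

For example, `score_predictions({4})` reports every prediction as holding, while `score_predictions({4, 5})` marks only Bill 5 as a miss.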

## Scope (in)

- Frontier embodied AI capability claims 2024–2026
- Vision-Language-Action (VLA) frontier model cards
- Sim-to-real audit literature
- Manipulation benchmarks (RoboArena, LIBERO, RoboCasa, ManiSkill)
- Locomotion platforms and simulators (ANYmal, Spot, MuJoCo)
- Driving benchmarks (CARLA, nuScenes, Waymo Open Dataset)
- Multi-step / long-horizon evaluations
- Independent robotic-capability replications (NIST, DARPA, university labs)

## Scope (out — meta-costs)

- Pre-2024 RT-1 / RoboCat era is M1 (toy regime)
- Single-task-only is M2
- Single-embodiment-only is M3
- Sim-only-no-real-hardware is M4 (most papers)
- Demonstration-cherry-pick is M5
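
The exclusion rules above can be applied as tags during corpus screening. This is a rough sketch, assuming a hypothetical `Paper` record whose fields are filled in by a human reviewer; in particular, whether demonstrations are cherry-picked is a judgment call that the code only records.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical screening record for one corpus entry."""
    year: int
    n_tasks: int
    n_embodiments: int
    has_real_hardware: bool
    cherry_picked_demos: bool  # reviewer judgment, not computed


def meta_costs(paper: Paper) -> list:
    """Return the out-of-scope tags (M1 to M5) that apply to a paper."""
    tags = []
    if paper.year < 2024:
        tags.append("M1")  # pre-2024 toy regime
    if paper.n_tasks <= 1:
        tags.append("M2")  # single task only
    if paper.n_embodiments <= 1:
        tags.append("M3")  # single embodiment only
    if not paper.has_real_hardware:
        tags.append("M4")  # sim only, no real hardware
    if paper.cherry_picked_demos:
        tags.append("M5")  # cherry-picked demonstrations
    return tags


def in_scope(paper: Paper) -> bool:
    """A paper is in scope only if no meta-cost tag applies."""
    return not meta_costs(paper)
```

Returning the full tag list rather than a single verdict keeps the reason for each exclusion visible.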

## Authorship

Kevin Russell (Project 42). Pre-publication draft.

## Status

Stage 1 (SCOPE).