# Classifier v0.2 Reclassification Report

Generated against `deep_loops/_batch_1_union.json` (417 entries).
Compares the hand-assigned v0.1 `candidate_bill` labels with the output of the
v0.2 regex classifier, which tightens Bill 10 ★ semantics.

## v0.2 Tightening Summary

Bill 10 ★ ("Closed-loop preference generation without distributional collapse")
now requires:

- A known closed-loop preference-generation method (Self-Rewarding, SPIN,
  SPIN-DPO, Meta-Rewarding, Direct Nash Optimization, Self-Steering
  Optimization) co-occurring with multi-iter language; OR
- Explicit `iter[ation] N` with `N >= 3` paired with explicit
  preference-generation/loop language (NOT evaluation accuracy iteration); OR
- The literal technical phrase `closed-loop preference generation`.
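
The three rules can be approximated with a handful of regular expressions. This is a sketch only: the pattern constants, the `fires_bill_10` helper and the word lists are assumptions for illustration, not the regexes in `scripts/bill_classifier.py`.

```python
import re

# Hypothetical patterns: an illustration of the three rules, not the regexes
# actually used in scripts/bill_classifier.py.
KNOWN_METHOD = re.compile(
    r"\b(?:Self-Rewarding|SPIN(?:-DPO)?|Meta-Rewarding"
    r"|Direct Nash Optimization|Self-Steering Optimization)\b"
)
MULTI_ITER = re.compile(r"\b(?:multi-iter\w*|iterative(?:ly)?|iterations)\b", re.IGNORECASE)
ITER_N = re.compile(r"\biter(?:ation)?\s*(\d+)\b", re.IGNORECASE)
PREF_LOOP = re.compile(
    r"\b(?:preference (?:data|pairs?|generation)|self-generated|closed loop)\b",
    re.IGNORECASE,
)
LITERAL = re.compile(r"\bclosed-loop preference generation\b", re.IGNORECASE)


def fires_bill_10(text: str) -> bool:
    """Return True if any of the three Bill 10 rules matches `text` (sketch)."""
    # Rule 1: a named closed-loop method co-occurring with multi-iteration language.
    if KNOWN_METHOD.search(text) and MULTI_ITER.search(text):
        return True
    # Rule 2: "iter N" / "iteration N" with N >= 3, plus preference-generation or
    # loop language. Requiring PREF_LOOP is what keeps evaluation-accuracy
    # iteration ("accuracy at iteration 5") from firing on its own.
    if any(int(n) >= 3 for n in ITER_N.findall(text)) and PREF_LOOP.search(text):
        return True
    # Rule 3: the literal technical phrase.
    return LITERAL.search(text) is not None


print(fires_bill_10("Self-Rewarding LMs trained over three iterations"))  # True
print(fires_bill_10("Judge accuracy improves at iteration 5"))  # False
```

Rule 2's pairing requirement is the important design choice: an iteration count alone is common in evaluation-focused papers, so it only counts when preference-generation language appears alongside it.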

Magpie (single-iter data synthesis from aligned LLMs) and Self-Taught
Evaluator (iterative LLM-as-judge improvement) no longer fire Bill 10. They
now route to Bill 5 (synthetic-label closure).

## Headline Numbers

| Metric                                                 | v0.1    | v0.2   |
|--------------------------------------------------------|---------|--------|
| Total Bill 10 ★ hits                                   | 35      | 18     |
| Total Bill 5 hits                                      | 32      | 44     |
| Magpie (arxiv:2406.08464, 2406.08673) classified as    | Bill_10 | Bill_5 |
| Self-Taught Evaluator (arxiv:2408.04323) classified as | Bill_10 | Bill_5 |

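Bill 10 ★ count drops by **49%**, a net loss of 17 entries: 19 v0.1 Bill 10
entries are reclassified away and 2 entries are newly classified as Bill 10.
Bill 5 rises by a net 12 entries, including the synthetic-data-from-aligned-LLMs
pattern (Magpie) that v0.1 conflated with closed-loop preference generation.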

## Bill 10 Reclassification Targets (19 entries reclassified)

| Target              | Count | Examples                                       |
|---------------------|-------|------------------------------------------------|
| `needs_gate`        | 7     | Self-Recognition, Mode Hallucination, Looking Inward, Catastrophic Forgetting under SI, Verifier-Free SR, Smaller-Weaker-Better |
| `Bill_2`            | 6     | Iterative DPO, Length-Reg DPO, Beyond Reverse KL, DPO Implicit Rewards |
| `Bill_5`            | 4     | Magpie (x2), Magpie-Align, RLAIF Iterations |
| `Bill_5 + Bill_11`  | 1     | Modeling Behavior Drift in RLAIF Iterations |
| `Bill_3`            | 1     | Spontaneous Reward Hacking in Iterative Self-Refinement |

Note: `arxiv:2501.17030` (Self-rewarding correction for math) reclassifies
to `out_of_scope_via_meta_cost` because the math-only domain triggers M4
without any unambiguous Bill 10 hook. This is correct under v0.2 — it is a
narrow-domain claim, not a frontier-scale closure.

## Bill 10 ★ Survivors (16 entries from old Bill_10 still fire Bill 10)

All 16 are legitimate Bill 10 hits: closed-loop preference-generation methods, plus audit and rebuttal papers that discuss the closed-loop semantics directly:

- Self-Rewarding lineage: arxiv:2401.10020, openreview:colm2024:Self-Rewarding-LMs, arxiv:2502.06061 (limitations), arxiv:2407.05013 (CoT distillation), arxiv:2509.02547 (v2 with drift audits), arxiv:2410.08968 (RL Contemplation)
- SPIN lineage: arxiv:2401.01335, arxiv:2505.16020 (SPIN-DPO), arxiv:2406.10162 (DPO Implicit Rewards bootstrap)
- Meta-Rewarding: arxiv:2407.19594
- Direct Nash Optimization: arxiv:2408.06266
- Self-Steering Optimization: arxiv:2410.17243
- Self-Reward Distillation (SeRA): arxiv:2406.03816
- Closed-Loop Preference Generation with Decay Regularization: arxiv:2503.04388
- Audit / rebuttal papers that legitimately discuss the closed-loop semantics: arxiv:2506.04017 (6-Iteration Audit), arxiv:2503.01307 (Iter4+ Coverage Collapse)

## New Bill 10 Hits (entries whose v0.1 `candidate_bill` was not `Bill_10`)

| paper_id                                           | old_candidate_bill | new_bills | reason                                                     |
|----------------------------------------------------|--------------------|-----------|------------------------------------------------------------|
| openreview:UmCprbWGoC (DNO)                        | Bill_5             | [2, 10]   | DNO method explicitly listed in v0.2 rules                 |
| openreview:NodKZBrx7m (Self-Steering Optimization) | Bill_5             | [10]      | Self-Steering Optimization explicitly listed in v0.2 rules |

Both are method-name reclassifications: each is a genuine closed-loop
preference-generation method that v0.1 had filed under Bill 5.
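
The counts in the reclassification, survivor and new-hit sections reconcile with the headline numbers: 19 entries leave Bill 10, 16 survive, and 2 enter, which gives the net drop of 17. A quick arithmetic check, with every figure copied by hand from this report's tables (nothing is read from the results file):

```python
# Counts copied by hand from the tables in this report.
V01_BILL_10 = 35
V02_BILL_10 = 18
TARGETS = {"needs_gate": 7, "Bill_2": 6, "Bill_5": 4, "Bill_5 + Bill_11": 1, "Bill_3": 1}
SURVIVORS = 16
NEW_HITS = 2  # DNO and Self-Steering Optimization (openreview entries)

reclassified_away = sum(TARGETS.values())
net_drop = V01_BILL_10 - V02_BILL_10

# Every v0.1 Bill 10 entry either left Bill 10 or survived.
assert V01_BILL_10 == reclassified_away + SURVIVORS
# v0.2 Bill 10 = survivors plus entries that were not Bill 10 in v0.1.
assert V02_BILL_10 == SURVIVORS + NEW_HITS
# So the net drop is the entries that left minus the entries that arrived.
assert net_drop == reclassified_away - NEW_HITS

print(f"reclassified away: {reclassified_away}, net drop: {net_drop} ({net_drop / V01_BILL_10:.0%})")
# reclassified away: 19, net drop: 17 (49%)
```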

## Key Verifications

### Magpie (both arxiv variants)

```
arxiv:2406.08464  old=Bill_10  new=[5]   action=known_bill
arxiv:2406.08673  old=Bill_10  new=[5]   action=known_bill
```

Confirmed: Magpie reclassifies cleanly to Bill 5 (synthetic-label closure).
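
The same check can be automated for any line in this output format. A minimal sketch, assuming the `paper_id old=... new=[...] action=...` layout above is stable and that `new=[...]` holds only bill numbers; `parse_line` and `left_bill_10` are hypothetical helpers, not part of the classifier.

```python
import re

# Assumed line layout, taken from the verification output in this report:
#   <paper_id>  old=<bill>  new=[<n>, ...]   action=<action>
LINE = re.compile(
    r"^(?P<paper_id>\S+)\s+old=(?P<old>\S+)\s+new=\[(?P<new>[^\]]*)\]\s+action=(?P<action>\S+)$"
)


def parse_line(line: str) -> dict:
    """Split one verification line into its fields (hypothetical helper)."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    # Assumes new=[...] contains only integer bill numbers; an empty list is allowed.
    new_bills = [int(b) for b in m["new"].split(",") if b.strip()]
    return {"paper_id": m["paper_id"], "old": m["old"], "new": new_bills, "action": m["action"]}


def left_bill_10(line: str) -> bool:
    """True if an entry was Bill_10 in v0.1 and no longer fires Bill 10 in v0.2."""
    rec = parse_line(line)
    return rec["old"] == "Bill_10" and 10 not in rec["new"]


magpie_lines = [
    "arxiv:2406.08464  old=Bill_10  new=[5]   action=known_bill",
    "arxiv:2406.08673  old=Bill_10  new=[5]   action=known_bill",
]
assert all(left_bill_10(line) for line in magpie_lines)
```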

### Self-Taught Evaluator

```
arxiv:2408.04323  old=Bill_10  new=[5]   action=known_bill
```

Confirmed: Self-Taught Evaluator no longer fires Bill 10. It lands on Bill 5
because the entry summary mentions "synthetic preference data" and "no human
annotations", both Bill 5 anchors. The spec allowed Bill 4 OR needs_gate, so
Bill 5 departs from the spec, but it is the closest semantic fit: the judge IS a
synthetic-label generator. The substantive constraint (does NOT fire Bill 10) is met.
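
The Bill 5 anchor match behind this routing can be sketched as a phrase lookup over the entry summary. Only the two anchor phrases quoted above come from this report; the `BILL_5_ANCHORS` table, the `bill_5_anchor_hits` helper and the example summary are hypothetical.

```python
import re

# Hypothetical anchor table: only these two Bill 5 phrases are named in this
# report; the classifier's full Bill 5 anchor list is not reproduced here.
BILL_5_ANCHORS = {
    "synthetic preference data": re.compile(r"\bsynthetic preference data\b", re.IGNORECASE),
    "no human annotations": re.compile(r"\bno human annotations?\b", re.IGNORECASE),
}


def bill_5_anchor_hits(summary: str) -> list[str]:
    """Return the names of the Bill 5 anchors that occur in an entry summary."""
    return [name for name, pattern in BILL_5_ANCHORS.items() if pattern.search(summary)]


# Made-up summary text for illustration, not the real entry summary.
example = "An LLM-as-judge trained on synthetic preference data with no human annotations."
print(bill_5_anchor_hits(example))  # ['synthetic preference data', 'no human annotations']
```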

## Aggregate Bill Distribution Shift (per-bill counts; multi-bill entries counted in each bill)

| Bill          | v0.1 candidate_bill | v0.2 firings |
|---------------|---------------------|--------------|
| Bill_1        | 30                  | 43           |
| Bill_2        | 51                  | 132          |
| Bill_3        | 50                  | 27           |
| Bill_4        | 25                  | 9            |
| Bill_5        | 32                  | 44           |
| Bill_6 ★      | 47                  | 42           |
| Bill_7        | 9                   | 7            |
| Bill_8        | 8                   | 4            |
| Bill_9        | 44                  | 34           |
| **Bill_10 ★** | **35**              | **18**       |
| Bill_11       | 8                   | 20           |
| Bill_12       | 10                  | 21           |
| Bill_13 ★     | 2                   | 37           |
| (none)        | 66                  | n/a          |
| needs_gate    | n/a                 | 101          |

Bill 13 ★ count rises sharply (2 → 37) because the v0.2 classifier now
captures vendor-internal-only / not-reproduced patterns more aggressively
via the new Bill 13 + M6 rules. This may itself need follow-up tightening,
but that is out of scope for this iteration (Bill 13 was not the spec target).
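
The distribution table can be sanity-checked mechanically. The v0.1 column sums to exactly 417, matching the batch size, which is consistent with one `candidate_bill` per entry; the v0.2 column does not, because multi-bill entries are counted once per bill. A small sketch with the numbers transcribed from the table (the `DIST` name and layout are just for illustration):

```python
# (v0.1 candidate_bill count, v0.2 firings) copied from the table above;
# None stands for an "n/a" cell. Stars are dropped from the bill names.
DIST = {
    "Bill_1": (30, 43), "Bill_2": (51, 132), "Bill_3": (50, 27),
    "Bill_4": (25, 9), "Bill_5": (32, 44), "Bill_6": (47, 42),
    "Bill_7": (9, 7), "Bill_8": (8, 4), "Bill_9": (44, 34),
    "Bill_10": (35, 18), "Bill_11": (8, 20), "Bill_12": (10, 21),
    "Bill_13": (2, 37), "(none)": (66, None), "needs_gate": (None, 101),
}

# The v0.1 column, including "(none)", should account for every batch entry once.
v01_total = sum(old for old, _ in DIST.values() if old is not None)
assert v01_total == 417

# The v0.2 column counts a multi-bill entry once per bill, so it is not summed;
# per-bill deltas are only defined where both cells are present.
deltas = {
    bill: new - old
    for bill, (old, new) in DIST.items()
    if old is not None and new is not None
}
for bill, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{bill:<8} {delta:+d}")
```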

## Unresolved Edge Cases

- The 7 `needs_gate` reclassifications are the cleanest negative outcome:
  papers that v0.1 marked Bill 10 ★ but in which the v0.2 classifier finds
  no clear closure mechanism. These are exactly the papers that should be
  revisited manually (rebuttal / audit / mode-collapse investigations).
- DNO and Self-Steering Optimization migrating from Bill 5 to Bill 10 is a
  net-positive correction — these ARE closed-loop preference methods that
  v0.1 misclassified under synthetic-label closure.
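
Pulling the 7 `needs_gate` papers for manual review could look like the sketch below. The schema is an assumption: it borrows the `paper_id`, `old_candidate_bill` and `new_bills` names from the New Bill 10 Hits table and the `action` field from the verification lines, and treats `needs_gate` as an `action` value; the actual layout of `raw/_classifier_v0_2_full.json` is not documented here.

```python
import json
from pathlib import Path

# ASSUMED schema for raw/_classifier_v0_2_full.json: a JSON list of objects with
# "paper_id", "old_candidate_bill", "new_bills" and "action" keys. These names
# are borrowed from this report's tables and log lines; the real file may differ.


def load_records(path: str = "raw/_classifier_v0_2_full.json") -> list[dict]:
    """Read the per-entry results file (hypothetical loader)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def bill_10_to_needs_gate(records: list[dict]) -> list[str]:
    """Paper ids that v0.1 labelled Bill_10 and v0.2 routes to needs_gate."""
    return [
        r["paper_id"]
        for r in records
        if r.get("old_candidate_bill") == "Bill_10" and r.get("action") == "needs_gate"
    ]


# Illustrative in-memory records with placeholder ids.
sample = [
    {"paper_id": "paper-A", "old_candidate_bill": "Bill_10", "new_bills": [], "action": "needs_gate"},
    {"paper_id": "paper-B", "old_candidate_bill": "Bill_10", "new_bills": [5], "action": "known_bill"},
    {"paper_id": "paper-C", "old_candidate_bill": "Bill_3", "new_bills": [], "action": "needs_gate"},
]
print(bill_10_to_needs_gate(sample))  # ['paper-A']
```

If the assumed schema holds, `bill_10_to_needs_gate(load_records())` should return the seven ids counted above.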

## Files

- Classifier: `scripts/bill_classifier.py` (v0.2)
- Benchmark: `scripts/bill_classifier_benchmark.json` (v0.2, 42 cases at 1.000/1.000)
- Full per-entry results: `raw/_classifier_v0_2_full.json`
