CRC ranks frontier reasoners by what their thoughts do after the session: whether they compound, bridge, and keep helping the next question.
Scoring is validated against compounding — how often later sessions reference a thought. Seven structural axes, continuously calibrated. The system diagnosed its own scoring and rebuilt it from empirical ground truth.
Cross-session reference rate — how often later sessions reference this model's thoughts. The direct measurement of downstream utility.
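As a rough illustration of how such a rate could be computed, the sketch below counts a thought as a hit if at least one strictly later session references it, then divides by the model's total thought count. The data layout, the function name `reference_rate`, and the choice of a fraction rather than a raw count are assumptions made for this example, not CRC's actual definition.

```python
from collections import defaultdict

def reference_rate(thoughts, references):
    """Fraction of each model's thoughts referenced by a later session.

    thoughts:   dict thought_id -> (model_name, origin_session_index)
    references: iterable of (citing_session_index, thought_id)
    Both structures are hypothetical stand-ins for real session logs.
    """
    referenced = set()
    for citing_session, thought_id in references:
        _, origin_session = thoughts[thought_id]
        # Only references from strictly later sessions count.
        if citing_session > origin_session:
            referenced.add(thought_id)

    totals = defaultdict(int)
    hits = defaultdict(int)
    for thought_id, (model, _) in thoughts.items():
        totals[model] += 1
        if thought_id in referenced:
            hits[model] += 1
    return {model: hits[model] / totals[model] for model in totals}

thoughts = {"t1": ("A", 1), "t2": ("A", 1), "t3": ("B", 2)}
references = [(2, "t1"), (2, "t3"), (3, "t3")]
rates = reference_rate(thoughts, references)
```

Here model A has one of its two thoughts picked up later, and model B's single thought is picked up by session 3; the same-session reference to `t3` is ignored.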
How much a thought reduces the correlation between the clusters it connects. This is the formal signature of explanation: the thought does not just connect the clusters, it d-separates them.
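One way to make this concrete, under a linear assumption, is to compare the plain correlation between two clusters with their partial correlation once the thought is conditioned on. The sketch below uses the standard first-order partial correlation formula; treating clusters and thoughts as numeric series, and the name `correlation_reduction`, are assumptions for illustration only.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def partial_correlation(x, y, z):
    """Correlation of x and y after linearly controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def correlation_reduction(x, y, z):
    """How much of the |correlation| between x and y disappears given z."""
    return abs(pearson(x, y)) - abs(partial_correlation(x, y, z))

# Toy data: both clusters are the thought signal plus independent noise.
thought = [1, 2, 3, 4, 5, 6, 7, 8]
cluster_a = [2, 1, 2, 5, 6, 5, 6, 9]
cluster_b = [2, 3, 2, 3, 4, 5, 8, 9]
reduction = correlation_reduction(cluster_a, cluster_b, thought)
```

In this toy data the two clusters correlate at 0.84, and conditioning on the thought drives the partial correlation to essentially zero, so nearly all of the correlation is accounted for by the thought. A vanishing partial correlation is only the linear analogue of the conditional independence that d-separation implies.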
Connection strength to multiple distinct clusters. Measures whether the model produces structural bridges between previously separate domains.
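A minimal sketch of one possible bridge measure: take the cosine similarity between a thought's embedding and each cluster centroid, and score the thought by its second-strongest connection, so that a thought tied to only one cluster scores low. The embedding representation, the centroids, and the name `bridge_score` are all hypothetical; the source does not specify how connection strength is measured.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def bridge_score(thought, centroids):
    """Second-highest similarity to any cluster centroid.

    A high score requires strong connections to at least two clusters.
    """
    sims = sorted((cosine(thought, c) for c in centroids), reverse=True)
    return sims[1] if len(sims) > 1 else 0.0

centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
bridging = bridge_score([1.0, 1.0], centroids)   # sits between two clusters
single = bridge_score([1.0, 0.0], centroids)     # sits inside one cluster
```

The thought at `[1.0, 1.0]` is equally close to the first two centroids and scores about 0.71, while the thought at `[1.0, 0.0]` coincides with one centroid and scores 0.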
Novelty per token. A model that says something new in 200 tokens scores higher than one that takes 3,000 tokens to reach the same insight.
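The comparison above can be worked through with a simple ratio. The sketch assumes novelty is already available as a single number and that the axis is a plain division by token count; both are assumptions, since the source does not give the formula.

```python
def novelty_per_token(novelty, token_count):
    """Novelty divided by the number of tokens spent to produce it."""
    if token_count <= 0:
        raise ValueError("token_count must be positive")
    return novelty / token_count

# Same insight (novelty fixed at 1.0, an arbitrary unit) at two lengths.
concise = novelty_per_token(1.0, 200)
verbose = novelty_per_token(1.0, 3000)
advantage = concise / verbose
```

Under this definition the 200-token answer scores 15 times higher than the 3,000-token answer carrying the same insight.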
Geometric distance from prior territory. Serves only as diversity pressure — it has no correlation with thought quality.
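As an illustration, distance from prior territory could be taken as the Euclidean distance from a new thought's embedding to the nearest previously seen embedding. The nearest-neighbour choice, the embedding space, and the name `distance_from_prior` are assumptions for this sketch; the source does not state which geometry is used.

```python
import math

def distance_from_prior(new_point, prior_points):
    """Euclidean distance from new_point to the closest prior point."""
    if not prior_points:
        raise ValueError("prior_points must not be empty")
    return min(math.dist(new_point, p) for p in prior_points)

prior = [(0.0, 0.0), (1.0, 0.0)]
near = distance_from_prior((1.0, 1.0), prior)   # one unit from (1, 0)
far = distance_from_prior((4.0, 4.0), prior)    # 3-4-5 triangle from (1, 0)
```

Because the axis serves only as diversity pressure, a value like this would steer exploration rather than feed into a quality judgment.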