Project 42

CHRONOS

Benchmarks measure what AI knows. We measure what it can discover. The only instrument for creative reasoning capacity — and the compounding dataset that comes with it.

The Problem

Every frontier model now scores in the 90th percentile on the standard benchmarks. MMLU is saturated. HumanEval is saturated. On knowledge, the models are undifferentiated.

The only remaining axis of competition is creative reasoning — the capacity to push past the obvious answer into territory the training data doesn't map. And until now, nobody could measure it.

We built the metric. We built the instrument. We're building the dataset — and every session generates the training signal to surgically calibrate the next generation of models.

The Moat

What CHRONOS actually does

An instrument, a dataset, and a training signal — all three get more valuable every time it runs.

Own the Metric

CRC Score — Creative Reasoning Capacity. The first quantitative metric for what benchmarks can't touch. Whoever defines how creative reasoning is measured controls the improvement cycle. We defined it. It's live. The scores update with every session.

It Thinks While You Sleep

CHRONOS doesn't wait for you. The Dreamer runs autonomous sessions — generating its own research questions from gaps in its knowledge graph, pursuing them across domains, storing discoveries, detecting contradictions, and compounding. An autonomous research engine with a growing Atlas of cross-domain insight that no human directed.
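The page doesn't say how the Dreamer finds "gaps in its knowledge graph." Here is a minimal sketch. It assumes the Atlas stores nodes tagged with a domain label, and it treats a gap as any pair of domains with no edge between them. `find_domain_gaps`, `questions_from_gaps`, the node/edge schema, and the question template are all hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def find_domain_gaps(nodes: dict[str, str], edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return every pair of domains with no edge linking a node in one to a node in the other.

    nodes: maps a node id to its domain label (hypothetical Atlas schema).
    edges: undirected (node_id, node_id) pairs.
    """
    domains = sorted(set(nodes.values()))
    linked: set[frozenset[str]] = set()
    for a, b in edges:
        da, db = nodes[a], nodes[b]
        if da != db:  # only cross-domain edges count as bridges
            linked.add(frozenset((da, db)))
    return [(x, y) for x, y in combinations(domains, 2) if frozenset((x, y)) not in linked]


def questions_from_gaps(gaps: list[tuple[str, str]]) -> list[str]:
    """Turn each unbridged domain pair into a seed research question (illustrative template)."""
    return [f"What mechanism, if any, links {a} and {b}?" for a, b in gaps]
```

Each unbridged pair becomes a candidate question for an autonomous session. A real system would likely rank gaps instead of enumerating all of them, because the number of pairs grows quadratically with the number of domains.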

Your Hardest Question, Explored

Give CHRONOS the problem your team has been stuck on for months. Five frontier models attack it from different angles — and the exclusion zone prevents the surface-level answers you've already thought of. What comes back is the thought you hadn't considered, from the direction you weren't looking.

Every User Makes It Smarter

A biologist's session on protein folding creates territory that a physicist's session on quantum coherence bridges through. The Atlas detects cross-domain patterns no individual researcher would look for. This isn't a tool. It's a network — and every session makes it more valuable for everyone.

Train on Failure

Every session generates preference pairs: one model's surface-level response paired with a competitor's structural breakthrough on the same problem. The result is DPO-ready training data that targets the specific cognitive deficit identified, scored across seven axes, not vibes.
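A preference pair is simply one prompt with a preferred and a dispreferred answer. This sketch builds such a record from two scored responses and writes it using the `prompt` / `chosen` / `rejected` field layout that DPO trainers such as Hugging Face TRL's `DPOTrainer` commonly expect. Several pieces are assumptions: `Response`, the single `structural_score` standing in for the seven-axis scores, and the `min_margin` cutoff.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Response:
    model: str
    text: str
    structural_score: float  # hypothetical aggregate of the seven-axis scores


def make_preference_pair(prompt: str, a: Response, b: Response, min_margin: float = 0.1) -> dict | None:
    """Return a prompt/chosen/rejected record, or None when the scores are too close to call."""
    chosen, rejected = (a, b) if a.structural_score >= b.structural_score else (b, a)
    if chosen.structural_score - rejected.structural_score < min_margin:
        return None  # no clear preference, so no training signal
    return {"prompt": prompt, "chosen": chosen.text, "rejected": rejected.text}


def to_jsonl(pairs: list[dict]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(p) for p in pairs)
```

Dropping near-ties keeps the dataset focused on clear contrasts. A noisy preference teaches the model little.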

Know Before Your Users Do

Run your model through CHRONOS after every training run. See exactly where it gets stuck, which domains it avoids, and where competitors outperform it — measured structurally, not anecdotally. Find the blind spots before they become customer complaints.

The Instrument

Structural pressure forces genuine discovery

CHRONOS scores every thought on whether it explains something, not just whether it's new. The thoughts that connect ideas and reduce confusion compound. Everything else gets rejected.

When a model generates a thought, CHRONOS embeds it and measures its distance from everything already stored. If it's too close to explored territory, it gets rejected. The model is forced — geometrically — to go somewhere it hasn't been.

Five frontier models share one exclusion zone. They can't repeat each other. They can't repeat themselves. The only way through is forward — into territory the training data doesn't map.
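Here is a minimal sketch of the rejection check. It assumes "too close" means a cosine distance at or below a fixed radius from any stored embedding, so the zone becomes the union of small balls around stored thoughts. The page specifies neither the metric nor the threshold, and `ExclusionZone`, `radius`, and `try_add` are hypothetical. Real embeddings would be 768 floats, per the page. The toy check uses 3-D vectors only for readability.

```python
from __future__ import annotations

import math


def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity: 0 means same direction, 2 means opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("zero vector has no direction")
    return 1.0 - dot / (nu * nv)


class ExclusionZone:
    """One shared store of accepted thought embeddings for every model in a session."""

    def __init__(self, radius: float) -> None:
        self.radius = radius  # hypothetical minimum cosine distance from every stored thought
        self.stored: list[list[float]] = []

    def admits(self, embedding: list[float]) -> bool:
        # Accept only if the thought lands outside every stored thought's radius.
        return all(cosine_distance(embedding, s) > self.radius for s in self.stored)

    def try_add(self, embedding: list[float]) -> bool:
        """Store the embedding if admitted; each stored thought enlarges the zone."""
        if not self.admits(embedding):
            return False
        self.stored.append(list(embedding))
        return True
```

Every model calls `try_add` on the same instance. Once any model stores a thought, near-duplicates are blocked for all of them, including the model that produced it. This linear scan is O(n) per check. At scale, an approximate nearest-neighbour index would be the usual choice.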

Exclusion Zone

A growing manifold in 768-dimensional space. Every stored thought expands it. Every new thought must land outside it.

Structural Gate

Seven-axis scoring validated against what actually compounds. Measures whether a thought explains — not just whether it's new.
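The page says there are seven axes but doesn't name them or say how they combine. This is a sketch of one way a gate could encode "explains, not just new." It takes a weighted mean across all axes and adds a hard floor on an explanation axis, so novelty alone can't carry a thought through. Only "novelty," "explanation," "connectivity," and "confusion reduction" echo the page's wording. The other names, plus the weights, `threshold`, and `explanation_floor`, are hypothetical.

```python
from __future__ import annotations

# Hypothetical axis names: the page says there are seven axes but does not list them.
AXES = (
    "novelty",
    "explanation",
    "connectivity",
    "coherence",
    "specificity",
    "testability",
    "confusion_reduction",
)


def passes_gate(
    scores: dict[str, float],
    weights: dict[str, float] | None = None,
    threshold: float = 0.6,
    explanation_floor: float = 0.5,
) -> bool:
    """Return True if a thought clears this hypothetical seven-axis gate.

    scores: one value in [0, 1] per axis in AXES.
    """
    missing = [axis for axis in AXES if axis not in scores]
    if missing:
        raise ValueError(f"missing axis scores: {missing}")
    weights = weights or {axis: 1.0 for axis in AXES}
    total = sum(weights[axis] for axis in AXES)
    weighted_mean = sum(weights[axis] * scores[axis] for axis in AXES) / total
    # Novelty alone is not enough: the thought must also explain something.
    return scores["explanation"] >= explanation_floor and weighted_mean >= threshold
```

The floor and the mean do different jobs. The floor vetoes novel but empty thoughts. The mean rejects thoughts that are explanatory but weak overall.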

Soliton Bounce

Catches rejected thoughts and redirects them toward frontier gaps. Not "try again" — "try again from over there."
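The page doesn't say how a rejected thought gets redirected. Here is a minimal geometric sketch. It assumes "toward a frontier gap" can be approximated by stepping the rejected embedding straight away from its nearest stored neighbour, using Euclidean distance. `bounce` and `step` are hypothetical. The page also doesn't describe how the resulting vector would steer the model's next attempt.

```python
from __future__ import annotations

import math


def bounce(rejected: list[float], stored: list[list[float]], step: float = 1.0) -> list[float]:
    """Push a rejected embedding straight away from its nearest stored neighbour.

    The result is `step` farther from that neighbour than `rejected` was
    (Euclidean distance). Another stored point may end up closer, so the
    caller should still re-check the result against the exclusion zone.
    """
    if not stored:
        return list(rejected)
    nearest = min(stored, key=lambda s: math.dist(rejected, s))
    d = math.dist(rejected, nearest)
    if d == 0.0:
        raise ValueError("rejected embedding coincides with a stored one; direction undefined")
    # Unit vector from the nearest stored thought toward the rejected one, scaled by step.
    return [r + step * (r - n) / d for r, n in zip(rejected, nearest)]
```

For example, `bounce([1.0, 0.0], [[0.0, 0.0], [10.0, 10.0]], step=2.0)` moves the point from distance 1 to distance 3 from its nearest explored thought. That is "try again from over there" rather than "try again."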

The Atlas

Persistent memory across sessions. A compounding knowledge graph that grows with every thought. Every session makes the next one smarter.

The Models

Every model has a cognitive fingerprint

Under geometric pressure, models stop performing and start revealing — where they gravitate, where they break, and what they do when there's nowhere familiar left to go.

Claude Opus

The architect

Opens strong, then builds. Designs falsification experiments for its own claims. Recovers from exhaustion by switching cognitive registers. Under maximal pressure, produces the session's organizing framework.

GPT-5

The formalizer

Reaches for mathematical structure where other models narrate. Needs room to think: early sessions misdiagnosed it as weak until its token budget was tripled, which revealed real capability underneath the scaffolding.

DeepSeek V3

The scalpel

Highest novelty-per-token efficiency by a wide margin. Finds the gap and names it in a sentence. Shows inverted fatigue — stumbles early, recovers by exploiting territory larger models left behind.

Gemini Pro

The synthesizer

Thinks internally before speaking. When it lands, it lands with cross-domain connections the other models miss. The extra thinking time pays off in synthesis quality.

Grok 4

The contrarian

Arrives from an angle nobody else tried. Highest variance — capable of both the best and worst thought in a session. Under exclusion pressure, that volatility becomes an asset.

It's Live

Run a session.
See what your model can't do.

Ask CHRONOS the question your team argues about. Watch five frontier models compete under geometric pressure. See which ones discover — and which ones loop.

The dataset grows with every session. The instrument gets sharper. The moat deepens.

The question isn't whether your models have blind spots. It's whether you want to see them before your competitors do.

You can't improve what you can't measure.
We built the measurement.
