Measurement contract

The spine and the load-bearing beams.

A goal can still exist after the route is gone. This page states the first-principles object Reachability Labs measures, and the tests a claim must survive before it deserves confidence.

Use this page when you want the shortest honest version of the framework: what is essential, what can falsify it, and how CAI should report claim strength.



The primitive is reachable future under commitment.

The core object is not SAT, graph coloring, AI reasoning, scheduling, optimization, or alignment. Those are substrates. The primitive is simpler: a process commits to a path, and after each commitment some futures remain reachable while others are lost.

That is why existence is cheap and reachability is expensive. Existence asks whether a valid endpoint is out there somewhere. Reachability asks whether the process you actually ran can still get to one from the state it has already built.

Everything technical should serve that object. If a diagnostic, adapter, chart, or code path does not help measure reachable future after commitment, it is support machinery at best.

Plain version

A map can still contain exits after the route you took can no longer reach any of them.

Reachability Labs measures the difference between the map and the route.
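
The map-versus-route distinction can be made concrete with a minimal Python sketch. The three-variable clause set, the brute-force search, and the names `exists_solution` and `reachable_from` are illustrative assumptions, not part of any Reachability Labs tooling.

```python
from itertools import product

# Hypothetical toy instance: each clause is a list of (variable, wanted_value) literals.
CLAUSES = [
    [(0, True), (1, True)],    # x0 or x1
    [(0, False), (2, True)],   # (not x0) or x2
    [(1, False), (2, False)],  # (not x1) or (not x2)
]
N_VARS = 3


def satisfies(assignment, clauses=CLAUSES):
    """True if every clause has at least one literal matched by the assignment."""
    return all(any(assignment[v] == want for v, want in clause) for clause in clauses)


def exists_solution():
    """Existence: is there a valid endpoint anywhere on the map?"""
    return any(satisfies(a) for a in product([False, True], repeat=N_VARS))


def reachable_from(prefix):
    """Reachability: can this committed prefix still be extended to a valid endpoint?"""
    free = N_VARS - len(prefix)
    return any(
        satisfies(tuple(prefix) + rest)
        for rest in product([False, True], repeat=free)
    )


print(exists_solution())             # the map still contains exits
print(reachable_from((True,)))       # committing x0=True keeps a route open
print(reachable_from((True, True)))  # also committing x1=True closes every route
```

In this toy instance the endpoint set never changes. Only the committed prefix changes, and with it the answer to the reachability question.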

The measurement spine.

Every serious adapter should reduce to the same six questions. Domain-specific machinery can vary. The spine should not.

What is the state?

The exact committed prefix, partial assignment, schedule, reasoning trace, model state, or workflow state.

What has been committed?

The irreversible or costly-to-reverse choices that changed what futures remain open.

What continuations are allowed?

The declared policy, budget, solver, rollout rule, oracle, or stronger process used to expose future.

What is the target?

The endpoint class: satisfying completion, feasible schedule, correct answer, aligned behavior, or other success criterion.

What future is still reachable?

The measured relation between the committed state and the target. This is the object.

Where did the route close?

The commitment point where the process crossed from live route to dead route, or from broad future to narrow fragile corridor.
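
One way to picture the spine is as a small record plus a scan for the closing commitment. The sketch below is an assumption about shape only: `SpineRecord`, `find_closure`, and the toy budget example are hypothetical names and data, and the scan assumes the target was reachable before any commitment was made.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class SpineRecord:
    """Hypothetical container for the spine answers."""
    state: Tuple              # the exact committed prefix
    committed: Tuple          # choices that are costly to reverse
    continuation_rule: str    # declared policy, budget, solver, or oracle
    target: str               # endpoint class
    reachable: bool           # is a target endpoint still reachable?
    closed_at: Optional[int]  # index of the closing commitment, None if still live


def find_closure(
    commitments: Sequence,
    still_reachable: Callable[[Tuple], bool],
) -> Optional[int]:
    """Return the index of the first commitment after which the target is unreachable."""
    for i in range(len(commitments)):
        if not still_reachable(tuple(commitments[: i + 1])):
            return i
    return None


# Toy process: each step adds a non-negative amount; the target is a total of at most 5.
steps = [2, 2, 3, 1]
within_budget = lambda prefix: sum(prefix) <= 5

closed_at = find_closure(steps, within_budget)
record = SpineRecord(
    state=tuple(steps),
    committed=tuple(steps),  # in this toy case every step is a commitment
    continuation_rule="append non-negative steps only",
    target="final total of at most 5",
    reachable=closed_at is None,
    closed_at=closed_at,
)
print(record)
```

Here the route closes at the third commitment (index 2), because no allowed continuation can bring the total back under the limit.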

The framework earns trust by naming what could make it false.

The load-bearing beams.

These are the minimum attacks a serious claim should survive. If a beam is missing, the claim can still be useful, but its strength must be downgraded.

Beam 01

Endpoint/reference separation

Show that the target remains valid somewhere while this process loses the route. In oracle-backed domains, use an oracle. In reasoning domains, state the weaker claim: the sampled future under a declared model, prompt, policy, and budget.

Beam 02

Proposer-artifact control

Attack the chance that failure came from a thin proposal pool, bad rollout budget, prompt artifact, scoring quirk, or limited local view.

Beam 03

Same-instance trajectory test

If the same instance hosts both success and failure, the object is not merely ordinary instance hardness. It is path dependence under commitment.
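
A minimal sketch of the same-instance test, assuming a no-backtracking greedy colorer as the process and a four-vertex path graph as the instance. Both are hypothetical stand-ins chosen for brevity; `greedy_color` and `same_instance_split` are illustrative names.

```python
# Hypothetical instance: the path graph 0-1-2-3, to be colored with two colors.
EDGES = [(0, 1), (1, 2), (2, 3)]
N_COLORS = 2


def greedy_color(order, edges=EDGES, n_colors=N_COLORS):
    """Color vertices in the given order without backtracking.

    Returns (coloring, None) on success, or (None, stuck_vertex) on failure.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    coloring = {}
    for v in order:
        used = {coloring[u] for u in neighbors.get(v, ()) if u in coloring}
        free = [c for c in range(n_colors) if c not in used]
        if not free:
            return None, v  # every color is blocked by an earlier commitment
        coloring[v] = free[0]
    return coloring, None


def same_instance_split(orders):
    """True if the one instance hosts both a successful and a failed trajectory."""
    outcomes = [greedy_color(order)[0] is not None for order in orders]
    return any(outcomes) and not all(outcomes)


print(greedy_color([0, 1, 2, 3]))  # succeeds
print(greedy_color([0, 3, 1, 2]))  # fails at vertex 2
print(same_instance_split([[0, 1, 2, 3], [0, 3, 1, 2]]))
```

The graph is two-colorable in both runs. Only the commitment order differs, so the failure cannot be attributed to instance hardness alone.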

Beam 04

Local-metric opacity test

Check whether ordinary local health metrics can detect collapse early. If they can, the hidden-state claim weakens. If they cannot, the route-side instrument is doing work.
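
The opacity check can be sketched as a comparison between the step at which a local alarm first fires and the step at which the route closed. The threshold detector, the per-step health values, and the function names below are assumptions for illustration. A fuller test would also count false alarms on routes that stayed live.

```python
from typing import Optional, Sequence


def first_alarm(local_health: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first step whose local health falls below the threshold."""
    for step, value in enumerate(local_health):
        if value < threshold:
            return step
    return None


def local_metric_is_opaque(
    local_health: Sequence[float],
    closure_step: int,
    threshold: float,
) -> bool:
    """True if the local metric gave no alarm at or before the closing commitment."""
    alarm = first_alarm(local_health, threshold)
    return alarm is None or alarm > closure_step


# Hypothetical trace: health looks fine until well after the route closed at step 1.
late_trace = [0.9, 0.9, 0.8, 0.3]
# Hypothetical trace: health drops before the route closes at step 2.
early_trace = [0.9, 0.4, 0.3, 0.2]

print(local_metric_is_opaque(late_trace, closure_step=1, threshold=0.5))
print(local_metric_is_opaque(early_trace, closure_step=2, threshold=0.5))
```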

Beam 05

Variant comparison

Run stronger or altered processes when possible. If the boundary moves, the result is process-indexed; that is a feature, not a defect.

Beam 06

Claim boundary

Say what is supported, what is not supported, and what would change your mind. Do not let a vivid case study carry a universal claim.

Claim strength should be explicit.

Strongest claim

Oracle-backed separation: endpoints exist or residual states are certified while the process loses access under exact commitments.

Middle claim

Robust process-side collapse under controlled budgets, multiple seeds, variant checks, and same-instance trajectory evidence.

Weak but useful claim

Structured finite-sample future-field evidence under a declared model, prompt, policy, and budget. This is still useful if it is stated honestly.

Overclaim

Calling sampled non-recovery “absolute impossibility,” or calling one substrate result universal before replication and controls.
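
The tiers above could be encoded so that a run cannot claim more than its evidence supports. The mapping below is one possible reading, stated as an assumption: the flag names, the `ClaimStrength` enum, and the rule that an undeclared setup supports no claim are all hypothetical.

```python
from enum import IntEnum
from typing import Optional


class ClaimStrength(IntEnum):
    WEAK = 1       # finite-sample evidence under a declared setup
    MIDDLE = 2     # robust process-side collapse with controls
    STRONGEST = 3  # oracle-backed separation


def allowed_strength(
    declared_setup: bool,
    controlled_budgets: bool = False,
    multiple_seeds: bool = False,
    variant_checks: bool = False,
    same_instance_evidence: bool = False,
    oracle_backed: bool = False,
) -> Optional[ClaimStrength]:
    """Return the strongest tier the evidence supports, or None if nothing is declared."""
    if not declared_setup:
        return None  # assumption: without a declared model, policy, and budget, no claim
    if oracle_backed:
        return ClaimStrength.STRONGEST
    if controlled_budgets and multiple_seeds and variant_checks and same_instance_evidence:
        return ClaimStrength.MIDDLE
    return ClaimStrength.WEAK


print(allowed_strength(declared_setup=True))
print(allowed_strength(declared_setup=True, controlled_budgets=True, multiple_seeds=True,
                       variant_checks=True, same_instance_evidence=True))
```

Using an ordered enum means a reported strength can be compared against the allowed one, so an overclaim shows up as a simple inequality.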

What this means for CAI.

CAI should be the instrument that enforces the measurement contract. It should not merely run adapters. It should make each adapter declare the state, commitment, continuation rule, target, receipts, controls, and claim boundary.
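
A sketch of what enforcing those declarations might look like, assuming an adapter hands over a plain dictionary. The field keys and the `missing_declarations` helper are hypothetical and do not describe an existing CAI interface.

```python
from typing import Dict, List

# Hypothetical keys, mirroring the seven declarations named above.
REQUIRED_FIELDS = (
    "state",
    "commitment",
    "continuation_rule",
    "target",
    "receipts",
    "controls",
    "claim_boundary",
)


def missing_declarations(declaration: Dict[str, object]) -> List[str]:
    """List every required field that is absent or empty in the adapter's declaration."""
    return [name for name in REQUIRED_FIELDS if not declaration.get(name)]


partial = {"state": "partial assignment", "target": "satisfying completion"}
print(missing_declarations(partial))
```

An instrument built this way would refuse to run, or would downgrade the run, whenever the returned list is non-empty.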

The next useful addition is a claim ledger. Every serious run should emit a small record that says: supported claim, tested alternative explanations, untested alternatives, and allowed claim strength.

That ledger makes the work harder to overstate and easier to trust.
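
As a minimal sketch, a ledger entry could be a small serializable record with the four fields named above. The class name, the JSON encoding, and the example values are assumptions for illustration, not a defined CAI format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ClaimLedgerEntry:
    """Hypothetical per-run record of what a result does and does not support."""
    supported_claim: str
    tested_alternatives: List[str] = field(default_factory=list)
    untested_alternatives: List[str] = field(default_factory=list)
    allowed_claim_strength: str = "weak"

    def to_json(self) -> str:
        """Serialize the entry so it can be stored alongside the run's receipts."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative values only; no real run is being reported here.
entry = ClaimLedgerEntry(
    supported_claim="route closed at commitment 2 under the declared policy and budget",
    tested_alternatives=["thin proposal pool", "rollout budget too small"],
    untested_alternatives=["stronger solver variant"],
    allowed_claim_strength="weak",
)
print(entry.to_json())
```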


Standard to aim for

Not “the run passed.”

“The route was alive here, closed there, these alternatives were attacked, and this is the claim strength that remains.”