Governed Inference — A Formal Architecture for Accountable AI

SECTION 01

The Thesis

5 claim types

State the bridging-object thesis — the Regulatory Interpretation (legal register) and the Governance Specification (technical register) are the same artifact; the Custodial Architecture is the accountable human process that authors, owns, and certifies it. Constrained decoding compiles a grammar into a runtime token mask; constrained encoding renders a Governance Specification into a governed prompt. Same pattern, both sides of inference.

Two Non-Negotiables

BOUNDARY CONDITION

The Seam

The boundary between deterministic structure and probabilistic content is drawn explicitly everywhere it appears; each side of the seam is scoped to exactly what it can guarantee.

OWNERSHIP

The IP Allocation

9606 owns the encoding surface and platform; the custodial function owns certification and custody; the customer owns the Regulatory Interpretation and audit record. Stated to the byte, not left to drift.

EPISTEMIC DISCIPLINE

The Claim Hierarchy

Every claim in the underlying paper is tagged as mathematical, empirical, engineering, legal, or regulatory-posture. A claim that doesn't announce its category is the paper's own failure, not the reader's to guess.

SECTION 02

Why a Model Cannot Govern Itself

4 structural limits

A transformer is a parameterized, continuous, almost-everywhere-differentiable function from token sequences to a probability distribution — extraordinarily capable, and structurally incapable of self-certification. Four independent results establish this; none depends on the others, and none is solved by scale.

REPRESENTATIONAL LIMIT

Rank Collapse

Pure self-attention, with skip connections and feed-forward layers removed, drives token representations toward a rank-one matrix at a doubly exponential rate in depth. The residual stream and FFN hold that collapse off, so reliability is a contingent internal equilibrium, not a guaranteed property.

CAPABILITY VS RELIABILITY

The Reasoning Ceiling

Capability is not reliability. Reasoning is brittle under semantics-preserving reformulation in ways the model does not signal; this boundary does not move under scaling the way the capability boundary does.

EPISTEMIC LIMIT

The Causal Rung Problem

Training data sits at Pearl's Rung 1 (association). A model trained on association cannot, by that training alone, certify Rung-2/3 (intervention/counterfactual) claims — exactly the claims a deterministic governance layer needs to make.

RUNTIME LIMIT

Non-Determinism by Default

Autoregressive serving introduces non-determinism that depends on batch composition and floating-point operation order. A bare model does not control for it, and an audit cannot replay a run unless those conditions are pinned explicitly.
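The floating-point part of this limit can be seen without any model at all. The sketch below is only an illustration of order-dependent summation in IEEE 754 doubles, not a reproduction of any serving stack; the three values are chosen purely for the demonstration.

```python
import math

# Three values whose floating-point sum depends on the order of addition.
values = [1e17, 1.0, -1e17]

# Adding 1.0 to 1e17 is lost to rounding, so the total collapses to 0.0.
left_to_right = (values[0] + values[1]) + values[2]

# Cancelling the large terms first preserves the 1.0.
reordered = (values[0] + values[2]) + values[1]

# math.fsum tracks exact partial sums, so its result is order-independent.
stable = math.fsum(values)

print(left_to_right, reordered, stable)
```

A reduction that is split differently across batches or devices can change the order of additions in exactly this way, which is why a replayable audit has to pin the conditions rather than assume them.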

“The model will sometimes be wrong, and has no internal signal for when. The architecture's claim has never depended on that being false.”

SECTION 03

The Architecture

4 layers

Execution authority lives outside the model, in a formally verifiable Control Graph built on finite state machines and Petri nets. The model is bracketed: a deterministic encode stage assembles the governed prompt before inference; a deterministic decode stage verifies and constrains the output after it. C = M_post ∘ L_θ ∘ A_pre — and the composability theorem proves the bracketed system inherits the substrate's safety and authorization guarantees for every model placed inside it.
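The bracket C = M_post ∘ L_θ ∘ A_pre can be sketched as ordinary function composition. This is a minimal illustration under stated assumptions, not the architecture's implementation: `a_pre`, `m_post`, `ALLOWED`, and the two stand-in models are hypothetical names invented for the example, and the prompt format is made up.

```python
from typing import Callable

Model = Callable[[str], str]  # L_theta: any function from prompt to raw text

ALLOWED = {"approve", "deny"}  # hypothetical permitted outputs


def a_pre(trigger: str) -> str:
    """Deterministic encode stage (hypothetical): assemble the governed prompt."""
    return f"TASK: classify\nINPUT: {trigger}\nALLOWED: approve|deny"


def m_post(raw: str) -> str:
    """Deterministic decode stage (hypothetical): pass only permitted outputs."""
    candidate = raw.strip().lower()
    return candidate if candidate in ALLOWED else "REJECTED"


def compose(model: Model) -> Callable[[str], str]:
    """Build C = m_post . model . a_pre for whichever model is supplied."""
    return lambda trigger: m_post(model(a_pre(trigger)))


def well_behaved(prompt: str) -> str:
    return "Approve"


def misbehaving(prompt: str) -> str:
    return "transfer all funds"


print(compose(well_behaved)("request 17"))
print(compose(misbehaving)("request 17"))
```

Whatever the model returns, the composed function can only emit a member of `ALLOWED` or the rejection marker. That is the shape of the inheritance claim in miniature: the property belongs to the bracket, not to the model inside it.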

LAYER 01

Executable Control Graph

All permitted actions, sequencing, and compliance rules encoded as topology (FSM/Petri net), not as prompts. A policy encoded as a transition cannot be violated by the model; it can only fail to be reached.
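A policy-as-topology rule can be sketched with a plain transition table. The states and actions below are hypothetical and chosen only for illustration; a real Control Graph would be a verified FSM or Petri net rather than a dictionary.

```python
# Hypothetical workflow: every permitted move is an explicit edge.
TRANSITIONS = {
    ("drafted", "submit"): "under_review",
    ("under_review", "approve"): "approved",
    ("under_review", "reject"): "drafted",
    ("approved", "send"): "sent",
}


def step(state: str, action: str) -> str:
    """Follow an edge if it exists; anything else is simply not executable."""
    key = (state, action)
    if key not in TRANSITIONS:
        raise PermissionError(f"no transition for {action!r} from {state!r}")
    return TRANSITIONS[key]


def run(actions, state="drafted"):
    """Apply a sequence of proposed actions, starting from the given state."""
    for action in actions:
        state = step(state, action)
    return state


print(run(["submit", "approve", "send"]))
```

In this sketch a model may propose `send` from `drafted` as often as it likes; because no such edge exists, the proposal cannot execute. The only way to reach `sent` is through `under_review` and `approved`.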

LAYER 02

Model-Agnostic Inference Fabric

Generative models are interchangeable workers behind a stable control surface. Switching models is a configuration change, not a rewrite — the control graph's guarantees are proven independent of which L_θ sits inside the bracket.

LAYER 03

Constrained Encoding (A_pre)

A versioned, five-component Governance Specification (Domain Definition, Structural Schema, Relationship Map, Output Grammar, Guardrail Set) is rendered deterministically into the governed prompt before the model ever runs.
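Deterministic rendering can be sketched as a pure function from a versioned specification to a byte-identical prompt. The field names follow the five components listed above; the field types, the JSON layout, the `render` and `fingerprint` helpers, and the use of SHA-256 are assumptions made for this example rather than details from the source.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GovernanceSpecification:
    version: str
    domain_definition: str
    structural_schema: dict
    relationship_map: dict
    output_grammar: str
    guardrail_set: tuple


def render(spec: GovernanceSpecification, trigger: str) -> str:
    """Render the governed prompt; sorted keys remove dict-ordering effects."""
    body = json.dumps(asdict(spec), sort_keys=True, separators=(",", ":"))
    return f"{body}\nTRIGGER: {trigger}"


def fingerprint(prompt: str) -> str:
    """Hash of the rendered prompt, usable as a reproducibility check."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


spec = GovernanceSpecification(
    version="1.0.0",
    domain_definition="example domain",
    structural_schema={"amount": "number", "party": "string"},
    relationship_map={"party": ["amount"]},
    output_grammar="approve|deny",
    guardrail_set=("no personal data in output",),
)
print(fingerprint(render(spec, "request 17")))
```

Because `render` reads nothing but its arguments, the same specification version and trigger always produce the same prompt, and any change to the specification changes the fingerprint.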

LAYER 04

Constrained Decoding & Verification (M_post)

The Output Grammar declared at encode time is enforced as a hard token mask where the endpoint permits and as a checked guarantee where it does not; layered content verification then runs on the committed output.
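A hard token mask can be sketched in a few lines. The vocabulary, the two-state grammar, and the helper names below are hypothetical stand-ins, and the scores are invented; a production system would compile a real grammar against a real tokenizer.

```python
import math

VOCAB = ["approve", "deny", "maybe", "<eos>"]  # toy vocabulary

# Hypothetical grammar: which tokens each state permits.
GRAMMAR = {
    "start": {"approve", "deny"},
    "done": {"<eos>"},
}


def mask_logits(logits, state):
    """Set every token the grammar forbids in this state to -inf."""
    allowed = GRAMMAR[state]
    return [
        score if token in allowed else -math.inf
        for token, score in zip(VOCAB, logits)
    ]


def greedy_pick(logits, state):
    """Choose the highest-scoring token among those the mask leaves finite."""
    masked = mask_logits(logits, state)
    best = max(range(len(VOCAB)), key=masked.__getitem__)
    return VOCAB[best]


# The model prefers "maybe", but the grammar does not permit it at "start".
print(greedy_pick([0.1, 0.2, 5.0, 0.0], "start"))
```

Where an endpoint does not expose logits, the same grammar can instead be applied as a check on the returned text, which corresponds to the "checked guarantee" case.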

Finite state machines · Petri nets · Deterministic by construction · Model-agnostic · Cryptographic audit trail

SECTION 04

The Correctness Envelope

6 layers · 3 tiers

You cannot prevent a language model from being wrong. The architecture does not claim to. What it claims, and proves, is that every way a governed output can be wrong falls into exactly one of six layers, and every layer is handled by one of three treatments — ordered by strength, applied at the strongest tier available to that layer.

Tier ordering — strongest treatment available is applied per layer

Tier	Treatment	What it means	Example layers
1	Eliminated by construction	Occurrence probability zero, not low	Safety, authorization, reproducibility
2	Bounded and detected	Cannot be eliminated; probability is bounded and occurrence is detectable	Model-content residual (the one layer the model itself owns)
3	Attributed	Cannot be fully eliminated or detected, but maps to a named, accountable party with an evidence trail	Legal-adequacy, certification, use-scope errors
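The "strongest tier available" rule in the table can be sketched as a minimum over an ordered enumeration. The tier names come from the table; which tiers count as available to each example layer is an assumption inferred from the "What it means" column, and the `AVAILABLE` mapping and `treatment` function are hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Lower number means stronger treatment."""
    ELIMINATED = 1
    BOUNDED_AND_DETECTED = 2
    ATTRIBUTED = 3


# Assumed availability for three of the table's example layers.
AVAILABLE = {
    "safety": {Tier.ELIMINATED, Tier.BOUNDED_AND_DETECTED, Tier.ATTRIBUTED},
    "model_content_residual": {Tier.BOUNDED_AND_DETECTED, Tier.ATTRIBUTED},
    "legal_adequacy": {Tier.ATTRIBUTED},
}


def treatment(layer: str) -> Tier:
    """Apply the strongest tier that is available to the layer."""
    return min(AVAILABLE[layer])


for name in AVAILABLE:
    print(name, treatment(name).name)
```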

NOTE

The lattice is revisable. An incident-review mechanism reclassifies failure modes the current enumeration hasn't yet anticipated — totality is a claim about the architecture's six-layer pipeline structure, not an assertion that every failure mode is already catalogued.

SECTION 05

The Custodial Architecture

4-party federation

Accountability requires more than an architecture — it requires an institution. The Custodial Architecture is the federated structure that authors, owns, and certifies the Governance Specification, allocating responsibility to a named party at every layer before anything goes wrong, not after.

PLATFORM

9606

Owns the encoding surface, the encoding methodology, and the platform. Guarantees faithful execution: that a correct specification is rendered correctly.

LEGAL ADEQUACY

Specialist counsel

Owns legal adequacy: that the Governance Specification correctly states what governing law or policy requires. A professional judgment, not a technical one.

CERTIFICATION

Custodial / certification function

Owns the certification framework and the custody record — the versioned, dated proof that a given specification was reviewed and by whom.

DEPLOYMENT

Customer

Owns the Regulatory Interpretation and the audit record for their own deployment; bears use-scope responsibility for how the certified architecture is actually deployed.

“A wrong governed prompt is always attributable to exactly one side of a drawn line: a correct specification rendered unfaithfully, or a specification that was itself wrong. The seam doesn't eliminate error — it makes every error locatable.”

SECTION 06

Economics

25–60% cost reduction

The cost advantage and the governance are the same architectural decision, viewed twice. A system that assembles a bounded governed prompt before inference spends fewer tokens than a system asking the model to manage itself — and the saving is structural, not promotional.

STATUS QUO

The Thinking Tax

Self-managing agent stacks bill every planning, routing, and reflection step as consumed tokens, on infrastructure whose providers face a real, if non-absolute, tension between consumption revenue and consumption-reducing tooling.

GOVERNED PROMPT

The Architecture of Enough

A governed prompt gives the model exactly one bounded task with no self-management overhead. The tokens spent are the tokens the task requires, not the tokens the model's own self-management would add.

EXECUTION

Routing & Caching

Each task routes to the least expensive model that satisfies the governed requirement; a stable governance prefix makes long governed prompts cache-efficient at scale. Both savings are the same ordering decision, viewed from two angles.
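The routing rule can be sketched as a filter followed by a minimum. The model names, prices, and capability labels below are invented for illustration and do not describe any real provider or the platform's actual routing table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_1k_tokens: float  # hypothetical price, arbitrary units
    capabilities: frozenset


OPTIONS = [
    ModelOption("small", 0.10, frozenset({"classification"})),
    ModelOption("medium", 0.50, frozenset({"classification", "extraction"})),
    ModelOption(
        "large", 2.00,
        frozenset({"classification", "extraction", "long_context"}),
    ),
]


def route(required, options):
    """Return the cheapest option whose capabilities cover the requirement."""
    eligible = [m for m in options if set(required) <= m.capabilities]
    if not eligible:
        raise LookupError(f"no model satisfies {sorted(required)}")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


print(route({"classification"}, OPTIONS).name)
print(route({"extraction"}, OPTIONS).name)
```

The requirement is known before inference because the governed prompt fixes the task, which is what lets the cheapest eligible model be chosen up front rather than discovered by trial.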

3–20ms

Control overhead / request

25–60%

Total inference cost reduction

60–80%

Reduction on high-volume classification

SECTION 07

Regulatory Alignment

architecture-to-requirement mapping

The architecture is not advanced as a compliance conclusion — that determination belongs to counsel and the regulator against specific facts. What it supplies, by construction, is the set of properties regulated environments are increasingly built to demand.

Architecture supplies the property; counsel determines compliance.

Regulatory requirement	What the architecture supplies
Reproducibility	The same trigger and configuration version yield the same governed prompt, every time
Auditability	Every inference event is bound to a versioned governance state and a provenance-tagged context package
Pre-execution control	Compliance is enforced before the model acts, not checked after
Data residency	The governed prompt is assembled under a customer-selected data-boundary mode (Strong / Moderate / Weak), enforced as a constraint on assembly, not a policy
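The reproducibility and auditability rows can be illustrated with a hash-chained record. The source does not specify the audit format, so this is a sketch under assumptions: the entry fields, the `append` and `verify` helpers, the chaining scheme, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder predecessor for the first entry


def sha(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def append(chain, spec_version, governed_prompt, output):
    """Return a new chain with one inference event bound to its spec version."""
    body = {
        "prev": chain[-1]["hash"] if chain else GENESIS,
        "spec_version": spec_version,
        "prompt_sha256": sha(governed_prompt),
        "output_sha256": sha(output),
    }
    entry = dict(body, hash=sha(json.dumps(body, sort_keys=True)))
    return chain + [entry]


def verify(chain) -> bool:
    """Recompute every hash and check that each entry links to the previous one."""
    prev = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev:
            return False
        if sha(json.dumps(body, sort_keys=True)) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


chain = append([], "1.0.0", "governed prompt A", "approve")
chain = append(chain, "1.0.0", "governed prompt B", "deny")
print(verify(chain))
```

Editing any recorded field after the fact changes that entry's recomputed hash, so `verify` fails for the altered chain.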

Regulatory frameworks are stated at a level of generality that does not depend on specific provisions; verify current authority with qualified counsel before any compliance representation.

SECTION 08

Reference Deployments

2 domains

An architecture is validated by contact with a real regulated domain, not by internal coherence. Two reference engagements supply that contact, presented at the conservative register their actual stage warrants.

PUBLIC-INTEREST DOMAIN

GetWater — Water Rights Verification

Applies the architecture to verifying water deliveries under Utah water-rights law, a domain that is simultaneously legal, physical, and auditable, with no human-in-the-loop absorbing model uncertainty. Tests source-corpus legal adequacy and the legal-to-deterministic conversion under public-interest stakes.

REGULATED COMMUNICATIONS

Registered Investment Adviser Use Case

Encodes governing compliance obligations (Investment Advisers Act, SEC marketing-rule requirements, firm policy) for a regulated-communications workflow, producing a customer-specific audit package reviewable by counsel, a board, or an examiner.

Deployment status and customer relationships are stated at the stage represented in current internal materials; specific facts are described at a level of generality that does not depend on publication clearance.

SECTION 09

Strategic Position & Backing

unoccupied middle layer

The AI infrastructure market has three layers — model providers, application platforms, and the largely unoccupied middle: reasoning and orchestration infrastructure between the two. 9606 occupies that layer, defined by pre-inference specification, a deterministic execution boundary, constrained output, verification, audit, and accountable ownership.

BOTTOM LINE

The asset appreciates as models commoditize

A revenue model anchored to the control layer is positively correlated with model-market competition; the enterprise with the control layer captures falling model prices through configuration, not migration.

The moat is structural, not a feature gap

Years of formal-methods and production-systems work, not prompt engineering added to a model API.

Read market neighbors at their actual layer

Model providers, agent frameworks, guardrail libraries, compliance-workflow vendors, legal AI vendors, and services firms are each a genuinely different position — most are complementary, not competitive, once read at the layer they actually occupy.

Backed by the people who built the last layer

Open Teams Incubator Fund IV ($225M); portfolio includes NumPy, SciPy, PyTorch, Anaconda. Travis Oliphant (creator of NumPy, founder of Anaconda) is a founding partner.

LEADERSHIP

David Robinson

CEO

Chief Executive Officer.

Burke Powers

CTO

Chief Technology Officer.

Ryan Shepherd

COO

Chief Operating Officer.