Cases, not logs, are the unit of governance

The review layer for AI agents that act.

Turn consequential agent actions into accountable decision cases — with facts, rules, precedent, reasoned determination, named human authority, appeal, remedy and learning.

Review demo case · Open casebook dashboard

Observability answers: what did the agent do? Judgment.dev answers: was the decision justified, who authorised it, how can it be challenged, and what did we learn?

Built for: Risk & Compliance · Trust & Safety · AI Platform Teams · Internal Audit · Regulatory Affairs · Customer Operations

The Casebook

Every consequential action becomes a case.

Not a log line. A reviewable record — with facts, rules, precedents, a reasoned determination, a named human authority and a path to appeal.

judgment.dev / cases / CASE-2041

In review · L3 · Rights-sensitive · CASE-2041 · opened 14m ago by remedy-agent@acme

Refund denial on order #88412 after 52-minute service outage

agent customer-remedy → process refunds.v3 · 3 precedents · 2 affected interests · 1 reviewer assigned

Agent action

Proposed: deny refund, issue 15% credit voucher. Reversibility: low (voucher expires 30d).

Applicable rules

SLA-2024.3 · outage > 45m → mandatory refund offer. Conflict with Remedy Catalogue §4.2.

Precedents

CASE-1893 (94% sim · approved refund) · CASE-1622 (81% sim · distinguished) · CASE-1407 (overruled, policy v2.1).

Reasoned determination

Modify. Issue full refund per SLA-2024.3. Voucher disallowed under §4.2 where customer harm exceeds threshold.

Why Judgment.dev

Observability tells you what happened. Judgment tells you whether it was right.

Traces, evals and guardrails stop short of the question that matters when an agent acts on a customer, a payment or a person: was this decision justified, and who is accountable?

Review like a pull request

Side-by-side facts, rules, precedents and a reasoned determination. Diff what the agent proposed against what was authorised.

Triage like a ticketing system

Cases open automatically when agents cross consequence thresholds. Routed to the right reviewer at L1–L4 intensity.

Govern like an institution

Named human authority, appeal routes, remedies, retention and policy versioning — built in, not bolted on.

Learn like a body of practice

Precedents you can distinguish, overrule or revalidate. Decisions compound into doctrine instead of getting lost in logs.

The Judgment.dev Method

Four transparent engines. One human-authored determination.

Judgment.dev decides when judgement is required, by whom, and to what standard. Every routing decision is a deterministic rule you can audit — not an AI prediction.

CASE Engine™

Consequence & sufficiency

Deterministic rules assess whether an action is consequential, score its intensity, name the missing case-record fields and decide whether approval is blocked.
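A minimal sketch of what a deterministic consequence-and-sufficiency check could look like. The four trigger names echo the consequence thresholds listed in the FAQ (irreversibility, legal significance, rights-sensitivity, external binding). The one-point-per-trigger intensity score, the required field names and the `assess` function are illustrative assumptions, not Judgment.dev's actual rules or API.

```python
from dataclasses import dataclass

# Hypothetical trigger flags, named after the consequence thresholds in the FAQ.
TRIGGERS = ("irreversible", "legally_significant", "rights_sensitive", "externally_binding")

# Hypothetical case-record fields that must be filled before approval can proceed.
REQUIRED_FIELDS = ("agent_action", "facts", "applicable_rules", "precedents", "factors")


@dataclass(frozen=True)
class Assessment:
    consequential: bool
    intensity: int           # assumed scale: number of triggers hit (0-4)
    missing_fields: tuple
    approval_blocked: bool


def assess(action: dict, record: dict) -> Assessment:
    """Pure function of its inputs: same case in, same assessment out."""
    intensity = sum(1 for trigger in TRIGGERS if action.get(trigger, False))
    missing = tuple(field for field in REQUIRED_FIELDS if not record.get(field))
    consequential = intensity > 0
    return Assessment(
        consequential=consequential,
        intensity=intensity,
        missing_fields=missing,
        # Assumed gate: a consequential action with an incomplete record cannot be approved.
        approval_blocked=consequential and bool(missing),
    )


result = assess(
    {"irreversible": True, "rights_sensitive": True},
    {
        "agent_action": "deny refund",
        "facts": ["52-minute outage"],
        "applicable_rules": ["SLA-2024.3"],
    },
)
print(result)
```

Because the check uses no randomness, clock or model call, re-running it on the same inputs reproduces the same assessment, which is the property an auditor would want to verify.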

Jurisdiction Engine™

Forum & standards

Routes each case to the correct review forum (L1 Summary → L4 Apex), assigns the applicable standards of review and identifies the required human authority.
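Routing of this kind can be pictured as a lookup over an ordered rule table. The sketch below is an assumption-laden illustration: only the L1 to L4 levels come from this page, while the intensity thresholds, the authority roles and the `route` function are hypothetical.

```python
# Hypothetical routing table: (minimum intensity, forum, required authority).
# Only the L1-L4 levels come from the page; thresholds and roles are assumptions.
ROUTING_TABLE = [
    (4, "L4", "multi-party panel"),
    (3, "L3", "senior reviewer"),
    (2, "L2", "reviewer"),
    (0, "L1", "record owner"),
]


def route(intensity: int) -> dict:
    """Return the first row whose threshold is met (table is sorted high to low)."""
    if intensity < 0:
        raise ValueError("intensity must be non-negative")
    for minimum, forum, authority in ROUTING_TABLE:
        if intensity >= minimum:
            # Echo the rule that fired so the routing decision can be audited.
            return {"forum": forum, "authority": authority, "rule": f"intensity >= {minimum}"}
    raise AssertionError("unreachable: the last row matches every non-negative intensity")


print(route(3))
```

Returning the matched rule alongside the forum keeps each routing decision explainable without consulting the table's source.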

Triadic Review™

AI-assisted adversarial test

Three AI perspectives — proponent, opponent, neutral analyst — stress-test the proposed determination before a human signs Section F. The human still decides.

Agentic Drift Engine™

Baseline-to-behaviour drift

Eight-dimension deterministic scoring against a versioned operating baseline. Material drift opens or enriches a case; the engine does not decide the outcome.
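To make the idea concrete, here is a rough sketch of deterministic drift scoring against a versioned baseline. The page does not name the eight dimensions, so the dimension names, the relative-change formula and the 0.25 materiality threshold below are all invented for illustration.

```python
# Eight hypothetical dimensions; the page does not name them.
DIMENSIONS = (
    "tool_use", "refusal_rate", "escalation_rate", "latency",
    "spend", "policy_citations", "reversal_rate", "scope",
)

MATERIAL_THRESHOLD = 0.25  # assumed: relative change that counts as material


def drift_report(baseline: dict, observed: dict, baseline_version: str) -> dict:
    """Compare observed behaviour to a versioned baseline, one dimension at a time."""
    scores = {}
    for dim in DIMENSIONS:
        base, now = baseline[dim], observed[dim]
        # Relative change; with a zero baseline, any non-zero observation scores 1.0.
        scores[dim] = abs(now - base) / abs(base) if base else float(now != 0)
    material = sorted(dim for dim, score in scores.items() if score >= MATERIAL_THRESHOLD)
    return {
        "baseline_version": baseline_version,
        "scores": scores,
        "material_dimensions": material,
        # Flags that a case should be opened or enriched; it never decides the outcome.
        "open_or_enrich_case": bool(material),
    }


baseline = {dim: 1.0 for dim in DIMENSIONS}
observed = dict(baseline, refusal_rate=1.5, latency=1.1)
report = drift_report(baseline, observed, baseline_version="v1")
print(report["material_dimensions"])
```

The report only signals that a case is warranted. Consistent with the engine's description, the outcome is left to the human reviewer.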

Human authority

Reasoned determination

The engines do not issue judgement. A named human authority writes the reasoned determination in Section F. Silent approval is structurally impossible.

Deterministic · Auditable · Does not automate judgement

Same case in, same assessment out. Every rule triggered, standard applied and approval gate is visible on the case detail page.

The workflow

A clear path from agent action to institutional record.

01 · Webhook · SDK

Agent acts — or asks

Your agent ships a trace plus a proposed action. Judgment.dev decides whether it crosses the threshold for a case.
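As an illustration of what "a trace plus a proposed action" might look like on the wire, the sketch below builds a JSON body from the demo case's details. Every field name, the `build_case_request` helper and the trace identifier are hypothetical; the page does not document the webhook or SDK schema, and no endpoint is assumed.

```python
import json


def build_case_request(agent: str, process: str, proposed_action: str,
                       reversibility: str, trace_id: str) -> str:
    """Serialise a proposed action plus a trace reference as a JSON body."""
    payload = {
        "agent": agent,
        "process": process,
        "proposed_action": proposed_action,
        "reversibility": reversibility,
        "trace_id": trace_id,
    }
    # sort_keys keeps the serialised body stable, which helps when auditing or hashing it.
    return json.dumps(payload, sort_keys=True)


body = build_case_request(
    agent="customer-remedy",
    process="refunds.v3",
    proposed_action="deny refund, issue 15% credit voucher",
    reversibility="low",
    trace_id="trace-example-001",  # made-up identifier for illustration
)
print(body)
```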

02 · L1 → L4

Case opens at the right intensity

Triggers map the action to L1 (record-only) through L4 (multi-party panel). Reviewers, evidence and SLAs are scoped automatically.

03 · Human authority

Reasoned determination

Facts, rules, precedents and factors are assembled. A human reviewer issues the determination — approve, modify, reject, defer, escalate, suspend, reverse.
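One way to model the seven outcomes, and the requirement that a named human signs with stated reasons, is a small validated record type. This is a sketch under assumptions: the class names, field names and the reviewer address are placeholders rather than Judgment.dev's data model.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """The seven determination outcomes listed above."""
    APPROVE = "approve"
    MODIFY = "modify"
    REJECT = "reject"
    DEFER = "defer"
    ESCALATE = "escalate"
    SUSPEND = "suspend"
    REVERSE = "reverse"


@dataclass(frozen=True)
class Determination:
    outcome: Outcome
    reasoning: str
    authority: str  # the named human who signs

    def __post_init__(self):
        # Refuse to construct a determination without a named signer and stated reasons.
        if not self.authority.strip():
            raise ValueError("a named human authority is required")
        if not self.reasoning.strip():
            raise ValueError("a reasoned determination needs reasoning")


d = Determination(
    outcome=Outcome.MODIFY,
    reasoning="Issue full refund per SLA-2024.3.",
    authority="reviewer@acme",  # placeholder reviewer for illustration
)
print(d.outcome.value)
```

Putting the checks in the constructor means an unsigned or unreasoned determination cannot exist as a value at all, rather than being caught later by a separate validation step.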

04 · Closed loop

Remedy, appeal and learning

Decisions are enacted, appealable, and indexed as precedent. Repeat patterns trigger policy review, not silent drift.

The case record

Eight sections. One defensible record.

Every case in Judgment.dev follows the same A–H structure — the format auditors, regulators and your own future self can read in five years and still understand.

A–H record sections · L1–L4 review intensity · Governance metrics

A · Agent action: What was proposed, with reversibility & blast radius
B · Facts at the time: Data sources, uncertainties, what was missing or disputed
C · Applicable rules: Policies, duties, SLAs, all versioned and citable
D · Comparable precedents: Similarity-scored prior cases with distinctions
E · Factors: Aggravating, mitigating, affected interests, alternatives
F · Reasoned determination: Outcome plus the why, not just the what
G · Human authority: Named reviewer, role, signing chain, time of decision
H · Appeal, remedy & learning: Routes for challenge, remedies enacted, lessons indexed

Beyond observability

Where logs end, cases begin.

                        Agent observability     Judgment.dev
Question answered       What happened?          Was the decision justified?
Unit of work            Trace / span            Case
Output                  Logs & charts           Reasoned determination
Human role              On-call debugger        Named authority
Failure mode            Silent drift            Reviewable record
Closes the loop with    Engineering             Policy, audit & customer

Governance metrics

Make accountability measurable.

Override rate, reversal rate, time-to-review, precedent conflict, repeated edge cases — the numbers that tell you whether oversight is working.

Human override rate · 18% · good
Healthy: reviewers are actually reviewing, not rubber-stamping.

Reversal on appeal · 3.1% · watch
Above 2% threshold: policy review queued for SLA-2024.3.

Avg time to determination · 2.4d · good
L3 cases tracking under 72h SLA across all reviewer pods.

Precedents in force · neutral
12 distinguished, 3 overruled, 7 awaiting revalidation.
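As a rough illustration of how such figures could be derived, the sketch below computes three of the metrics from a list of case dictionaries. The field names and the definitions used (an override is a determination that differs from the agent's proposal; the reversal rate is taken over all decided cases) are assumptions, and the sample data is invented.

```python
from statistics import mean


def governance_metrics(cases: list) -> dict:
    """Compute override rate, reversal rate and average days to determination."""
    decided = [c for c in cases if c["determination"] is not None]
    if not decided:
        raise ValueError("no decided cases to measure")
    # Assumed definition: the human determination differs from what the agent proposed.
    overridden = [c for c in decided if c["determination"] != c["proposed"]]
    reversed_cases = [c for c in decided if c.get("reversed_on_appeal", False)]
    return {
        "override_rate": len(overridden) / len(decided),
        "reversal_rate": len(reversed_cases) / len(decided),
        "avg_days_to_determination": mean(c["days_to_determination"] for c in decided),
    }


# Invented sample data: four decided cases and one still open.
cases = [
    {"proposed": "deny", "determination": "modify", "reversed_on_appeal": False, "days_to_determination": 3.0},
    {"proposed": "approve", "determination": "approve", "reversed_on_appeal": False, "days_to_determination": 1.0},
    {"proposed": "approve", "determination": "approve", "reversed_on_appeal": True, "days_to_determination": 2.0},
    {"proposed": "deny", "determination": "deny", "reversed_on_appeal": False, "days_to_determination": 2.0},
    {"proposed": "deny", "determination": None, "days_to_determination": None},
]
metrics = governance_metrics(cases)
print(metrics)
```

Open cases are excluded from every denominator here; a real deployment would need to state that choice explicitly, since it changes the reported rates.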

Who it's for

One workspace. Three jobs done.

Risk & Compliance

A defensible paper trail per consequential decision. Map cases to controls, retention and regulator-ready exports.

AI Platform teams

One review layer across every agent and process. SDK + webhook. Stop reinventing approval queues per team.

Legal & Policy

Versioned rules, precedents and appeal routes — the institutional memory regulators expect when agents act on people.

Pricing

Start small. Scale with precedent.

One flat subscription covers your reviewers, cases and appeals up to the plan limits. No per-seat surprises.

Pilot · £99 / month

Up to 5 reviewers
25 live cases / month
Audit trail and appeal flow

Start with Pilot

Team · £499 / month

Up to 20 reviewers
250 live cases / month
Triadic review
Precedent library
API ingestion

Coming soon

FAQ

Questions, answered plainly.

Is Judgment.dev claiming AI is a judge?

No. Judgment.dev is a review layer for human accountability over AI-agent actions. Determinations are issued by named human authorities. The product language is review, reasoned determination, authority, remedy and learning — not 'AI judge' or 'court'.

How is this different from agent observability?

Observability tells you what an agent did. Judgment tells you whether it was justified — with cited rules, precedents, a reasoned determination, a signing reviewer and an appeal route. It's the layer above traces.

How do cases get opened?

Your agents call Judgment.dev when they cross consequence thresholds (irreversibility, legal significance, rights-sensitivity, external binding). Triggers map automatically to review levels L1–L4.

Are precedents binding?

Useful, not binding. Reviewers can follow, distinguish or overrule prior cases. Policy versioning keeps precedent governable instead of ossifying.

The review layer for AI agents that act.

Walk CASE-2041 — a refund denial after a 52-minute service outage — from agent action to authorised determination in under five minutes.
