Cases, not logs, are the unit of governance

The review layer for AI agents that act.

Turn consequential agent actions into accountable decision cases — with facts, rules, precedent, reasoned determination, named human authority, appeal, remedy and learning.

Observability answers: what did the agent do? Judgment.dev answers: was the decision justified, who authorised it, how can it be challenged, and what did we learn?

Built for: Risk & Compliance · Trust & Safety · AI Platform Teams · Internal Audit · Regulatory Affairs · Customer Operations

The Casebook

Every consequential action becomes a case.

Not a log line. A reviewable record — with facts, rules, precedents, a reasoned determination, a named human authority and a path to appeal.

judgment.dev / cases / CASE-2041
In review · L3 · Rights-sensitive · CASE-2041 · opened 14m ago by remedy-agent@acme

Refund denial on order #88412 after 52-minute service outage

agent customer-remedy → process refunds.v3 · 3 precedents · 2 affected interests · 1 reviewer assigned
A
Agent action
Proposed: deny refund, issue 15% credit voucher. Reversibility: low (voucher expires 30d).
C
Applicable rules
SLA-2024.3 · outage > 45m → mandatory refund offer. Conflict with Remedy Catalogue §4.2.
D
Precedents
CASE-1893 (94% sim · approved refund) · CASE-1622 (81% sim · distinguished) · CASE-1407 (overruled, policy v2.1).
F
Reasoned determination
Modify. Issue full refund per SLA-2024.3. Voucher disallowed under §4.2 where customer harm exceeds threshold.
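As a rough illustration of how the rule in section C could be checked against the facts of CASE-2041, here is a minimal Python sketch. The function, class and field names are assumptions made for illustration and do not describe Judgment.dev's actual rule format; only the 45-minute threshold and the 52-minute outage come from the case above.

```python
from dataclasses import dataclass

SLA_OUTAGE_THRESHOLD_MIN = 45  # from SLA-2024.3 as quoted in section C


@dataclass(frozen=True)
class ProposedAction:
    """Hypothetical shape for a proposed remedy."""
    offers_refund: bool
    description: str


def sla_conflict(outage_minutes: int, action: ProposedAction) -> bool:
    """True when the outage makes a refund offer mandatory
    and the proposed action does not include one."""
    refund_mandatory = outage_minutes > SLA_OUTAGE_THRESHOLD_MIN
    return refund_mandatory and not action.offers_refund


proposed = ProposedAction(False, "deny refund, issue 15% credit voucher")
authorised = ProposedAction(True, "issue full refund")

print(sla_conflict(52, proposed))    # True
print(sla_conflict(52, authorised))  # False
```

The agent's proposal conflicts with the rule, while the modified determination does not, which is the difference a reviewer would see in the diff.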

Why Judgment.dev

Observability tells you what happened. Judgment tells you whether it was right.

Traces, evals and guardrails stop short of the question that matters when an agent acts on a customer, a payment or a person: was this decision justified, and who is accountable?

Review like a pull request

Side-by-side facts, rules, precedents and a reasoned determination. Diff what the agent proposed against what was authorised.

Triage like a ticketing system

Cases open automatically when agents cross consequence thresholds. Routed to the right reviewer at L1–L4 intensity.

Govern like an institution

Named human authority, appeal routes, remedies, retention and policy versioning — built in, not bolted on.

Learn like a body of practice

Precedents you can distinguish, overrule or revalidate. Decisions compound into doctrine instead of getting lost in logs.

The Judgment.dev Method

Four transparent engines. One human-authored determination.

Judgment.dev decides when judgement is required, by whom, and to what standard. Every routing decision is a deterministic rule you can audit — not an AI prediction.

CASE Engine™

Consequence & sufficiency

Deterministic rules assess whether an action is consequential, score its intensity, name the missing case-record fields and decide whether approval is blocked.
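A deterministic assessment of this kind might look like the sketch below. The trigger names follow the consequence thresholds listed in the FAQ; the weights, the required-field list and every identifier are hypothetical placeholders, not the engine's real configuration.

```python
from dataclasses import dataclass

# Hypothetical required fields, loosely named after the case-record sections.
REQUIRED_FIELDS = ("agent_action", "facts", "rules", "precedents", "factors")

# Hypothetical weights for the consequence triggers.
TRIGGER_WEIGHTS = {
    "irreversible": 2,
    "legally_significant": 2,
    "rights_sensitive": 2,
    "externally_binding": 1,
}


@dataclass(frozen=True)
class Assessment:
    consequential: bool
    intensity: int
    missing_fields: tuple
    approval_blocked: bool


def assess(record: dict, triggers: set) -> Assessment:
    """Pure function: the same record and triggers always give the same result."""
    intensity = sum(TRIGGER_WEIGHTS.get(t, 0) for t in triggers)
    missing = tuple(f for f in REQUIRED_FIELDS if not record.get(f))
    consequential = intensity > 0
    # Approval is blocked while a consequential case has gaps in its record.
    return Assessment(consequential, intensity, missing, consequential and bool(missing))


draft = {"agent_action": "deny refund", "facts": "52-minute outage", "rules": ["SLA-2024.3"]}
result = assess(draft, {"rights_sensitive", "irreversible"})
print(result.missing_fields)    # ('precedents', 'factors')
print(result.approval_blocked)  # True
```

Because the function has no hidden state and no model call, re-running it on the same input reproduces the same assessment.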

Jurisdiction Engine™

Forum & standards

Routes each case to the correct review forum (L1 Summary → L4 Apex), assigns the applicable standards of review and identifies the required human authority.
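One way to picture the routing step is an ordered lookup table, sketched below. Only the L1 to L4 forum levels come from the text; the intensity thresholds, the standards of review and the role names are invented placeholders.

```python
from typing import NamedTuple


class Route(NamedTuple):
    forum: str
    standard: str
    authority: str


# Hypothetical routing table, checked from the highest minimum intensity down.
ROUTING_TABLE = [
    (6, Route("L4", "panel review", "review panel chair")),
    (4, Route("L3", "reasoned review", "senior reviewer")),
    (2, Route("L2", "standard review", "team reviewer")),
    (0, Route("L1", "summary record", "process owner")),
]


def route_case(intensity: int) -> Route:
    """Return the first route whose minimum intensity the case meets."""
    for minimum, route in ROUTING_TABLE:
        if intensity >= minimum:
            return route
    raise ValueError("intensity must be non-negative")


print(route_case(4).forum)  # L3
print(route_case(0).forum)  # L1
```

Keeping the table as plain data means each routing decision can be traced to a single row.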

Triadic Review™

AI-assisted adversarial test

Three AI perspectives — proponent, opponent, neutral analyst — stress-test the proposed determination before a human signs Section F. The human still decides.

Agentic Drift Engine™

Baseline-to-behaviour drift

Eight-dimension deterministic scoring against a versioned operating baseline. Material drift opens or enriches a case; the engine does not decide the outcome.
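The sketch below shows one simple way a baseline-to-behaviour comparison could work. The page does not list the eight dimensions, so the dimension names, the integer scores and the tolerances here are all assumptions chosen for illustration.

```python
# Hypothetical dimension names; the source says there are eight but does not name them.
DIMENSIONS = (
    "refusal_rate", "escalation_rate", "tool_call_rate", "override_rate",
    "action_value", "policy_citation_rate", "reversal_rate", "response_time",
)


def drift_report(baseline: dict, observed: dict, tolerance: dict) -> dict:
    """Compare observed behaviour with the baseline, one dimension at a time.

    Returns the dimensions whose absolute deviation exceeds its tolerance.
    The function only reports drift; it never decides an outcome.
    """
    drifted = {}
    for dim in DIMENSIONS:
        deviation = abs(observed[dim] - baseline[dim])
        if deviation > tolerance[dim]:
            drifted[dim] = deviation
    return drifted


baseline = {dim: 10 for dim in DIMENSIONS}   # illustrative integer scores
tolerance = {dim: 5 for dim in DIMENSIONS}
observed = dict(baseline, refusal_rate=22, escalation_rate=12)

report = drift_report(baseline, observed, tolerance)
print(report)        # {'refusal_rate': 12}
print(bool(report))  # True: material drift, so a case is opened or enriched
```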

Human authority

Reasoned determination

The engines do not issue judgement. A named human authority writes the reasoned determination in Section F. Silent approval is structurally impossible.

Deterministic · Auditable · Does not automate judgement

Same case in, same assessment out. Every rule triggered, standard applied and approval gate is visible on the case detail page.

The workflow

A clear path from agent action to institutional record.

01 · Webhook · SDK

Agent acts — or asks

Your agent ships a trace plus a proposed action. Judgment.dev decides whether it crosses the threshold for a case.
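For illustration only, the submission an agent sends could be serialised as below. No public schema is given on this page, so every field name is a placeholder, and no endpoint or SDK call is shown because none is documented here.

```python
import json
from datetime import datetime, timezone


def build_case_request(agent: str, process: str, proposed_action: str,
                       reversibility: str, trace_id: str) -> str:
    """Serialise a proposed action plus a trace reference as JSON.

    Every field name is a hypothetical placeholder, not a documented schema.
    """
    payload = {
        "agent": agent,
        "process": process,
        "proposed_action": proposed_action,
        "reversibility": reversibility,
        "trace_id": trace_id,
        "submitted_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(payload, sort_keys=True)


body = build_case_request(
    agent="customer-remedy",
    process="refunds.v3",
    proposed_action="deny refund, issue 15% credit voucher",
    reversibility="low",
    trace_id="trace-example-001",  # made-up identifier
)
print(body)
```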

02 · L1 → L4

Case opens at the right intensity

Triggers map the action to L1 (record-only) through L4 (multi-party panel). Reviewers, evidence and SLAs are scoped automatically.

03 · Human authority

Reasoned determination

Facts, rules, precedents and factors are assembled. A human reviewer issues the determination — approve, modify, reject, defer, escalate, suspend, reverse.
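The idea that a determination cannot exist without reasons and a named reviewer can be modelled as a type that refuses to be constructed otherwise. This is a sketch under assumed names; the seven outcomes are the ones listed above, while the class layout and the reviewer label are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    APPROVE = "approve"
    MODIFY = "modify"
    REJECT = "reject"
    DEFER = "defer"
    ESCALATE = "escalate"
    SUSPEND = "suspend"
    REVERSE = "reverse"


@dataclass(frozen=True)
class Determination:
    outcome: Outcome
    reasons: str
    reviewer: str

    def __post_init__(self):
        # Reject empty or whitespace-only values so no outcome is recorded silently.
        if not self.reasons.strip():
            raise ValueError("a determination needs written reasons")
        if not self.reviewer.strip():
            raise ValueError("a determination needs a named human authority")


decision = Determination(
    Outcome.MODIFY,
    "Issue full refund per SLA-2024.3.",
    "reviewer-on-duty",  # placeholder for a named person
)
print(decision.outcome.value)  # modify
```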

04 · Closed loop

Remedy, appeal and learning

Decisions are enacted, appealable, and indexed as precedent. Repeat patterns trigger policy review, not silent drift.

The case record

Eight sections. One defensible record.

Every case in Judgment.dev follows the same A–H structure — the format auditors, regulators and your own future self can read in five years and still understand.

A–H
Record sections
L1–L4
Review intensity
11
Governance metrics
A
Agent action
What was proposed, with reversibility & blast radius
B
Facts at the time
Data sources, uncertainties, what was missing or disputed
C
Applicable rules
Policies, duties, SLAs — versioned and citable
D
Comparable precedents
Similarity-scored prior cases with distinctions
E
Factors
Aggravating, mitigating, affected interests, alternatives
F
Reasoned determination
Outcome plus the why, not just the what
G
Human authority
Named reviewer, role, signing chain, time of decision
H
Appeal, remedy & learning
Routes for challenge, remedies enacted, lessons indexed
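As a data structure, the eight sections could be represented roughly as follows. The field names and types are assumptions; only the section letters and titles come from the list above, and the sample values are taken from CASE-2041.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CaseRecord:
    case_id: str
    agent_action: str = ""                           # A
    facts: str = ""                                  # B
    rules: list = field(default_factory=list)        # C
    precedents: list = field(default_factory=list)   # D
    factors: list = field(default_factory=list)      # E
    determination: str = ""                          # F
    human_authority: str = ""                        # G
    appeal_remedy_learning: str = ""                 # H

    def open_sections(self) -> list:
        """Letters of the sections that are still empty."""
        names = [f.name for f in fields(self) if f.name != "case_id"]
        return [letter for letter, name in zip("ABCDEFGH", names)
                if not getattr(self, name)]


record = CaseRecord(
    "CASE-2041",
    agent_action="deny refund, issue 15% credit voucher",
    rules=["SLA-2024.3", "Remedy Catalogue §4.2"],
    precedents=["CASE-1893", "CASE-1622", "CASE-1407"],
    determination="Modify. Issue full refund per SLA-2024.3.",
)
print(record.open_sections())  # ['B', 'E', 'G', 'H']
```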

Beyond observability

Where logs end, cases begin.

                        Agent observability     Judgment.dev
Question answered       What happened?          Was the decision justified?
Unit of work            Trace / span            Case
Output                  Logs & charts           Reasoned determination
Human role              On-call debugger        Named authority
Failure mode            Silent drift            Reviewable record
Closes the loop with    Engineering             Policy, audit & customer

Governance metrics

Make accountability measurable.

Override rate, reversal rate, time-to-review, precedent conflict, repeated edge cases — the numbers that tell you whether oversight is working.

good
18%
Human override rate
Healthy — reviewers are actually reviewing, not rubber-stamping.
watch
3.1%
Reversal on appeal
Above 2% threshold — policy review queued for SLA-2024.3.
good
2.4d
Avg time to determination
L3 cases tracking under 72h SLA across all reviewer pods.
neutral
94
Precedents in force
12 distinguished, 3 overruled, 7 awaiting revalidation.
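The arithmetic behind two of these metrics can be sketched as follows. The definitions are assumptions: an override is counted as any determination other than a plain approval, and reversals are counted against all determinations. The sample data is made up, and only the 2% reversal threshold comes from the card above.

```python
REVERSAL_THRESHOLD_PCT = 2.0  # threshold quoted on the "Reversal on appeal" card


def percentage(part: int, whole: int) -> float:
    """Share of `whole` as a percentage, rounded to one decimal place."""
    return 0.0 if whole == 0 else round(100.0 * part / whole, 1)


def governance_metrics(determinations: list) -> dict:
    total = len(determinations)
    overrides = sum(1 for d in determinations if d["outcome"] != "approve")
    reversals = sum(1 for d in determinations if d.get("reversed_on_appeal", False))
    reversal_rate = percentage(reversals, total)
    return {
        "override_rate_pct": percentage(overrides, total),
        "reversal_rate_pct": reversal_rate,
        "reversal_status": "watch" if reversal_rate > REVERSAL_THRESHOLD_PCT else "good",
    }


# Made-up sample: 50 determinations, 9 not a plain approval, 2 later reversed.
sample = (
    [{"outcome": "approve"}] * 41
    + [{"outcome": "modify"}] * 7
    + [{"outcome": "reject", "reversed_on_appeal": True}] * 2
)
print(governance_metrics(sample))
# {'override_rate_pct': 18.0, 'reversal_rate_pct': 4.0, 'reversal_status': 'watch'}
```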

Who it's for

One workspace. Three jobs done.

Risk & Compliance

A defensible paper trail per consequential decision. Map cases to controls, retention and regulator-ready exports.

AI Platform teams

One review layer across every agent and process. SDK + webhook. Stop reinventing approval queues per team.

Legal & Policy

Versioned rules, precedents and appeal routes — the institutional memory regulators expect when agents act on people.

Pricing

Start small. Scale with precedent.

One subscription covers every reviewer, case and appeal in your organisation. No per-seat surprises.

£99 / month

Pilot

  • Up to 5 reviewers
  • 25 live cases / month
  • Audit trail and appeal flow
£499 / month

Team

  • Up to 20 reviewers
  • 250 live cases / month
  • Triadic review
  • Precedent library
  • API ingestion
Coming soon

FAQ

Questions, answered plainly.

Is Judgment.dev claiming AI is a judge?

No. Judgment.dev is a review layer for human accountability over AI-agent actions. Determinations are issued by named human authorities. The product language is review, reasoned determination, authority, remedy and learning — not 'AI judge' or 'court'.

How is this different from agent observability?

Observability tells you what an agent did. Judgment tells you whether it was justified — with cited rules, precedents, a reasoned determination, a signing reviewer and an appeal route. It's the layer above traces.

How do cases get opened?

Your agents send proposed actions to Judgment.dev by webhook or SDK. A case opens when an action crosses a consequence threshold (irreversibility, legal significance, rights-sensitivity, external binding). Triggers map automatically to review levels L1–L4.

Are precedents binding?

Useful, not binding. Reviewers can follow, distinguish or overrule prior cases. Policy versioning keeps precedent governable instead of ossifying.
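A small sketch of how precedent status could feed a reviewer's shortlist, using the three precedents cited in CASE-2041. The status labels and the filtering rule are assumptions; the page gives no similarity score for CASE-1407, so it is left as None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Precedent:
    case_id: str
    status: str                # assumed labels: "in_force", "distinguished", "overruled"
    similarity: Optional[int]  # percent; None where no score is given


def followable(precedents: list) -> list:
    """Scored precedents still in force, most similar first."""
    in_force = [p for p in precedents
                if p.status == "in_force" and p.similarity is not None]
    return sorted(in_force, key=lambda p: p.similarity, reverse=True)


cited = [
    Precedent("CASE-1893", "in_force", 94),
    Precedent("CASE-1622", "distinguished", 81),
    Precedent("CASE-1407", "overruled", None),
]
print([p.case_id for p in followable(cited)])  # ['CASE-1893']
```

The shortlist is only a suggestion in this sketch; the reviewer remains free to distinguish or overrule what it returns.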

The review layer for AI agents that act.

Walk CASE-2041 — a refund denial after a 52-minute service outage — from agent action to authorised determination in under five minutes.