The review layer for AI agents that act.
Turn consequential agent actions into accountable decision cases — with facts, rules, precedent, reasoned determination, named human authority, appeal, remedy and learning.
Observability answers: what did the agent do? Judgment.dev answers: was the decision justified, who authorised it, how can it be challenged, and what did we learn?
The Casebook
Every consequential action becomes a case.
Not a log line. A reviewable record — with facts, rules, precedents, a reasoned determination, a named human authority and a path to appeal.
Refund denial on order #88412 after 52-minute service outage
Why Judgment.dev
Observability tells you what happened. Judgment tells you whether it was right.
Traces, evals and guardrails stop short of the question that matters when an agent acts on a customer, a payment or a person: was this decision justified, and who is accountable?
Review like a pull request
Side-by-side facts, rules, precedents and a reasoned determination. Diff what the agent proposed against what was authorised.
Triage like a ticketing system
Cases open automatically when agents cross consequence thresholds. Routed to the right reviewer at L1–L4 intensity.
Govern like an institution
Named human authority, appeal routes, remedies, retention and policy versioning — built in, not bolted on.
Learn like a body of practice
Precedents you can distinguish, overrule or revalidate. Decisions compound into doctrine instead of getting lost in logs.
The Judgment.dev Method
Four transparent engines. One human-authored determination.
Judgment.dev decides when judgement is required, by whom, and to what standard. Every routing decision comes from a deterministic rule you can audit, not from an AI prediction.
Consequence & sufficiency
Deterministic rules assess whether an action is consequential, score its intensity, name the missing case-record fields and decide whether approval is blocked.
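A minimal sketch of what such a rule set could look like, assuming a simple flag-and-weight model. The flag names echo the consequence thresholds listed in the FAQ; the weights, the required-field list and the function name `assess` are hypothetical illustrations, not Judgment.dev's actual rules or API.

```python
from dataclasses import dataclass

# Hypothetical weights per consequence flag (illustrative values only).
FLAG_WEIGHTS = {
    "irreversible": 2,
    "legally_significant": 2,
    "rights_sensitive": 1,
    "externally_binding": 1,
}

# Hypothetical list of case-record fields that must be present before approval.
REQUIRED_FIELDS = ("facts", "rules", "precedents", "proposed_action")


@dataclass(frozen=True)
class Assessment:
    consequential: bool
    intensity: int
    missing_fields: tuple
    approval_blocked: bool
    rules_triggered: tuple


def assess(action: dict) -> Assessment:
    """Apply fixed rules only: the same input always yields the same assessment."""
    flags = action.get("flags", {})
    # Sorted so the list of triggered rules is stable and auditable.
    triggered = tuple(sorted(name for name in FLAG_WEIGHTS if flags.get(name)))
    intensity = sum(FLAG_WEIGHTS[name] for name in triggered)

    record = action.get("record", {})
    missing = tuple(f for f in REQUIRED_FIELDS if not record.get(f))

    consequential = intensity > 0
    # Approval is blocked only when a consequential action has an incomplete record.
    blocked = consequential and bool(missing)
    return Assessment(consequential, intensity, missing, blocked, triggered)
```

Because the function has no randomness and no model call, running it twice on the same action returns an equal `Assessment`, which is the property the "same case in, same assessment out" claim depends on.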
Forum & standards
Routes each case to the correct review forum (L1 Summary → L4 Apex), assigns the applicable standards of review and identifies the required human authority.
AI-assisted adversarial test
Three AI perspectives — proponent, opponent, neutral analyst — stress-test the proposed determination before a human signs Section F. The human still decides.
Baseline-to-behaviour drift
Eight-dimension deterministic scoring against a versioned operating baseline. Material drift opens or enriches a case; the engine does not decide the outcome.
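One way to picture deterministic drift scoring is a per-dimension relative deviation from the baseline, sketched below. The page does not name the eight dimensions, so the names in `DIMENSIONS`, the 0.25 materiality threshold and both function names are assumptions made for illustration.

```python
# Eight hypothetical dimensions; the real ones are not listed on this page.
DIMENSIONS = (
    "refund_rate",
    "denial_rate",
    "escalation_rate",
    "avg_amount",
    "tool_calls",
    "latency",
    "override_rate",
    "appeal_rate",
)


def drift_scores(baseline: dict, observed: dict) -> dict:
    """Relative deviation from the baseline for each dimension."""
    scores = {}
    for dim in DIMENSIONS:
        base = baseline[dim]
        # Fall back to absolute deviation when the baseline value is zero.
        denom = abs(base) if base else 1.0
        scores[dim] = abs(observed[dim] - base) / denom
    return scores


def material_drift(scores: dict, threshold: float = 0.25) -> list:
    """Names of dimensions whose drift exceeds the (assumed) threshold."""
    return sorted(dim for dim, score in scores.items() if score > threshold)
```

In this sketch a non-empty result from `material_drift` would open or enrich a case. It returns only dimension names, never an outcome, which mirrors the statement that the engine does not decide.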
Reasoned determination
The engines do not issue judgement. A named human authority writes the reasoned determination in Section F. Silent approval is structurally impossible.
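As an illustration of how "structurally impossible" might be enforced, the sketch below uses a record type that cannot be constructed without a named authority and written reasoning. The outcome values are the seven listed in the workflow section; the class and field names are hypothetical and not taken from the Judgment.dev SDK.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    APPROVE = "approve"
    MODIFY = "modify"
    REJECT = "reject"
    DEFER = "defer"
    ESCALATE = "escalate"
    SUSPEND = "suspend"
    REVERSE = "reverse"


@dataclass(frozen=True)
class Determination:
    """Hypothetical Section F record: outcome, who signed, and why."""

    outcome: Outcome
    authority: str  # the named human who signs
    reasoning: str  # the written, reasoned determination

    def __post_init__(self) -> None:
        # Reject blank values so an approval cannot exist without a signer and reasons.
        if not self.authority.strip():
            raise ValueError("a named human authority is required")
        if not self.reasoning.strip():
            raise ValueError("a reasoned determination is required")
```

With this shape, an approval with an empty reasoning field raises an error at construction time instead of being stored, so there is no code path that produces a silent approval.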
Same case in, same assessment out. Every rule triggered, standard applied and approval gate is visible on the case detail page.
The workflow
A clear path from agent action to institutional record.
Agent acts — or asks
Your agent ships a trace plus a proposed action. Judgment.dev decides whether it crosses the threshold for a case.
Case opens at the right intensity
Triggers map the action to L1 (record-only) through L4 (multi-party panel). Reviewers, evidence and SLAs are scoped automatically.
Reasoned determination
Facts, rules, precedents and factors are assembled. A human reviewer issues the determination: approve, modify, reject, defer, escalate, suspend or reverse.
Remedy, appeal and learning
Decisions are enacted, appealable, and indexed as precedent. Repeat patterns trigger policy review, not silent drift.
The case record
Eight sections. One defensible record.
Every case in Judgment.dev follows the same A–H structure — the format auditors, regulators and your own future self can read in five years and still understand.
Beyond observability
Where logs end, cases begin.
Governance metrics
Make accountability measurable.
Override rate, reversal rate, time-to-review, precedent conflict, repeated edge cases — the numbers that tell you whether oversight is working.
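A rough sketch of how three of these numbers could be computed from closed cases. The definitions are assumptions, since the page does not define them: override rate is taken as the share of cases where the authorised outcome differs from what the agent proposed, reversal rate as the share reversed on appeal, and time-to-review as the median hours from case opening to determination. The `ClosedCase` fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass(frozen=True)
class ClosedCase:
    opened_at: datetime
    determined_at: datetime
    proposed: str  # what the agent proposed
    authorised: str  # what the human authority authorised
    reversed_on_appeal: bool


def governance_metrics(cases: list) -> dict:
    """Compute assumed definitions of three oversight metrics."""
    if not cases:
        raise ValueError("at least one closed case is required")
    total = len(cases)
    overrides = sum(1 for c in cases if c.authorised != c.proposed)
    reversals = sum(1 for c in cases if c.reversed_on_appeal)
    hours = [
        (c.determined_at - c.opened_at).total_seconds() / 3600
        for c in cases
    ]
    return {
        "override_rate": overrides / total,
        "reversal_rate": reversals / total,
        "median_hours_to_review": median(hours),
    }
```

A median is used for time-to-review in this sketch so that a single long-running case does not dominate the figure; a mean or a percentile would be an equally reasonable choice.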
Who it's for
One workspace. Three jobs done.
Risk & Compliance
A defensible paper trail per consequential decision. Map cases to controls, retention and regulator-ready exports.
AI Platform teams
One review layer across every agent and process. SDK + webhook. Stop reinventing approval queues per team.
Legal & Policy
Versioned rules, precedents and appeal routes — the institutional memory regulators expect when agents act on people.
Pricing
Start small. Scale with precedent.
One subscription covers every reviewer, case and appeal in your organisation. No per-seat surprises.
Team
- Up to 20 reviewers
- 250 live cases / month
- Triadic review
- Precedent library
- API ingestion
FAQ
Questions, answered plainly.
Is Judgment.dev claiming AI is a judge?
No. Judgment.dev is a review layer for human accountability over AI-agent actions. Determinations are issued by named human authorities. The product language is review, reasoned determination, authority, remedy and learning — not 'AI judge' or 'court'.
How is this different from agent observability?
Observability tells you what an agent did. Judgment tells you whether it was justified — with cited rules, precedents, a reasoned determination, a signing reviewer and an appeal route. It's the layer above traces.
How do cases get opened?
Your agents call Judgment.dev when they cross consequence thresholds (irreversibility, legal significance, rights-sensitivity, external binding). Triggers map automatically to review levels L1–L4.
Are precedents binding?
Useful, not binding. Reviewers can follow, distinguish or overrule prior cases. Policy versioning keeps precedent governable instead of ossifying.
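To make the follow, distinguish and overrule treatments concrete, here is a small sketch of how citations between cases might be recorded and summarised. The data model, the `policy_version` field and the function name are hypothetical; only the three treatments come from the answer above.

```python
from dataclasses import dataclass
from enum import Enum


class Treatment(Enum):
    FOLLOWED = "followed"
    DISTINGUISHED = "distinguished"
    OVERRULED = "overruled"


@dataclass(frozen=True)
class Citation:
    citing_case: str
    precedent: str
    treatment: Treatment
    policy_version: str  # assumed: the policy version in force when cited


def precedent_status(precedent: str, citations: list) -> tuple:
    """Return (count per treatment, whether the precedent is still citable)."""
    counts = {t.value: 0 for t in Treatment}
    for c in citations:
        if c.precedent == precedent:
            counts[c.treatment.value] += 1
    # In this sketch a precedent stays citable until a later case overrules it.
    still_citable = counts[Treatment.OVERRULED.value] == 0
    return counts, still_citable
```

A reviewer-facing view could use the counts to show how settled a precedent is, and the flag to warn before anyone relies on an overruled case.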
Walk CASE-2041 — a refund denial after a 52-minute service outage — from agent action to authorised determination in under five minutes.