The Verdict object from @outputai/evals provides helpers for returning evaluation results from your workflow evaluators. There are three categories: deterministic assertions for programmatic checks, manual verdicts for custom logic, and LLM result wrappers for judge output.
Deterministic Assertions
These helpers compare values and return an EvaluationBooleanResult with confidence: 1.0. The reasoning is auto-generated based on the comparison result — you don't need to write it yourself.
Verdict.equals
Strict equality check (===).
Uses strict === comparison, so Verdict.equals(1, '1') will fail. Objects are compared by reference, not by deep equality — use this helper for primitives.
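As a sketch of these semantics (equalsCheck below is a stand-in for the comparison Verdict.equals performs, not the library's implementation):

```typescript
// Stand-in for the comparison behind Verdict.equals: strict ===,
// so no type coercion, and reference equality for objects.
const equalsCheck = (actual: unknown, expected: unknown): boolean =>
  actual === expected;

equalsCheck(1, 1);               // true
equalsCheck(1, "1");             // false: no coercion
equalsCheck({ a: 1 }, { a: 1 }); // false: different references
```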
Verdict.closeTo
Checks if a number is within a tolerance of an expected value: Math.abs(actual - expected) <= tolerance. The tolerance is inclusive. Use this for floating-point comparisons where exact equality is unreliable.
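The check can be sketched as follows (closeToCheck is a stand-in for the comparison, not the library code):

```typescript
// Stand-in for Verdict.closeTo's comparison; the tolerance boundary is inclusive.
const closeToCheck = (actual: number, expected: number, tolerance: number): boolean =>
  Math.abs(actual - expected) <= tolerance;

closeToCheck(0.1 + 0.2, 0.3, 1e-9); // true, even though 0.1 + 0.2 !== 0.3
closeToCheck(1.5, 1.0, 0.5);        // true: exactly at the tolerance boundary
```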
Verdict.gt
Strict greater-than (>).
Verdict.gte
Greater-than-or-equal (>=).
Verdict.lt
Strict less-than (<).
Verdict.lte
Less-than-or-equal (<=).
Verdict.inRange
Inclusive range check — both boundaries are included: actual >= min && actual <= max.
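The inclusive boundaries mean both endpoints pass (inRangeCheck is a stand-in for the comparison, not the library code):

```typescript
// Stand-in for Verdict.inRange's comparison: both boundaries included.
const inRangeCheck = (actual: number, min: number, max: number): boolean =>
  actual >= min && actual <= max;

inRangeCheck(5, 1, 10);  // true
inRangeCheck(10, 1, 10); // true: the max boundary is included
inRangeCheck(11, 1, 10); // false
```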
Verdict.contains
Substring search using String.includes().
Verdict.matches
Regular expression test using RegExp.test().
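Both checks can be sketched with the underlying string methods (the check functions are stand-ins, not the library code):

```typescript
// Stand-ins for the string checks behind Verdict.contains and Verdict.matches.
const containsCheck = (haystack: string, needle: string): boolean =>
  haystack.includes(needle);
const matchesCheck = (value: string, pattern: RegExp): boolean =>
  pattern.test(value);

containsCheck("hello world", "world");     // true
containsCheck("hello world", "World");     // false: case-sensitive
matchesCheck("order-1234", /^order-\d+$/); // true
```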
Verdict.includesAll
Checks that an array contains every expected element. Uses === for element comparison, so this works best with primitives (strings, numbers).
Verdict.includesAny
Checks that an array contains at least one expected element.
Verdict.isTrue
Strict boolean check against true.
Uses strict equality (===). Truthy values like 1 or "true" will fail — only boolean true passes.
Verdict.isFalse
Strict boolean check against false.
Falsy values like 0 or "" will fail — only boolean false passes.
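The two array helpers above can be sketched as follows (stand-ins, not the library code; note Array.prototype.includes uses SameValueZero, which agrees with === for the primitives these helpers target):

```typescript
// Stand-ins for the membership checks behind Verdict.includesAll / includesAny.
const includesAllCheck = (actual: readonly unknown[], expected: readonly unknown[]): boolean =>
  expected.every((item) => actual.includes(item));
const includesAnyCheck = (actual: readonly unknown[], expected: readonly unknown[]): boolean =>
  expected.some((item) => actual.includes(item));

includesAllCheck(["a", "b", "c"], ["a", "c"]); // true
includesAllCheck(["a", "b"], ["a", "z"]);      // false: "z" is missing
includesAnyCheck(["a", "b"], ["z", "b"]);      // true: "b" matches
```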
Manual Verdicts
When no deterministic helper fits your logic, construct verdicts directly. These return EvaluationVerdictResult with the verdict value 'pass', 'partial', or 'fail'.
Verdict.pass
Marks the evaluation as passing with confidence: 1.0.
Verdict.partial
| Field | Required | Description |
|---|---|---|
| issue | Yes | What the problem is |
| suggestion | No | How to fix it |
| reference | No | A reference URL or identifier |
| priority | No | 'low' \| 'medium' \| 'high' \| 'critical' |
Verdict.fail
Marks the evaluation as failing with confidence: 0.0. The reasoning is required — you must explain why the evaluation failed.
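As an illustration of the three manual verdicts, the result objects might look roughly like this (the TypeScript shapes below are assumptions mirroring the documented fields, not the package's actual types):

```typescript
// Hypothetical shapes mirroring the documented fields; the real
// EvaluationVerdictResult type in @outputai/evals may differ.
type Priority = "low" | "medium" | "high" | "critical";

interface Issue {
  issue: string;       // required: what the problem is
  suggestion?: string; // optional: how to fix it
  reference?: string;  // optional: a reference URL or identifier
  priority?: Priority;
}

interface VerdictResult {
  verdict: "pass" | "partial" | "fail";
  confidence: number;
  reasoning?: string;
  issues?: Issue[];
}

const passed: VerdictResult = { verdict: "pass", confidence: 1.0 };

const partial: VerdictResult = {
  verdict: "partial",
  confidence: 0.6, // caller-specified
  issues: [{ issue: "Missing citation", suggestion: "Link a source", priority: "medium" }],
};

const failed: VerdictResult = {
  verdict: "fail",
  confidence: 0.0,
  reasoning: "Output was empty", // reasoning is required for failures
};
```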
Using Manual Verdicts in Evaluators
Manual verdicts are useful when your check logic doesn't map cleanly to a single assertion:
tests/evals/evaluators.ts
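A sketch of such an evaluator (the function signature and plain result objects below are illustrative assumptions standing in for the package's evaluator API, not the referenced file's contents):

```typescript
// Hypothetical evaluator: mixes custom logic (word counting) with
// pass / partial / fail outcomes. Signature and shapes are assumptions.
interface Result {
  verdict: "pass" | "partial" | "fail";
  confidence: number;
  reasoning: string;
}

function evaluateSummaryLength(output: string, maxWords = 200): Result {
  const words = output.trim().split(/\s+/).filter(Boolean).length;
  if (words === 0) {
    return { verdict: "fail", confidence: 0.0, reasoning: "Output was empty" };
  }
  if (words > maxWords) {
    return { verdict: "partial", confidence: 0.5, reasoning: `Too long: ${words} words` };
  }
  return { verdict: "pass", confidence: 1.0, reasoning: `${words} words, within limit` };
}
```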
LLM Result Wrappers
When you call the LLM yourself (instead of using the judge functions), these wrappers convert raw LLM output into evaluation results. All set confidence: 0.9 to reflect the inherent uncertainty of LLM-generated judgments.
Verdict.fromJudge
Wraps a judge's verdict output as an EvaluationVerdictResult.
Verdict.score
Wraps a numeric score as an EvaluationNumberResult.
Verdict.label
Wraps a string label as an EvaluationStringResult.
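The three wrappers differ mainly in the type of the wrapped value; as a sketch (the result shapes and the *Sketch functions are assumptions based on the type names above, not the package's API):

```typescript
// Hypothetical sketches: each wrapper pins confidence at 0.9 and differs
// in the type of the wrapped value (verdict, number, or string).
const fromJudgeSketch = (verdict: "pass" | "partial" | "fail", reasoning: string) =>
  ({ verdict, reasoning, confidence: 0.9 });
const scoreSketch = (value: number, reasoning: string) =>
  ({ value, reasoning, confidence: 0.9 });
const labelSketch = (value: string, reasoning: string) =>
  ({ value, reasoning, confidence: 0.9 });

fromJudgeSketch("pass", "Answer is grounded in the context").confidence; // 0.9
scoreSketch(0.85, "Mostly fluent").value;                                // 0.85
labelSketch("on-topic", "Matches the prompt's subject").value;           // "on-topic"
```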
Confidence Levels
| Category | Confidence | Why |
|---|---|---|
| Deterministic assertions (equals, gt, contains, etc.) | 1.0 | Programmatically certain |
| Verdict.pass() | 1.0 | Explicitly marked as passing |
| Verdict.fail() | 0.0 | Explicitly marked as failing |
| Verdict.partial() | Caller-specified | Custom confidence for partial results |
| LLM wrappers (fromJudge, score, label) | 0.9 | LLM output has inherent uncertainty |
What’s Next
- Workflow Evaluators — Creating evaluators with verify() and judge functions
- Datasets — Defining test cases with inputs and ground truth
- Running Eval Workflows — Wiring evaluators into eval workflows and running them from the CLI