

Evaluators score content — usually LLM output — and return results with a value, a confidence score, and optional reasoning. Output uses evaluators in two distinct ways.

Evaluator Step

An evaluator that runs inside your workflow as a step. Your workflow generates something, evaluates it, and decides what to do next: retry if quality is low, accept the result if confidence is high, or branch to a different path. This generate-evaluate-retry loop is how you build self-correcting workflows.
workflow.ts
const summary = await generateSummary(company);
const quality = await judgeSummaryQuality({ summary, companyName: company.name });

if (quality.value === true && quality.confidence >= 0.7) {
  return summary;
}
// retry or take a different path...
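The comment above hints at the retry path. A minimal self-correcting loop might look like the sketch below; `generateSummary` and `judgeSummaryQuality` are stand-ins here (stubbed so the sketch runs on its own), not the Output API:

```typescript
// Judgement shape matching the evaluator result described above.
type Judgement = { value: boolean; confidence: number; reasoning?: string };

// Hypothetical stand-ins for the workflow steps in the snippet above.
// In a real workflow these would call your model and your evaluator.
let calls = 0;
async function generateSummary(company: { name: string }): Promise<string> {
  calls++;
  return `${company.name} summary, attempt ${calls}`;
}
async function judgeSummaryQuality(input: {
  summary: string;
  companyName: string;
}): Promise<Judgement> {
  // Pretend the judge only becomes confident on the second attempt.
  return { value: true, confidence: calls >= 2 ? 0.9 : 0.4 };
}

// Generate-evaluate-retry loop: regenerate until the judge passes with
// high confidence, keeping the latest attempt as a fallback.
async function generateWithRetry(
  company: { name: string },
  maxAttempts = 3,
): Promise<string> {
  let fallback = "";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const summary = await generateSummary(company);
    const quality = await judgeSummaryQuality({
      summary,
      companyName: company.name,
    });
    if (quality.value === true && quality.confidence >= 0.7) {
      return summary; // confident pass: stop retrying
    }
    fallback = summary; // remember the latest attempt in case all retries fail
  }
  return fallback;
}
```

The threshold (0.7) and attempt cap are tuning knobs: a higher threshold trades more retries (and cost) for more reliable output.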
Evaluation Workflow

A separate workflow that tests another workflow's quality across a dataset of test cases. You define evaluators with verify(), wire them into an eval workflow, and run them from the CLI. Use this for regression testing, CI/CD quality gates, and systematic quality monitoring — without modifying your production workflow code.
npx output workflow test my_workflow --dataset golden_set
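Conceptually, that command runs your target workflow over each dataset row and aggregates the evaluator results. A product-agnostic sketch of that loop (every name here is illustrative, not the Output or verify() API):

```typescript
// Illustrative types for a test-case dataset and an evaluator result.
type TestCase = { input: string; expected: string };
type EvalResult = { value: boolean; confidence: number };

// Run the workflow over every dataset row, score each output with the
// evaluator, and aggregate into a pass rate. This mirrors what an eval
// runner does conceptually; it is not the actual CLI implementation.
async function runEval(
  workflow: (input: string) => Promise<string>,
  evaluate: (output: string, expected: string) => Promise<EvalResult>,
  dataset: TestCase[],
): Promise<{ passed: number; total: number; passRate: number }> {
  let passed = 0;
  for (const row of dataset) {
    const output = await workflow(row.input);
    const result = await evaluate(output, row.expected);
    if (result.value) passed++;
  }
  return { passed, total: dataset.length, passRate: passed / dataset.length };
}
```

Because the aggregate is a single number per run, it slots naturally into a CI/CD gate: fail the build when the pass rate drops below a threshold.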

Which one do I need?

I want to… | Use
Have my workflow check and improve its own output | Evaluator Step
Test my workflow against a set of known inputs | Evaluation Workflow
Add quality gates that retry on failure | Evaluator Step
Run evals in CI/CD before deploying | Evaluation Workflow
Use both in the same project | Start with evaluator steps in your workflow, then add an evaluation workflow for testing
Both approaches use evaluators under the hood — the difference is where they run and what they control.

What’s Next

Evaluator Step

Build evaluators and use them inside your workflows for self-correction.

Evaluation Workflow

Test workflow quality across datasets from the CLI.

LLM-as-a-Judge Best Practices

Writing effective judge prompts, grading scales, and common pitfalls.