Evaluators score content — usually LLM output — and return a result with a value, a confidence score, and optional reasoning. Output uses evaluators in two distinct ways:

Evaluator Step — An evaluator that runs inside your workflow as a step. Your workflow generates something, evaluates it, and decides what to do next: retry if quality is low, accept the result if confidence is high, or branch to a different path. This generate-evaluate-retry loop is how you build self-correcting workflows.
workflow.ts
const summary = await generateSummary(company);
const quality = await judgeSummaryQuality({ summary, companyName: company.name });

if (quality.value === true && quality.confidence >= 0.7) {
  return summary;
}
// retry or take a different path...
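The snippet above elides the retry itself. A minimal self-contained sketch of the full generate-evaluate-retry loop is below; `generateSummary` and `judgeSummaryQuality` are stubbed stand-ins for your own workflow steps, and the `JudgeResult` shape and 0.7 threshold are illustrative assumptions, not a fixed API.

```typescript
// Shape of an evaluator result: a value, a confidence score, optional reasoning.
interface JudgeResult {
  value: boolean;       // did the output pass?
  confidence: number;   // 0..1
  reasoning?: string;   // optional explanation from the judge
}

// Stand-in for an LLM generation step (hypothetical).
async function generateSummary(company: { name: string }): Promise<string> {
  return `Summary for ${company.name}`;
}

// Stand-in for an LLM-as-judge evaluator step (hypothetical).
async function judgeSummaryQuality(input: {
  summary: string;
  companyName: string;
}): Promise<JudgeResult> {
  return { value: input.summary.includes(input.companyName), confidence: 0.9 };
}

// Generate, evaluate, and retry up to maxAttempts times;
// accept only confident passes, then fall back to the last attempt.
async function summarizeWithRetry(
  company: { name: string },
  maxAttempts = 3,
): Promise<string> {
  let last = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = await generateSummary(company);
    const quality = await judgeSummaryQuality({
      summary: last,
      companyName: company.name,
    });
    if (quality.value === true && quality.confidence >= 0.7) {
      return last;
    }
  }
  return last; // or throw / branch to a different path instead
}
```

The fallback on exhausted retries is a design choice: returning the last attempt keeps the workflow moving, while throwing lets a caller route to a human review step.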
Evaluation Workflow — A separate workflow that tests another workflow’s quality across a dataset of test cases. You define evaluators with verify(), wire them into an eval workflow, and run them from the CLI. Use this for regression testing, CI/CD quality gates, and systematic quality monitoring — without modifying your production workflow code.
npx output workflow test my_workflow --dataset golden_set
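Under the hood, an evaluation workflow amounts to running an evaluator over each case in a dataset and aggregating the scores. The sketch below is a generic illustration of that loop, not Output's `verify()` API; the `TestCase` shape and the exact-match evaluator are assumptions for the example.

```typescript
// One test case in a dataset: an input and the expected output (hypothetical shape).
interface TestCase {
  input: string;
  expected: string;
}

// Hypothetical evaluator: exact-match scoring with full confidence.
function exactMatch(output: string, expected: string) {
  return { value: output === expected, confidence: 1 };
}

// Run the workflow under test over every case and aggregate a pass rate.
function runEval(
  cases: TestCase[],
  workflow: (input: string) => string,
): { passed: number; total: number; passRate: number } {
  let passed = 0;
  for (const c of cases) {
    const result = exactMatch(workflow(c.input), c.expected);
    if (result.value) passed++;
  }
  return { passed, total: cases.length, passRate: passed / cases.length };
}
```

A CI/CD quality gate is then just a threshold check on the aggregate, e.g. fail the build when `passRate < 0.9`.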

Which one do I need?

| I want to… | Use |
| --- | --- |
| Have my workflow check and improve its own output | Evaluator Step |
| Test my workflow against a set of known inputs | Evaluation Workflow |
| Add quality gates that retry on failure | Evaluator Step |
| Run evals in CI/CD before deploying | Evaluation Workflow |
| Use both in the same project | Start with evaluator steps in your workflow, then add an evaluation workflow for testing |
Both approaches use evaluators under the hood — the difference is where they run and what they control.

What’s Next

Evaluator Step — Build evaluators and use them inside your workflows for self-correction

Evaluation Workflow — Test workflow quality across datasets from the CLI

LLM-as-a-Judge Best Practices — Writing effective judge prompts, grading scales, and common pitfalls