Evaluators score content — usually LLM output — and return a result with a value, a confidence score, and optional reasoning. Output uses evaluators in two distinct ways:

Evaluator Step

An evaluator that runs inside your workflow as a step. Your workflow generates something, evaluates it, and decides what to do next: retry if quality is low, skip further correction if confidence is high, or branch to a different path. This generate-evaluate-retry loop is how you build self-correcting workflows.
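To make the loop concrete, here is a minimal generate-evaluate-retry sketch in plain TypeScript. It is an illustration of the pattern, not the Output step API: the `EvaluatorResult` shape and the `generateDraft` and `evaluateDraft` helpers are assumptions standing in for your real generation and evaluator steps.

```ts
// Illustrative result shape — the Output SDK's evaluator result may differ.
interface EvaluatorResult {
  value: "pass" | "fail";
  confidence: number; // 0..1
  reasoning?: string;
}

// Hypothetical generation step: replace with your LLM call.
async function generateDraft(prompt: string, feedback?: string): Promise<string> {
  return `draft for: ${prompt}${feedback ? ` (revised after: ${feedback})` : ""}`;
}

// Hypothetical evaluator step: replace with a real evaluator (e.g. an LLM judge).
async function evaluateDraft(draft: string): Promise<EvaluatorResult> {
  const longEnough = draft.length > 20;
  return {
    value: longEnough ? "pass" : "fail",
    confidence: longEnough ? 0.9 : 0.4,
    reasoning: longEnough ? undefined : "Draft is too short.",
  };
}

// Generate-evaluate-retry loop: retry on low quality, return early on a confident pass.
async function generateWithSelfCorrection(prompt: string, maxRetries = 3): Promise<string> {
  let feedback: string | undefined;
  let draft = "";
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    draft = await generateDraft(prompt, feedback);
    const result = await evaluateDraft(draft);
    if (result.value === "pass" && result.confidence >= 0.8) return draft;
    feedback = result.reasoning; // feed the evaluator's reasoning back into the retry
  }
  return draft; // out of retries: return the last attempt
}
```

The key design point is that the evaluator's reasoning is not just logged; it is passed back into the next generation attempt so each retry has something concrete to fix.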
Evaluation Workflow

A standalone workflow that tests your production workflow against datasets. You define evaluators with verify(), wire them into an eval workflow, and run them from the CLI. Use this for regression testing, CI/CD quality gates, and systematic quality monitoring — without modifying your production workflow code.
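For a sense of the shape, here is a minimal dataset-driven eval sketch in plain TypeScript. It does not use Output's verify() or CLI; the `EvalCase` list, the `checkAccuracy` evaluator, and `runEvals` are illustrative assumptions for how a workflow gets exercised against known inputs and gated in CI.

```ts
// Illustrative shapes — Output's real eval workflow API may differ.
interface EvalCase {
  input: string;
  expected: string;
}

interface EvalResult {
  input: string;
  passed: boolean;
  reasoning?: string;
}

// Hypothetical evaluator: compare workflow output against the expected answer.
// In practice this is where an LLM judge or a verify()-style check would go.
function checkAccuracy(output: string, expected: string): string | undefined {
  return output.trim() === expected.trim() ? undefined : `Expected "${expected}", got "${output}"`;
}

// Run every case through the workflow under test and collect pass/fail results.
async function runEvals(
  workflowUnderTest: (input: string) => Promise<string>,
  cases: EvalCase[],
): Promise<EvalResult[]> {
  const results: EvalResult[] = [];
  for (const c of cases) {
    const output = await workflowUnderTest(c.input);
    const reasoning = checkAccuracy(output, c.expected);
    results.push({ input: c.input, passed: reasoning === undefined, reasoning });
  }
  return results;
}

// Example usage: fail the process if any case regresses, as a CI/CD quality gate.
async function main() {
  const cases: EvalCase[] = [
    { input: "2 + 2", expected: "4" },
    { input: "capital of France", expected: "Paris" },
  ];
  const results = await runEvals(async (input) => `stubbed answer for ${input}`, cases);
  const failures = results.filter((r) => !r.passed);
  if (failures.length > 0) {
    console.error(`${failures.length}/${results.length} cases failed`, failures);
    process.exitCode = 1;
  }
}

main();
```

Because this runs as a separate script over a dataset, the production workflow itself stays untouched; only the harness around it changes.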
Which one do I need?
| I want to… | Use |
|---|---|
| Have my workflow check and improve its own output | Evaluator Step |
| Test my workflow against a set of known inputs | Evaluation Workflow |
| Add quality gates that retry on failure | Evaluator Step |
| Run evals in CI/CD before deploying | Evaluation Workflow |
| Use both in the same project | Start with evaluator steps in your workflow, then add an evaluation workflow for testing |
What’s Next
- Evaluator Step: Build evaluators and use them inside your workflows for self-correction.
- Evaluation Workflow: Test workflow quality across datasets from the CLI.
- LLM-as-a-Judge Best Practices: Writing effective judge prompts, grading scales, and common pitfalls.