verify(), wire them into an eval workflow, and run them from the CLI. Use this for regression testing, CI/CD quality gates, and systematic quality monitoring — without modifying your production workflow code.
Which one do I need?
| I want to… | Use |
|---|---|
| Have my workflow check and improve its own output | Evaluator Step |
| Test my workflow against a set of known inputs | Evaluation Workflow |
| Add quality gates that retry on failure | Evaluator Step |
| Run evals in CI/CD before deploying | Evaluation Workflow |
| Use both in the same project | Start with evaluator steps in your workflow, then add an evaluation workflow for testing |
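
The difference between the two approaches in the table can be sketched in plain TypeScript. This is a minimal illustration, not the framework's actual API: `Verdict`, `Evaluator`, `generateWithQualityGate`, `runEvaluation`, and the toy `minLength` and `draft` functions are all hypothetical names made up for this example. It is written synchronously for brevity, whereas real evaluators that call a model would be asynchronous.

```typescript
// Hypothetical types for illustration only; not taken from any real SDK.
type Verdict = { pass: boolean; feedback: string };
type Evaluator = (output: string) => Verdict;

// Pattern 1: an evaluator step inside the workflow.
// The workflow checks its own output and retries with the evaluator's feedback.
function generateWithQualityGate(
  generate: (feedback?: string) => string,
  evaluate: Evaluator,
  maxAttempts = 3,
): { output: string; attempts: number; passed: boolean } {
  let output = "";
  let feedback: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    output = generate(feedback);
    const verdict = evaluate(output);
    if (verdict.pass) return { output, attempts: attempt, passed: true };
    feedback = verdict.feedback; // pass the critique into the next attempt
  }
  return { output, attempts: maxAttempts, passed: false };
}

// Pattern 2: an evaluation workflow.
// The production workflow is left untouched and run against known inputs.
function runEvaluation(
  workflow: (input: string) => string,
  evaluate: Evaluator,
  dataset: string[],
): { passRate: number; failures: string[] } {
  const failures = dataset.filter((input) => !evaluate(workflow(input)).pass);
  const passRate =
    dataset.length === 0 ? 1 : (dataset.length - failures.length) / dataset.length;
  return { passRate, failures };
}

// Toy evaluator: passes any output of at least 10 characters.
const minLength: Evaluator = (output) =>
  output.length >= 10
    ? { pass: true, feedback: "" }
    : { pass: false, feedback: "Output is too short; expand it." };

// Toy generator: produces a longer draft once it receives feedback.
const draft = (feedback?: string): string =>
  feedback ? "A longer, revised draft." : "Too short";

const gated = generateWithQualityGate(draft, minLength);
const report = runEvaluation((input) => input.toUpperCase(), minLength, [
  "short",
  "a sufficiently long input",
]);
```

In this sketch the same evaluator serves both patterns: inside the workflow it drives retries, and outside the workflow it scores a dataset, which is why the two approaches can share a project.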
What’s Next
| Guide | What it covers |
|---|---|
| Evaluator Step | Build evaluators and use them inside your workflows for self-correction |
| Evaluation Workflow | Test workflow quality across datasets from the CLI |
| LLM-as-a-Judge Best Practices | Writing effective judge prompts, grading scales, and common pitfalls |