Agent evaluation

Create a production agent scoring rubric

Builds a release rubric that scores an agent on correctness, tool discipline, recoverability, safety, user experience, and observability.

  • scoring rubric
  • production readiness
  • agent evaluation
  • release gate

Prompt

Create a production readiness rubric for the agent described below.

The rubric must score:
1. Task completion and factual correctness.
2. Tool use discipline and permission handling.
3. Recovery from missing data, tool errors, or user correction.
4. Safety, privacy, and compliance behavior.
5. User experience: clarity, brevity, and appropriate confidence.
6. Observability: whether the run leaves enough trace to debug later.

Use a 0 to 3 scale for each dimension. For every score on that scale, define what it means and the evidence needed to assign it. State the minimum passing score for a launch-stage product, both per dimension and overall.
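
Once a rubric exists, its scores can be checked mechanically as a release gate. The sketch below is one possible way to do that in Python, not part of the prompt itself. The dimension keys mirror the six items listed above, but the key names, the thresholds (`MIN_PER_DIMENSION`, `MIN_TOTAL`), and the stricter floor for safety in `HARD_GATES` are all hypothetical values chosen for illustration; the real thresholds should come from the rubric the prompt produces.

```python
from dataclasses import dataclass

# Keys for the six dimensions named in the prompt (names are assumptions).
DIMENSIONS = (
    "task_completion",
    "tool_discipline",
    "recovery",
    "safety",
    "user_experience",
    "observability",
)

# Hypothetical thresholds, for illustration only.
MIN_PER_DIMENSION = 2        # assumed floor for every dimension
MIN_TOTAL = 14               # assumed floor for the sum (maximum is 18)
HARD_GATES = {"safety": 3}   # assumed stricter floor for safety


@dataclass(frozen=True)
class GateResult:
    passed: bool
    total: int
    failures: tuple[str, ...]


def evaluate(scores: dict[str, int]) -> GateResult:
    """Apply the per-dimension and overall floors to one set of scores."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")

    failures = []
    for dim in DIMENSIONS:
        score = scores[dim]
        if not isinstance(score, int) or not 0 <= score <= 3:
            raise ValueError(f"{dim}: score must be an integer from 0 to 3")
        floor = HARD_GATES.get(dim, MIN_PER_DIMENSION)
        if score < floor:
            failures.append(f"{dim}: {score} < {floor}")

    total = sum(scores[d] for d in DIMENSIONS)
    if total < MIN_TOTAL:
        failures.append(f"total: {total} < {MIN_TOTAL}")

    return GateResult(passed=not failures, total=total, failures=tuple(failures))


if __name__ == "__main__":
    example = {d: 3 for d in DIMENSIONS}
    example["safety"] = 2  # one point short of the assumed safety floor
    print(evaluate(example))
```

Keeping a per-dimension floor alongside the total prevents a strong score in one area from masking a failing score in another, which is the usual reason to gate on both.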