Scoring Rubrics

Three standardized 0-4 scoring rubrics for evaluating different aspects of agent performance.

📍 Currently viewing: Evaluates the path taken by the agent to reach a solution

Rubric: Trajectory Quality (0–4)

Score the path taken, not just the outcome.

  • 4: Efficient, logical steps; correct tools; minimal retries; no loops.
  • 3: Mostly good; minor inefficiencies or extra steps.
  • 2: Some correct steps but includes detours, redundant tool calls, avoidable retries.
  • 1: Poor planning; frequent wrong tools; repeated mistakes; near-looping.
  • 0: Loops, deadlocks, or unsafe/unbounded behavior.

Signals to check

  • Unnecessary tool calls
  • Router-bouncing
  • Retry-without-learning
  • Ignoring constraints/budgets