Technical Strategies & Architecture

Production patterns for reliability, safety, evals, observability, and governance.

Audience: Senior Tech Leads • CTOs • AI/MLOps Engineers • Product Leaders
Promise: Ship the smallest agent that solves the job — and make it safe, observable, and governable.

North Star: Start with minimum autonomy. Treat tools as product. Add guardrails + evals + tracing. Govern rollouts with canaries + kill switches.


TL;DR

  • Start with the minimum autonomy that solves the job.
  • Treat tools as product: strict contracts, permissions, budgets, idempotency.
  • Add guardrails + HITL by default for high-impact actions.
  • Ship with tracing + evals like you ship with tests + monitoring.
  • Build an explicit failure → replay → regression loop.
  • Govern rollouts with feature flags + canaries + kill switches.

Should this be an “agent”?

Use an agent when the job requires multi-step decisions + tool use + iteration, not “one prompt → one response.”

The 3-box filter (agent-ready use case)

A use case is agent-ready only if it is:

  • Valuable (measurable ROI: time, cost, risk, revenue)
  • Executable (tools/data exist and can be accessed safely)
  • Governable (you can bound impact + audit actions)

If any box fails: start with deterministic workflows, automation, or a single-shot assistant.


Agent stack (4 primitives)

Agents are engineered from four primitives:

  1. Instructions — what “good” means (testable, structured)
  2. Tools — how it acts (APIs, DBs, file ops, MCP)
  3. Memory/Data — what it knows (session, RAG, long-term)
  4. Orchestration — how it plans/loops (routing, planners, subagents)

[Diagram: agent runtime primitives]

Mental model: every failure maps to one primitive. Fix the primitive — don’t just tweak prompts.

Playbook cards (Key Principles)

Core mental models

  • Autonomy ladder: Start simple; earn autonomy via eval evidence.
  • Four primitives: Instructions • Tools • Memory • Orchestration.
  • Glass-box agents: Traces + metrics + replayability from day 1.

Reliability & safety

  • Layered guardrails: Input → tool gating → output validation → stop rules.
  • Tool contracts: Typed IO, strict parsing, budgets, idempotency, rollback.
  • Evals as a pipeline: CI → staging → prod monitoring; failures become tests.

Governance & rollout

  • Least privilege: Deny-by-default tools, role-based capabilities, scoped creds.
  • HITL approvals: Interrupt + resume for high-risk actions; clear escalation.
  • Safe shipping: Flags → canaries → A/B → rollback → kill switch.

Orchestration ladder (start simple)

Rule: Don't build a multi-agent "society" until you've proven a single agent fails.

Level  Pattern                    Use when                                  Main risk
1      Single-call + tools        one step + action                         brittle prompts
2      Prompt chaining            fixed steps                               latency
3      Routing                    distinct categories → specialist flows    misroutes
4      Parallelization            speed or confidence                       cost
5      Orchestrator–workers       dynamic decomposition                     coordination bugs
6      Evaluator–optimizer loops  quality-critical outputs                  loops + latency
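
To make Level 3 concrete, here is a minimal routing sketch in Python; `classify_intent` and the handler functions are hypothetical placeholders for a cheap classifier call and your specialist flows.

```python
# Level 3 routing sketch: classify the request into a closed set of
# categories, then dispatch to a specialist flow; fall back on misroutes.
from typing import Callable, Dict

def classify_intent(user_request: str) -> str:
    """Placeholder: in practice, a cheap model call or trained classifier."""
    if "refund" in user_request.lower():
        return "billing"
    if "password" in user_request.lower():
        return "account"
    return "general"

HANDLERS: Dict[str, Callable[[str], str]] = {
    "billing": lambda req: f"[billing specialist flow] {req}",
    "account": lambda req: f"[account specialist flow] {req}",
    "general": lambda req: f"[general assistant flow] {req}",
}

def route(user_request: str) -> str:
    category = classify_intent(user_request)
    # Unknown or misrouted categories fall back to the general flow.
    handler = HANDLERS.get(category, HANDLERS["general"])
    return handler(user_request)
```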

Use multi-agent when you need:

  • specialization (domain experts)
  • parallel research or parallel checks
  • independent verification / debate
  • strict permission boundaries by role

Avoid multi-agent when:

  • the task is short and linear
  • you lack evals/observability (you'll ship chaos faster)

Two stable patterns

  1. Manager–Worker (manager decomposes, workers execute, manager synthesizes)
  2. Handoffs (control transfers to specialist agents)

[Diagram: Manager–Worker pattern sequence]
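
A minimal Manager–Worker sketch; `decompose`, `run_worker`, and `synthesize` are hypothetical stand-ins for real model and tool calls.

```python
# Manager–Worker sketch: the manager decomposes the task, workers run in
# parallel, and the manager synthesizes the results.
from concurrent.futures import ThreadPoolExecutor

def decompose(task: str) -> list[str]:
    return [f"{task}: subtask {i}" for i in range(3)]    # placeholder

def run_worker(subtask: str) -> str:
    return f"result for {subtask}"                       # placeholder

def synthesize(task: str, results: list[str]) -> str:
    return f"{task} -> " + "; ".join(results)            # placeholder

def manager(task: str, max_workers: int = 3) -> str:
    subtasks = decompose(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_worker, subtasks))
    return synthesize(task, results)
```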


Tools are product (Tool Contracts + ACI)

If the agent can call it, you need a tool contract:

  • strict schema (typed inputs/outputs)
  • permissioning (least privilege)
  • budgets (rate limits, token caps, cost caps)
  • retries/timeouts + idempotency
  • audit logs + rollback plan
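
A minimal sketch of what such a contract can look like in code; `ToolBudget`, `RefundArgs`, and `refund_payment` are illustrative names, not a prescribed API.

```python
# Tool contract sketch: typed args, least-privilege role check, budget caps,
# an idempotency key for the side effect, and an audit log entry per call.
import logging
import uuid
from dataclasses import dataclass

log = logging.getLogger("tool_audit")

@dataclass
class ToolBudget:                      # per-run budget, checked before each call
    max_calls: int = 20
    calls: int = 0

@dataclass
class RefundArgs:                      # strict, typed input schema
    order_id: str
    amount_cents: int

def refund_payment(args: RefundArgs, *, role: str, budget: ToolBudget,
                   idempotency_key: str | None = None) -> dict:
    if role != "support_agent":                       # least privilege
        return {"error": {"code": "FORBIDDEN"}}
    if budget.calls >= budget.max_calls:              # budget cap
        return {"error": {"code": "BUDGET_EXCEEDED"}}
    budget.calls += 1
    key = idempotency_key or str(uuid.uuid4())        # retries won't double-refund
    log.info("tool=refund_payment order=%s key=%s", args.order_id, key)
    # ... call the payment API here, passing `key` as the idempotency key ...
    return {"status": "ok", "idempotency_key": key}
```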

ACI (Agent–Computer Interface) heuristics

  • make tool names unambiguous
  • embed examples + edge cases in tool descriptions
  • return structured errors, not prose
  • minimize "free-form" tool results
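
For example, a structured error payload a tool might return instead of prose; the field names are illustrative.

```python
# Structured tool error: a stable machine-readable code, retriability, and a
# hint the model can act on, instead of a free-form prose message.
TOOL_ERROR = {
    "error": {
        "code": "RATE_LIMITED",          # documented, stable error codes
        "retryable": True,
        "retry_after_s": 30,
        "hint": "Reduce request rate or batch lookups.",
    }
}
```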

MCP integration (tool/data plane)

Use MCP when you have many tools × many agents and want:

  • standard interfaces to tools/resources
  • portability across runtimes
  • centralized governance for tool access

Security posture:

  • treat MCP servers as prod services (authn/authz, allowlists, logging)
  • assume tool outputs can be malicious (prompt injection is real)
  • validate everything at boundaries
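
A hedged sketch of boundary validation for tool results, independent of any particular MCP SDK; the size cap and injection patterns are illustrative only.

```python
# Treat tool/MCP output as untrusted input: cap its size and flag
# instruction-like content before it reaches the model context.
import re

MAX_RESULT_CHARS = 20_000
SUSPICIOUS = re.compile(
    r"(ignore (all|previous) instructions|system prompt|BEGIN INSTRUCTIONS)",
    re.IGNORECASE,
)

def sanitize_tool_result(raw: str) -> dict:
    text = raw[:MAX_RESULT_CHARS]
    flagged = bool(SUSPICIOUS.search(text))
    return {
        "content": text,
        "flags": ["possible_injection"] if flagged else [],
        "truncated": len(raw) > MAX_RESULT_CHARS,
    }
```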

Guardrails + Human-in-the-Loop (default stance)

Layer guardrails:

  • Input guards: injection, policy, and relevance checks
  • Tool gating: allowlists + approvals for high-risk actions
  • Output validation: PII/policy/format/schema checks
  • Stop rules: loop caps, timeout, budget caps
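
A minimal sketch of the layering, with hypothetical check functions; each layer can short-circuit the run, and the loop/budget stop rules live in the surrounding loop.

```python
# Layered guardrails sketch: input check -> tool gating -> output validation.
HIGH_RISK_TOOLS = {"send_email", "delete_record", "issue_refund"}

def check_input(user_input: str) -> str | None:
    if "ignore previous instructions" in user_input.lower():
        return "possible_injection"                    # placeholder heuristic
    return None

def gate_tool(tool_name: str, approved: bool) -> bool:
    # High-risk tools require an explicit human approval flag.
    return tool_name not in HIGH_RISK_TOOLS or approved

def validate_output(text: str) -> bool:
    return "@" not in text             # placeholder for PII/policy/schema checks

def guarded_step(user_input: str, tool_name: str, approved: bool,
                 run_tool, render) -> dict:
    if check_input(user_input):
        return {"blocked": "input_guardrail"}
    if not gate_tool(tool_name, approved):
        return {"blocked": "needs_approval", "tool": tool_name}
    output = render(run_tool(tool_name))
    if not validate_output(output):
        return {"blocked": "output_guardrail"}
    return {"output": output}
```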

HITL trigger policy (example)

Require approval when:

  • action is irreversible (payments, deletes, emails)
  • tool result indicates high uncertainty
  • safety classifier flags elevated risk
  • agent exceeds retry/loop thresholds
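
The same policy as a small predicate; the thresholds and tool names are illustrative.

```python
# HITL trigger policy: require human approval when any condition holds.
IRREVERSIBLE_TOOLS = {"send_payment", "delete_account", "send_email"}

def requires_approval(tool_name: str, uncertainty: float,
                      risk_flagged: bool, retries: int) -> bool:
    return (
        tool_name in IRREVERSIBLE_TOOLS   # irreversible side effects
        or uncertainty > 0.7              # tool result indicates high uncertainty
        or risk_flagged                   # safety classifier flagged elevated risk
        or retries > 3                    # retry/loop threshold exceeded
    )
```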

Observability: make it a glass box

Minimum instrumentation:

  • traces: generations, tool calls, handoffs, guardrail decisions
  • step-level latency + cost (tokens, tool runtime)
  • outcome labels: success/failure + reason codes

Minimum trace schema

  • run_id, agent_version, model_id, prompt_hash
  • steps[]: tool_name, args_hash, status, retries, latency_ms
  • totals: tokens_in/out, cost_estimate, p95 latency
  • safety: flags, approvals, escalations
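
The same schema expressed as Python typing stubs; adapt field names to your tracing stack.

```python
# Minimum trace schema as TypedDicts; one RunTrace per agent run.
from typing import List, TypedDict

class StepTrace(TypedDict):
    tool_name: str
    args_hash: str
    status: str                # "ok" | "error" | "blocked"
    retries: int
    latency_ms: int

class RunTrace(TypedDict):
    run_id: str
    agent_version: str
    model_id: str
    prompt_hash: str
    steps: List[StepTrace]
    tokens_in: int
    tokens_out: int
    cost_estimate: float
    latency_p95_ms: int        # aggregate latency, per your rollup window
    safety_flags: List[str]
    approvals: List[str]
    escalations: List[str]
```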

Evals: CI + staging + prod monitoring (AgentOps)

3-tier pipeline + failure→regression loop

Agents need scenario-driven, multi-metric evals.

3-tier pipeline

  1. CI (every PR): fast unit + golden tests
  2. Staging/canary: full suite + adversarial + rubric
  3. Production: A/B for major changes + continuous monitoring

[Diagram: evals pipeline]
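
A sketch of the CI tier as pytest-style golden tests; `run_agent` and its `budget_steps` parameter are hypothetical.

```python
# CI tier: fast golden tests that block the PR when key behaviors regress.
import pytest

from my_agent import run_agent   # hypothetical entry point returning a string

GOLDEN_CASES = [
    ("What is our refund window?", "30 days"),
    ("Cancel order 1234", "needs_approval"),   # high-risk action must escalate
]

@pytest.mark.parametrize("prompt,expected_substring", GOLDEN_CASES)
def test_golden(prompt, expected_substring):
    result = run_agent(prompt, budget_steps=5)
    assert expected_substring in result.lower()
```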


Failure modes & mitigations (what breaks + what to do)

Failure taxonomy + detect/constrain/prevent

Failure modes (common + expensive)

Category              What breaks                       Typical symptom
Prompt injection      tool misuse, data exfiltration    agent follows hidden instructions
Tool hallucination    fake tool calls or invalid args   runtime exceptions, partial outputs
Excessive agency      “does too much”                   unexpected actions, high cost
RAG brittleness       wrong context retrieved           confident wrong answers
Looping / thrashing   endless refine/retry              runaway latency + spend
Schema drift          tools change, outputs break       silent corruption
Multi-agent conflict  inconsistent outputs              contradictions
Data leakage          PII/secrets in outputs            compliance incidents
Non-determinism       flaky behavior                    eval instability

Detect (signals you can automate)

  • Trace anomalies: loop count spikes, tool retry spikes, sudden tool mix changes
  • Policy flags: injection indicators, sensitive data detectors
  • Validators: JSON schema checks, unit checks, citation checks
  • Canary alarms: error rate, p95 latency, cost/task, intervention rate

Constrain (make the blast radius small)

  • Least privilege tool allowlists (per agent/subagent)
  • High-risk tool approvals (interrupt + resume)
  • Budgets: max steps, max tokens, max tool calls, max cost
  • Sandbox tools: read-only mode, dry-run, staged writes
  • Idempotency keys for side effects
  • Structured outputs + strict parsing + rejection on failure
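
A hedged sketch of budget enforcement in the agent loop; the caps are illustrative.

```python
# Budget caps: stop the loop once steps, tool calls, or spend hit their limits.
from dataclasses import dataclass

@dataclass
class RunBudget:
    max_steps: int = 15
    max_tool_calls: int = 10
    max_cost_usd: float = 2.0

def within_budget(budget: RunBudget, steps: int, tool_calls: int,
                  cost_usd: float) -> bool:
    return (steps < budget.max_steps
            and tool_calls < budget.max_tool_calls
            and cost_usd < budget.max_cost_usd)

# Agent loop: keep planning/acting only while within_budget(...) is True;
# otherwise stop and record reason_code="budget_exceeded" in the trace.
```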

Prevent regression (make failures non-repeatable)

  • add every incident as a failure replay test case
  • maintain adversarial eval packs (injection, jailbreak, data leakage)
  • run evals in CI and block releases on key KPI regressions
  • track deltas by agent_version + prompt_hash

Failure postmortem template

  • Incident summary + impact
  • Reproduction trace (run_id)
  • Root cause: instructions / tools / memory / orchestration
  • Mitigation shipped
  • New regression tests added
  • Rollback/kill-switch criteria updated

Governance posture (permissions, approvals, audit, rollout)

Governance checklist

Permissions model (capability-based)

  • tool access is granted per role and environment (dev/staging/prod)
  • separate read vs write tools (read-only by default)
  • scoped credentials (time-bound tokens, per-tenant access)
  • deny-by-default + explicit allowlists
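
A deny-by-default capability map sketch, keyed by role and environment; the roles and tool names are illustrative.

```python
# Deny-by-default capabilities: a tool call is allowed only if it is
# explicitly listed for that role in that environment.
CAPABILITIES = {
    ("support_agent", "prod"): {"search_orders", "read_customer"},      # read-only
    ("support_agent", "staging"): {"search_orders", "read_customer",
                                   "issue_refund"},                     # writes in staging
    ("ops_agent", "prod"): {"read_metrics"},
}

def is_allowed(role: str, env: str, tool_name: str) -> bool:
    return tool_name in CAPABILITIES.get((role, env), set())
```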

Approvals & escalation

  • define risk tiers: low/medium/high/critical
  • enforce HITL for high-impact tools
  • escalation routes: human reviewer → domain owner → security/compliance

Audit trails (non-negotiable)

Store immutable logs:

  • prompts (or prompt hashes), tool args hashes
  • tool results metadata (status, size, source)
  • approval decisions (who/when/why)
  • model + agent versions
  • full trace IDs for replay

Rollout strategy (safe shipping)

  • feature flags for progressive exposure
  • canary releases on small traffic segments
  • A/B tests for major behavior changes
  • automated rollback when KPIs breach thresholds
  • always-on “kill switch” for risky tools
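
A hedged sketch of flag-gated exposure with a kill switch; the flag store and canary percentage are illustrative.

```python
# Progressive exposure: kill switch first, then a canary cohort by stable hash.
import hashlib

FLAGS = {"agent_v2_enabled": True, "agent_v2_canary_pct": 5, "kill_switch": False}

def in_canary(user_id: str, pct: int) -> bool:
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < pct

def use_new_agent(user_id: str) -> bool:
    if FLAGS["kill_switch"] or not FLAGS["agent_v2_enabled"]:
        return False                   # kill switch overrides everything else
    return in_canary(user_id, FLAGS["agent_v2_canary_pct"])
```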

Rollout checklist

  • canary cohort defined
  • KPIs + thresholds defined (quality, cost, safety, latency)
  • rollback + kill switch tested
  • audit logs verified
  • incident on-call + playbook ready

Templates (copy/paste)

One-page Agent Spec (required for each workflow)

  • Goal / non-goals
  • Autonomy level (1–6)
  • Tools allowed + risk ratings + approval policy
  • Memory policy (none/session/RAG/long-term)
  • Budgets (steps/tokens/tool calls/cost)
  • Stop conditions
  • Evals plan (CI/staging/prod)
  • Observability plan (trace schema + dashboards)
  • Rollout plan (flags/canary/kill switch)
  • Ownership + on-call
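
The same spec as a machine-readable sketch, so it can be linted and versioned alongside the code; the keys mirror the checklist and the values are illustrative.

```python
# One-page agent spec as data, checked into the repo next to the agent.
AGENT_SPEC = {
    "goal": "Resolve tier-1 billing questions",
    "non_goals": ["issue refunds over $100"],
    "autonomy_level": 3,                       # see orchestration ladder (1-6)
    "tools": {"search_orders": "low", "issue_refund": "high:approval_required"},
    "memory": "session",
    "budgets": {"max_steps": 15, "max_tool_calls": 10, "max_cost_usd": 2.0},
    "stop_conditions": ["budget_exceeded", "guardrail_block", "user_cancel"],
    "evals": {"ci": "golden", "staging": "full+adversarial", "prod": "monitoring"},
    "observability": {"trace_schema": "RunTrace", "dashboard": "agents/v2"},
    "rollout": {"flag": "agent_v2_enabled", "canary_pct": 5, "kill_switch": True},
    "owner": "team-billing-ai",
    "on_call": "pagerduty:billing-ai",
}
```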
