Amazon Managed Service for Prometheus (AMP)
Managed Prometheus for container and Kubernetes monitoring
Mental model
- AMP is a managed Prometheus backend (remote_write target + long-term store).
- You still run collectors (Prometheus/ADOT) on EKS/ECS/EC2 to scrape/export metrics.
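A minimal sketch of what the collector and query clients point at, assuming the standard regional AMP endpoint layout. The helper name `amp_endpoints` and the workspace ID `ws-EXAMPLE` are hypothetical; requests to both URLs must be SigV4-signed by the role that holds the `aps:*` permissions.

```python
def amp_endpoints(region: str, workspace_id: str) -> dict[str, str]:
    """Build the remote_write and query URLs for one AMP workspace.

    Assumes the usual aps-workspaces.<region>.amazonaws.com host layout.
    """
    base = f"https://aps-workspaces.{region}.amazonaws.com/workspaces/{workspace_id}"
    return {
        # Collectors (Prometheus / ADOT) push samples here.
        "remote_write": f"{base}/api/v1/remote_write",
        # Grafana and other query clients run PromQL here.
        "query": f"{base}/api/v1/query",
    }


# Hypothetical workspace ID, for illustration only.
endpoints = amp_endpoints("us-east-1", "ws-EXAMPLE")
print(endpoints["remote_write"])
```

The collector writes to `remote_write`; dashboards and alerting tools read from `query`.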
Where it’s a must-have in GenAI/agents
- Infra + platform metrics at scale: ECS/EKS/EC2, GPU/CPU/mem, queue depth, autoscaling behavior.
- Best when you want PromQL + alert rules + standard OSS exporters.
Senior knobs
- Scrape interval: 60s is common; 15s produces 4x the samples, so ingestion cost rises fast.
- Label/cardinality hygiene: this is the #1 cost and performance risk.
- Recording rules: precompute expensive queries; reduce query load.
- Query limits (cap query samples processed, QSP): prevent a single dashboard from blowing up costs.
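A rough sketch of why scrape interval and cardinality dominate. It assumes the worst case, where every label combination exists as a series, and a 30-day month. The metric and its label counts are made up for illustration.

```python
from math import prod

SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def active_series(label_cardinalities: dict[str, int]) -> int:
    """Worst-case series count for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


def samples_per_month(series: int, scrape_interval_s: int) -> int:
    """Each active series yields one sample per scrape."""
    return series * (SECONDS_PER_MONTH // scrape_interval_s)


# Hypothetical latency histogram labelled by pod, route and bucket.
labels = {"pod": 50, "route": 20, "le": 12}
series = active_series(labels)

at_60s = samples_per_month(series, 60)
at_15s = samples_per_month(series, 15)
print(f"series={series} samples@60s={at_60s} samples@15s={at_15s}")
```

Adding one more label with 10 values multiplies every number above by 10, which is why label hygiene comes before any other tuning.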
Pricing mental model
Your bill is roughly:
- Ingestion: priced per sample ingested (quoted per 10M samples).
- Storage: priced per GB-month.
- Queries: priced by query samples processed (QSP), i.e. the samples each PromQL query scans.
Rule of thumb: high-frequency scrapes + high cardinality = surprise bill.
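The three line items can be sketched as a small estimator. The rates below are placeholders chosen for easy arithmetic, NOT real AWS prices; the per-billion unit for query samples is also an assumption. Plug in the current numbers from the AMP pricing page for your region.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Unit prices in USD. Placeholder values, not real AWS prices."""
    per_10m_samples_ingested: float
    per_gb_month_stored: float
    per_billion_samples_queried: float  # assumed billing unit for queries


def monthly_bill(samples_ingested: int, gb_stored: float,
                 samples_queried: int, rates: Rates) -> dict[str, float]:
    """Split an estimated monthly bill into the three line items plus a total."""
    ingestion = samples_ingested / 10_000_000 * rates.per_10m_samples_ingested
    storage = gb_stored * rates.per_gb_month_stored
    queries = samples_queried / 1_000_000_000 * rates.per_billion_samples_queried
    return {
        "ingestion": ingestion,
        "storage": storage,
        "queries": queries,
        "total": ingestion + storage + queries,
    }


# Placeholder rates and a hypothetical workload.
rates = Rates(per_10m_samples_ingested=1.0,
              per_gb_month_stored=0.5,
              per_billion_samples_queried=0.25)
bill = monthly_bill(samples_ingested=500_000_000, gb_stored=20,
                    samples_queried=4_000_000_000, rates=rates)
print(bill)
```

Ingestion scales with series count times scrape frequency, so it is usually the first line item to check when the total jumps.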
Terraform template (AMP workspace + basic IAM policy for query/write)
resource "aws_prometheus_workspace" "amp" {
  alias = var.name
  tags  = var.tags
}

# Minimal policy (attach to the IRSA role / ECS task role used by the collector and/or query clients)
resource "aws_iam_policy" "amp_access" {
  name = "${var.name}-amp-access"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      { # write path: collectors pushing via remote_write
        Effect   = "Allow"
        Action   = ["aps:RemoteWrite"]
        Resource = aws_prometheus_workspace.amp.arn
      },
      { # read path: PromQL queries and label/series/metadata lookups
        Effect   = "Allow"
        Action   = ["aps:QueryMetrics", "aps:GetSeries", "aps:GetLabels", "aps:GetMetricMetadata"]
        Resource = aws_prometheus_workspace.amp.arn
      }
    ]
  })
}

variable "name" { type = string }

# A single-line block may hold only one argument, so this one is multi-line.
variable "tags" {
  type    = map(string)
  default = {}
}