Amazon OpenSearch
Search and analytics engine for logs, metrics, and vector search
OpenSearch
BM25 + vector + hybrid retrieval
Mental model
- Search engine + vector DB + analytics-ish store in one system.
- Think of it as “ranked retrieval service”: lexical (BM25), semantic (kNN/ANN), filters/facets, aggregations, near-real-time indexing.
Where it’s used in GenAI / Agentic systems
- RAG retrieval: hybrid search over chunked documents + metadata filters.
- Tooling search: search across knowledge, tickets, runbooks, policies.
- Agent observability: searchable traces/events (tool calls, failures, latency), “why did the agent do that?” investigations.
Core index patterns (what to store)
Pattern A: Chunk index (most common for RAG)
- One document per chunk.
- Fields:
  - text (analyzed) for BM25
  - embedding (knn_vector) for semantic
  - doc_id, source, tenant_id, dt, lang, tags, acl_* for filtering
  - Optional: title, section, url, hash, version
Heuristic: keep chunk size stable (e.g., 300–800 tokens) and store the “retrieval unit” you’ll actually feed to the LLM.
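A minimal sketch of a Pattern A index, assuming opensearch-py, a 768-dimension embedding model, and the Lucene HNSW engine; the endpoint, credentials, index name, and field names are placeholders, not prescriptions:

# Sketch: create a chunk index for RAG (assumes opensearch-py and reachable endpoint).
# Endpoint, credentials, dimension, and HNSW parameters are illustrative assumptions.
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=["https://my-domain-endpoint:443"],   # placeholder endpoint
    http_auth=("master_user", "master_password"),
    use_ssl=True,
)

chunk_index_body = {
    "settings": {
        "index.knn": True,            # enable the k-NN plugin for this index
        "number_of_shards": 3,
        "number_of_replicas": 1,
    },
    "mappings": {
        "properties": {
            "text": {"type": "text"},                 # analyzed, drives BM25
            "embedding": {                            # semantic retrieval
                "type": "knn_vector",
                "dimension": 768,                     # must match your embedding model
                "method": {
                    "name": "hnsw",
                    "engine": "lucene",               # engine options vary by version
                    "space_type": "cosinesimil",
                    "parameters": {"m": 16, "ef_construction": 128},
                },
            },
            "doc_id":     {"type": "keyword"},
            "source":     {"type": "keyword"},
            "tenant_id":  {"type": "keyword"},
            "dt":         {"type": "date"},
            "lang":       {"type": "keyword"},
            "tags":       {"type": "keyword"},
            "acl_groups": {"type": "keyword"},
            "title":      {"type": "text"},
            "url":        {"type": "keyword"},
        }
    },
}

client.indices.create(index="chunks_v1", body=chunk_index_body)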
Pattern B: Dual-index (docs + chunks)
- docs index: one doc per source doc (metadata, title, summary).
- chunks index: retrieval units.
- Useful when you want doc-level ranking + chunk-level retrieval, or dedupe by doc.
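If you stay on a single chunks index but still want at most one hit per source document, field collapsing is one option; a sketch reusing the client and field names from the Pattern A sketch (the query string is just an example):

# Sketch: return only the best-scoring chunk per parent document via field collapsing.
dedupe_body = {
    "size": 10,
    "query": {"match": {"text": "incident runbook for opensearch red cluster"}},
    "collapse": {"field": "doc_id"},   # doc_id must be a keyword field
}
collapsed_hits = client.search(index="chunks_v1", body=dedupe_body)["hits"]["hits"]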
Hybrid retrieval mental model
Best-practice default
- Run BM25 query + vector query.
- Fuse results with rank-based fusion (RRF is the common default) so you don’t fight score normalization.
- Apply metadata filters consistently (tenant, language, ACL, recency).
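A sketch of that flow with client-side RRF (recent OpenSearch versions can also fuse server-side via a hybrid query plus a search pipeline; check your version). It reuses the client and "chunks_v1" index from the Pattern A sketch; get_embedding is a hypothetical helper standing in for your embedding model call:

# Sketch: hybrid retrieval = BM25 leg + kNN leg, fused client-side with RRF.

def rrf_fuse(result_lists, k=60, top_n=10):
    """Reciprocal Rank Fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for hits in result_lists:
        for rank, hit in enumerate(hits, start=1):
            scores[hit["_id"]] = scores.get(hit["_id"], 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

user_query = "how do we rotate the KMS keys for the billing service?"
query_vector = get_embedding(user_query)   # hypothetical helper: call your embedding model

# Apply the same metadata filters to both legs (tenant, ACL, recency, language).
filters = [
    {"term": {"tenant_id": "acme"}},
    {"terms": {"acl_groups": ["eng", "platform"]}},
]

bm25_body = {
    "size": 50,
    "query": {"bool": {"must": [{"match": {"text": user_query}}], "filter": filters}},
}
knn_body = {
    "size": 50,
    "query": {
        "bool": {
            "must": [{"knn": {"embedding": {"vector": query_vector, "k": 50}}}],
            "filter": filters,
        }
    },
}

bm25_hits = client.search(index="chunks_v1", body=bm25_body)["hits"]["hits"]
knn_hits  = client.search(index="chunks_v1", body=knn_body)["hits"]["hits"]
fused     = rrf_fuse([bm25_hits, knn_hits])   # [(chunk _id, rrf score), ...] to pack as context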
Practical scoring heuristics
- RRF as the default hybrid fuse (stable, low tuning).
- Add recency boosting if your corpus changes frequently.
- Add authority boosting (trusted sources) when hallucination risk is high.
- Use two-stage rerank if needed: retrieve top N → rerank with a cross-encoder (outside OpenSearch) → pack context.
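The recency boost can be expressed as a function_score wrapper around the BM25 leg; a sketch using a Gaussian date decay on the dt field, reusing user_query and filters from the hybrid sketch above (the 30-day scale and 0.5 decay are arbitrary starting points, not recommendations):

# Sketch: recency-boosted BM25 leg via function_score + gauss decay on `dt`.
recency_boosted_bm25 = {
    "size": 50,
    "query": {
        "function_score": {
            "query": {"bool": {"must": [{"match": {"text": user_query}}], "filter": filters}},
            "functions": [
                {"gauss": {"dt": {"origin": "now", "scale": "30d", "decay": 0.5}}}
            ],
            "boost_mode": "multiply",   # final score = text relevance x freshness factor
        }
    },
}
recent_hits = client.search(index="chunks_v1", body=recency_boosted_bm25)["hits"]["hits"]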
Vector search knobs (the ones that matter)
- Embedding dimension: fixed per index; choose once.
- ANN method: typically HNSW-based; tuning matters.
  - m: graph connectivity (quality ↑, memory ↑, indexing slower)
  - ef_construction: indexing quality/time trade-off
  - Query-time: k and sometimes ef_search, depending on engine
- Engine choice: Lucene vs FAISS-style options (trade-offs vary by version/feature set).
- Filtering strategy: prefer metadata filtering that can be applied during ANN (when supported) to avoid “retrieve then filter” inefficiency; see the sketch after this list.
- Quantization/compression (if available/used): reduces memory, may reduce quality.
Senior heuristic: vector search is usually RAM-bound. If you’re paging vectors from disk, latency gets ugly fast.
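A sketch of filter-during-ANN, where the filter lives inside the knn clause so it is applied while the graph is traversed (support depends on engine and version, e.g. the Lucene engine and newer FAISS builds; verify against your version's docs). Reuses client and query_vector from the earlier sketches:

# Sketch: "efficient filtering" — filter evaluated during ANN traversal, so you
# still get k results that satisfy the filter instead of filtering afterwards.
filtered_knn = {
    "size": 10,
    "query": {
        "knn": {
            "embedding": {
                "vector": query_vector,
                "k": 10,
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"tenant_id": "acme"}},
                            {"range": {"dt": {"gte": "now-90d"}}},
                        ]
                    }
                },
            }
        }
    },
}
filtered_hits = client.search(index="chunks_v1", body=filtered_knn)["hits"]["hits"]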
Cluster & index knobs (senior knobs)
Cluster sizing knobs
- Data nodes: hold shards + do query/indexing work.
- Dedicated manager/master nodes: stabilize clusters (recommended beyond tiny dev clusters).
- AZ count: 2–3 AZs for HA; increases cross-AZ traffic and replica overhead.
- EBS volume + IOPS: too-low IOPS causes indexing/query stalls.
- UltraWarm / cold tiers (log-style use cases): cheaper for older read-only data.
Index settings knobs
- number_of_shards: scale-out + parallelism; too many shards hurts.
- number_of_replicas: HA + query throughput; doubles/triples storage.
- refresh_interval: lower = fresher search but higher indexing cost.
- Mappings/analyzers: text analyzers drive relevance; bad analyzers = bad BM25 (see the sketch below).
Senior heuristic: shard count is a lifecycle decision. Over-sharding early is a very common failure mode.
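Those knobs map to a few lines at index-creation time; a sketch reusing the client from the earlier sketches (the shard/replica/refresh values and the english analyzer are placeholders, not recommendations):

# Sketch: index-level knobs from the list above, set at creation time.
index_body = {
    "settings": {
        "number_of_shards": 3,       # fixed at creation; changing it means reindexing
        "number_of_replicas": 1,     # can be adjusted live for HA / query throughput
        "refresh_interval": "5s",    # raise (e.g. "30s"-"60s") during heavy bulk indexing
    },
    "mappings": {
        "properties": {
            # Analyzer choice is a relevance decision: it defines the BM25 token stream.
            "text": {"type": "text", "analyzer": "english"},
        }
    },
}
client.indices.create(index="chunks_v2", body=index_body)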
Pricing mental model (back-of-envelope)
Managed OpenSearch Service (provisioned domain)
You pay for:
- Always-on instance-hours (data + dedicated manager + warm nodes if used)
- Storage GB-month (EBS for hot; warm storage pricing differs)
- Snapshots (S3-backed) + data transfer + optional features
Mental model: “OpenSearch (managed) is a fixed monthly platform cost + storage. The bill is dominated by how many nodes you keep running.”
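Back-of-envelope sketch of that math; the unit prices are zeroed placeholders you would replace with current values from the AWS pricing page for your region:

# Sketch: rough monthly cost for a provisioned domain (placeholder prices, not real rates).
HOURS_PER_MONTH = 730

data_nodes,   data_node_hourly   = 3, 0.00        # on-demand $/hr for your data instance type
master_nodes, master_node_hourly = 3, 0.00        # dedicated cluster-manager nodes
ebs_gb_total, ebs_gb_month_price = 3 * 200, 0.00  # total provisioned GB x gp3 GB-month price

monthly_estimate = (
    data_nodes * data_node_hourly * HOURS_PER_MONTH
    + master_nodes * master_node_hourly * HOURS_PER_MONTH
    + ebs_gb_total * ebs_gb_month_price
)
print(f"~${monthly_estimate:,.0f}/month before snapshots, data transfer, and warm tiers")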
OpenSearch Serverless
You pay for:
- Compute units for indexing and search (metered with usage)
- Storage GB-month
Mental model: “Serverless shifts cost from fixed nodes → workload-driven compute. Great for spiky search or when you don’t want to size clusters.”
When to choose OpenSearch (vs common alternatives)
- Choose OpenSearch when you need:
  - BM25 + vector + hybrid + filters + aggregations
  - multi-tenant search with faceting
  - search/analytics on logs/telemetry
- Choose Postgres + pgvector / Aurora when:
  - data is strongly relational/transactional and vector is “nice-to-have”
  - you want simpler ops and smaller-scale retrieval
- Choose a dedicated vector DB when:
  - you want a pure vector-first feature set and don’t need full-text/aggregations (often simpler tuning for ANN)
Terraform templates
A) Managed OpenSearch domain (VPC, encryption, fine-grained security, logs)
resource "aws_cloudwatch_log_group" "os_app" {
name = "/aws/opensearch/${var.domain_name}/application"
retention_in_days = 14
}
resource "aws_cloudwatch_log_group" "os_slow" {
name = "/aws/opensearch/${var.domain_name}/slow"
retention_in_days = 14
}
resource "aws_cloudwatch_log_resource_policy" "os_logs" {
policy_name = "${var.domain_name}-opensearch-logs"
policy_document = jsonencode({
Version = "2012-10-17",
Statement = [{
Effect = "Allow",
Principal = { Service = "es.amazonaws.com" },
Action = [
"logs:PutLogEvents",
"logs:CreateLogStream"
],
Resource = [
aws_cloudwatch_log_group.os_app.arn,
aws_cloudwatch_log_group.os_slow.arn
]
}]
})
}
data "aws_iam_policy_document" "os_access" {
statement {
effect = "Allow"
  principals {
    type        = "AWS"
    identifiers = var.allowed_principals
  }
actions = ["es:ESHttp*"]
resources = ["arn:aws:es:${var.region}:${var.account_id}:domain/${var.domain_name}/*"]
}
}
resource "aws_opensearch_domain" "this" {
domain_name = var.domain_name
engine_version = var.engine_version
cluster_config {
instance_type = var.data_instance_type
instance_count = var.data_instance_count
dedicated_master_enabled = true
dedicated_master_type = var.master_instance_type
dedicated_master_count = var.master_instance_count
zone_awareness_enabled = true
zone_awareness_config {
availability_zone_count = 2
}
}
ebs_options {
ebs_enabled = true
volume_type = "gp3"
volume_size = var.ebs_gb
iops = var.ebs_iops
throughput = var.ebs_throughput
}
vpc_options {
subnet_ids = var.private_subnet_ids
security_group_ids = [var.sg_id]
}
encrypt_at_rest { enabled = true }
node_to_node_encryption { enabled = true }
domain_endpoint_options {
enforce_https = true
tls_security_policy = "Policy-Min-TLS-1-2-2019-07"
}
advanced_security_options {
enabled = true
internal_user_database_enabled = true
master_user_options {
master_user_name = var.master_user
master_user_password = var.master_password
}
}
access_policies = data.aws_iam_policy_document.os_access.json
log_publishing_options {
log_type = "ES_APPLICATION_LOGS"
cloudwatch_log_group_arn = aws_cloudwatch_log_group.os_app.arn
}
log_publishing_options {
log_type = "SEARCH_SLOW_LOGS"
cloudwatch_log_group_arn = aws_cloudwatch_log_group.os_slow.arn
}
tags = var.tags
}
variable "domain_name" { type = string }
variable "engine_version" { type = string default = "OpenSearch_2.13" }
variable "data_instance_type" { type = string default = "m6g.large.search" }
variable "data_instance_count" { type = number default = 3 }
variable "master_instance_type" { type = string default = "m6g.large.search" }
variable "master_instance_count" { type = number default = 3 }
variable "ebs_gb" { type = number default = 200 }
variable "ebs_iops" { type = number default = 3000 }
variable "ebs_throughput" { type = number default = 125 }
variable "private_subnet_ids" { type = list(string) }
variable "sg_id" { type = string }
variable "master_user" { type = string }
variable "master_password" { type = string sensitive = true }
variable "allowed_principals" { type = list(string) default = [] }
variable "region" { type = string }
variable "account_id" { type = string }
variable "tags" { type = map(string) default = {} }
B) OpenSearch Serverless (collection + minimal security skeleton)
resource "aws_opensearchserverless_collection" "col" {
name = var.collection_name
type = "SEARCH"
tags = var.tags
}
# Encryption policy
resource "aws_opensearchserverless_security_policy" "enc" {
name = "${var.collection_name}-enc"
type = "encryption"
policy = jsonencode({
Rules = [{
ResourceType = "collection",
Resource = ["collection/${aws_opensearchserverless_collection.col.name}"]
}],
AWSOwnedKey = true
})
}
# Network policy (public here for brevity; prefer VPC/private where possible)
resource "aws_opensearchserverless_security_policy" "net" {
name = "${var.collection_name}-net"
type = "network"
policy = jsonencode([{
Rules = [{
ResourceType = "collection",
Resource = ["collection/${aws_opensearchserverless_collection.col.name}"]
}],
AllowFromPublic = true
}])
}
# Data access policy
resource "aws_opensearchserverless_access_policy" "access" {
name = "${var.collection_name}-access"
type = "data"
policy = jsonencode([{
Rules = [
{
ResourceType = "index",
Resource = ["index/${aws_opensearchserverless_collection.col.name}/*"],
Permission = ["aoss:ReadDocument","aoss:WriteDocument","aoss:CreateIndex","aoss:DescribeIndex"]
}
],
Principal = var.allowed_principals
}])
}
variable "collection_name" { type = string }
variable "allowed_principals" { type = list(string) default = [] }
variable "tags" { type = map(string) default = {} }
“Minimal-regret” defaults for RAG on OpenSearch
- Start with hybrid retrieval (BM25 + vector) + RRF fusion.
- Keep metadata filtering first-class (tenant, ACL, dt, lang).
- Use aliases for index versioning (chunks_v1, chunks_v2 → alias chunks_current); see the sketch at the end of this list.
- Design shards for the next 6–12 months, not day 1.
- Treat vectors as RAM planning: vector workloads usually force a memory-first sizing strategy.
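Alias cutover is a single atomic call; a sketch reusing the client and index names from the earlier sketches:

# Sketch: repoint the read alias from the old index version to the new one atomically.
client.indices.update_aliases(body={
    "actions": [
        {"remove": {"index": "chunks_v1", "alias": "chunks_current"}},
        {"add":    {"index": "chunks_v2", "alias": "chunks_current"}},
    ]
})
# Queries always hit "chunks_current", so cutover and rollback are one alias swap.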