AWS Lambda
Serverless compute for burst workloads, lightweight inference wrappers, and async workers
Lambda
When to use (ML/GenAI)
- Burst + spiky inference wrappers, lightweight tool APIs, async workers (SQS), glue code.
- Great for orchestration, pre/post-processing, fanout, not heavy GPU inference.
Knobs that matter
- Memory (also scales CPU), timeout, ephemeral storage, architecture (x86/ARM).
- Concurrency: reserved concurrency per function, account concurrency.
- Cold start: VPC attachment increases risk; use VPC endpoints; consider Provisioned Concurrency if p99 matters.
- Event source mappings: batch size, max batching window, DLQ, partial batch response (SQS).
- Streaming responses (HTTP response streaming) when you need token streaming semantics. ([Amazon Web Services, Inc.][1])
Pricing mental model
- Meter is requests + GB-seconds duration (compute time × memory). ([Amazon Web Services, Inc.][1])
- Mental model: “If I double memory, I roughly double $/sec but may halve runtime — optimize on cost per request not memory.”
Heuristics
- If runtime is > 30–60s, or needs heavy deps/GPU, move to ECS/EKS/EC2.
- If you’re paying big NAT bills for Lambda-in-VPC → add endpoints (S3/Dynamo/etc).
Terraform (basic Lambda + IAM + logs + reserved concurrency)
resource "aws_cloudwatch_log_group" "lg" {
name = "/aws/lambda/${var.name}"
retention_in_days = 14
}
data "aws_iam_policy_document" "assume" {
statement {
effect = "Allow"
principals { type = "Service", identifiers = ["lambda.amazonaws.com"] }
actions = ["sts:AssumeRole"]
}
}
resource "aws_iam_role" "role" {
name = "${var.name}-role"
assume_role_policy = data.aws_iam_policy_document.assume.json
}
resource "aws_iam_role_policy_attachment" "basic" {
role = aws_iam_role.role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}
resource "aws_lambda_function" "fn" {
function_name = var.name
role = aws_iam_role.role.arn
runtime = "python3.12"
handler = "app.handler"
filename = var.zip_path
memory_size = 1024
timeout = 30
environment { variables = var.env }
# Uncomment if needed:
# vpc_config { subnet_ids = var.private_subnet_ids, security_group_ids = [var.sg_id] }
}
resource "aws_lambda_function_concurrency" "reserved" {
function_name = aws_lambda_function.fn.function_name
reserved_concurrent_executions = 50
}
variable "name" { type = string }
variable "zip_path"{ type = string }
variable "env" { type = map(string) default = {} }