SageMaker Async Inference / Batch Transform
Async and batch inference for large-scale offline predictions
Mental model
- Async Inference: “an endpoint, but request/response is async with S3 output”. Requests are queued, payloads and results live in S3, and SNS can notify on completion. Great for bursty, long-running inference (see the invoke sketch below).
- Batch Transform: run inference over a whole dataset in S3; classic offline scoring with no endpoint to keep alive.
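A minimal async-invoke sketch in boto3. The endpoint name and S3 paths are placeholders; invoke_endpoint_async returns a pointer to where the result will land, not the result itself, so you either poll S3 or subscribe to the endpoint’s SNS topics.

import time

import boto3

smr = boto3.client("sagemaker-runtime")
s3 = boto3.client("s3")

# The request payload is staged in S3, not sent inline.
resp = smr.invoke_endpoint_async(
    EndpointName="my-async-endpoint",  # placeholder endpoint name
    InputLocation="s3://my-bucket/requests/req-001.json",
    ContentType="application/json",
)

# The response is a pointer, not the result; poll S3 for the output
# (or subscribe to the endpoint's SNS success/error topics instead).
out_bucket, out_key = resp["OutputLocation"].replace("s3://", "").split("/", 1)
while True:
    try:
        body = s3.get_object(Bucket=out_bucket, Key=out_key)["Body"].read()
        break
    except s3.exceptions.NoSuchKey:
        time.sleep(5)
print(body[:200])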
Where it fits (ML/GenAI)
- Async: document processing, long generation, heavy pre/post-processing.
- Batch: re-score catalogs, offline embeddings, nightly scoring, eval sweeps.
Knobs that matter
- S3 input/output prefixes (organize by dt/run_id so reruns are easy to isolate and diff)
- Concurrency: MaxConcurrentInvocationsPerInstance (async, set on the endpoint config; see the sketch after this list) / MaxConcurrentTransforms, BatchStrategy, and SplitType (batch)
- Payload sizes: batch caps each request at MaxPayloadInMB (default 6 MB); async accepts up to ~1 GB per request. Either way, keep records sized appropriately and avoid huge single records
- Retry behavior + DLQs: async has built-in SNS success/error topics; wrap batch with Step Functions for stronger retry/alerting semantics
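For async, those knobs live on the endpoint config. A minimal sketch in boto3; all names and ARNs are placeholders:

import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="my-async-config",  # placeholder name
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",           # placeholder model name
        "InstanceType": "ml.m5.xlarge",
        "InitialInstanceCount": 1,
    }],
    AsyncInferenceConfig={
        "OutputConfig": {
            "S3OutputPath": "s3://my-bucket/async-out/",
            "NotificationConfig": {  # optional SNS hooks (placeholder ARNs)
                "SuccessTopic": "arn:aws:sns:us-east-1:123456789012:bt-success",
                "ErrorTopic": "arn:aws:sns:us-east-1:123456789012:bt-errors",
            },
        },
        "ClientConfig": {
            # Backpressure knob: how many requests each instance works
            # on at once before the internal queue grows.
            "MaxConcurrentInvocationsPerInstance": 4,
        },
    },
)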
Pricing mental model
- You’re paying for inference instance-hours while the job runs (batch) or while capacity is provisioned (async; it can autoscale to zero between bursts, so idle cost can be near zero).
- If the workload is periodic, batch usually wins on cost vs keeping endpoints up: e.g. a nightly 2-hour job on two ml.m5.xlarge (illustrative ~$0.23/instance-hr) runs roughly $28/month, vs roughly $335/month for the same pair running 24/7.
Terraform template (model) + job launch
Caveat: the Terraform AWS provider has no transform-job resource; transform jobs are ephemeral runs, not standing infrastructure. Keep the durable pieces (model, IAM role, buckets) in Terraform, and launch each job via the SDK/CLI or Step Functions (boto3 sketch below).
resource "aws_sagemaker_transform_job" "bt" {
name = "${var.name}-batch-transform"
model_name = var.model_name
transform_input {
data_source {
s3_data_source {
s3_data_type = "S3Prefix"
s3_uri = var.input_s3_uri
}
}
content_type = "application/jsonl"
}
transform_output {
s3_output_path = var.output_s3_uri
accept = "application/jsonl"
}
transform_resources {
instance_type = var.instance_type
instance_count = var.instance_count
}
}
variable "name" { type = string }
variable "model_name" { type = string }
variable "input_s3_uri" { type = string }
variable "output_s3_uri" { type = string }
variable "instance_type" { type = string default = "ml.m5.xlarge" }
variable "instance_count" { type = number default = 2 }