SageMaker Feature Store
Feature storage and serving for ML pipelines
SageMaker Feature Store (classic ML at scale)
Mental model
- Central feature management: offline store + online store, lineage, reuse.
- Use when multiple models/teams need consistent feature definitions.
Where it fits
- Fraud/ranking/recommendation/forecasting systems with many features and multiple consumers.
- Less common for pure GenAI apps; more for “classic ML + GenAI combined” products.
Knobs that matter
- Online store: low-latency retrieval (capacity/throughput considerations)
- Offline store: S3 + Glue catalog integration for training datasets
- Feature definitions: schema management and backfills
- Freshness: streaming vs batch feature updates
Pricing mental model
-
Think of it as paying for:
- Online store (hot serving path)
- Offline storage/compute (building and querying datasets)
-
Biggest lever is avoiding unnecessary online features and controlling backfills.
Terraform template (feature group skeleton)
resource "aws_sagemaker_feature_group" "fg" {
feature_group_name = var.name
record_identifier_feature_name = "record_id"
event_time_feature_name = "event_time"
role_arn = var.sm_role_arn
feature_definition { feature_name = "record_id"; feature_type = "String" }
feature_definition { feature_name = "event_time"; feature_type = "String" }
feature_definition { feature_name = "f1"; feature_type = "Fractional" }
offline_store_config {
s3_storage_config { s3_uri = var.offline_s3_uri }
disable_glue_table_creation = false
}
online_store_config {
enable_online_store = true
}
}
variable "name" { type = string }
variable "sm_role_arn" { type = string }
variable "offline_s3_uri" { type = string }