Streaming

Amazon MSK (Kafka)

Managed Apache Kafka for event streaming

Mental model

  • Apache Kafka API compatibility plus the surrounding ecosystem (Kafka Connect, Kafka Streams, Schema Registry patterns, exactly-once semantics via transactions, existing tooling).
  • Use when you need Kafka APIs, multi-team data platform standards, existing Kafka integrations, or high-throughput pub/sub with Kafka semantics.

Key knobs that matter (Provisioned MSK)

  • Broker instance type/count: capacity and cost.
  • Storage per broker (EBS) + throughput characteristics.
  • Partitions: set throughput parallelism and the ceiling on consumer scaling (at most one active consumer per partition within a group); too many partitions increase ops overhead.
  • Auth/encryption: TLS in transit, IAM/SCRAM, ACL strategy.
  • Cross-AZ: spreading brokers across AZs gives strong HA, but model the cross-AZ traffic between clients and brokers (cost and latency).
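
As a rough illustration of the partition and auth knobs above, the sketch below pins broker-level topic defaults in an aws_msk_configuration resource. The resource name and all property values are assumptions chosen for a three-broker cluster, not recommendations from this document; tune them to your workload.

```hcl
# Sketch only: broker-level defaults for the partition and durability knobs.
# The name and every property value here are illustrative assumptions.
resource "aws_msk_configuration" "defaults" {
  name           = "example-msk-defaults" # hypothetical name
  kafka_versions = ["3.6.0"]

  server_properties = <<-PROPERTIES
    auto.create.topics.enable=false
    default.replication.factor=3
    min.insync.replicas=2
    num.partitions=6
  PROPERTIES
}

# To use it, reference it from inside an aws_msk_cluster resource:
#
#   configuration_info {
#     arn      = aws_msk_configuration.defaults.arn
#     revision = aws_msk_configuration.defaults.latest_revision
#   }
#
# IAM client auth is enabled in the same cluster resource with:
#
#   client_authentication {
#     sasl {
#       iam = true
#     }
#   }
```

Disabling topic auto-creation is one way to keep partition counts deliberate, since every topic then has to be created explicitly with a chosen partition count.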

Serverless MSK (when it fits)

  • Pricing drivers: cluster-hours + partition-hours + GB in/out + storage. ([Amazon Web Services, Inc.][7])
  • Fits when you don’t want to run brokers and the workload is spiky or moderate rather than huge and steady.
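
For comparison with the provisioned path, a minimal serverless cluster might look like the sketch below. The cluster name and both input variables are hypothetical placeholders. Serverless clusters use IAM client authentication, so there is no broker sizing or storage block to fill in.

```hcl
# Sketch only: a minimal MSK Serverless cluster.
# Variable names and the cluster name are hypothetical placeholders.
variable "serverless_subnet_ids" {
  type = list(string) # private subnets in at least two AZs
}

variable "serverless_sg_id" {
  type = string # security group that allows your clients in
}

resource "aws_msk_serverless_cluster" "example" {
  cluster_name = "example-serverless"

  vpc_config {
    subnet_ids         = var.serverless_subnet_ids
    security_group_ids = [var.serverless_sg_id]
  }

  # IAM is the client authentication mechanism for serverless clusters.
  client_authentication {
    sasl {
      iam {
        enabled = true
      }
    }
  }
}
```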

Pricing mental model

  • Provisioned: broker-hours + storage GB-month + data transfer. ([Amazon Web Services, Inc.][8])
  • Serverless: you’re explicitly paying for partitions and data in/out, so “partition sprawl” can become a cost driver. ([Amazon Web Services, Inc.][7])
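
To make the serverless cost drivers concrete, the sketch below expresses the monthly estimate as Terraform locals. Every variable name is hypothetical, and the rates are deliberately left without defaults: take them from the MSK pricing page for your region. The point it illustrates is that the partition-hours term grows linearly with partition count.

```hcl
# Sketch only: back-of-the-envelope serverless estimate.
# All names are hypothetical; rates must come from the MSK pricing page.
variable "serverless_rates_usd" {
  type = object({
    cluster_hour     = number
    partition_hour   = number
    gb_in            = number
    gb_out           = number
    storage_gb_month = number
  })
}

variable "est_partitions" { type = number }       # total partitions across all topics
variable "est_gb_in_month" { type = number }      # data written per month
variable "est_gb_out_month" { type = number }     # data read per month (all consumers)
variable "est_storage_gb_month" { type = number } # average retained data

locals {
  hours_per_month = 730 # approximate average month

  serverless_monthly_usd = (
    local.hours_per_month * var.serverless_rates_usd.cluster_hour +
    local.hours_per_month * var.est_partitions * var.serverless_rates_usd.partition_hour +
    var.est_gb_in_month * var.serverless_rates_usd.gb_in +
    var.est_gb_out_month * var.serverless_rates_usd.gb_out +
    var.est_storage_gb_month * var.serverless_rates_usd.storage_gb_month
  )
}

output "serverless_monthly_usd" {
  value = local.serverless_monthly_usd
}
```

Doubling est_partitions doubles the partition-hours term while leaving the others unchanged, which is why partition sprawl shows up directly on the bill.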

Agentic/GenAI usage patterns

  • Enterprise “event backbone” where Kafka is mandated; ingestion for telemetry, model events, and cross-service workflows with Kafka tooling.

Terraform (Provisioned MSK cluster skeleton)

resource "aws_security_group" "msk" {
  name   = "${var.name}-msk-sg"
  vpc_id = var.vpc_id

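  # Port 9094 is the TLS listener, matching client_broker = "TLS" below.
  # Widen the range only if you enable other listeners (9096 SASL/SCRAM, 9098 IAM).
  ingress {
    from_port       = 9094
    to_port         = 9094
    protocol        = "tcp"
    security_groups = [var.app_sg_id]
  }

  # Allow all outbound traffic.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }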
}

resource "aws_msk_cluster" "this" {
  cluster_name           = var.name
  kafka_version          = var.kafka_version
  # Must be a multiple of the number of client subnets (one subnet per AZ).
  number_of_broker_nodes = var.brokers

  broker_node_group_info {
    instance_type   = var.broker_instance_type
    client_subnets  = var.private_subnet_ids
    security_groups = [aws_security_group.msk.id]

    storage_info {
      ebs_storage_info { volume_size = var.ebs_gb }
    }
  }

  encryption_info {
    encryption_in_transit {
      client_broker = "TLS"
      in_cluster    = true
    }
  }

  tags = var.tags
}

variable "name" { type = string }
variable "vpc_id" { type = string }
variable "app_sg_id" { type = string }
variable "private_subnet_ids" { type = list(string) }

variable "kafka_version" {
  type    = string
  default = "3.6.0"
}

variable "brokers" {
  type    = number
  default = 3
}

variable "broker_instance_type" {
  type    = string
  default = "kafka.m5.large"
}

# EBS volume size per broker, in GiB.
variable "ebs_gb" {
  type    = number
  default = 1000
}

variable "tags" {
  type    = map(string)
  default = {}
}
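
Clients need the cluster's bootstrap broker string to connect. One possible addition to the skeleton above (not part of the original) is a pair of outputs that expose it. This fragment assumes it lives in the same module as aws_msk_cluster.this; the output names are arbitrary.

```hcl
# Sketch only: expose connection details from the skeleton's cluster.
# Assumes aws_msk_cluster.this is defined in the same module.
output "bootstrap_brokers_tls" {
  description = "Comma-separated host:port pairs for the TLS listener"
  value       = aws_msk_cluster.this.bootstrap_brokers_tls
}

output "cluster_arn" {
  description = "ARN of the MSK cluster, useful for IAM policies and monitoring"
  value       = aws_msk_cluster.this.arn
}
```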