While preparing for the AWS SAA-C03, many candidates get confused about when to use Kinesis vs. SQS vs. S3 for message ingestion. In practice, this is fundamentally a choice between real-time decoupling and batch-processing latency. Let's drill into a simulated scenario.
The Scenario #
GlobalStream Financial operates a real-time payment notification platform that receives transaction alerts from payment gateways worldwide. These incoming messages must be immediately processed by 30+ downstream microservices including fraud detection, customer notification, analytics pipelines, and audit logging systems.
The platform experiences extreme volatility—baseline traffic sits at 5,000 messages per second, but during flash sales or market events, it can spike to 100,000 messages per second within minutes. The current monolithic architecture creates bottlenecks where downstream services directly poll the main application, causing cascading failures during peak loads.
The engineering team has been tasked with redesigning the ingestion layer to achieve two critical goals: completely decouple producers from consumers and support independent scaling of all downstream services without message loss.
Key Requirements #
Design a solution that:
- Handles burst traffic from 5K to 100K messages/second
- Enables dozens of consumers to independently process messages at their own pace
- Removes all tight coupling between message producers and consumers
- Minimizes operational complexity for an associate-level engineering team
The Options #
A) Store transaction data directly into Amazon DynamoDB. Configure DynamoDB table rules to automatically remove sensitive fields during write operations. Use DynamoDB Streams to enable downstream applications to consume the transaction data.
B) Stream transaction data to Amazon Kinesis Data Firehose, which stores data to both Amazon DynamoDB and Amazon S3. Use AWS Lambda integrated with Kinesis Data Firehose to remove sensitive data fields. Downstream applications consume data by reading from the Amazon S3 bucket.
C) Stream transaction data to Amazon Kinesis Data Streams. Use AWS Lambda to process each message and remove sensitive fields, then store sanitized data to Amazon DynamoDB. Downstream applications consume transaction data directly from the Kinesis Data Streams using independent consumer groups.
D) Store batched transaction data as files in Amazon S3. Use AWS Lambda triggered by S3 events to process each file, remove sensitive data, and update the file in S3. The Lambda function then writes records to Amazon DynamoDB. Downstream applications consume transaction files from S3.
Correct Answer #
Option C.
The Architect’s Analysis #
Option C: Amazon Kinesis Data Streams with Lambda processing and a multi-consumer pattern.
Step-by-Step Winning Logic #
This solution achieves true decoupling through the streaming pub-sub pattern:
Why This Works:
- Kinesis Data Streams acts as the durable buffer that absorbs traffic spikes (100K msgs/sec) without impacting producers or consumers
- Multiple independent consumers can read the same stream at their own pace, either with Enhanced Fan-Out (a dedicated 2 MB/sec per shard for each consumer) or by sharing the standard per-shard read throughput
- Lambda integration handles data sanitization in-flight before storage, maintaining separation of concerns
- DynamoDB storage provides low-latency lookup for the source application while Kinesis handles real-time distribution (the sanitize-and-store step is sketched below)
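To make the sanitize-and-store step concrete, here is a minimal Lambda handler sketch. It assumes a hypothetical `sanitized_transactions` DynamoDB table and a made-up list of sensitive field names; the real table schema and field list would come from GlobalStream's data model.

```python
import base64
import json
from decimal import Decimal

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("sanitized_transactions")  # hypothetical table name

# Hypothetical field names; the real list comes from the data-classification policy
SENSITIVE_FIELDS = {"card_number", "cvv", "account_holder_ssn"}


def handler(event, context):
    """Invoked by the Kinesis event source mapping with a batch of records."""
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded; Decimal keeps DynamoDB happy with amounts
        payload = json.loads(
            base64.b64decode(record["kinesis"]["data"]), parse_float=Decimal
        )

        # Strip sensitive fields in-flight, before anything touches storage
        sanitized = {k: v for k, v in payload.items() if k not in SENSITIVE_FIELDS}

        # Persist the sanitized transaction for low-latency lookups by the source app
        table.put_item(Item=sanitized)
```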
The Technical Excellence:
- Kinesis retains data for 24 hours by default (configurable up to 365 days), allowing consumers to catch up after failures
- Each microservice maintains its own shard iterator position—complete autonomy
- A high-cardinality partition key (for example, the transaction ID) spreads records evenly across shards, preventing hot shards at high throughput (see the producer sketch below)
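On the producer side, a minimal boto3 sketch, assuming a stream named `payment-transactions` and a hypothetical `transaction_id` field used as the high-cardinality partition key:

```python
import json

import boto3

kinesis = boto3.client("kinesis")
STREAM_NAME = "payment-transactions"  # assumed stream name


def publish_batch(transactions):
    """Publish a batch of transaction dicts (up to 500 per call), using the
    transaction ID as the partition key so records spread evenly across shards."""
    records = [
        {
            "Data": json.dumps(txn).encode("utf-8"),
            "PartitionKey": txn["transaction_id"],  # high-cardinality key avoids hot shards
        }
        for txn in transactions
    ]
    response = kinesis.put_records(StreamName=STREAM_NAME, Records=records)

    # put_records is not all-or-nothing: retry any records the service throttled
    if response["FailedRecordCount"]:
        failed = [
            rec
            for rec, result in zip(records, response["Records"])
            if "ErrorCode" in result
        ]
        kinesis.put_records(StreamName=STREAM_NAME, Records=failed)  # naive single retry
```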
The Traps (Distractor Analysis) #
Why not Option A (DynamoDB + DynamoDB Streams)?
- DynamoDB Streams is designed for database change capture, not high-volume message ingestion
- You’d be writing 100K records/second to DynamoDB unnecessarily; write capacity is one of DynamoDB’s most expensive dimensions (roughly $0.00065 per WCU-hour for provisioned capacity in us-east-1)
- DynamoDB Streams supports at most two concurrent readers per shard—it won’t support “dozens of microservices”
- DynamoDB has no native rule engine for removing sensitive fields at write time, so the “table rules” in Option A don’t exist as a feature
Why not Option B (Kinesis Firehose + S3)?
- Firehose is for data delivery, not message distribution—it buffers and batches to destinations
- Having downstream consumers read from S3 introduces batch latency (Firehose buffers for roughly 60-900 seconds before each delivery)
- No pub-sub pattern: consumers must poll S3, creating tight coupling and inefficiency
- Firehose is delivery-only: it has no read API, so multiple independent consumers cannot read the same stream from it (see the configuration sketch below)
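To make the buffering point concrete, this is roughly what the Option B delivery stream would look like; names and ARNs are placeholders, and the buffering hints are the knobs that impose the batch latency:

```python
import boto3

firehose = boto3.client("firehose")

# Placeholder names and ARNs, for illustration only
firehose.create_delivery_stream(
    DeliveryStreamName="transactions-to-s3",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::globalstream-transactions",
        # Firehose flushes whichever threshold is hit first; consumers reading
        # from S3 therefore see data minutes late, never in real time.
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
    },
)
```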
Why not Option D (S3 Batch Processing)?
- Batch processing fundamentally violates real-time requirements
- S3 event notifications to Lambda introduce unpredictable delays
- No mechanism for dozens of consumers to independently process the same data
- File-based coordination creates race conditions and complexity
- Completely fails the “decoupling” requirement
The Architect Blueprint #
Diagram Note: Kinesis Data Streams serves as the central decoupling layer, allowing 30+ consumers to independently read transaction data at their own pace while Lambda handles data sanitization before DynamoDB persistence.
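A boto3 sketch of how this decoupling layer might be provisioned, with illustrative stream and consumer names: an on-demand stream plus one registered enhanced fan-out consumer per latency-critical service.

```python
import boto3

kinesis = boto3.client("kinesis")
STREAM_NAME = "payment-transactions"  # illustrative name

# On-demand mode lets capacity follow the 5K -> 100K msg/sec swings automatically
kinesis.create_stream(
    StreamName=STREAM_NAME,
    StreamModeDetails={"StreamMode": "ON_DEMAND"},
)
kinesis.get_waiter("stream_exists").wait(StreamName=STREAM_NAME)

stream_arn = kinesis.describe_stream_summary(StreamName=STREAM_NAME)[
    "StreamDescriptionSummary"
]["StreamARN"]

# Latency-critical services each register an enhanced fan-out consumer
# (dedicated 2 MB/sec per shard); the rest share the standard read throughput.
for service in ("fraud-detection", "customer-notification", "audit-logging"):
    kinesis.register_stream_consumer(StreamARN=stream_arn, ConsumerName=service)
```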
Real-World Practitioner Insight #
Exam Rule #
For the SAA-C03 exam, when you see “dozens of consumers” + “real-time” + “extreme traffic spikes”, always choose Kinesis Data Streams over Firehose, SQS, or S3 batch processing. The keyword “decouple” with multiple consumers = streaming pub-sub pattern.
Real World #
In production at GlobalStream Financial scale, we would likely implement:
- Kinesis Data Streams with On-Demand capacity mode instead of provisioned shards to handle unpredictable spikes without over-provisioning
- Hybrid consumer strategy: latency-critical services use Enhanced Fan-Out (billed per consumer-shard hour plus per GB retrieved), while batch analytics share standard throughput to optimize cost
- Dead Letter Queues (DLQ) on Lambda for handling malformed messages without blocking the stream (see the event source mapping sketch after this list)
- AWS Glue or Kinesis Data Analytics for complex aggregations instead of forcing all logic into Lambda
- Cross-region replication for disaster recovery (beyond the scope of this associate-level scenario)
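For the Lambda failure handling in that list, the relevant knobs live on the Kinesis event source mapping; a sketch assuming the function, stream, and SQS queue already exist (all names and ARNs are placeholders):

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    FunctionName="sanitize-transactions",  # hypothetical function name
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/payment-transactions",
    StartingPosition="LATEST",
    BatchSize=500,
    ParallelizationFactor=10,         # up to 10 concurrent batches per shard during spikes
    MaximumRetryAttempts=3,           # stop retrying poison records after a few attempts
    BisectBatchOnFunctionError=True,  # split failing batches to isolate the bad record
    DestinationConfig={
        "OnFailure": {
            # Malformed records land in an SQS queue instead of blocking the shard
            "Destination": "arn:aws:sqs:us-east-1:123456789012:transaction-dlq"
        }
    },
)
```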
Cost Reality Check:
- At 100K msgs/sec sustained = 8.64B messages/day = ~1.5 TB/day of data volume (implying ~175 bytes per message)
- Kinesis Data Streams on-demand: ~$0.04/GB ingested + $0.015/GB Enhanced Fan-Out = ~$82/day just for streaming (illustrative rates; verify current regional pricing; the snippet below works through the arithmetic)
- Production would require capacity planning discussions, not “unlimited auto-scaling” assumptions
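Working those numbers through explicitly, using the illustrative rates above (a rough sanity check, not a pricing quote):

```python
# Back-of-the-envelope check using the illustrative rates quoted above
MSGS_PER_SEC = 100_000
SECONDS_PER_DAY = 86_400
AVG_MSG_BYTES = 175                  # assumed average payload size (implied by ~1.5 TB/day)

messages_per_day = MSGS_PER_SEC * SECONDS_PER_DAY            # 8,640,000,000
gb_per_day = messages_per_day * AVG_MSG_BYTES / 1e9          # ~1,512 GB (~1.5 TB)

INGEST_RATE = 0.04   # $/GB ingested (illustrative)
EFO_RATE = 0.015     # $/GB retrieved via one Enhanced Fan-Out consumer (illustrative)

daily_cost = gb_per_day * (INGEST_RATE + EFO_RATE)
print(f"{messages_per_day:,} msgs/day, {gb_per_day:,.0f} GB/day, ~${daily_cost:,.0f}/day")
# -> 8,640,000,000 msgs/day, 1,512 GB/day, ~$83/day
# Note: every additional Enhanced Fan-Out consumer retrieves the full volume again,
# so the retrieval line scales with the number of EFO consumers.
```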