While preparing for the AWS SAP-C02, many candidates confuse reactive error handling with proactive traffic shaping. In the real world, this is fundamentally a decision about reputation protection vs. client responsibility. Let’s drill into a simulated scenario.
The Scenario #
TechFlow Analytics operates a data ingestion API serving 200+ third-party integrations. The API is built on a serverless stack:
- Amazon API Gateway (REST API)
- AWS Lambda (Python 3.11, 512MB memory)
- Amazon DynamoDB (on-demand billing)
Authentication uses API keys distributed to integration partners. Over the past week, the operations team noticed:
- PUT request error rate increased from 2% to 18%
- 80% of errors originate from 3 specific API keys (belonging to a legacy partner “DataPump Corp”)
- CloudWatch Logs show no Lambda throttling or DynamoDB capacity issues
- Error pattern: Bursts of 500-1000 requests/second from the same client
Business context:
- The API is non-critical (partners can retry failed requests)
- Errors are surfaced in partner dashboards, damaging TechFlow’s API reliability reputation
- Partners have SLAs allowing up to 5% retry rates
Key Requirements #
Protect API reputation while maintaining service availability for well-behaved clients, without requiring immediate partner code changes.
The Options #
- A) Implement retry logic with exponential backoff and jitter in client applications; ensure errors are caught and handled with descriptive error messages.
- B) Implement API throttling at the API Gateway layer using Usage Plans; ensure client applications can handle 429 (Too Many Requests) responses without displaying errors to end users.
- C) Enable API caching for the production stage to improve response times; run a 10-minute load test; verify cache capacity matches workload requirements.
- D) Configure reserved concurrency on the Lambda function to handle resource demands during traffic spikes.
Correct Answer #
Option B - Implement API throttling at the API Gateway layer using Usage Plans.
The Architect’s Analysis #
Step-by-Step Winning Logic #
This solution represents the optimal trade-off for four reasons:
1. Root Cause Resolution: Throttling stops the noisy-neighbor pattern at the API Gateway layer, before requests consume Lambda invocations or DynamoDB write capacity units.
2. FinOps Efficiency: API Gateway throttling has zero marginal cost but prevents:
   - Unnecessary Lambda invocations ($0.20 per 1M requests plus duration charges)
   - Wasted DynamoDB write capacity (on-demand charges for writes tied to requests that ultimately fail)
   - Potential auto-scaling overhead if the table ever switches to provisioned capacity
3. Reputation Protection: HTTP 429 responses are the industry-standard signal that shifts responsibility to the client. Well-designed client applications already handle 429s gracefully, making this a non-breaking change for compliant partners while forcing “DataPump Corp” to fix its integration.
4. Granular Control: Usage Plans allow per-API-key rate limits (a minimal sketch follows below), enabling you to:
   - Set conservative limits (e.g., 100 req/sec) for problematic clients
   - Maintain higher limits (e.g., 1,000 req/sec) for premium partners
   - Monitor burst vs. steady-state usage patterns
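Here is a minimal sketch of that setup with boto3, assuming hypothetical REST API, stage, and API key identifiers; the limits are illustrative rather than prescriptive:

```python
import boto3

apigw = boto3.client("apigateway")

# Restrictive plan for the noisy legacy integration: 100 req/s steady state
# with a small burst allowance. Over-limit requests receive HTTP 429 at the
# gateway, before any Lambda invocation or DynamoDB write happens.
restricted_plan = apigw.create_usage_plan(
    name="datapump-restricted",
    description="Throttle the noisy legacy integration",
    apiStages=[{"apiId": "a1b2c3d4e5", "stage": "prod"}],  # hypothetical IDs
    throttle={"rateLimit": 100.0, "burstLimit": 200},
)

# Attach DataPump Corp's existing API key to the restricted plan.
apigw.create_usage_plan_key(
    usagePlanId=restricted_plan["id"],
    keyId="datapump-key-id",  # hypothetical API key ID
    keyType="API_KEY",
)

# Premium partners stay on a generous plan, e.g. 1,000 req/s.
premium_plan = apigw.create_usage_plan(
    name="premium-partners",
    apiStages=[{"apiId": "a1b2c3d4e5", "stage": "prod"}],  # hypothetical IDs
    throttle={"rateLimit": 1000.0, "burstLimit": 2000},
)
```

If only the PUT ingestion route needs protection, the same plan can carry method-level throttle overrides inside apiStages instead of a blanket limit.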
The Traps (Distractor Analysis) #
Why not Option A?
- Shifts responsibility incorrectly: You’re asking 200+ partners to modify their code to work around one partner’s three misbehaving API keys.
- Doesn’t prevent infrastructure waste: Retry logic still allows the initial flood of requests to consume Lambda/DynamoDB resources.
- Exam trap: This is a client-side solution to a platform-side problem. SAP-C02 tests your ability to recognize when architectural controls are superior to application-layer fixes.
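For completeness, this is roughly what a well-behaved client already does, and why Option B is non-breaking for compliant partners: a minimal sketch of 429 handling with exponential backoff and full jitter, assuming the third-party requests library plus a hypothetical endpoint and API key:

```python
import random
import time

import requests

API_URL = "https://api.example.com/prod/ingest"  # hypothetical endpoint
API_KEY = "partner-api-key"                      # hypothetical API key

def put_with_backoff(payload: dict, max_attempts: int = 5) -> requests.Response:
    """PUT with retries on 429, backing off exponentially with full jitter."""
    for attempt in range(max_attempts):
        resp = requests.put(API_URL, json=payload, headers={"x-api-key": API_KEY})
        if resp.status_code != 429:
            return resp  # success, or a non-throttle error for the caller to handle
        # Honor Retry-After when present (assuming the delta-seconds form);
        # otherwise sleep a random amount up to an exponentially growing cap.
        retry_after = resp.headers.get("Retry-After")
        delay = float(retry_after) if retry_after else random.uniform(0, min(30, 2 ** attempt))
        time.sleep(delay)
    return resp  # give up after max_attempts
```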
Why not Option C?
- Wrong problem diagnosis: Caching improves read performance, but the issue is write request flooding (PUT operations).
- No throttling benefit: Cache won’t reduce PUT request volume or protect DynamoDB from write storms.
- Cost inefficiency: API Gateway caching starts at $0.02/hour for a 0.5GB cache—unnecessary spend for a write-heavy problem.
Why not Option D?
- Prevents the wrong issue: Reserved concurrency caps concurrent executions of the Lambda function, but the scenario explicitly states there is no Lambda throttling or capacity problem.
- Constrains the wrong dimension: Reserved concurrency itself is free, but it carves capacity out of the account-wide concurrency pool and applies to the whole function, not to individual clients, so well-behaved partners would compete with DataPump Corp for the same cap.
- Doesn’t address reputation damage: Partners still see errors in their dashboards; the failures stem from the request rate of specific API keys, not from Lambda capacity.
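For contrast, Option D boils down to a single function-wide cap. A sketch with boto3, assuming a hypothetical function name; note the limit applies to every caller, not just the noisy one:

```python
import boto3

lambda_client = boto3.client("lambda")

# Function-wide cap: once 100 concurrent executions are in flight, *all*
# additional invocations are throttled, regardless of which API key sent them.
lambda_client.put_function_concurrency(
    FunctionName="techflow-ingest-handler",  # hypothetical function name
    ReservedConcurrentExecutions=100,
)
```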
The Architect Blueprint #
Diagram Note: Usage Plans act as a gatekeeper at the API Gateway layer, rejecting over-limit requests before they consume downstream resources, isolating noisy neighbors from well-behaved clients.
The Decision Matrix #
| Option | Est. Complexity | Est. Monthly Cost Impact | Pros | Cons |
|---|---|---|---|---|
| A - Client-Side Retry | High (200+ partner integrations) | +$800/mo (increased Lambda invocations from retries, DynamoDB writes for failed-then-succeeded requests) | • No platform changes • Partners control retry behavior | • Doesn’t prevent resource waste • Requires code changes across 200+ partners • Still damages reputation (errors visible before retry) |
| B - API Gateway Throttling ✅ | Low (API Gateway console, ~30 min) | Negative (eliminates ~2.5M unnecessary Lambda invocations and ~800K wasted DynamoDB writes per month) | • Industry-standard solution • Zero additional infrastructure cost • Granular per-client control • Immediate effect | • Requires client applications to handle 429 (most already do) • May need partner communication |
| C - API Caching | Medium (requires load testing) | +$350/mo ($0.02/hr × 730hrs = $14.60 for cache + $335 for test infrastructure) | • Improves GET performance | • Doesn’t address PUT request flooding • No throttling capability • Adds unnecessary cost |
| D - Reserved Concurrency | Low (Lambda console, 10 min) | None directly (reserved concurrency is free, but it carves capacity out of the shared account concurrency pool) | • Caps Lambda concurrency during spikes | • Doesn’t prevent upstream request flood • No per-client isolation • Reduces account-level concurrency headroom during normal traffic |
FinOps Impact Calculation:
- Current waste: 2.5M excess Lambda invocations/month × $0.20 per 1M requests = $0.50, plus duration charges of 2.5M × 200 ms × 0.5 GB (512 MB) × $0.0000166667 per GB-second ≈ $4.17, for roughly $5/mo of direct Lambda waste
- DynamoDB waste: 800K wasted writes × $1.25 per 1M write request units ≈ $1/mo (and it removes pressure to over-provision if the table ever moves to provisioned capacity)
- Reputation cost: Harder to quantify, but 18% error rate may trigger SLA penalties or partner churn
Option B ROI: the direct infrastructure savings are modest at this scale (single-digit dollars per month), so the real return is reputation protection and per-client control, delivered with zero infrastructure investment.
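As a sanity check, here is the arithmetic spelled out; the volumes come from the scenario, and the unit prices are public on-demand list prices (us-east-1, treat the exact figures as an assumption):

```python
# Back-of-the-envelope check of the direct monthly waste described above.
excess_invocations = 2_500_000                        # excess PUTs per month
request_cost = excess_invocations / 1_000_000 * 0.20  # $0.20 per 1M requests

gb_seconds = excess_invocations * 0.200 * 0.5         # 200 ms at 512 MB (0.5 GB)
duration_cost = gb_seconds * 0.0000166667             # price per GB-second

dynamodb_cost = 800_000 / 1_000_000 * 1.25            # $1.25 per 1M write request units

print(f"Lambda requests : ${request_cost:.2f}/mo")    # ~$0.50
print(f"Lambda duration : ${duration_cost:.2f}/mo")   # ~$4.17
print(f"DynamoDB writes : ${dynamodb_cost:.2f}/mo")   # ~$1.00
print(f"Total direct    : ${request_cost + duration_cost + dynamodb_cost:.2f}/mo")
```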
Real-World Practitioner Insight #
Exam Rule #
“For SAP-C02, when you see ‘specific API keys’ + ‘non-critical API’ + ‘client can tolerate retries’, always choose Usage Plans with throttling. API Gateway is AWS’s preferred chokepoint for rate limiting.”
Real World #
In production, we’d implement a layered defense:
- Tier 1 (Immediate): API Gateway Usage Plans with burst limits
- Tier 2 (Week 2): Implement AWS WAF rate-based rules for additional protection against DDoS patterns
- Tier 3 (Month 1): Add CloudWatch alarms + SNS notifications when clients approach 80% of their rate limits, enabling proactive partner communication (see the sketch after this list)
- Tier 4 (Quarter 1): Consider migrating to API Gateway HTTP APIs (roughly 70% cheaper per request than REST APIs, though they do not support usage plans or API keys, so the Tier 1 per-key throttling would have to move to another layer) and adopt AWS Lambda Powertools for structured logging
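A sketch of the Tier 3 alarm with boto3, assuming a hypothetical API name, alarm threshold, and SNS topic ARN. One caveat: REST API CloudWatch metrics are not broken out per API key (throttled requests are counted in the stage-level 4XXError metric), so per-key consumption is better pulled from the usage plan's own usage data:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when 4XX responses (which include 429 throttles) spike on the stage.
cloudwatch.put_metric_alarm(
    AlarmName="techflow-ingest-throttle-spike",  # hypothetical alarm name
    Namespace="AWS/ApiGateway",
    MetricName="4XXError",
    Dimensions=[
        {"Name": "ApiName", "Value": "techflow-ingest"},  # hypothetical API name
        {"Name": "Stage", "Value": "prod"},
    ],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1000,  # illustrative threshold
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:partner-alerts"],  # hypothetical
)

# Per-key consumption against the plan's quota (if one is configured) can be
# pulled for proactive outreach before a partner hits the wall.
apigw = boto3.client("apigateway")
usage = apigw.get_usage(
    usagePlanId="plan-id",  # hypothetical usage plan ID
    startDate="2024-01-01",
    endDate="2024-01-31",
)
```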
Additional considerations not in the exam:
- Contract enforcement: Usage Plans map directly to partner tier agreements (Bronze/Silver/Gold SLAs)
- Revenue opportunity: High-volume partners could purchase “rate limit increases” as a premium feature
- Observability: Enable API Gateway execution logging + X-Ray tracing to identify why DataPump Corp is flooding the API (bug vs. intentional behavior)
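A sketch of that observability step with boto3, assuming the same hypothetical REST API ID and stage, and assuming the account-level CloudWatch Logs role for API Gateway is already configured:

```python
import boto3

apigw = boto3.client("apigateway")

# Turn on execution logging, detailed CloudWatch metrics, and X-Ray tracing
# for the production stage so the burst pattern can be traced end to end.
apigw.update_stage(
    restApiId="a1b2c3d4e5",  # hypothetical REST API ID
    stageName="prod",
    patchOperations=[
        {"op": "replace", "path": "/*/*/logging/loglevel", "value": "INFO"},
        {"op": "replace", "path": "/*/*/metrics/enabled", "value": "true"},
        {"op": "replace", "path": "/tracingEnabled", "value": "true"},
    ],
)
```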