AWS SAP-C02 Drill: API Rate Limiting - The Noisy Neighbor Mitigation Trade-off

Jeff Taakey
Author
21+ Year Enterprise Architect | Multi-Cloud Architect & Strategist.

While preparing for the AWS SAP-C02, many candidates confuse reactive error handling with proactive traffic shaping. In the real world, this is fundamentally a decision about reputation protection vs. client responsibility. Let’s drill into a simulated scenario.

The Scenario

TechFlow Analytics operates a data ingestion API serving 200+ third-party integrations. The API is built on a serverless stack:

  • Amazon API Gateway (REST API)
  • AWS Lambda (Python 3.11, 512MB memory)
  • Amazon DynamoDB (on-demand billing)

Authentication uses API keys distributed to integration partners. Over the past week, the operations team noticed:

  • PUT request error rate increased from 2% to 18%
  • 80% of errors originate from 3 specific API keys (belonging to a legacy partner “DataPump Corp”)
  • CloudWatch metrics and logs show no Lambda throttling or DynamoDB capacity issues
  • Error pattern: Bursts of 500-1000 requests/second from the same client

Business context:

  • The API is non-critical (partners can retry failed requests)
  • Errors are surfaced in partner dashboards, damaging TechFlow’s API reliability reputation
  • Partners have SLAs allowing up to 5% retry rates

Key Requirements

Protect API reputation while maintaining service availability for well-behaved clients, without requiring immediate partner code changes.

The Options:

  • A) Implement retry logic with exponential backoff and jitter in client applications; ensure errors are caught and handled with descriptive error messages.
  • B) Implement API throttling at the API Gateway layer using Usage Plans; ensure client applications can handle 429 (Too Many Requests) responses without displaying errors to end users.
  • C) Enable API caching for the production stage to improve response times; run a 10-minute load test; verify cache capacity matches workload requirements.
  • D) Configure reserved concurrency on the Lambda function to handle resource demands during traffic spikes.

Correct Answer

Option B.


The Architect’s Analysis

Correct Answer

Option B - Implement API throttling using Usage Plans at the API Gateway layer.

Step-by-Step Winning Logic

This solution represents the optimal trade-off for four reasons:

  1. Root Cause Resolution: Throttling stops the noisy-neighbor traffic at the API Gateway layer, before it consumes Lambda invocations or DynamoDB write request units.

  2. FinOps Efficiency: API Gateway throttling has zero marginal cost, and it prevents:

    • Unnecessary Lambda invocations ($0.20 per 1M requests + duration charges)
    • Wasted DynamoDB writes (on-demand charges for writes triggered by the flood traffic)
    • Potential auto-scaling overhead if switching to provisioned capacity

  3. Reputation Protection: HTTP 429 responses are industry-standard signals that shift responsibility to the client. Well-designed client applications already handle 429s gracefully, making this a non-breaking change for compliant partners while pushing “DataPump Corp” to fix its integration.

  4. Granular Control: Usage Plans allow per-API-key rate limits, enabling you to:

    • Set conservative limits (e.g., 100 req/sec) for problematic clients
    • Maintain higher limits (e.g., 1000 req/sec) for premium partners
    • Monitor burst vs. steady-state usage patterns
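
As a rough illustration of the per-key tiers above, the sketch below builds the arguments for boto3's `create_usage_plan` and `create_usage_plan_key` calls. The plan names, the API ID `a1b2c3d4e5`, the stage name `prod`, and the burst values are all hypothetical placeholders, not values from the scenario; only the 100 and 1000 req/sec rates echo the examples above.

```python
from typing import Any


def usage_plan_request(name: str, api_id: str, stage: str,
                       rate_limit: float, burst_limit: int) -> dict[str, Any]:
    """Build keyword arguments for the API Gateway CreateUsagePlan call."""
    if rate_limit <= 0 or burst_limit <= 0:
        raise ValueError("rate_limit and burst_limit must be positive")
    return {
        "name": name,
        "apiStages": [{"apiId": api_id, "stage": stage}],
        # rateLimit: steady-state requests per second; burstLimit: bucket size.
        "throttle": {"rateLimit": rate_limit, "burstLimit": burst_limit},
    }


def apply_plan(request: dict[str, Any], api_key_ids: list[str]) -> str:
    """Create the plan and attach existing API keys (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("apigateway")
    plan = client.create_usage_plan(**request)
    for key_id in api_key_ids:
        client.create_usage_plan_key(
            usagePlanId=plan["id"], keyId=key_id, keyType="API_KEY"
        )
    return plan["id"]


# Hypothetical tiers: a conservative plan for the problematic keys and a
# higher plan for premium partners. All identifiers are placeholders.
restricted = usage_plan_request("legacy-restricted", "a1b2c3d4e5", "prod", 100.0, 200)
premium = usage_plan_request("premium", "a1b2c3d4e5", "prod", 1000.0, 2000)
```

Calling `apply_plan(restricted, [...])` with the three DataPump key IDs would attach them to the conservative plan; keeping the builder separate from the AWS call makes the limits easy to review before anything is applied.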

The Traps (Distractor Analysis)

Why not Option A?

  • Shifts responsibility incorrectly: You’re asking 200+ partners to modify their code to accommodate one misbehaving partner (3 API keys).
  • Doesn’t prevent infrastructure waste: Retry logic still allows the initial flood of requests to consume Lambda/DynamoDB resources.
  • Exam trap: This is a client-side solution to a platform-side problem. SAP-C02 tests your ability to recognize when architectural controls are superior to application-layer fixes.
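
For reference, the client-side pattern that Option A describes (and that well-behaved clients would use to absorb 429s under Option B) looks roughly like the sketch below. It uses the "full jitter" variant of exponential backoff; the function names, the 0.5 s base, and the 30 s cap are illustrative assumptions, and `send` stands in for whatever HTTP call the partner actually makes.

```python
import random
import time
from typing import Callable, Optional


def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> list[float]:
    """Full-jitter delays: uniform in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** attempt))
            for attempt in range(max_retries)]


def call_with_retry(send: Callable[[], int], max_retries: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it returns a non-429 status or retries run out.

    send is a hypothetical callable returning an HTTP status code.
    """
    status = send()
    for delay in backoff_delays(max_retries):
        if status != 429:
            break
        sleep(delay)      # wait a randomized, growing interval
        status = send()   # then try again
    return status
```

Note that this only smooths the client's own retries; it does nothing to stop the initial burst from reaching the platform, which is why it does not substitute for throttling.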

Why not Option C?

  • Wrong problem diagnosis: Caching improves read performance, but the issue is write request flooding (PUT operations).
  • No throttling benefit: Cache won’t reduce PUT request volume or protect DynamoDB from write storms.
  • Cost inefficiency: API Gateway caching starts at $0.02/hour for a 0.5 GB cache, which is unnecessary spend for a write-heavy problem.

Why not Option D?

  • Targets the wrong issue: Reserved concurrency both guarantees and caps concurrency for a single Lambda function, but the scenario explicitly rules out Lambda throttling.
  • No per-client isolation: The reservation applies to every invocation of the function, not just the problematic API keys. It carries no direct charge, but it removes that concurrency from the account's shared pool.
  • Doesn’t address reputation damage: Lambda will still process all requests; errors occur due to rate, not capacity.

The Architect Blueprint

graph TD
    A[Client with API Key] -->|PUT Request| B[API Gateway]
    B -->|Check Usage Plan| C{Rate Limit OK?}
    C -->|Yes| D[Lambda Function]
    C -->|No| E[Return 429 Response]
    D -->|Write| F[DynamoDB Table]
    style B fill:#FF9900,stroke:#232F3E,stroke-width:3px,color:#fff
    style C fill:#527FFF,stroke:#232F3E,stroke-width:2px,color:#fff
    style E fill:#D13212,stroke:#232F3E,stroke-width:2px,color:#fff
    style F fill:#4053D6,stroke:#232F3E,stroke-width:2px,color:#fff

Diagram Note: Usage Plans act as a gatekeeper at the API Gateway layer, rejecting over-limit requests before they consume downstream resources, isolating noisy neighbors from well-behaved clients.
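
API Gateway documents its throttling as a token bucket, so the "Rate Limit OK?" decision in the diagram can be approximated with a small simulation. This is a simplified model, not API Gateway's actual implementation: it gives each API key its own bucket and uses assumed limits (100 req/sec steady rate, burst of 200) and evenly spaced synthetic traffic.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float          # tokens refilled per second (steady-state limit)
    burst: float         # bucket capacity (burst limit)
    tokens: float = 0.0
    last: float = 0.0    # timestamp of the previous request, in seconds

    def __post_init__(self) -> None:
        self.tokens = self.burst  # the bucket starts full

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def simulate(rate: float, burst: float,
             requests_per_second: int, seconds: int) -> dict[int, int]:
    """Count 200 vs. 429 outcomes for one key sending evenly spaced requests."""
    bucket = TokenBucket(rate, burst)
    counts = {200: 0, 429: 0}
    for i in range(requests_per_second * seconds):
        now = i / requests_per_second
        counts[200 if bucket.allow(now) else 429] += 1
    return counts


# Each key has its own bucket, so the noisy key cannot drain the quiet one.
noisy_key = simulate(rate=100, burst=200, requests_per_second=1000, seconds=2)
quiet_key = simulate(rate=100, burst=200, requests_per_second=50, seconds=2)
```

In this model the quiet key is never rejected, while the noisy key gets roughly its burst allowance plus the steady rate and sees 429s for everything beyond that.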

The Decision Matrix

Option A - Client-Side Retry
  • Est. Complexity: High (200+ partner integrations)
  • Est. Monthly Cost Impact: +$800/mo (increased Lambda invocations from retries, DynamoDB writes for failed-then-succeeded requests)
  • Pros: No platform changes; partners control retry behavior
  • Cons: Doesn’t prevent resource waste; requires code changes across 200+ partners; still damages reputation (errors visible before retry)

Option B - API Gateway Throttling
  • Est. Complexity: Low (API Gateway console, ~30 min)
  • Est. Monthly Cost Impact: -$450/mo (prevents ~2.5M unnecessary Lambda invocations, 800K wasted DynamoDB writes)
  • Pros: Industry-standard solution; zero additional infrastructure cost; granular per-client control; immediate effect
  • Cons: Requires client applications to handle 429 (most already do); may need partner communication

Option C - API Caching
  • Est. Complexity: Medium (requires load testing)
  • Est. Monthly Cost Impact: +$350/mo ($0.02/hr × 730 hrs = $14.60 for cache + $335 for test infrastructure)
  • Pros: Improves GET performance
  • Cons: Doesn’t address PUT request flooding; no throttling capability; adds unnecessary cost

Option D - Reserved Concurrency
  • Est. Complexity: Low (Lambda console, 10 min)
  • Est. Monthly Cost Impact: +$120/mo (reserved concurrency doesn’t have direct cost, but prevents cost optimization through shared concurrency pool)
  • Pros: Protects Lambda from over-invocation
  • Cons: Doesn’t prevent upstream request flood; no per-client isolation; wastes reserved capacity during normal traffic

FinOps Impact Calculation:

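  • Current waste: 2.5M excess Lambda invocations/month × $0.20/1M = $0.50 in request charges, plus duration charges of 2.5M × 0.2 s avg duration × 0.5 GB × $0.0000166667/GB-sec ≈ $4.17, for a total of ≈ $4.67/mo
  • DynamoDB waste: 800K failed writes × $1.25/1M writes = $1/mo (throttling also avoids scaling pressure if the table later moves to provisioned capacity)
  • Reputation cost: Harder to quantify, but 18% error rate may trigger SLA penalties or partner churn

Option B ROI: The line items above add up to roughly $6/month in direct savings; the ~$450/month estimate in the matrix is a broader figure that these line items alone do not account for. The main return is reputation protection with zero infrastructure investment.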
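
The arithmetic can be checked with a short script. This sketch uses the unit prices quoted in this article and assumes the scenario's 512 MB memory setting and a 200 ms average duration apply to every excess invocation. Because Lambda duration is billed per GB-second, the memory size has to enter the product.

```python
LAMBDA_REQUEST_PRICE = 0.20 / 1_000_000   # USD per invocation
LAMBDA_GB_SECOND_PRICE = 0.0000166667     # USD per GB-second of duration
DYNAMODB_WRITE_PRICE = 1.25 / 1_000_000   # USD per on-demand write (article's figure)


def lambda_cost(invocations: int, avg_duration_s: float, memory_gb: float) -> float:
    """Request charge plus duration charge for a batch of invocations."""
    request_cost = invocations * LAMBDA_REQUEST_PRICE
    duration_cost = invocations * avg_duration_s * memory_gb * LAMBDA_GB_SECOND_PRICE
    return request_cost + duration_cost


def dynamodb_write_cost(writes: int) -> float:
    """On-demand write charge for a number of write requests."""
    return writes * DYNAMODB_WRITE_PRICE


lambda_waste = lambda_cost(2_500_000, avg_duration_s=0.200, memory_gb=0.5)
dynamodb_waste = dynamodb_write_cost(800_000)
total = lambda_waste + dynamodb_waste

print(f"Lambda: ${lambda_waste:.2f}  DynamoDB: ${dynamodb_waste:.2f}  Total: ${total:.2f}")
# prints: Lambda: $4.67  DynamoDB: $1.00  Total: $5.67
```

The direct compute savings are small at this volume, which is consistent with treating reputation protection, not cost, as the primary driver for throttling.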

Real-World Practitioner Insight

Exam Rule

“For SAP-C02, when you see ‘specific API keys’ + ‘non-critical API’ + ‘client can tolerate retries’, always choose Usage Plans with throttling. API Gateway is AWS’s preferred chokepoint for rate limiting.”

Real World

In production, we’d implement a layered defense:

  1. Tier 1 (Immediate): API Gateway Usage Plans with burst limits
  2. Tier 2 (Week 2): Implement AWS WAF rate-based rules for additional protection against DDoS patterns
  3. Tier 3 (Month 1): Add CloudWatch alarms + SNS notifications when clients approach 80% of rate limits (proactive partner communication)
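  4. Tier 4 (Quarter 1): Evaluate API Gateway HTTP APIs, which are priced lower per request than REST APIs, keeping in mind that HTTP APIs do not support usage plans or API keys, so the per-key throttling from Tier 1 would need a replacement; adopt Powertools for AWS Lambda for structured logging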
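
The Tier 3 early-warning check can be sketched as a small pure function. The per-key request rates here are hypothetical inputs (for example, derived from access logs or the usage plan's usage data); the key names and numbers are made up for illustration, and wiring the result into CloudWatch alarms or SNS is left out.

```python
def keys_near_limit(observed_rps: dict[str, float],
                    limit_rps: dict[str, float],
                    threshold: float = 0.8) -> list[str]:
    """Return API keys whose observed rate is at or above threshold * limit."""
    flagged = []
    for key, rps in observed_rps.items():
        limit = limit_rps.get(key)
        # Skip keys with no configured limit rather than guessing one.
        if limit and rps >= threshold * limit:
            flagged.append(key)
    return sorted(flagged)


# Hypothetical peak rates (req/sec) and configured limits per API key.
observed = {"partner-a": 45.0, "partner-b": 85.0, "datapump-1": 100.0}
limits = {"partner-a": 100.0, "partner-b": 100.0, "datapump-1": 100.0}
to_notify = keys_near_limit(observed, limits)
```

Partners in `to_notify` are the ones to contact before they start receiving 429s.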

Additional considerations not in the exam:

  • Contract enforcement: Usage Plans map directly to partner tier agreements (Bronze/Silver/Gold SLAs)
  • Revenue opportunity: High-volume partners could purchase “rate limit increases” as a premium feature
  • Observability: Enable API Gateway execution logging + X-Ray tracing to identify why DataPump Corp is flooding the API (bug vs. intentional behavior)
