AWS SAA-C03 Drill: S3 Lifecycle & Archive Strategy - The Cost-Access Trade-off Analysis

Jeff Taakey
Author
21+ Year Enterprise Architect | Multi-Cloud Architect & Strategist.
Jeff's Architecture Insights
Go beyond static exam dumps. Jeff’s Insights is engineered to cultivate the mindset of a Production-Ready Architect. We move past ‘correct answers’ to dissect the strategic trade-offs and multi-cloud patterns required to balance reliability, security, and TCO in mission-critical environments.

While preparing for the AWS SAA-C03, many candidates get confused by S3 storage class selection and lifecycle automation. In the real world, this is fundamentally a decision about Access Pattern Economics vs. Retrieval Latency Tolerance. Let’s drill into a simulated scenario.

The Scenario

VoiceConnect Analytics, a telecom analytics startup, processes monthly batches of customer call recordings for compliance and quality assurance. Their usage pattern analysis reveals:

  • Customers actively request recordings during the first 12 months after a call (random access, unpredictable timing)
  • After the 12-month mark, requests drop to less than 2% of total queries
  • Regulatory requirements mandate 7-year retention
  • Current solution stores everything in S3 Standard at $0.023/GB/month

The engineering team needs to optimize storage costs while maintaining:

  • Fast retrieval (seconds) for files under 1 year old
  • Acceptable delay (minutes to hours) for archived files over 1 year old

Key Requirements

Design the most cost-effective solution that provides efficient query and retrieval for recent files (< 1 year) while tolerating higher latency for older archives.

The Options

  • A) Store individual files in Amazon S3 Glacier Instant Retrieval with object tags, query and retrieve files using tag-based searches.

  • B) Store individual files in Amazon S3 Intelligent-Tiering, use S3 Lifecycle policies to transition files older than 1 year to S3 Glacier Flexible Retrieval, query and retrieve S3 files using Amazon Athena, query and retrieve Glacier files using S3 Glacier Select.

  • C) Store individual files with tags in Amazon S3 Standard, store searchable metadata for each archive in Amazon S3 Standard, use S3 Lifecycle policies to transition files older than 1 year to S3 Glacier Instant Retrieval, query and retrieve files by searching metadata in Amazon S3.

  • D) Store individual files in Amazon S3 Standard, use S3 Lifecycle policies to transition files older than 1 year to S3 Glacier Deep Archive, store searchable metadata in Amazon RDS, query files through Amazon RDS, retrieve files from S3 Glacier Deep Archive.

Correct Answer

Option C.

The Architect’s Analysis

Step-by-Step Winning Logic

Option C achieves the optimal trade-off through four architectural principles:

  1. Access Pattern Segmentation: Keeps Year 1 data in S3 Standard (millisecond retrieval for unpredictable access) and transitions Year 2+ data to Glacier Instant Retrieval (millisecond retrieval at a roughly 83% lower storage price).

  2. Query Performance Without Compute Overhead: The metadata layer in S3 Standard enables fast lookups using simple GET operations (or S3 Select, where it is still available to the account), with no query engine to provision and no database to manage.

  3. Cost-Optimized Archive Tier: Glacier Instant Retrieval ($0.004/GB/month) vs. S3 Standard ($0.023/GB/month) = 83% storage cost reduction for aged data, while maintaining instant access (unlike Flexible Retrieval, where standard retrievals take 3-5 hours and bulk retrievals 5-12 hours).

  4. Serverless Simplicity: Lifecycle policies automate transitions; no Lambda orchestration, no RDS cluster, no Athena query costs.
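
The lifecycle automation in point 4 can be sketched as a configuration document. The following is a minimal sketch, not the author's configuration: the `recordings/` prefix, the rule ID, and the optional 7-year expiration are assumptions made for illustration. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`.

```python
import json


def build_lifecycle_config(prefix="recordings/", transition_days=365, expire_days=None):
    """Build a dict shaped like the LifecycleConfiguration argument of
    boto3's put_bucket_lifecycle_configuration.

    The prefix, rule ID, and expiration are illustrative assumptions.
    """
    rule = {
        "ID": "archive-recordings-after-one-year",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        # Move objects to Glacier Instant Retrieval once they are a year old.
        "Transitions": [{"Days": transition_days, "StorageClass": "GLACIER_IR"}],
    }
    if expire_days is not None:
        # Optional: delete objects after the retention period ends.
        rule["Expiration"] = {"Days": expire_days}
    return {"Rules": [rule]}


# 7 * 365 ignores leap days; a compliance team may prefer a safety margin.
config = build_lifecycle_config(expire_days=7 * 365)
print(json.dumps(config, indent=2))
```

To apply it, the dictionary could be passed as `LifecycleConfiguration=config` to `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=...)`. Note that lifecycle rules count days from object creation, not from the date of the call being recorded.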

The Traps (Distractor Analysis)

Why not Option A?

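  • Fatal Flaw: Storing ALL data (including Year 1 hot data) in Glacier Instant Retrieval costs $0.004/GB/month for storage plus a $0.03/GB retrieval fee on every read. With random Year 1 access, any recording read more than about 0.63 times per month on average costs more than keeping it in S3 Standard, and frequently read recordings cost several times more.
  • Tag Search Limitation: S3 has no API for searching objects by tag. Finding matches means listing objects and fetching each object's tags, which is slow and request-heavy at scale compared to a dedicated metadata index.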
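
The break-even point behind Option A's flaw can be worked out from the per-GB prices quoted in this article. This is a rough sketch that deliberately ignores request charges, the minimum storage duration, and minimum billable object size, so treat the result as an approximation.

```python
# Prices quoted in this article (USD).
STANDARD_PER_GB = 0.023              # S3 Standard storage, per GB-month
GLACIER_IR_PER_GB = 0.004            # Glacier Instant Retrieval storage, per GB-month
GLACIER_IR_RETRIEVAL_PER_GB = 0.03   # Glacier Instant Retrieval fee per GB read


def monthly_cost_per_gb_glacier_ir(reads_per_month):
    """Storage plus retrieval cost for one GB that is read N times a month."""
    return GLACIER_IR_PER_GB + reads_per_month * GLACIER_IR_RETRIEVAL_PER_GB


def break_even_reads_per_month():
    """Reads per month at which Glacier IR costs the same as S3 Standard."""
    return (STANDARD_PER_GB - GLACIER_IR_PER_GB) / GLACIER_IR_RETRIEVAL_PER_GB


break_even = break_even_reads_per_month()
print(f"Break-even: {break_even:.2f} full reads per GB per month")
```

Below that read rate Glacier Instant Retrieval is cheaper; above it, S3 Standard wins. That is why the hot first-year data belongs in Standard.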

Why not Option B?

  • Retrieval Latency Penalty: Glacier Flexible Retrieval imposes a 3-5 hour standard retrieval (expedited retrieval takes 1-5 minutes but costs $0.03/GB). Archived recordings would take hours to return by default, while Glacier Instant Retrieval serves them in milliseconds at a similar storage price.
  • Query Complexity: Athena for S3 + Glacier Select for archives = dual query systems, adding operational complexity and Athena scan costs ($5/TB scanned).
  • Intelligent-Tiering Overhead: Adds $0.0025/1000 objects monitoring fee with no benefit here since the 1-year transition is deterministic, not access-pattern-based.

Why not Option D?

  • Over-Engineered Database: Amazon RDS introduces:
    • Monthly costs ($50-$200 for db.t3.small)
    • Patch management, backup windows, connection pooling
    • Overkill for simple key-value metadata lookups
  • Deep Archive Latency: 12-hour standard retrieval violates “acceptable delay” for a system with 2% post-year-1 access (users would expect hours, not half a day).
  • Retrieval Cost: Deep Archive charges $0.02/GB for standard retrieval. On a 10TB archive with 2% annual access (200GB), that is about $4/year, so the fee itself is small at this volume; the 12-hour wait is the real cost.

The Architect Blueprint

graph TD
    User([Customer Portal]) -->|Search Request| S3Meta[S3 Standard: Metadata Objects<br/>call-metadata/2024-05/*.json]
    S3Meta -->|Returns File Locations| Lambda[Lambda: File Resolver]
    Lambda -->|If < 1 Year| S3Hot[S3 Standard<br/>recordings/2024/]
    Lambda -->|If > 1 Year| S3Cold[S3 Glacier Instant Retrieval<br/>recordings/2023/]
    S3Hot -->|Millisecond Access| User
    S3Cold -->|Millisecond Access| User
    S3Hot -.->|After 365 Days| Lifecycle[S3 Lifecycle Policy]
    Lifecycle -.->|Transition| S3Cold
    style S3Meta fill:#FF9900,stroke:#232F3E,color:#fff
    style S3Hot fill:#569A31,stroke:#232F3E,color:#fff
    style S3Cold fill:#3B48CC,stroke:#232F3E,color:#fff
    style Lifecycle fill:#FF9900,stroke:#232F3E,stroke-dasharray: 5 5

Diagram Note: Metadata in S3 Standard enables instant queries, while Lifecycle policies automate cost optimization without sacrificing retrieval speed for either tier.
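
The "File Resolver" step in the diagram can be sketched as a small function. This is an illustration under stated assumptions: the metadata field names (`call_date`, `recording_key`) are hypothetical, and the storage class is inferred from age rather than read from S3. Lifecycle transitions run asynchronously and count from object creation, so the inferred class is only an estimate.

```python
from datetime import date, timedelta

# Mirrors the 365-day lifecycle transition described above.
TRANSITION_AGE = timedelta(days=365)


def expected_storage_class(call_date, today):
    """Estimate which tier a recording is in, based only on its age."""
    return "GLACIER_IR" if today - call_date >= TRANSITION_AGE else "STANDARD"


def resolve(metadata, today):
    """Turn one metadata record into a location plus its expected tier.

    `metadata` is a dict with hypothetical keys 'call_date' (ISO date string)
    and 'recording_key' (the S3 object key).
    """
    call_date = date.fromisoformat(metadata["call_date"])
    return {
        "key": metadata["recording_key"],
        "storage_class": expected_storage_class(call_date, today),
    }


today = date(2025, 6, 1)
recent = resolve({"call_date": "2025-01-15", "recording_key": "recordings/2025/a.wav"}, today)
old = resolve({"call_date": "2023-05-01", "recording_key": "recordings/2023/b.wav"}, today)
print(recent, old)
```

Because both tiers are read with an ordinary GET, the tier estimate is not needed to fetch the file; it is mainly useful for forecasting retrieval fees.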

The Decision Matrix

Option | Est. Complexity | Est. Monthly Cost (10TB total, 2TB under 1 year old and read monthly) | Pros | Cons
------ | --------------- | --------------------------------------------------------------------- | ---- | ----
A | Low | $100: Storage ($40) + Retrieval ($60 for 2TB at $0.03/GB) | Simple single-tier architecture | Retrieval fees on hot data exceed the storage bill; poor query performance via tags
B | Medium | $107: S3-IT ($50) + Glacier Flex ($32) + Athena ($25/month scans) | Automatic tiering | Dual query systems; 3-5 hour standard retrieval latency; Athena cost creep
C | Low | $78: S3 Standard ($46 for 2TB) + Glacier IR ($32 for 8TB) + Metadata ($0.50) | Instant access in both tiers; serverless; 83% archive savings | Requires metadata design discipline
D | High | $274: S3 ($46) + Deep Archive ($8) + RDS ($120) + Retrieval ($100/month avg) | Lowest storage cost | 12-hour retrieval; RDS operational burden; retrieval fees unpredictable

Cost assumptions: S3 Standard $0.023/GB, Glacier IR $0.004/GB, Deep Archive $0.00099/GB, RDS db.t3.small $0.017/hr, retrieval at 10% of archive/month for D.
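
The Option C estimate can be reproduced from the stated per-GB prices. This is a minimal sketch that assumes decimal terabytes (1TB = 1,000GB), which is what the rounded figures appear to use, and it leaves out request charges and Glacier IR retrieval fees.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes, matching the table's rounding

# Storage prices quoted in the cost assumptions (USD per GB-month).
STANDARD_PER_GB = 0.023
GLACIER_IR_PER_GB = 0.004


def option_c_monthly_cost(total_tb=10, hot_fraction=0.2, metadata_cost=0.50):
    """Storage-only monthly cost: hot data in Standard, the rest in Glacier IR."""
    total_gb = total_tb * GB_PER_TB
    hot_gb = total_gb * hot_fraction
    cold_gb = total_gb - hot_gb
    return hot_gb * STANDARD_PER_GB + cold_gb * GLACIER_IR_PER_GB + metadata_cost


def archive_savings_fraction():
    """Per-GB storage saving of Glacier IR relative to S3 Standard."""
    return 1 - GLACIER_IR_PER_GB / STANDARD_PER_GB


cost = option_c_monthly_cost()
savings = archive_savings_fraction()
print(f"Option C: ${cost:.2f}/month, archive tier {savings:.0%} cheaper per GB")
```

With the defaults this comes to about $78.50 per month and a per-GB saving of roughly 83%, in line with the table. Changing `hot_fraction` shows how sensitive the total is to the share of first-year data.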

Real-World Practitioner Insight

Exam Rule

“For the SAA-C03 exam, when you see ‘cost-effective’ + ‘instant access for recent data’ + ‘acceptable delay for old data’, choose S3 Standard → Glacier Instant Retrieval with metadata-driven queries. Avoid Deep Archive unless 12+ hour retrieval is explicitly acceptable.”

Real World

In production, we’d likely enhance Option C with:

  • AWS Glue Data Catalog for metadata instead of raw S3 objects (enables schema evolution, better governance)
  • DynamoDB for hot metadata (sub-10ms queries vs. S3’s 100-200ms for frequent searches)
  • CloudFront with S3 origin for frequently accessed recent recordings
  • S3 Batch Operations for backfill scenarios (e.g., re-tagging based on compliance changes)
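  • EventBridge + Step Functions to handle edge cases like an “urgent legal hold” that needs prioritized retrieval or a coordinated restore workflow (note that Deep Archive offers only Standard and Bulk retrievals, not Expedited)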

Additionally, consider S3 Intelligent-Tiering Archive Instant Access tier (released 2021) if access patterns are truly unpredictable—it auto-moves objects to Glacier IR equivalent after 90 days of no access, without lifecycle policy management.
