
AWS SAP-C02 Drill: Archive Storage with Private Access - The Storage Tier vs. Retrieval Trade-off

Jeff Taakey
21+ Year Enterprise Architect | Multi-Cloud Architect & Strategist.
Jeff's Architecture Insights
Go beyond static exam dumps. Jeff’s Insights is engineered to cultivate the mindset of a Production-Ready Architect. We move past ‘correct answers’ to dissect the strategic trade-offs and multi-cloud patterns required to balance reliability, security, and TCO in mission-critical environments.

While preparing for the AWS SAP-C02, many candidates treat S3 storage class selection as a purely cost-driven exercise. In the real world, it is fundamentally a trade-off between acceptable retrieval latency, storage cost, and durability requirements. A single miscalculation can mean overspending by 400% or violating SLAs. Let's drill into a simulated scenario.

The Scenario

GlobalLegal Partners, a multinational law firm, needs to implement a digital archive system for completed case files spanning the past 20 years. These documents serve as regulatory compliance backups (original records exist in certified physical vaults). The archive must be accessible only to internal legal researchers connecting through the firm’s existing AWS Client VPN infrastructure attached to their production VPC.

Key Constraints:

  • Documents are duplicate copies (originals exist elsewhere)
  • Access frequency: 5-10 requests per month across the entire archive
  • Retrieval speed: Not a priority (researchers can wait hours if needed)
  • Security: Zero public internet exposure permitted
  • Budget directive: Minimize storage costs as primary objective

Key Requirements

Design the most cost-effective archive solution that maintains private network access and meets compliance requirements for backup retention.

The Options

  • A) Create an S3 bucket with default storage class set to S3 One Zone-IA. Enable S3 static website hosting. Deploy an S3 Interface VPC Endpoint and configure bucket policy to restrict access exclusively through the endpoint.

  • B) Launch EC2 instances running Apache web servers. Attach Amazon EFS with EFS One Zone-IA storage class for document storage. Configure security groups to permit traffic only from the VPN CIDR range.

  • C) Launch EC2 instances running Nginx web servers. Attach Amazon EBS volumes using the sc1 (Cold HDD) volume type for document storage. Configure security groups to permit traffic only from the VPN CIDR range.

  • D) Create an S3 bucket with default storage class set to S3 Glacier Deep Archive. Enable S3 static website hosting. Deploy an S3 Interface VPC Endpoint and configure bucket policy to restrict access exclusively through the endpoint.

Correct Answer

Option D: S3 bucket with Glacier Deep Archive + VPC Interface Endpoint.

The Architect's Analysis

Step-by-Step Winning Logic

This solution represents the optimal storage cost vs. retrieval requirements trade-off:

  1. Cost Optimization (Primary Goal): Glacier Deep Archive at $0.00099/GB/month is the cheapest S3 storage class, perfectly aligned with “minimize cost” and “infrequent access” requirements.

  2. Retrieval Latency Acceptable: The scenario explicitly states that retrieval speed is not a priority and that researchers can wait hours. Glacier Deep Archive's standard retrieval completes within 12 hours, which meets this requirement.

  3. Data Redundancy Justification: The documents are duplicate copies (originals exist on physical media), so single-AZ storage would be tolerable, but it is also unnecessary: Glacier Deep Archive already provides 99.999999999% (11 nines) durability across multiple Availability Zones at a lower price than the single-AZ classes.

  4. Private Access Enforcement: An S3 Interface VPC Endpoint plus a bucket policy restriction ensures zero internet exposure, meeting the security requirement without managing EC2 infrastructure (see the policy sketch after this list).

  5. Serverless Architecture: No EC2 management overhead, patching, or scaling concerns.
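
To make points 1 and 4 concrete, here is a minimal boto3 sketch. The bucket name, interface endpoint ID, and file paths are hypothetical placeholders; none of them appear in the scenario.

```python
# Minimal sketch (hypothetical names): enforce endpoint-only access with a
# bucket policy, then upload directly into Glacier Deep Archive.
import json

import boto3

s3 = boto3.client("s3")

BUCKET = "globallegal-case-archive"        # hypothetical bucket name
VPCE_ID = "vpce-0abc1234def567890"         # hypothetical interface endpoint ID

# Point 4: deny any request that does not arrive through the interface endpoint.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccessOutsideVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            "Condition": {"StringNotEquals": {"aws:SourceVpce": VPCE_ID}},
        }
    ],
}
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))

# Point 1: write objects straight into the cheapest storage class, so no
# interim Standard storage or lifecycle transition is needed.
s3.upload_file(
    Filename="case-2004-0117.pdf",
    Bucket=BUCKET,
    Key="closed-cases/2004/case-2004-0117.pdf",
    ExtraArgs={"StorageClass": "DEEP_ARCHIVE"},
)
```

One caveat: a blanket Deny like this also locks out the account's own administrators unless they come through the endpoint, so production policies usually carve out a break-glass role.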

The Traps (Distractor Analysis)

  • Why not Option A (S3 One Zone-IA)?

    • Cost Trap: One Zone-IA costs ~$0.01/GB/month (10x more expensive than Glacier Deep Archive)
    • Wrong Access Pattern: One Zone-IA is priced for infrequently accessed data that still needs millisecond retrieval; 5-10 requests per month across the entire archive does not justify paying for instant access
    • S3 Website Hosting Incompatibility: S3 static website hosting is served only from the public website endpoint and cannot be reached through a VPC endpoint, making this design technically impossible and in conflict with the zero-public-exposure requirement
    • For 100TB archive: $1,000/month vs. $99/month = $10,800/year wasted
  • Why not Option B (EFS One Zone-IA)?

    • EC2 Operational Overhead: Requires managing web server instances, patching, monitoring, auto-scaling
    • Cost Structure: EFS One Zone-IA at $0.0133/GB/month + EC2 instance costs (minimum t3.medium for web serving = ~$30/month) = 15x+ more expensive
    • Over-Engineering: EFS provides concurrent file system access—unnecessary for archival document retrieval
    • Single AZ Risk: Combines single-AZ storage with single-AZ compute (no mention of multi-AZ deployment)
  • Why not Option C (EBS sc1 Cold HDD)?

    • Highest Cost Option: sc1 at $0.015/GB/month + EC2 + EBS snapshot costs for durability
    • Scalability Nightmare: EBS volumes are capped at 16 TiB each, so large archives require volume management and aggregation
    • Availability Risk: Tightly coupled to EC2 instance lifecycle; instance failure = data unavailable
    • Snapshot Tax: Achieving archive-level durability requires regular EBS snapshots to S3 (additional cost and complexity)

The Architect Blueprint

```mermaid
graph TB
    subgraph "Corporate Network"
        A[Legal Researcher<br/>Workstation]
    end
    subgraph "AWS VPC - Private Subnets"
        B[AWS Client VPN<br/>Endpoint]
        C[VPC Interface Endpoint<br/>com.amazonaws.region.s3]
    end
    subgraph "AWS S3 Service"
        D[(S3 Bucket<br/>Glacier Deep Archive<br/>Storage Class)]
    end
    A -->|TLS 1.3 over VPN| B
    B -->|Private IP| C
    C -->|PrivateLink<br/>No Internet Gateway| D
    D -.->|Bucket Policy| E[Condition: aws:sourceVpce<br/>= vpce-xxxxx]
    style D fill:#1e3a8a,stroke:#3b82f6,color:#fff
    style C fill:#059669,stroke:#10b981,color:#fff
    style E fill:#dc2626,stroke:#ef4444,color:#fff
```

Diagram Note: All traffic flows through private networking—VPN to VPC Interface Endpoint to S3 via AWS PrivateLink, with bucket policy enforcing endpoint-only access to prevent accidental public exposure.

The Decision Matrix

| Option | Storage Cost (100 TB archive) | Retrieval Cost (10 requests/month) | Operational Complexity | Durability | Cons |
| --- | --- | --- | --- | --- | --- |
| A - S3 One Zone-IA | $1,000/mo | $0.01/GB retrieved (~$10/mo for 1 TB) | ⭐ Low (serverless) | 99.999999999% (single AZ; 99.5% availability) | ❌ 10x storage cost; ❌ S3 website hosting incompatible with VPC endpoints; ❌ wrong tier for archival |
| B - EFS One Zone-IA | $1,330/mo + EC2 ($60/mo) | Included in storage cost | ⭐⭐⭐ High (EC2 mgmt) | 99.999999999% | ❌ 15x+ total cost; ❌ operational overhead; ❌ over-engineered for the use case |
| C - EBS sc1 | $1,500/mo + EC2 ($60/mo) + snapshots ($200/mo) | Included | ⭐⭐⭐⭐ Very high | Depends on snapshot frequency | ❌ highest cost; ❌ 16 TiB volume limit; ❌ instance-coupled availability |
| D - Glacier Deep Archive | $99/mo | $0.02/GB + $0.0025/request (~$20/mo) | ⭐ Low (serverless) | 99.999999999% | ⚠️ 12-hour standard retrieval (acceptable per requirements) |

FinOps Impact: Over 3 years, Option D saves roughly $32,436 versus Option A, $46,476 versus Option B, and $59,796 versus Option C on storage, instance, and snapshot costs alone; that is budget that can be redirected to higher-value engineering work.
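
Those figures can be reproduced directly from the monthly costs in the matrix (retrieval fees excluded); a quick check:

```python
# Reproduce the 3-year savings from the matrix's monthly costs (retrieval fees excluded).
monthly_cost = {
    "A (S3 One Zone-IA)": 1000,
    "B (EFS One Zone-IA + EC2)": 1330 + 60,
    "C (EBS sc1 + EC2 + snapshots)": 1500 + 60 + 200,
    "D (Glacier Deep Archive)": 99,
}
for option, cost in monthly_cost.items():
    if option.startswith("D"):
        continue
    savings = (cost - monthly_cost["D (Glacier Deep Archive)"]) * 36
    print(f"D vs. {option}: ${savings:,} saved over 3 years")
# D vs. A: $32,436   D vs. B: $46,476   D vs. C: $59,796
```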

Real-World Practitioner Insight

Exam Rule

For SAP-C02, when you see “archival” + “infrequent access” + “cost optimization” + “private network”, immediately evaluate:

  1. Glacier Deep Archive for lowest storage cost (if retrieval time unconstrained)
  2. VPC Interface Endpoints for S3 private access (not Gateway Endpoints for this pattern)
  3. Reject EC2-based solutions unless compute processing is required
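
As a companion to point 1, remember that Deep Archive retrieval is a two-step, asynchronous operation: initiate a restore, then poll until the temporary copy is available. A minimal sketch, reusing the same hypothetical bucket and key as above:

```python
# Minimal sketch: restore a Deep Archive object with the Standard tier (up to
# ~12 hours), then check whether the temporary copy is ready. Names are hypothetical.
import boto3

s3 = boto3.client("s3")
BUCKET = "globallegal-case-archive"
KEY = "closed-cases/2004/case-2004-0117.pdf"

# Step 1: request a temporary copy, kept for 7 days once restored.
s3.restore_object(
    Bucket=BUCKET,
    Key=KEY,
    RestoreRequest={
        "Days": 7,
        # "Standard" completes within ~12 hours; "Bulk" is cheaper but slower.
        "GlacierJobParameters": {"Tier": "Standard"},
    },
)

# Step 2: poll the restore status; the Restore header reports
# ongoing-request="false" once the copy is available.
head = s3.head_object(Bucket=BUCKET, Key=KEY)
print(head.get("Restore", "restore not yet requested"))
```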

Real World

In production deployments, I would layer on these enterprise considerations:

  1. Lifecycle Policies: Implement intelligent tiering or lifecycle transitions, since not all documents may be equally “archival.” Example: transition objects to Glacier Deep Archive after 90 days of zero access (see the configuration sketch after this list).

  2. Retrieval Strategy: Keep the ~5% most-accessed documents (identified via S3 Storage Lens or access-log analytics) in S3 Glacier Instant Retrieval to balance cost with user experience.

  3. Search Indexing: Deploy Amazon Kendra or Amazon OpenSearch Service over document metadata only, so researchers can confirm they have the right document before paying Glacier retrieval costs.

  4. Cross-Region Replication: For mission-critical compliance archives, enable CRR to a second region’s Glacier Deep Archive (adds ~$0.02/GB one-time transfer + duplicate storage cost) to protect against regional disasters.

  5. Access Logging: Enable S3 server access logging to a dedicated logging bucket (or CloudTrail data events) for compliance audit trails; legal-industry regulators often require proof of “who accessed what, when.”

  6. Cost Anomaly Detection: Set up AWS Cost Anomaly Detection alerts—if retrieval requests spike unexpectedly, it may indicate a misconfigured application pulling entire archives.
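
A minimal sketch of points 1 and 5, assuming a hypothetical archive bucket and a separate logging bucket that already grants S3 log delivery permission to write:

```python
# Minimal sketch of points 1 and 5: lifecycle transition to Deep Archive after
# 90 days, plus server access logging to a separate bucket. Names are hypothetical.
import boto3

s3 = boto3.client("s3")
BUCKET = "globallegal-case-archive"
LOG_BUCKET = "globallegal-archive-access-logs"   # must already permit S3 log delivery

# Point 1: move anything still in a warmer class to Deep Archive after 90 days.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "to-deep-archive-after-90-days",
                "Status": "Enabled",
                "Filter": {"Prefix": "closed-cases/"},
                "Transitions": [{"Days": 90, "StorageClass": "DEEP_ARCHIVE"}],
            }
        ]
    },
)

# Point 5: deliver server access logs to a dedicated bucket for audit trails.
s3.put_bucket_logging(
    Bucket=BUCKET,
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": LOG_BUCKET,
            "TargetPrefix": "s3-access/",
        }
    },
)
```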

The Scenario Gap: The question doesn’t mention document format. In reality, we’d compress archives using ZSTD or Brotli (20-40% size reduction) before uploading to Glacier Deep Archive, multiplying savings further.
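
As an illustration of that gap, a client-side Zstandard compression step before upload might look like the following. The third-party zstandard package and the file names are assumptions, and the actual size reduction depends heavily on the document format (already-compressed formats such as PDF gain little).

```python
# Illustration only: compress a document with Zstandard before uploading it
# to Glacier Deep Archive. Requires the third-party "zstandard" package.
import boto3
import zstandard as zstd

s3 = boto3.client("s3")
BUCKET = "globallegal-case-archive"   # hypothetical bucket name

with open("case-2004-0117.pdf", "rb") as f:
    raw = f.read()

# Level 19 favors compression ratio over speed, a reasonable trade for archives.
compressed = zstd.ZstdCompressor(level=19).compress(raw)

s3.put_object(
    Bucket=BUCKET,
    Key="closed-cases/2004/case-2004-0117.pdf.zst",
    Body=compressed,
    StorageClass="DEEP_ARCHIVE",
    ContentType="application/zstd",
)
print(f"size reduced from {len(raw):,} to {len(compressed):,} bytes")
```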
