Amazon's Microservices-to-Monolith Epic: How Prime Video Cut Costs by 90%
In 2023, the Amazon Prime Video team published a case study that sent shockwaves through the software architecture world. After years of building with microservices, they restructured a critical monitoring service into a modular monolith — and slashed infrastructure costs by over 90%.
This wasn't a small experiment. The Prime Video team monitors the quality of thousands of live streams in real time, detecting issues like block corruption, video freeze, and audio-video synchronization problems. Their story challenges the industry dogma that microservices are always the right default and offers a masterclass in pragmatic architecture decisions.
The core insight: microservices and serverless components are tools, not religion. Choosing the right abstraction for the right workload matters more than following trends.
Footnotes
- Scaling up the Prime Video audio/video monitoring service and reducing costs by 90% - Amazon Prime Video Tech Blog, March 2023.
- Even Amazon can't make sense of serverless or microservices - David Heinemeier Hansson (DHH), May 2023.
Prime Video Swaps Microservices for Monolith: 90% Cost Reduction
The Original Serverless Architecture
Prime Video's initial Video Quality Analysis (VQA) system was built as a distributed system using serverless components. This was a deliberate choice — serverless allowed the team to build the service quickly, and in theory, scale each component independently.
The architecture had three major components:
- Media Converter — Extracted frames and audio from incoming video streams
- Defect Detector — Analyzed the extracted data for quality issues using ML
- Real-time Notification — Alerted teams when defects were found
The orchestration between these components was managed by AWS Step Functions, with individual processing steps running as AWS Lambda functions. Intermediate data (video frames, audio snippets) was passed between services by writing to and reading from Amazon S3 buckets.
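To make the handoff pattern concrete, here is a minimal sketch of two stages exchanging frames through an object store. `FakeObjectStore` is a hypothetical in-memory stand-in for S3 (no AWS calls are made), `pickle` stands in for whatever serialization the real service used, and the freeze check is a toy detector rather than Prime Video's algorithm.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class FakeObjectStore:
    """Hypothetical in-memory stand-in for an object store such as S3."""
    objects: dict = field(default_factory=dict)
    puts: int = 0
    gets: int = 0

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body
        self.puts += 1  # every write would be a billable request

    def get(self, key: str) -> bytes:
        self.gets += 1  # every read would be a billable request
        return self.objects[key]


def media_converter(store: FakeObjectStore, stream_id: str, raw_frames: list) -> list:
    """Stage 1: serialize each frame, write it to the store, return the keys."""
    keys = []
    for i, frame in enumerate(raw_frames):
        key = f"{stream_id}/frame-{i}"
        store.put(key, pickle.dumps(frame))
        keys.append(key)
    return keys


def defect_detector(store: FakeObjectStore, keys: list) -> list:
    """Stage 2: read every frame back and flag frames identical to the previous one."""
    defects = []
    previous = None
    for key in keys:
        frame = pickle.loads(store.get(key))
        if previous is not None and frame == previous:
            defects.append(key)
        previous = frame
    return defects


store = FakeObjectStore()
keys = media_converter(store, "stream-1", [[0, 1], [0, 1], [2, 3]])
defects = defect_detector(store, keys)
print(defects, store.puts, store.gets)
```

Even in this toy version, every frame costs one write, one read, and two serialization passes before the detector sees it.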
At first, this architecture worked well. But as they scaled to monitor more streams simultaneously, they encountered crippling bottlenecks.
Footnotes
- Prime Video Microservices - System Design Newsletter - Neo Kim, detailed architecture breakdown.
- Amazon Prime Video Monitoring Service - ByteByteGo - Architecture diagrams and cost analysis.
The Serverless Trap
Serverless is excellent for prototyping and low-frequency workloads. But high-throughput, data-heavy pipelines with constant inter-service communication can make serverless prohibitively expensive. Each Lambda invocation, Step Function state transition, and S3 read/write has a per-use cost that compounds dramatically at scale.
What Went Wrong: The Two Expensive Operations
The Prime Video team identified two primary cost drivers that made the distributed architecture unsustainable:
1. Orchestration Overhead (AWS Step Functions)
AWS Step Functions charges per state transition. The VQA service performed multiple state transitions for every second of every stream being monitored. At high scale — thousands of concurrent streams — this generated an enormous number of billable transitions. Worse, AWS imposes account-level thresholds on the number of state transitions, which became a hard scaling bottleneck. The team hit this limit at only ~5% of expected load.
2. Data Transfer Between Distributed Components
Because the media converter and defect detector ran as separate services, intermediate data (video frames, audio buffers) had to be stored in Amazon S3 so downstream services could retrieve it. This created two problems:
- High S3 request costs: Every write and read to the temporary S3 bucket was billed, and with thousands of frames per second across thousands of streams, these costs exploded.
- High latency: Writing to and reading from S3 added significant latency compared to keeping data in process memory.
| Cost Factor | Mechanism | Impact at Scale |
|---|---|---|
| Step Functions State Transitions | Billed per transition, with multiple transitions every second per stream | Hard scaling limit at ~5% of expected load; cost grows with every stream-second monitored |
| S3 Reads/Writes | Billed per request + data transfer | Thousands of redundant I/O operations per second |
| Lambda Invocations | Billed per invocation + compute time | Constant hot-path invocations with cold-start risk |
| Data Serialization | Network serialization/deserialization overhead | CPU waste and latency on every inter-service hop |
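A back-of-the-envelope count shows how quickly these per-use operations accumulate. The sketch below only counts billable operations; the rates passed in (transitions per stream-second, frames per second) are illustrative assumptions, not figures from the Prime Video post, and no prices are assumed.

```python
def billable_operations(
    streams: int,
    seconds: int,
    transitions_per_stream_second: int,
    frames_per_second: int,
) -> dict:
    """Count per-use operations for a monitoring window (illustrative model)."""
    stream_seconds = streams * seconds
    return {
        # each state transition is billed individually
        "state_transitions": stream_seconds * transitions_per_stream_second,
        # one write plus one read for every frame handed between stages
        "s3_requests": stream_seconds * frames_per_second * 2,
    }


# Hypothetical workload: 1,000 streams monitored for one hour.
one_hour = billable_operations(
    streams=1000, seconds=3600, transitions_per_stream_second=3, frames_per_second=1
)
print(one_hour)

# Doubling the number of streams doubles every counter.
doubled = billable_operations(
    streams=2000, seconds=3600, transitions_per_stream_second=3, frames_per_second=1
)
print(doubled)
```

Under these assumed rates the counts scale linearly with streams and monitoring time, so the bill never levels off for a workload that runs continuously.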
How Prime Video Migrated to a Monolith Architecture
1. The team profiled their system and found that orchestration (Step Functions) and data passing (S3 intermediate storage) were the two dominant cost centers. They hit a hard scaling limit at just 5% of expected load, making it clear that incremental fixes wouldn't suffice.
2. Rather than patching individual services, the team consolidated the entire monitoring workflow (media conversion, defect detection, and result aggregation) into a single process running on Amazon EC2 and Amazon ECS. This eliminated the network calls between those components.
3. Instead of writing intermediate video frames and audio buffers to S3 between service hops, all data now stays within the same process memory. The media converter passes frames directly to the defect detector without any network hop, serialization, or storage I/O.
4. Orchestration logic that previously required AWS Step Functions (with per-transition billing) was replaced by simple in-process function calls. This eliminated both the cost of state transitions and the hard account limits that were throttling scalability.
5. The unified application was deployed on Amazon EC2 instances orchestrated by Amazon ECS (Elastic Container Service). The high-level architecture (three logical components) remained the same, allowing significant code reuse and rapid migration.
6. To scale, the team clones the monolith service with different detector configurations. Multiple instances run in parallel, each running a different subset of detectors, and a lightweight orchestration layer distributes requests across them. This is horizontal scaling without the microservices overhead: each instance is self-contained.
7. By moving from per-invocation serverless pricing to EC2-based deployment, the team gained access to EC2 compute savings plans, driving costs down even further. They could now reserve capacity and predict compute costs.
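The cloning step can be sketched as follows. This is a minimal illustration, assuming each clone is the same code parameterized with a different detector list; the detector functions, `MonolithClone`, and `dispatch` are hypothetical names, not taken from the Prime Video codebase.

```python
from typing import Callable, Dict, List

Frame = List[int]
Detector = Callable[[List[Frame]], bool]


def detect_freeze(frames: List[Frame]) -> bool:
    """Toy detector: two consecutive identical frames count as a freeze."""
    return any(a == b for a, b in zip(frames, frames[1:]))


def detect_black_frame(frames: List[Frame]) -> bool:
    """Toy detector: a frame whose pixels are all zero counts as black."""
    return any(all(pixel == 0 for pixel in frame) for frame in frames)


ALL_DETECTORS: Dict[str, Detector] = {
    "freeze": detect_freeze,
    "black_frame": detect_black_frame,
}


class MonolithClone:
    """One self-contained instance, configured with a subset of detectors."""

    def __init__(self, detector_names: List[str]) -> None:
        self.detectors = {name: ALL_DETECTORS[name] for name in detector_names}

    def analyze(self, frames: List[Frame]) -> Dict[str, bool]:
        return {name: detector(frames) for name, detector in self.detectors.items()}


def dispatch(clones: List[MonolithClone], frames: List[Frame]) -> Dict[str, bool]:
    """Lightweight dispatcher: send the frames to every clone and merge the results."""
    merged: Dict[str, bool] = {}
    for clone in clones:
        merged.update(clone.analyze(frames))
    return merged


clones = [MonolithClone(["freeze"]), MonolithClone(["black_frame"])]
report = dispatch(clones, [[0, 0], [5, 6], [5, 6]])
clean_report = dispatch(clones, [[1, 2], [3, 4]])
print(report, clean_report)
```

Each clone needs nothing from the others, so adding detectors means adding another configured copy rather than another network hop.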
The New Architecture: A Modular Monolith
The redesigned system preserved the same three logical components (media converter, defect detector, notifications) but deployed them within a single process. This is the key distinction: it is a modular monolith, not a "big ball of mud."
The critical design principle: deployment architecture changed, but logical architecture was preserved. The code was already organized into clean internal modules, so the team could reuse the vast majority of their code during migration.
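A modular monolith of this shape might look like the sketch below: three modules with narrow interfaces, composed by ordinary function calls in one process. The function names and the toy freeze check are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Frame:
    index: int
    pixels: Tuple[int, ...]


def convert_media(raw: Iterable[Tuple[int, ...]]) -> Iterator[Frame]:
    """Media converter module: turn raw input into Frame objects."""
    for index, pixels in enumerate(raw):
        yield Frame(index, pixels)


def detect_defects(frames: Iterable[Frame]) -> Iterator[str]:
    """Defect detector module: flag frames identical to the previous one."""
    previous = None
    for frame in frames:
        if previous is not None and frame.pixels == previous.pixels:
            yield f"freeze at frame {frame.index}"
        previous = frame


def notify(defects: Iterable[str], sink: List[str]) -> None:
    """Notification module: here it just appends to a list."""
    sink.extend(defects)


def run_pipeline(raw: Iterable[Tuple[int, ...]]) -> List[str]:
    """Orchestration is plain function composition; frames never leave memory."""
    alerts: List[str] = []
    notify(detect_defects(convert_media(raw)), alerts)
    return alerts


alerts = run_pipeline([(1, 2), (1, 2), (3, 4), (3, 4)])
print(alerts)
```

Because each module depends only on the types passed across its boundary, any one of them could later be extracted into its own service without touching the other two.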
Figure: Cost Comparison: Microservices vs. Monolith (Relative Scale). Approximate relative infrastructure costs for Prime Video's monitoring service.
Important Nuance: It Was One Service, Not All of Prime Video
Amazon did not move all of Prime Video to a monolith. Only the audio/video monitoring service was restructured. The rest of Prime Video's architecture — video delivery, user management, recommendations, payment — still runs as a constellation of independent services. The lesson is about right-sizing your architecture per service, not a blanket rejection of microservices.
The Results: By the Numbers
The outcome of the migration was dramatic and well-documented in Amazon's own publication:
| Metric | Before (Serverless/Microservices) | After (Monolith) |
|---|---|---|
| Infrastructure Cost | Baseline (100%) | ~10% (90% reduction) |
| Scaling Capability | Hard limit at ~5% of expected load | Thousands of streams with headroom |
| Latency | High (S3 round-trips + network hops) | Significantly lower (in-process calls) |
| Debugging Complexity | Distributed tracing across many services | Single-process debugging |
| Development Speed | Slow (changes across many services) | Faster (unified codebase) |
| Code Reuse | N/A | High — same logical components reused |
"Moving our service to a monolith reduced our infrastructure cost by over 90%. It also increased our scaling capabilities. Today, we're able to handle thousands of streams and we still have capacity to scale the service even further." — Amazon Prime Video Tech Blog
The Rise, Fall, and Return of the Monolith
Amazon Pioneers Microservices
Early 2000s: Amazon decomposes its monolithic retail application into service-oriented architecture (SOA), laying the groundwork for what became the microservices movement. Jeff Bezos's famous 'two-pizza team' mandate drives organizational structure toward small, independent services.
Microservices Go Mainstream
2014–2019: Netflix, Uber, and other tech giants popularize microservices. Serverless platforms like AWS Lambda and AWS Step Functions launch. The industry embraces 'microservices by default' as the modern architecture.
Prime Video Builds VQA Service
2020–2022: The Prime Video team builds their audio/video quality monitoring service using serverless microservices (Lambda + Step Functions + S3). The architecture is quick to build and initially works well.
Scaling Crisis at 5% Load
2022: As the team scales to handle more streams, they hit a hard AWS account limit on Step Functions state transitions at just 5% of expected load. Costs from S3 I/O and Lambda invocations become unsustainable.
The Blog Post That Broke the Internet
March 2023: Amazon publishes 'Scaling up the Prime Video audio/video monitoring service and reducing costs by 90%.' The post goes viral, sparking intense debate about microservices vs. monoliths across the tech industry.
DHH Weighs In
May 2023: Ruby on Rails creator David Heinemeier Hansson writes 'Even Amazon can't make sense of serverless or microservices,' arguing the case validates the pragmatic monolith approach.
The Pragmatic Architecture Era
2023–Present: The industry increasingly adopts a case-by-case approach. Modular monoliths gain legitimacy. The 'right tool for the job' philosophy replaces rigid dogma about architectural patterns.
Why This Worked: The Deeper Analysis
The Prime Video case study isn't simply "microservices bad, monolith good." Understanding why the monolith succeeded here requires analyzing the specific characteristics of this workload:
Tightly Coupled Data Flow
The VQA pipeline is a linear, sequential data flow: media conversion → defect detection → notification. The output of one stage is the direct input to the next, with no branching, no independent scaling requirements, and no separate consumer patterns. Passing large binary data (video frames) over a network between microservices for this type of pipeline is particularly wasteful.
High-Throughput, High-Frequency Processing
The system processes data for every second of every stream — creating a constant, high-frequency hot path. Serverless pricing models (per-invocation, per-transition, per-request) are optimized for sporadic, event-driven workloads, not for continuous, high-volume streaming pipelines. The network tax of the distributed architecture was unsustainable.
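The pricing mismatch can be expressed as a break-even calculation. The sketch below compares a per-use bill with a flat bill for the same period; the two prices are made-up units chosen for easy arithmetic, not AWS list prices.

```python
def per_use_cost(operations: int, price_per_million_ops: float) -> float:
    """Bill that grows with every operation (per-invocation style pricing)."""
    return operations * price_per_million_ops / 1_000_000


def break_even_operations(flat_cost: float, price_per_million_ops: float) -> float:
    """Operation count at which the per-use bill equals the flat bill."""
    return flat_cost * 1_000_000 / price_per_million_ops


def cheaper_option(operations: int, flat_cost: float, price_per_million_ops: float) -> str:
    """Return which pricing model costs less for the given volume."""
    if per_use_cost(operations, price_per_million_ops) < flat_cost:
        return "per-use"
    return "flat"


# Hypothetical prices in arbitrary units for one billing period.
FLAT_COST = 100.0             # fixed cost of always-on instances
PRICE_PER_MILLION_OPS = 25.0  # per-use price per million operations

threshold = break_even_operations(FLAT_COST, PRICE_PER_MILLION_OPS)
sporadic = cheaper_option(1_000_000, FLAT_COST, PRICE_PER_MILLION_OPS)
continuous = cheaper_option(10_000_000, FLAT_COST, PRICE_PER_MILLION_OPS)
print(threshold, sporadic, continuous)
```

A workload that stays below the threshold favors per-use pricing. A pipeline that runs every second of every stream sits far above it.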
Modularity Was Already Preserved
The team had organized their code into clean, well-separated logical components from the start. This made the migration relatively straightforward — they didn't need to untangle a messy codebase. They simply changed the deployment topology from separate services to a single process while keeping the internal module boundaries intact.
The monolith eliminates the orchestration, data-transfer, and serialization costs when the workload is a tightly coupled pipeline that can run on a single machine.
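One way to picture this is an additive cost model. It is an illustrative simplification with assumed symbols, not a formula from the Prime Video post:

```latex
\begin{align*}
C_{\text{distributed}} &= C_{\text{compute}} + C_{\text{orchestration}} + C_{\text{transfer}} + C_{\text{serialization}} \\
C_{\text{monolith}} &\approx C_{\text{compute}}
\end{align*}
```

Here $C_{\text{orchestration}}$ stands for per-transition workflow charges, $C_{\text{transfer}}$ for intermediate storage reads and writes, and $C_{\text{serialization}}$ for the CPU time spent encoding and decoding data at each hop. When every stage runs in one process, those three terms drop to roughly zero and only the compute term remains.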
Deep Dives & Edge Cases
Microservices are best for:
- Large, independent teams
- Complex domain boundaries
- Independently scalable components
- Polyglot tech stacks
- Frequently changing subsystems
Microservices trade-offs:
- Network latency & serialization overhead
- Distributed debugging complexity
- Higher infrastructure costs
- Operational overhead (service mesh, observability)
- Eventual consistency challenges
Architecture Decision Framework
Rather than choosing an architecture based on hype, use a structured decision framework:
Key Decision Criteria
| Factor | Favors Monolith | Favors Microservices |
|---|---|---|
| Team Structure | Single team | Multiple independent teams |
| Data Coupling | Tight (pipeline, sequential) | Loose (event-driven, async) |
| Scaling Needs | Uniform across all components | Different per component |
| Latency Sensitivity | High (sub-millisecond matters) | Tolerant (eventual consistency is fine) |
| Data Volume Between Steps | Large (frames, buffers, files) | Small (IDs, commands, events) |
| Deployment Frequency | Same cadence for all components | Very different cadences |
| Organizational Complexity | Simple | Complex (Conway's Law applies) |
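The table can be turned into a rough checklist. The sketch below treats each row as a yes/no question and takes an unweighted majority vote; both the equal weighting and the key names are assumptions made for illustration, and a real decision would weigh the factors against each other.

```python
# Each key is True when the answer favors microservices (one key per table row).
CRITERIA = (
    "multiple_independent_teams",
    "loose_data_coupling",
    "per_component_scaling",
    "latency_tolerant",
    "small_payloads_between_steps",
    "different_deploy_cadences",
    "complex_organization",
)


def recommend(answers: dict) -> str:
    """Unweighted majority vote over the criteria; missing answers count as False."""
    votes_for_microservices = sum(1 for name in CRITERIA if answers.get(name, False))
    if votes_for_microservices > len(CRITERIA) / 2:
        return "microservices"
    return "modular monolith"


# A tightly coupled, single-team pipeline like the one described in this case study.
pipeline_workload = {name: False for name in CRITERIA}

# A hypothetical product split across many teams with independent release cycles.
multi_team_product = {name: True for name in CRITERIA}
multi_team_product["latency_tolerant"] = False

print(recommend(pipeline_workload), recommend(multi_team_product))
```

The value of writing it down this way is less the final label than being forced to answer each row explicitly for the service in question.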
Pro Tip: Design for Flexibility
The Prime Video team succeeded because they had clean internal module boundaries from the start. This let them change deployment topology without rewriting logic. Always design with logical separation even inside a monolith — you may need to extract a service later, and clean boundaries make that orders of magnitude easier.
Anti-Pattern: Microservices by Default
As Kelsey Hightower warned: 'We're gonna break the monolith up and somehow find the engineering discipline we never had in the first place... Now you went from writing bad code to building bad infrastructure.' Microservices do not fix poor modular design; they amplify it. If you can't write clean modules in a monolith, you can't write clean services in a distributed system.