Amazon's Microservices-to-Monolith Epic: How Prime Video Cut Costs by 90%

Jun 21, 2026

In 2023, the Amazon Prime Video team published a case study that sent shockwaves through the software architecture world. After years of building with microservices, they restructured a critical monitoring service into a modular monolith — and slashed infrastructure costs by over 90%.

This wasn't a small experiment. The Prime Video team monitors the quality of thousands of live streams in real time, detecting issues like block corruption, video freeze, and audio-video synchronization problems. Their story challenges the industry dogma that microservices are always the right default and offers a masterclass in pragmatic architecture decisions.

The core insight: microservices and serverless components are tools, not religion. Choosing the right abstraction for the right workload matters more than following trends.

Footnotes

  1. Scaling up the Prime Video audio/video monitoring service and reducing costs by 90% - Amazon Prime Video Tech Blog, March 2023.

  2. Even Amazon can't make sense of serverless or microservices - David Heinemeier Hansson (DHH), May 2023.

Prime Video Swaps Microservices for Monolith: 90% Cost Reduction

The Original Serverless Architecture

Prime Video's initial Video Quality Analysis (VQA) system was built as a distributed system using serverless components. This was a deliberate choice — serverless allowed the team to build the service quickly, and in theory, scale each component independently.

The architecture had three major components:

  1. Media Converter — Extracted frames and audio from incoming video streams
  2. Defect Detector — Analyzed the extracted data for quality issues using ML
  3. Real-time Notification — Alerted teams when defects were found

The orchestration between these components was managed by AWS Step Functions, with individual processing steps running as AWS Lambda functions. Intermediate data (video frames, audio snippets) was passed between services by writing to and reading from Amazon S3 buckets.

At first, this architecture worked well. But as they scaled to monitor more streams simultaneously, they encountered crippling bottlenecks.

Footnotes

  1. Prime Video Microservices - System Design Newsletter - Neo Kim, detailed architecture breakdown.

  2. Amazon Prime Video Monitoring Service - ByteByteGo - Architecture diagrams and cost analysis.

The Serverless Trap

Serverless is excellent for prototyping and low-frequency workloads. But high-throughput, data-heavy pipelines with constant inter-service communication can make serverless prohibitively expensive. Each Lambda invocation, Step Function state transition, and S3 read/write has a per-use cost that compounds dramatically at scale.

What Went Wrong: The Two Expensive Operations

The Prime Video team identified two primary cost drivers that made the distributed architecture unsustainable:

1. Orchestration Overhead (AWS Step Functions)

AWS Step Functions charges per state transition. The VQA service performed multiple state transitions for every second of every stream being monitored. At high scale — thousands of concurrent streams — this generated an enormous number of billable transitions. Worse, AWS imposes account-level thresholds on the number of state transitions, which became a hard scaling bottleneck. The team hit this limit at only ~5% of expected load.
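To get a feel for how per-transition billing behaves on a continuous workload, here is a rough back-of-the-envelope sketch. The stream count, the number of transitions per stream-second, and the price per 1,000 transitions are all illustrative assumptions, not figures from the Prime Video post; check current AWS pricing before relying on the rate.

```python
# Rough estimate of Step Functions state-transition volume and cost.
# Every input below is an illustrative assumption, not a Prime Video figure.

SECONDS_PER_DAY = 24 * 60 * 60


def transitions_per_day(streams: int, transitions_per_stream_second: int) -> int:
    """State transitions generated per day by continuously monitored streams."""
    return streams * transitions_per_stream_second * SECONDS_PER_DAY


def daily_transition_cost(transitions: int, price_per_1000: float) -> float:
    """Dollar cost of a number of billable transitions at a per-1,000 price."""
    return transitions / 1000 * price_per_1000


if __name__ == "__main__":
    # Hypothetical workload: 1,000 streams, 5 transitions per stream-second.
    volume = transitions_per_day(streams=1000, transitions_per_stream_second=5)
    # Assumed per-1,000 price for Standard Workflows (verify against AWS pricing).
    cost = daily_transition_cost(volume, price_per_1000=0.025)
    print(f"{volume:,} transitions/day -> ${cost:,.2f}/day")
```

Under these assumptions the volume is 432 million transitions per day. The bill grows in direct proportion to stream count and monitoring time, so a workload that never goes idle never stops paying.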

2. Data Transfer Between Distributed Components

Because the media converter and defect detector ran as separate services, intermediate data (video frames, audio buffers) had to be stored in Amazon S3 so downstream services could retrieve it. This created two problems:

  • High S3 request costs: Every write and read to the temporary S3 bucket was billed, and with thousands of frames per second across thousands of streams, these costs exploded.
  • High latency: Writing to and reading from S3 added significant latency compared to keeping data in process memory.

Cost Factor | Mechanism | Impact at Scale
Step Functions State Transitions | Billed per transition, triggered every second per stream | Hard scaling limit at ~5% of expected load; cost grows with every stream-second monitored
S3 Reads/Writes | Billed per request + data transfer | Thousands of redundant I/O operations per second
Lambda Invocations | Billed per invocation + compute time | Constant hot-path invocations with cold-start risk
Data Serialization | Network serialization/deserialization overhead | CPU waste and latency on every inter-service hop
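The S3 request cost can be sketched the same way. The example below assumes each frame is written once by the media converter and read once by the defect detector. The frame rate, stream count, and per-1,000-request prices are hypothetical placeholders to replace with your own numbers.

```python
# Rough estimate of S3 request cost for passing frames between two services.
# Assumption: one PUT (converter) and one GET (detector) per frame.

SECONDS_PER_HOUR = 3600


def s3_requests_per_second(streams: int, frames_per_second: int) -> tuple[int, int]:
    """Return (puts, gets) per second across all streams."""
    puts = streams * frames_per_second
    gets = puts  # every stored frame is fetched once downstream
    return puts, gets


def s3_request_cost_per_hour(
    puts_per_second: int,
    gets_per_second: int,
    put_price_per_1000: float,
    get_price_per_1000: float,
) -> float:
    """Hourly request cost in dollars, ignoring storage and data transfer."""
    put_cost = puts_per_second * SECONDS_PER_HOUR / 1000 * put_price_per_1000
    get_cost = gets_per_second * SECONDS_PER_HOUR / 1000 * get_price_per_1000
    return put_cost + get_cost


if __name__ == "__main__":
    # Hypothetical workload: 1,000 streams sampled at 1 frame per second.
    puts, gets = s3_requests_per_second(streams=1000, frames_per_second=1)
    # Assumed per-1,000-request prices (verify against current S3 pricing).
    hourly = s3_request_cost_per_hour(puts, gets, 0.005, 0.0004)
    print(f"{puts + gets:,} requests/s -> ${hourly:,.2f}/hour")
```

In a single process both request counts drop to zero, because the frame is handed over as an in-memory object.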

Footnotes

  1. Prime Video Microservices - System Design Newsletter - Neo Kim, detailed architecture breakdown.

How Prime Video Migrated to a Monolith Architecture

  1. The team profiled their system and found that orchestration (Step Functions) and data passing (S3 intermediate storage) were the two dominant cost centers. They hit a hard scaling limit at just 5% of expected load, making it clear that incremental fixes wouldn't suffice.

  2. Rather than patching individual services, the team consolidated the entire monitoring workflow (media conversion, defect detection, and result aggregation) into a single process running on Amazon EC2 and Amazon ECS. This eliminated inter-service network calls entirely.

  3. Instead of writing intermediate video frames and audio buffers to S3 between service hops, all data now stays within the same process memory. The media converter passes frames directly to the defect detector without any network hop, serialization, or storage I/O.

  4. Orchestration logic that previously required AWS Step Functions (with per-transition billing) was replaced by simple in-process function calls. This eliminated both the cost of state transitions and the hard account limits that were throttling scalability.

  5. The unified application was deployed on Amazon EC2 instances orchestrated by Amazon ECS (Elastic Container Service). The high-level architecture (three logical components) remained the same, allowing significant code reuse and rapid migration.

  6. To scale beyond a single instance, the team cloned the monolith service, giving each copy a different detector configuration. Multiple self-contained copies run in parallel, each running its own subset of detectors. This is horizontal scaling without the microservices overhead.

  7. By moving from per-invocation serverless pricing to an EC2-based deployment, the team gained access to Amazon EC2 compute savings plans, driving costs down even further. They could now commit to capacity in advance and get more predictable compute costs.
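The cloning step can be pictured as splitting a detector list across identical copies of the service. This is a minimal sketch under assumptions: the round-robin split, the function names, and the `fan_out` helper are hypothetical, and the detector names are borrowed from the defect types mentioned earlier in this article.

```python
# Sketch: give each cloned copy of the monolith its own subset of detectors.
# The round-robin policy and all names here are illustrative assumptions.


def partition_detectors(detectors: list[str], copies: int) -> list[list[str]]:
    """Split detector names round-robin across the given number of copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    groups: list[list[str]] = [[] for _ in range(copies)]
    for position, name in enumerate(detectors):
        groups[position % copies].append(name)
    return groups


def fan_out(stream_id: str, groups: list[list[str]]) -> list[tuple[int, str, list[str]]]:
    """List which copy analyses the stream with which detectors."""
    return [(copy, stream_id, names) for copy, names in enumerate(groups) if names]


if __name__ == "__main__":
    groups = partition_detectors(["block_corruption", "video_freeze", "av_sync"], copies=2)
    for assignment in fan_out("stream-1", groups):
        print(assignment)
```

Each copy stays self-contained: it converts the stream and runs its detectors in one process, so adding copies adds capacity without reintroducing shared intermediate storage.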

The New Architecture: A Modular Monolith

The redesigned system preserved the same three logical components (media converter, defect detector, notifications) but deployed them within a single process. This is the key distinction: it is a modular monolith, not a "big ball of mud."

The critical design principle: deployment architecture changed, but logical architecture was preserved. The code was already organized into clean internal modules, so the team could reuse the vast majority of their code during migration.
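A modular monolith of this shape might look roughly like the sketch below: three modules with narrow interfaces, wired together by plain function calls in one process. The class names, the `Frame` type, and the "empty payload" rule are hypothetical stand-ins. The real detectors and notification path are not described at this level of detail in the source.

```python
# Sketch of a three-module pipeline running in a single process.
# All names and the detection rule are illustrative placeholders.
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Frame:
    stream_id: str
    index: int
    payload: bytes


class MediaConverter:
    """Turns raw stream chunks into frames."""

    def frames(self, stream_id: str, chunks: Iterable[bytes]) -> Iterator[Frame]:
        for index, chunk in enumerate(chunks):
            yield Frame(stream_id, index, chunk)


class DefectDetector:
    """Placeholder rule: an empty payload counts as a defect."""

    def detect(self, frame: Frame) -> list[str]:
        return ["empty_frame"] if not frame.payload else []


@dataclass
class Notifier:
    """Collects alerts in memory instead of sending them anywhere."""

    sent: list[tuple[str, int, str]] = field(default_factory=list)

    def notify(self, frame: Frame, defect: str) -> None:
        self.sent.append((frame.stream_id, frame.index, defect))


def run_pipeline(
    stream_id: str,
    chunks: Iterable[bytes],
    converter: MediaConverter,
    detector: DefectDetector,
    notifier: Notifier,
) -> None:
    """Orchestration is ordinary control flow: no state machine, no storage hop."""
    for frame in converter.frames(stream_id, chunks):
        for defect in detector.detect(frame):
            notifier.notify(frame, defect)
```

The frame object never leaves process memory, yet each module can still be tested and replaced on its own.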

Footnotes

  1. Amazon Prime Video Monitoring Service - ByteByteGo - Architecture diagrams and cost analysis.

[Figure: Cost Comparison: Microservices vs. Monolith (Relative Scale). Approximate relative infrastructure costs for Prime Video's monitoring service.]

Important Nuance: It Was One Service, Not All of Prime Video

Amazon did not move all of Prime Video to a monolith. Only the audio/video monitoring service was restructured. The rest of Prime Video's architecture — video delivery, user management, recommendations, payment — still runs as a constellation of independent services. The lesson is about right-sizing your architecture per service, not a blanket rejection of microservices.

The Results: By the Numbers

The outcome of the migration was dramatic and well-documented in Amazon's own publication:

Metric | Before (Serverless/Microservices) | After (Monolith)
Infrastructure Cost | Baseline (100%) | Under 10% (over 90% reduction)
Scaling Capability | Hard limit at ~5% of expected load | Thousands of streams with headroom
Latency | High (S3 round-trips + network hops) | Significantly lower (in-process calls)
Debugging Complexity | Distributed tracing across many services | Single-process debugging
Development Speed | Slow (changes across many services) | Faster (unified codebase)
Code Reuse | N/A | High (same logical components reused)

"Moving our service to a monolith reduced our infrastructure cost by over 90%. It also increased our scaling capabilities. Today, we're able to handle thousands of streams and we still have capacity to scale the service even further." — Amazon Prime Video Tech Blog

Footnotes

  1. Scaling up the Prime Video audio/video monitoring service and reducing costs by 90% - Amazon Prime Video Tech Blog, March 2023.

The Rise, Fall, and Return of the Monolith

Amazon Pioneers Microservices

Early 2000s

Amazon decomposes its monolithic retail application into service-oriented architecture (SOA), laying the groundwork for what became the microservices movement. Jeff Bezos's famous 'two-pizza team' mandate drives organizational structure toward small, independent services.

Microservices Go Mainstream

2014–2019

Netflix, Uber, and other tech giants popularize microservices. Serverless platforms like AWS Lambda and AWS Step Functions launch. The industry embraces 'microservices by default' as the modern architecture.

Prime Video Builds VQA Service

2020–2022

The Prime Video team builds their audio/video quality monitoring service using serverless microservices (Lambda + Step Functions + S3). The architecture is quick to build and initially works well.

Scaling Crisis at 5% Load

2022

As the team scales to handle more streams, they hit a hard AWS account limit on Step Functions state transitions at just 5% of expected load. Costs from S3 I/O and Lambda invocations become unsustainable.

The Blog Post That Broke the Internet

March 2023

Amazon publishes 'Scaling up the Prime Video audio/video monitoring service and reducing costs by 90%.' The post goes viral, sparking intense debate about microservices vs. monoliths across the tech industry.

DHH Weighs In

May 2023

Ruby on Rails creator David Heinemeier Hansson writes 'Even Amazon can't make sense of serverless or microservices,' arguing the case validates the pragmatic monolith approach.

The Pragmatic Architecture Era

2023–Present

The industry increasingly adopts a case-by-case approach. Modular monoliths gain legitimacy. The 'right tool for the job' philosophy replaces rigid dogma about architectural patterns.

Why This Worked: The Deeper Analysis

The Prime Video case study isn't simply "microservices bad, monolith good." Understanding why the monolith succeeded here requires analyzing the specific characteristics of this workload:

Tightly Coupled Data Flow

The VQA pipeline is a linear, sequential data flow: media conversion → defect detection → notification. The output of one stage is the direct input to the next, with no branching, no independent scaling requirements, and no separate consumer patterns. Passing large binary data (video frames) over a network between microservices for this type of pipeline is particularly wasteful.

High-Throughput, High-Frequency Processing

The system processes data for every second of every stream — creating a constant, high-frequency hot path. Serverless pricing models (per-invocation, per-transition, per-request) are optimized for sporadic, event-driven workloads, not for continuous, high-volume streaming pipelines. The network tax of the distributed architecture was unsustainable.

Modularity Was Already Preserved

The team had organized their code into clean, well-separated logical components from the start. This made the migration relatively straightforward — they didn't need to untangle a messy codebase. They simply changed the deployment topology from separate services to a single process while keeping the internal module boundaries intact.

$$
\text{Total Cost}_{\text{microservices}} = C_{\text{compute}} + C_{\text{orchestration}} + C_{\text{data transfer}} + C_{\text{serialization}} + C_{\text{operational overhead}}
$$

$$
\text{Total Cost}_{\text{monolith}} \approx C_{\text{compute}} + C_{\text{EC2 instance}}
$$

The monolith eliminates the middle terms — orchestration, data transfer, and serialization costs — when the workload is a tightly-coupled pipeline running on a single machine.

Footnotes

  1. Amazon Prime Video Monitoring Service - ByteByteGo - Architecture diagrams and cost analysis.

  2. Even Amazon can't make sense of serverless or microservices - David Heinemeier Hansson (DHH), May 2023.

Deep Dives & Edge Cases

Microservices Are Best For:

  • Large, independent teams
  • Complex domain boundaries
  • Independently scalable components
  • Polyglot tech stacks
  • Frequently changing subsystems

Microservices Trade-offs:

  • Network latency & serialization overhead
  • Distributed debugging complexity
  • Higher infrastructure costs
  • Operational overhead (service mesh, observability)
  • Eventual consistency challenges

Architecture Decision Framework

Rather than choosing an architecture based on hype, use a structured decision framework:

Key Decision Criteria

Factor | Favors Monolith | Favors Microservices
Team Structure | Single team | Multiple independent teams
Data Coupling | Tight (pipeline, sequential) | Loose (event-driven, async)
Scaling Needs | Uniform across all components | Different per component
Latency Sensitivity | High (sub-millisecond matters) | Tolerant (eventual is fine)
Data Volume Between Steps | Large (frames, buffers, files) | Small (IDs, commands, events)
Deployment Frequency | Same cadence for all components | Very different cadences
Organizational Complexity | Simple | Complex (Conway's Law applies)
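One way to make the criteria above concrete is to record them as yes/no answers and count how many point toward a monolith. The sketch below is only an illustration: the field names mirror the table rows, while the equal weighting and the threshold of five are arbitrary assumptions rather than an established method.

```python
# Sketch: tally the decision criteria as simple votes.
# Equal weights and the default threshold are arbitrary assumptions.
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class WorkloadProfile:
    """Each field is True when the workload matches the 'Favors Monolith' column."""

    single_team: bool
    tight_data_coupling: bool
    uniform_scaling: bool
    latency_sensitive: bool
    large_intermediate_data: bool
    shared_release_cadence: bool
    simple_organization: bool


def monolith_votes(profile: WorkloadProfile) -> int:
    """Count how many criteria favor a monolith."""
    return sum(astuple(profile))


def recommend(profile: WorkloadProfile, threshold: int = 5) -> str:
    """Return a leaning only when one side clearly dominates."""
    votes = monolith_votes(profile)
    total = len(astuple(profile))
    if votes >= threshold:
        return "modular monolith"
    if total - votes >= threshold:
        return "microservices"
    return "no clear default"


if __name__ == "__main__":
    pipeline_like = WorkloadProfile(True, True, True, True, True, True, True)
    print(recommend(pipeline_like))
```

A tally like this is a conversation starter, not a verdict. A single factor, such as a hard organizational boundary, can reasonably outweigh the rest.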

Pro Tip: Design for Flexibility

The Prime Video team succeeded because they had clean internal module boundaries from the start. This let them change deployment topology without rewriting logic. Always design with logical separation even inside a monolith — you may need to extract a service later, and clean boundaries make that orders of magnitude easier.
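A small sketch of what such a boundary could look like in code: callers depend on an interface, so an in-process implementation can later be swapped for a network-backed one without touching them. The `Detector` protocol, both implementations, and the placeholder rule are hypothetical. The stub records requests instead of making real network calls.

```python
# Sketch: a module boundary expressed as a Protocol so the transport can change.
# All names are illustrative; the "remote" class does no real networking.
from typing import Iterable, Protocol


class Detector(Protocol):
    def detect(self, frame: bytes) -> list[str]: ...


class InProcessDetector:
    """Runs the (placeholder) detection rule directly in this process."""

    def detect(self, frame: bytes) -> list[str]:
        return ["empty_frame"] if not frame else []


class RemoteDetectorStub:
    """Stands in for a future network client; it records what it would send."""

    def __init__(self, delegate: Detector) -> None:
        self.delegate = delegate
        self.requests: list[bytes] = []

    def detect(self, frame: bytes) -> list[str]:
        self.requests.append(frame)
        return self.delegate.detect(frame)


def analyse(frames: Iterable[bytes], detector: Detector) -> list[list[str]]:
    """Caller code is identical whichever implementation is passed in."""
    return [detector.detect(frame) for frame in frames]
```

Because `analyse` only knows about the `Detector` interface, moving the detector into or out of the process is a wiring change rather than a rewrite.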

Anti-Pattern: Microservices by Default

As Kelsey Hightower warned: 'We're gonna break the monolith up and somehow find the engineering discipline we never had in the first place... Now you went from writing bad code to building bad infrastructure.' Microservices do not fix poor modular design; they amplify it. If you can't write clean modules in a monolith, you can't write clean services in a distributed system.

Footnotes

  1. Reduce costs by 90% by moving from microservices to monolith: Amazon internal case study raises eyebrows - DevClass, May 2023.

Key Concepts: Prime Video Architecture Case Study

What is a modular monolith?

A single deployable application with well-organized, logically separated internal modules. It preserves clean boundaries for future extraction while avoiding the overhead of networked communication.

