Coursify

System Design for Software Engineers

Performance Metrics: Latency vs Throughput

To measure and optimize a system, we need concrete metrics. The two most important performance metrics in system design are Latency and Throughput.

1. Latency

Latency is the time it takes for a single request to be processed and for the response to be received. It is a measure of duration.

  • Measurement: Usually measured in milliseconds (ms) or microseconds (µs).
  • Goal: Minimize latency for a better user experience.
  • Example: The time between clicking "Search" and seeing the results.
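
As a rough illustration of measuring the latency of one request, the sketch below times a single call with a monotonic clock. The names `measure_latency_ms` and `fake_search` are hypothetical, and the 20 ms sleep only stands in for real work such as a network call.

```python
import time

def measure_latency_ms(operation):
    """Return (result, elapsed milliseconds) for one call to `operation`."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = operation()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

def fake_search():
    # Hypothetical stand-in for a real request: sleeps for about 20 ms.
    time.sleep(0.02)
    return ["result-1", "result-2"]

if __name__ == "__main__":
    results, latency_ms = measure_latency_ms(fake_search)
    print(f"search returned {len(results)} results in {latency_ms:.1f} ms")
```

A single measurement like this is noisy; in practice many samples are collected and summarized.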

Tail Latency (Percentiles)

Average latency can be misleading because a few very slow requests are hidden among many fast ones, so it doesn't show the experience of the "unlucky" users. Percentiles describe the distribution more honestly:

  • p50 (Median): 50% of requests are faster than this.
  • p99: 99% of requests are faster than this. Only 1 in 100 requests experiences worse latency.
  • p99.9: Only 1 in 1000 requests experiences worse latency. This is critical for large-scale systems, where even a rare slow request affects many users in absolute terms.
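
To make the difference between the average and the percentiles concrete, here is a minimal sketch using the nearest-rank definition of a percentile (one of several common definitions). The `percentile` helper and the sample data are invented for illustration.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical data: 98 fast requests (10 ms) and 2 slow ones (2000 ms).
latencies_ms = [10.0] * 98 + [2000.0] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

if __name__ == "__main__":
    print(f"mean={mean_ms:.1f} ms  p50={p50_ms:.1f} ms  p99={p99_ms:.1f} ms")
```

With this data the mean sits near 50 ms, which matches neither the typical request (10 ms) nor the slow ones (2000 ms), while p50 and p99 report both ends directly.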

2. Throughput

Throughput is the number of requests a system can handle in a given unit of time. It is a measure of capacity.

  • Measurement: Usually measured in Requests Per Second (RPS) or Queries Per Second (QPS).
  • Goal: Maximize throughput to handle more users with fewer resources.
  • Example: A web server handling 5,000 requests per second.
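
Throughput can be estimated by counting completed requests and dividing by the elapsed time. The sketch below does this for a sequential loop; `measure_throughput` and `fake_handler` are hypothetical names, and the 1 ms sleep is only a placeholder for real request handling.

```python
import time

def measure_throughput(handler, requests):
    """Return completed requests per second for a sequential run of `handler`."""
    start = time.perf_counter()
    completed = 0
    for request in requests:
        handler(request)
        completed += 1
    elapsed_s = time.perf_counter() - start
    if completed == 0 or elapsed_s <= 0:
        raise ValueError("need at least one request and a measurable duration")
    return completed / elapsed_s

def fake_handler(request):
    # Hypothetical handler: about 1 ms of simulated work per request.
    time.sleep(0.001)

if __name__ == "__main__":
    rps = measure_throughput(fake_handler, range(200))
    print(f"throughput: {rps:.0f} requests/second")
```

A real server handles requests concurrently, so its throughput is not simply the inverse of single-request latency; this loop only shows the counting idea.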

The Latency-Throughput Trade-off

There is often a trade-off between the two. For example, if you batch requests together to increase throughput, individual requests might have to wait longer, increasing latency.
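
The batching example can be sketched with a simple cost model. All numbers here are assumptions chosen for illustration: each batch pays a fixed overhead plus a per-item cost, requests arrive at a steady interval, and queueing behind earlier batches is ignored.

```python
def batch_stats(batch_size, overhead_ms=10.0, per_item_ms=1.0, arrival_gap_ms=20.0):
    """Return (max throughput in requests/second, worst-case latency in ms)
    for a toy model of batched processing.

    Assumed model, not a real system:
      - processing one batch costs overhead_ms + per_item_ms * batch_size
      - one request arrives every arrival_gap_ms
      - the first request in a batch waits for the batch to fill,
        then for the whole batch to be processed
    """
    batch_time_ms = overhead_ms + per_item_ms * batch_size
    max_throughput_rps = batch_size * 1000.0 / batch_time_ms
    fill_wait_ms = (batch_size - 1) * arrival_gap_ms
    worst_latency_ms = fill_wait_ms + batch_time_ms
    return max_throughput_rps, worst_latency_ms

if __name__ == "__main__":
    for size in (1, 10, 100):
        rps, latency_ms = batch_stats(size)
        print(f"batch={size:>3}  capacity={rps:7.1f} rps  worst latency={latency_ms:7.1f} ms")
```

Under these assumptions, larger batches amortize the fixed overhead and raise capacity, while the earliest request in each batch waits longer for the batch to fill.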

Metric       Focus    Question it Answers
Latency      Speed    How long does one request take?
Throughput   Volume   How many requests can we complete per unit of time?

Knowledge Check

Which metric describes the time it takes for a single operation to complete?
