Performance Metrics: Latency vs Throughput

To measure and optimize a system, we need concrete metrics. The two most important performance metrics in system design are Latency and Throughput.

1. Latency

Latency is the time it takes for a single request to be processed and for the response to be received. It is a measure of duration.

Measurement: Usually measured in milliseconds (ms) or microseconds (µs).
Goal: Minimize latency for a better user experience.
Example: The time between clicking "Search" and seeing the results.
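
As a minimal sketch of what "measuring latency" means in practice, the snippet below times a single call with a monotonic clock. The names `measure_latency_ms` and `fake_search` are hypothetical stand-ins for illustration; a real measurement would wrap an actual request.

```python
import time

def measure_latency_ms(operation):
    """Run operation() once and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = operation()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

def fake_search():
    # Hypothetical stand-in for a real search request: sleeps about 20 ms.
    time.sleep(0.02)
    return ["result"]

result, latency_ms = measure_latency_ms(fake_search)
print(f"latency: {latency_ms:.1f} ms")
```

One call gives one latency sample. In practice many samples are collected and summarized, since individual requests vary.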

Tail Latency (Percentiles)

Average latency can be misleading because it doesn't show the experience of the "unlucky" users. We use percentiles:

p50 (Median): 50% of requests are faster than this.
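p99: 99% of requests are faster than this. Only 1 in 100 requests experiences worse latency.
p99.9: 99.9% of requests are faster than this. Only 1 in 1,000 requests experiences worse latency. This is critical for large-scale systems, where even a 0.1% tail affects a large absolute number of requests.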
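
To illustrate why the average can mislead, here is a small sketch that compares the mean with p50 and p99 on made-up data. The sample values are invented for illustration, and `percentile` is a hypothetical helper using the simple nearest-rank method; monitoring tools may use other interpolation rules.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Invented data: 98 fast requests (10 ms) and 2 slow ones (1000 ms).
latencies_ms = [10] * 98 + [1000] * 2

mean = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)

print(f"mean={mean:.1f} ms  p50={p50} ms  p99={p99} ms")
```

On this data the mean is about 30 ms, a value no request actually experienced, while p50 shows the typical request (10 ms) and p99 exposes the slow tail (1000 ms).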

2. Throughput

Throughput is the number of requests a system can handle in a given unit of time. It is a measure of capacity.

Measurement: Usually measured in Requests Per Second (RPS) or Queries Per Second (QPS).
Goal: Maximize throughput to handle more users with fewer resources.
Example: A web server handling 5,000 requests per second.
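
A rough sketch of how throughput can be measured: count completed requests over a time window and divide by the elapsed time. `measure_throughput` and `fake_handler` are hypothetical names, and the loop is single-threaded, which is an assumption made to keep the example short.

```python
import time

def measure_throughput(handler, duration_s=0.2):
    """Call handler() repeatedly for roughly duration_s seconds and
    return the observed rate in requests per second."""
    completed = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        handler()
        completed += 1
    elapsed_s = time.perf_counter() - start
    return completed / elapsed_s

def fake_handler():
    # Hypothetical stand-in for handling one request.
    sum(range(1000))

rps = measure_throughput(fake_handler)
print(f"throughput: {rps:.0f} requests/second")
```

Because this loop handles one request at a time, its throughput is simply the inverse of the per-request latency. A real server that processes requests concurrently can raise throughput without lowering the latency of any single request.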

The Latency-Throughput Trade-off

There is often a trade-off between the two. For example, if you batch requests together to increase throughput, individual requests might have to wait longer, increasing latency.
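
The batching example can be made concrete with a toy cost model. The sketch below assumes each batch pays a fixed overhead of 10 ms plus 1 ms per request, and that every request in a batch completes when the whole batch finishes. These numbers and the `batch_stats` helper are illustrative assumptions, not measurements.

```python
def batch_stats(batch_size, overhead_ms=10.0, per_request_ms=1.0):
    """Return (throughput in requests/second, latency in ms) under a toy
    model where a batch costs overhead_ms + per_request_ms * batch_size."""
    batch_time_ms = overhead_ms + per_request_ms * batch_size
    throughput_rps = batch_size / batch_time_ms * 1000.0
    latency_ms = batch_time_ms  # each request waits for the whole batch
    return throughput_rps, latency_ms

for size in (1, 10, 100):
    rps, latency = batch_stats(size)
    print(f"batch={size:>3}  throughput={rps:6.1f} rps  latency={latency:5.1f} ms")
```

Under these assumptions, growing the batch from 1 to 100 requests raises throughput roughly tenfold (about 91 to about 909 requests per second) while latency grows from 11 ms to 110 ms. The model ignores the time a request spends waiting for its batch to fill, which would add further latency.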

Metric	Focus	Question it Answers
Latency	Speed	How long does it take?
Throughput	Volume	How many can we handle per unit of time?

Knowledge Check

Which metric describes the time it takes for a single operation to complete?