Performance Metrics: Latency vs Throughput
To measure and optimize a system, we need concrete metrics. Two of the most important performance metrics in system design are Latency and Throughput.
1. Latency
Latency is the time it takes for a single request to be processed and for the response to be received. It is a measure of duration.
- Measurement: Usually measured in milliseconds (ms) or microseconds (µs).
- Goal: Minimize latency for a better user experience.
- Example: The time between clicking "Search" and seeing the results.
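As a rough illustration, latency can be measured by timing a single call from start to finish. The sketch below is a minimal example, not a production profiler: `measure_latency_ms` and `fake_search` are hypothetical names introduced here, and the 10 ms sleep merely stands in for real work such as a network round trip.

```python
import time

def measure_latency_ms(operation, *args, **kwargs):
    """Run `operation` once and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()          # monotonic, high-resolution clock
    result = operation(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

def fake_search(query):
    # Hypothetical stand-in for a real search request.
    time.sleep(0.01)
    return [f"result for {query}"]

results, latency_ms = measure_latency_ms(fake_search, "latency")
print(f"Search took {latency_ms:.1f} ms")
```

A single measurement like this is noisy, so in practice many samples are collected and summarized.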
Tail Latency (Percentiles)
Average latency can be misleading because a handful of very slow requests can hide behind many fast ones, so it does not show the experience of the "unlucky" users. Instead, we use percentiles:
- p50 (Median): 50% of requests are faster than this.
- p99: 99% of requests are faster than this. Only 1 in 100 requests is slower.
- p99.9: Only 1 in 1,000 requests is slower. This matters for large-scale systems: at 1 million requests per day, that is still about 1,000 slow requests every day.
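To make the gap between the average and the percentiles concrete, here is a small sketch using the nearest-rank method, which is one of several common ways to define a percentile. The `percentile` helper and the latency samples are made up for illustration.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]

# Made-up data: 98 fast requests and 2 very slow ones (milliseconds).
latencies_ms = [10] * 98 + [500, 2000]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.1f} ms, p50={p50_ms} ms, p99={p99_ms} ms")
# prints: mean=34.8 ms, p50=10 ms, p99=500 ms
```

In this sample the mean (34.8 ms) describes no real request: most took 10 ms, while the slowest took 50 to 200 times longer. The percentiles expose both facts.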
2. Throughput
Throughput is the number of requests a system can handle in a given unit of time. It is a measure of capacity.
- Measurement: Usually measured in Requests Per Second (RPS) or Queries Per Second (QPS).
- Goal: Maximize throughput to handle more users with fewer resources.
- Example: A web server handling 5,000 requests per second.
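Throughput can be estimated by counting completed requests and dividing by the elapsed time. The following is a minimal sketch under simplifying assumptions: requests are processed one at a time in a single thread, and `measure_throughput_rps` and `handle` are hypothetical names, with a 1 ms sleep standing in for real work.

```python
import time

def measure_throughput_rps(handler, requests):
    """Process `requests` sequentially and return completed requests per second."""
    start = time.perf_counter()
    completed = 0
    for request in requests:
        handler(request)
        completed += 1
    elapsed_s = time.perf_counter() - start
    # Guard against a zero elapsed time on very coarse clocks.
    return completed / elapsed_s if elapsed_s > 0 else float("inf")

def handle(request):
    # Hypothetical handler: pretend each request needs about 1 ms of work.
    time.sleep(0.001)

rps = measure_throughput_rps(handle, range(50))
print(f"Throughput: {rps:.0f} requests per second")
```

Real servers handle many requests concurrently, so their throughput is usually measured with a load-testing tool rather than a loop like this one.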
The Latency-Throughput Trade-off
There is often a trade-off between the two. For example, if you batch requests together to increase throughput, individual requests might have to wait longer, increasing latency.
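The batching example can be worked through with a toy model. The sketch below assumes that requests arrive at a fixed interval, that a batch is processed only once it is full, and that each batch pays a fixed overhead plus a per-item cost. All names and numbers are illustrative assumptions, not measurements from a real system.

```python
def batch_metrics(batch_size, arrival_interval_ms, overhead_ms, per_item_ms):
    """Toy model of batching.

    Returns (worst_case_latency_ms, processing_capacity_rps).
    """
    # The first request in a batch waits for the remaining ones to arrive.
    fill_wait_ms = (batch_size - 1) * arrival_interval_ms
    # The whole batch pays the fixed overhead once, plus a cost per item.
    processing_ms = overhead_ms + batch_size * per_item_ms
    worst_case_latency_ms = fill_wait_ms + processing_ms
    processing_capacity_rps = batch_size / processing_ms * 1000
    return worst_case_latency_ms, processing_capacity_rps

for size in (1, 10, 100):
    latency_ms, capacity_rps = batch_metrics(
        size, arrival_interval_ms=1, overhead_ms=10, per_item_ms=1
    )
    print(f"batch={size:>3}  latency={latency_ms} ms  capacity={capacity_rps:.0f} RPS")
# prints:
# batch=  1  latency=11 ms  capacity=91 RPS
# batch= 10  latency=29 ms  capacity=500 RPS
# batch=100  latency=209 ms  capacity=909 RPS
```

Larger batches spread the fixed overhead over more requests, which raises capacity, but the earliest request in each batch waits longer before it gets a response.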
| Metric | Focus | Question it Answers |
|---|---|---|
| Latency | Speed | How long does it take? |
| Throughput | Volume | How many can we do per unit of time? |
Knowledge Check
Which metric describes the time it takes for a single operation to complete?