Product Quantization: Compressing Vectors
Master the techniques for handling billion-scale vector datasets. This section explains how Product Quantization (PQ) can cut memory usage by 10x to 100x while keeping retrieval quality close to that of an uncompressed index.
Learning Goals
- Define Product Quantization (PQ) and its role in vector compression.
- Understand the process of sub-vector decomposition and codebook creation.
- Explain Asymmetric Distance Computation (ADC) mechanics.
The RAM Bottleneck in Production
Storing raw vectors is expensive. A single 1536-dimensional float32 vector (a common OpenAI embedding size) takes about 6 KB (1536 × 4 bytes). For a dataset of 100 million vectors, that is roughly 600 GB of high-speed RAM, which costs thousands of dollars per month in cloud infrastructure.
Product Quantization (PQ) is a lossy compression technique that addresses this. It reduces the memory footprint by 10x to 100x by replacing each group of full-precision floats with a one-byte index into a learned codebook.
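The arithmetic behind these figures can be checked with a short sketch. The PQ setting used here (96 one-byte codes per vector) is an assumption taken from the example split given later in this section; real deployments choose their own values.

```python
# Back-of-the-envelope memory estimate: raw float32 vectors vs. PQ codes.
DIM = 1536                 # dimensions per vector (from the text)
BYTES_PER_FLOAT = 4        # float32
NUM_VECTORS = 100_000_000  # 100 million vectors (from the text)
NUM_SUBVECTORS = 96        # assumed PQ setting: 96 sub-vectors, 1 byte each

raw_bytes_per_vector = DIM * BYTES_PER_FLOAT   # 6144 bytes, about 6 KB
pq_bytes_per_vector = NUM_SUBVECTORS * 1       # 96 bytes

raw_total_gb = raw_bytes_per_vector * NUM_VECTORS / 1e9
pq_total_gb = pq_bytes_per_vector * NUM_VECTORS / 1e9
compression_ratio = raw_bytes_per_vector / pq_bytes_per_vector

print(f"raw: {raw_total_gb:.1f} GB, PQ: {pq_total_gb:.1f} GB, "
      f"ratio: {compression_ratio:.0f}x")
# raw: 614.4 GB, PQ: 9.6 GB, ratio: 64x
```

Under these assumptions the codebooks themselves add only about 1.5 MB (96 codebooks × 256 centroids × 16 dimensions × 4 bytes), which is negligible next to the codes.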
Asymmetric Distance Computation (ADC)
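The key property of PQ is that the data does not need to be decompressed to be searched. With ADC, the query stays in full precision while the database vectors stay compressed (hence "asymmetric"). The query is split into sub-vectors, and each one is compared against the centroids in the corresponding "Codebook" (the learned average patterns). The distance to any compressed vector is then approximated by summing the precomputed distances that its byte indices point to.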
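A minimal NumPy sketch of ADC is shown below. The codebooks and codes are randomly generated stand-ins for a trained index, and the function name `adc_distances` and the sizes `M`, `K`, `D_SUB`, `N` are hypothetical choices for illustration, not part of any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

M, K, D_SUB = 4, 256, 2   # sub-vectors, centroids per codebook, dims per sub-vector
N = 1000                  # number of compressed database vectors

# Hypothetical stand-ins for a trained index (random, for illustration only).
codebooks = rng.normal(size=(M, K, D_SUB)).astype(np.float32)  # one codebook per sub-space
codes = rng.integers(0, K, size=(N, M), dtype=np.uint8)        # PQ codes, 1 byte each

def adc_distances(query, codebooks, codes):
    """Approximate squared L2 distance from a full-precision query to every PQ code."""
    m, k, d_sub = codebooks.shape
    q_sub = query.reshape(m, 1, d_sub)               # split the query into m sub-vectors
    # table[j, c] = squared distance from query sub-vector j to centroid c of codebook j
    table = ((codebooks - q_sub) ** 2).sum(axis=2)   # shape (m, k)
    # Each database vector costs m table lookups and a sum; nothing is decompressed.
    return table[np.arange(m), codes].sum(axis=1)    # shape (N,)

query = rng.normal(size=M * D_SUB).astype(np.float32)
dists = adc_distances(query, codebooks, codes)

# Sanity check: identical to decompressing every vector and measuring directly.
reconstructed = codebooks[np.arange(M), codes].reshape(N, M * D_SUB)
direct = ((reconstructed - query) ** 2).sum(axis=1)
assert np.allclose(dists, direct, atol=1e-4)
```

The lookup table has only `M * K` entries and is built once per query, so its cost is amortized over every vector scanned.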
| Method | Data Storage | Distance Computation | Memory Usage (relative to flat) |
|---|---|---|---|
| Flat Index | Full Floats (32-bit) | Exact | 100% |
| IVF-PQ | Byte Indices (8-bit) | Approximate | 1% - 10% |
The PQ Compression Workflow
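- Step 1: The system runs a k-means clustering algorithm on a sample of your data, split into sub-vectors as in Step 2, to find the most common 'patterns' (centroids) for each sub-vector position. Each position gets its own codebook, typically of 256 centroids so that an index fits in one byte.
- Step 2: Each incoming vector is chopped into sub-vectors (e.g., a 1536-dimensional vector is chopped into 96 pieces of 16 dimensions each).
- Step 3: Each piece is replaced by the index of the centroid it most closely matches. A large float array becomes a small array of integers (in this example, 6,144 bytes become 96 bytes).
- Step 4: At query time, the system pre-calculates the distance from each query sub-vector to every centroid in the matching codebook and stores the results in a lookup table. The distance to any compressed vector is then a sum of table lookups.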
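The four steps above can be sketched end to end in NumPy. This is an illustrative toy under stated assumptions, not a production implementation: the helper names (`kmeans`, `train_pq`, `encode_pq`, `search_pq`) are hypothetical, the k-means is a plain Lloyd's loop, and the data is random with small sizes (32 dimensions, 8 sub-vectors, 16 centroids per codebook) so it runs quickly.

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for c in range(k):
            members = x[assign == c]
            if len(members):            # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    return centroids

def train_pq(sample, m, k):
    """Step 1: one k-means codebook per sub-space -> shape (m, k, d_sub)."""
    n, d = sample.shape
    subs = sample.reshape(n, m, d // m)
    return np.stack([kmeans(subs[:, j], k, seed=j) for j in range(m)])

def encode_pq(vectors, codebooks):
    """Steps 2-3: split each vector and store the nearest centroid's index."""
    m, k, d_sub = codebooks.shape
    subs = vectors.reshape(len(vectors), m, d_sub)
    # d2[i, j, c] = squared distance from sub-vector j of vector i to centroid c
    d2 = ((subs[:, :, None, :] - codebooks[None]) ** 2).sum(axis=3)
    return d2.argmin(axis=2).astype(np.uint8)   # uint8 is enough while k <= 256

def search_pq(query, codebooks, codes, top_k=5):
    """Step 4: build the lookup table once, then sum table entries per code."""
    m, k, d_sub = codebooks.shape
    table = ((codebooks - query.reshape(m, 1, d_sub)) ** 2).sum(axis=2)
    dists = table[np.arange(m), codes].sum(axis=1)
    return np.argsort(dists)[:top_k]

rng = np.random.default_rng(42)
data = rng.normal(size=(2000, 32)).astype(np.float32)  # toy data: 2000 vectors, 32 dims
codebooks = train_pq(data[:1000], m=8, k=16)           # 8 sub-vectors of 4 dims each
codes = encode_pq(data, codebooks)

print(codes.shape, codes.dtype)                  # (2000, 8) uint8
print(data.nbytes, "->", codes.nbytes, "bytes")  # 256000 -> 16000 bytes
neighbors = search_pq(data[0], codebooks, codes)
```

Training on a sample (`data[:1000]`) rather than the full dataset mirrors Step 1: the codebooks only need to capture the typical patterns, after which any number of vectors can be encoded against them.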
Knowledge Check
What is the primary engineering benefit of Product Quantization?