Coursify

System Design for Software Engineers

Apache Kafka: Event Streaming at Scale

While RabbitMQ and SQS work well for traditional message queuing, they are not designed for high-throughput event streaming (millions of events per second) or for long-term data retention. Apache Kafka is a distributed event streaming platform designed for high throughput, fault tolerance, and "replayability."

The key difference: In a traditional queue, messages are deleted after they are consumed. In Kafka, messages (events) are stored in an immutable log and, depending on the topic's retention policy, can be kept for days, weeks, or indefinitely.

How Kafka Works

  • Topic: A category or feed name to which records are published.
  • Partition: Topics are divided into partitions for scalability. Each partition is an ordered, immutable sequence of records.
  • Producer: Sends records to one or more Kafka topics.
  • Consumer: Reads records from one or more Kafka topics.
  • Consumer Group: A group of consumers that work together to consume data from a topic. At any given time, each partition is consumed by exactly one consumer in the group.
  • Offset: A sequential number that identifies a record's position within a partition. It is unique only within that partition. Consumers track their progress using this offset.
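
The terms above can be made concrete with a toy in-memory model. This is a minimal sketch in plain Python, not the Kafka client API: `Partition` and `assign_partitions` are hypothetical names, and the round-robin assignment is an assumption chosen for simplicity (real Kafka assignment strategies are configurable).

```python
class Partition:
    """An append-only sequence of records; a record's offset is its index."""

    def __init__(self):
        self.records = []

    def append(self, record):
        """Append a record and return the offset assigned to it."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records starting at offset, without removing them."""
        return self.records[offset:offset + max_records]


def assign_partitions(num_partitions, consumers):
    """Round-robin sketch: each partition goes to exactly one consumer in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partition = Partition()
first_offset = partition.append("click_event")
second_offset = partition.append("scroll_event")

# Four partitions shared by two consumers: each consumer owns two partitions.
balanced = assign_partitions(4, ["c1", "c2"])
# Two partitions, three consumers: the third consumer has nothing to read.
over_provisioned = assign_partitions(2, ["c1", "c2", "c3"])
```

In this sketch, adding consumers beyond the number of partitions leaves the extra consumers idle, which is why the partition count bounds the parallelism of a consumer group.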

Why Kafka?

  1. High Throughput: Kafka can handle millions of messages per second by using sequential disk I/O and zero-copy data transfer.
  2. Persistence (Replayability): Because Kafka stores events on disk, you can "rewind" your consumers to re-process old data (e.g., if you find a bug in your analytics code).
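  3. Scalability: You can scale a topic by adding more partitions, which lets more consumers in a group read in parallel. Note that changing the partition count changes which partition a given key maps to.
  4. Fault Tolerance: Partitions are replicated across multiple brokers. If the broker leading a partition fails, a replica on another broker takes over as leader.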
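
The replay idea in point 2 can be illustrated with a small sketch. It assumes a plain Python list standing in for a retained partition log, and the two counting functions are hypothetical analytics code, not part of Kafka.

```python
def count_clicks_buggy(events):
    # Bug: compares against a misspelled event name, so nothing is counted.
    return sum(1 for event in events if event == "clik")


def count_clicks_fixed(events):
    return sum(1 for event in events if event == "click")


# The retained log: reading it never removes records.
log = ["click", "view", "click", "click"]

# First pass with the buggy code, starting from offset 0.
position = 0
first_pass = count_clicks_buggy(log[position:])
position = len(log)  # the consumer has now read everything

# After deploying the fix, "rewind" to offset 0 and re-process the same records.
position = 0
second_pass = count_clicks_fixed(log[position:])
position = len(log)
```

Because the records are still in the log, the corrected code can produce the right result from the original data. With a queue that deletes messages on consumption, the first pass would have used them up.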

The Lifecycle of a Kafka Record

  1. The producer sends a record (Key: user_123, Value: click_event) to the web_logs topic. The producer's partitioner hashes the key to choose a partition, so all events for user_123 go to the same partition.
  2. The leader broker for that partition appends the record to the end of the partition's log on disk and assigns it the next offset (e.g., 501).
  3. Follower replicas copy the record from the leader. Once the in-sync replicas have it, the record is considered "committed" and becomes visible to consumers.
  4. A consumer in a consumer group asks for the next records in its partition (e.g., Partition 0), starting after the last offset it processed (e.g., 500), and receives the record at offset 501.
  5. After processing the record, the consumer commits its progress: "I have successfully finished up to offset 501." Kafka stores this information so that if the consumer restarts, it knows exactly where to resume.
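
The lifecycle above can be sketched as a toy single-process broker. Everything here is illustrative: `ToyBroker`, `choose_partition`, the partition count of 3, and the `analytics` group name are assumptions, CRC32 stands in for the hash function Kafka's clients actually use, and replication (step 3) is left out.

```python
import zlib

NUM_PARTITIONS = 3  # assumed partition count for the web_logs topic


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Stable hash of the key modulo the partition count (CRC32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class ToyBroker:
    def __init__(self, num_partitions=NUM_PARTITIONS):
        self.logs = [[] for _ in range(num_partitions)]
        # (group, partition) -> next offset to read, i.e. last processed + 1
        self.committed = {}

    def produce(self, key, value):
        """Steps 1 and 2: pick a partition from the key, append, return the offset."""
        partition = choose_partition(key, len(self.logs))
        self.logs[partition].append((key, value))
        return partition, len(self.logs[partition]) - 1

    def fetch(self, group, partition, max_records=1):
        """Step 4: return (offset, record) pairs starting at the group's position."""
        start = self.committed.get((group, partition), 0)
        records = self.logs[partition][start:start + max_records]
        return [(start + i, record) for i, record in enumerate(records)]

    def commit(self, group, partition, last_processed_offset):
        """Step 5: remember where this group should resume."""
        self.committed[(group, partition)] = last_processed_offset + 1


broker = ToyBroker()
p1, o1 = broker.produce("user_123", "click_event")
p2, o2 = broker.produce("user_123", "scroll_event")

batch = broker.fetch("analytics", p1)
offset, record = batch[0]
broker.commit("analytics", p1, offset)

# A restarted consumer in the same group resumes after the committed record.
next_batch = broker.fetch("analytics", p1)
```

Both records share a key, so they land in the same partition with consecutive offsets, and the second fetch picks up where the commit left off.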

Kafka vs. Traditional MQ (RabbitMQ)

Feature     | RabbitMQ                     | Apache Kafka
Model       | Push-based                   | Pull-based
Persistence | Messages deleted after ACK   | Messages stored (log-based)
Throughput  | High                         | Very high (millions of messages/sec)
Ordering    | Guaranteed per queue         | Guaranteed per partition
Use Case    | Complex routing, task queues | Logging, stream processing, real-time analytics
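
The Persistence row is the difference that matters most in practice. The following sketch contrasts the two models using only the Python standard library. The `billing` and `analytics` group names are hypothetical, and neither half uses a real RabbitMQ or Kafka client.

```python
from collections import deque

# Queue semantics: consuming a message removes it for everyone.
queue = deque(["m0", "m1", "m2"])
first = queue.popleft()
remaining_in_queue = list(queue)

# Log semantics: reads remove nothing; each consumer group keeps its own offset.
log = ["m0", "m1", "m2"]
offsets = {"billing": 0, "analytics": 0}


def poll(group):
    """Return the next record for a group and advance only that group's offset."""
    record = log[offsets[group]]
    offsets[group] += 1
    return record


billing_reads = [poll("billing"), poll("billing")]
analytics_reads = [poll("analytics")]
```

Here the `analytics` group still starts from `m0` even though `billing` has already read it, because each group's progress is just a number pointing into the shared log.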

Common Mistakes

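  • Too Few Partitions: Giving a topic only 1 partition. Within a consumer group, only 1 consumer can then read from it at a time, which limits your horizontal scale.
  • Massive Messages: Trying to send 10MB messages through Kafka. Kafka is optimized for small messages (kilobytes), and the default maximum message size is about 1MB. Store the large payload in object storage such as S3 and send a reference to it through Kafka.
  • Ignoring Retention Policies: Setting your retention to 7 days without sizing disks for peak traffic, then running out of disk space in 2 days because of an unexpected traffic spike. Time-based retention alone does not cap disk usage.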
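
The retention mistake comes down to simple arithmetic. The sketch below is a rough estimate under stated assumptions: the traffic figures are made up for illustration, and compression, index files, and other overhead are ignored.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def retained_bytes(msgs_per_sec, avg_msg_bytes, retention_days, replication_factor):
    """Approximate cluster-wide disk needed to hold retention_days of traffic."""
    bytes_per_day = msgs_per_sec * avg_msg_bytes * SECONDS_PER_DAY
    return bytes_per_day * retention_days * replication_factor


def days_until_full(disk_bytes, msgs_per_sec, avg_msg_bytes, replication_factor):
    """How long an empty cluster-wide disk budget lasts at a given write rate."""
    bytes_per_day = msgs_per_sec * avg_msg_bytes * SECONDS_PER_DAY * replication_factor
    return disk_bytes / bytes_per_day


# Planned load: 10,000 msgs/sec of 1,000-byte messages, 7 days, 3 replicas.
planned = retained_bytes(10_000, 1_000, 7, 3)

# Traffic spikes to 4x the planned rate on disks sized exactly for the plan.
days_at_spike = days_until_full(planned, 40_000, 1_000, 3)
```

Under these assumptions the planned load needs about 18 TB, and a sustained 4x spike fills that space in 1.75 days, well before the 7-day retention window would delete anything.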

Recap

  • Kafka is a Distributed Log, not a traditional queue.
  • Partitions are the unit of parallelism and scalability.
  • Offsets allow consumers to track their own progress and "replay" history.
  • Kafka is a common backbone for modern Event-Driven Architectures.

Knowledge Check

In Kafka, what happens to a message after it is successfully read by a consumer?