I’m a big fan of Kafka. One of the coolest ideas Kafka builds on is that while random access to disk is slow, sequential I/O has high throughput, so it’s perfectly fine to keep a queue / stream on disk.
- Design
- Topic: A stream of messages of a certain type
- Brokers: Set of servers responsible for a topic
- Store messages in segment files in an append-only fashion
- Each file is around 1 GB
- Segments are only flushed after a configurable number of messages have accumulated or after a certain amount of time
- Consumers can only see a message after it is flushed
- Do not have their own cache, rely on FS cache
- Uses the sendfile API to avoid copying data through user space and to cut down on context switches into the kernel (the append/flush path and this zero-copy read path are sketched below)
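A minimal sketch of the two ideas above, assuming made-up names (`SegmentLog`, `FLUSH_EVERY`) and a single segment file: appends go sequentially to the end, consumers only ever see the flushed prefix, and fetches go straight from the file to the socket via `FileChannel.transferTo`, the JVM's interface to `sendfile`:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SegmentLog {
    private final FileChannel channel;
    private long writePos;       // end of the log; all writes are sequential appends
    private long flushedPos;     // consumers may only read below this point
    private int unflushedMessages;
    private static final int FLUSH_EVERY = 1000; // illustrative; Kafka makes the flush triggers configurable

    SegmentLog(Path file) throws IOException {
        channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
        writePos = channel.size();
        flushedPos = writePos;
    }

    // Append a message at the end of the file; sequential writes keep throughput high.
    void append(ByteBuffer message) throws IOException {
        writePos += channel.write(message, writePos);
        if (++unflushedMessages >= FLUSH_EVERY) {
            channel.force(false);    // flush file data to disk
            flushedPos = writePos;   // the flushed prefix is now visible to consumers
            unflushedMessages = 0;
        }
    }

    // Serve a consumer fetch straight from the file to the socket. transferTo is
    // the JVM face of sendfile: no copy through user space, one syscall.
    long sendTo(WritableByteChannel socket, long offset, long maxBytes) throws IOException {
        long end = Math.min(offset + maxBytes, flushedPos);
        if (offset >= end) return 0;
        return channel.transferTo(offset, end - offset, socket);
    }
}
```

Kafka's real segment management (rolling ~1 GB files, time-based flushes) is more involved; this just shows the shape of the write and read paths.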
- Partition: A topic is sharded across brokers; each shard is called a partition (a keyed partitioner is sketched after these bullets)
- Balances load
- A consumer is guaranteed to see messages in order for a given partition
- Smallest unit of parallelism
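The paper leaves the choice of partition to the producer (random, or driven by a partitioning key). A hash-based scheme like the hypothetical one below is the usual way to pin a key to one partition so its messages stay ordered:

```java
import java.util.Arrays;

// Hypothetical keyed partitioner: all messages with the same key land in the
// same partition, which is where Kafka's ordering guarantee applies.
class KeyPartitioner {
    static int partitionFor(byte[] key, int numPartitions) {
        int hash = Arrays.hashCode(key);
        return (hash & 0x7fffffff) % numPartitions; // clear the sign bit so the index is non-negative
    }
}
```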
- Offset: The distance in bytes from the beginning of the log
- Consumers do not request messages by ID but by offset (see the pull-loop sketch after these bullets)
- Client libraries keep track of their offset
- Brokers do not need to keep track of the offset
- Makes replay of the log really easy
- Very popular for ETL workloads
- We made good use of this at Twitter
- The client library may expose only one message at a time to the application, but it works with the broker to keep data buffered ahead of the consumer
- Broker deletes data after a configured time
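A sketch of that pull loop, with `fetch` and `handle` as stand-ins for the wire protocol and application logic; the topic name and the length-prefixed framing are assumptions:

```java
import java.nio.ByteBuffer;

// The client, not the broker, owns the offset: each fetch names a byte position,
// and the next offset is just arithmetic over what was consumed.
class OffsetConsumer {
    long offset = 0; // could be restored from ZooKeeper after a restart, which is what makes replay easy

    void poll() {
        ByteBuffer chunk = fetch("clicks", 0, offset, 1 << 20); // topic, partition, offset, maxBytes
        while (chunk.remaining() >= 4) {
            int length = chunk.getInt();
            if (chunk.remaining() < length) break; // partial message at the end of the chunk; refetch later
            byte[] payload = new byte[length];
            chunk.get(payload);
            handle(payload);
            offset += 4 + length; // advance past the length prefix and the payload
        }
    }

    // Stubs standing in for the fetch RPC and the application's processing.
    ByteBuffer fetch(String topic, int partition, long offset, int maxBytes) { return ByteBuffer.allocate(0); }
    void handle(byte[] message) { }
}
```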
- Consumer Group: A set of consumers that together receive every message at least once
- In other words, each message is delivered to only one consumer within the group (a deterministic assignment is sketched below)
- A partition can only be subscribed to by one consumer in each group
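Ownership can be decided without a central assigner: each consumer computes the same deterministic split over the membership it sees in Zookeeper. A sketch of one such range assignment (the exact algorithm here is illustrative, not lifted from the paper):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Every member sorts the same consumer list, so they all compute identical
// answers and each partition gets exactly one owner per group.
class GroupAssignment {
    static Map<String, List<Integer>> assign(List<String> consumerIds, int numPartitions) {
        List<String> sorted = new ArrayList<>(consumerIds);
        Collections.sort(sorted);
        Map<String, List<Integer>> owners = new HashMap<>();
        int per = numPartitions / sorted.size();
        int extra = numPartitions % sorted.size(); // the first `extra` consumers take one more
        int next = 0;
        for (int i = 0; i < sorted.size(); i++) {
            int count = per + (i < extra ? 1 : 0);
            List<Integer> mine = new ArrayList<>();
            for (int j = 0; j < count; j++) mine.add(next++);
            owners.put(sorted.get(i), mine);
        }
        return owners;
    }
}
```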
- Coordination
- A topic is often over-partitioned (more partitions than brokers) to allow for more consumers than there are brokers
- Consumers keep track of their offsets in Zookeeper (a commit sketch follows this list)
- Brokers use Zookeeper to detect changes in the system
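A sketch of an offset commit using the standard `org.apache.zookeeper` client; the `/consumers/<group>/offsets/...` path mirrors the layout early Kafka used, but treat the exact format (and the pre-created parent znodes) as assumptions:

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Writes the consumer's position for one partition into a znode. Assumes the
// parent znodes were created when the consumer registered with the group.
class OffsetRegistry {
    private final ZooKeeper zk;

    OffsetRegistry(ZooKeeper zk) { this.zk = zk; }

    void commit(String group, String topic, String partition, long offset)
            throws KeeperException, InterruptedException {
        String path = "/consumers/" + group + "/offsets/" + topic + "/" + partition;
        byte[] data = Long.toString(offset).getBytes(StandardCharsets.UTF_8);
        try {
            zk.setData(path, data, -1); // -1 means: don't check the znode version
        } catch (KeeperException.NoNodeException e) {
            zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
    }
}
```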
- Misc:
- At-least-once semantics
- Some clients deduplicate with unique message IDs if it matters
- Use a CRC for each message
- The paper makes it sound like it transparently drops messages with a bad CRC
- I hope there is some kind of logging for this!
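Something like this is what I'd want instead: verify with `java.util.zip.CRC32`, and count and log the drops (the CRC-alongside-payload framing is my assumption about the message format):

```java
import java.util.zip.CRC32;

// Recompute the CRC over the payload and compare with the stored value;
// corrupt messages are skipped, but loudly, not silently.
class CrcCheck {
    static long dropped = 0;

    static boolean deliver(byte[] payload, long storedCrc) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        if (crc.getValue() != storedCrc) {
            dropped++;
            System.err.println("dropping corrupt message (bad CRC); total dropped: " + dropped);
            return false;
        }
        return true;
    }
}
```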
- In the future they want to add replication (I believe this is in there now)
- In their use, producers periodically write a monitoring record that includes the count of messages they have sent (the consumer-side check is sketched below)
- Consumers then verify this count against what they actually received
- I thought this was a great low-overhead way to verify the system
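A sketch of the consumer side of that audit; the monitoring-record encoding is left as stubs since the paper doesn't spell it out:

```java
// Counts ordinary messages between monitoring records, then compares its count
// with the one the producer claims for the window.
class Auditor {
    long received = 0;

    void onMessage(byte[] message) {
        if (isMonitoringRecord(message)) {
            long claimed = claimedCount(message);
            if (claimed != received) {
                System.err.printf("audit mismatch: producer sent %d, consumer saw %d%n", claimed, received);
            }
            received = 0; // start counting the next window
        } else {
            received++;
        }
    }

    // Stubs for whatever encoding the monitoring record actually uses.
    boolean isMonitoringRecord(byte[] m) { return false; }
    long claimedCount(byte[] m) { return 0; }
}
```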
- They have a custom Hadoop input format that reads from Kafka
- Because a task can tell the broker which offset to start reading from, failed tasks can simply restart from where they left off
- They have a schema registry service for their Avro serialized messages