Implementation on Apache Kafka

Network Layer

The network layer is a fairly straightforward NIO server and will not be described in great detail. The sendfile implementation is done by giving the MessageSet interface a writeTo method. This allows the file-backed message set to use the more efficient transferTo implementation instead of an in-process buffered write. The threading model is a single acceptor thread and N processor threads, each of which handles a fixed number of connections. This design has been thoroughly tested elsewhere and found to be simple to implement and fast.

Messages

Messages consist of a variable-length header, a variable-length opaque key byte array, and a variable-length opaque value byte array. The format of the header is described in the following section. Leaving the key and value opaque is the right decision: there is a great deal of progress being made on serialization libraries right now, and any particular choice is unlikely to be right for all uses. Needless to say, a particular application using Kafka would likely mandate a particular serialization type as part of its usage.

Message Format

Messages (also called records) are always written in batches. The technical term for a batch of messages is a record batch, and a record batch contains one or more records. In the degenerate case, a record batch contains a single record. Record batches and records have their own headers.
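
To make the relationship between records and record batches concrete, here is a minimal Python sketch. It models only the containment rule (a batch holds one or more records) and the opaque key and value byte arrays; it does not model the headers or the on-disk layout. The names Record, RecordBatch, encode_event, and decode_event are hypothetical and are not part of Kafka, and JSON merely stands in for whatever serialization an application might mandate.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical model of one message: an opaque key and an opaque value."""
    key: bytes
    value: bytes


@dataclass(frozen=True)
class RecordBatch:
    """Hypothetical model of a record batch: one or more records."""
    records: Tuple[Record, ...]

    def __post_init__(self):
        # A batch may be degenerate (a single record) but never empty.
        if not self.records:
            raise ValueError("a record batch must contain at least one record")


def encode_event(user_id: str, event: dict) -> Record:
    # The application, not the messaging layer, chooses the serialization.
    # JSON is an assumption made for this example only.
    return Record(
        key=user_id.encode("utf-8"),
        value=json.dumps(event, sort_keys=True).encode("utf-8"),
    )


def decode_event(record: Record) -> dict:
    # Only code that knows the application's format can interpret the bytes.
    return json.loads(record.value.decode("utf-8"))


if __name__ == "__main__":
    record = encode_event("user-1", {"action": "click"})
    batch = RecordBatch(records=(record,))  # degenerate single-record batch
    print(len(batch.records), decode_event(batch.records[0]))
```

Because the batch only ever sees bytes, switching the application to a different serialization library would change encode_event and decode_event but nothing else in this sketch.
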
The format of each (record batch and record) is described below.

Record Batch

The following is the on-disk format of a RecordBatch.

Log

A log for a topic named “my_topic” with two partitions consists of two directories (namely my_topic_0 and my_topic_1) populated with data files containing the messages for that topic. The format of the log files is a sequence of “log entries”; each log entry is a 4-byte integer N storing the message length, followed by the N message bytes. Each message is uniquely identified by a 64-bit integer offset giving the byte position of the start of this message in the stream of all messages ever sent to that topic on that partition.

Distribution

Consumer Offset Tracking

A Kafka consumer tracks the maximum offset it has consumed in each partition and can commit offsets so that it can resume from those offsets in the event of a restart. Kafka provides the option to store all the offsets for a given consumer group in a designated broker (for that group) called the group coordinator. That is, any consumer instance in that consumer group should send its offset commits and fetches to that group coordinator (broker).
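
The log entry layout and the offset tracking described above fit together, and a small Python sketch may help show how. It is an illustration under stated assumptions, not Kafka's implementation: the log is an in-memory bytearray rather than data files, the 4-byte length is assumed to be a big-endian signed integer, the committed value is assumed to be the position of the next entry to read, and GroupCoordinator is a hypothetical in-memory stand-in for the broker that stores a group's offsets. All function and class names are invented for this example.

```python
import struct

# Assumption: the 4-byte length prefix N is a big-endian signed integer.
LENGTH_PREFIX = struct.Struct(">i")


def append_entry(log: bytearray, message: bytes) -> int:
    """Append one log entry; return its offset (byte position of its start)."""
    offset = len(log)
    log += LENGTH_PREFIX.pack(len(message))
    log += message
    return offset


def read_entries(log, start_offset=0):
    """Yield (offset, message, next_offset) for each entry from start_offset."""
    position = start_offset
    while position < len(log):
        (length,) = LENGTH_PREFIX.unpack_from(log, position)
        body_start = position + LENGTH_PREFIX.size
        next_offset = body_start + length
        yield position, bytes(log[body_start:next_offset]), next_offset
        position = next_offset


class GroupCoordinator:
    """Hypothetical stand-in for the broker holding a consumer group's offsets."""

    def __init__(self):
        self._offsets = {}  # (group, topic, partition) -> committed offset

    def commit(self, group, topic, partition, offset):
        self._offsets[(group, topic, partition)] = offset

    def fetch(self, group, topic, partition):
        # With no commit yet, start from the beginning of the log.
        return self._offsets.get((group, topic, partition), 0)


def consume(log, coordinator, group, topic, partition, max_messages):
    """Resume from the committed offset, read up to max_messages, then commit."""
    start = coordinator.fetch(group, topic, partition)
    consumed = []
    resume_at = start
    for _, message, next_offset in read_entries(log, start):
        if len(consumed) == max_messages:
            break
        consumed.append(message)
        resume_at = next_offset
    # Assumption: commit the position of the next entry to read.
    coordinator.commit(group, topic, partition, resume_at)
    return consumed


if __name__ == "__main__":
    log = bytearray()
    for payload in (b"a", b"bcd", b"ef"):
        append_entry(log, payload)
    coordinator = GroupCoordinator()
    print(consume(log, coordinator, "group-1", "my_topic", 0, 2))
    # A restarted consumer picks up where the committed offset points.
    print(consume(log, coordinator, "group-1", "my_topic", 0, 2))
```

Because each offset is a byte position in this layout, resuming after a restart needs no index lookup: the reader simply starts parsing length prefixes at the committed position.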