What is Apache Kafka?

Thousands of businesses are built on Kafka, and for good reasons. Kafka is a scalable, open-source, distributed event streaming platform and publish-subscribe messaging system. It is used to build distributed applications and to power web-scale internet businesses such as Twitter and LinkedIn. Because it scales, it is used not only by internet unicorns but also by slower-to-adopt traditional enterprises, small and large. The reason is that access to an integrated data stream is essential for developing innovative and disruptive digital services. Traditional data sources, e.g. transactional data such as inventory, orders, and shopping carts, are now augmented by signals like page likes, clicks, searches, and suggestions. This data is critically important for understanding customers’ friction points and behavior, as it helps extract valuable insights via predictive analytics. This is where Kafka comes into play.

Apache Kafka – The Need of Today

Apache Kafka can handle trillions of events per day. Often thought of as a messaging queue, it is built on the abstraction of a distributed commit log. Since LinkedIn open-sourced it in 2011, Kafka has evolved from a simple messaging queue into a full-fledged event streaming platform. It was initially designed to solve the problem of low-latency ingestion of huge volumes of event data, originating from the LinkedIn site and infrastructure, into a lambda architecture that harnessed Hadoop and real-time event processing frameworks. The prominent feature was the “real-time” processing: at that time, no existing solution gave applications this kind of real-time access to data.

On top of that, many companies want to develop complex machine-learning algorithms, which is only possible if the data is available. Obtaining data from its sources and sharing it reliably was once very difficult, and the batch-based enterprise solutions available at the time didn’t address this issue. In 2011, Kafka at LinkedIn began ingesting, and therefore sharing, over 1 billion events a day. More recently, LinkedIn reported an ingestion rate of 1 trillion messages a day. That’s a lot of available data!


Kafka Working Model

Kafka is a publish-subscribe system that delivers persistent, in-order, scalable messaging. It has topics, publishers, and subscribers. Topics can be partitioned, which allows high-scale parallel consumption; ordering is guaranteed within each partition. Messages written to Apache Kafka are stored on brokers and replicated across them for fault tolerance, and they remain available for a configurable retention period.
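To make the idea of partitioned topics a little more concrete, here is a minimal sketch in plain Python. It does not use a Kafka client at all: the topic is just a dictionary of lists, and `NUM_PARTITIONS`, `partition_for`, and the sample events are hypothetical names made up for illustration. CRC32 is only a stand-in for the hash function; it is not what Kafka's own clients use.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 3  # hypothetical partition count for this toy topic


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition; the same key always maps to the same one."""
    # CRC32 is an illustrative stand-in, not the hash a real Kafka client uses.
    return zlib.crc32(key) % num_partitions


# A toy "topic": partition number -> ordered list of (key, value) messages.
topic: Dict[int, List[Tuple[bytes, bytes]]] = defaultdict(list)

events = [
    (b"user-1", b"click"),
    (b"user-2", b"search"),
    (b"user-1", b"like"),
]

for key, value in events:
    # Appending keeps per-key order, because one key always lands in one partition.
    topic[partition_for(key)].append((key, value))
```

Because every message for `user-1` ends up in the same partition, a consumer reading that partition sees `click` before `like`, while other partitions can be read in parallel by other consumers.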

The key to Apache Kafka is the log. Developers sometimes get confused when they hear the term “log,” since they mostly know “logs” as application logs. Here it means the log data structure: a time-ordered, append-only sequence of records, where each record can be anything (in Kafka, it’s an array of bytes). If this sounds like the simple data structure on which databases are built, it is.
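The log data structure described above can be sketched in a few lines of Python. This is a toy, in-memory illustration rather than how Kafka stores data on disk; the class name `AppendOnlyLog` and its methods are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AppendOnlyLog:
    """Toy in-memory log: records are opaque bytes, addressed by offset."""

    _records: List[bytes] = field(default_factory=list)

    def append(self, record: bytes) -> int:
        """Add a record at the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[bytes]:
        """Return up to max_records starting at offset, without removing them."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return self._records[offset:offset + max_records]


log = AppendOnlyLog()
first = log.append(b"order-created")
second = log.append(b"order-paid")

# Reading never consumes records, so any reader can replay from any offset.
replay_all = log.read(0)
replay_tail = log.read(second)
```

The point of the sketch is that records are only ever added at the end and reads are non-destructive, which is what makes the data replayable by several independent readers.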


Where Can Kafka Fit In?

Apache Kafka has become a popular tool for enterprises and developers because it is easy to pick up and offers a powerful event streaming platform with four APIs:

  • Producer
  • Consumer
  • Streams
  • Connect

Often, developers start with a single use case: for instance, using Kafka as a message buffer to work around a legacy database that cannot bear today’s workloads, using the Connect API to keep that database in sync with an associated search indexing engine, or using the Streams API to process data as it arrives and surface aggregations right back to the application.

The Verdict

In a nutshell, Apache Kafka simplifies the complex process of developing data-driven apps and back-end systems. It gives you peace of mind by keeping your data fault-tolerant, real-time, and replayable. It gives developers a single event streaming platform to collect, process, store, and connect enterprise systems and apps with real-time data.



