Data Governance Tools

In any business-grade Kafka deployment, you need a solid governance framework to protect private and confidential data and to keep track of who handles the data and which operations are carried out on it. A governance framework also determines who can access which kind of data and who may perform operations on data elements. Apache Kafka is software that lets you transmit data between applications and services. It became mainstream in all kinds of businesses a few years ago, and for very good reasons: it is very performant, serves numerous use cases, and lends itself to “simpler” designs, which in turn reduce direct interfacing between systems and therefore security requirements.

Apache Kafka started as a single application and has since grown into a full streaming platform, where you can also enrich your data and connect external systems through Kafka Streams (to alter a stream of data “on the fly”) and Kafka Connect (to send a stream of data into a third-party, non-streaming system such as Elasticsearch or Hadoop).

Apache Avro originated in the Hadoop world, where people wanted a way to store, send, and query data efficiently while handling schema versioning. Both Avro and Kafka are built on mechanical sympathy: they take advantage of the underlying hardware (caches, pipelines, access patterns) to deliver strong performance.


Avro

Without going into much detail, Avro is performant because its payload:

  • contains just data: no noise, no field names, just the bare values.
  • is schema-based: the schema defines how to read and write the data. The schema is not written inside the messages, except in a .avro file, where it is stored in the file header.
  • is optimized for computer processing: it is not meant to be read by humans at this stage.
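To make the first two points concrete, here is a minimal sketch of how a record can be serialized with no field names in the payload. It hand-rolls only two of Avro's binary encoding rules (zig-zag varint for `long`, length-prefixed UTF-8 for `string`) and is not a replacement for an Avro library. The `User` schema and the helper names (`zigzag_varint`, `encode_string`, `encode_record`) are assumptions made up for this illustration.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer as a zig-zag, base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small negatives become small positives
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode a string as its byte length (as a long) followed by UTF-8 bytes."""
    raw = s.encode("utf-8")
    return zigzag_varint(len(raw)) + raw


# Hypothetical schema, used only for this illustration.
SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}

# Only the two primitive types used above are supported in this sketch.
ENCODERS = {"long": zigzag_varint, "string": encode_string}


def encode_record(schema: dict, record: dict) -> bytes:
    """Concatenate the field values in schema order; no names are written."""
    return b"".join(
        ENCODERS[field["type"]](record[field["name"]])
        for field in schema["fields"]
    )


user = {"id": 42, "name": "Ada"}
avro_bytes = encode_record(SCHEMA, user)
json_bytes = json.dumps(user).encode("utf-8")
print(len(avro_bytes), "bytes without field names vs", len(json_bytes), "bytes as JSON")
```

The reader needs the same schema to know that the first value is a `long` and the second a `string`, which is why the schema has to travel separately from the messages.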

Schemas are applied when reading and writing data. They need to be shared between the services that exchange data with each other. They can also be embedded inside each message, but that hurts efficiency.
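One common way to share schemas without embedding them is to prefix each message with a small schema identifier and keep the schemas themselves in a shared store. The sketch below assumes a framing of one magic byte followed by a 4-byte big-endian schema ID, which is the layout used by the Confluent Schema Registry serializers. The `InMemoryRegistry` class and the `frame`/`unframe` helpers are hypothetical stand-ins, not a real client API.

```python
import struct

MAGIC_BYTE = 0  # marks the framing version in this sketch


class InMemoryRegistry:
    """Hypothetical stand-in for a shared schema store."""

    def __init__(self):
        self._id_by_schema = {}
        self._schema_by_id = {}

    def register(self, schema_json: str) -> int:
        """Return the ID for a schema, assigning a new one on first sight."""
        if schema_json not in self._id_by_schema:
            new_id = len(self._id_by_schema) + 1
            self._id_by_schema[schema_json] = new_id
            self._schema_by_id[new_id] = schema_json
        return self._id_by_schema[schema_json]

    def lookup(self, schema_id: int) -> str:
        return self._schema_by_id[schema_id]


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix the payload with 1 magic byte and a 4-byte big-endian schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes):
    """Split a framed message back into (schema_id, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unknown framing")
    return schema_id, message[5:]


registry = InMemoryRegistry()
schema_json = '{"type": "record", "name": "User", "fields": []}'
schema_id = registry.register(schema_json)

payload = b"T\x06Ada"  # already-encoded record body, treated as opaque bytes here
message = frame(schema_id, payload)

received_id, received_payload = unframe(message)
print(received_id, registry.lookup(received_id) == schema_json)
```

Each message pays five bytes of overhead regardless of how large the schema is, and a consumer only needs to fetch a given schema once and cache it.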

In a nutshell

This was only a brief introduction to the benefits of combining Kafka and Avro. It is not a “you should do” article, but more of a “you could do” one.

Combining the Kafka platform and Avro gives you a bright and stable future through:

  • A performant and robust way of exchanging your data.
  • A total decoupling between producers and consumers: a service does not need to know who consumes its data.
  • Governance: with the Schema Registry, you can list all the kinds of data available along with their schemas (and how they evolved). Nobody needs to ask you “what is the format of your data? send me the Excel file please”.
  • Future-proofing: with Avro, you make sure nobody can insert “bad” data that could disrupt consuming services.
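The protection against “bad” data relies on checking a new schema against the one consumers already use. The sketch below is a deliberately simplified version of that idea: a reader can decode a writer's records if every field the reader expects is either present in the writer's schema with the same type or has a default. It ignores type promotions, unions, and aliases, and the `can_read` function and the three `User` schema versions are assumptions made up for this illustration.

```python
def can_read(reader_schema: dict, writer_schema: dict) -> list:
    """Return a list of problems; an empty list means the reader can decode."""
    writer_fields = {f["name"]: f for f in writer_schema["fields"]}
    problems = []
    for field in reader_schema["fields"]:
        written = writer_fields.get(field["name"])
        if written is None:
            # The writer never sent this field: the reader needs a default.
            if "default" not in field:
                problems.append(f"field {field['name']!r} is missing and has no default")
        elif written["type"] != field["type"]:
            # Simplification: real Avro resolution allows some type promotions.
            problems.append(f"field {field['name']!r} changed type")
    # Fields known only to the writer are simply skipped by the reader.
    return problems


# Hypothetical schema versions for a User record.
user_v1 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
]}
user_v2 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string", "default": ""},
]}
user_v3_bad = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string"},  # no default
]}

print(can_read(user_v2, user_v1))      # new reader, old data
print(can_read(user_v1, user_v2))      # old reader, new data
print(can_read(user_v3_bad, user_v1))  # new reader without a default
```

A check of this kind, run before a new schema version is accepted, is what lets producers evolve their data without coordinating a simultaneous upgrade of every consumer.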


 
