Kafkagen: testing your Kafka clients made simple


If you ever:

  • used kcat, kafka-console-producer or any other third-party CLI to produce Kafka messages,
  • developed a custom Kafka producer for yourself or your team to make testing your Kafka client easier,
  • used a Kafka GUI like AKHQ or Kafbat to produce messages,
  • struggled as a software engineer, quality engineer, business analyst, etc. to produce messages during your tests,

you are in the right place. Let's discover kafkagen together (a Kafka message generator, but not only that!): our easy-to-use CLI that will make your tests simpler.

A bit of context

We conducted interviews with several teams working with Kafka in our Manufacturing department. These teams had different backgrounds (.NET, Java, more or less experienced with Kafka). We wanted to understand how they worked with Kafka, their practices, and their pain points. Apart from the improvements needed in AKHQ, which I explained in a previous article, one recurrent topic was testing: generating messages for testing purposes, reusing messages, and so on.

Each team had its own practice, from maintaining a custom Java Kafka producer used only for tests to exposing a REST API that people call with Postman to produce messages. I thought we should do something and propose an easy, generic way to handle Kafka messages during testing. This is how kafkagen was born.

What kafkagen offers

Kafkagen is designed to address three needs that arise during Kafka client development and testing: building and managing your datasets, producing messages, and finally asserting the state of your topics.

Datasets

Writing the Kafka messages you need as input to your microservice is painful: complex schemas, reproducible messages, full scenarios with multiple messages. Kafkagen simplifies all of this. No matter if you are working with Avro, Protobuf, JSON Schema or plain text, just use a JSON file (or YAML if you prefer) to define your message. Kafkagen checks whether a schema is associated with your topic and uses the right serializer. This way, you can define your messages like this:

[ {
  "headers" : {
    "header1" : "valueHeader1",
    "header2" : "valueHeader2"
  },
  "key" : "String_value",
  "value" : {
    "field1" : "String_value",
    "field2" : 1,
    "fieldArray": [ 1, 2, 3],
    "fieldRecord" : {
      "fieldRecord1" : "String_value",
      "fieldRecord2" : 1
    }
  }
} ]
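
And since kafkagen also accepts YAML, the same record can be written like this if you prefer:

- headers:
    header1: valueHeader1
    header2: valueHeader2
  key: String_value
  value:
    field1: String_value
    field2: 1
    fieldArray: [1, 2, 3]
    fieldRecord:
      fieldRecord1: String_value
      fieldRecord2: 1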

Suppose you want to produce a message for a brand new topic with a complex schema. Writing it from scratch in JSON is painful. With a single command, kafkagen sample myTopic, kafkagen gives you a sample message that you can reuse and customise.
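
A minimal sketch of that workflow, assuming the sample is written to stdout (check kafkagen sample --help for the exact output options of your version):

# generate a sample record from the topic's registered schema,
# then edit the file to turn it into your test dataset
kafkagen sample myTopic > myTopicDataset.json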

If you prefer to replay a message you have in production on your dev environment, kafkagen allows you to export a dataset from a topic, save it, and produce it on another environment (see the sketch after this list). Check the dataset command to see its capabilities. You can export:

  • entire partitions: kafkagen dataset myTopic -o 0 1 exports partitions 0 and 1,
  • a single offset or several offsets: kafkagen dataset myTopic -o 0=1234 exports the message at offset 1234 on partition 0; you can also give a comma-separated list to export multiple offsets,
  • a range of offsets: kafkagen dataset myTopic -o 0=1234-1245 exports all the messages between offsets 1234 and 1245 on partition 0,
  • all the messages with a given key: kafkagen dataset myTopic -k myKey exports every message with the key myKey in the topic.
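
Put together, an export-and-replay session could look like this. It is a sketch with two assumptions on my side: that the exported dataset goes to stdout, and that the produce command takes the same -f flag as assert does later in this article; check kafkagen dataset --help and kafkagen produce --help for the exact syntax:

# on the production environment: export a range of offsets as a dataset
kafkagen dataset myTopic -o 0=1234-1245 > replay.json

# on the dev environment: produce the saved dataset back to the topic
kafkagen produce myTopic -f replay.json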

Produce messages

What if you need to produce several messages in different topics? It can easily become difficult and time-consuming to do so with kcat, for instance. Kafkagen offers two ways to manage complex use cases: multi-topic datasets and scenarios.

A multi-topic dataset is a dataset in which you can combine messages targeting different topics by adding a topic field to each message.

[ {
  "topic": "myFirstTopic",
  "key" : "String_value",
  "value" : {
    "field1" : "String_value",
    "field2" : "String_value"
  }
}, {
  "topic": "mySecondTopic",
  "key" : "String_value",
  "value" : {
    "field1Int" : 1,
    "field2" : "String_value"
  }
}]
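
Producing a dataset is then a single command. The invocations below are my assumption of the syntax (the -f flag mirrors the one used by assert later in this article), so check kafkagen produce --help on your version:

# single-topic dataset: the target topic is given on the command line
kafkagen produce myFirstTopic -f myFirstTopicDataset1.json

# multi-topic dataset: each record carries its own topic field
kafkagen produce -f multiTopicDataset.json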

A scenario is a YAML file referencing all the datasets (single or multi-topic) to produce for the test.

---
apiVersion: v1
metadata:
  description: Messages for myFirstTopic
spec:
  topic: myFirstTopic
  datasetFile: myFirstTopicDataset1.json
---
apiVersion: v1
metadata:
  description: Messages for mySecondTopic and myThirdTopic
spec:
  datasetFile: mySecondAndThirdTopicDataset1.json
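
Running the scenario then boils down to one command; the command name here is an assumption on my side, so double-check with kafkagen --help:

# produce every dataset referenced by the scenario file, in order
kafkagen play myScenario.yml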

Asserting

Until now, we have only talked about creating datasets and producing messages to Kafka topics. On the other side, you may need to make sure that the messages your microservice should have produced are actually present in your topic. This is where the kafkagen assert command can help you.

Like for producing messages, you can give a dataset to the assert command to look for the messages you are expecting (kafkagen assert myTopic -f expectedMessages.json). The most interesting thing is that you can choose between a strict and a lazy assertion. By default, a lazy assertion is performed: you can put only the fields you want to check in the expected dataset, and therefore skip all the unpredictable fields (GUID, datetime, etc.) or any field you simply don't care about. If you prefer, use the -s option for a strict assertion.
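
For example, if your microservice adds a generated ID and a timestamp to every message it produces, a lazy expected dataset can simply leave those fields out (the field names below are purely illustrative):

[ {
  "key" : "order-42",
  "value" : {
    "status" : "PROCESSED"
  }
} ]

With this file, kafkagen assert myTopic -f expectedMessages.json should pass as soon as a record with this key and this status is found, whatever the other fields contain.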

We all know that consuming an entire topic with thousands or millions of messages can take a while. You can use the -t option to give a start timestamp in milliseconds, so kafkagen only looks for your messages from that timestamp on.
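
For instance, to only scan the records published after a given point in time (the timestamp value below is just an example):

kafkagen assert myTopic -f expectedMessages.json -t 1718000000000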

The output of the command will be either The dataset has been found in the topic or the list of records that didn't match.

Under the hood

I had one thing in mind when I started working on kafkagen: provide a simple CLI that everyone can use. I wanted non-tech people to be able to use it, as we have functional and quality analysts without a tech background testing the apps.

I naturally opted for picocli and native-image. I also chose Quarkus because I wanted to try something new after already working with Spring Boot and Micronaut. This way, kafkagen is available natively for Windows, Linux and macOS (ARM). Docker images are also provided (JVM-based or native).

To sum up

After first developing kafkagen as an innersource project, I have witnessed the different usages it serves over the last months. A software engineer uses kafkagen to produce messages for unit tests of the feature he or she is working on. A team can choose kafkagen to set up non-regression tests, using it to initialise an environment and perform some checks at the end. The usage of kafkagen that I'm most proud of is by the business and quality analysts: non-tech people can now define their JSON datasets (to produce or to expect), produce messages by themselves, and ask kafkagen to check whether messages exist in a topic.

This project is now open source, waiting for feedback, bug reports (yes!) and contributions to make kafkagen a new tool in your Kafka toolbox! Check out the project on GitHub:

https://github.com/michelin/kafkagen