schema-registry-statistics

Schema Registry Statistics Tool is a small utility that helps you identify how different schema versions are used within a topic.
It consumes messages from a topic while calculating the percentage of messages produced with each schema version.


Example output:

[sr-stats] 2022/12/28 10:02:12 Starting to consume from payments-topic
[sr-stats] 2022/12/28 10:02:12 Consumer up and running!...
[sr-stats] 2022/12/28 10:02:12 Use SIGINT to stop consuming.
[sr-stats] 2022/12/28 10:02:14 terminating: via signal
[sr-stats] 2022/12/28 10:02:14 Total messages consumed: 81
Schema ID 1 => 77%
Schema ID 3 => 23%

As you can see, in payments-topic 77% of the messages were produced using schema ID 1, while the remaining 23% were produced using schema ID 3.

You can get the schema by ID:

curl -s http://<SCHEMA_REGISTRY_ADDR>/schemas/ids/1 | jq .
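
For a programmatic lookup, here is a minimal Go sketch of the same request; the registry address and schema ID are placeholder assumptions, not values from the tool:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Ask Schema Registry for the schema definition behind ID 1.
	// Replace localhost:8081 with your registry's address.
	resp, err := http.Get("http://localhost:8081/schemas/ids/1")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}

	// The response is a JSON object whose "schema" field holds the
	// schema definition as a string.
	var out struct {
		Schema string `json:"schema"`
	}
	if err := json.Unmarshal(body, &out); err != nil {
		panic(err)
	}
	fmt.Println(out.Schema)
}
```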

For further offset analysis, you can store the results in a JSON file that maps each schema ID to the offsets of the messages that used it:

{"1":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61],"3":[62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80]}

Flags

| Name | Description | Required | Type | Default |
| --- | --- | --- | --- | --- |
| `--bootstrap` | The Kafka bootstrap servers. | ✓ | string | `"localhost:9092"` |
| `--topic` | The topic name to consume from. | ✓ | string | `""` |
| `--version` | The Kafka client version to be used. | | string | `"2.1.1"` |
| `--group` | The consumer group name. | | string | `"schema-stats"` |
| `--user` | The Kafka username for authentication. | | string | `""` |
| `--password` | The Kafka password for authentication. | | string | `""` |
| `--tls` | Use TLS communication. | | bool | `false` |
| `--cert` | The path to the CA certificate, used when TLS is enabled. | when `--tls` | string | `""` |
| `--store` | Store the results in a file. | | bool | `false` |
| `--path` | The path of the results file, used when `--store` is set. | | string | `"/tmp/results.json"` |
| `--oldest` | Consume from the oldest offset. | | bool | `true` |
| `--limit` | Limit the consumer to N messages; 0 means no limit. | | int | `0` |
| `--verbose` | Raise the consumer log level. | | bool | `false` |

Usage

./schema-registry-statistics --bootstrap kafka1:9092 --group stat-consumer --topic payments-topic --store --path ~/results.json

Consumes from payments-topic on kafka1 and stores the results. The consumer runs until it receives SIGINT (Ctrl + C).

How does it work?

According to the Confluent wire format, each serialized message has only a couple of components:

| Bytes | Area | Description |
| --- | --- | --- |
| 0 | Magic Byte | Confluent serialization format version number; currently always 0. |
| 1-4 | Schema ID | 4-byte schema ID as returned by Schema Registry. |
| 5.. | Data | Serialized data for the specified schema format (Avro, Protobuf). |

The tool leverages this format: it reads the binary payload of each message, extracts the schema ID, and records it.
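
For illustration, a minimal Go sketch of this decoding step might look as follows; the function name and sample payload are hypothetical, not the tool's actual code:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// extractSchemaID reads the Confluent wire-format header from a message
// payload: one magic byte (0x0) followed by a 4-byte big-endian schema ID.
func extractSchemaID(payload []byte) (uint32, error) {
	if len(payload) < 5 {
		return 0, fmt.Errorf("payload too short: %d bytes", len(payload))
	}
	if payload[0] != 0 {
		return 0, fmt.Errorf("unexpected magic byte: %#x", payload[0])
	}
	return binary.BigEndian.Uint32(payload[1:5]), nil
}

func main() {
	// A fabricated payload: magic byte 0, schema ID 3, then serialized data.
	msg := []byte{0, 0, 0, 0, 3, 'd', 'a', 't', 'a'}
	id, err := extractSchemaID(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println("Schema ID:", id) // Schema ID: 3
}
```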

Local testing

You can use the docker-compose.yml file to create a local environment from scratch.
In the /scripts directory, there are two versions of the same schema and a simple Python Avro producer.

License

This project is licensed under the Apache License - see the LICENSE file for details.
