Streaming · Event platform connector

The Kafka data catalog your whole team can trust.

The Dawiso Apache Kafka data catalog turns your clusters into a searchable inventory: every cluster, broker, topic and consumer group, with producer-to-consumer lineage.

Live connector · Stable connector
Kafka
Dawiso
Metadata-only · your data never leaves the source
Type: Distributed event streaming
Auth: SASL/PLAIN (JAAS) · technical user with ACLs
Sync: Scheduled, incremental
Direction: Read-only · metadata

First things first

What is a data connector?

Metadata-only · Read-only access · Incremental sync · Cross-system lineage

A data connector is the bridge between a tool in your stack and the catalog that gives you a unified view of it. Once a connector is configured, it reaches into the source system on a schedule, reads out the metadata - schemas, tables, dashboards, jobs, ownership, lineage - and represents it inside the catalog. Your actual rows and values stay where they are.

Connectors are the reason a data catalog can answer questions like "which Power BI dashboard depends on this Snowflake table?" or "who owns the orders topic in Kafka?" - automatically, without anyone keeping a spreadsheet up to date.

Three properties separate a good connector from a brittle one: it should be read-only and safe, it should be incremental so a full re-scan isn't required for every refresh, and it should resolve lineage across system boundaries, not just inside one tool.
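
To make "incremental" concrete, here is a minimal Python sketch of how a scanner could compare two metadata snapshots and re-process only what changed. The snapshot shape and the `diff_snapshots` helper are hypothetical illustrations, not a description of how Dawiso implements its sync.

```python
from typing import Dict, List, NamedTuple


class SnapshotDiff(NamedTuple):
    added: List[str]
    removed: List[str]
    changed: List[str]


def diff_snapshots(previous: Dict[str, dict], current: Dict[str, dict]) -> SnapshotDiff:
    """Compare two metadata snapshots keyed by object name (hypothetical shape)."""
    added = sorted(name for name in current if name not in previous)
    removed = sorted(name for name in previous if name not in current)
    changed = sorted(
        name
        for name in current
        if name in previous and current[name] != previous[name]
    )
    return SnapshotDiff(added, removed, changed)


# Illustrative snapshots: topic name -> a few metadata attributes.
previous = {
    "orders": {"partitions": 6, "retention_ms": 604800000},
    "payments": {"partitions": 3, "retention_ms": 86400000},
}
current = {
    "orders": {"partitions": 12, "retention_ms": 604800000},
    "clicks": {"partitions": 3, "retention_ms": 86400000},
}

diff = diff_snapshots(previous, current)
print(diff)
```

Only the names in `added`, `removed` and `changed` need to be written back to the catalog, which is what keeps a refresh cheaper than a full re-scan.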

About the platform

What is Apache Kafka?

Apache Kafka is the open-source event streaming platform that moves data between systems in real time. Banks use it for transactions, retailers for clickstreams, manufacturers for sensor telemetry. Common pairings include a Schema Registry for Avro or Protobuf, Kafka Connect, and downstream warehouses like Snowflake or BigQuery.

Running Kafka is the easy part. Knowing what flows through hundreds of topics, who owns them, and which schemas evolved last night is the hard part: producers ship, consumers break, and no catalog tells you why. That's where the Dawiso Apache Kafka data catalog comes in: read-only, metadata-only, and cross-platform.

Architecture

How Dawiso connects to Kafka

A small read-only role on the Kafka side. The Dawiso scanner pulls metadata on a schedule. Everything ends up in your catalog, business-readable.

Source

Kafka cluster

  • Brokers & cluster config
  • Topics & partitions
  • Producers & consumers
  • ACLs & configs
Kafka Admin API

Dawiso scanner

Read-only metadata

  • Schema & object discovery
  • Dependency resolution
  • SQL flow parsing (optional)
  • Sampling on opt-in
Internal

Catalog

Dawiso platform

  • Searchable metadata
  • Lineage & ownership
  • Business glossary
  • Policy & classifications

Connection details

Protocol: Kafka Admin API
Authentication: SASL/PLAIN with JAAS file · dedicated technical user · Describe + Read ACLs
Lineage: Producer-to-topic-to-consumer relationships resolved from consumer-group metadata and broker ACLs; cross-platform downstream lineage stitched with warehouse and dbt sources
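
As a rough illustration of the lineage resolution described above, the following Python sketch joins two inputs: principals that hold a Write ACL on a topic (treated as candidate producers) and consumer groups subscribed to that topic. The input shapes and the `build_lineage` helper are assumptions made for this example. An ACL proves permission rather than activity, so the resulting edges are candidates, not confirmed flows.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_lineage(
    write_acls: Iterable[Tuple[str, str]],
    group_topics: Dict[str, Iterable[str]],
) -> List[Tuple[str, str, str]]:
    """Return (producer principal, topic, consumer group) edges.

    write_acls:   (principal, topic) pairs that hold a Write ACL.
    group_topics: consumer group -> topics it has committed offsets for.
    """
    producers: Dict[str, Set[str]] = defaultdict(set)
    for principal, topic in write_acls:
        producers[topic].add(principal)

    consumers: Dict[str, Set[str]] = defaultdict(set)
    for group, topics in group_topics.items():
        for topic in topics:
            consumers[topic].add(group)

    edges = []
    # Only topics with at least one producer and one consumer form a full edge.
    for topic in sorted(set(producers) & set(consumers)):
        for principal in sorted(producers[topic]):
            for group in sorted(consumers[topic]):
                edges.append((principal, topic, group))
    return edges


# Illustrative inputs; all names are made up.
write_acls = [
    ("User:checkout-service", "orders"),
    ("User:billing-service", "payments"),
]
group_topics = {
    "fraud-detector": ["orders", "payments"],
    "warehouse-loader": ["orders"],
}

edges = build_lineage(write_acls, group_topics)
for edge in edges:
    print(" -> ".join(edge))
```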

Setup

Connect Kafka in 4 steps

  1. Create a JAAS technical user

    Create a PlainLoginModule entry in kafka_server_jaas.conf for a dedicated user (e.g. dawiso_technical_user). Pass the file path to brokers via -Djava.security.auth.login.config.

  2. Grant read-only ACLs

    Use kafka-acls.sh to grant Describe and Read on topic '*' and Describe + DescribeConfigs on the consumer groups you want catalogued. Read-only end to end.

  3. Connect in Dawiso

    Provide bootstrap servers and credentials. The connection is validated against the Admin API in seconds.

  4. Run ingestion

    Scheduled incremental sync keeps topics and consumer groups current. Topic and ownership changes flag downstream consumers for impact review.
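
Steps 1 and 2 can be sketched as a small Python helper that prints the JAAS user option and the kafka-acls.sh commands to run. This is a sketch under stated assumptions: the broker address, the password, the group name and the `admin.properties` file are placeholders, and the helper functions are hypothetical. The operations are taken from the steps above; Kafka versions differ in which operations they accept for each resource type, so check the output of kafka-acls.sh on your cluster.

```python
import shlex
from typing import List


def jaas_user_option(user: str, password: str) -> str:
    """Option that defines one user inside the broker's PlainLoginModule entry."""
    return f'user_{user}="{password}"'


def acl_commands(
    bootstrap: str,
    principal: str,
    groups: List[str],
    command_config: str = "admin.properties",  # placeholder admin client config
) -> List[str]:
    """Build kafka-acls.sh command lines for the read-only grants in step 2."""
    base = [
        "kafka-acls.sh",
        "--bootstrap-server", bootstrap,
        "--command-config", command_config,
        "--add",
        "--allow-principal", principal,
    ]
    # Describe + Read on every topic.
    commands = [base + ["--operation", "Describe", "--operation", "Read", "--topic", "*"]]
    # Describe + DescribeConfigs on each consumer group to be catalogued.
    for group in groups:
        commands.append(
            base + ["--operation", "Describe", "--operation", "DescribeConfigs", "--group", group]
        )
    # shlex.join quotes the wildcard so the shell does not expand it.
    return [shlex.join(command) for command in commands]


user_line = jaas_user_option("dawiso_technical_user", "change-me")
commands = acl_commands("broker1:9092", "User:dawiso_technical_user", ["orders-app"])

print(user_line)
for command in commands:
    print(command)
```

The helper only prints text; nothing is applied to a cluster until an administrator reviews and runs the commands.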

Capabilities

What you get with the Kafka connector

  • Topic & cluster catalog

    Every topic across every cluster is searchable, with partition count, retention, owners and the team that built the producer.

  • Schemas via Confluent

    Running a Schema Registry? Avro, Protobuf and JSON schemas - with version history and diffs - are catalogued through the dedicated Confluent Kafka connector.

  • Producer to consumer lineage

    See which services publish to a topic and which downstream jobs read from it. Impact analysis works for streaming the same as for tables.

  • PII in events

    Classify a topic once. Dawiso flags every topic carrying email addresses, IBANs or government IDs across all clusters and environments.

  • Data contracts

    Promote a topic to a data contract. Block breaking changes before they ship and notify subscribers when the contract evolves.

  • Throughput & freshness

    Volume per topic, lag per consumer group and last-written timestamp surface inside the catalog, next to the owner.
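
For readers unfamiliar with the lag metric mentioned above: consumer-group lag is conventionally the gap between a partition's log end offset and the group's committed offset. The Python sketch below shows that arithmetic on made-up numbers. The `consumer_group_lag` helper is hypothetical, and it makes one simplifying assumption: a partition with no committed offset is counted as lagging by its full end offset.

```python
from typing import Dict


def consumer_group_lag(
    end_offsets: Dict[int, int],
    committed: Dict[int, int],
) -> Dict[int, int]:
    """Per-partition lag = log end offset - committed offset, floored at zero."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: no committed offset means nothing has been consumed yet.
        lag[partition] = max(end - committed.get(partition, 0), 0)
    return lag


# Illustrative offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 980, 2: 400}
committed = {0: 1450, 1: 980}

lag = consumer_group_lag(end_offsets, committed)
total_lag = sum(lag.values())
print(lag, total_lag)
```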

Business value

Why teams turn on the Kafka connector

  • 0

    Silent topic breakages

    Ownership, data contracts and consumer notifications stop an unannounced topic change from taking down three downstream services overnight.

  • −80%

    Time to find a topic

    New engineers stop pinging Slack to ask 'which topic has order events.' They search the Apache Kafka data catalog and read.

  • EU-grade

    Streaming governance

    PII flowing through Kafka is classified, auditable and policy-tracked the same way as data sitting in your warehouse.

Ready to catalog your Kafka?

Set up the connector in an afternoon. See your first lineage graph the same day.

Frequently asked questions

Does Kafka have a data catalog?
Kafka itself has none; Confluent adds Schema Registry and Stream Catalog. Dawiso reads your topic metadata in read-only mode and connects streaming data to the warehouses and BI tools downstream, so lineage and ownership span the whole stack. Schema Registry objects are catalogued through the separate Confluent Kafka connector.

What is metadata in Kafka?
Kafka metadata covers topics, partitions, consumer groups and offsets. Dawiso reads topic and consumer-group metadata in read-only mode, documents each topic with ownership and meaning, and traces which systems produce and consume it.

What permissions does Dawiso need in Kafka?
A dedicated SASL technical user (e.g. dawiso_technical_user) with Describe + Read ACLs on topic '*' and Describe + DescribeConfigs on the consumer groups you want catalogued. Dawiso never produces to your topics, and it consumes from them only if you opt in to topic sampling.

Does Dawiso consume from our topics?
No. Default mode is metadata-only via the Kafka Admin API. Topic sampling is opt-in per data source, runs only when you explicitly enable it, and never starts automatically.

How are producers and consumers linked?
From consumer-group metadata on the brokers, from broker ACLs, and from cross-platform lineage where the topic is the source or sink for an ingested dbt or warehouse object. Manual mappings can be curated in Dawiso.

Which Kafka distributions are supported?
Apache Kafka with SASL/PLAIN authentication. The connection form requires bootstrap servers and a JAAS-defined user. For Confluent Cloud and Confluent Platform use the separate Confluent Kafka connector, which speaks the Confluent REST endpoint and Stream Governance API.
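
For orientation, this is roughly what a generic Kafka client configuration for SASL/PLAIN looks like, rendered by a short Python helper. The property names are standard Kafka client settings; the helper, the broker address and the credentials are placeholders, and the source does not state that the Dawiso connection form accepts a properties file in this shape.

```python
def sasl_plain_client_properties(
    bootstrap_servers: str,
    username: str,
    password: str,
    use_tls: bool = True,
) -> str:
    """Render standard Kafka client properties for SASL/PLAIN authentication."""
    # SASL/PLAIN sends the password as-is, so TLS is the safer default.
    protocol = "SASL_SSL" if use_tls else "SASL_PLAINTEXT"
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{username}" password="{password}";'
    )
    lines = [
        f"bootstrap.servers={bootstrap_servers}",
        f"security.protocol={protocol}",
        "sasl.mechanism=PLAIN",
        f"sasl.jaas.config={jaas}",
    ]
    return "\n".join(lines)


# Placeholder values for illustration only.
props = sasl_plain_client_properties(
    "broker1:9092,broker2:9092", "dawiso_technical_user", "change-me"
)
print(props)
```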