Streaming · Event platform connector
The Kafka data catalog your whole team can trust.
The Dawiso Apache Kafka data catalog turns your clusters into a searchable inventory: every cluster, broker, topic and consumer group, with producer-to-consumer lineage.
First things first
What is a data connector?
A data connector is the bridge between a tool in your stack and the catalog that gives you a unified view of it. Once configured, a connector reaches into the source system on a schedule, reads its metadata (schemas, tables, dashboards, jobs, ownership, lineage) and represents it inside the catalog. Your actual rows and values stay where they are.
Connectors are what let a data catalog answer questions like "which Power BI dashboard depends on this Snowflake table?" or "who owns the orders topic in Kafka?" automatically, without anyone keeping a spreadsheet up to date.
Three properties separate a good connector from a brittle one: it should be read-only and safe, it should be incremental so a full re-scan isn't required for every refresh, and it should resolve lineage across system boundaries, not just inside one tool.
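The incremental property can be pictured as a diff between two metadata snapshots. Below is a minimal sketch. It assumes the connector keeps the previous scan's topic metadata and compares it with a fresh one. `TopicMeta` and `diff_snapshots` are hypothetical names used only for illustration and are not Dawiso's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicMeta:
    """Hypothetical metadata record for one topic; it holds no message payloads."""
    name: str
    partitions: int
    retention_ms: int
    owner: str


def diff_snapshots(previous, current):
    """Compare two {topic name: TopicMeta} snapshots from consecutive scans.

    Only added, removed or changed topics need to be written to the catalog,
    so unchanged topics cost nothing on a refresh.
    """
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(
        name for name in current.keys() & previous.keys()
        if current[name] != previous[name]  # dataclass equality compares every field
    )
    return {"added": added, "removed": removed, "changed": changed}


before = {
    "orders": TopicMeta("orders", 6, 604_800_000, "checkout-team"),
    "clicks": TopicMeta("clicks", 12, 86_400_000, "web-team"),
}
after = {
    "orders": TopicMeta("orders", 12, 604_800_000, "checkout-team"),  # partitions changed
    "clicks": TopicMeta("clicks", 12, 86_400_000, "web-team"),
    "payments": TopicMeta("payments", 3, 604_800_000, "finance-team"),
}
print(diff_snapshots(before, after))
# {'added': ['payments'], 'removed': [], 'changed': ['orders']}
```

On this refresh, only `payments` and `orders` would be written. `clicks` is skipped entirely.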
About the platform
What is Apache Kafka?
Apache Kafka is the open-source event streaming platform that moves data between systems in real time. Banks use it for transactions, retailers for clickstreams, manufacturers for sensor telemetry. Common pairings include a Schema Registry for Avro or Protobuf, Kafka Connect, and downstream warehouses like Snowflake or BigQuery.
Running Kafka is the easy part. Knowing what flows through hundreds of topics, who owns them, and which schemas evolved last night is the hard part: producers ship, consumers break, no catalog tells you why. That's where the Dawiso Apache Kafka data catalog comes in: read-only, metadata-only, and cross-platform.
Architecture
How Dawiso connects to Kafka
A dedicated read-only technical user on the Kafka side. The Dawiso scanner pulls metadata on a schedule. Everything ends up in your catalog, business-readable.
Source
Kafka cluster
- Brokers & cluster config
- Topics & partitions
- Producers & consumers
- ACLs & configs
Dawiso scanner
Read-only metadata
- Schema & object discovery
- Dependency resolution
- SQL flow parsing (optional)
- Sampling (opt-in only)
Catalog
Dawiso platform
- Searchable metadata
- Lineage & ownership
- Business glossary
- Policy & classifications
Connection details
- Protocol: Kafka Admin API
- Authentication: SASL/PLAIN with JAAS file · dedicated technical user · Describe + Read ACLs
- Lineage: producer-to-topic-to-consumer relationships resolved from consumer-group metadata and broker ACLs; cross-platform downstream lineage stitched with warehouse and dbt sources
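Here is one plausible way to turn ACLs and consumer-group metadata into a lineage graph. The sketch makes two assumptions. A principal with an allowed Write on a topic counts as its producer. A consumer group with committed offsets on a topic counts as its consumer. It is a sketch under those assumptions, not Dawiso's actual resolution logic. `AclBinding`, `build_lineage` and `downstream` are hypothetical names. Cross-platform edges into warehouse or dbt assets would extend the same graph and are omitted here.

```python
from collections import defaultdict, deque
from typing import NamedTuple


class AclBinding(NamedTuple):
    """Simplified, hypothetical view of one allow ACL on a topic."""
    principal: str
    operation: str
    topic: str


def build_lineage(acls, group_topics):
    """Return a directed graph {node: set of downstream nodes}.

    acls: AclBinding items; an allowed Write on a topic marks the principal as a producer.
    group_topics: {group_id: topics the group has committed offsets for}.
    """
    graph = defaultdict(set)
    for acl in acls:
        if acl.operation == "Write":
            graph[f"producer:{acl.principal}"].add(f"topic:{acl.topic}")
    for group, topics in group_topics.items():
        for topic in topics:
            graph[f"topic:{topic}"].add(f"group:{group}")
    return graph


def downstream(graph, start):
    """Breadth-first walk returning every node reachable from `start` (the impact set)."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


acls = [
    AclBinding("User:checkout-svc", "Write", "orders"),
    AclBinding("User:dawiso_technical_user", "Read", "orders"),  # read-only, not a producer
]
group_topics = {"billing": ["orders"], "fraud-detector": ["orders", "payments"]}
graph = build_lineage(acls, group_topics)
print(sorted(downstream(graph, "producer:User:checkout-svc")))
# ['group:billing', 'group:fraud-detector', 'topic:orders']
```

The same breadth-first walk is what impact analysis needs. Starting from a changed topic, it yields every consumer group that should be notified.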
Setup
Connect Kafka in 4 steps
- 01
Create a JAAS technical user
Create a PlainLoginModule entry in kafka_server_jaas.conf for a dedicated user (e.g. dawiso_technical_user). Pass the file path to each broker via the JVM option -Djava.security.auth.login.config.
- 02
Grant read-only ACLs
Use kafka-acls.sh to grant Describe and Read on topic '*' and Describe + DescribeConfigs on the consumer groups you want catalogued. Read-only end to end.
- 03
Connect in Dawiso
Provide bootstrap servers and credentials. The connection is validated against the Admin API in seconds.
- 04
Run ingestion
Scheduled incremental sync keeps topics and consumer groups current. Topic and ownership changes flag downstream consumers for impact review.
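The setup material for steps 01 and 02 can be generated rather than typed by hand. The sketch below does two things. It renders a SASL/PLAIN `KafkaServer` section in the format the Apache Kafka documentation uses for `kafka_server_jaas.conf`. It also builds `kafka-acls.sh` command lines carrying the grants listed in step 02. The function names, the broker credentials, the `broker1:9092` address and the group name are hypothetical placeholders. The operations are copied from step 02, so check them against the ACL operations your Kafka version supports for each resource type.

```python
import shlex


def jaas_kafka_server_section(broker_user, broker_password, users):
    """Render a SASL/PLAIN KafkaServer section for kafka_server_jaas.conf.

    username/password are the credentials the broker itself uses for
    inter-broker connections; every user_<name>="<password>" option defines
    an account the broker accepts, including the broker's own.
    """
    lines = [
        "KafkaServer {",
        "    org.apache.kafka.common.security.plain.PlainLoginModule required",
        f'    username="{broker_user}"',
        f'    password="{broker_password}"',
        f'    user_{broker_user}="{broker_password}"',
    ]
    lines += [f'    user_{name}="{password}"' for name, password in users.items()]
    lines[-1] += ";"  # the last module option is terminated with a semicolon
    lines.append("};")
    return "\n".join(lines)


def read_only_acl_commands(bootstrap, user, groups):
    """Build kafka-acls.sh command lines mirroring the grants in step 02."""
    base = ["kafka-acls.sh", "--bootstrap-server", bootstrap, "--add",
            "--allow-principal", f"User:{user}"]
    commands = [base + ["--operation", "Describe", "--operation", "Read",
                        "--topic", "*"]]
    for group in groups:
        commands.append(base + ["--operation", "Describe",
                                "--operation", "DescribeConfigs", "--group", group])
    return [shlex.join(cmd) for cmd in commands]  # shlex.join quotes '*' for the shell


print(jaas_kafka_server_section("admin", "admin-secret",
                                {"dawiso_technical_user": "change-me"}))
for line in read_only_acl_commands("broker1:9092", "dawiso_technical_user",
                                   ["orders-consumers"]):
    print(line)
```

Brokers typically pick up the JAAS file through the `KAFKA_OPTS` environment variable, e.g. `KAFKA_OPTS=-Djava.security.auth.login.config=/path/to/kafka_server_jaas.conf`. On a SASL-secured cluster, `kafka-acls.sh` itself usually needs admin client credentials, passed with `--command-config`.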
Capabilities
What you get with the Kafka connector
-
Topic & cluster catalog
Every topic across every cluster is searchable, with partition count, retention, owners and the team that built the producer.
-
Schemas via Confluent
Running a Schema Registry? Avro, Protobuf and JSON schemas, with version history and diffs, are catalogued through the dedicated Confluent Kafka connector.
-
Producer to consumer lineage
See which services publish to a topic and which downstream jobs read from it. Impact analysis works for streams just as it does for tables.
-
PII in events
Classify a topic once. Dawiso flags every topic carrying email, IBAN or government IDs across all clusters and environments.
-
Data contracts
Promote a topic to a data contract. Block breaking changes before they ship and notify subscribers when the contract evolves.
-
Throughput & freshness
Volume per topic, lag per consumer group and last-written timestamp surface inside the catalog, next to the owner.
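In Kafka, a consumer group's lag on a partition is the log-end offset minus the group's committed offset. The group's total lag is the sum over its partitions. Below is a minimal sketch of that arithmetic plus a freshness age derived from a last-written timestamp. It assumes the offsets were already fetched. With the Java Admin API, for example, they could come from `listOffsets` with `OffsetSpec.latest()` and `listConsumerGroupOffsets`. The function names are hypothetical.

```python
from datetime import datetime, timezone


def consumer_group_lag(end_offsets, committed):
    """Total lag of one consumer group across the partitions it has committed.

    end_offsets, committed: {(topic, partition): offset}. Per partition, lag is
    the log-end offset minus the committed offset; partitions without a commit
    are not counted, since their lag is undefined.
    """
    return sum(
        max(end_offsets[tp] - offset, 0)
        for tp, offset in committed.items()
        if tp in end_offsets  # ignore partitions whose end offset was not fetched
    )


def seconds_since_last_write(last_written_ms, now=None):
    """Age of the newest record, given its timestamp in epoch milliseconds."""
    now = now or datetime.now(timezone.utc)
    last = datetime.fromtimestamp(last_written_ms / 1000, tz=timezone.utc)
    return (now - last).total_seconds()


end = {("orders", 0): 1500, ("orders", 1): 1420}
done = {("orders", 0): 1450, ("orders", 1): 1420}
print(consumer_group_lag(end, done))  # 50
now = datetime.fromtimestamp(1_700_000_090, tz=timezone.utc)
print(seconds_since_last_write(1_700_000_000_000, now))  # 90.0
```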
Business value
Why teams turn on the Kafka connector
- 0
Silent topic breakages
Ownership, data contracts and consumer notifications stop an unannounced topic change from taking down three downstream services overnight.
- −80%
Time to find a topic
New engineers stop pinging Slack to ask "which topic has order events?" They search the Apache Kafka data catalog instead.
- EU-grade
Streaming governance
PII flowing through Kafka is classified, auditable and policy-tracked the same way as data sitting in your warehouse.
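Classification can run on schema metadata alone, which keeps the connector metadata-only. The minimal sketch below assumes field names are available from topic schemas, for example those catalogued via the Confluent connector. It flags topics across clusters using hypothetical name-matching rules. `PII_RULES`, `classify_fields` and `flag_topics` are illustrative names, not Dawiso's API. Real rules would be configurable and broader, and name matching alone cannot see PII hidden in free-text fields.

```python
import re

# Hypothetical field-name rules for the PII types named above.
PII_RULES = {
    "email": re.compile(r"e_?mail", re.IGNORECASE),
    "iban": re.compile(r"iban", re.IGNORECASE),
    "government_id": re.compile(r"ssn|national_?id|passport|tax_?id", re.IGNORECASE),
}


def classify_fields(field_names):
    """Return the PII tags whose rule matches at least one field name."""
    return {tag for tag, rule in PII_RULES.items()
            if any(rule.search(name) for name in field_names)}


def flag_topics(schemas):
    """schemas: {(cluster, topic): [field names]} -> {(cluster, topic): PII tags}.

    Only schema metadata is inspected, so no message payloads are read.
    """
    flagged = {}
    for key, fields in schemas.items():
        tags = classify_fields(fields)
        if tags:
            flagged[key] = tags
    return flagged


schemas = {
    ("prod-eu", "customers"): ["customer_id", "email", "iban"],
    ("staging", "customers"): ["customer_id", "eMail"],
    ("prod-eu", "clicks"): ["session_id", "url"],
}
for (cluster, topic), tags in flag_topics(schemas).items():
    print(cluster, topic, sorted(tags))
```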
Ready to catalog your Kafka?
Set up the connector in an afternoon. See your first lineage graph the same day.
Frequently asked questions
Does Kafka have a data catalog?
What is metadata in Kafka?
What permissions does Dawiso need in Kafka?
Does Dawiso consume from our topics?
How are producers and consumers linked?
Which Kafka distributions are supported?
Explore more connectors
Kafka is one of 30+ connectors. Bring your whole stack into the catalog.
- Data Warehouse · Snowflake
- Data Lakehouse · Databricks
- Business Intelligence · Power BI
- Business Intelligence · Tableau
- Data Warehouse · Google BigQuery
- Data Warehouse · Amazon Redshift