
Data Lakehouse connector

The Databricks data catalog your whole team can trust.

The Dawiso Databricks connector turns your workspace into a searchable data catalog: every Unity Catalog schema, table, view and volume, with table and column lineage to every BI dashboard downstream.

Live connector · Stable connector
Databricks
Dawiso
Metadata-only · your data never leaves the source
Type: Data lakehouse platform
Auth: Personal Access Token (PAT)
Sync: Scheduled, incremental
Direction: Read-only · metadata

First things first

What is a data connector?

Metadata-only · Read-only access · Incremental sync · Cross-system lineage

A data connector is the bridge between a tool in your stack and the catalog that gives you a unified view of it. Once a connector is configured, it reaches into the source system on a schedule, reads out the metadata - schemas, tables, dashboards, jobs, ownership, lineage - and represents it inside the catalog. Your actual rows and values stay where they are.

Connectors are the reason a data catalog can answer questions like "which Power BI dashboard depends on this Snowflake table?" or "who owns the orders topic in Kafka?" - automatically, without anyone keeping a spreadsheet up to date.

Three properties separate a good connector from a brittle one: it should be read-only and safe, it should be incremental so a full re-scan isn't required for every refresh, and it should resolve lineage across system boundaries, not just inside one tool.
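
As a rough illustration of the incremental property, the sketch below keeps a watermark from the previous run and upserts only objects changed since then. `ObjectMeta`, its `updated_at` field and `incremental_sync` are hypothetical names chosen for this example; they are not Dawiso's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ObjectMeta:
    """Hypothetical metadata record for one source object (no row data)."""
    full_name: str
    updated_at: datetime


def incremental_sync(source, catalog, watermark):
    """Upsert only objects changed after `watermark` into `catalog`.

    Returns the new watermark to store for the next scheduled run.
    """
    newest = watermark
    for obj in source:
        if obj.updated_at > watermark:
            catalog[obj.full_name] = obj  # metadata only, never rows
            if obj.updated_at > newest:
                newest = obj.updated_at
    return newest


def utc(day):
    """Helper for the example: a UTC timestamp in January 2024."""
    return datetime(2024, 1, day, tzinfo=timezone.utc)


source = [
    ObjectMeta("main.sales.orders", utc(1)),
    ObjectMeta("main.sales.customers", utc(5)),
]
catalog = {}
new_watermark = incremental_sync(source, catalog, watermark=utc(3))
print(sorted(catalog), new_watermark.date())
```

Only the object modified after the stored watermark is re-read, which is what keeps a scheduled refresh cheap compared with a full re-scan.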

About the platform

What is Databricks?

Databricks is the data lakehouse platform that combines warehousing, data engineering and AI on one foundation. Teams at retailers, banks and pharma companies run their analytics, ML pipelines and GenAI workloads on Delta Lake tables governed by Unity Catalog.

Unity Catalog covers what lives inside Databricks. The hard part is what doesn't: the upstream Snowflake table or Kafka topic that fed the table, the Power BI report consuming it downstream, the business glossary the data steward owns. That's where the Dawiso Databricks data catalog comes in: read-only, metadata-only, and cross-platform.

Architecture

How Dawiso connects to Databricks

A small read-only role on the Databricks side. The Dawiso scanner pulls metadata on a schedule. Everything ends up in your catalog, business-readable.

Source

Databricks workspace

  • Unity Catalog metastore
  • Delta tables, views
  • Volumes
  • Table & column lineage
REST · JDBC

Dawiso scanner

Read-only metadata

  • Schema & object discovery
  • Dependency resolution
  • SQL flow parsing (optional)
  • Sampling on opt-in
Internal

Catalog

Dawiso platform

  • Searchable metadata
  • Lineage & ownership
  • Business glossary
  • Policy & classifications

Connection details

Protocol: Databricks SQL + Unity Catalog REST API
Authentication: Personal Access Token; service principal supported
Lineage: Unity Catalog system tables (system.access.table_lineage and system.access.column_lineage) provide table- and column-level lineage.
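
To make the protocol and authentication rows concrete, here is a minimal sketch of a metadata-only call against the Unity Catalog REST API using a bearer token. The page does not say which endpoints Dawiso actually calls, so the choice of the tables listing endpoint is an assumption; the workspace URL, token and `build_list_tables_request` helper are placeholders. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_list_tables_request(workspace_url, token, catalog_name, schema_name):
    """Build (but do not send) a GET request listing tables in one schema."""
    query = urlencode({"catalog_name": catalog_name, "schema_name": schema_name})
    url = f"{workspace_url.rstrip('/')}/api/2.1/unity-catalog/tables?{query}"
    # A PAT is passed as a standard bearer token.
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


req = build_list_tables_request(
    "https://example.cloud.databricks.com",  # placeholder workspace URL
    "dapi-EXAMPLE",  # placeholder token; never hard-code a real one
    "main",
    "sales",
)
print(req.get_method(), req.full_url)
```

The response to such a call describes tables and columns, not their contents, which is consistent with the metadata-only claim above.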

Setup

Connect Databricks in 4 steps

  1. Provision a service principal

    Create a Databricks service principal with USE_CATALOG and SELECT on every catalog Dawiso should read. Grant it SELECT on the system.access.* lineage tables.

  2. Generate a PAT

    Generate a Personal Access Token for the service principal. Store it in Dawiso's connection vault - it never leaves the platform.

  3. Connect and pick catalogs

    Add the workspace URL and PAT in Dawiso. Choose which Unity Catalog catalogs to ingest, entered as one comma-separated list.

  4. Run ingestion

    Scheduled incremental sync keeps the catalog current. Column-level lineage is resolved from Unity Catalog on premium and enterprise Databricks tiers.
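
The comma-separated catalog list from step 3 is easy to get subtly wrong (stray spaces, trailing commas, duplicates). The sketch below shows one way such input could be normalized before it is saved; `parse_catalog_list` is a hypothetical helper, not part of Dawiso.

```python
def parse_catalog_list(raw):
    """Split a comma-separated catalog list, dropping blanks and duplicates.

    Order of first appearance is preserved.
    """
    catalogs = []
    for name in raw.split(","):
        name = name.strip()
        if name and name not in catalogs:
            catalogs.append(name)
    return catalogs


print(parse_catalog_list(" main, analytics ,,main"))
```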

Capabilities

What you get with the Databricks connector

  • Unity Catalog mirror

    Every catalog, schema, Delta table and view is searchable, with column descriptions, types, tags and the job that built it.

  • Column-level lineage

    Cross-platform lineage from a Power BI visual through Databricks DLT pipelines down to the raw bronze table.

  • Unity Catalog tags in context

    Object tags assigned in Unity Catalog are read into Dawiso, so masking and row-filter context stays visible alongside the catalog. Read-only, metadata-only.

  • PII classification

    Classify a column once. Dawiso flags every Databricks column carrying email addresses, IBANs or government IDs across all catalogs.

  • Ownership & certification

    Mark tables as certified, deprecated or under review. The owner is visible directly in the catalog and on the Databricks side.

Business value

Why teams turn on the Databricks connector

  • -70%

    Fewer 'which table?' pings

    Analysts find the certified gold Delta table in Dawiso instead of pinging the data team to ask which silver table maps to revenue.

  • 8x

    Faster impact analysis

    Before altering a Databricks column, see exactly which jobs, BI reports and ML features depend on it. Hours, not days.

  • Audit-ready

    GDPR & DORA evidence

    Sensitive columns are classified once and the policy follows them through Delta tables, joins and downstream BI, with a full audit trail.
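
Impact analysis of the kind described above boils down to walking lineage edges downstream from the object you plan to change. The following is a minimal sketch using breadth-first search over (source, target) pairs; the edge list and node names are made up for illustration and do not reflect Dawiso's internal model.

```python
from collections import deque


def downstream(edges, start):
    """Return every node reachable from `start` via lineage edges (BFS)."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in children.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


# Hypothetical column-level edges: bronze -> silver -> gold -> BI report.
edges = [
    ("main.bronze.orders_raw.amount", "main.silver.orders.amount"),
    ("main.silver.orders.amount", "main.gold.revenue.total"),
    ("main.gold.revenue.total", "powerbi:Revenue Dashboard"),
    ("main.silver.customers.email", "main.gold.crm.email"),
]
print(sorted(downstream(edges, "main.silver.orders.amount")))
```

Everything returned by the traversal is a candidate for breakage if the starting column is altered; unrelated branches are not visited.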

Ready to catalog your Databricks?

Set up the connector in an afternoon. See your first lineage graph the same day.

Frequently asked questions

What is a data catalog in Databricks?
Databricks uses Unity Catalog to govern data inside the platform. Dawiso complements it with a business-facing, cross-platform catalog: it ingests Unity Catalog metadata read-only and connects tables to BI and source systems with column-level lineage.
What is the difference between Databricks catalog and metastore?
In Databricks the metastore is the top-level container; a catalog is the first level of Unity Catalog's three-level namespace (catalog > schema > table). Dawiso ingests Unity Catalog metadata with read-only access and adds cross-platform lineage and a glossary that extend beyond Databricks.
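
For illustration, the three-level namespace can be modeled as a simple parsed reference. `TableRef` and `parse_full_name` are hypothetical helpers; the sketch assumes plain identifiers and does not handle backtick-quoted names that contain dots.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    catalog: str
    schema: str
    table: str


def parse_full_name(full_name):
    """Split 'catalog.schema.table' into its three namespace levels."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return TableRef(*parts)


ref = parse_full_name("main.sales.orders")
print(ref.catalog, ref.schema, ref.table)
```
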
What is a data catalog used for?
A data catalog makes every Databricks table discoverable, documented and trustworthy. Dawiso turns Unity Catalog metadata into one searchable catalog the whole business can use, with lineage back to source.
What permissions does Dawiso need in Databricks?
A dedicated service principal with USE_CATALOG and SELECT on every catalog Dawiso reads, plus SELECT on the system.access.* schema for lineage. Dawiso never modifies your data.
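
The permissions above could be expressed as Databricks SQL GRANT statements along the following lines. This is a sketch under assumptions: the principal identifier is a placeholder, `grant_statements` is a hypothetical helper, and the USE SCHEMA grants are an addition not listed on this page (Unity Catalog generally requires USE SCHEMA on the parent schema before objects in it can be read). Check the exact privilege set against your workspace before running anything.

```python
def grant_statements(principal, catalogs):
    """Generate read-only GRANT statements for a metadata scanner."""
    stmts = []
    for catalog in catalogs:
        stmts.append(f"GRANT USE CATALOG ON CATALOG `{catalog}` TO `{principal}`")
        # Assumption: USE SCHEMA is not named on the page but is normally needed.
        stmts.append(f"GRANT USE SCHEMA ON CATALOG `{catalog}` TO `{principal}`")
        stmts.append(f"GRANT SELECT ON CATALOG `{catalog}` TO `{principal}`")
    # Lineage lives in the system catalog, under the access schema.
    stmts.append(f"GRANT USE CATALOG ON CATALOG system TO `{principal}`")
    stmts.append(f"GRANT USE SCHEMA ON SCHEMA system.access TO `{principal}`")
    stmts.append(f"GRANT SELECT ON SCHEMA system.access TO `{principal}`")
    return stmts


for stmt in grant_statements("sp-application-id", ["main"]):  # placeholder principal
    print(stmt + ";")
```

None of these privileges allow writes, which matches the read-only posture described in the answer.
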
Does Dawiso copy our Databricks data?
No. Dawiso queries Unity Catalog metadata APIs and the system.access schema for metadata only. Row-level data stays inside Databricks. Column profiling and sampling are opt-in per data source.
How is column-level lineage built?
Object and column dependencies come from the system.access.table_lineage and system.access.column_lineage system tables. Column-level lineage is available on premium and enterprise Databricks plans.
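
As an example of what reading those system tables can look like, the sketch below builds a query for recent column-level lineage edges landing in one catalog. The query shape is an assumption for illustration, not the query Dawiso runs; `column_lineage_query` is a hypothetical helper, and executing the SQL requires a Databricks SQL warehouse with the grants described earlier.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def column_lineage_query(catalog, days=30):
    """Build a SQL query for recent column lineage targeting `catalog`."""
    if not _IDENTIFIER.match(catalog):
        # Refuse anything that is not a plain identifier to avoid SQL injection.
        raise ValueError(f"unexpected catalog name: {catalog!r}")
    days = int(days)
    return (
        "SELECT source_table_full_name, source_column_name,\n"
        "       target_table_full_name, target_column_name, event_time\n"
        "FROM system.access.column_lineage\n"
        f"WHERE target_table_catalog = '{catalog}'\n"
        f"  AND event_date >= date_sub(current_date(), {days})"
    )


print(column_lineage_query("main", days=7))
```
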
Does Dawiso work with workspace-level metastores?
Unity Catalog (account-level metastore) is the supported path. Legacy workspace metastores are read in compatibility mode without lineage.