Skip to main content
datahub pricingdatahub cloud pricingopen coreopen source data catalogtotal cost of ownership

DataHub Pricing: The Open-Core Model and What It Really Costs

DataHub Core is free to license, and that single fact misleads more catalog evaluations than any other. Yes, the open-source project is Apache 2.0 with no license fee. But running it in production is a real platform-engineering bill, and the open-core model keeps part of the feature set behind the paid DataHub Cloud, so "free" describes the license on the core, not the cost of the product. The useful question is not whether DataHub is free; it is what it actually costs to run and to get the features you need.

This guide separates three things teams blur together: the price of the software (the core is free), the total cost of operating it (a real platform-engineering bill), and the open-core boundary that decides which features require a paid subscription. Understanding all three is the only way to compare DataHub honestly against managed alternatives.

TL;DR

DataHub Core is free under Apache 2.0, but two costs hide behind that. First, self-hosting a multi-service stack (Kafka, Elasticsearch, the metadata service, the metadata store, and the frontend) is a real platform-engineering project, typically 6 to 12 weeks to production plus ongoing maintenance. Second, the open-core model puts parts of the governance lifecycle and enterprise features in paid DataHub Cloud, priced through sales rather than a public list. "Free core" is genuine; "everything is free" is not. Compare with OpenMetadata pricing.

Is DataHub Free?

Yes, but only the license on the core, which is the smallest part of the cost. DataHub Core is Apache 2.0 with no per-user fee, so the software itself is free to download and self-host. The catch is everything around it: the infrastructure and people needed to run a multi-service stack in production, and the enterprise and governance features that sit in the paid DataHub Cloud rather than the free core. So "DataHub is free" is true of the license and misleading about the product.

The real number is total cost of ownership plus whatever cloud features you end up needing, and the open-core boundary decides how much of that second part applies to you.

The Open-Core Catch

DataHub uses an open-core business model. DataHub Core is the genuinely open foundation. DataHub Cloud is the commercial edition built on that same foundation, adding managed hosting, SLA-backed uptime, advanced governance automation, and managed AI and agent features.

The structural point to understand is that the vendor decides where the line sits between the free core and the paid cloud, and that line is theirs to move, not yours. In practice this means certain governance-lifecycle capabilities, such as approval and certification workflows, no-code automations, and managed delivery of context to AI agents, are positioned as cloud features rather than core ones. A team can invest months building on the open-source core and then discover that a capability they assumed was included sits on the paid side of the boundary.

This is not unique to DataHub; it is the nature of open core. The practical defense is to evaluate by feature, not by label: list the capabilities you actually need, then confirm which edition contains each one before you commit engineering effort. Treat "open source" as a statement about the core, not a guarantee about the whole product.

DataHub Open-Core: Free Core vs Paid Cloud THE VENDOR DECIDES WHERE THE LINE SITS DATAHUB CORE free, open source (Apache 2.0) Catalog and search Column-level lineage Business glossary (core) Self-hosted, you run it DATAHUB CLOUD paid, priced through sales Approval and certification workflows No-code governance automations Managed AI and MCP delivery Managed hosting and SLA uptime Evaluate by feature, not by label: confirm which edition contains each capability you need.
Click to enlarge

The Real Cost of Self-Hosting

Self-hosting DataHub Core means your team owns a multi-service stack. The recurring costs fall into three buckets.

Cloud infrastructure. DataHub runs Kafka for the event stream, Elasticsearch for search and the graph index, a relational metadata store, the metadata service, and the frontend. Each needs compute and storage, sized for your metadata volume and uptime expectations. This is a heavier footprint than a single-server catalog, so the infrastructure line item is correspondingly larger.

Engineering time to deploy and integrate. Standing up the stack, configuring connectors, wiring authentication, and tuning Kafka and Elasticsearch is a platform-engineering project. Time to production is commonly in the range of 6 to 12 weeks for a self-hosted deployment, depending on scale and team experience.

Ongoing maintenance and ownership. The cost teams underestimate most. Someone has to handle upgrades across multiple services, security patches, scaling, backups, monitoring, and incidents, and keep connectors working as sources change. With a multi-service architecture, this ownership burden is continuous and is usually the largest cost over a multi-year horizon.

None of these are license fees, which is the recurring theme of open-source catalogs: the software is free, the operation is not.

DataHub Cloud Pricing

DataHub Cloud is the managed, commercial edition. It removes the operational burden of self-hosting and adds the enterprise and governance features positioned on the cloud side of the open-core line.

On pricing, DataHub Cloud follows the enterprise pattern rather than publishing a fixed public price list: plans are arranged through sales, with custom pricing based on scale and requirements, and listings are also available through cloud marketplaces. In practice you contact the vendor for a quote. Buyers should clarify how pricing scales, since metered models tied to the number of tables or assets can grow with your estate, and ask which specific features are included at the tier being quoted.

The trade is straightforward in shape: you exchange the engineering and operational cost of self-hosting for a subscription, and you gain the cloud-only features. Whether it is worth it depends on which features you need and how your costs scale as your data estate grows.

Why Teams Re-Evaluate

The combination of a heavy self-hosted stack and an open-core feature boundary leads some teams to re-evaluate their catalog. The pattern is familiar: a team adopts the open-source core, invests real effort building glossary content, lineage, and governance on top of it, and over time finds that the total cost of ownership and the features gated to the paid cloud no longer fit their plan.

When that happens, the priorities for a replacement are usually the same three things: migrate the work already done without losing it, get predictable total cost rather than an open-ended one, and avoid having core governance features sit behind a higher tier. This is why a number of teams running open-core catalogs now look at fully managed vendors, including Dawiso, that can take on existing work and ship the same feature set on every plan.

How Dawiso Compares on Cost

Dawiso approaches the same problem with a managed model and transparent, per-seat pricing, and the relevant comparison is total cost of ownership rather than license versus license.

With a self-hosted open-core catalog, the recurring spend is your platform team plus infrastructure, and some capabilities require the paid cloud tier on top. With Dawiso, there is no multi-service stack to run, because the platform is managed in every deployment option, and there is no open-core upgrade tax, because the same feature set ships on every plan. The product is built for fast adoption by business users through a business glossary and data catalog designed for them, and governed context is served to AI agents through the Model Context Protocol (MCP) as a managed capability. Excel import and export and an open metadata model also make it practical to migrate existing glossary and catalog work rather than rebuild it.

The honest summary: DataHub Core can be the lower-cost option for engineering-led teams who already operate platform infrastructure and need only core features, while a managed catalog like Dawiso often wins on total cost once you account for operating a multi-service stack, the features gated to the paid cloud, and the people and time required to drive adoption.

Conclusion

DataHub's pricing has a simple headline and a more useful reality. The headline: DataHub Core is free under Apache 2.0. The reality: running it is a real platform-engineering cost, and the open-core model places part of the governance lifecycle in paid DataHub Cloud, priced through sales. Budget for total cost of ownership, evaluate by feature rather than by the "open source" label, and weigh self-hosting against managed options before deciding. For a side-by-side view, see the Dawiso vs DataHub comparison, and for the other major open-source option, see OpenMetadata pricing.

See it in action

Data & Analytics Catalog

Create a unified view of your data assets and gain insights faster with automated data discovery.