Data lineage for every pipeline

Use Marquez to collect, aggregate, and visualize metadata about your data pipelines and applications.

What is Marquez?

Marquez is an open source metadata service. It maintains data provenance, shows how datasets are consumed and produced, provides global visibility into job runtimes, and centralizes dataset lifecycle management.

Marquez was originally developed and open sourced by WeWork.

What does Marquez do?

Real-time metadata collection

Marquez is a metadata server, offering an OpenLineage-compatible endpoint for real-time collection of information from running jobs and applications.

As the reference implementation of OpenLineage, the Marquez API server works with every OpenLineage integration developed by the community, including those for Apache Airflow, Apache Spark, dbt, Dagster, and Great Expectations.
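To make the collection step concrete, here is a minimal sketch of a job reporting itself to Marquez by POSTing an OpenLineage run event. It assumes a Marquez API server at `http://localhost:5000` exposing the `/api/v1/lineage` endpoint. The producer URI, the schema version in `schemaURL`, and the helper names `build_run_event` and `send_event` are illustrative assumptions, not part of Marquez itself. In practice, the community integrations listed above emit these events for you.

```python
import json
import uuid
import urllib.request
from datetime import datetime, timezone

# Assumption: a Marquez server running locally on its usual port.
MARQUEZ_LINEAGE_URL = "http://localhost:5000/api/v1/lineage"
# Hypothetical URI identifying the code that produced the event.
PRODUCER = "https://example.com/my-pipeline"
# Assumption: OpenLineage spec version; adjust to the version you target.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent"


def build_run_event(event_type, namespace, job_name, inputs, outputs, run_id=None):
    """Build a minimal OpenLineage RunEvent as a plain dict.

    For simplicity, input and output datasets share the job's namespace here;
    real pipelines usually use a namespace per data source.
    """
    return {
        "eventType": event_type,  # e.g. "START", "COMPLETE", "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": [{"namespace": namespace, "name": name} for name in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


def send_event(event, url=MARQUEZ_LINEAGE_URL):
    """POST one event as JSON to the lineage endpoint and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A job would typically send a `START` event when it begins and a `COMPLETE` (or `FAIL`) event with the same `run_id` when it ends, for example `build_run_event("START", "my-namespace", "daily_orders", ["raw.orders"], ["analytics.orders"], run_id=rid)`. Reusing the run ID is what lets Marquez stitch the two events into one run.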

Unified visual graph

Through a web user interface, Marquez provides a visual map of the complex interdependencies within your data ecosystem.

The user interface allows you to browse the metadata within Marquez, making it easy to see the inputs and outputs of each job, trace the lineage of individual datasets, and study performance metrics and execution details.

Flexible Lineage API

Lineage metadata can be queried through the Lineage API, which makes it possible to automate key tasks such as backfills and root cause analysis.

With the Lineage API, you can easily traverse the dependency tree and establish context for datasets across multiple pipelines and orchestration platforms. This can be used to enrich data catalogs and data quality systems.
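As an illustration of traversing the dependency tree, the sketch below requests a lineage graph for one dataset and walks it downstream (which jobs and datasets a backfill would touch) or upstream (where to look during root cause analysis). It assumes the `GET /api/v1/lineage?nodeId=...&depth=...` endpoint, node IDs of the form `dataset:<namespace>:<name>` or `job:<namespace>:<name>`, and a response shaped as `{"graph": [{"id", "inEdges", "outEdges"}, ...]}` with edges carrying `origin` and `destination`; check these against the Marquez API reference for your version. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from collections import deque

# Assumption: a Marquez server running locally on its usual port.
MARQUEZ_API = "http://localhost:5000/api/v1"


def lineage_url(node_id, depth=20, base=MARQUEZ_API):
    """Build the lineage query URL for a node such as 'dataset:ns:name'."""
    query = urllib.parse.urlencode({"nodeId": node_id, "depth": depth})
    return f"{base}/lineage?{query}"


def fetch_lineage(node_id, depth=20):
    """Fetch and decode the lineage graph around one node (needs a live server)."""
    with urllib.request.urlopen(lineage_url(node_id, depth)) as response:
        return json.load(response)


def traverse(lineage, start, direction="downstream"):
    """Breadth-first walk of the lineage graph from `start`.

    Returns the sorted IDs of every node reachable in the given direction,
    excluding `start` itself.
    """
    nodes = {node["id"]: node for node in lineage["graph"]}
    if direction == "downstream":
        edge_key, end_key = "outEdges", "destination"
    else:
        edge_key, end_key = "inEdges", "origin"

    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for edge in nodes.get(current, {}).get(edge_key, []):
            neighbor = edge[end_key]
            if neighbor != start and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sorted(seen)
```

For a backfill, you might keep only the `job:` entries from `traverse(fetch_lineage(node), node)` to get the jobs that need rerunning; for root cause analysis, `direction="upstream"` lists the datasets and jobs that feed the node in question.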