Use Marquez to collect, aggregate, and visualize metadata about your data pipelines and applications.
Marquez is an open source metadata service. It maintains data provenance, shows how datasets are consumed and produced, provides global visibility into job runtimes, and centralizes dataset lifecycle management.
Marquez was released and open sourced by WeWork.
Marquez is a metadata server, offering an OpenLineage-compatible endpoint for real-time collection of information from running jobs and applications.
As the reference implementation of OpenLineage, the Marquez API server already works with all of the OpenLineage integrations developed by the community. These include Apache Airflow, Apache Spark, dbt, Dagster, and Great Expectations.
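To make the collection endpoint concrete, here is a minimal sketch of how an application might build an OpenLineage run event by hand and post it to Marquez. It assumes a local instance reachable at `http://localhost:5000` with the OpenLineage endpoint at `/api/v1/lineage`. The `producer` URI, the schema version in `schemaURL`, and the helper names `build_run_event` and `send_event` are illustrative assumptions, not part of Marquez itself. In practice, the integrations listed above emit these events for you.

```python
import json
import uuid
from datetime import datetime, timezone
from urllib import request

# Assumption: a local Marquez instance; adjust host and port for your deployment.
MARQUEZ_LINEAGE_URL = "http://localhost:5000/api/v1/lineage"


def build_run_event(event_type, run_id, job_namespace, job_name,
                    inputs=(), outputs=()):
    """Build a minimal OpenLineage run event as a plain dict.

    `inputs` and `outputs` are iterables of (namespace, name) pairs.
    """
    def as_dataset(pair):
        namespace, name = pair
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [as_dataset(p) for p in inputs],
        "outputs": [as_dataset(p) for p in outputs],
        # Hypothetical producer URI identifying the code that emits the event.
        "producer": "https://example.com/my-pipeline",
        # Assumed spec version; use the one your deployment targets.
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json"
                     "#/definitions/RunEvent",
    }


def send_event(event, url=MARQUEZ_LINEAGE_URL):
    """POST the event as JSON and return the HTTP status code."""
    req = request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status
```

A job would typically call `build_run_event("START", run_id, ...)` when it begins and send a matching `"COMPLETE"` event with the same `run_id` when it finishes, passing each result to `send_event`.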
Through a web user interface, Marquez can provide a visual map that shows complex interdependencies within your data ecosystem.
The user interface allows you to browse the metadata within Marquez, making it easy to see the inputs and outputs of each job, trace the lineage of individual datasets, and study performance metrics and execution details.
Lineage metadata can be queried using the Lineage API, which lets you automate key tasks like backfills and root cause analysis.
With the Lineage API, you can easily traverse the dependency tree and establish context for datasets across multiple pipelines and orchestration platforms. This can be used to enrich data catalogs and data quality systems.
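As a rough illustration of traversing the dependency tree, the sketch below builds a Lineage API request URL and walks upstream from a dataset through the returned graph. It assumes the endpoint is `GET /api/v1/lineage` with `nodeId` and `depth` query parameters, and that the response contains a `graph` list whose nodes carry an `id` and a list of `inEdges` with `origin` and `destination` fields. Verify both against the API reference for your Marquez version. The helper names `lineage_url` and `upstream_nodes` are hypothetical.

```python
from collections import deque
from urllib.parse import urlencode


def lineage_url(base_url, node_id, depth=20):
    """Build the request URL for the lineage graph around `node_id`.

    `node_id` is assumed to look like "dataset:<namespace>:<name>"
    or "job:<namespace>:<name>".
    """
    query = urlencode({"nodeId": node_id, "depth": depth})
    return f"{base_url}/api/v1/lineage?{query}"


def upstream_nodes(graph, start_id):
    """Return the ids of all nodes upstream of `start_id`, breadth-first.

    `graph` is the assumed `graph` list from the lineage response.
    """
    by_id = {node["id"]: node for node in graph}
    seen = {start_id}
    order = []
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        # Each in-edge points from an upstream node (origin) to `current`.
        for edge in by_id.get(current, {}).get("inEdges", []):
            origin = edge["origin"]
            if origin not in seen:
                seen.add(origin)
                order.append(origin)
                queue.append(origin)
    return order
```

A root cause analysis script could fetch the JSON at `lineage_url(...)`, pass its `graph` list to `upstream_nodes`, and then inspect the most recent runs of each upstream job to find where a bad value first appeared.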