
Run State Transitions

Run states change based on the success or failure of a job run. The dataset versions consumed and/or produced by a job run are immutable and do not change based on the success or failure of the run.
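A minimal Python sketch of this idea follows. It assumes a run moves through the states NEW, RUNNING, COMPLETED, ABORTED and FAILED, and the transition table is a hypothetical simplification rather than Marquez's own rules. `Run` and `ALLOWED` are illustrative names, not Marquez APIs. Changing a run's state touches only `state`. The recorded dataset versions stay as they were.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, Tuple


class RunState(Enum):
    NEW = "NEW"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    ABORTED = "ABORTED"
    FAILED = "FAILED"


# Hypothetical transition table: terminal states allow no further moves.
ALLOWED: Dict[RunState, FrozenSet[RunState]] = {
    RunState.NEW: frozenset({RunState.RUNNING, RunState.ABORTED, RunState.FAILED}),
    RunState.RUNNING: frozenset({RunState.COMPLETED, RunState.ABORTED, RunState.FAILED}),
    RunState.COMPLETED: frozenset(),
    RunState.ABORTED: frozenset(),
    RunState.FAILED: frozenset(),
}


@dataclass
class Run:
    input_versions: Tuple[str, ...]   # dataset versions read by the run
    output_versions: Tuple[str, ...]  # dataset versions written by the run
    state: RunState = RunState.NEW

    def transition(self, new_state: RunState) -> None:
        """Move to new_state; only `state` changes, never the recorded versions."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state


run = Run(input_versions=("orders:v1",), output_versions=("report:v2",))
run.transition(RunState.RUNNING)
run.transition(RunState.FAILED)
print(run.state.name, run.output_versions)  # prints: FAILED ('report:v2',)
```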

Dataset Versioning Lifecycle

Dataset versions change based on the success or failure of the job runs that consume or produce those datasets. Typically, runs generate dataset versions as outputs. However, a job run may also cause an input dataset version to be created if that dataset did not previously exist in the database. Once a dataset exists, a new version is created only when the dataset is the output of a job run. A run's output version is recorded whether or not the run succeeds. It becomes the dataset's current version only if the run succeeds.

As pseudocode:

for dataset in jobRun.inputDatasets:
  if not exists(dataset):
    # A dataset seen for the first time as an input gets an initial version
    create(dataset)
    version = dataset.createVersion('version1')
    dataset.setCurrentVersion(version)
  # Inputs are always recorded at the dataset's current version
  jobRun.addInputVersion(dataset.getCurrentVersion())

for dataset in jobRun.outputDatasets:
  if not exists(dataset):
    create(dataset)
    datasetVersion = dataset.createVersion('version1')
  else:
    curVersion = dataset.getCurrentVersion()
    datasetVersion = curVersion.incrementVersion()
  # The new version is recorded on the run whether or not the run succeeds
  jobRun.addOutputVersion(datasetVersion)
  # Only a successful run makes the new version current
  if jobRun.succeeded:
    dataset.setCurrentVersion(datasetVersion)
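The versioning pseudocode can be turned into a small runnable Python sketch. `Dataset`, `JobRun` and `record_run` are hypothetical names. Versions are plain integers rather than Marquez's real version identifiers, and an in-memory dict stands in for the database. This sketch makes one deliberate change: new version numbers follow the latest version created, not the current one, so a version written by a failed run is never reused.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    name: str
    versions: List[int] = field(default_factory=list)
    current: Optional[int] = None  # version that downstream runs will read

    def create_version(self) -> int:
        # Versions are append-only: the next number after the latest one created.
        version = (self.versions[-1] + 1) if self.versions else 1
        self.versions.append(version)
        return version


@dataclass
class JobRun:
    inputs: List[str]
    outputs: List[str]
    succeeded: bool
    input_versions: Dict[str, Optional[int]] = field(default_factory=dict)
    output_versions: Dict[str, int] = field(default_factory=dict)


def record_run(run: JobRun, catalog: Dict[str, Dataset]) -> None:
    for name in run.inputs:
        if name not in catalog:
            # An input seen for the first time gets an initial, current version.
            ds = catalog[name] = Dataset(name)
            ds.current = ds.create_version()
        run.input_versions[name] = catalog[name].current

    for name in run.outputs:
        ds = catalog.setdefault(name, Dataset(name))
        version = ds.create_version()  # created whether the run succeeds or fails
        run.output_versions[name] = version
        if run.succeeded:
            ds.current = version  # only a successful run advances the current version


catalog: Dict[str, Dataset] = {}
record_run(JobRun(inputs=[], outputs=["orders"], succeeded=True), catalog)
failed = JobRun(inputs=["orders"], outputs=["orders"], succeeded=False)
record_run(failed, catalog)
print(failed.output_versions, catalog["orders"].current)  # prints: {'orders': 2} 1
```

The failed run consumed version 1 and wrote version 2. Version 2 exists but never became current.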

Definitions

Assume the following job graph

Produce a new dataset

Consume a new dataset

Successful Job Chain - Produce and Consume New Datasets

Consume and produce dataset

Successful Job Chain - Produce and Consume Existing Datasets

Failed Job Chain - Produce and Consume Existing Datasets - Workflow Stops

Failed Job Chain - Produce and Consume Existing Datasets - Workflow Continues

Consume and produce dataset failed job
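When a failed job does not stop the workflow, downstream jobs still read the failed job's outputs at their current version, and the failed run did not advance that version. The sketch below assumes a hypothetical two-job chain. Job A writes `raw`, and job B reads `raw` and writes `report`. Versions are integers kept in two plain dicts: `latest` holds the last version created and `current` holds the version consumers read. `run_job` is an illustrative helper, not a Marquez API.

```python
from typing import Dict, List, Optional, Tuple

Versions = Dict[str, int]


def run_job(current: Versions, latest: Versions, inputs: List[str],
            outputs: List[str], succeeded: bool) -> Tuple[Dict[str, Optional[int]], Versions]:
    """Record one run; return (versions read, versions written)."""
    read: Dict[str, Optional[int]] = {}
    for name in inputs:
        if name not in latest:
            latest[name] = current[name] = 1  # brand-new input gets a first version
        read[name] = current.get(name)        # consumers read the current version (None if none yet)
    written: Versions = {}
    for name in outputs:
        latest[name] = latest.get(name, 0) + 1  # every run writes a fresh version
        written[name] = latest[name]
        if succeeded:
            current[name] = latest[name]        # a failed run leaves current untouched
    return read, written


current: Versions = {}
latest: Versions = {}
run_job(current, latest, [], ["raw"], True)                    # A succeeds: raw v1 is current
run_job(current, latest, ["raw"], ["report"], True)            # B succeeds: report v1
_, a_out = run_job(current, latest, [], ["raw"], False)        # A fails: raw v2 written, not current
b_in, _ = run_job(current, latest, ["raw"], ["report"], True)  # workflow continues with B
print(a_out, b_in)  # prints: {'raw': 2} {'raw': 1}
```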

Parent jobs with successful child jobs

Parent job success/failure does not impact the status of the datasets created by child jobs.

Consume and produce dataset in parent job

Parent jobs with failed child jobs

Parent job failure does not impact the status of the datasets created by child jobs.

Parent has failed child jobs

Parent job fails

Parent job failure does not impact the status of the datasets created by child jobs.

Parent has failed child jobs
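The following is a minimal sketch of the parent/child rule. It assumes each run carries its own outputs and success flag. Dataset versions follow the outcome of the run that produced them. As a result, a parent's failure does not undo what a successful child made current, and a parent's success does not promote a failed child's output. The `Run` class and `apply_outputs` function are hypothetical, and versions are plain integers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    job: str
    outputs: List[str]
    succeeded: bool
    children: List["Run"] = field(default_factory=list)


def apply_outputs(run: Run, current: Dict[str, int], latest: Dict[str, int]) -> None:
    """Create output versions for run and its child runs, depth first."""
    for name in run.outputs:
        latest[name] = latest.get(name, 0) + 1
        if run.succeeded:  # a run's own outcome decides whether its outputs become current
            current[name] = latest[name]
    for child in run.children:
        # Each child is judged on child.succeeded; the parent's flag is not passed down.
        apply_outputs(child, current, latest)


current: Dict[str, int] = {}
latest: Dict[str, int] = {}
parent = Run("nightly", outputs=[], succeeded=False, children=[
    Run("load", outputs=["orders"], succeeded=True),
    Run("export", outputs=["extract"], succeeded=False),
])
apply_outputs(parent, current, latest)
print(current, latest)  # prints: {'orders': 1} {'orders': 1, 'extract': 1}
```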


SPDX-License-Identifier: Apache-2.0 Copyright 2018-2023 contributors to the Marquez project.