
Automation#

Dagster offers several ways to run data pipelines without manual intervention, including traditional scheduling and event-based triggers. Automating your Dagster pipelines can boost efficiency and ensure that data is produced consistently and reliably.

When one of Dagster's automation methods is triggered, a tick is created. A tick is an opportunity for one or more runs to be launched. A run either materializes a selection of assets or executes a job. Some schedules and sensors launch runs on every tick; others execute logic on each tick to determine which runs, if any, to launch.

In this guide, we'll cover the available automation methods Dagster provides and when to use each one.


Prerequisites#

Before continuing, you should be familiar with Dagster's core concepts, including assets, ops, and jobs.


Available methods#

In this section, we'll touch on each of the automation methods Dagster currently supports. After that, we'll discuss how to choose between them.

Schedules#

Schedules are Dagster's imperative option for automation. They allow you to specify exactly when a run should be launched, such as Mondays at 9:00 AM. Schedules can target a selection of assets or a job. Refer to the Schedules documentation to learn more.
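
For example, here's a minimal sketch of a schedule that launches a job every Monday at 9:00 AM. The job name and the `daily` asset group are hypothetical placeholders:

```python
from dagster import AssetSelection, ScheduleDefinition, define_asset_job

# Hypothetical job that materializes every asset in the "daily" group
daily_refresh_job = define_asset_job(
    "daily_refresh_job",
    selection=AssetSelection.groups("daily"),
)

# Launch the job every Monday at 9:00 AM
# (cron fields: minute hour day-of-month month day-of-week)
monday_9am_schedule = ScheduleDefinition(
    job=daily_refresh_job,
    cron_schedule="0 9 * * 1",
)
```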

Sensors#

Sensors launch runs in response to detected events. A sensor periodically executes logic that checks for an event and conditionally launches runs. Sensors are commonly used when you want to materialize an asset in response to an externally observable trigger, such as when:

  • A new file arrives in a specific location, such as Amazon S3
  • A webhook notification is received
  • An external system frees up a worker slot

You can also use sensors to act on the status of a job run. Refer to the Sensors documentation to learn more.
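
As a sketch of the first scenario above, the sensor below checks a local directory on each tick and requests one run per file it finds. The directory path, job name, and asset group are hypothetical:

```python
import os

from dagster import (
    AssetSelection,
    RunRequest,
    SkipReason,
    define_asset_job,
    sensor,
)

# Hypothetical job that processes newly arrived files
ingest_job = define_asset_job("ingest_job", selection=AssetSelection.groups("ingest"))

@sensor(job=ingest_job)
def new_file_sensor():
    # Hypothetical landing directory; for Amazon S3 you'd list a bucket prefix instead
    filenames = os.listdir("/data/incoming")
    if not filenames:
        yield SkipReason("No files found in /data/incoming")
        return
    for filename in filenames:
        # run_key deduplicates requests: each unique file launches at most one run
        yield RunRequest(run_key=filename)
```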

Declarative Automation (Experimental)#

Declarative Automation allows you to automatically materialize assets when specified criteria are met. Using Declarative Automation, you could update assets:

  • When the asset hasn't yet been materialized
  • When an asset's upstream dependency has been updated
  • After all of an asset's parents have been updated since the latest tick of a cron schedule
  • When your own custom criteria are met

Materialization conditions are declared on an asset-by-asset basis. Refer to the Declarative Automation documentation to learn more.
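
As a minimal sketch (both asset names are hypothetical), the built-in `eager()` condition covers the first two bullets above: it requests a materialization when the asset is missing or when one of its parents has updated:

```python
import dagster as dg

@dg.asset
def raw_events(): ...

# No schedule or sensor required: Dagster requests a materialization
# whenever raw_events updates (or if cleaned_events has never been materialized)
@dg.asset(
    deps=[raw_events],
    automation_condition=dg.AutomationCondition.eager(),
)
def cleaned_events(): ...
```

For the cron-based bullet, `dg.AutomationCondition.on_cron("0 9 * * *")` is the analogous built-in, and conditions can be composed with operators like `&` and `|` to express custom criteria.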

Asset Sensors (Experimental)#

Asset sensors launch runs when a specified asset is materialized. Using asset sensors, you can launch runs across jobs and code locations, making it easy to keep downstream assets up to date.

Refer to the Asset Sensor documentation to learn more.
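
Here's a minimal sketch, assuming a hypothetical upstream asset key and downstream job:

```python
import dagster as dg

# Hypothetical job containing the downstream work to run
downstream_job = dg.define_asset_job(
    "downstream_job",
    selection=dg.AssetSelection.groups("reporting"),
)

# Launch downstream_job each time "upstream_asset" is materialized, even if
# that materialization happens in a different job or code location
@dg.asset_sensor(asset_key=dg.AssetKey("upstream_asset"), job=downstream_job)
def upstream_asset_sensor(context, asset_event):
    yield dg.RunRequest(run_key=context.cursor)
```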


Selecting a method#

Before you dive into automating your pipelines, you should think about:

  • Is my pipeline made up of assets, ops, graphs, or a combination of all three?
  • How often does the data need to be refreshed?
  • Is the data partitioned, and do old records require updates?
  • Should updates occur in scheduled batches, or start when specific events occur?

The following cheatsheet contains high-level details about each of the automation methods we covered, along with when to use each one.

| Method | How it works | May be a good fit if... | Works with |
| --- | --- | --- | --- |
| Schedules | Starts a job at a specified time | You're using jobs, and you want to run the job at a specific time | Assets, ops, graphs |
| Sensors | Starts a job or materializes a selection of assets when a specific event occurs | You want to trigger runs based on an event | Assets, ops, graphs |
| Declarative Automation | Automatically materializes an asset when specified criteria (e.g. upstream changes) are met | You're not using jobs, you want a declarative approach, and you're comfortable with experimental APIs | Assets only |
| Asset Sensors | Starts a job when a materialization occurs for a specific asset or selection of assets | You're using jobs, you want to trigger a job in response to asset materialization(s), and you're comfortable with experimental APIs | Assets only |