Asset metadata#

Attaching metadata to assets can help make your pipelines easier for you and other team members to understand. Data about your data assets can be attached to both asset definitions and materializations.

By the end of this guide, you'll understand how to attach metadata to assets and view that metadata in the Dagster UI.


How it works#

There are two main types of metadata in Dagster:

  • Definition metadata is information that's fixed or doesn't frequently change. For example, definition metadata could be the storage location of a table, a link to the asset's definition in GitHub, or who owns the asset.
  • Runtime, or materialization, metadata is information that is recorded when a materialization occurs and can change from one materialization to the next. This could be how many records were processed or how long an asset took to materialize.

How metadata is attached to an asset depends on the type of metadata being attached. Refer to the following sections for more details.


Attaching definition metadata#

Dagster supports attaching a few different types of definition metadata:

  • Arbitrary metadata, such as the storage location of the table produced by the asset
  • Asset owners, which are the people and/or teams who own the asset
  • Table and column metadata, which provides additional context about a tabular asset, such as its schema or row count
  • Code references, which link to the source code of the asset locally or in your source control repository

Arbitrary metadata using the metadata parameter#

Attaching arbitrary metadata to an asset definition is done using the metadata argument and a dictionary of key/value pairs. Keys must be strings, but values can be either:

  • Any of the MetadataValue classes provided by Dagster
  • Primitive Python types, which Dagster will convert to the appropriate MetadataValue

For example, to attach the name of the table we expect to store the asset in, we'll add a "dataset_name" entry to the metadata argument:

from dagster_duckdb import DuckDBResource

from dagster import asset

# ... other assets


@asset(
    deps=[iris_dataset],
    metadata={"dataset_name": "iris.small_petals"},
)
def small_petals(duckdb: DuckDBResource) -> None:
    with duckdb.get_connection() as conn:
        conn.execute(
            "CREATE TABLE iris.small_petals AS SELECT * FROM iris.iris_dataset WHERE"
            " petal_length_cm < 1 AND petal_width_cm < 1"
        )

Dagster provides a standard set of metadata keys that can be used for common types of metadata, such as an asset's URI or column schema. Note: These entries are intended to be a starting point, and we encourage you to create your own metadata keys that make sense within the context of your data platform.

Asset owners#

Did you know? If using Dagster+ Pro, you can create asset-based alerts that will automatically notify an asset's owners when triggered. Refer to the Dagster+ alert documentation for more information.

An asset can have multiple owners, defined using the owners argument on the @asset decorator. This argument accepts a list of owners, where each item is either an individual email address or a team. Teams must include a team: prefix; for example: team:data-eng.

The asset in the following example has two owners: richard.hendricks@hooli.com and the data-eng team.

from dagster import asset


@asset(owners=["richard.hendricks@hooli.com", "team:data-eng"])
def leads(): ...
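If owner strings come from configuration or another system, it can help to check their format before passing them to the decorator. The helper below is a hypothetical, deliberately loose check written for illustration; it is not Dagster's own validation logic.

```python
def is_valid_owner(owner: str) -> bool:
    """Return True if `owner` looks like an email address or a `team:` entry."""
    if owner.startswith("team:"):
        # A team entry needs a non-empty name after the prefix.
        return len(owner) > len("team:")
    # Loose email check: exactly one "@" with text on both sides.
    local, sep, domain = owner.partition("@")
    return bool(local) and sep == "@" and bool(domain) and "@" not in domain


owners = ["richard.hendricks@hooli.com", "team:data-eng"]
# Both entries from the example above pass the check.
assert all(is_valid_owner(owner) for owner in owners)
```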

Code references#

Attaching code references to an asset definition allows you to easily navigate to the asset's source code, either locally in your editor or in your source control repository. For more information, refer to the Code references guide.


Attaching materialization metadata#

Attaching materialization metadata to an asset is accomplished by returning a MaterializeResult object with its metadata parameter set. This parameter accepts a dictionary of key/value pairs, where keys must be strings.

When specifying values, use the MetadataValue utility class to wrap the data, ensuring it displays correctly in the UI. Values can also be primitive Python types, which Dagster will convert to the appropriate MetadataValue.

Arbitrary metadata#

In the following example, we add a row count and a preview to a topstories asset:

import json
import requests
import pandas as pd
from dagster import AssetExecutionContext, MetadataValue, asset, MaterializeResult


@asset(deps=[topstory_ids])
def topstories(context: AssetExecutionContext) -> MaterializeResult:
    with open("data/topstory_ids.json", "r") as f:
        topstory_ids = json.load(f)

    results = []
    for item_id in topstory_ids:
        item = requests.get(
            f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
        ).json()
        results.append(item)

        if len(results) % 20 == 0:
            context.log.info(f"Got {len(results)} items so far.")

    df = pd.DataFrame(results)
    df.to_csv("data/topstories.csv")

    return MaterializeResult(
        metadata={
            "num_records": len(df),  # Metadata can be any key-value pair
            "preview": MetadataValue.md(df.head().to_markdown()),
            # The `MetadataValue` class has useful static methods to build Metadata
        }
    )

Dagster provides a standard set of metadata keys that can be used for common types of metadata, such as an asset's URI or column schema. Note: These entries are intended to be a starting point, and we encourage you to create your own metadata keys that make sense within the context of your data platform.

Table and column metadata#

For assets which produce database tables, you can attach table metadata to provide additional context about the asset. Table metadata can include information such as the schema, row count, or column lineage. Refer to the Table metadata documentation for more information, or the Column-level lineage documentation for specific details on column-level lineage.


Viewing asset metadata in the Dagster UI#

Metadata attached to assets shows up in a few places in the Dagster UI.

Global asset lineage#

On the Global asset lineage page, click an asset to open its details in the sidepanel:

[Image: Asset details sidepanel showing metadata in the Global asset lineage page of the Dagster UI]

If materialization metadata is numeric, it will display as a plot in the Metadata plots section of the sidepanel.


References#

APIs in this guide#

| Name | Description |
| --- | --- |
| @asset | A decorator used to define assets. |
| MaterializeResult | An object representing a successful materialization of an asset. |
| MetadataValue | Utility class to wrap metadata values passed into Dagster events, which allows them to be displayed in the Dagster UI and other tooling. |

Standard asset metadata entries#

The following is a set of standard asset metadata entries that can be included in the dictionaries passed to metadata attributes of @asset, MaterializeResult, etc. Many of these receive special treatment in Dagster's UI, such as dagster/column_schema resulting in a Columns section on the Overview tab of the Asset details page.

The dagster prefix indicates that the Dagster package takes responsibility for defining the meaning of these metadata entries.

Key | Details
dagster/uri
  • Value: str
  • Description: The URI for the asset, e.g. "s3://my_bucket/my_object"
dagster/column_schema
dagster/column_lineage
dagster/row_count
dagster/partition_row_count
  • Value: int
  • Description: For a partition of an asset that's a table, the number of rows in the partition.
dagster/table_name
  • Value: str
  • Description: A unique identifier for the table/view, typically fully qualified. For example, my_database.my_schema.my_table
dagster/code_references