Deploying Great Expectations with Astronomer

This guide will help you deploy Great Expectations within an Airflow pipeline that runs in Astronomer. You can use Great Expectations to automate validation of data integrity and navigate your DAG based on the output of validations.

Note that you can also define your Data Context in code by following this documentation: How to instantiate a Data Context without a yml file

Prerequisites: This workflow pattern assumes you have already:

Using the Great Expectations Airflow Operator in an Astronomer Deployment


There are only few additional requirements to deploy a DAG using the Great Expectations operator with Astronomer. Most importantly, you will need to set relevant environment variables.

Step 1: Set the DataContext root directory

Great Expectations needs to know where to find the Data Context by passing the Data Context root directory, which you can then access in the DAG. We recommend adding this variable to your Dockerfile, but you can use any of the methods described in the Astronomer documentation.

For example, if you decide to keep great_expectations directory in the include directory in your Astronomer project, the environment variable will look like this:

ENV GE_DATA_CONTEXT_ROOT_DIR=/usr/local/airflow/include/great_expectations

You can then pass the root directory as an argument into the task:

import os
ge_root_dir = os.environ['GE_DATA_CONTEXT_ROOT_DIR']

my_ge_task = GreatExpectationsOperator(
    task_id='my_task,
    expectation_suite_name='my_suite',
    batch_kwargs={
        'table': 'my_table',
        'datasource': 'my_datasource'
    },
    data_context_root_dir=ge_root_dir,
    dag=dag
)

Step 2: Set the environment variables for credentials

You will need to configure environment variables for any credentials required for external data connections, see How to use a YAML file or environment variables to populate credentials for an explanation of how to use environment variables in your great_expectations.yml.

Then add the environment variables to your local .env file in your Astronomer deploy and as secret environment variables in the Astronomer Cloud settings, following the instructions in the Astronomer documentation.