How to create Expectations that span multiple Batches using Evaluation Parameters¶
This guide will help you create Expectations that span multiple Batches of data using Evaluation Parameters. This pattern is useful for things like verifying that row counts between tables stay consistent.
Stable API (up to 0.12.x)
Prerequisites: This how-to guide assumes you have already:

Configured a Datasource (or Datasources) with at least two Data Assets.
Created Expectation Suites for those Data Assets.
Set up a working Evaluation Parameter Store. (The default in-memory store from great_expectations init can work for this.)
Set up a working Validation Operator. (The default Validation Operator from great_expectations init can work for this.)
In a notebook,
Import great_expectations and instantiate your Data Context
import great_expectations as ge

context = ge.DataContext()
Instantiate two Batches
We’ll call one of these Batches the upstream Batch and the other the downstream Batch. Evaluation Parameters will allow us to use Validation Results from the upstream Batch as parameters passed into Expectations on the downstream Batch.
It’s common (but not required) for both Batches to come from the same Datasource and BatchKwargsGenerator.
batch_kwargs_1 = context.build_batch_kwargs("my_datasource", "my_generator_name", "my_data_asset_name_1")
upstream_batch = context.get_batch(
    batch_kwargs_1,
    expectation_suite_name="my_expectation_suite_1"
)

batch_kwargs_2 = context.build_batch_kwargs("my_datasource", "my_generator_name", "my_data_asset_name_2")
downstream_batch = context.get_batch(
    batch_kwargs_2,
    expectation_suite_name="my_expectation_suite_2"
)
Disable interactive evaluation for the downstream Batch.
downstream_batch.set_config_value("interactive_evaluation", False)
Disabling interactive evaluation allows you to declare an Expectation even when it cannot be evaluated immediately.
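If you want to return to evaluating Expectations immediately against this Batch later on, you can flip the same setting back. A minimal sketch, using the same set_config_value call shown above:

# Re-enable immediate evaluation once you are done declaring parameter-based Expectations.
downstream_batch.set_config_value("interactive_evaluation", True)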
Define an Expectation using an Evaluation Parameter on the downstream Batch.
eval_param_urn = 'urn:great_expectations:validations:my_expectation_suite_1:expect_table_row_count_to_be_between.result.observed_value'
downstream_batch.expect_table_row_count_to_equal(
    value={
        '$PARAMETER': eval_param_urn,  # this is the actual parameter we're going to use in the validation
    }
)
The core of this is a $PARAMETER : URN pair. When Great Expectations encounters a $PARAMETER flag during validation, it replaces the URN with a value retrieved from an Evaluation Parameter Store or Metrics Store.

The declaration above contains a single $PARAMETER: the URN that will be resolved once the Expectation Suite is stored and run through a Validation Operator. When executed in the notebook, this Expectation generates an Expectation Validation Result in which most values are missing, since interactive evaluation was disabled.
{ "meta": {}, "success": null, "result": {}, "exception_info": null }
Warning
Your URN must be exactly correct in order to work in production. Unfortunately, successful execution at this stage does not guarantee that the URN is specified correctly and that the intended parameters will be available when executed later.
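The URN above points at the observed_value produced by an expect_table_row_count_to_be_between Expectation in my_expectation_suite_1, so that Expectation has to exist in the upstream suite for the parameter to resolve. If your upstream suite does not define it yet, a minimal sketch of adding one on the upstream Batch could look like this (the bounds below are placeholder assumptions, not values from this guide):

# Hypothetical upstream Expectation whose observed_value the URN above refers to;
# pick bounds that make sense for your data.
upstream_batch.expect_table_row_count_to_be_between(min_value=1, max_value=1000000)
upstream_batch.save_expectation_suite(discard_failed_expectations=False)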
Save your Expectation Suite
downstream_batch.save_expectation_suite(discard_failed_expectations=False)
This step is necessary because your $PARAMETER will only function properly when it is invoked during a Validation operation that covers multiple Batches. The simplest way to execute such an operation is through a Validation Operator, and Validation Operators are configured to load Expectation Suites from Expectation Stores, not memory.

Execute an existing Validation Operator on your upstream and downstream Batches.

You can do this within your notebook by running context.run_validation_operator. You can use the same batch_kwargs from the top of your notebook; they'll be used to fetch the same data.

results = context.run_validation_operator(
    "action_list_operator",
    assets_to_validate=[
        (batch_kwargs_1, "my_expectation_suite_1"),
        (batch_kwargs_2, "my_expectation_suite_2"),
    ]
)
Rebuild Data Docs and review results in docs.
You can do this within your notebook by running:
context.build_data_docs()
You can also execute from the command line with:
great_expectations docs build
Once your Docs rebuild, open them in a browser and navigate to the page for the new Validation Result.
If your Evaluation Parameter was resolved successfully, the downstream Expectation will appear in the Validation Result with the substituted value. If it encountered an error, the Expectation will be marked as failed; the most common problem is a mis-specified URN.
Warning
In general, the development loop for testing and debugging URNs and Evaluation Parameters is not very user-friendly. We plan to simplify this workflow in the future. In the meantime, we welcome questions in the Great Expectations discussion forum and Slack channel.
Experimental API (0.13)
Prerequisites: This how-to guide assumes you have already:

Configured a Datasource (or Datasources) with at least two Data Assets, and familiarized yourself with the basics of Batch Requests.
Created Expectation Suites for those Data Assets.
Set up a working Evaluation Parameter Store. (The default in-memory store from great_expectations init can work for this.)
Set up a working Validation Operator. (The default Validation Operator from great_expectations init can work for this.)
In a notebook,
Import great_expectations and instantiate your Data Context
import great_expectations as ge

context = ge.DataContext()
Instantiate two Validators, one for each Data Asset
We’ll call one of these Validators the upstream Validator and the other the downstream Validator. Evaluation Parameters will allow us to use Validation Results from the upstream Validator as parameters passed into Expectations on the downstream Validator.
It’s common (but not required) for both Batch Requests to have the same Datasource and Data Connector.
from great_expectations.core.batch import BatchRequest

batch_request_1 = BatchRequest(
    datasource_name="my_datasource",
    data_connector_name="my_data_connector",
    data_asset_name="my_data_asset_1"
)
upstream_validator = context.get_validator(
    batch_request=batch_request_1,
    expectation_suite_name="my_expectation_suite_1"
)

batch_request_2 = BatchRequest(
    datasource_name="my_datasource",
    data_connector_name="my_data_connector",
    data_asset_name="my_data_asset_2"
)
downstream_validator = context.get_validator(
    batch_request=batch_request_2,
    expectation_suite_name="my_expectation_suite_2"
)
Disable interactive evaluation for the downstream Validator.
downstream_validator.interactive_evaluation = False
Disabling interactive evaluation allows you to declare an Expectation even when it cannot be evaluated immediately.
Define an Expectation using an Evaluation Parameter on the downstream Validator.
eval_param_urn = 'urn:great_expectations:validations:my_expectation_suite_1:expect_table_row_count_to_be_between.result.observed_value'
downstream_validator.expect_table_row_count_to_equal(
    value={
        '$PARAMETER': eval_param_urn,  # this is the actual parameter we're going to use in the validation
    }
)
The core of this is a $PARAMETER : URN pair. When Great Expectations encounters a $PARAMETER flag during validation, it replaces the URN with a value retrieved from an Evaluation Parameter Store or Metrics Store.

The declaration above contains a single $PARAMETER: the URN that will be resolved once the Expectation Suite is stored and run through a Validation Operator. When executed in the notebook, this Expectation generates an Expectation Validation Result in which most values are missing, since interactive evaluation was disabled.
{ "result": {}, "success": null, "meta": {}, "exception_info": { "raised_exception": false, "exception_traceback": null, "exception_message": null } }
Warning
Your URN must be exactly correct in order to work in production. Unfortunately, successful execution at this stage does not guarantee that the URN is specified correctly and that the intended parameters will be available when executed later.
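Because the URN is just a string, one way to avoid typos is to assemble it from its parts rather than typing it by hand. This is only a convenience sketch, assuming your URNs follow the same urn:great_expectations:validations:<suite name>:<expectation type>.result.<metric key> pattern used above:

# Build the URN from its components; the pattern mirrors the example URN in this guide.
suite_name = "my_expectation_suite_1"
expectation_type = "expect_table_row_count_to_be_between"
metric_key = "observed_value"
eval_param_urn = f"urn:great_expectations:validations:{suite_name}:{expectation_type}.result.{metric_key}"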
Save your Expectation Suite
downstream_validator.save_expectation_suite(discard_failed_expectations=False)
This step is necessary because your $PARAMETER will only function properly when it is invoked during a Validation operation that covers multiple Validators. The simplest way to execute such an operation is through a Validation Operator, and Validation Operators are configured to load Expectation Suites from Expectation Stores, not memory.

Execute an existing Validation Operator on your upstream and downstream Validators.

You can do this within your notebook by running context.run_validation_operator.

results = context.run_validation_operator(
    "action_list_operator",
    assets_to_validate=[
        upstream_validator,
        downstream_validator
    ]
)
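If you want a quick pass/fail signal before rebuilding Data Docs, you can inspect the returned result object. This is a small sketch that assumes the ValidationOperatorResult returned by run_validation_operator exposes an overall success flag, as it does in recent 0.13 releases:

# Print a quick summary of the run before rebuilding Data Docs.
# (Assumes the returned ValidationOperatorResult exposes a success property.)
if results.success:
    print("All validations in this run succeeded.")
else:
    print("At least one validation failed; check Data Docs for details.")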
Rebuild Data Docs and review results in docs.
You can do this within your notebook by running:
context.build_data_docs()
You can also execute from the command line with:
great_expectations docs build
Once your Docs rebuild, open them in a browser and navigate to the page for the new Validation Result.
If your Evaluation Parameter was resolved successfully, the downstream Expectation will appear in the Validation Result with the substituted value. If it encountered an error, the Expectation will be marked as failed; the most common problem is a mis-specified URN.
Warning
In general, the development loop for testing and debugging URNs and Evaluation Parameters is not very user-friendly. We plan to simplify this workflow in the future. In the meantime, we welcome questions in the Great Expectations discussion forum and Slack channel.