Test an Expectation
Data can be validated against individual Expectations. This workflow is generally used when exploring new data, or when building out a set of Expectations that comprehensively describes the state your data should conform to.
Prerequisites
- Python version 3.9 to 3.12.
- An installation of GX Core.
- A preconfigured Data Context.
- A Batch of sample data. This guide assumes the variable batch contains your sample data.
- An Expectation. This guide assumes the variable expectation contains the Expectation to be tested.
Procedure
Instructions
- Run the Expectation on the Batch of data.
In this example, the Expectation to test was defined with preset parameters and is already stored in the variable expectation. The variable batch contains a Batch that was retrieved from a .csv file using the pandas default Data Source:
Python
validation_results = batch.validate(expectation)
In this example, the Expectation to test was defined to take Expectation Parameters at runtime:
Python
runtime_expectation_parameters = {
    "expect_passenger_max_to_be_above": 4,
    "expect_passenger_max_to_be_below": 6,
}
validation_results = batch.validate(
    expectation, expectation_parameters=runtime_expectation_parameters
)
- Evaluate the returned Validation Results.
Python
print(validation_results)
When you print your Validation Results they will be presented in a dictionary format. There are a few key/value pairs in particular that are important for evaluating your Validation Results. These are:
- expectation_config: Provides a dictionary that describes the Expectation that was run and what its parameters are.
- success: The value of this key indicates if the data that was validated met the criteria described in the Expectation.
- result: Contains a dictionary with additional information that shows why the Expectation succeeded or failed.
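The three keys above can also be read programmatically rather than by eye. The following is a minimal sketch, assuming the Validation Results have already been converted to a plain dictionary (here they are parsed from JSON text). The helper name summarize_result is hypothetical, and the sample values are a trimmed copy of the failing example in this guide.

```python
import json

# Trimmed copy of the failing Validation Results shown in this guide.
raw_results = """
{
  "success": false,
  "expectation_config": {
    "expectation_type": "expect_column_max_to_be_between",
    "kwargs": {"column": "passenger_count", "min_value": 4.0, "max_value": 5.0}
  },
  "result": {"observed_value": 6}
}
"""


def summarize_result(results: dict) -> str:
    """Build a one-line summary from expectation_config, success, and result.

    Hypothetical helper for illustration; it is not part of GX.
    """
    config = results["expectation_config"]
    status = "passed" if results["success"] else "failed"
    observed = results["result"].get("observed_value")
    return (
        f"{config['expectation_type']} on column "
        f"{config['kwargs']['column']!r} {status}; observed_value={observed}"
    )


summary = summarize_result(json.loads(raw_results))
print(summary)
```

Working from a plain dictionary keeps the sketch independent of any particular GX result class.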
In the following example you can see the Validation Results for an Expectation that failed because the observed_value reported in the result dictionary is outside of the min_value and max_value range described in the expectation_config:
Python output
{
  "success": false,
  "expectation_config": {
    "expectation_type": "expect_column_max_to_be_between",
    "kwargs": {
      "batch_id": "2018-06_taxi",
      "column": "passenger_count",
      "min_value": 4.0,
      "max_value": 5.0
    },
    "meta": {},
    "id": "38368501-4599-433a-8c6a-28f5088a4d4a"
  },
  "result": {
    "observed_value": 6
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
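The comparison behind this failure can be reproduced by hand. The sketch below assumes the Validation Results are available as a plain dictionary and that both bounds are inclusive. The function name explain_range_check is hypothetical and is not a GX call.

```python
import json

# Trimmed copy of the failing Validation Results shown above.
failed_results = json.loads("""
{
  "success": false,
  "expectation_config": {
    "expectation_type": "expect_column_max_to_be_between",
    "kwargs": {"column": "passenger_count", "min_value": 4.0, "max_value": 5.0}
  },
  "result": {"observed_value": 6}
}
""")


def explain_range_check(results: dict) -> str:
    """Compare observed_value with the configured range (inclusive bounds assumed)."""
    kwargs = results["expectation_config"]["kwargs"]
    observed = results["result"]["observed_value"]
    low, high = kwargs["min_value"], kwargs["max_value"]
    if observed < low:
        return f"observed_value {observed} is below min_value {low}"
    if observed > high:
        return f"observed_value {observed} is above max_value {high}"
    return f"observed_value {observed} is within [{low}, {high}]"


explanation = explain_range_check(failed_results)
print(explanation)
```

A message like this shows which bound to revisit: here the observed maximum exceeds max_value, so the upper bound is the one to adjust.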
- Optional. Adjust the Expectation's parameters and retest.
If the Expectation did not return the results you anticipated, you can update it to reflect the actual state of your data rather than recreating it from scratch. An Expectation object with preset parameters stores the parameters that were provided to initialize it as attributes. To modify the Expectation, overwrite those attributes.
For example, if your Expectation took the parameters min_value and max_value, you could update them with:
Python input
expectation.min_value = 1
expectation.max_value = 6
Once you have set the new values for the Expectation's parameters, you can reuse the Batch from earlier and repeat this procedure to test your changes.
Python input
new_validation_results = batch.validate(expectation)
print(new_validation_results)
This time, the updated Expectation accurately describes the data and the validation succeeds:
Python output
{
  "success": true,
  "expectation_config": {
    "expectation_type": "expect_column_max_to_be_between",
    "kwargs": {
      "batch_id": "2018-06_taxi",
      "column": "passenger_count",
      "min_value": 1.0,
      "max_value": 6.0
    },
    "meta": {},
    "id": "38368501-4599-433a-8c6a-28f5088a4d4a"
  },
  "result": {
    "observed_value": 6
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
When an Expectation uses an Expectation Parameter dictionary, you don't have to modify anything on the Expectation object. Instead, update the dictionary with new values and then test it with the updated dictionary:
Python input
runtime_expectation_parameters = {
    "expect_passenger_max_to_be_above": 1,
    "expect_passenger_max_to_be_below": 6,
}
validation_results = batch.validate(
    expectation, expectation_parameters=runtime_expectation_parameters
)
print(validation_results)
For more information about Validation Results, what they contain, and how to adjust their verbosity, see Choose result format.
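When retesting with an Expectation Parameter dictionary, the dictionary keys need to match the names the Expectation looks up at runtime. The sketch below is a GX-independent pre-flight check for that. It assumes the Expectation was defined with {"$PARAMETER": name} lookups, a syntax this guide does not show, and the helper missing_parameters is hypothetical.

```python
# Keyword arguments for a hypothetical Expectation definition.
# ASSUMPTION: runtime lookups are written as {"$PARAMETER": "<name>"}.
expectation_kwargs = {
    "column": "passenger_count",
    "min_value": {"$PARAMETER": "expect_passenger_max_to_be_above"},
    "max_value": {"$PARAMETER": "expect_passenger_max_to_be_below"},
}

# The updated dictionary from the retest step in this guide.
runtime_expectation_parameters = {
    "expect_passenger_max_to_be_above": 1,
    "expect_passenger_max_to_be_below": 6,
}


def missing_parameters(kwargs: dict, supplied: dict) -> set:
    """Return names referenced by a $PARAMETER lookup but absent from `supplied`."""
    referenced = {
        value["$PARAMETER"]
        for value in kwargs.values()
        if isinstance(value, dict) and "$PARAMETER" in value
    }
    return referenced - set(supplied)


missing = missing_parameters(expectation_kwargs, runtime_expectation_parameters)
print(sorted(missing) if missing else "all runtime parameters supplied")
```

Running a check like this before batch.validate(...) can surface a misspelled key early, instead of leaving it to be discovered during validation.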
Sample code
import great_expectations as gx
context = gx.get_context()
# Use the `pandas_default` Data Source to retrieve a Batch of sample Data from a data file:
file_path = "./data/folder_with_data/yellow_tripdata_sample_2019-01.csv"
batch = context.data_sources.pandas_default.read_csv(file_path)
# Define the Expectation to test:
expectation = gx.expectations.ExpectColumnMaxToBeBetween(
column="passenger_count", min_value=1, max_value=6
)
# Test the Expectation:
validation_results = batch.validate(expectation)
# Evaluate the Validation Results:
print(validation_results)
# If needed, adjust the Expectation's preset parameters and test again:
expectation.min_value = 1
expectation.max_value = 6
# Test the modified expectation and review the new Validation Results:
new_validation_results = batch.validate(expectation)
print(new_validation_results)