Customize an Expectation Class
Existing Expectation Classes can be customized to include additional information such as business logic, more descriptive naming conventions, and specialized rendering for Data Docs. This is done by subclassing an existing Expectation class and populating the subclass with default values and customized attributes.
Advantages of subclassing an Expectation and providing customized attributes rather than creating an instance of the parent Expectation and passing in parameters include:
- All instances of the Expectation that use the default values will be updated if changes are made to the class definition.
- More descriptive Expectation names can be provided that indicate the business logic behind the Expectation.
- Customized text can be provided to describe the Expectation when Data Docs are generated from Validation Results.
Prerequisites
- Python version 3.9 to 3.12.
- An installation of GX Core.
- A preconfigured Data Context.
- Recommended. A preconfigured Data Source and Data Asset connected to your data for testing your customized Expectation.
Procedure
- Choose a base Expectation class.
You can customize any of the core Expectation classes in GX. You can view the available Expectations and their functionality in the Expectation Gallery.
In this example, ExpectColumnValuesToBeBetween will be customized.
- Create a new Expectation class that inherits the base Expectation class.
The core Expectations in GX have names describing their functionality. When you create a customized Expectation class, you can provide a class name that is more indicative of your specific use case:
    class ExpectValidPassengerCount(gx.expectations.ExpectColumnValuesToBeBetween):
        ...  # attribute overrides are added in the next step
- Override the Expectation's attributes with new default values.
The attributes that can be overridden correspond to the parameters required by the base Expectation. These can be referenced from the Expectation Gallery.
In this example, the default column for ExpectValidPassengerCount is set to passenger_count, and the default value range for the column is defined as between 1 and 6:
    class ExpectValidPassengerCount(gx.expectations.ExpectColumnValuesToBeBetween):
        column: str = "passenger_count"
        min_value: int = 1
        max_value: int = 6
- Customize the rendering of the new Expectation when displayed in Data Docs.
The description attribute of a customized Expectation class contains the text describing the customized Expectation when its results are rendered into Data Docs. You can format the description string with Markdown syntax:
    class ExpectValidPassengerCount(gx.expectations.ExpectColumnValuesToBeBetween):
        column: str = "passenger_count"
        min_value: int = 1
        max_value: int = 6
        description: str = "There should be between **1** and **6** passengers."
- Use the customized subclass as an Expectation.
It is best not to overwrite the predefined default values by passing in parameters when a customized Expectation is created. This ensures that the description remains accurate to the values that the customized Expectation uses. It also allows you to update all instances of the customized Expectation by editing the default values in the customized Expectation's class definition rather than having to update each instance individually in their Expectation Suites:
    expectation = ExpectValidPassengerCount()  # Uses the predefined default values
A customized Expectation instance can be added to Expectation Suites and validated just like any other Expectation.
Sample code
import great_expectations as gx

context = gx.get_context()


class ExpectValidPassengerCount(gx.expectations.ExpectColumnValuesToBeBetween):
    column: str = "passenger_count"
    min_value: int = 1
    max_value: int = 6
    description: str = "There should be between **1** and **6** passengers."


# Create an instance of the custom Expectation
expectation = ExpectValidPassengerCount()  # Uses the predefined default values

# Optional. Test the Expectation with some sample data
data_source_name = "my_data_source"
asset_name = "my_data_asset"
batch_definition_name = "my_batch_definition"
batch = (
    context.data_sources.get(data_source_name)
    .get_asset(asset_name)
    .get_batch_definition(batch_definition_name)
    .get_batch()
)

print(batch.validate(expectation))