Create Custom Parameterized Expectations
This guide walks you through the process of quickly creating Parameterized Expectations. This method is only available using the new Modular Expectations API in 0.13.
Prerequisites
A Parameterized Expectation is a capability unlocked by Modular Expectations. Now that Expectations are structured in class form, it is easy to inherit from these classes and build similar Expectations that are adapted to your own needs.
Select an Expectation to inherit from
For the purpose of this exercise, we will implement the Expectations expect_column_mean_to_be_positive and expect_column_values_to_be_two_letter_country_code. These are realistic Expectations of the data that can easily inherit from expect_column_mean_to_be_between and expect_column_values_to_be_in_set, respectively.
Select default values for your class
Our first implementation will be expect_column_mean_to_be_positive.
As can be seen in the implementation below, we have chosen to set our default minimum value to 0, given that we are validating that the column mean is positive. An upper bound of None means that no upper bound will be checked, effectively setting the threshold at ∞ and allowing any positive mean.
Notice that we do not need to set default_kwarg_values for all kwargs: it is sufficient to set them only for the ones whose default value we want to change. To keep our implementation simple, we do not override the metric_dependencies or success_keys.
class ExpectColumnMeanToBePositive(gxe.ExpectColumnMeanToBeBetween):
    """Expect the mean of values in this column to be positive."""

    min_value: Union[float, EvaluationParameterDict, datetime, None] = 0
    strict_min = True
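To make the effect of these defaults concrete, here is a minimal sketch in plain Python of the range check they imply. It is not the library's implementation: mean_is_between and mean_is_positive are hypothetical helpers written for illustration, and the sketch assumes that strict_min=True excludes the bound itself and that a bound of None is simply skipped.

```python
from statistics import mean
from typing import Optional, Sequence


def mean_is_between(
    values: Sequence[float],
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
    strict_min: bool = False,
    strict_max: bool = False,
) -> bool:
    """Hypothetical stand-in for the parent's range check (not library code)."""
    column_mean = mean(values)
    if min_value is not None:
        # With strict_min=True the bound itself is excluded.
        if column_mean < min_value or (strict_min and column_mean == min_value):
            return False
    if max_value is not None:
        if column_mean > max_value or (strict_max and column_mean == max_value):
            return False
    return True


def mean_is_positive(values: Sequence[float]) -> bool:
    """The same check with the defaults chosen above: min_value=0, strict_min=True."""
    return mean_is_between(values, min_value=0, max_value=None, strict_min=True)


print(mean_is_positive([1.0, 2.0, 3.0]))  # mean is 2.0
print(mean_is_positive([-1.0, 1.0]))      # mean is 0.0, excluded by strict_min
print(mean_is_positive([5.0, 1e12]))      # no upper bound is checked
```

Only the two parameters that differ from the general check are pinned; everything else is inherited unchanged, which mirrors the subclassing approach above.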
We could also explicitly override our parent's methods to modify the behavior of our new Expectation, for example by updating the configuration validation to require that the values we set as defaults are not altered.
    def validate_configuration(self, configuration):
        super().validate_configuration(configuration)
        assert "min_value" not in configuration.kwargs, "min_value cannot be altered"
        assert "max_value" not in configuration.kwargs, "max_value cannot be altered"
        assert "strict_min" not in configuration.kwargs, "strict_min cannot be altered"
        assert "strict_max" not in configuration.kwargs, "strict_max cannot be altered"
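The locking logic in the method above can be sketched without any library objects. The snippet below is an illustration only: it works on a plain dict instead of a configuration object, LOCKED_KWARGS, find_locked_overrides and check_kwargs are hypothetical names, and it raises ValueError instead of using assert, which is a swapped-in design choice rather than the approach shown above.

```python
from typing import Any, Dict, Iterable, List

# Keys whose defaults the subclass fixes; taken from the assertions above.
LOCKED_KWARGS = ("min_value", "max_value", "strict_min", "strict_max")


def find_locked_overrides(
    kwargs: Dict[str, Any], locked: Iterable[str] = LOCKED_KWARGS
) -> List[str]:
    """Return the locked keys that a user-supplied kwargs dict tries to set."""
    return [key for key in locked if key in kwargs]


def check_kwargs(kwargs: Dict[str, Any]) -> None:
    """Hypothetical helper: raise if any locked key is present."""
    overridden = find_locked_overrides(kwargs)
    if overridden:
        raise ValueError(f"{', '.join(overridden)} cannot be altered")


check_kwargs({"column": "price"})  # passes silently

try:
    check_kwargs({"column": "price", "min_value": 10})
except ValueError as err:
    print(err)  # min_value cannot be altered
```

One reason to prefer an explicit exception in a check like this is that assert statements are removed when Python runs with the -O flag, so assertion-based validation can be silently disabled.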
For another example, let's take a look at expect_column_values_to_be_in_set. In this case, we will only be changing our value_set:
class ExpectColumnValuesToBeTwoLetterCountryCode(gxe.ExpectColumnValuesToBeInSet):
    value_set: List[str] = ["FR", "DE", "CH", "ES", "IT", "BE", "NL", "PL"]
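As a rough illustration of what fixing value_set means for a column, the sketch below performs the membership test in plain Python. It is not how the library evaluates the Expectation: unexpected_values is a hypothetical helper, and the sample column is made up for the example.

```python
from typing import Iterable, List, Set

# The value_set from the class above.
COUNTRY_CODES: Set[str] = {"FR", "DE", "CH", "ES", "IT", "BE", "NL", "PL"}


def unexpected_values(
    values: Iterable[str], value_set: Set[str] = COUNTRY_CODES
) -> List[str]:
    """Return the values not found in value_set, preserving their order."""
    return [v for v in values if v not in value_set]


column = ["FR", "DE", "fr", "USA", "PL"]  # made-up sample data
print(unexpected_values(column))  # ['fr', 'USA']
```

Membership in this sketch is an exact, case-sensitive string match, so "fr" is reported even though "FR" is in the set.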
Contribute (Optional)
If you plan to contribute your Expectation to the public open source project, you should include a library_metadata object. For example:
library_metadata = {"tags": ["basic stats"], "contributors": ["@joegargery"]}
This is particularly important because we want to make sure that you get credit for all your hard work!
Additionally, you will need to implement some basic examples and test cases before your contribution can be accepted. For guidance on examples and testing, see our guide on implementing examples and test cases.
For more information on our code standards and contribution, see our guide on Levels of Maturity for Expectations.