great_expectations.expectations.core.expect_column_kl_divergence_to_be_less_than

Module Contents

Classes

ExpectColumnKlDivergenceToBeLessThan(configuration: Optional[ExpectationConfiguration] = None)

Expect the Kullback-Leibler (KL) divergence (relative entropy) of the specified column with respect to the partition object to be lower than the provided threshold.

great_expectations.expectations.core.expect_column_kl_divergence_to_be_less_than.logger
class great_expectations.expectations.core.expect_column_kl_divergence_to_be_less_than.ExpectColumnKlDivergenceToBeLessThan(configuration: Optional[ExpectationConfiguration] = None)

Bases: great_expectations.expectations.expectation.ColumnExpectation

Expect the Kullback-Leibler (KL) divergence (relative entropy) of the specified column with respect to the partition object to be lower than the provided threshold.

KL divergence compares two distributions. The higher the divergence value (relative entropy), the larger the difference between the two distributions. A relative entropy of zero indicates that the data are distributed identically, when binned according to the provided partition.
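The relationship between divergence and distribution difference can be illustrated with a minimal sketch. It is not the library's implementation: the helper name `kl_divergence` is hypothetical, and the direction D(observed || expected) and the natural-log base are assumptions made for illustration.

```python
import math


def kl_divergence(observed, expected):
    """Relative entropy D(observed || expected), in nats, for two aligned weight lists."""
    total = 0.0
    for p, q in zip(observed, expected):
        if p == 0:
            continue  # a bin with zero observed weight contributes nothing
        if q == 0:
            return math.inf  # observed weight where the partition expected none
        total += p * math.log(p / q)
    return total


# Identical distributions give zero divergence.
identical = kl_divergence([0.5, 0.5], [0.5, 0.5])
# A shifted distribution gives a positive divergence.
shifted = kl_divergence([0.9, 0.1], [0.5, 0.5])
# Data observed in a zero-weight bin drives the divergence to infinity.
unexpected = kl_divergence([0.5, 0.5], [1.0, 0.0])
```

The infinite case is the one that the weight holdout parameters described below are meant to soften.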

In many practical contexts, choosing a threshold between 0.5 and 1 will provide a useful test.

This expectation works on both categorical and continuous partitions. See notes below for details.
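As a hedged illustration of the two partition kinds, the dictionaries below show one plausible shape for each. The `bins` key appears in this page; the `weights` and `values` keys, the sample numbers, and the `is_consistent` helper are assumptions for illustration only.

```python
import math

# Hypothetical continuous partition: n + 1 bin edges define n bins.
continuous_partition = {
    "bins": [0, 10, 20, 30],
    "weights": [0.2, 0.5, 0.3],  # one weight per bin
}

# Hypothetical categorical partition: one weight per distinct value.
categorical_partition = {
    "values": ["a", "b", "c"],
    "weights": [0.6, 0.3, 0.1],
}


def is_consistent(partition):
    """Check that the weights line up with the bins or values and sum to 1."""
    weights = partition["weights"]
    if "bins" in partition:
        expected_count = len(partition["bins"]) - 1
    else:
        expected_count = len(partition["values"])
    return len(weights) == expected_count and math.isclose(sum(weights), 1.0)
```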

expect_column_kl_divergence_to_be_less_than is a [Column Aggregate Expectation](https://docs.greatexpectations.io/docs/guides/expectations/creating_custom_expectations/how_to_create_custom_column_aggregate_expectations).

Parameters
Keyword Arguments
  • internal_weight_holdout (float between 0 and 1 or None) – The amount of weight to split uniformly among zero-weighted partition bins. internal_weight_holdout provides a mechanism to make the test less strict by assigning positive weights to values observed in the data for which the partition explicitly expected zero weight. With no internal_weight_holdout, any value observed in such a region will cause KL divergence to rise to +Infinity. Defaults to 0.

  • tail_weight_holdout (float between 0 and 1 or None) – The amount of weight to add to the tails of the histogram. Tail weight holdout is split evenly between (-Infinity, min(partition_object['bins'])) and (max(partition_object['bins']), +Infinity). tail_weight_holdout provides a mechanism to make the test less strict by assigning positive weights to values observed in the data that are not present in the partition. With no tail_weight_holdout, any value observed outside the provided partition_object will cause KL divergence to rise to +Infinity. Defaults to 0.

  • bucketize_data (boolean) – If True, then continuous data will be bucketized before evaluation. Setting this parameter to False allows evaluation of KL divergence with a None partition object for profiling against discrete data.

Other Parameters
Returns

An [ExpectationSuiteValidationResult](https://docs.greatexpectations.io/docs/terms/validation_result)

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

Notes

  • observed_value field in the result object is customized for this expectation to be a float representing the true KL divergence (relative entropy) or None if the value is calculated as infinity, -infinity, or NaN

  • details.observed_partition in the result object is customized for this expectation to be a dict representing the partition observed in the data

  • details.expected_partition in the result object is customized for this expectation to be a dict representing the partition against which the data were compared, after applying specified weight holdouts

If the partition_object is categorical, this expectation will expect the values in column to also be categorical.

  • If the column includes values that are not present in the partition, the tail_weight_holdout will be equally split among those values, providing a mechanism to weaken the strictness of the expectation (otherwise, relative entropy would immediately go to infinity).

  • If the partition includes values that are not present in the column, the test will simply include zero weight for that value.
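The two categorical rules above can be sketched as follows. This is an illustration rather than the library's code: the helper name `align_categorical` is hypothetical, and rescaling the expected weights by `1 - tail_weight_holdout` (so the total stays at 1) is an assumption.

```python
def align_categorical(observed, expected, tail_weight_holdout=0.0):
    """Align observed and expected weight dicts over the union of their values.

    Returns a list of (value, observed_weight, expected_weight) tuples.
    """
    values = sorted(set(observed) | set(expected))
    unexpected = [v for v in values if v not in expected]
    # Assumption: expected weights shrink only when the holdout is actually used.
    scale = 1 - tail_weight_holdout if unexpected else 1.0
    aligned = []
    for v in values:
        p = observed.get(v, 0.0)  # value absent from the column: zero weight
        if v in expected:
            q = expected[v] * scale
        else:
            # Value absent from the partition: equal share of the holdout.
            q = tail_weight_holdout / len(unexpected)
        aligned.append((v, p, q))
    return aligned


rows = align_categorical(
    observed={"a": 0.5, "b": 0.3, "d": 0.2},
    expected={"a": 0.5, "b": 0.25, "c": 0.25},
    tail_weight_holdout=0.1,
)
```

Here "d" is observed but not expected, so it receives the whole holdout of 0.1, while "c" is expected but not observed, so its observed weight is zero.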

If the partition_object is continuous, this expectation will discretize the values in the column according to the bins specified in the partition_object, and apply the test to the resulting distribution.

  • The internal_weight_holdout and tail_weight_holdout parameters provide a mechanism to weaken the expectation, since an expected weight of zero would drive relative entropy to be infinite if any data are observed in that interval.

  • If internal_weight_holdout is specified, that value will be distributed equally among any intervals with weight zero in the partition_object.

  • If tail_weight_holdout is specified, that value will be appended to the tails of the bins: (-Infinity, min(bins)) and (max(bins), +Infinity).
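A minimal sketch of how the two holdouts could reshape a continuous partition is shown below. The helper name `apply_holdouts` is hypothetical, and scaling the original weights by `1 - tail_weight_holdout - internal_weight_holdout` is an assumption made so that the adjusted weights still sum to 1. It should not be read as the library's exact procedure.

```python
import math


def apply_holdouts(bins, weights, tail_weight_holdout=0.0, internal_weight_holdout=0.0):
    """Return (bins, weights) after distributing the holdout weights."""
    # Assumption: shrink the original weights to make room for the holdouts.
    scale = 1 - tail_weight_holdout - internal_weight_holdout
    adjusted = [w * scale for w in weights]

    # Split the internal holdout equally among zero-weight bins.
    zero_bins = [i for i, w in enumerate(weights) if w == 0]
    if internal_weight_holdout > 0 and zero_bins:
        for i in zero_bins:
            adjusted[i] = internal_weight_holdout / len(zero_bins)

    # Append one tail bin on each side, each with half of the tail holdout.
    new_bins = list(bins)
    if tail_weight_holdout > 0:
        half = tail_weight_holdout / 2
        new_bins = [-math.inf] + new_bins + [math.inf]
        adjusted = [half] + adjusted + [half]
    return new_bins, adjusted


new_bins, new_weights = apply_holdouts(
    bins=[0, 10, 20, 30],
    weights=[0.5, 0.0, 0.5],
    tail_weight_holdout=0.1,
    internal_weight_holdout=0.1,
)
```

With these inputs every interval, including the two tails and the previously empty middle bin, ends up with a positive expected weight, so no single observation can make the divergence infinite.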

If the relative entropy (KL divergence) goes to infinity for any of the reasons mentioned above, the observed value will be set to None. This is because inf, -inf, and NaN are not JSON serializable and cause some JSON parsers to crash when encountered. The Python None token will be serialized to null in JSON.
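The serialization concern can be reproduced with Python's standard `json` module. The helper `json_safe` below is a hypothetical name used only to illustrate the substitution. It is not part of the library.

```python
import json
import math


def json_safe(value):
    """Return None for inf, -inf, or NaN so the value serializes to JSON null."""
    if value is None or not math.isfinite(value):
        return None
    return value


# By default, json.dumps emits the bare token Infinity, which is not valid JSON.
raw = json.dumps(math.inf)

# Replacing the non-finite value with None yields a standards-compliant null.
safe = json.dumps({"observed_value": json_safe(math.inf)})

# In strict mode, json.dumps refuses non-finite floats outright.
try:
    json.dumps(math.inf, allow_nan=False)
    strict_rejected = False
except ValueError:
    strict_rejected = True
```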

library_metadata
success_keys = ['partition_object', 'threshold', 'tail_weight_holdout', 'internal_weight_holdout', 'bucketize_data']
default_kwarg_values
args_keys = ['column', 'partition_object', 'threshold']
validate_configuration(self, configuration: Optional[ExpectationConfiguration])

Validates that a configuration has been set, and sets a configuration if it has yet to be set. Ensures that necessary configuration arguments have been provided for the validation of the expectation.

Parameters

configuration (Optional[ExpectationConfiguration]) – An optional Expectation Configuration entry that will be used to configure the expectation.

Returns

None. Raises InvalidExpectationConfigurationError if the configuration fails validation.

get_validation_dependencies(self, configuration: Optional[ExpectationConfiguration] = None, execution_engine: Optional[ExecutionEngine] = None, runtime_configuration: Optional[dict] = None)

Returns the result format and metrics required to validate this Expectation using the provided result format.

_validate(self, configuration: ExpectationConfiguration, metrics: Dict, runtime_configuration: Optional[dict] = None, execution_engine: Optional[ExecutionEngine] = None)
classmethod _get_kl_divergence_chart(cls, partition_object, header=None)
classmethod _atomic_kl_divergence_chart_template(cls, partition_object: dict)
classmethod _get_kl_divergence_partition_object_table(cls, partition_object, header=None)
classmethod _atomic_partition_object_table_template(cls, partition_object: dict)
classmethod _atomic_prescriptive_template(cls, configuration: Optional[ExpectationConfiguration] = None, result: Optional[ExpectationValidationResult] = None, runtime_configuration: Optional[dict] = None, **kwargs)

Template function that contains the logic that is shared by AtomicPrescriptiveRendererType.SUMMARY and LegacyRendererType.PRESCRIPTIVE.

classmethod _prescriptive_summary(cls, configuration: Optional[ExpectationConfiguration] = None, result: Optional[ExpectationValidationResult] = None, runtime_configuration: Optional[dict] = None, **kwargs)
classmethod _prescriptive_renderer(cls, configuration: Optional[ExpectationConfiguration] = None, result: Optional[ExpectationValidationResult] = None, runtime_configuration: Optional[dict] = None, **kwargs)
classmethod _atomic_diagnostic_observed_value_template(cls, configuration: Optional[ExpectationConfiguration] = None, result: Optional[ExpectationValidationResult] = None, runtime_configuration: Optional[dict] = None, **kwargs)
classmethod _atomic_diagnostic_observed_value(cls, configuration: Optional[ExpectationConfiguration] = None, result: Optional[ExpectationValidationResult] = None, runtime_configuration: Optional[dict] = None, **kwargs)
classmethod _diagnostic_observed_value_renderer(cls, configuration: Optional[ExpectationConfiguration] = None, result: Optional[ExpectationValidationResult] = None, runtime_configuration: Optional[dict] = None, **kwargs)
classmethod _descriptive_histogram_renderer(cls, configuration: Optional[ExpectationConfiguration] = None, result: Optional[ExpectationValidationResult] = None, runtime_configuration: Optional[dict] = None, **kwargs)