great_expectations.expectations.metrics.multicolumn_map_metrics

Package Contents

Classes

CompoundColumnsUnique()

While the support for “PandasExecutionEngine” and “SparkDFExecutionEngine” is accomplished using a compact implementation, which combines the “map” and “condition” parts in a single step, the support for “SqlAlchemyExecutionEngine” is more detailed.

MulticolumnSumEqual()

Multicolumn map metric asserting that, for each row, the values across the specified columns sum to the given “sum_total”.

SelectColumnValuesUniqueWithinRecord()

Multicolumn map metric asserting that, within each row (record), the values of the selected columns are unique.

class great_expectations.expectations.metrics.multicolumn_map_metrics.CompoundColumnsUnique

Bases: great_expectations.expectations.metrics.map_metric_provider.MulticolumnMapMetricProvider

While the support for “PandasExecutionEngine” and “SparkDFExecutionEngine” is accomplished using a compact implementation, which combines the “map” and “condition” parts in a single step, the support for “SqlAlchemyExecutionEngine” is more detailed. Thus, the “map” and “condition” parts for “SqlAlchemyExecutionEngine” are handled separately, with the “condition” part relying on the “map” part as a metric dependency.
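
For orientation, here is a minimal sketch of the “compact” implementation in plain pandas (illustrative only; the library’s actual _pandas method may differ in detail). The “map” (per-group row count) and the “condition” (the group occurs exactly once) are computed in a single step:

    import pandas as pd

    # Example data matching the table shown under _sqlalchemy_function below.
    df = pd.DataFrame({"A": [1, 1, 1, 2, 3], "B": [1, 2, 1, 2, 2], "C": [2, 3, 2, 2, 3]})
    column_list = ["A", "B", "C"]

    # "map" and "condition" in one step: count the rows sharing each exact
    # combination of compound-column values, then flag the groups of size one.
    num_rows = df.groupby(column_list)[column_list[0]].transform("size")
    row_wise_cond = num_rows == 1
    print(row_wise_cond.tolist())  # [False, True, False, True, True]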

function_metric_name = compound_columns.count
condition_metric_name = compound_columns.unique
condition_domain_keys = ['batch_id', 'table', 'column_list', 'row_condition', 'condition_parser', 'ignore_row_if']
_pandas(cls, column_list, **kwargs)
_sqlalchemy_function(self, column_list, **kwargs)

Computes the “map” between the specified “column_list” (treated as a group so as to model the “compound” aspect) and the number of occurrences of every distinct combination of the values of “column_list” in the grouped subset of all rows of the table. In the present context, the term “compound” refers to having to treat the specified columns as unique together (e.g., as a multi-column primary key). For example, suppose that all three columns (“A”, “B”, and “C”) of the table below are included in the “compound” columns list (i.e., column_list = [“A”, “B”, “C”]):

A  B  C  _num_rows
1  1  2  2
1  2  3  1
1  1  2  2
2  2  2  1
3  2  3  1

The fourth column, “_num_rows”, holds the value of the “map” function: the number of rows in which each group of values occurs.
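
A hedged SQLAlchemy sketch of this “map” computation follows (the table and column names are illustrative; the library’s actual _sqlalchemy_function differs in detail). One way to realize it is a window count over the compound columns, which keeps every original column and attaches the per-group row count as “_num_rows”:

    import sqlalchemy as sa

    metadata = sa.MetaData()
    events = sa.Table(
        "events",
        metadata,
        sa.Column("A", sa.Integer),
        sa.Column("B", sa.Integer),
        sa.Column("C", sa.Integer),
    )
    compound = [events.c.A, events.c.B, events.c.C]

    # All original columns plus the per-group row count, labeled "_num_rows".
    compound_columns_count_query = sa.select(
        events,
        sa.func.count().over(partition_by=compound).label("_num_rows"),
    ).subquery("compound_columns_count_query")

    print(sa.select(compound_columns_count_query))  # renders the generated SQL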

_sqlalchemy_condition(cls, column_list, **kwargs)

Retrieve the specified “map” metric dependency value as the “FromClause” object “compound_columns_count_query” and, using the supported SQLAlchemy column access method, extract the “_num_rows” column from it. The uniqueness of the “compound” columns (as a group) is expressed by the returned “BinaryExpression” “row_wise_cond”.

Importantly, since “compound_columns_count_query” is the “FromClause” object that incorporates all columns of the original table, no additional “FromClause” objects (“select_from”) are needed to augment this “condition” metric. Apart from boolean operations, column access, use as a filtering argument, and limiting the size of the result set, this “row_wise_cond”, serving as the main component of the unexpected-condition logic, carries the entire object hierarchy along with it, making any encapsulating query ready for execution against the database engine.
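
Continuing the illustrative setup from the sketch above (hypothetical names; not the library’s exact code), the “condition” part reduces to a BinaryExpression over the subquery’s “_num_rows” column, and the enclosing query needs no further “select_from”:

    import sqlalchemy as sa

    metadata = sa.MetaData()
    events = sa.Table(
        "events",
        metadata,
        sa.Column("A", sa.Integer),
        sa.Column("B", sa.Integer),
        sa.Column("C", sa.Integer),
    )
    compound = [events.c.A, events.c.B, events.c.C]

    # The "map" metric dependency, as sketched above.
    compound_columns_count_query = sa.select(
        events,
        sa.func.count().over(partition_by=compound).label("_num_rows"),
    ).subquery("compound_columns_count_query")

    # The "condition": the compound columns are unique together exactly when
    # the group occurs in a single row.
    row_wise_cond = compound_columns_count_query.c["_num_rows"] == 1

    # The subquery already carries all original columns, so the encapsulating
    # query is ready for execution without additional FromClause objects.
    unexpected_rows = sa.select(compound_columns_count_query).where(~row_wise_cond)
    print(unexpected_rows)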

_spark(cls, column_list, **kwargs)
classmethod _get_evaluation_dependencies(cls, metric: MetricConfiguration, configuration: Optional[ExpectationConfiguration] = None, execution_engine: Optional[ExecutionEngine] = None, runtime_configuration: Optional[dict] = None)

Returns a dictionary of given metric names and their corresponding configuration, specifying the metric types and their respective domains.
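
The following library-free sketch illustrates the dependency pattern described above; MetricConfiguration here is a simplified stand-in for the real class, and the override merely augments the parent’s dependencies with the “map” metric that the SQLAlchemy “condition” relies on:

    from dataclasses import dataclass, field
    from typing import Dict

    @dataclass
    class MetricConfiguration:  # simplified stand-in, not the actual class
        metric_name: str
        metric_domain_kwargs: dict = field(default_factory=dict)

    class MapMetricBase:
        @classmethod
        def _get_evaluation_dependencies(cls, metric: MetricConfiguration) -> Dict[str, MetricConfiguration]:
            return {}

    class CompoundColumnsUniqueSketch(MapMetricBase):
        @classmethod
        def _get_evaluation_dependencies(cls, metric: MetricConfiguration) -> Dict[str, MetricConfiguration]:
            # Augment, rather than replace, the parent's dependencies.
            dependencies = super()._get_evaluation_dependencies(metric)
            # The "condition" metric depends on the "map" metric over the same domain.
            dependencies["compound_columns.count"] = MetricConfiguration(
                "compound_columns.count", dict(metric.metric_domain_kwargs)
            )
            return dependencies

    deps = CompoundColumnsUniqueSketch._get_evaluation_dependencies(
        MetricConfiguration("compound_columns.unique", {"column_list": ["A", "B", "C"]})
    )
    print(deps)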

class great_expectations.expectations.metrics.multicolumn_map_metrics.MulticolumnSumEqual

Bases: great_expectations.expectations.metrics.map_metric_provider.MulticolumnMapMetricProvider

(Docstring inherited from MetricProvider, the base class for all metric providers.)

MetricProvider classes must have the following attributes set:

  1. metric_name: the name to use. The metric name must be globally unique in a great_expectations installation.

  2. domain_keys: a tuple of the keys used to determine the domain of the metric.

  3. value_keys: a tuple of the keys used to determine the value of the metric.

In some cases, subclasses of MetricProvider, such as TableMetricProvider, will already have correct values that may simply be inherited.

They may optionally override the default_kwarg_values attribute.

MetricProvider classes must implement the following:

1. _get_evaluation_dependencies. Note that often, _get_evaluation_dependencies should augment dependencies provided by a parent class; consider calling super()._get_evaluation_dependencies().

In some cases, subclasses of MetricProvider, such as MapMetricProvider, will already have correct implementations that may simply be inherited.

Additionally, they may provide implementations of:

1. Data Docs rendering methods decorated with the @renderer decorator. See the guide “How to create renderers for custom expectations” for more information.

condition_metric_name = multicolumn_sum.equal
condition_domain_keys = ['batch_id', 'table', 'column_list', 'row_condition', 'condition_parser', 'ignore_row_if']
condition_value_keys = ['sum_total']
_pandas(cls, column_list, **kwargs)
_sqlalchemy(cls, column_list, **kwargs)
_spark(cls, column_list, **kwargs)
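
Given the metric name and the “sum_total” value key above, the pandas condition plausibly reduces to a row-wise sum comparison. The following is a sketch under that assumption, not the library’s verbatim code:

    import pandas as pd

    def multicolumn_sum_equal(column_list: pd.DataFrame, sum_total: int) -> pd.Series:
        # True for each row whose values across the selected columns sum to sum_total.
        return column_list.sum(axis=1) == sum_total

    df = pd.DataFrame({"A": [1, 2, 0], "B": [2, 2, 5], "C": [7, 6, 4]})
    print(multicolumn_sum_equal(df[["A", "B", "C"]], sum_total=10).tolist())
    # [True, True, False]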
class great_expectations.expectations.metrics.multicolumn_map_metrics.SelectColumnValuesUniqueWithinRecord

Bases: great_expectations.expectations.metrics.map_metric_provider.MulticolumnMapMetricProvider

(Docstring inherited from MetricProvider, the base class for all metric providers.)

MetricProvider classes must have the following attributes set:

  1. metric_name: the name to use. The metric name must be globally unique in a great_expectations installation.

  2. domain_keys: a tuple of the keys used to determine the domain of the metric.

  3. value_keys: a tuple of the keys used to determine the value of the metric.

In some cases, subclasses of MetricProvider, such as TableMetricProvider, will already have correct values that may simply be inherited.

They may optionally override the default_kwarg_values attribute.

MetricProvider classes must implement the following:

1. _get_evaluation_dependencies. Note that often, _get_evaluation_dependencies should augment dependencies provided by a parent class; consider calling super()._get_evaluation_dependencies().

In some cases, subclasses of MetricProvider, such as MapMetricProvider, will already have correct implementations that may simply be inherited.

Additionally, they may provide implementations of:

1. Data Docs rendering methods decorated with the @renderer decorator. See the guide “How to create renderers for custom expectations” for more information.

condition_metric_name = select_column_values.unique.within_record
condition_domain_keys = ['batch_id', 'table', 'column_list', 'row_condition', 'condition_parser', 'ignore_row_if']
_pandas(cls, column_list, **kwargs)
_sqlalchemy(cls, column_list, **kwargs)

The present approach relies on an inefficient query-condition construction, whose computational cost is O(num_columns^2). However, until a more efficient implementation compatible with SQLAlchemy is available, this is the only feasible mechanism under the current architecture, in which map metric providers must return a condition. Nevertheless, the SQL query length limit is 1 GB (sufficient for most practical scenarios).
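
A hedged sketch of the quadratic construction (column names illustrative; not the library’s exact code): the row-wise condition is the conjunction of pairwise inequalities over all column pairs, so the number of terms grows as num_columns * (num_columns - 1) / 2:

    import itertools

    import sqlalchemy as sa

    columns = [sa.column("A"), sa.column("B"), sa.column("C")]

    # One inequality per unordered pair of columns; the conjunction forms the
    # row-wise condition, hence the O(num_columns^2) growth noted above.
    row_wise_cond = sa.and_(*(a != b for a, b in itertools.combinations(columns, 2)))
    print(row_wise_cond)  # renders the pairwise SQL condition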

_spark(cls, column_list, **kwargs)
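
For contrast, the compact pandas-style check is a one-liner (illustrative; the library’s _pandas may differ): a row satisfies the condition when all selected values in that row are distinct.

    import pandas as pd

    df = pd.DataFrame({"A": [1, 1], "B": [2, 1], "C": [3, 2]})
    # A row passes when the number of distinct values equals the number of columns.
    print((df.nunique(axis=1) == df.shape[1]).tolist())  # [True, False]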