great_expectations.expectations.metrics.multicolumn_map_metrics
Package Contents

Classes
- CompoundColumnsUnique: While the support for “PandasExecutionEngine” and “SparkDFExecutionEngine” is accomplished using a compact implementation, the support for “SqlAlchemyExecutionEngine” handles the “map” and “condition” parts separately.
- MulticolumnSumEqual: Base class for all metric providers.
- SelectColumnValuesUniqueWithinRecord: Base class for all metric providers.
- class great_expectations.expectations.metrics.multicolumn_map_metrics.CompoundColumnsUnique
Bases: great_expectations.expectations.metrics.map_metric_provider.MulticolumnMapMetricProvider
While the support for “PandasExecutionEngine” and “SparkDFExecutionEngine” is accomplished using a compact implementation, which combines the “map” and “condition” parts in a single step, the support for “SqlAlchemyExecutionEngine” is more elaborate: the “map” and “condition” parts are handled separately, with the “condition” part relying on the “map” part as a metric dependency.
- function_metric_name = compound_columns.count
- condition_metric_name = compound_columns.unique
- condition_domain_keys = ['batch_id', 'table', 'column_list', 'row_condition', 'condition_parser', 'ignore_row_if']
- _pandas(cls, column_list, **kwargs)
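For the Pandas engine, the compact implementation referenced above can be pictured as a single step that passes a row only when its combination of values across the compound columns occurs exactly once. The following is a hedged sketch of that idea (the function and variable names are illustrative, not the library's actual code):

    import pandas as pd

    def compound_columns_unique(column_list: pd.DataFrame) -> pd.Series:
        # duplicated(keep=False) flags every row whose group of values
        # appears more than once; negating yields the row-wise condition.
        return ~column_list.duplicated(keep=False)

    df = pd.DataFrame({"A": [1, 1, 1, 2, 3], "B": [1, 2, 1, 2, 2], "C": [2, 3, 2, 2, 3]})
    print(compound_columns_unique(df))  # rows 0 and 2 fail: (1, 1, 2) occurs twice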
- _sqlalchemy_function(self, column_list, **kwargs)
Computes the “map” between the specified “column_list” (treated as a group so as to model the “compound” aspect) and the number of occurrences of each combination of values of “column_list” among the rows of the table. In the present context, the term “compound” refers to having to treat the specified columns as unique together (e.g., as a multi-column primary key). For example, suppose that all three columns (“A”, “B”, and “C”) of the table below are included in the “compound” columns list (i.e., column_list = [“A”, “B”, “C”]):
A  B  C  _num_rows
1  1  2  2
1  2  3  1
1  1  2  2
2  2  2  1
3  2  3  1
The fourth column, “_num_rows”, holds the value of the “map” function: the number of rows in which the group occurs.
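In SQLAlchemy terms, this “map” can be expressed as a window count partitioned by the compound columns and attached to every row as “_num_rows”. The sketch below illustrates the idea against a hypothetical table, using SQLAlchemy 1.4+ select() style; it approximates the technique and is not the library's exact implementation:

    import sqlalchemy as sa

    metadata = sa.MetaData()
    events = sa.Table(  # hypothetical table standing in for the batch's table
        "events",
        metadata,
        sa.Column("A", sa.Integer),
        sa.Column("B", sa.Integer),
        sa.Column("C", sa.Integer),
    )
    compound = [events.c.A, events.c.B, events.c.C]

    # Window count over each (A, B, C) group, keeping every original row.
    num_rows = sa.func.count().over(partition_by=compound).label("_num_rows")
    compound_columns_count_query = sa.select(*events.columns, num_rows).subquery()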
- _sqlalchemy_condition(cls, column_list, **kwargs)
Retrieves the specified “map” metric dependency value as the “FromClause” object “compound_columns_count_query” and extracts from it, using the supported SQLAlchemy column access method, the “_num_rows” column. The uniqueness of the “compound” columns (as a group) is expressed by the returned “BinaryExpression” “row_wise_cond”.
Importantly, since “compound_columns_count_query” is a “FromClause” object that incorporates all columns of the original table, no additional “FromClause” objects (“select_from”) are needed to augment this “condition” metric. Apart from boolean operations, column access, filter arguments, and limits on the size of the result set, this “row_wise_cond”, serving as the main component of the unexpected-condition logic, carries the entire object hierarchy along with it, making any encapsulating query ready for execution against the database engine.
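Continuing the sketch above, the “condition” part then reduces to a single comparison against the “_num_rows” column of the dependency's FromClause (again an illustrative approximation):

    # A compound-column group is unique exactly when its count equals one.
    # Because the subquery already carries all of the original table's
    # columns, this BinaryExpression needs no additional select_from().
    row_wise_cond = compound_columns_count_query.c["_num_rows"] == 1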
- _spark(cls, column_list, **kwargs)
- classmethod _get_evaluation_dependencies(cls, metric: MetricConfiguration, configuration: Optional[ExpectationConfiguration] = None, execution_engine: Optional[ExecutionEngine] = None, runtime_configuration: Optional[dict] = None)
Returns a dictionary of metric names and their corresponding configurations, specifying the metric types and their respective domains.
- class great_expectations.expectations.metrics.multicolumn_map_metrics.MulticolumnSumEqual
Bases: great_expectations.expectations.metrics.map_metric_provider.MulticolumnMapMetricProvider
Base class for all metric providers.
MetricProvider classes must have the following attributes set:
- metric_name: the name to use. The metric name must be globally unique in a Great Expectations installation.
- domain_keys: a tuple of the keys used to determine the domain of the metric.
- value_keys: a tuple of the keys used to determine the value of the metric.
In some cases, subclasses of MetricProvider, such as TableMetricProvider, will already have correct values that may simply be inherited. They may optionally override the default_kwarg_values attribute.
MetricProvider classes must implement the following:
1. _get_evaluation_dependencies. Note that often, _get_evaluation_dependencies should augment dependencies provided by a parent class; consider calling super()._get_evaluation_dependencies. In some cases, subclasses of MetricProvider, such as MapMetricProvider, will already have correct implementations that may simply be inherited.
Additionally, they may provide implementations of:
1. Data Docs rendering methods decorated with the @renderer decorator. See the guide “How to create renderers for custom expectations” for more information.
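As a hedged sketch of this contract (the class name, metric name, and row condition below are invented for illustration; only the attribute and method names come from the requirements above), a minimal multicolumn provider might look like:

    import pandas as pd

    from great_expectations.expectations.metrics.map_metric_provider import (
        MulticolumnMapMetricProvider,
    )

    class MulticolumnValuesPositive(MulticolumnMapMetricProvider):
        # Required attributes: a globally unique metric name plus the keys
        # that determine the metric's domain.
        condition_metric_name = "multicolumn_values.positive"  # hypothetical
        condition_domain_keys = (
            "batch_id", "table", "column_list",
            "row_condition", "condition_parser", "ignore_row_if",
        )

        # Engine-specific "map condition"; in the library such methods are
        # registered through engine-specific decorators, omitted here.
        def _pandas(cls, column_list: pd.DataFrame, **kwargs) -> pd.Series:
            # A row passes when every selected column value is positive.
            return (column_list > 0).all(axis=1)

        @classmethod
        def _get_evaluation_dependencies(cls, metric, configuration=None,
                                         execution_engine=None,
                                         runtime_configuration=None):
            # Augment, rather than replace, dependencies from the parent.
            return super()._get_evaluation_dependencies(
                metric=metric,
                configuration=configuration,
                execution_engine=execution_engine,
                runtime_configuration=runtime_configuration,
            )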
- condition_metric_name = multicolumn_sum.equal
- condition_domain_keys = ['batch_id', 'table', 'column_list', 'row_condition', 'condition_parser', 'ignore_row_if']
- condition_value_keys = ['sum_total']
- _pandas(cls, column_list, **kwargs)
- _sqlalchemy(cls, column_list, **kwargs)
- _spark(cls, column_list, **kwargs)
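Under the Pandas engine, the row-wise condition behind “multicolumn_sum.equal” can be sketched as follows; this is a hedged approximation (the function name is illustrative), with “sum_total” supplied through “condition_value_keys”:

    import pandas as pd

    def multicolumn_sum_equal(column_list: pd.DataFrame, sum_total: int) -> pd.Series:
        # A row passes when its values across the selected columns add up
        # to exactly sum_total.
        return column_list.sum(axis=1) == sum_total

    df = pd.DataFrame({"A": [1, 2, 4], "B": [3, 2, 4]})
    print(multicolumn_sum_equal(df, sum_total=4))  # True, True, False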
- class great_expectations.expectations.metrics.multicolumn_map_metrics.SelectColumnValuesUniqueWithinRecord
Bases: great_expectations.expectations.metrics.map_metric_provider.MulticolumnMapMetricProvider
Base class for all metric providers.
MetricProvider classes must have the following attributes set:
- metric_name: the name to use. The metric name must be globally unique in a Great Expectations installation.
- domain_keys: a tuple of the keys used to determine the domain of the metric.
- value_keys: a tuple of the keys used to determine the value of the metric.
In some cases, subclasses of MetricProvider, such as TableMetricProvider, will already have correct values that may simply be inherited. They may optionally override the default_kwarg_values attribute.
MetricProvider classes must implement the following:
1. _get_evaluation_dependencies. Note that often, _get_evaluation_dependencies should augment dependencies provided by a parent class; consider calling super()._get_evaluation_dependencies. In some cases, subclasses of MetricProvider, such as MapMetricProvider, will already have correct implementations that may simply be inherited.
Additionally, they may provide implementations of:
1. Data Docs rendering methods decorated with the @renderer decorator. See the guide “How to create renderers for custom expectations” for more information.
- condition_metric_name = select_column_values.unique.within_record
- condition_domain_keys = ['batch_id', 'table', 'column_list', 'row_condition', 'condition_parser', 'ignore_row_if']
- _pandas(cls, column_list, **kwargs)
- _sqlalchemy(cls, column_list, **kwargs)
The present approach relies on an inefficient query-condition construction, whose computational cost is O(num_columns^2). However, until a more efficient implementation compatible with SQLAlchemy is available, this is the only feasible mechanism under the current architecture, in which map metric providers must return a condition. Nevertheless, the SQL query length limit is 1 GB (sufficient for most practical scenarios).
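The quadratic construction can be pictured as the conjunction of pairwise “not equal” comparisons between all selected columns, i.e., num_columns * (num_columns - 1) / 2 clauses. The sketch below illustrates this against a hypothetical table; it approximates the technique rather than reproducing the library's code:

    import itertools

    import sqlalchemy as sa

    metadata = sa.MetaData()
    records = sa.Table(  # hypothetical table used for illustration
        "records",
        metadata,
        sa.Column("A", sa.Integer),
        sa.Column("B", sa.Integer),
        sa.Column("C", sa.Integer),
    )
    selected = [records.c.A, records.c.B, records.c.C]

    # Values are unique within a record when every pair of selected
    # columns differs; the number of clauses grows as O(num_columns^2).
    row_wise_cond = sa.and_(
        *(left != right for left, right in itertools.combinations(selected, 2))
    )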
- _spark(cls, column_list, **kwargs)