Module Contents


class great_expectations.expectations.metrics.multicolumn_map_metrics.select_column_values_unique_within_record.SelectColumnValuesUniqueWithinRecord

Bases: great_expectations.expectations.metrics.map_metric_provider.MulticolumnMapMetricProvider

Base class for all metric providers.

MetricProvider classes must have the following attributes set:

  1. metric_name: the name to use; metric names must be globally unique within a great_expectations installation.

  2. domain_keys: a tuple of the keys used to determine the domain of the metric.

  3. value_keys: a tuple of the keys used to determine the value of the metric.

In some cases, subclasses of MetricProvider, such as TableMetricProvider, will already have correct values that may simply be inherited.

They may optionally override the default_kwarg_values attribute.
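The attribute pattern above can be sketched as follows. This is a schematic illustration only: the provider name, metric name, and key values are hypothetical, and the great_expectations base class and registration machinery are omitted.

```python
# Hypothetical provider sketch; in practice the class would subclass a
# great_expectations MetricProvider and be registered with the library.
class MyColumnPairMetricProvider:
    # 1. metric_name: must be globally unique within a
    #    great_expectations installation.
    metric_name = "column_pair_values.my_custom_metric"

    # 2. domain_keys: a tuple of keys determining the metric's domain.
    domain_keys = ("batch_id", "table", "column_A", "column_B")

    # 3. value_keys: a tuple of keys determining the metric's value.
    value_keys = ("ignore_row_if",)

    # Optional: override default_kwarg_values to supply defaults.
    default_kwarg_values = {"ignore_row_if": "both_values_are_missing"}
```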

MetricProvider classes must implement the following:

1. _get_evaluation_dependencies. Note that _get_evaluation_dependencies should often augment the dependencies provided by a parent class; consider calling super()._get_evaluation_dependencies().

In some cases, subclasses of MetricProvider, such as MapMetricProvider, will already have correct implementations that may simply be inherited.
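The augmentation pattern for _get_evaluation_dependencies can be sketched with simplified signatures (the real method also receives metric-configuration arguments; the class and dependency names here are illustrative, not great_expectations' actual ones):

```python
# Schematic sketch: a subclass starts from its parent's dependencies
# via super() and then adds its own, rather than replacing them.
class MetricProviderSketch:
    @classmethod
    def _get_evaluation_dependencies(cls):
        # Parent class contributes its own dependencies.
        return {"table.columns": "table-columns-metric"}


class MapMetricProviderSketch(MetricProviderSketch):
    @classmethod
    def _get_evaluation_dependencies(cls):
        # Augment the parent's dependencies instead of discarding them.
        dependencies = super()._get_evaluation_dependencies()
        dependencies["column_values.nonnull.count"] = "nonnull-count-metric"
        return dependencies
```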

Additionally, they may provide implementations of:

1. Data Docs rendering methods decorated with the @renderer decorator. See the guide “How to create renderers for custom expectations” for more information.

condition_metric_name = 'select_column_values.unique.within_record'
condition_domain_keys = ['batch_id', 'table', 'column_list', 'row_condition', 'condition_parser', 'ignore_row_if']
_pandas(cls, column_list, **kwargs)
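A minimal pandas sketch of the condition this metric evaluates: a row passes when all of its values across the listed columns are distinct. The function name here is illustrative; great_expectations' actual _pandas implementation may differ in detail.

```python
import pandas as pd


def unique_within_record_condition(column_list: pd.DataFrame) -> pd.Series:
    # A row's values are unique within the record exactly when the number
    # of distinct values in the row equals the number of columns.
    num_columns = column_list.shape[1]
    return column_list.nunique(axis=1) == num_columns


df = pd.DataFrame({"a": [1, 1, 2], "b": [1, 2, 3], "c": [2, 3, 4]})
condition = unique_within_record_condition(df)
# Row 0 repeats the value 1 across columns a and b, so it fails;
# rows 1 and 2 contain three distinct values each and pass.
```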
_sqlalchemy(cls, column_list, **kwargs)

The present approach relies on an inefficient query-condition construction, whose computational cost is O(num_columns^2). However, until a more efficient implementation compatible with SQLAlchemy is available, this is the only feasible mechanism under the current architecture, in which map metric providers must return a condition. In practice, the SQL query length limit of 1 GB is sufficient for most scenarios.
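The quadratic growth can be illustrated by assembling one pairwise-inequality clause per unordered pair of columns, which yields n*(n-1)/2 clauses for n columns. Plain string assembly is used below for clarity, not actual SQLAlchemy expressions:

```python
from itertools import combinations


def build_unique_within_record_condition(column_names):
    # One "col_i != col_j" clause per unordered column pair:
    # n*(n-1)/2 clauses, hence O(num_columns^2) query-length growth.
    clauses = [f"{a} != {b}" for a, b in combinations(column_names, 2)]
    return " AND ".join(clauses)


cond = build_unique_within_record_condition(["a", "b", "c"])
# → "a != b AND a != c AND b != c"
```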

_spark(cls, column_list, **kwargs)