great_expectations.rule_based_profiler.parameter_builder

Package Contents

Classes

ParameterBuilder(name: str, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

A ParameterBuilder implementation provides support for building Expectation Configuration Parameters suitable for use in other ParameterBuilders or in ConfigurationBuilders as part of profiling.

MetricMultiBatchParameterBuilder(name: str, metric_name: str, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, single_batch_mode: Union[str, bool] = False, enforce_numeric_metric: Union[str, bool] = False, replace_nan_with_zero: Union[str, bool] = False, reduce_scalar_metric: Union[str, bool] = True, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

A Single/Multi-Batch implementation for obtaining a resolved (evaluated) metric, using domain_kwargs, value_kwargs, and metric_name as arguments.

MetricSingleBatchParameterBuilder(name: str, metric_name: Optional[str] = None, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, enforce_numeric_metric: Union[str, bool] = False, replace_nan_with_zero: Union[str, bool] = False, reduce_scalar_metric: Union[str, bool] = True, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

A Single-Batch-only implementation for obtaining a resolved (evaluated) metric, using domain_kwargs, value_kwargs, and metric_name as arguments.

NumericMetricRangeMultiBatchParameterBuilder(name: str, metric_name: Optional[str] = None, metric_multi_batch_parameter_builder_name: Optional[str] = None, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, enforce_numeric_metric: Union[str, bool] = True, replace_nan_with_zero: Union[str, bool] = True, reduce_scalar_metric: Union[str, bool] = True, false_positive_rate: Optional[Union[str, float]] = None, estimator: str = 'bootstrap', n_resamples: Optional[Union[str, int]] = None, random_seed: Optional[Union[str, int]] = None, quantile_statistic_interpolation_method: str = 'auto', quantile_bias_correction: Union[str, bool] = False, quantile_bias_std_error_ratio_threshold: Optional[Union[str, float]] = None, bw_method: Optional[Union[str, float, Callable]] = None, include_estimator_samples_histogram_in_details: Union[str, bool] = False, truncate_values: Optional[Union[str, Dict[str, Union[Optional[int], Optional[float]]]]] = None, round_decimals: Optional[Union[str, int]] = None, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

A Multi-Batch implementation for obtaining the range estimation bounds for a resolved (evaluated) numeric metric, using domain_kwargs, value_kwargs, metric_name, and false_positive_rate (tolerance) as arguments.

MeanUnexpectedMapMetricMultiBatchParameterBuilder(name: str, map_metric_name: str, total_count_parameter_builder_name: str, null_count_parameter_builder_name: Optional[str] = None, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

Compute mean unexpected count ratio (as a fraction) of specified map-style metric across every Batch of data given.

MeanTableColumnsSetMatchMultiBatchParameterBuilder(name: str, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

Compute mean match ratio (as a fraction) of “table.columns” metric across every Batch of data given.

RegexPatternStringParameterBuilder(name: str, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, threshold: Union[str, float] = 1.0, candidate_regexes: Optional[Union[str, Iterable[str]]] = None, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

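Detects the domain REGEX from a set of candidate REGEX strings by computing the column_values.match_regex_format.unexpected_count metric for each candidate format and returning the format that has the lowest unexpected_count ratio.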

SimpleDateFormatStringParameterBuilder(name: str, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, threshold: Union[str, float] = 1.0, candidate_strings: Optional[Union[Iterable[str], str]] = None, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

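Detects the domain date format from a set of candidate date format strings by computing the column_values.match_strftime_format.unexpected_count metric for each candidate format and returning the format that has the lowest unexpected_count ratio.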

ValueSetMultiBatchParameterBuilder(name: str, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

Build a set of unique values across all specified batches.

ValueCountsSingleBatchParameterBuilder(name: str, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

Compute value counts using specified metric for one Batch of data.

HistogramSingleBatchParameterBuilder(name: str, bins: str = 'uniform', n_bins: int = 10, allow_relative_error: bool = False, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

Compute histogram using specified metric for one Batch of data.

Functions

init_rule_parameter_builders(parameter_builder_configs: Optional[List[dict]] = None, data_context: Optional[AbstractDataContext] = None)

class great_expectations.rule_based_profiler.parameter_builder.ParameterBuilder(name: str, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

Bases: abc.ABC, great_expectations.rule_based_profiler.builder.Builder

A ParameterBuilder implementation provides support for building Expectation Configuration Parameters suitable for use in other ParameterBuilders or in ConfigurationBuilders as part of profiling.

A ParameterBuilder is configured as part of a ProfilerRule. Its primary interface is the build_parameters method.

As part of a ProfilerRule, the following configuration will create a new parameter for each domain returned by the domain_builder, with an associated id.

```
parameter_builders:
  - name: my_parameter_builder
    class_name: MetricMultiBatchParameterBuilder
    metric_name: column.mean
```

exclude_field_names :Set[str]
build_parameters(self, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None, parameter_computation_impl: Optional[Callable] = None, batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequestBase, dict]] = None, recompute_existing_parameter_values: bool = False)
Parameters
  • domain – “Domain” object that is context for execution of this “ParameterBuilder” object.

  • variables – attribute name/value pairs

  • parameters – Dictionary of “ParameterContainer” objects corresponding to all “Domain” objects in memory.

  • parameter_computation_impl – Object containing desired “ParameterBuilder” implementation.

  • batch_list – Explicit list of “Batch” objects to supply data at runtime.

  • batch_request – Explicit batch_request used to supply data at runtime.

  • recompute_existing_parameter_values – If “True”, recompute value if “fully_qualified_parameter_name” exists.

resolve_evaluation_dependencies(self, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None, fully_qualified_parameter_names: Optional[List[str]] = None, recompute_existing_parameter_values: bool = False)

This method computes (“resolves”) pre-requisite (“evaluation”) dependencies (i.e., results of executing other “ParameterBuilder” objects), whose output(s) are needed by specified “ParameterBuilder” object to operate.

abstract _build_parameters(self, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None, recompute_existing_parameter_values: bool = False)

Builds ParameterContainer object that holds ParameterNode objects with attribute name-value pairs and details.

Returns

Attributes object, containing computed parameter values and parameter computation details metadata.

property name(self)
property evaluation_parameter_builders(self)
property evaluation_parameter_builder_configs(self)
property raw_fully_qualified_parameter_name(self)

This fully-qualified parameter name references “raw” “ParameterNode” output (including “Numpy” “dtype” values).

property json_serialized_fully_qualified_parameter_name(self)

This fully-qualified parameter name references “JSON-serialized” “ParameterNode” output.

get_validator(self, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)
get_batch_ids(self, limit: Optional[int] = None, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)
get_metrics(self, metric_name: str, metric_domain_kwargs: Optional[Union[Union[str, dict], List[Union[str, dict]]]] = None, metric_value_kwargs: Optional[Union[Union[str, dict], List[Union[str, dict]]]] = None, limit: Optional[int] = None, enforce_numeric_metric: Union[str, bool] = False, replace_nan_with_zero: Union[str, bool] = False, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)

General multi-batch metric computation facility.

Computes specified metric (can be multi-dimensional, numeric, non-numeric, or mixed) and conditions (or “sanitizes”) result according to two criteria: enforcing metric output to be numeric and handling NaN values.

Parameters
  • metric_name – Name of metric of interest, being computed.

  • metric_domain_kwargs – Metric Domain Kwargs is an essential parameter of the MetricConfiguration object.

  • metric_value_kwargs – Metric Value Kwargs is an essential parameter of the MetricConfiguration object.

  • limit – Optional limit on number of “Batch” objects requested (supports single-Batch scenarios).

  • enforce_numeric_metric – Flag controlling whether or not metric output must be numerically-valued.

  • replace_nan_with_zero – Directive controlling how NaN metric values, if encountered, should be handled.

  • domain – “Domain” object scoping “$variable”/“$parameter”-style references in configuration and runtime.

  • variables – Part of the “rule state” available for “$variable”-style references.

  • parameters – Part of the “rule state” available for “$parameter”-style references.

Returns

“MetricComputationResult” object, containing both: data samples in the format “N x R^m”, where “N” (most significant dimension) is the number of measurements (e.g., one per “Batch” of data), while “R^m” is the multi-dimensional metric, whose values are being estimated, and details (to be used for metadata purposes).

static _sanitize_metric_computation(parameter_builder: ParameterBuilder, metric_name: str, attributed_resolved_metrics: AttributedResolvedMetrics, enforce_numeric_metric: Union[str, bool] = False, replace_nan_with_zero: Union[str, bool] = False, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)

This method conditions (or “sanitizes”) data samples in the format “N x R^m”, where “N” (most significant dimension) is the number of measurements (e.g., one per Batch of data), while “R^m” is the multi-dimensional metric, whose values are being estimated. The “conditioning” operations are:

  1. If the “enforce_numeric_metric” flag is set, raise an error if a non-numeric value is found in the sample vectors.

  2. Further, if a NaN is encountered in a sample vector and “replace_nan_with_zero” is True, then replace those NaN values with the 0.0 floating point number; if “replace_nan_with_zero” is False, then raise an error.
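The two conditioning operations can be illustrated with a minimal NumPy sketch. The function name `sanitize_samples` and the use of `ValueError` are assumptions made for illustration; this is not the library's implementation.

```python
import numpy as np


def sanitize_samples(samples, enforce_numeric_metric=False, replace_nan_with_zero=False):
    """Condition an N x R^m array of metric samples (hypothetical helper)."""
    values = np.asarray(samples)
    # 1. Optionally insist that every sample is numeric.
    if enforce_numeric_metric and not np.issubdtype(values.dtype, np.number):
        raise ValueError("non-numeric metric value found")
    # 2. NaN can only occur in floating-point data; replace it or raise.
    if np.issubdtype(values.dtype, np.floating) and np.isnan(values).any():
        if not replace_nan_with_zero:
            raise ValueError("NaN metric value found")
        values = np.where(np.isnan(values), 0.0, values)
    return values


# Two Batches (N = 2) of a two-element metric (m = 2), one value missing.
cleaned = sanitize_samples([[1.0, float("nan")], [3.0, 4.0]], replace_nan_with_zero=True)
```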

static _get_best_candidate_above_threshold(candidate_ratio_dict: Dict[str, float], threshold: float)

Helper method to calculate which candidate string or pattern is the best match (i.e., has the highest ratio), provided it is also above the threshold.

static _get_sorted_candidates_and_ratios(candidate_ratio_dict: Dict[str, float])

Helper method to sort all candidate strings or patterns by success ratio (how well they matched the domain).

Returns a sorted dict with each candidate as key and its ratio as value.
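A minimal sketch of what these two helpers describe, in plain Python. The function names, the return shapes, the descending sort order, and the choice to treat a ratio equal to the threshold as passing are assumptions; the library's own helpers may differ.

```python
from typing import Dict, Optional, Tuple


def best_candidate_above_threshold(
    candidate_ratio_dict: Dict[str, float], threshold: float
) -> Tuple[Optional[str], float]:
    """Return the highest-ratio candidate that reaches the threshold, if any."""
    best, best_ratio = None, 0.0
    for candidate, ratio in candidate_ratio_dict.items():
        # Assumption: a ratio equal to the threshold counts as a match.
        if ratio >= threshold and ratio > best_ratio:
            best, best_ratio = candidate, ratio
    return best, best_ratio


def sorted_candidates_and_ratios(candidate_ratio_dict: Dict[str, float]) -> Dict[str, float]:
    """Return the candidates ordered from highest to lowest ratio."""
    ordered = sorted(candidate_ratio_dict.items(), key=lambda item: item[1], reverse=True)
    return dict(ordered)


ratios = {"a": 0.4, "b": 0.9, "c": 0.7}
best = best_candidate_above_threshold(ratios, threshold=0.8)
ranking = sorted_candidates_and_ratios(ratios)
```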

great_expectations.rule_based_profiler.parameter_builder.init_rule_parameter_builders(parameter_builder_configs: Optional[List[dict]] = None, data_context: Optional[AbstractDataContext] = None) → Optional[List[ParameterBuilder]]
class great_expectations.rule_based_profiler.parameter_builder.MetricMultiBatchParameterBuilder(name: str, metric_name: str, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, single_batch_mode: Union[str, bool] = False, enforce_numeric_metric: Union[str, bool] = False, replace_nan_with_zero: Union[str, bool] = False, reduce_scalar_metric: Union[str, bool] = True, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

Bases: great_expectations.rule_based_profiler.parameter_builder.ParameterBuilder

A Single/Multi-Batch implementation for obtaining a resolved (evaluated) metric, using domain_kwargs, value_kwargs, and metric_name as arguments.

property metric_name(self)
property metric_domain_kwargs(self)
property metric_value_kwargs(self)
property single_batch_mode(self)
property enforce_numeric_metric(self)
property replace_nan_with_zero(self)
property reduce_scalar_metric(self)
_build_parameters(self, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None, recompute_existing_parameter_values: bool = False)

Builds ParameterContainer object that holds ParameterNode objects with attribute name-value pairs and details.

Returns

Attributes object, containing computed parameter values and parameter computation details metadata.

class great_expectations.rule_based_profiler.parameter_builder.MetricSingleBatchParameterBuilder(name: str, metric_name: Optional[str] = None, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, enforce_numeric_metric: Union[str, bool] = False, replace_nan_with_zero: Union[str, bool] = False, reduce_scalar_metric: Union[str, bool] = True, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

Bases: great_expectations.rule_based_profiler.parameter_builder.MetricMultiBatchParameterBuilder

A Single-Batch-only implementation for obtaining a resolved (evaluated) metric, using domain_kwargs, value_kwargs, and metric_name as arguments.

exclude_field_names :Set[str]
_build_parameters(self, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None, recompute_existing_parameter_values: bool = False)

Builds ParameterContainer object that holds ParameterNode objects with attribute name-value pairs and details.

Returns

Attributes object, containing computed parameter values and parameter computation details metadata.

class great_expectations.rule_based_profiler.parameter_builder.NumericMetricRangeMultiBatchParameterBuilder(name: str, metric_name: Optional[str] = None, metric_multi_batch_parameter_builder_name: Optional[str] = None, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, enforce_numeric_metric: Union[str, bool] = True, replace_nan_with_zero: Union[str, bool] = True, reduce_scalar_metric: Union[str, bool] = True, false_positive_rate: Optional[Union[str, float]] = None, estimator: str = 'bootstrap', n_resamples: Optional[Union[str, int]] = None, random_seed: Optional[Union[str, int]] = None, quantile_statistic_interpolation_method: str = 'auto', quantile_bias_correction: Union[str, bool] = False, quantile_bias_std_error_ratio_threshold: Optional[Union[str, float]] = None, bw_method: Optional[Union[str, float, Callable]] = None, include_estimator_samples_histogram_in_details: Union[str, bool] = False, truncate_values: Optional[Union[str, Dict[str, Union[Optional[int], Optional[float]]]]] = None, round_decimals: Optional[Union[str, int]] = None, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

Bases: great_expectations.rule_based_profiler.parameter_builder.MetricMultiBatchParameterBuilder

A Multi-Batch implementation for obtaining the range estimation bounds for a resolved (evaluated) numeric metric, using domain_kwargs, value_kwargs, metric_name, and false_positive_rate (tolerance) as arguments.

This Multi-Batch ParameterBuilder is general in the sense that any metric that computes numbers can be accommodated. On the other hand, it is specific in the sense that the parameter names will always have the semantics of numeric ranges, which will incorporate the requirements, imposed by the configured false_positive_rate tolerances.

The implementation supports four methods of estimating parameter values from data:

  • quantiles – assumes that metric values, computed on batch data, are normally distributed and computes the mean and the standard error using the queried batches as the single sample of the distribution.

  • exact – uses the minimum and maximum observations for range boundaries.

  • bootstrap – a statistical resampling technique (see “https://en.wikipedia.org/wiki/Bootstrapping_(statistics)”).

  • kde – a statistical technique that fits a Gaussian to the distribution and resamples from it.
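To make the difference between estimators concrete, here is a minimal NumPy sketch of the “exact” method and of a plain percentile bootstrap. The function names and the default values are assumptions, and the bootstrap shown is the textbook percentile variant; it omits the bias correction and interpolation options exposed by this class.

```python
import numpy as np


def exact_range(metric_values):
    """Range bounds taken directly from the smallest and largest observation."""
    values = np.asarray(metric_values, dtype=float)
    return float(values.min()), float(values.max())


def bootstrap_range(metric_values, false_positive_rate=0.05, n_resamples=1000, random_seed=0):
    """Percentile-bootstrap estimate of a [low, high] range (illustrative only)."""
    values = np.asarray(metric_values, dtype=float)
    rng = np.random.default_rng(random_seed)
    # Draw n_resamples resamples with replacement, each as large as the original sample.
    resamples = rng.choice(values, size=(n_resamples, values.size), replace=True)
    lower_q = false_positive_rate / 2.0
    upper_q = 1.0 - lower_q
    # Average the per-resample quantiles to obtain the two bounds.
    low = float(np.quantile(resamples, lower_q, axis=1).mean())
    high = float(np.quantile(resamples, upper_q, axis=1).mean())
    return low, high


per_batch_means = [1.0, 2.0, 3.0, 4.0, 5.0]  # one metric value per Batch
exact_bounds = exact_range(per_batch_means)
bootstrap_bounds = bootstrap_range(per_batch_means)
```

Fixing `random_seed` makes the bootstrap result reproducible, which is why the class exposes a parameter of that name.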

RECOGNIZED_SAMPLING_METHOD_NAMES :set
RECOGNIZED_TRUNCATE_DISTRIBUTION_KEYS :set
exclude_field_names :Set[str]
property metric_multi_batch_parameter_builder_name(self)
property false_positive_rate(self)
property estimator(self)
property n_resamples(self)
property random_seed(self)
property quantile_statistic_interpolation_method(self)
property quantile_bias_correction(self)
property quantile_bias_std_error_ratio_threshold(self)
property bw_method(self)
property include_estimator_samples_histogram_in_details(self)
property truncate_values(self)
property round_decimals(self)
_build_parameters(self, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None, recompute_existing_parameter_values: bool = False)

Builds ParameterContainer object that holds ParameterNode objects with attribute name-value pairs and details.

Returns

Attributes object, containing computed parameter values and parameter computation details metadata.

The algorithm operates according to the following steps:

  1. Obtain batch IDs of interest using AbstractDataContext and BatchRequest (unless passed explicitly as argument).

  2. Set up metric_domain_kwargs and metric_value_kwargs (using configuration and/or variables and parameters).

  3. Instantiate the Validator object corresponding to BatchRequest (with a temporary expectation_suite_name) in order to have access to all Batch objects, on each of which the specified metric_name will be computed.

  4. Perform metric computations and obtain the result in the array-like form (one metric value per each Batch).

  5. Using the configured directives and heuristics, determine whether or not the ranges should be clipped.

  6. Using the configured directives and heuristics, determine if return values should be rounded to an integer.

  7. Convert the multi-dimensional metric computation results to a numpy array (for further computations).

  8. Compute [low, high] for the desired metric using the chosen estimator method.

  9. Return [low, high] for the desired metric as estimated by the specified sampling method.

  10. Set up the arguments and call build_parameter_container() to store the parameter as part of “rule state”.
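The clipping and rounding steps can be sketched in a few lines of plain Python. The helper name `clip_and_round` is hypothetical, and the dictionary keys `lower_bound` and `upper_bound` are an assumption about the shape of `truncate_values`; the heuristics that choose these settings automatically are not reproduced here.

```python
def clip_and_round(low, high, truncate_values=None, round_decimals=None):
    """Apply optional clipping bounds and rounding to an estimated [low, high] range."""
    truncate_values = truncate_values or {}
    lower_bound = truncate_values.get("lower_bound")  # assumed key name
    upper_bound = truncate_values.get("upper_bound")  # assumed key name
    # Clip only on the sides for which a bound was configured.
    if lower_bound is not None:
        low = max(low, lower_bound)
    if upper_bound is not None:
        high = min(high, upper_bound)
    # Round only when a number of decimals was configured.
    if round_decimals is not None:
        low, high = round(low, round_decimals), round(high, round_decimals)
    return low, high


# A count-like metric cannot be negative, so clip the lower side at zero.
bounds = clip_and_round(-0.37, 12.345, truncate_values={"lower_bound": 0}, round_decimals=1)
```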

_build_numeric_range_estimator(self, round_decimals: int, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)

Determines “estimator” name and returns appropriate configured “NumericRangeEstimator” subclass instance.

_estimate_metric_value_range(self, metric_values: np.ndarray, numeric_range_estimator: NumericRangeEstimator, round_decimals: int, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)

This method accepts “NumericRangeEstimator” and data samples in format “N x R^m”, where “N” (most significant dimension) is the number of measurements (e.g., one per Batch of data), while “R^m” is the multi-dimensional metric, whose values are being estimated. Thus, for each element in the “R^m” hypercube, an “N”-dimensional vector of sample measurements is constructed and given to the estimator to apply its specific algorithm for computing the range of values in this vector. Estimator algorithms differ based on their use of data samples.
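The per-element iteration described above can be sketched with NumPy as follows. The function name and the use of a plain callable in place of a “NumericRangeEstimator” instance are assumptions made to keep the example self-contained.

```python
import numpy as np


def estimate_range_per_element(metric_values, estimator):
    """Apply estimator to the N-vector behind each element of an N x R^m sample array."""
    values = np.asarray(metric_values, dtype=float)
    element_shape = values.shape[1:]  # shape of the R^m part
    low = np.empty(element_shape)
    high = np.empty(element_shape)
    for index in np.ndindex(element_shape):
        # Select all N measurements of this one metric element.
        vector = values[(slice(None),) + index]
        low[index], high[index] = estimator(vector)
    return low, high


# Three Batches (N = 3) of a two-element metric (m = 2).
samples = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
low, high = estimate_range_per_element(samples, lambda v: (v.min(), v.max()))
```

A scalar metric (samples of shape `(N,)`) is handled by the same loop, because the element shape is then empty and the loop body runs exactly once.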

static _is_metric_values_ndarray_datetime_dtype(metric_values: np.ndarray, metric_value_vector_indices: List[tuple])
static _is_metric_values_ndarray_decimal_dtype(metric_values: np.ndarray, metric_value_vector_indices: List[tuple])
_get_truncate_values_using_heuristics(self, metric_values: np.ndarray, domain: Domain, *, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)
_get_round_decimals_using_heuristics(self, metric_values: np.ndarray, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)
class great_expectations.rule_based_profiler.parameter_builder.MeanUnexpectedMapMetricMultiBatchParameterBuilder(name: str, map_metric_name: str, total_count_parameter_builder_name: str, null_count_parameter_builder_name: Optional[str] = None, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

Bases: great_expectations.rule_based_profiler.parameter_builder.MetricMultiBatchParameterBuilder

Compute mean unexpected count ratio (as a fraction) of specified map-style metric across every Batch of data given.
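As a rough illustration, the mean unexpected ratio could be computed as below. The function name is hypothetical, and dividing by the non-null row count (total minus nulls) is an assumption suggested by the `total_count_parameter_builder_name` and `null_count_parameter_builder_name` arguments; it is not confirmed by this page.

```python
def mean_unexpected_ratio(unexpected_counts, total_counts, null_counts=None):
    """Average, over Batches, of unexpected_count / (total_count - null_count)."""
    if null_counts is None:
        null_counts = [0] * len(total_counts)
    ratios = []
    for unexpected, total, nulls in zip(unexpected_counts, total_counts, null_counts):
        non_null = total - nulls
        # Assumption: a Batch without non-null rows contributes a ratio of 0.
        ratios.append(unexpected / non_null if non_null else 0.0)
    return sum(ratios) / len(ratios)


# Two Batches: 1 of 10 rows unexpected, then 3 of 10 non-null rows unexpected.
ratio = mean_unexpected_ratio([1, 3], [10, 12], null_counts=[0, 2])
```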

exclude_field_names :Set[str]
property map_metric_name(self)
property total_count_parameter_builder_name(self)
property null_count_parameter_builder_name(self)
_build_parameters(self, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None, recompute_existing_parameter_values: bool = False)

Builds ParameterContainer object that holds ParameterNode objects with attribute name-value pairs and details.

Returns

Attributes object, containing computed parameter values and parameter computation details metadata.

class great_expectations.rule_based_profiler.parameter_builder.MeanTableColumnsSetMatchMultiBatchParameterBuilder(name: str, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

Bases: great_expectations.rule_based_profiler.parameter_builder.MetricMultiBatchParameterBuilder

Compute mean match ratio (as a fraction) of “table.columns” metric across every Batch of data given.

  1. Compute “table.columns” metric value for each Batch object.

  2. Compute set union operation of column lists from Step-1 over all Batch objects (gives maximum column set).

  3. Assign match scores: if column set of a Batch equals overall (maximum) column set, give it 1; 0 otherwise.

  4. Compute mean value of match scores as “success_ratio” (divide sum of scores by number of Batch objects).
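The four steps translate directly into plain Python. In this sketch the function name is hypothetical and the per-Batch column lists are passed in directly rather than obtained from the “table.columns” metric.

```python
def mean_table_columns_set_match(batch_columns):
    """Fraction of Batches whose column set equals the union over all Batches."""
    # Step 1 is assumed done: batch_columns holds one column list per Batch.
    column_sets = [set(columns) for columns in batch_columns]
    # Step 2: the union over all Batches gives the maximum column set.
    overall = set().union(*column_sets)
    # Step 3: score 1 for a Batch that has every column, 0 otherwise.
    scores = [1 if columns == overall else 0 for columns in column_sets]
    # Step 4: the mean of the scores is the success ratio.
    return sum(scores) / len(scores)


success_ratio = mean_table_columns_set_match(
    [["a", "b"], ["a", "b", "c"], ["c", "b", "a"]]
)
```

Column order does not matter here because the comparison is made on sets.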

exclude_field_names :Set[str]
_build_parameters(self, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None, recompute_existing_parameter_values: bool = False)

Builds ParameterContainer object that holds ParameterNode objects with attribute name-value pairs and details.

Returns

Attributes object, containing computed parameter values and parameter computation details metadata.

class great_expectations.rule_based_profiler.parameter_builder.RegexPatternStringParameterBuilder(name: str, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, threshold: Union[str, float] = 1.0, candidate_regexes: Optional[Union[str, Iterable[str]]] = None, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

Bases: great_expectations.rule_based_profiler.parameter_builder.ParameterBuilder

Detects the domain REGEX from a set of candidate REGEX strings by computing the column_values.match_regex_format.unexpected_count metric for each candidate format and returning the format that has the lowest unexpected_count ratio.

CANDIDATE_REGEX :Set[str]
property metric_domain_kwargs(self)
property metric_value_kwargs(self)
property threshold(self)
property candidate_regexes(self)
_build_parameters(self, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None, recompute_existing_parameter_values: bool = False)

Builds ParameterContainer object that holds ParameterNode objects with attribute name-value pairs and details.

Check the percentage of values matching the REGEX string, and return the best fit, or None if no string exceeds the configured threshold.
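A minimal sketch of this selection logic using the standard `re` module. The function name, the use of `re.search`, and operating on an in-memory list of strings are assumptions; the class itself works from the unexpected-count metric computed on the Batch data.

```python
import re


def detect_regex(values, candidate_regexes, threshold=1.0):
    """Return the candidate matching the largest share of values, or None."""
    best, best_ratio = None, 0.0
    for pattern in candidate_regexes:
        compiled = re.compile(pattern)
        matched = sum(1 for value in values if compiled.search(value))
        ratio = matched / len(values)
        # Keep the candidate only if it reaches the threshold and beats the best so far.
        if ratio >= threshold and ratio > best_ratio:
            best, best_ratio = pattern, ratio
    return best


column = ["2021-01-05", "2022-11-30", "n/a", "2020-02-29"]
candidates = [r"^\d{4}-\d{2}-\d{2}$", r"^\d+$"]
lenient = detect_regex(column, candidates, threshold=0.7)
strict = detect_regex(column, candidates, threshold=1.0)
```

With the default threshold of 1.0 a single non-matching value is enough to reject every candidate, so a lower threshold is needed for columns with stray values.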

class great_expectations.rule_based_profiler.parameter_builder.SimpleDateFormatStringParameterBuilder(name: str, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, threshold: Union[str, float] = 1.0, candidate_strings: Optional[Union[Iterable[str], str]] = None, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

Bases: great_expectations.rule_based_profiler.parameter_builder.ParameterBuilder

Detects the domain date format from a set of candidate date format strings by computing the column_values.match_strftime_format.unexpected_count metric for each candidate format and returning the format that has the lowest unexpected_count ratio.

property metric_domain_kwargs(self)
property metric_value_kwargs(self)
property threshold(self)
property candidate_strings(self)
_build_parameters(self, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None, recompute_existing_parameter_values: bool = False)

Builds ParameterContainer object that holds ParameterNode objects with attribute name-value pairs and details.

Check the percentage of values matching each string, and return the best fit, or None if no string exceeds the configured threshold.

Returns

Attributes object, containing computed parameter values and parameter computation details metadata.
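The same idea for date formats, sketched with `datetime.strptime` from the standard library. The function names are hypothetical, and parsing each value in Python stands in for the `column_values.match_strftime_format.unexpected_count` metric that the class actually evaluates.

```python
from datetime import datetime


def strftime_match_ratio(values, date_format):
    """Share of values that parse successfully with the given format string."""
    matched = 0
    for value in values:
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            continue  # the value does not follow this format
        matched += 1
    return matched / len(values)


def detect_date_format(values, candidate_strings, threshold=1.0):
    """Return the best-matching format that reaches the threshold, or None."""
    best, best_ratio = None, 0.0
    for date_format in candidate_strings:
        ratio = strftime_match_ratio(values, date_format)
        if ratio >= threshold and ratio > best_ratio:
            best, best_ratio = date_format, ratio
    return best


dates = ["2021-01-05", "2022-11-30", "2020-02-29"]
detected = detect_date_format(dates, ["%d/%m/%Y", "%Y-%m-%d"])
```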

class great_expectations.rule_based_profiler.parameter_builder.ValueSetMultiBatchParameterBuilder(name: str, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

Bases: great_expectations.rule_based_profiler.parameter_builder.MetricMultiBatchParameterBuilder

Build a set of unique values across all specified batches.

This parameter builder can be used to build a unique value_set for each of the domains specified by the DomainBuilder from all of the batches specified. This value_set can be used to create Expectations.

This unique value_set is the unique values from ALL batches accessible to the parameter builder. For example, if batch 1 has the unique values {1, 4, 8} and batch 2 {2, 8, 10} the unique values returned by this parameter builder are the set union, or {1, 2, 4, 8, 10}

Notes

  1. The computation of the unique values across batches is done within this ParameterBuilder so please be aware that testing large columns with high cardinality could require a large amount of memory.

  2. This ParameterBuilder filters null values out from the unique value_set.
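The set-union behavior and the null filtering can be sketched as follows. The function name is hypothetical, and treating only `None` as null is an assumption made for this example.

```python
def unique_value_set(batches):
    """Union of the non-null values found in every Batch."""
    value_set = set()
    for batch_values in batches:
        # Assumption: only None is treated as a null value here.
        value_set |= {value for value in batch_values if value is not None}
    return value_set


# The example from the text, with one null added to the first Batch.
values = unique_value_set([[1, 4, 8, None], [2, 8, 10]])
```

Because the whole set is held in memory, its size grows with the cardinality of the column, which is the concern raised in the first note.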

exclude_field_names :Set[str]
_build_parameters(self, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None, recompute_existing_parameter_values: bool = False)

Builds ParameterContainer object that holds ParameterNode objects with attribute name-value pairs and details.

Returns

Attributes object, containing computed parameter values and parameter computation details metadata.

class great_expectations.rule_based_profiler.parameter_builder.ValueCountsSingleBatchParameterBuilder(name: str, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

Bases: great_expectations.rule_based_profiler.parameter_builder.MetricSingleBatchParameterBuilder

Compute value counts using specified metric for one Batch of data.
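Conceptually, value counts for a single Batch amount to a frequency table. The sketch below uses `collections.Counter` on an in-memory list; the function name is hypothetical and the structure of the parameter node that the class actually returns is not shown.

```python
from collections import Counter


def value_counts(column_values):
    """Map each distinct value in one Batch to its number of occurrences."""
    return dict(Counter(column_values))


counts = value_counts(["a", "b", "a", "c", "a"])
```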

exclude_field_names :Set[str]
_build_parameters(self, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None, recompute_existing_parameter_values: bool = False)

Builds ParameterContainer object that holds ParameterNode objects with attribute name-value pairs and details.

Returns

Attributes object, containing computed parameter values and parameter computation details metadata.

class great_expectations.rule_based_profiler.parameter_builder.HistogramSingleBatchParameterBuilder(name: str, bins: str = 'uniform', n_bins: int = 10, allow_relative_error: bool = False, evaluation_parameter_builder_configs: Optional[List[ParameterBuilderConfig]] = None, data_context: Optional[AbstractDataContext] = None)

Bases: great_expectations.rule_based_profiler.parameter_builder.MetricSingleBatchParameterBuilder

Compute histogram using specified metric for one Batch of data.
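For orientation, a histogram with equal-width bins can be computed with NumPy as below. Reading `bins='uniform'` as “equal-width bins” is an assumption, as is the function name; the class obtains its histogram through a metric rather than by calling NumPy on raw values.

```python
import numpy as np


def uniform_histogram(values, n_bins=10):
    """Counts and bin edges for n_bins equal-width bins spanning the data."""
    counts, bin_edges = np.histogram(np.asarray(values, dtype=float), bins=n_bins)
    return counts.tolist(), bin_edges.tolist()


counts, bin_edges = uniform_histogram(range(11), n_bins=5)
```

There is always one more bin edge than there are bins, and the last bin includes its right edge.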

exclude_field_names :Set[str]
_build_parameters(self, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None, recompute_existing_parameter_values: bool = False)

Builds ParameterContainer object that holds ParameterNode objects with attribute name-value pairs and details.

Returns

Attributes object, containing computed parameter values and parameter computation details metadata.