great_expectations.rule_based_profiler.parameter_builder.numeric_metric_range_multi_batch_parameter_builder

Module Contents

Classes

NumericMetricRangeMultiBatchParameterBuilder(name: str, metric_name: str, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, sampling_method: str = 'bootstrap', enforce_numeric_metric: Union[str, bool] = True, replace_nan_with_zero: Union[str, bool] = True, reduce_scalar_metric: Union[str, bool] = True, false_positive_rate: Union[str, float] = 0.05, num_bootstrap_samples: Optional[Union[str, int]] = None, round_decimals: Optional[Union[str, int]] = None, truncate_values: Optional[Union[str, Dict[str, Union[Optional[int], Optional[float]]]]] = None, batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict]] = None, data_context: Optional['DataContext'] = None)

A Multi-Batch implementation for obtaining the range estimation bounds for a resolved (evaluated) numeric metric, using domain_kwargs, value_kwargs, metric_name, and false_positive_rate (tolerance) as arguments.

great_expectations.rule_based_profiler.parameter_builder.numeric_metric_range_multi_batch_parameter_builder.MAX_DECIMALS :int = 9
great_expectations.rule_based_profiler.parameter_builder.numeric_metric_range_multi_batch_parameter_builder.DEFAULT_BOOTSTRAP_NUM_RESAMPLES :int = 9999
class great_expectations.rule_based_profiler.parameter_builder.numeric_metric_range_multi_batch_parameter_builder.NumericMetricRangeMultiBatchParameterBuilder(name: str, metric_name: str, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, sampling_method: str = 'bootstrap', enforce_numeric_metric: Union[str, bool] = True, replace_nan_with_zero: Union[str, bool] = True, reduce_scalar_metric: Union[str, bool] = True, false_positive_rate: Union[str, float] = 0.05, num_bootstrap_samples: Optional[Union[str, int]] = None, round_decimals: Optional[Union[str, int]] = None, truncate_values: Optional[Union[str, Dict[str, Union[Optional[int], Optional[float]]]]] = None, batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict]] = None, data_context: Optional['DataContext'] = None)

Bases: great_expectations.rule_based_profiler.parameter_builder.parameter_builder.ParameterBuilder

A Multi-Batch implementation for obtaining the range estimation bounds for a resolved (evaluated) numeric metric, using domain_kwargs, value_kwargs, metric_name, and false_positive_rate (tolerance) as arguments.

This Multi-Batch ParameterBuilder is general in the sense that any metric that computes numbers can be accommodated. At the same time, it is specific in the sense that the parameter names always have the semantics of numeric ranges, which incorporate the requirements imposed by the configured false_positive_rate tolerance.

The implementation supports two methods of estimating parameter values from data:

* bootstrapped (default) – a statistical technique (see “https://en.wikipedia.org/wiki/Bootstrapping_(statistics)”).
* one-shot – assumes that metric values, computed on batch data, are normally distributed and computes the mean and the standard error using the queried batches as the single sample of the distribution (fast, but inaccurate).
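To make the contrast between the two sampling methods concrete, here is a minimal stand-alone sketch in plain Python (standard library only, with lists in place of np.ndarray). The function names are hypothetical. The one-shot function builds a normal-theory band, mean ± z × standard deviation, splitting false_positive_rate evenly between the two tails. For the bootstrap side, this sketch assumes one common variant, a bootstrap point estimate of the false_positive_rate / 2 and 1 − false_positive_rate / 2 quantiles. The library's exact bootstrap procedure may differ. The default of 9999 resamples mirrors DEFAULT_BOOTSTRAP_NUM_RESAMPLES, on the assumption that this constant is used when num_bootstrap_samples is None.

```python
import math
import random
import statistics
from statistics import NormalDist
from typing import List, Tuple


def oneshot_range(values: List[float], false_positive_rate: float) -> Tuple[float, float]:
    """Normal-theory band: mean +/- z * sample standard deviation.

    Requires 0 < false_positive_rate < 1; the tolerance is split evenly
    between the lower and upper tails.
    """
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    z = NormalDist().inv_cdf(1.0 - false_positive_rate / 2.0)
    return mean - z * std, mean + z * std


def _quantile(sorted_values: List[float], q: float) -> float:
    """Quantile by linear interpolation between adjacent order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def bootstrap_range(
    values: List[float],
    false_positive_rate: float,
    num_resamples: int = 9999,
    seed: int = 0,
) -> Tuple[float, float]:
    """Bootstrap point estimate of the two tail quantiles (assumed variant)."""
    rng = random.Random(seed)  # seeded so results are reproducible
    lows, highs = [], []
    for _ in range(num_resamples):
        # Resample the per-batch metric values with replacement.
        resample = sorted(rng.choices(values, k=len(values)))
        lows.append(_quantile(resample, false_positive_rate / 2.0))
        highs.append(_quantile(resample, 1.0 - false_positive_rate / 2.0))
    # Average each tail quantile over all resamples.
    return statistics.mean(lows), statistics.mean(highs)
```

The one-shot band is symmetric around the sample mean and can extend beyond the observed values. The bootstrap bounds in this sketch always lie within the observed minimum and maximum, at the cost of many resampling passes.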

RECOGNIZED_SAMPLING_METHOD_NAMES :set
RECOGNIZED_TRUNCATE_DISTRIBUTION_KEYS :set
property fully_qualified_parameter_name(self)
property metric_name(self)
property metric_domain_kwargs(self)
property metric_value_kwargs(self)
property sampling_method(self)
property enforce_numeric_metric(self)
property replace_nan_with_zero(self)
property reduce_scalar_metric(self)
property false_positive_rate(self)
property num_bootstrap_samples(self)
property round_decimals(self)
property truncate_values(self)
_build_parameters(self, parameter_container: ParameterContainer, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)
Builds ParameterContainer object that holds ParameterNode objects with attribute name-value pairs and optional details.

Returns: Tuple containing computed_parameter_value and parameter_computation_details metadata.

The algorithm operates according to the following steps:

1. Obtain batch IDs of interest using DataContext and BatchRequest (unless passed explicitly as an argument). Note that this specific BatchRequest was specified as part of the configuration for the present ParameterBuilder class.
2. Set up metric_domain_kwargs and metric_value_kwargs (using configuration and/or variables and parameters).
3. Instantiate the Validator object corresponding to BatchRequest (with a temporary expectation_suite_name) in order to have access to all Batch objects, on each of which the specified metric_name will be computed.
4. Perform metric computations and obtain the result in array-like form (one metric value per Batch).
5. Using the configured directives and heuristics, determine whether or not the ranges should be clipped.
6. Using the configured directives and heuristics, determine whether or not return values should be rounded to an integer.
7. Convert the multi-dimensional metric computation results to a numpy array (for further computations).
   Steps 8 – 10 apply to the “oneshot” sampling method only (the “bootstrap” method achieves the same result automatically).
8. Compute the mean and the standard deviation of the metric (aggregated over all the gathered Batch objects).
9. Compute the number of standard deviations (as a floating-point value) needed around the mean to achieve the specified false_positive_rate. A false_positive_rate of 0.0 would require an infinite number of standard deviations, so when 0.0 is passed as the argument it is “nudged” above 0.0 by a small quantity “epsilon”. (Please refer to “https://en.wikipedia.org/wiki/Normal_distribution” and references therein for background.)
10. Compute the “band” around the mean as the min_value and max_value (to be used in ExpectationConfiguration).
11. Return [low, high] for the desired metric as estimated by the specified sampling method.
12. Set up the arguments and call build_parameter_container() to store the parameter as part of “rule state”.
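The “oneshot” path through steps 8 – 11, combined with the clipping and rounding decisions of steps 5 and 6, can be sketched in plain Python as follows. This is a hypothetical illustration, not the library's code. The function name, the EPSILON value, the truncate_values keys "lower_bound" and "upper_bound", and the order in which clipping and rounding are applied are all assumptions.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Dict, List, Optional, Union

# Assumed size of the "nudge"; the value the library uses is not stated here.
EPSILON = 1.0e-7


def oneshot_metric_range(
    metric_values: List[float],
    false_positive_rate: float = 0.05,
    truncate_values: Optional[Dict[str, Optional[float]]] = None,
    round_decimals: Optional[int] = None,
) -> List[Union[int, float]]:
    """Hypothetical walk through steps 5-11 for the "oneshot" sampling method."""
    # Step 8: mean and standard deviation over all gathered batches.
    mu, sigma = mean(metric_values), stdev(metric_values)

    # Step 9: number of standard deviations; 0.0 is nudged to avoid infinity.
    if false_positive_rate == 0.0:
        false_positive_rate = EPSILON
    num_stds = NormalDist().inv_cdf(1.0 - false_positive_rate / 2.0)

    # Step 10: the band around the mean.
    low, high = mu - num_stds * sigma, mu + num_stds * sigma

    # Step 5: clip to the (assumed) "lower_bound" / "upper_bound" keys.
    bounds = truncate_values or {}
    if bounds.get("lower_bound") is not None:
        low = max(low, bounds["lower_bound"])
    if bounds.get("upper_bound") is not None:
        high = min(high, bounds["upper_bound"])

    # Step 6: optional rounding; zero decimals yields integers.
    if round_decimals is not None:
        low, high = round(low, round_decimals), round(high, round_decimals)
        if round_decimals == 0:
            low, high = int(low), int(high)

    # Step 11: return [low, high].
    return [low, high]
```

For example, with per-batch row counts [1000, 1020, 990, 1010], a lower bound of 0, and round_decimals=0, the sketch returns an integer band around the mean of 1005. Passing false_positive_rate=0.0 still yields a finite, but much wider, band because of the nudge.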

_estimate_metric_value_range(self, metric_values: np.ndarray, estimator: Callable, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None, **kwargs)

This method accepts an estimator Callable and data samples in the format “N x R^m”, where “N” (most significant dimension) is the number of measurements (e.g., one per Batch of data), while “R^m” is the multi-dimensional metric, whose values are being estimated. Thus, for each element in the “R^m” hypercube, an “N”-dimensional vector of sample measurements is constructed and given to the estimator to apply its specific algorithm for computing the range of values in this vector. Estimator algorithms differ based on their use of data samples.
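The “N x R^m” convention can be illustrated with a small recursive sketch that uses nested Python lists in place of np.ndarray. The function below borrows the method's name but is a hypothetical stand-alone reimplementation, and min_max is a trivial example estimator, not one of the library's estimators. At each nesting level, the sketch gathers the i-th entry from all N measurements. At the scalar level, it hands the resulting N-element vector to the estimator.

```python
from typing import Any, Callable, List, Tuple

Estimator = Callable[[List[float]], Tuple[float, float]]


def estimate_metric_value_range(samples: List[Any], estimator: Estimator) -> Any:
    """Apply `estimator` to every element of an "N x R^m" nested list.

    `samples` holds N measurements (e.g., one per Batch); each measurement is
    either a number or a nested list with the same shape as all the others.
    The result has the shape of one measurement, with each number replaced
    by the (low, high) pair the estimator returns for that element.
    """
    first = samples[0]
    if isinstance(first, (int, float)):
        # Base case: an N-element vector of scalar measurements.
        return estimator(list(samples))
    # Recurse element-wise, collecting the i-th entry from all N measurements.
    return [
        estimate_metric_value_range([measurement[i] for measurement in samples], estimator)
        for i in range(len(first))
    ]


def min_max(vector: List[float]) -> Tuple[float, float]:
    """Trivial example estimator: the observed range of the vector."""
    return min(vector), max(vector)
```

Any estimator with the same shape, such as a normal-theory band or a bootstrap quantile estimate, could be plugged in instead of min_max. With NumPy arrays, the same element-wise loop could be expressed with numpy.apply_along_axis over axis 0.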

_get_truncate_values_using_heuristics(self, metric_values: np.ndarray, domain: Domain, *, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)
_get_round_decimals_using_heuristics(self, metric_values: np.ndarray, domain: Domain, *, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)
_get_bootstrap_estimate(self, metric_values: np.ndarray, domain: Domain, *, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None, **kwargs)
_get_deterministic_estimate(self, metric_values: np.ndarray, domain: Domain, *, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None, **kwargs)