great_expectations.rule_based_profiler.util

Module Contents

Functions

get_validator(purpose: str, *, data_context: Optional['DataContext'] = None, batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict, str]] = None, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)

get_batch_ids(data_context: Optional['DataContext'] = None, batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict, str]] = None, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)

build_batch_request(batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict, str]] = None, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)

build_metric_domain_kwargs(batch_id: Optional[str] = None, metric_domain_kwargs: Optional[Union[str, dict]] = None, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)

get_parameter_value_and_validate_return_type(domain: Optional[Domain] = None, parameter_reference: Optional[Union[Any, str]] = None, expected_return_type: Optional[Union[type, tuple]] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)

This method allows for the parameter_reference to be specified as an object (literal, dict, any typed object, etc.) or as a fully-qualified parameter name.

get_parameter_value(domain: Optional[Domain] = None, parameter_reference: Optional[Union[Any, str]] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)

This method allows for the parameter_reference to be specified as an object (literal, dict, any typed object, etc.) or as a fully-qualified parameter name.

compute_quantiles(metric_values: np.ndarray, false_positive_rate: np.float64)

compute_bootstrap_quantiles(metric_values: np.ndarray, false_positive_rate: np.float64, n_resamples: int)

Internal implementation of the “bootstrap” estimator method, returning a confidence interval for a distribution.

great_expectations.rule_based_profiler.util.NP_EPSILON :Union[Number, np.float64]
great_expectations.rule_based_profiler.util.get_validator(purpose: str, *, data_context: Optional['DataContext'] = None, batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict, str]] = None, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None) → Optional['Validator']
great_expectations.rule_based_profiler.util.get_batch_ids(data_context: Optional['DataContext'] = None, batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict, str]] = None, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None) → Optional[List[str]]
great_expectations.rule_based_profiler.util.build_batch_request(batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict, str]] = None, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None) → Optional[Union[BatchRequest, RuntimeBatchRequest]]
great_expectations.rule_based_profiler.util.build_metric_domain_kwargs(batch_id: Optional[str] = None, metric_domain_kwargs: Optional[Union[str, dict]] = None, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)
great_expectations.rule_based_profiler.util.get_parameter_value_and_validate_return_type(domain: Optional[Domain] = None, parameter_reference: Optional[Union[Any, str]] = None, expected_return_type: Optional[Union[type, tuple]] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None) → Optional[Any]

This method allows for the parameter_reference to be specified as an object (literal, dict, any typed object, etc.) or as a fully-qualified parameter name. In either case, it can optionally validate the type of the return value.

great_expectations.rule_based_profiler.util.get_parameter_value(domain: Optional[Domain] = None, parameter_reference: Optional[Union[Any, str]] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None) → Optional[Any]

This method allows for the parameter_reference to be specified as an object (literal, dict, any typed object, etc.) or as a fully-qualified parameter name. Moreover, if the parameter_reference argument is an object of type “dict”, it will recursively detect values using the fully-qualified parameter name format and evaluate them accordingly.
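The recursive evaluation described above can be illustrated with a small, self-contained sketch. It does not use the Great Expectations API: the helper names `resolve_reference` and `resolve_and_validate`, the flat `lookup` mapping, and the convention that a fully-qualified parameter name is a string starting with `$` are all assumptions made for illustration. The optional type check mirrors the behavior described for get_parameter_value_and_validate_return_type.

```python
from typing import Any, Mapping, Optional, Tuple, Union


def resolve_reference(reference: Any, lookup: Mapping[str, Any]) -> Any:
    """Hypothetical resolver: literals pass through, names are looked up."""
    # Assumption: a fully-qualified parameter name is a string starting with "$".
    if isinstance(reference, str) and reference.startswith("$"):
        return lookup[reference]
    # Dictionaries are walked recursively so that nested names are resolved too.
    if isinstance(reference, dict):
        return {key: resolve_reference(value, lookup) for key, value in reference.items()}
    # Any other object (number, list, plain string, ...) is returned unchanged.
    return reference


def resolve_and_validate(
    reference: Any,
    lookup: Mapping[str, Any],
    expected_type: Optional[Union[type, Tuple[type, ...]]] = None,
) -> Any:
    """Hypothetical wrapper that optionally checks the type of the resolved value."""
    value = resolve_reference(reference, lookup)
    if expected_type is not None and not isinstance(value, expected_type):
        raise TypeError(f"Expected {expected_type}, got {type(value)}")
    return value


# Hypothetical stored values, keyed by fully-qualified name.
lookup = {
    "$variables.false_positive_rate": 0.05,
    "$parameter.mean.value": 3.2,
}

resolved = resolve_reference(
    {
        "rate": "$variables.false_positive_rate",
        "nested": {"mean": "$parameter.mean.value"},
        "label": "plain",
    },
    lookup,
)
rate = resolve_and_validate("$variables.false_positive_rate", lookup, expected_type=float)
```

In this sketch, only strings that follow the assumed naming convention are looked up; every other value is treated as a literal, which is what lets a single argument carry either form.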

great_expectations.rule_based_profiler.util.compute_quantiles(metric_values: np.ndarray, false_positive_rate: np.float64) → Tuple[Number, Number]
great_expectations.rule_based_profiler.util.compute_bootstrap_quantiles(metric_values: np.ndarray, false_positive_rate: np.float64, n_resamples: int) → Tuple[Number, Number]

Internal implementation of the “bootstrap” estimator method, returning a confidence interval for a distribution. See https://en.wikipedia.org/wiki/Bootstrapping_(statistics) for an introduction to bootstrapping in statistics.
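For readers unfamiliar with the technique, a simple percentile-style bootstrap can be sketched with NumPy as follows. This is an illustration, not the library's implementation: the function name `bootstrap_quantiles_sketch`, the `seed` argument, and the choice to split false_positive_rate evenly between the two tails are assumptions.

```python
import numpy as np


def bootstrap_quantiles_sketch(metric_values, false_positive_rate, n_resamples, seed=None):
    """Illustrative percentile bootstrap; not the library's implementation."""
    rng = np.random.default_rng(seed)
    values = np.asarray(metric_values, dtype=float)
    # Draw n_resamples resamples with replacement, each the size of the original sample.
    resamples = rng.choice(values, size=(n_resamples, values.size), replace=True)
    # Compute the two tail quantiles within every resample
    # (assumption: the false-positive rate is split evenly between both tails).
    lower = np.quantile(resamples, false_positive_rate / 2.0, axis=1)
    upper = np.quantile(resamples, 1.0 - false_positive_rate / 2.0, axis=1)
    # Average the per-resample quantiles to obtain the two point estimates.
    return float(lower.mean()), float(upper.mean())


# Synthetic data used only for demonstration.
values = np.random.default_rng(0).normal(loc=10.0, scale=2.0, size=500)
low, high = bootstrap_quantiles_sketch(
    values, false_positive_rate=0.05, n_resamples=1000, seed=1
)
```

Averaging plain resample quantiles, as done here, applies no bias correction.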

This implementation is inferior to the one available in SciPy (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.bootstrap.html), because it introduces bias and does not handle multi-dimensional statistics. By contrast, “scipy.stats.bootstrap” corrects for bias and is vectorized, so it can accept a multi-dimensional statistic function and process all dimensions.

This implementation will be replaced by “scipy.stats.bootstrap” when Great Expectations can be upgraded to use a more up-to-date version of the “scipy” Python package (the currently used version does not have “bootstrap”).
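As a rough idea of what calling the SciPy routine looks like, the sketch below computes a bias-corrected (BCa) confidence interval for a sample mean. It assumes a SciPy release that ships `scipy.stats.bootstrap` (1.7 or later, to the best of our knowledge); the data and the choice of statistic are illustrative and are not taken from Great Expectations.

```python
import numpy as np
from scipy import stats

# Synthetic data used only for demonstration.
sample = np.random.default_rng(0).normal(loc=10.0, scale=2.0, size=200)

# scipy.stats.bootstrap expects a sequence of samples, hence the one-element tuple.
result = stats.bootstrap(
    (sample,),
    np.mean,
    n_resamples=2000,
    confidence_level=0.95,
    method="BCa",  # bias-corrected and accelerated interval
)
ci = result.confidence_interval
print(ci.low, ci.high)
```

Note that this call returns a confidence interval for the chosen statistic, so mapping it onto the quantile estimates returned by the current function would still require some design work.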

Additional future directions (potentially as a contribution to the SciPy community) include enhancing the bootstrapped estimator based on the theory presented in http://dido.econ.yale.edu/~dwka/pub/p1001.pdf:

Andrews, Donald W. K., and Buchinsky, Moshe. “A Three-step Method for Choosing the Number of Bootstrap Repetitions.” Econometrica 68, no. 1 (January 2000): 23–51. doi: 10.1111/1468-0262.00092. http://www.blackwell-synergy.com/doi/abs/10.1111/1468-0262.00092

The article outlines a three-step minimax procedure that combines the Central Limit Theorem (CLT) with the bootstrap sampling technique (see https://en.wikipedia.org/wiki/Bootstrapping_(statistics) for background) to compute a stopping criterion. The criterion is expressed as the optimal number of bootstrap samples needed to maximize the probability that the value of the statistic of interest deviates only minimally from its actual (ideal) value.