great_expectations.rule_based_profiler.util
Module Contents
Functions
great_expectations.rule_based_profiler.util.NP_EPSILON: Union[Number, np.float64]
great_expectations.rule_based_profiler.util.get_validator(purpose: str, *, data_context: Optional['DataContext'] = None, batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict, str]] = None, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None) → Optional['Validator']
great_expectations.rule_based_profiler.util.get_batch_ids(data_context: Optional['DataContext'] = None, batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict, str]] = None, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None) → Optional[List[str]]
great_expectations.rule_based_profiler.util.build_batch_request(batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict, str]] = None, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None) → Optional[Union[BatchRequest, RuntimeBatchRequest]]
great_expectations.rule_based_profiler.util.build_metric_domain_kwargs(batch_id: Optional[str] = None, metric_domain_kwargs: Optional[Union[str, dict]] = None, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)
great_expectations.rule_based_profiler.util.get_parameter_value_and_validate_return_type(domain: Optional[Domain] = None, parameter_reference: Optional[Union[Any, str]] = None, expected_return_type: Optional[Union[type, tuple]] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None) → Optional[Any]
This method allows for the parameter_reference to be specified as an object (literal, dict, any typed object, etc.) or as a fully-qualified parameter name. In either case, it can optionally validate the type of the return value.
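As a rough illustration of the optional type check described above, the sketch below validates an already-resolved value against an expected type or tuple of types. The helper name `validate_return_type` and the choice of `TypeError` are assumptions made for this example; they are not taken from the library.

```python
from typing import Any, Optional, Union


def validate_return_type(
    value: Any,
    expected_return_type: Optional[Union[type, tuple]] = None,
) -> Any:
    """Hypothetical helper: return the value unchanged if it matches the expected type(s)."""
    # No expected type means the check is skipped entirely.
    if expected_return_type is None:
        return value
    # isinstance accepts either a single type or a tuple of types.
    if not isinstance(value, expected_return_type):
        raise TypeError(
            f"Expected {expected_return_type}, got {type(value).__name__}."
        )
    return value
```

Accepting a tuple mirrors the `Union[type, tuple]` annotation in the signature, so a caller could allow, for example, either `int` or `float`.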
great_expectations.rule_based_profiler.util.get_parameter_value(domain: Optional[Domain] = None, parameter_reference: Optional[Union[Any, str]] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None) → Optional[Any]
This method allows for the parameter_reference to be specified as an object (literal, dict, any typed object, etc.) or as a fully-qualified parameter name. Moreover, if the parameter_reference argument is an object of type “dict”, it will recursively detect values using the fully-qualified parameter name format and evaluate them accordingly.
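The recursive handling of “dict” references described above can be sketched as follows. This is a simplified stand-in, not the library's code: the function name `resolve_reference` is hypothetical, a flat dictionary replaces the domain, variables, and parameters containers, and the sketch assumes that a fully-qualified parameter name is a string beginning with `$`.

```python
from typing import Any, Dict


def resolve_reference(parameter_reference: Any, known_values: Dict[str, Any]) -> Any:
    """Hypothetical resolver: replace parameter names with their values, recursing into dicts."""
    # A dict is walked recursively so that nested references are resolved too.
    if isinstance(parameter_reference, dict):
        return {
            key: resolve_reference(value, known_values)
            for key, value in parameter_reference.items()
        }
    # Assumption: strings starting with "$" are fully-qualified parameter names.
    if isinstance(parameter_reference, str) and parameter_reference.startswith("$"):
        return known_values[parameter_reference]
    # Anything else (literal, list, typed object) is returned as given.
    return parameter_reference


known = {"$variables.false_positive_rate": 0.05}
resolved = resolve_reference(
    {"estimator": "bootstrap", "options": {"rate": "$variables.false_positive_rate"}},
    known,
)
```

Here `resolved` keeps the literal `"bootstrap"` as is and substitutes the looked-up value for the nested reference.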
great_expectations.rule_based_profiler.util.compute_quantiles(metric_values: np.ndarray, false_positive_rate: np.float64) → Tuple[Number, Number]
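The signature above carries no description, so the following is only a plausible reading of it: a two-sided interval in which the false positive rate is split evenly between the lower and upper tails. The function name `quantile_bounds` is hypothetical, and the even split is an assumption rather than documented behavior.

```python
from typing import Tuple

import numpy as np


def quantile_bounds(
    metric_values: np.ndarray, false_positive_rate: float
) -> Tuple[float, float]:
    """Hypothetical sketch: two-sided quantile bounds for a 1-D sample."""
    # Assumption: half of the false positive rate is assigned to each tail.
    lower = np.quantile(metric_values, q=false_positive_rate / 2.0)
    upper = np.quantile(metric_values, q=1.0 - false_positive_rate / 2.0)
    return float(lower), float(upper)
```

Under this reading, a false positive rate of 0.1 would yield the 5th and 95th percentiles of the observed metric values.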
great_expectations.rule_based_profiler.util.compute_bootstrap_quantiles(metric_values: np.ndarray, false_positive_rate: np.float64, n_resamples: int) → Tuple[Number, Number]
Internal implementation of the “bootstrap” estimator method, returning confidence interval for a distribution. See https://en.wikipedia.org/wiki/Bootstrapping_(statistics) for an introduction to “bootstrapping” in statistics.
This implementation is inferior to the one available in the “SciPy” library (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.bootstrap.html): it introduces bias and does not handle multi-dimensional statistics, whereas “scipy.stats.bootstrap” corrects for bias and is vectorized, so it can accept a multi-dimensional statistic function and process all dimensions.
This implementation will be replaced by “scipy.stats.bootstrap” when Great Expectations can be upgraded to use a more up-to-date version of the “scipy” Python package (the currently used version does not have “bootstrap”).
Additional future directions (potentially as a contribution to the “SciPy” community) include enhancing the bootstrapped estimator based on the theory presented in http://dido.econ.yale.edu/~dwka/pub/p1001.pdf:
Andrews, Donald W. K. and Buchinsky, Moshe (2000). “A Three-step Method for Choosing the Number of Bootstrap Repetitions.” Econometrica 68(1), 23–51. doi: 10.1111/1468-0262.00092. http://www.blackwell-synergy.com/doi/abs/10.1111/1468-0262.00092
The article outlines a three-step minimax procedure that relies on the Central Limit Theorem (C.L.T.) along with the bootstrap sampling technique (see https://en.wikipedia.org/wiki/Bootstrapping_(statistics) for background) to compute the stopping criterion, expressed as the optimal number of bootstrap samples needed to maximize the probability that the statistic of interest deviates only minimally from its actual (ideal) value.
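For orientation, a basic bootstrap estimate of two-sided quantile bounds can be sketched with NumPy alone. This is an illustration of the general technique, not the library's internal code: the name `bootstrap_quantile_bounds`, the `seed` argument, the even split of the false positive rate between tails, and the choice to average the per-resample quantiles are all assumptions. Like the implementation described above, it applies no bias correction.

```python
from typing import Optional, Tuple

import numpy as np


def bootstrap_quantile_bounds(
    metric_values: np.ndarray,
    false_positive_rate: float,
    n_resamples: int,
    seed: Optional[int] = None,
) -> Tuple[float, float]:
    """Hypothetical sketch: average lower/upper quantiles over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    # Each row is one resample drawn with replacement from the observed values.
    resamples = rng.choice(
        metric_values, size=(n_resamples, metric_values.size), replace=True
    )
    # Compute the tail quantiles within every resample (one value per row).
    lower_per_resample = np.quantile(resamples, q=false_positive_rate / 2.0, axis=1)
    upper_per_resample = np.quantile(
        resamples, q=1.0 - false_positive_rate / 2.0, axis=1
    )
    # Assumption: the point estimates are the means across resamples.
    return float(np.mean(lower_per_resample)), float(np.mean(upper_per_resample))
```

A fixed `seed` makes the result reproducible; more resamples reduce the Monte Carlo noise of the estimate at the cost of more computation, which is the trade-off the cited article addresses.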