great_expectations.rule_based_profiler.parameter_builder.regex_pattern_string_parameter_builder

Module Contents

Classes

RegexPatternStringParameterBuilder(name: str, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, threshold: Union[float, str] = 1.0, candidate_regexes: Optional[Union[Iterable[str], str]] = None, data_context: Optional['DataContext'] = None, batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict]] = None)

Detects the domain REGEX from a set of candidate REGEX strings by computing a match metric for each candidate and returning the best-fitting one.

great_expectations.rule_based_profiler.parameter_builder.regex_pattern_string_parameter_builder.logger
class great_expectations.rule_based_profiler.parameter_builder.regex_pattern_string_parameter_builder.RegexPatternStringParameterBuilder(name: str, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, threshold: Union[float, str] = 1.0, candidate_regexes: Optional[Union[Iterable[str], str]] = None, data_context: Optional['DataContext'] = None, batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict]] = None)

Bases: great_expectations.rule_based_profiler.parameter_builder.parameter_builder.ParameterBuilder

Detects the domain REGEX from a set of candidate REGEX strings by computing the column_values.match_regex_format.unexpected_count metric for each candidate format and returning the format with the lowest ratio of unexpected values (that is, the highest match ratio).
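The selection idea described above can be sketched without Great Expectations at all. The following is a minimal, illustrative sketch, not the library's implementation: the function names unexpected_ratios and best_candidate are hypothetical, it assumes search-style matching (re.search), and it assumes the ratio is taken over non-null values only.

```python
import re
from typing import Dict, Iterable, List, Optional


def unexpected_ratios(
    values: List[Optional[str]], candidate_regexes: Iterable[str]
) -> Dict[str, float]:
    """Hypothetical helper: fraction of non-null values each regex fails to match."""
    # Assumption: nulls are excluded before counting, as in a map-style metric.
    non_null = [v for v in values if v is not None]
    ratios: Dict[str, float] = {}
    for pattern in candidate_regexes:
        compiled = re.compile(pattern)
        # Analogue of an "unexpected_count": values the pattern does not match.
        unexpected_count = sum(1 for v in non_null if compiled.search(v) is None)
        # With no non-null values, treat every candidate as fully unexpected.
        ratios[pattern] = unexpected_count / len(non_null) if non_null else 1.0
    return ratios


def best_candidate(
    values: List[Optional[str]], candidate_regexes: Iterable[str]
) -> str:
    """Hypothetical helper: the candidate with the lowest unexpected ratio."""
    ratios = unexpected_ratios(values, candidate_regexes)
    # min() keeps the first candidate encountered when ratios tie.
    return min(ratios, key=ratios.get)


values = ["123", "456", "abc", None]
candidates = [r"^\d+$", r"^[a-z]+$"]
ratios = unexpected_ratios(values, candidates)
best = best_candidate(values, candidates)
```

Here r"^\d+$" fails only on "abc" (1 of 3 non-null values), while r"^[a-z]+$" fails on two of them, so the digit pattern is chosen.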

CANDIDATE_REGEX: Set[str]
property metric_domain_kwargs(self)
property metric_value_kwargs(self)
property threshold(self)
property candidate_regexes(self)
_build_parameters(self, parameter_container: ParameterContainer, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)

Checks the percentage of values matching each candidate REGEX string and returns the best fit, or None if no string exceeds the configured threshold.

Returns

ParameterContainer object that holds ParameterNode objects with attribute name-value pairs and optional details
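The "best fit or None" rule can be illustrated with a small standalone function. This is a sketch under stated assumptions, not the method's actual code: select_best_regex is a hypothetical name, it takes a plain dict of success ratios rather than a ParameterContainer, and it treats "exceeds" as "meets or exceeds", because with the default threshold of 1.0 a strict comparison could never succeed.

```python
from typing import Dict, Optional


def select_best_regex(
    success_ratios: Dict[str, float], threshold: float = 1.0
) -> Optional[str]:
    """Hypothetical helper: highest-ratio regex at or above threshold, else None."""
    # Assumption: an inclusive comparison, so a perfect match passes threshold 1.0.
    passing = {r: ratio for r, ratio in success_ratios.items() if ratio >= threshold}
    if not passing:
        # No candidate is good enough: report "no pattern detected".
        return None
    # Among the passing candidates, return the one with the highest success ratio.
    return max(passing, key=passing.get)


best = select_best_regex({r"^\d+$": 1.0, r"^[a-z]+$": 0.25})
none_found = select_best_regex({r"^\d+$": 0.9}, threshold=0.95)
```

Returning None instead of the least-bad candidate keeps the profiler from proposing a pattern that most values do not actually follow.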

_get_regex_matched_greater_than_threshold(self, regex_string_success_ratio_dict: dict, threshold: float)

Helper method that determines which regex strings have a success ratio greater than the threshold.
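A filtering helper of this kind might look like the sketch below. It is illustrative only: the name regexes_meeting_threshold and the (regexes, ratios) return shape are assumptions, and the comparison is inclusive (>=) for the same reason as above, since a strict "greater than" could never pass the default threshold of 1.0.

```python
from typing import Dict, List, Tuple


def regexes_meeting_threshold(
    success_ratios: Dict[str, float], threshold: float
) -> Tuple[List[str], List[float]]:
    """Hypothetical helper: regexes (and their ratios) at or above threshold."""
    regexes: List[str] = []
    ratios: List[float] = []
    # Iterate in dict insertion order so the two output lists stay aligned.
    for regex, ratio in success_ratios.items():
        if ratio >= threshold:  # assumption: inclusive comparison
            regexes.append(regex)
            ratios.append(ratio)
    return regexes, ratios


matched, matched_ratios = regexes_meeting_threshold(
    {"a": 1.0, "b": 0.5, "c": 0.9}, threshold=0.9
)
```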

_get_sorted_regex_and_ratios(self, regex_string_success_ratio_dict: dict)

Helper method that sorts all evaluated regexes by their success ratio. Returns a tuple (ratio, sorted_strings).
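A sorting helper matching that description could be sketched as follows. The name sort_regexes_by_ratio is hypothetical, and descending order (best match first) is an assumption; the description above does not state the direction. The return order follows the documented Tuple(ratio, sorted_strings).

```python
from typing import Dict, List, Tuple


def sort_regexes_by_ratio(
    success_ratios: Dict[str, float],
) -> Tuple[List[float], List[str]]:
    """Hypothetical helper: (ratios, regexes), both ordered best match first."""
    # sorted() is stable, so regexes with equal ratios keep their insertion order.
    sorted_strings = sorted(success_ratios, key=success_ratios.get, reverse=True)
    sorted_ratios = [success_ratios[regex] for regex in sorted_strings]
    return sorted_ratios, sorted_strings


ratios_out, strings_out = sort_regexes_by_ratio({"a": 0.5, "b": 1.0, "c": 0.75})
```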