great_expectations.rule_based_profiler.parameter_builder.regex_pattern_string_parameter_builder

Module Contents

Classes

RegexPatternStringParameterBuilder(name: str, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, threshold: Union[float, str] = 1.0, candidate_regexes: Optional[Union[Iterable[str], str]] = None, batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict]] = None, data_context: Optional['DataContext'] = None)

Detects the domain REGEX from a set of candidate REGEX strings.

great_expectations.rule_based_profiler.parameter_builder.regex_pattern_string_parameter_builder.logger
class great_expectations.rule_based_profiler.parameter_builder.regex_pattern_string_parameter_builder.RegexPatternStringParameterBuilder(name: str, metric_domain_kwargs: Optional[Union[str, dict]] = None, metric_value_kwargs: Optional[Union[str, dict]] = None, threshold: Union[float, str] = 1.0, candidate_regexes: Optional[Union[Iterable[str], str]] = None, batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict]] = None, data_context: Optional['DataContext'] = None)

Bases: great_expectations.rule_based_profiler.parameter_builder.parameter_builder.ParameterBuilder

Detects the domain REGEX from a set of candidate REGEX strings by computing the column_values.match_regex.unexpected_count metric for each candidate regex and returning the regex that has the lowest unexpected_count ratio.
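
The selection rule described above can be illustrated without the library. The following is a minimal stand-alone sketch, not the Great Expectations implementation: `unexpected_ratio` and `pick_lowest_unexpected` are hypothetical helpers, and the sketch assumes `re.match` semantics (anchored at the start of each value) with `None` values skipped.

```python
import re
from typing import Dict, Iterable, List, Optional


def unexpected_ratio(values: Iterable[Optional[str]], pattern: str) -> float:
    """Fraction of non-null values that do NOT match `pattern` (hypothetical helper)."""
    compiled = re.compile(pattern)
    non_null: List[str] = [v for v in values if v is not None]
    if not non_null:
        # Assumption: an empty column has nothing unexpected.
        return 0.0
    unexpected = sum(1 for v in non_null if compiled.match(v) is None)
    return unexpected / len(non_null)


def pick_lowest_unexpected(values: List[Optional[str]], candidates: Iterable[str]) -> str:
    """Return the candidate regex with the lowest unexpected ratio."""
    ratios: Dict[str, float] = {c: unexpected_ratio(values, c) for c in candidates}
    # min() raises ValueError if no candidates were supplied.
    return min(ratios, key=lambda c: ratios[c])


column = ["2021-01-05", "2021-02-17", "n/a", None]
candidates = [r"\d{4}-\d{2}-\d{2}", r"[a-z/]+", r"[A-Z]+"]
best = pick_lowest_unexpected(column, candidates)
```

Here the date-like pattern fails on only one of the three non-null values, so it has the lowest unexpected ratio of the three candidates.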

CANDIDATE_REGEX: Set[str]
property fully_qualified_parameter_name(self)
property metric_domain_kwargs(self)
property metric_value_kwargs(self)
property threshold(self)
property candidate_regexes(self)
_build_parameters(self, parameter_container: ParameterContainer, domain: Domain, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)

Checks the percentage of values matching each candidate REGEX string and returns the best fit, or None if no string exceeds the configured threshold.

return: Tuple containing computed_parameter_value and parameter_computation_details metadata.
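
A rough sketch of the threshold check and the shape of the return value, under stated assumptions: `best_regex_or_none` is a hypothetical function, the success ratio is taken to be the matched fraction of values, the threshold comparison is read as greater-than-or-equal (so that the default threshold of 1.0 can be met), and the keys of the details dictionary are illustrative rather than the library's.

```python
import re
from typing import Dict, List, Optional, Tuple


def best_regex_or_none(
    values: List[str], candidate_regexes: List[str], threshold: float = 1.0
) -> Tuple[Optional[str], Dict[str, object]]:
    """Return (best-fitting regex or None, details about the computation)."""
    total = len(values)
    success_ratios: Dict[str, float] = {}
    for pattern in candidate_regexes:
        matched = sum(1 for v in values if re.match(pattern, v))
        success_ratios[pattern] = matched / total if total else 0.0

    # Keep only candidates that reach the threshold, then take the highest ratio.
    passing = {p: r for p, r in success_ratios.items() if r >= threshold}
    best = max(passing, key=lambda p: passing[p]) if passing else None
    details: Dict[str, object] = {"success_ratios": success_ratios, "threshold": threshold}
    return best, details


sample = ["a1", "b2", "zz"]
value, details = best_regex_or_none(sample, [r"[a-z]\d", r"\d+"], threshold=0.5)
no_fit, _ = best_regex_or_none(sample, [r"[a-z]\d", r"\d+"], threshold=0.9)
```

With a threshold of 0.5 the first pattern (matching two of three values) is returned; with 0.9 no candidate qualifies and the value is `None`.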

static _get_regex_matched_greater_than_threshold(regex_string_success_ratio_dict: dict, threshold: float)

Helper method to determine which regex strings have a success ratio greater than the threshold.
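
The filtering step might look like the sketch below. The function name `regexes_above_threshold` is a hypothetical stand-in, the list return type is an assumption (the signature above does not state one), and the comparison is written as greater-than-or-equal so that a ratio of exactly 1.0 passes the default threshold.

```python
from typing import Dict, List


def regexes_above_threshold(success_ratios: Dict[str, float], threshold: float) -> List[str]:
    """Keep the regex strings whose success ratio reaches the threshold."""
    return [regex for regex, ratio in success_ratios.items() if ratio >= threshold]


ratios = {r"\d+": 0.95, r"[a-z]+": 0.40, r"\w+": 1.0}
kept = regexes_above_threshold(ratios, threshold=0.9)
```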

static _get_sorted_regex_and_ratios(regex_string_success_ratio_dict: dict)

Helper method to sort all evaluated regexes by their success ratio. Returns Tuple(ratio, sorted_strings).
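
One plausible reading of this helper is sketched below. The name `sorted_regexes_and_ratios` is hypothetical, and the descending sort order (best match first) is an assumption; the documentation above only states that the result is a tuple of ratios and sorted strings.

```python
from typing import Dict, List, Tuple


def sorted_regexes_and_ratios(
    success_ratios: Dict[str, float]
) -> Tuple[List[float], List[str]]:
    """Sort regexes by success ratio, highest first, and return parallel lists."""
    ordered = sorted(success_ratios.items(), key=lambda item: item[1], reverse=True)
    sorted_ratios = [ratio for _, ratio in ordered]
    sorted_strings = [regex for regex, _ in ordered]
    return sorted_ratios, sorted_strings


sorted_ratios, sorted_strings = sorted_regexes_and_ratios(
    {r"\d+": 0.95, r"[a-z]+": 0.40, r"\w+": 1.0}
)
```

Returning two parallel lists keeps each ratio aligned by index with the regex it belongs to.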