great_expectations.rule_based_profiler.types

Package Contents

Classes

Attributes()

This class generalizes dictionary in order to hold generic attributes with unique ID.

Builder(batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict]] = None, data_context: Optional[‘DataContext’] = None)

A Builder provides methods to serialize any builder object of a rule generically.

Domain(domain_type: Union[str, MetricDomainTypes], domain_kwargs: Optional[Union[Dict[str, Any], DomainKwargs]] = None, details: Optional[Dict[str, Any]] = None)

Analogously to the way “SerializableDictDot” extends “DictDot” to provide JSON serialization, the present class,

SemanticDomainTypes()

Generic enumeration.

InferredSemanticDomainType()

A convenience class for migrating away from untyped dictionaries to stronger typed objects.

ParameterNode()

ParameterNode is a node of a tree structure.

ParameterContainer()

ParameterContainer holds root nodes of tree structures, corresponding to fully-qualified parameter names.

Functions

build_parameter_container(parameter_container: ParameterContainer, parameter_values: Dict[str, Any])

Builds the ParameterNode trees, corresponding to the fully_qualified_parameter_name first-level keys.

build_parameter_container_for_variables(variables_configs: Dict[str, Any])

Build a ParameterContainer for all of the profiler config variables passed as key value pairs

get_parameter_value_by_fully_qualified_parameter_name(fully_qualified_parameter_name: str, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None)

Get the parameter value from the current “rule state” using the fully-qualified parameter name.

class great_expectations.rule_based_profiler.types.Attributes

Bases: great_expectations.types.SerializableDotDict, great_expectations.core.IDDict

This class generalizes dictionary in order to hold generic attributes with unique ID.

to_dict(self)
to_json_dict(self)
class great_expectations.rule_based_profiler.types.Builder(batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequest, RuntimeBatchRequest, dict]] = None, data_context: Optional['DataContext'] = None)

Bases: great_expectations.types.SerializableDictDot

A Builder provides methods to serialize any builder object of a rule generically.

exclude_field_names :Set[str]
to_json_dict(self)

# TODO: <Alex>2/4/2022</Alex> This implementation of “SerializableDictDot.to_json_dict() occurs frequently and should ideally serve as the reference implementation in the “SerializableDictDot” class itself. However, the circular import dependencies, due to the location of the “great_expectations/types/__init__.py” and “great_expectations/core/util.py” modules make this refactoring infeasible at the present time.

__deepcopy__(self, memo)
__repr__(self)

# TODO: <Alex>2/4/2022</Alex> This implementation of a custom “__repr__()” occurs frequently and should ideally serve as the reference implementation in the “SerializableDictDot” class. However, the circular import dependencies, due to the location of the “great_expectations/types/__init__.py” and “great_expectations/core/util.py” modules make this refactoring infeasible at the present time.

__str__(self)

# TODO: <Alex>2/4/2022</Alex> This implementation of a custom “__str__()” occurs frequently and should ideally serve as the reference implementation in the “SerializableDictDot” class. However, the circular import dependencies, due to the location of the “great_expectations/types/__init__.py” and “great_expectations/core/util.py” modules make this refactoring infeasible at the present time.

class great_expectations.rule_based_profiler.types.Domain(domain_type: Union[str, MetricDomainTypes], domain_kwargs: Optional[Union[Dict[str, Any], DomainKwargs]] = None, details: Optional[Dict[str, Any]] = None)

Bases: great_expectations.types.SerializableDotDict

Analogously to the way “SerializableDictDot” extends “DictDot” to provide JSON serialization, the present class, “SerializableDotDict” extends “DotDict” to provide JSON-serializable version of the “DotDict” class as well. Since “DotDict” is already YAML-serializable, “SerializableDotDict” is both YAML-serializable and JSON-serializable.

__repr__(self)

Return repr(self).

__str__(self)

Return str(self).

__eq__(self, other)

Return self==value.

__ne__(self, other)

Return self!=value.

property id(self)
to_json_dict(self)
_convert_dictionaries_to_domain_kwargs(self, source: Optional[Any] = None)
class great_expectations.rule_based_profiler.types.SemanticDomainTypes

Bases: enum.Enum

Generic enumeration.

Derive from this class to define new enumerations.

NUMERIC = numeric
TEXT = text
LOGIC = logic
DATETIME = datetime
BINARY = binary
CURRENCY = currency
VALUE_SET = value_set
IDENTIFIER = identifier
MISCELLANEOUS = miscellaneous
UNKNOWN = unknown
class great_expectations.rule_based_profiler.types.InferredSemanticDomainType

Bases: great_expectations.types.SerializableDictDot

A convenience class for migrating away from untyped dictionaries to stronger typed objects.

Can be instantiated with arguments:

my_A = MyClassA(

foo=”a string”, bar=1,

)

Can be instantiated from a dictionary:

my_A = MyClassA(
**{

“foo”: “a string”, “bar”: 1,

}

)

Can be accessed using both dictionary and dot notation

my_A.foo == “a string” my_A.bar == 1

my_A[“foo”] == “a string” my_A[“bar”] == 1

Pairs nicely with @dataclass:

@dataclass() class MyClassA(DictDot):

foo: str bar: int

Can be made immutable:

@dataclass(frozen=True) class MyClassA(DictDot):

foo: str bar: int

For more examples of usage, please see test_dataclass_serializable_dot_dict_pattern.py in the tests folder.

semantic_domain_type :Optional[Union[str, SemanticDomainTypes]]
details :Optional[Dict[str, Any]]
to_dict(self)
to_json_dict(self)

# TODO: <Alex>2/4/2022</Alex> A reference implementation can be provided, once circular import dependencies, caused by relative locations of the “great_expectations/types/__init__.py” and “great_expectations/core/util.py” modules are resolved.

class great_expectations.rule_based_profiler.types.ParameterNode

Bases: great_expectations.types.SerializableDotDict

ParameterNode is a node of a tree structure.

The tree is implemented as a nested dictionary that also supports the “dot” notation at every level of hierarchy. Together, these design aspects allow the entire tree to be converted into a JSON object for external compatibility.

Since the descendant nodes (i.e., sub-dictionaries) are of the same type as their parent node, then each descendant node is also a tree (or a sub-tree). Each node can support the combination of attribute name-value pairs representing values and details containing helpful information regarding how these values were obtained (tolerances, explanations, etc.). By convention, the “value” key corresponds the parameter value, while the “details” key corresponds the auxiliary details. These details can be used to set the “meta” key of the ExpectationConfiguration.

See the ParameterContainer documentation for examples of different parameter naming structures supported.

Even though, typically, only the leaf nodes (characterized by having no keys of “ParameterNode” type) store parameter values and details, intermediate nodes may also have these properties. This is important for supporting the situations where multiple long fully-qualified parameter names have overlapping intermediate parts (see below).

to_dict(self)
to_json_dict(self)
class great_expectations.rule_based_profiler.types.ParameterContainer

Bases: great_expectations.types.SerializableDictDot

ParameterContainer holds root nodes of tree structures, corresponding to fully-qualified parameter names.

While all parameter names begin with the dollar sign character (“$”), a fully-qualified parameter name is a string, whose parts are delimited by the period character (“.”).

As an example, suppose that the value of the attribute “max_num_conversion_attempts” is needed for certain processing operations. However, there could be several attributes with this name, albeit in different contexts: $parameter.date_strings.tolerances.max_num_conversion_attempts $parameter.character.encodings.tolerances.max_num_conversion_attempts The fully-qualified parameter names disambiguate the same “leaf” attribute names occurring in multiple contexts. In the present example, the use of fully-qualified parameter names makes it clear that in one context, “max_num_conversion_attempts” refers to the operations on date/time, while in the other – it applies to characters.

$variables.false_positive_threshold $parameter.date_strings.yyyy_mm_dd_hh_mm_ss_tz_date_format.value $parameter.date_strings.yyyy_mm_dd_hh_mm_ss_tz_date_format.details $parameter.date_strings.yyyy_mm_dd_date_format.value $parameter.date_strings.yyyy_mm_dd_date_format.details $parameter.date_strings.mm_yyyy_dd_hh_mm_ss_tz_date_format.value $parameter.date_strings.mm_yyyy_dd_hh_mm_ss_tz_date_format.details $parameter.date_strings.mm_yyyy_dd_date_format.value $parameter.date_strings.mm_yyyy_dd_date_format.details $parameter.date_strings.tolerances.max_abs_error_time_milliseconds $parameter.date_strings.tolerances.max_num_conversion_attempts $parameter.tolerances.mostly $parameter.tolerances.financial.usd $mean $parameter.monthly_taxi_fairs.mean_values.value[0] $parameter.monthly_taxi_fairs.mean_values.value[1] $parameter.monthly_taxi_fairs.mean_values.value[2] $parameter.monthly_taxi_fairs.mean_values.value[3] $parameter.monthly_taxi_fairs.mean_values.details $parameter.daily_taxi_fairs.mean_values.value[“friday”] $parameter.daily_taxi_fairs.mean_values.value[“saturday”] $parameter.daily_taxi_fairs.mean_values.value[“sunday”] $parameter.daily_taxi_fairs.mean_values.value[“monday”] $parameter.daily_taxi_fairs.mean_values.details $parameter.weekly_taxi_fairs.mean_values.value[1][‘friday’] $parameter.weekly_taxi_fairs.mean_values.value[18][‘saturday’] $parameter.weekly_taxi_fairs.mean_values.value[20][‘sunday’] $parameter.weekly_taxi_fairs.mean_values.value[21][‘monday’] $parameter.weekly_taxi_fairs.mean_values.details $custom.lang.character_encodings

The reason that ParameterContainer is needed is that each ParameterNode can point only to one tree structure, characterized by having a specific root-level ParameterNode object. A root-level ParameterNode object corresponds to a set of fully-qualified parameter names that have the same first part (e.g., “parameter”). However, a Domain may utilize fully-qualified parameter names that have multiple first parts (i.e., from different “name spaces”). The ParameterContainer maintains a dictionary that holds references to root-level ParameterNode objects for all parameter “name spaces” applicable to the given Domain (where the first part of all fully-qualified parameter names within the same “name space” serves as the dictionary key, and the root-level ParameterNode objects are the values).

parameter_nodes :Optional[Dict[str, ParameterNode]]
set_parameter_node(self, parameter_name_root: str, parameter_node: ParameterNode)
get_parameter_node(self, parameter_name_root: str)
_convert_dictionaries_to_parameter_nodes(self, source: Optional[Any] = None)
to_dict(self)
to_json_dict(self)

# TODO: <Alex>2/4/2022</Alex> A reference implementation can be provided, once circular import dependencies, caused by relative locations of the “great_expectations/types/__init__.py” and “great_expectations/core/util.py” modules are resolved.

great_expectations.rule_based_profiler.types.build_parameter_container(parameter_container: ParameterContainer, parameter_values: Dict[str, Any])

Builds the ParameterNode trees, corresponding to the fully_qualified_parameter_name first-level keys.

:param parameter_container initialized ParameterContainer for all ParameterNode trees :param parameter_values Example of the name-value structure for building parameters (matching the type hint in the method signature): {

“$parameter.date_strings.tolerances.max_abs_error_time_milliseconds.value”: 100, # Actual value can of Any type. # The “details” dictionary is Optional. “$parameter.date_strings.tolerances.max_abs_error_time_milliseconds.details”: {

“max_abs_error_time_milliseconds”: {
“confidence”: { # Arbitrary dictionary key

“success_ratio”: 1.0, # Arbitrary entries “comment”: “matched template”, # Arbitrary entries

}

},

}, # While highly recommended, the use of “.value” and “.details” keys is conventional (it is not enforced). “$parameter.tolerances.mostly”: 9.0e-1, # The key here does not end on “.value” and no “.details” is provided. …

} :return parameter_container holds the dictionary of ParameterNode objects corresponding to roots of parameter names

This function loops through the supplied pairs of fully-qualified parameter names and their corresponding values (and any “details”) and builds the tree under a single root-level ParameterNode object for a “name space”. In particular, if any ParameterNode object in the tree (starting with the root-level ParameterNode object) already exists, it is reused; in other words, ParameterNode objects are unique per part of fully-qualified parameter names.

great_expectations.rule_based_profiler.types.build_parameter_container_for_variables(variables_configs: Dict[str, Any]) → ParameterContainer

Build a ParameterContainer for all of the profiler config variables passed as key value pairs :param variables_configs: Variable key: value pairs e.g. {“variable_name”: variable_value, …}

Returns

ParameterContainer containing all variables

great_expectations.rule_based_profiler.types.get_parameter_value_by_fully_qualified_parameter_name(fully_qualified_parameter_name: str, domain: Optional[Domain] = None, variables: Optional[ParameterContainer] = None, parameters: Optional[Dict[str, ParameterContainer]] = None) → Optional[Union[Any, ParameterNode]]

Get the parameter value from the current “rule state” using the fully-qualified parameter name. A fully-qualified parameter name must be a dot-delimited string, or the name of a parameter (without the dots). Args

param fully_qualified_parameter_name

str – A dot-separated string key starting with $ for fetching parameters

param domain

Domain – current Domain of interest

:param variables :param parameters

Returns

Optional[Union[Any, ParameterNode]] object corresponding to the last part of the fully-qualified parameter

name supplied as argument – a value (of type “Any”) or a ParameterNode object (containing the sub-tree structure).

great_expectations.rule_based_profiler.types.DOMAIN_KWARGS_PARAMETER_NAME :str = domain_kwargs