great_expectations.rule_based_profiler

Subpackages

Package Contents

Classes

RuleBasedProfilerResult()

“RuleBasedProfilerResult” is an immutable “dataclass” object, designed to hold results with auxiliary information of executing the “RuleBasedProfiler.run()” method.

BaseRuleBasedProfiler(profiler_config: RuleBasedProfilerConfig, data_context: Optional[AbstractDataContext] = None, usage_statistics_handler: Optional[UsageStatisticsHandler] = None)

BaseRuleBasedProfiler class is initialized from a RuleBasedProfilerConfig typed object and contains all functionality in the form of interface methods and their reference implementation.

RuleBasedProfiler(name: str, config_version: float, variables: Optional[Dict[str, Any]] = None, rules: Optional[Dict[str, Dict[str, Any]]] = None, data_context: Optional[AbstractDataContext] = None, id: Optional[str] = None)

RuleBasedProfiler object serves to profile, or automatically evaluate a set of rules, upon a given batch or multiple batches of data.

class great_expectations.rule_based_profiler.RuleBasedProfilerResult

Bases: great_expectations.types.SerializableDictDot

“RuleBasedProfilerResult” is an immutable “dataclass” object, designed to hold results with auxiliary information of executing the “RuleBasedProfiler.run()” method. Principal properties are: “fully_qualified_parameter_names_by_domain”, “parameter_values_for_fully_qualified_parameter_names_by_domain”, “expectation_configurations”, and “citation” (which represents the configuration of the effective Rule-Based Profiler, with all run-time overrides properly reconciled).

fully_qualified_parameter_names_by_domain :Dict[Domain, List[str]]
parameter_values_for_fully_qualified_parameter_names_by_domain :Optional[Dict[Domain, Dict[str, ParameterNode]]]
expectation_configurations :List[ExpectationConfiguration]
citation :dict
rule_domain_builder_execution_time :Dict[str, float]
rule_execution_time :Dict[str, float]
_usage_statistics_handler :Optional[UsageStatisticsHandler]
to_dict(self)

Returns: This RuleBasedProfilerResult as a dictionary (JSON-serializable for RuleBasedProfilerResult objects).

to_json_dict(self)

Returns: This RuleBasedProfilerResult as a JSON-serializable dictionary.

get_expectation_suite(self, expectation_suite_name: str)

Returns: An “ExpectationSuite” object, built from the properties populated into this “RuleBasedProfilerResult” object.
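A minimal sketch of what “immutable dataclass” and “JSON-serializable dictionary” mean for a result object like this one. ProfilerResultSketch is hypothetical and keeps only a few of the fields listed above; Domain keys are replaced by plain strings so the example runs without Great Expectations installed. Note that frozen=True only blocks reassigning fields; dictionaries held in those fields remain mutable.

```python
import json
from dataclasses import FrozenInstanceError, asdict, dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class ProfilerResultSketch:
    """Hypothetical, simplified stand-in for RuleBasedProfilerResult.

    Domain keys are plain strings here, so the sketch runs without
    Great Expectations installed.
    """

    fully_qualified_parameter_names_by_domain: Dict[str, List[str]]
    expectation_configurations: List[Dict[str, Any]]
    citation: Dict[str, Any]
    rule_execution_time: Dict[str, float] = field(default_factory=dict)

    def to_json_dict(self) -> Dict[str, Any]:
        # asdict() recursively copies the fields into plain dicts and lists,
        # which json.dumps() can serialize directly.
        return asdict(self)


result = ProfilerResultSketch(
    fully_qualified_parameter_names_by_domain={"table": ["$parameter.row_count"]},
    expectation_configurations=[{"expectation_type": "expect_table_row_count_to_be_between"}],
    citation={"name": "my_profiler"},
)
print(json.dumps(result.to_json_dict(), sort_keys=True))

try:
    result.citation = {}  # frozen=True rejects attribute reassignment
except FrozenInstanceError:
    print("fields of a frozen result cannot be reassigned")
```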

class great_expectations.rule_based_profiler.BaseRuleBasedProfiler(profiler_config: RuleBasedProfilerConfig, data_context: Optional[AbstractDataContext] = None, usage_statistics_handler: Optional[UsageStatisticsHandler] = None)

Bases: great_expectations.core.config_peer.ConfigPeer

BaseRuleBasedProfiler class is initialized from a RuleBasedProfilerConfig typed object and contains all functionality in the form of interface methods (which can be overridden by subclasses) and their reference implementation.

EXPECTATION_SUCCESS_KEYS :Set[str]
property ge_cloud_id(self)
_init_profiler_rules(self, rules: Dict[str, Dict[str, Any]])
_init_rule(self, rule_name: str, rule_config: Dict[str, Any])
static _init_rule_domain_builder(domain_builder_config: dict, data_context: Optional[AbstractDataContext] = None)
run(self, variables: Optional[Dict[str, Any]] = None, rules: Optional[Dict[str, Dict[str, Any]]] = None, batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequestBase, dict]] = None, recompute_existing_parameter_values: bool = False, reconciliation_directives: ReconciliationDirectives = DEFAULT_RECONCILATION_DIRECTIVES, variables_directives_list: Optional[List[RuntimeEnvironmentVariablesDirectives]] = None, domain_type_directives_list: Optional[List[RuntimeEnvironmentDomainTypeDirectives]] = None, comment: Optional[str] = None)

Executes all “Rule” objects of this “RuleBasedProfiler” and collects their “RuleState” side effects.

Parameters
  • variables – attribute name/value pairs (overrides), commonly-used in Builder objects

  • rules – name/(configuration-dictionary) (overrides)

  • batch_list – Explicit list of Batch objects to supply data at runtime

  • batch_request – Explicit batch_request used to supply data at runtime

  • recompute_existing_parameter_values – If “True”, recompute the value even if the “fully_qualified_parameter_name” already exists

  • reconciliation_directives – directives for how each rule component should be overwritten

  • variables_directives_list – additional/override runtime variables directives (modify “BaseRuleBasedProfiler”)

  • domain_type_directives_list – additional/override runtime domain directives (modify “BaseRuleBasedProfiler”)

  • comment – Optional comment for the “citation” of the “ExpectationSuite” returned as part of the “RuleBasedProfilerResult”

Returns

“RuleBasedProfilerResult” dataclass object, containing essential outputs of profiling.

get_expectation_configurations(self)
Returns

List of ExpectationConfiguration objects, accumulated from RuleState of every Rule executed.

get_fully_qualified_parameter_names_by_domain(self)
Returns

Dictionary of fully-qualified parameter names by Domain, accumulated from RuleState of every Rule executed.

get_fully_qualified_parameter_names_for_domain_id(self, domain_id: str)
Parameters

domain_id – ID of desired Domain object.

Returns

List of fully-qualified parameter names for Domain with domain_id as specified, accumulated from RuleState of corresponding Rule executed.

get_parameter_values_for_fully_qualified_parameter_names_by_domain(self)
Returns

Dictionaries of values for fully-qualified parameter names by Domain, accumulated from RuleState of every Rule executed.

get_parameter_values_for_fully_qualified_parameter_names_for_domain_id(self, domain_id: str)
Parameters

domain_id – ID of desired Domain object.

Returns

Dictionary of values for fully-qualified parameter names for Domain with domain_id as specified, accumulated from RuleState of corresponding Rule executed.
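The getters above accumulate per-Domain results from the RuleState of each executed Rule. A minimal sketch of that accumulation and of the lookup by domain ID, assuming each RuleState can be reduced to a hypothetical mapping of the form {domain_id: {fully_qualified_parameter_name: value}}. RuleStateSketch, the three helper functions, and the parameter names in the example are illustrative only, not part of the library.

```python
from typing import Any, Dict, List

# Hypothetical stand-in: each executed Rule's RuleState is modeled as a mapping
# {domain_id: {fully_qualified_parameter_name: value}}.
RuleStateSketch = Dict[str, Dict[str, Any]]


def values_by_domain(rule_states: List[RuleStateSketch]) -> Dict[str, Dict[str, Any]]:
    """Accumulate parameter values by domain ID across all executed rules."""
    accumulated: Dict[str, Dict[str, Any]] = {}
    for rule_state in rule_states:
        for domain_id, values in rule_state.items():
            # Create the per-domain dictionary on first sight, then merge into it.
            accumulated.setdefault(domain_id, {}).update(values)
    return accumulated


def values_for_domain_id(rule_states: List[RuleStateSketch], domain_id: str) -> Dict[str, Any]:
    """Look up the accumulated values for one domain ID (empty if absent)."""
    return values_by_domain(rule_states).get(domain_id, {})


def names_for_domain_id(rule_states: List[RuleStateSketch], domain_id: str) -> List[str]:
    """The fully-qualified parameter names are the keys of the value map."""
    return list(values_for_domain_id(rule_states, domain_id))


states = [
    {"table_domain": {"$parameter.row_count": 1000}},
    {"column_a_domain": {"$parameter.min_value": 0, "$parameter.max_value": 9}},
]
print(names_for_domain_id(states, "column_a_domain"))
# ['$parameter.min_value', '$parameter.max_value']
```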

add_rule(self, rule: Rule)

Add Rule object to existing profiler object by reconciling profiler rules and updating _profiler_config.

reconcile_profiler_variables(self, variables: Optional[Dict[str, Any]] = None, reconciliation_strategy: ReconciliationStrategy = DEFAULT_RECONCILATION_DIRECTIVES.variables)

Profiler “variables” reconciliation involves combining the variables, instantiated from Profiler configuration (e.g., stored in a YAML file managed by the Profiler store), with the variables overrides, provided at run time.

The reconciliation logic for “variables” is of the “replace” nature: An override value complements the original on key “miss”, and replaces the original on key “hit” (or “collision”), because “variables” is a unique member.

Parameters
  • variables – variables overrides, supplied in dictionary (configuration) form

  • reconciliation_strategy – how overrides are reconciled; one of update, nested_update, or overwrite

Returns

reconciled variables in their canonical ParameterContainer object form
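The “replace” behavior described above (complement on key miss, replace on key hit) can be sketched on plain dictionaries. This is a hypothetical illustration of the three strategy names listed under reconciliation_strategy, not the library's implementation; it assumes “overwrite” discards the configured variables entirely and that “nested_update” merges nested dictionaries key by key. The variable names in the example are illustrative.

```python
import copy
from typing import Any, Dict, Optional


def nested_update(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge ``override`` into a copy of ``base``."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are dictionaries: descend instead of replacing wholesale.
            result[key] = nested_update(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def reconcile_variables(
    variables: Dict[str, Any],
    overrides: Optional[Dict[str, Any]],
    strategy: str = "update",
) -> Dict[str, Any]:
    """Hypothetical sketch of the three reconciliation strategies."""
    if not overrides:
        return copy.deepcopy(variables)
    if strategy == "overwrite":
        # Assumption: the overrides replace the configured variables entirely.
        return copy.deepcopy(overrides)
    if strategy == "nested_update":
        return nested_update(variables, overrides)
    if strategy == "update":
        # Key "miss": the override complements; key "hit": it replaces.
        result = copy.deepcopy(variables)
        result.update(copy.deepcopy(overrides))
        return result
    raise ValueError(f"Unknown reconciliation strategy: {strategy!r}")


configured = {"false_positive_rate": 0.05, "estimator": {"name": "bootstrap", "n": 9999}}
overrides = {"estimator": {"n": 100}, "round_decimals": 2}
print(reconcile_variables(configured, overrides, "update"))
# {'false_positive_rate': 0.05, 'estimator': {'n': 100}, 'round_decimals': 2}
print(reconcile_variables(configured, overrides, "nested_update"))
# {'false_positive_rate': 0.05, 'estimator': {'name': 'bootstrap', 'n': 100}, 'round_decimals': 2}
```

With “update”, the whole “estimator” value is replaced on the key hit; “nested_update” keeps the nested “name” entry that the override did not mention.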

_reconcile_profiler_variables_as_dict(self, variables: Optional[Dict[str, Any]], reconciliation_strategy: ReconciliationStrategy = DEFAULT_RECONCILATION_DIRECTIVES.variables)
reconcile_profiler_rules(self, rules: Optional[Dict[str, Dict[str, Any]]] = None, reconciliation_directives: ReconciliationDirectives = DEFAULT_RECONCILATION_DIRECTIVES)

Profiler “rules” reconciliation involves combining the rules, instantiated from Profiler configuration (e.g., stored in a YAML file managed by the Profiler store), with the rules overrides, provided at run time.

The reconciliation logic for “rules” is of the “procedural” nature:
(1) Combine every rule override configuration with any instantiated rule into a reconciled configuration.
(2) Re-instantiate Rule objects from the reconciled rule configurations.

Parameters
  • rules – rules overrides, supplied in dictionary (configuration) form for each rule name as the key

  • reconciliation_directives – directives for how each rule component should be overwritten

Returns

reconciled rules in their canonical List[Rule] object form
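The two-step “procedural” reconciliation of rules can be sketched as follows, assuming rules are keyed by name and that rules without an override keep their instantiated configuration. reconcile_rules, shallow_merge, and the two callbacks are hypothetical: reconcile_rule stands in for per-rule configuration reconciliation and build_rule for Rule instantiation. The builder class names appear only as illustrative strings.

```python
from typing import Any, Callable, Dict, List, Optional

RuleConfig = Dict[str, Any]


def reconcile_rules(
    existing: Dict[str, RuleConfig],
    overrides: Optional[Dict[str, RuleConfig]],
    reconcile_rule: Callable[[Optional[RuleConfig], RuleConfig], RuleConfig],
    build_rule: Callable[[str, RuleConfig], Any],
) -> List[Any]:
    """Two-step "procedural" reconciliation; both callbacks are hypothetical."""
    # Rules without an override keep their instantiated configuration.
    reconciled: Dict[str, RuleConfig] = dict(existing)
    # Step 1: combine every override with the instantiated rule of the same
    # name (None when the override introduces a new rule).
    for rule_name, override in (overrides or {}).items():
        reconciled[rule_name] = reconcile_rule(existing.get(rule_name), override)
    # Step 2: re-instantiate rule objects from the reconciled configurations.
    return [build_rule(name, config) for name, config in reconciled.items()]


def shallow_merge(old: Optional[RuleConfig], new: RuleConfig) -> RuleConfig:
    # Deliberately naive per-rule reconciliation, for illustration only.
    return {**(old or {}), **new}


rules = reconcile_rules(
    existing={"row_count_rule": {"domain_builder": {"class_name": "TableDomainBuilder"}}},
    overrides={
        "row_count_rule": {"variables": {"mostly": 0.9}},
        "new_rule": {"domain_builder": {"class_name": "ColumnDomainBuilder"}},
    },
    reconcile_rule=shallow_merge,
    build_rule=lambda name, config: (name, config),  # stand-in for Rule(...)
)
print([name for name, _ in rules])  # ['row_count_rule', 'new_rule']
```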

_reconcile_profiler_rules_as_dict(self, rules: Optional[Dict[str, Dict[str, Any]]] = None, reconciliation_directives: ReconciliationDirectives = DEFAULT_RECONCILATION_DIRECTIVES)
static _reconcile_rule_config(existing_rules: Dict[str, Rule], rule_name: str, rule_config: dict, reconciliation_directives: ReconciliationDirectives = DEFAULT_RECONCILATION_DIRECTIVES)

A “rule configuration” reconciliation is the process of combining the configuration of a single candidate override rule with at most one configuration corresponding to the list of rules instantiated from Profiler configuration (e.g., stored in a YAML file managed by the Profiler store).

The reconciliation logic for “Rule configuration” employs the “by construction” principle:
(1) Find a common configuration between the variables configuration, possibly supplied as part of the candidate override Rule configuration, and the variables configuration of an instantiated Rule.
(2) Find a common configuration between the domain builder configuration, possibly supplied as part of the candidate override Rule configuration, and the domain builder configuration of an instantiated Rule.
(3) Find common configurations between the parameter builder configurations, possibly supplied as part of the candidate override Rule configuration, and the parameter builder configurations of an instantiated Rule.
(4) Find common configurations between the expectation configuration builder configurations, possibly supplied as part of the candidate override Rule configuration, and the expectation configuration builder configurations of an instantiated Rule.
(5) Construct the reconciled Rule configuration dictionary using the formal Rule properties (“domain_builder”, “parameter_builders”, and “expectation_configuration_builders”) as keys and their reconciled configuration dictionaries as values.

To ensure that custom builder classes can be instantiated with “instantiate_class_from_config()”, candidate builder override configurations must supply both the “class_name” and the “module_name” attributes.

Parameters
  • existing_rules – all currently instantiated rules represented as a dictionary, keyed by rule name

  • rule_name – name of the override rule candidate

  • rule_config – configuration of an override rule candidate, supplied in dictionary (configuration) form

  • reconciliation_directives – directives for how each rule component should be overwritten

Returns

reconciled rule configuration, returned in dictionary (configuration) form
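The requirement that every candidate builder override supply both “class_name” and “module_name” can be checked before any reconciliation takes place. A minimal sketch, assuming a candidate rule override is a plain dictionary keyed by the formal Rule properties named in step (5); check_builder_override and check_rule_override are hypothetical helpers, and the class and module names in the example are illustrative.

```python
from typing import Any, Dict, List

# Both keys are required so that each builder override can be instantiated
# from its configuration alone (see the note on instantiate_class_from_config()).
REQUIRED_BUILDER_KEYS = ("class_name", "module_name")
LIST_BUILDER_PROPERTIES = ("parameter_builders", "expectation_configuration_builders")


def check_builder_override(config: Dict[str, Any]) -> None:
    missing = [key for key in REQUIRED_BUILDER_KEYS if key not in config]
    if missing:
        raise ValueError(f"builder override is missing {missing}")


def check_rule_override(rule_config: Dict[str, Any]) -> List[str]:
    """Validate every builder override in a candidate rule configuration.

    Returns the formal Rule properties the override supplies, i.e. the keys
    that step (5) would reconcile for this rule.
    """
    supplied: List[str] = []
    if rule_config.get("domain_builder") is not None:
        check_builder_override(rule_config["domain_builder"])
        supplied.append("domain_builder")
    for prop in LIST_BUILDER_PROPERTIES:
        if rule_config.get(prop) is not None:
            for builder_config in rule_config[prop]:
                check_builder_override(builder_config)
            supplied.append(prop)
    return supplied


override = {
    "domain_builder": {
        "class_name": "ColumnDomainBuilder",
        "module_name": "great_expectations.rule_based_profiler.domain_builder",
    },
    "parameter_builders": [{"name": "column_min", "class_name": "MetricMultiBatchParameterBuilder"}],
}
try:
    check_rule_override(override)
except ValueError as error:
    print(error)  # builder override is missing ['module_name']
```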

static _reconcile_rule_domain_builder_config(domain_builder: DomainBuilder, domain_builder_config: dict, reconciliation_strategy: ReconciliationStrategy = DEFAULT_RECONCILATION_DIRECTIVES.domain_builder)

Rule “domain builder” reconciliation involves combining the domain builder, instantiated from Rule configuration (e.g., stored in a YAML file managed by the Profiler store), with the domain builder override, possibly supplied as part of the candidate override rule configuration.

The reconciliation logic for “domain builder” is of the “replace” nature: An override value complements the original on key “miss”, and replaces the original on key “hit” (or “collision”), because “domain builder” is a unique member for a Rule.

Parameters
  • domain_builder – existing domain builder of a Rule

  • domain_builder_config – domain builder configuration override, supplied in dictionary (configuration) form

  • reconciliation_strategy – how overrides are reconciled; one of update, nested_update, or overwrite

Returns

reconciled domain builder configuration, returned in dictionary (configuration) form

static _reconcile_rule_parameter_builder_configs(rule: Rule, parameter_builder_configs: List[dict], reconciliation_strategy: ReconciliationStrategy = DEFAULT_RECONCILATION_DIRECTIVES.parameter_builder)

Rule “parameter builders” reconciliation involves combining the parameter builders, instantiated from Rule configuration (e.g., stored in a YAML file managed by the Profiler store), with the parameter builders overrides, possibly supplied as part of the candidate override rule configuration.

The reconciliation logic for “parameter builders” is of the “upsert” nature: A candidate override parameter builder configuration is added to the parameter builders list of the rule if the corresponding parameter builder name does not exist in the list of instantiated parameter builders of the rule; otherwise, once instantiated, it replaces the configuration associated with the original parameter builder having the same name.

Parameters
  • rule – Profiler “rule”, subject to parameter builder overrides

  • parameter_builder_configs – parameter builder configuration overrides, supplied in dictionary (configuration) form

  • reconciliation_strategy – how overrides are reconciled; one of update, nested_update, or overwrite

Returns

reconciled parameter builder configuration, returned in dictionary (configuration) form
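The “upsert” logic keyed by builder name can be sketched on lists of configuration dictionaries. This hypothetical helper assumes each configuration carries a “name” entry and that a name hit replaces the original wholesale; it does not model the per-builder merge that reconciliation_strategy may select. The “metric” key in the example is illustrative.

```python
from typing import Any, Dict, List


def upsert_builder_configs(
    existing: List[Dict[str, Any]],
    overrides: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Upsert named builder configurations, keyed by their "name" entry."""
    # Index instantiated builders by name; dicts preserve insertion order.
    by_name: Dict[str, Dict[str, Any]] = {config["name"]: config for config in existing}
    for override in overrides:
        # Name hit: the override takes the original's place in the list.
        # Name miss: the override is appended as a new builder.
        by_name[override["name"]] = override
    return list(by_name.values())


existing = [{"name": "row_count", "metric": "a"}, {"name": "mean", "metric": "b"}]
overrides = [{"name": "mean", "metric": "c"}, {"name": "max", "metric": "d"}]
print([config["metric"] for config in upsert_builder_configs(existing, overrides)])
# ['a', 'c', 'd']
```

The same name-keyed upsert shape applies to expectation configuration builders, whose reconciliation is described as “upsert” as well.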

static _reconcile_rule_expectation_configuration_builder_configs(rule: Rule, expectation_configuration_builder_configs: List[dict], reconciliation_strategy: ReconciliationStrategy = DEFAULT_RECONCILATION_DIRECTIVES.expectation_configuration_builder)

Rule “expectation configuration builders” reconciliation involves combining the expectation configuration builders, instantiated from Rule configuration (e.g., stored in a YAML file managed by the Profiler store), with the expectation configuration builders overrides, possibly supplied as part of the candidate override rule configuration.

The reconciliation logic for “expectation configuration builders” is of the “upsert” nature: A candidate override expectation configuration builder configuration is added to the expectation configuration builders list of the rule if the corresponding expectation configuration builder name does not exist in the list of instantiated expectation configuration builders of the rule; otherwise, once instantiated, it replaces the configuration associated with the original expectation configuration builder having the same name.

Parameters
  • rule – Profiler “rule”, subject to expectation configuration builder overrides

  • expectation_configuration_builder_configs – expectation configuration builder configuration overrides, supplied in dictionary (configuration) form

  • reconciliation_strategy – how overrides are reconciled; one of update, nested_update, or overwrite

Returns

reconciled expectation configuration builder configuration, returned in dictionary (configuration) form

_get_rules_as_dict(self)
_apply_runtime_environment(self, variables: Optional[ParameterContainer] = None, rules: Optional[List[Rule]] = None, variables_directives_list: Optional[List[RuntimeEnvironmentVariablesDirectives]] = None, domain_type_directives_list: Optional[List[RuntimeEnvironmentDomainTypeDirectives]] = None)

Parameters
  • variables – attribute name/value pairs, commonly-used in Builder objects, to modify using “runtime_environment”

  • rules – name/(configuration-dictionary) to modify using “runtime_environment”

  • variables_directives_list – additional/override runtime variables directives (modify “BaseRuleBasedProfiler”)

  • domain_type_directives_list – additional/override runtime domain directives (modify “BaseRuleBasedProfiler”)

static _apply_variables_directives_runtime_environment(rules: Optional[List[Rule]] = None, variables_directives_list: Optional[List[RuntimeEnvironmentVariablesDirectives]] = None)

Parameters
  • rules – name/(configuration-dictionary) to modify using “runtime_environment”

  • variables_directives_list – additional/override runtime variables directives (modify “BaseRuleBasedProfiler”)
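Runtime variables directives modify the variables of the rules they target. A minimal sketch, assuming each directive names one rule and carries a dictionary of variables to overlay onto that rule in place. VariablesDirectiveSketch and RuleSketch are hypothetical stand-ins for RuntimeEnvironmentVariablesDirectives and Rule, and skipping directives that name no existing rule is this sketch's own choice.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class VariablesDirectiveSketch:
    """Hypothetical stand-in for RuntimeEnvironmentVariablesDirectives."""

    rule_name: str
    variables: Dict[str, Any] = field(default_factory=dict)


@dataclass
class RuleSketch:
    """Hypothetical stand-in for a Rule; only its name and variables matter here."""

    name: str
    variables: Dict[str, Any] = field(default_factory=dict)


def apply_variables_directives(
    rules: List[RuleSketch],
    directives: Optional[List[VariablesDirectiveSketch]] = None,
) -> None:
    """Overlay each directive's variables onto the rule it names, in place."""
    rules_by_name = {rule.name: rule for rule in rules}
    for directive in directives or []:
        rule = rules_by_name.get(directive.rule_name)
        if rule is None:
            continue  # this sketch skips directives that name no existing rule
        rule.variables.update(directive.variables)


rules = [RuleSketch("row_count_rule", {"mostly": 1.0}), RuleSketch("column_rule")]
apply_variables_directives(rules, [VariablesDirectiveSketch("row_count_rule", {"mostly": 0.95})])
print(rules[0].variables)  # {'mostly': 0.95}
```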

static _apply_domain_type_directives_runtime_environment(rules: Optional[List[Rule]] = None, domain_type_directives_list: Optional[List[RuntimeEnvironmentDomainTypeDirectives]] = None)

Parameters
  • rules – name/(configuration-dictionary) to modify using “runtime_environment”

  • domain_type_directives_list – additional/override runtime domain directives (modify “BaseRuleBasedProfiler”)
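Domain type directives target rules by the domain type of their domain builder rather than by rule name. A minimal sketch, assuming a directive pairs a domain type with a dictionary of domain builder properties to apply to every matching rule. All classes here are hypothetical stand-ins (for RuntimeEnvironmentDomainTypeDirectives, a domain builder, and a Rule), and the property names are illustrative. This reference does not describe how existing and new property values are combined, so the sketch simply overwrites matching properties.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DomainTypeDirectiveSketch:
    """Hypothetical stand-in for RuntimeEnvironmentDomainTypeDirectives."""

    domain_type: str
    directives: Dict[str, Any] = field(default_factory=dict)


@dataclass
class DomainBuilderSketch:
    """Hypothetical domain builder: a domain type plus configurable properties."""

    domain_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class DomainRuleSketch:
    name: str
    domain_builder: DomainBuilderSketch


def apply_domain_type_directives(
    rules: List[DomainRuleSketch],
    directives: Optional[List[DomainTypeDirectiveSketch]] = None,
) -> None:
    """Apply each directive to every rule whose domain builder has its domain type."""
    for directive in directives or []:
        for rule in rules:
            if rule.domain_builder.domain_type == directive.domain_type:
                # Matching properties are simply overwritten in this sketch.
                rule.domain_builder.properties.update(directive.directives)


rules = [
    DomainRuleSketch("table_rule", DomainBuilderSketch("table")),
    DomainRuleSketch("numeric_rule", DomainBuilderSketch("column", {"exclude_column_names": ["id"]})),
]
apply_domain_type_directives(
    rules, [DomainTypeDirectiveSketch("column", {"include_column_names": ["price"]})]
)
print(rules[1].domain_builder.properties)
# {'exclude_column_names': ['id'], 'include_column_names': ['price']}
```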

static _get_effective_domain_builder_property_value(dest_property_value: Optional[Any] = None, source_property_value: Optional[Any] = None)
static run_profiler(data_context: AbstractDataContext, profiler_store: ProfilerStore, batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequestBase, dict]] = None, name: Optional[str] = None, ge_cloud_id: Optional[str] = None, variables: Optional[dict] = None, rules: Optional[dict] = None)
static run_profiler_on_data(data_context: AbstractDataContext, profiler_store: ProfilerStore, batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequestBase, dict]] = None, name: Optional[str] = None, ge_cloud_id: Optional[str] = None)
static add_profiler(config: RuleBasedProfilerConfig, data_context: AbstractDataContext, profiler_store: ProfilerStore)
static _check_validity_of_batch_requests_in_config(config: RuleBasedProfilerConfig)
static get_profiler(data_context: AbstractDataContext, profiler_store: ProfilerStore, name: Optional[str] = None, ge_cloud_id: Optional[str] = None)
static delete_profiler(profiler_store: ProfilerStore, name: Optional[str] = None, ge_cloud_id: Optional[str] = None)
static list_profilers(profiler_store: ProfilerStore, ge_cloud_mode: bool = False)
self_check(self, pretty_print: bool = True)

Necessary to enable integration with AbstractDataContext.test_yaml_config.

Parameters
  • pretty_print – flag to turn on verbose output

Returns

Dictionary that contains RuleBasedProfiler state

property config(self)
property name(self)
property config_version(self)
property variables(self)
property rules(self)
property rule_states(self)
to_json_dict(self)
__repr__(self)

Return repr(self).

__str__(self)

Return str(self).

class great_expectations.rule_based_profiler.RuleBasedProfiler(name: str, config_version: float, variables: Optional[Dict[str, Any]] = None, rules: Optional[Dict[str, Dict[str, Any]]] = None, data_context: Optional[AbstractDataContext] = None, id: Optional[str] = None)

Bases: great_expectations.rule_based_profiler.rule_based_profiler.BaseRuleBasedProfiler

RuleBasedProfiler object serves to profile, or automatically evaluate a set of rules, upon a given batch or multiple batches of data.

Feature Maturity

Rule-Based Profiler - How-to Guide
Use YAML to configure a flexible Profiler engine, which will then generate an ExpectationSuite for a data set
Maturity: Experimental
Details:
API Stability: Low (instantiation of Profiler and the signature of the run() method will change)
Implementation Completeness: Moderate (some augmentation and/or growth in capabilities is to be expected)
Unit Test Coverage: High (but not complete – additional unit tests will be added, commensurate with the upcoming new functionality)
Integration Infrastructure/Test Coverage: N/A -> TBD
Documentation Completeness: Moderate
Bug Risk: Low/Moderate
Expectation Completeness: Moderate
Domain Builders - How-to Guide
Use YAML to build domains for ExpectationConfiguration generator (table, column, semantic types, etc.)
Maturity: Experimental
Details:
API Stability: Moderate
Implementation Completeness: Moderate (additional DomainBuilder classes will be developed)
Unit Test Coverage: High (but not complete – additional unit tests will be added, commensurate with the upcoming new functionality)
Integration Infrastructure/Test Coverage: N/A -> TBD
Documentation Completeness: Moderate
Bug Risk: Low/Moderate
Expectation Completeness: Moderate
Parameter Builders - How-to Guide
Use YAML to configure single- and multi-batch parameter computation modules for use by ExpectationConfigurationBuilder classes
Maturity: Experimental
Details:
API Stability: Moderate
Implementation Completeness: Moderate (additional ParameterBuilder classes will be developed)
Unit Test Coverage: High (but not complete – additional unit tests will be added, commensurate with the upcoming new functionality)
Integration Infrastructure/Test Coverage: N/A -> TBD
Documentation Completeness: Moderate
Bug Risk: Low/Moderate
Expectation Completeness: Moderate
ExpectationConfiguration Builders - How-to Guide
Use YAML to configure ExpectationConfigurationBuilder classes, which emit lists of ExpectationConfiguration objects (e.g., as kwargs and meta arguments)
Maturity: Experimental
Details:
API Stability: Moderate
Implementation Completeness: Moderate (additional ExpectationConfigurationBuilder classes might be developed)
Unit Test Coverage: High (but not complete – additional unit tests will be added, commensurate with the upcoming new functionality)
Integration Infrastructure/Test Coverage: N/A -> TBD
Documentation Completeness: Moderate
Bug Risk: Low/Moderate
Expectation Completeness: Moderate