great_expectations.profile.user_configurable_profiler
Module Contents
Classes
UserConfigurableProfiler – Used to build an expectation suite from a dataset. The expectations built are strict.
-
great_expectations.profile.user_configurable_profiler.logger
-
class great_expectations.profile.user_configurable_profiler.UserConfigurableProfiler(profile_dataset: Union[Batch, Dataset, Validator], excluded_expectations: Optional[List[str]] = None, ignored_columns: Optional[List[str]] = None, not_null_only: bool = False, primary_or_compound_key: Optional[List[str]] = None, semantic_types_dict: Optional[Dict[str, List[str]]] = None, table_expectations_only: bool = False, value_set_threshold: str = 'MANY')
The UserConfigurableProfiler is used to build an expectation suite from a dataset. The expectations built are strict: they can be used to determine whether two tables are the same.
The profiler can be instantiated with or without configuration arguments. The configuration is fixed at instantiation: to change any argument, create a new profiler.
A profiler is used to build a suite without a config as follows:
    profiler = UserConfigurableProfiler(dataset)
    suite = profiler.build_suite()
A profiler is used to build a suite with a semantic_types dict, as follows:
    semantic_types_dict = {
        "numeric": ["c_acctbal"],
        "string": ["c_address", "c_custkey"],
        "value_set": ["c_nationkey", "c_mktsegment", "c_custkey", "c_name", "c_address", "c_phone"],
    }
    profiler = UserConfigurableProfiler(dataset, semantic_types_dict=semantic_types_dict)
    suite = profiler.build_suite()
-
build_suite(self)
User-facing function for building an expectation suite. Works with an instantiated UserConfigurableProfiler object.
- Returns
An expectation suite built either with or without a semantic_types dict
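The choice between the two build paths can be pictured as a simple dispatch. The sketch below is illustrative only and does not import Great Expectations: `SketchProfiler`, its helper methods, and the returned strings are hypothetical stand-ins for the private methods documented on this page.

```python
from typing import Dict, List, Optional


class SketchProfiler:
    """Hypothetical stand-in that mirrors only the build_suite dispatch."""

    def __init__(self, semantic_types_dict: Optional[Dict[str, List[str]]] = None):
        # Configuration is fixed at instantiation, as described for the profiler.
        self.semantic_types_dict = semantic_types_dict

    def build_suite(self) -> str:
        # With a semantic_types dict, expectations follow the user's mapping;
        # without one, the dataset itself is profiled.
        if self.semantic_types_dict:
            return self._build_from_semantic_types_dict()
        return self._profile_and_build()

    def _build_from_semantic_types_dict(self) -> str:
        return "suite from semantic_types dict"

    def _profile_and_build(self) -> str:
        return "suite from profiling"


without_config = SketchProfiler().build_suite()
with_config = SketchProfiler({"numeric": ["c_acctbal"]}).build_suite()
```

Because the configuration is read only from the instance, a different configuration means constructing a second `SketchProfiler`, which matches the note above about needing a new profiler when arguments change.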
-
_send_usage_stats_message(self)
-
_build_expectation_suite_from_semantic_types_dict(self)
Uses a semantic_types dict to determine which expectations to add to the suite, then builds the suite.
- Returns
An expectation suite built from a semantic_types dict
-
_profile_and_build_expectation_suite(self)
Profiles the provided dataset to determine which expectations to add to the suite, then builds the suite.
- Returns
An expectation suite built after profiling the dataset
-
_validate_semantic_types_dict(self)
Validates a semantic_types dict to ensure that it is correctly formatted, that all semantic_types are recognized, and that the semantic_types align with the column data types.
- Returns
The validated semantic_types dictionary
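The three checks named above (formatting, recognized semantic types, agreement with column data types) might look roughly like the following. This is a sketch under stated assumptions, not the library's implementation: the set of recognized semantic types is limited to those mentioned on this page, and the compatibility table and the `column_types` argument are hypothetical.

```python
from typing import Dict, List

# Assumption: only the semantic types mentioned on this page are recognized here.
RECOGNIZED_SEMANTIC_TYPES = {"numeric", "string", "datetime", "value_set"}

# Assumption: which detected column data types each semantic type accepts.
# Semantic types missing from this table (value_set) accept any data type.
COMPATIBLE_DATA_TYPES = {
    "numeric": {"INT", "FLOAT", "NUMERIC"},
    "string": {"STRING"},
    "datetime": {"DATETIME"},
}


def validate_semantic_types_dict(
    semantic_types_dict: Dict[str, List[str]],
    column_types: Dict[str, str],
) -> Dict[str, List[str]]:
    """Return the dict unchanged if all three checks pass, else raise ValueError."""
    if not isinstance(semantic_types_dict, dict):
        raise ValueError("semantic_types_dict must be a dict")
    for semantic_type, columns in semantic_types_dict.items():
        # Check 1: formatting (each value is a list of column names).
        if not isinstance(columns, list):
            raise ValueError(f"Columns for '{semantic_type}' must be given as a list")
        # Check 2: the semantic type is recognized (case-insensitive).
        key = semantic_type.lower()
        if key not in RECOGNIZED_SEMANTIC_TYPES:
            raise ValueError(f"Unrecognized semantic type: '{semantic_type}'")
        # Check 3: the semantic type agrees with each column's data type.
        allowed = COMPATIBLE_DATA_TYPES.get(key)
        for column in columns:
            if allowed is not None and column_types[column] not in allowed:
                raise ValueError(
                    f"Column '{column}' has type {column_types[column]}, "
                    f"which does not match semantic type '{semantic_type}'"
                )
    return semantic_types_dict


column_types = {"c_acctbal": "FLOAT", "c_address": "STRING", "c_nationkey": "INT"}
validated = validate_semantic_types_dict(
    {"numeric": ["c_acctbal"], "value_set": ["c_nationkey"]}, column_types
)
```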
-
_add_column_type_to_column_info(self, profile_dataset, column_name)
Adds the data type of a column to the column_info dictionary on self.
- Parameters
profile_dataset – A GE dataset
column_name – The name of the column for which to retrieve the data type
- Returns
The type of the column
-
static _get_column_type(profile_dataset, column)
Determines the data type of a column by evaluating the success of expect_column_values_to_be_in_type_list. In the case of type Decimal, the data type is returned as NUMERIC, which contains the type lists for both INTs and FLOATs.
The type_list expectation used for this check is removed afterwards. Calling build_suite wipes any existing expectations, so expectations created during the init of the profiler do not persist; the type_list expectation is built again when build_suite is actually called.
- Parameters
profile_dataset – A GE dataset
column – The column for which to get the data type
- Returns
The data type of the specified column
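The idea of trying candidate type lists until one succeeds can be sketched in plain Python. This is not the library's code: the labels other than NUMERIC, the contents of the type lists, and the `rows` input format (a list of dicts) are assumptions made for illustration.

```python
from decimal import Decimal
from typing import Any, Dict, List, Tuple

# Assumption: simplified type lists, tried in order; the first list that
# contains the type of every non-null value determines the label.
CANDIDATE_TYPE_LISTS: List[Tuple[str, Tuple[type, ...]]] = [
    ("INT", (int,)),
    ("FLOAT", (float,)),
    ("STRING", (str,)),
]


def get_column_type(rows: List[Dict[str, Any]], column: str) -> str:
    values = [row[column] for row in rows if row[column] is not None]
    if not values:
        return "UNKNOWN"  # hypothetical label for columns with no usable values
    # Decimal columns are reported as NUMERIC, per the description above.
    if all(isinstance(value, Decimal) for value in values):
        return "NUMERIC"
    for label, type_list in CANDIDATE_TYPE_LISTS:
        # Exact type match, so that bool values are not counted as int.
        if all(type(value) in type_list for value in values):
            return label
    return "UNKNOWN"


rows = [
    {"c_custkey": 1, "c_acctbal": Decimal("711.56"), "c_name": "Customer#1"},
    {"c_custkey": 2, "c_acctbal": Decimal("121.65"), "c_name": None},
]
detected = {column: get_column_type(rows, column) for column in rows[0]}
```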
-
_add_column_cardinality_to_column_info(self, profile_dataset, column_name)
Adds the cardinality of a column to the column_info dictionary on self.
- Parameters
profile_dataset – A GE Dataset
column_name – The name of the column for which to add cardinality
- Returns
The cardinality of the column
-
static _get_column_cardinality(profile_dataset, column)
Determines the cardinality of a column using the get_basic_column_cardinality method from OrderedProfilerCardinality.
- Parameters
profile_dataset – A GE Dataset
column – The column for which to get cardinality
- Returns
The cardinality of the specified column
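A cardinality bucket is typically derived from the number of distinct values and the fraction of values that are distinct. The sketch below shows one way to do that in plain Python. Only the name MANY appears on this page (as the default value_set_threshold); the other bucket names and every threshold here are illustrative assumptions, not the values used by OrderedProfilerCardinality.

```python
from typing import Any, List


def get_column_cardinality(values: List[Any]) -> str:
    """Classify a column into a coarse cardinality bucket (thresholds assumed)."""
    non_null = [value for value in values if value is not None]
    if not non_null:
        return "NONE"
    num_unique = len(set(non_null))
    pct_unique = num_unique / len(non_null)
    if pct_unique == 1.0:
        return "UNIQUE"  # every non-null value is distinct
    if num_unique == 1:
        return "ONE"
    if num_unique == 2:
        return "TWO"
    if num_unique < 60:  # assumed threshold
        return "FEW"
    if pct_unique > 0.1:  # assumed threshold
        return "VERY_MANY"
    return "MANY"


market_segments = ["BUILDING", "MACHINERY", "BUILDING", "MACHINERY"]
segment_cardinality = get_column_cardinality(market_segments)
```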
-
_add_semantic_types_by_column_from_config_to_column_info(self, column_name)
Adds the semantic type of a column to the column_info dict on self, for display purposes after suite creation.
- Parameters
column_name – The name of the column
- Returns
A list of semantic_types for a given column
-
_build_column_description_metadata(self, profile_dataset)
Adds column description metadata to the suite on a Dataset object.
- Parameters
profile_dataset – A GE Dataset
- Returns
An expectation suite with column description metadata
-
_display_suite_by_column(self, suite)
Displays the expectations of a suite by column, along with each column's cardinality and its semantic or data type, so that a user can easily see which expectations were created for which columns.
- Parameters
suite – An ExpectationSuite
- Returns
The ExpectationSuite
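Grouping expectations by column for display can be sketched as follows. The example works on plain dicts shaped like serialized expectation configurations (`expectation_type` plus `kwargs`) rather than on a real ExpectationSuite. The function names, the `(table)` bucket, and the output layout are hypothetical, and unlike the method above the sketch returns the printed lines instead of the suite.

```python
from collections import defaultdict
from typing import Dict, List


def group_expectations_by_column(expectations: List[dict]) -> Dict[str, List[str]]:
    grouped: Dict[str, List[str]] = defaultdict(list)
    for expectation in expectations:
        # Table-level expectations carry no "column" kwarg.
        column = expectation["kwargs"].get("column", "(table)")
        grouped[column].append(expectation["expectation_type"])
    return dict(grouped)


def display_by_column(expectations: List[dict], column_info: Dict[str, dict]) -> List[str]:
    lines: List[str] = []
    for column, expectation_types in group_expectations_by_column(expectations).items():
        info = column_info.get(column, {})
        lines.append(
            f"{column} | cardinality: {info.get('cardinality', 'n/a')} "
            f"| type: {info.get('type', 'n/a')}"
        )
        lines.extend(f"    {name}" for name in expectation_types)
    print("\n".join(lines))
    return lines


expectations = [
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "c_name"}},
    {"expectation_type": "expect_column_values_to_be_in_type_list",
     "kwargs": {"column": "c_name", "type_list": ["str"]}},
    {"expectation_type": "expect_column_values_to_be_between",
     "kwargs": {"column": "c_acctbal", "min_value": 0, "max_value": 10}},
]
column_info = {
    "c_name": {"cardinality": "UNIQUE", "type": "STRING"},
    "c_acctbal": {"cardinality": "MANY", "type": "FLOAT"},
}
report = display_by_column(expectations, column_info)
```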
-
_build_expectations_value_set(self, profile_dataset, column)
Adds a value_set expectation for a given column.
- Parameters
profile_dataset – A GE Dataset
column – The column for which to add an expectation
- Returns
The GE Dataset
-
_build_expectations_numeric(self, profile_dataset, column)
Adds a set of numeric expectations for a given column.
- Parameters
profile_dataset – A GE Dataset
column – The column for which to add expectations
- Returns
The GE Dataset
-
_build_expectations_primary_or_compound_key(self, profile_dataset, column_list)
Adds a uniqueness expectation for a given column or set of columns.
- Parameters
profile_dataset – A GE Dataset
column_list – A list containing one or more columns for which to add a uniqueness expectation
- Returns
The GE Dataset
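What a uniqueness expectation over one or more columns asserts can be illustrated without the library. The sketch below is an assumption-labelled illustration, not the profiler's code: `columns_form_unique_key` and the row format (a list of dicts) are hypothetical.

```python
from typing import Any, Dict, List


def columns_form_unique_key(rows: List[Dict[str, Any]], column_list: List[str]) -> bool:
    """Return True if no two rows share the same values across column_list."""
    seen = set()
    for row in rows:
        # One column gives a 1-tuple (primary key); several give a compound key.
        key = tuple(row[column] for column in column_list)
        if key in seen:
            return False
        seen.add(key)
    return True


rows = [
    {"c_custkey": 1, "c_nationkey": 15},
    {"c_custkey": 2, "c_nationkey": 15},
    {"c_custkey": 2, "c_nationkey": 7},
]
single_is_unique = columns_form_unique_key(rows, ["c_custkey"])
compound_is_unique = columns_form_unique_key(rows, ["c_custkey", "c_nationkey"])
```

In this data `c_custkey` alone repeats, while the pair of columns identifies every row, which is the situation a compound key is meant to capture.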
-
_build_expectations_string(self, profile_dataset, column)
Adds a set of string expectations for a given column. Currently does nothing: the 0.12 API offers no quick way to introspect value_lengths. If it did, a potentially useful value_lengths expectation could be built here.
- Parameters
profile_dataset – A GE Dataset
column – The column for which to add the expectation
- Returns
The GE Dataset
-
_build_expectations_datetime(self, profile_dataset, column)
Adds expect_column_values_to_be_between for a given column.
- Parameters
profile_dataset – A GE Dataset
column – The column for which to add the expectation
- Returns
The GE Dataset
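A between expectation for a datetime column needs a lower and an upper bound. The sketch below builds a plain dict shaped like a serialized expectation configuration. It assumes the bounds are simply the observed minimum and maximum; the library may pad or otherwise adjust them. The function name and the `o_orderdate` column are hypothetical.

```python
from datetime import datetime
from typing import List, Optional


def build_datetime_between_expectation(
    column: str, values: List[Optional[datetime]]
) -> dict:
    non_null = [value for value in values if value is not None]
    if not non_null:
        raise ValueError(f"Column '{column}' has no non-null values to profile")
    return {
        "expectation_type": "expect_column_values_to_be_between",
        "kwargs": {
            "column": column,
            # Assumption: bounds are the observed extremes, serialized as ISO 8601.
            "min_value": min(non_null).isoformat(),
            "max_value": max(non_null).isoformat(),
        },
    }


order_dates = [datetime(2021, 3, 1), None, datetime(2020, 1, 15), datetime(2021, 12, 31)]
expectation = build_datetime_between_expectation("o_orderdate", order_dates)
```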
-
_build_expectations_for_all_column_types(self, profile_dataset, column)
Adds these expectations for all included columns irrespective of type. Includes:
expect_column_values_to_not_be_null (or expect_column_values_to_be_null)
expect_column_proportion_of_unique_values_to_be_between
expect_column_values_to_be_in_type_list
- Parameters
profile_dataset – A GE Dataset
column – The column for which to add the expectations
- Returns
The GE Dataset
-
_build_expectations_table(self, profile_dataset)
Adds two table-level expectations to the dataset.
- Parameters
profile_dataset – A GE Dataset
- Returns
The GE Dataset