great_expectations.profile.user_configurable_profiler

Module Contents

Classes

UserConfigurableProfiler(profile_dataset, excluded_expectations: list = None, ignored_columns: list = None, not_null_only: bool = False, primary_or_compound_key: list = False, semantic_types_dict: dict = None, table_expectations_only: bool = False, value_set_threshold: str = 'MANY')

The UserConfigurableProfiler is used to build an expectation suite from a dataset.

great_expectations.profile.user_configurable_profiler.logger
class great_expectations.profile.user_configurable_profiler.UserConfigurableProfiler(profile_dataset, excluded_expectations: list = None, ignored_columns: list = None, not_null_only: bool = False, primary_or_compound_key: list = False, semantic_types_dict: dict = None, table_expectations_only: bool = False, value_set_threshold: str = 'MANY')

The UserConfigurableProfiler is used to build an expectation suite from a dataset. The expectations built are strict: they can be used to determine whether two tables are the same.

The profiler may be instantiated with or without a number of configuration arguments. The arguments are fixed once a profiler is instantiated; to use different arguments, instantiate a new profiler.

A profiler is used to build a suite without a config as follows:

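    from great_expectations.profile.user_configurable_profiler import UserConfigurableProfiler

    # dataset is a GE dataset
    profiler = UserConfigurableProfiler(dataset)
    suite = profiler.build_suite()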

A profiler is used to build a suite with a semantic_types dict, as follows:

    semantic_types_dict = {
        "numeric": ["c_acctbal"],
        "string": ["c_address", "c_custkey"],
        "value_set": ["c_nationkey", "c_mktsegment", "c_custkey", "c_name", "c_address", "c_phone"],
    }
    profiler = UserConfigurableProfiler(dataset, semantic_types_dict=semantic_types_dict)
    suite = profiler.build_suite()

build_suite(self)

User-facing expectation-suite building function. Works with an instantiated UserConfigurableProfiler object.

Returns

An expectation suite built either with or without a semantic_types dict

_build_expectation_suite_from_semantic_types_dict(self)

Uses a semantic_types dict to determine which expectations to add to the suite, then builds the suite.

Returns

An expectation suite built from a semantic_types dict

_profile_and_build_expectation_suite(self)

Profiles the provided dataset to determine which expectations to add to the suite, then builds the suite.

Returns

An expectation suite built after profiling the dataset

_validate_semantic_types_dict(self, profile_dataset)

Validates a semantic_types dict to ensure correct formatting, that all semantic_types are recognized, and that the semantic_types align with the column data types.

Parameters
  • profile_dataset – A GE dataset

  • config – A config dictionary

Returns

The validated semantic_types dictionary
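
The formatting and recognition checks described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the function name `validate_semantic_types_dict_sketch`, the `RECOGNIZED_SEMANTIC_TYPES` set, and the error messages are assumptions, and the check against column data types is left out because it needs a live dataset.

```python
# Hypothetical set of recognized semantic types; the profiler's real list may differ.
RECOGNIZED_SEMANTIC_TYPES = {"numeric", "string", "value_set", "datetime"}


def validate_semantic_types_dict_sketch(semantic_types_dict):
    """Check the shape of a semantic_types dict and return it unchanged."""
    if not isinstance(semantic_types_dict, dict):
        raise ValueError("semantic_types_dict must be a dict")
    for semantic_type, columns in semantic_types_dict.items():
        # Every key must name a recognized semantic type
        if str(semantic_type).lower() not in RECOGNIZED_SEMANTIC_TYPES:
            raise ValueError(f"unrecognized semantic type: {semantic_type!r}")
        # Every value must be a list of column names
        if not isinstance(columns, list):
            raise ValueError(f"columns for {semantic_type!r} must be given as a list")
    return semantic_types_dict


example = {"numeric": ["c_acctbal"], "string": ["c_address", "c_custkey"]}
validated = validate_semantic_types_dict_sketch(example)
```

Raising on the first problem keeps the sketch short; a fuller version might collect every problem before reporting.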

_add_column_type_to_column_info(self, profile_dataset, column_name)

Adds the data type of a column to the column_info dictionary on self.

Parameters
  • profile_dataset – A GE dataset

  • column_name – The name of the column for which to retrieve the data type

Returns

The type of the column

_get_column_type(self, profile_dataset, column)

Determines the data type of a column by evaluating the success of expect_column_values_to_be_in_type_list. In the case of type Decimal, this data type is returned as NUMERIC, which contains the type lists for both INTs and FLOATs.

The type_list expectation used here is removed, since it will need to be built once the build_suite function is actually called. This is because calling build_suite wipes any existing expectations, so expectations called during the init of the profiler do not persist.

Parameters
  • profile_dataset – A GE dataset

  • column – The column for which to get the data type

Returns

The data type of the specified column
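
The idea of testing a column against candidate type lists until one succeeds can be sketched without a dataset. Everything named here is an assumption made for illustration: `CANDIDATE_TYPE_LISTS`, its labels, and `get_column_type_sketch` are hypothetical, and Python's `isinstance` stands in for the success of expect_column_values_to_be_in_type_list.

```python
from decimal import Decimal

# Hypothetical candidate type lists, checked in order. The NUMERIC list
# covers both the INT and FLOAT types, plus Decimal.
CANDIDATE_TYPE_LISTS = [
    ("INT", (int,)),
    ("FLOAT", (float,)),
    ("NUMERIC", (int, float, Decimal)),
    ("STRING", (str,)),
]


def get_column_type_sketch(values):
    """Return the first label whose type list accepts every non-null value."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "UNKNOWN"
    for label, type_list in CANDIDATE_TYPE_LISTS:
        # bool is a subclass of int in Python, so it is excluded explicitly
        if all(isinstance(v, type_list) and not isinstance(v, bool) for v in non_null):
            return label
    return "UNKNOWN"


decimal_column_type = get_column_type_sketch([Decimal("1.10"), Decimal("2.25")])
```

In this sketch a Decimal column matches neither the INT nor the FLOAT list and so falls through to NUMERIC, mirroring the behaviour described above.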

_add_column_cardinality_to_column_info(self, profile_dataset, column_name)

Adds the cardinality of a column to the column_info dictionary on self.

Parameters
  • profile_dataset – A GE Dataset

  • column_name – The name of the column for which to add cardinality

Returns

The cardinality of the column

_get_column_cardinality(self, profile_dataset, column)

Determines the cardinality of a column using the get_basic_column_cardinality method from OrderedProfilerCardinality.

Parameters
  • profile_dataset – A GE Dataset

  • column – The column for which to get cardinality

Returns

The cardinality of the specified column
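
As a rough picture of what a cardinality bucket means, the sketch below classifies a column by its count and proportion of distinct non-null values. It does not reproduce get_basic_column_cardinality: the function name, the bucket labels other than MANY, and the cutoff of 60 are all assumptions chosen for illustration.

```python
def basic_column_cardinality_sketch(values):
    """Bucket a column by how many distinct non-null values it holds."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "NONE"
    num_unique = len(set(non_null))
    pct_unique = num_unique / len(non_null)
    if pct_unique == 1.0:
        # Every non-null value is distinct
        return "UNIQUE"
    if num_unique == 1:
        return "ONE"
    if num_unique == 2:
        return "TWO"
    # Hypothetical cutoff separating the remaining buckets
    if num_unique < 60:
        return "FEW"
    return "MANY"


segment_cardinality = basic_column_cardinality_sketch(["a", "a", "b", "c", "c"])
```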

_add_semantic_types_by_column_from_config_to_column_info(self, column_name)

Adds the semantic type of a column to the column_info dict on self, for display purposes after suite creation.

Parameters
  • column_name – The name of the column

Returns

A list of semantic_types for a given column

_build_column_description_metadata(self, profile_dataset)

Adds column description metadata to the suite on a Dataset object.

Parameters
  • profile_dataset – A GE Dataset

Returns

An expectation suite with column description metadata

_display_suite_by_column(self, suite)

Displays the expectations of a suite by column, along with each column's cardinality and semantic or data type, so that a user can easily see which expectations were created for which columns.

Parameters
  • suite – An ExpectationSuite

Returns

The ExpectationSuite
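
Grouping expectations by column, as described above, can be sketched with plain dictionaries. This assumes each expectation is represented as a dict with `expectation_type` and `kwargs` keys; the helper name `group_expectations_by_column_sketch` and the `"table"` label for expectations without a column are hypothetical, and the cardinality and type annotations are omitted.

```python
from collections import defaultdict


def group_expectations_by_column_sketch(expectations):
    """Group expectation types by the column named in their kwargs."""
    by_column = defaultdict(list)
    for expectation in expectations:
        # Table-level expectations carry no "column" kwarg
        column = expectation["kwargs"].get("column", "table")
        by_column[column].append(expectation["expectation_type"])
    return dict(by_column)


expectations = [
    {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "c_name"}},
    {"expectation_type": "expect_column_values_to_be_in_type_list", "kwargs": {"column": "c_name"}},
    {"expectation_type": "expect_table_row_count_to_be_between", "kwargs": {}},
]
grouped = group_expectations_by_column_sketch(expectations)
for column, expectation_types in grouped.items():
    print(column, expectation_types)
```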

_build_expectations_value_set(self, profile_dataset, column, **kwargs)

Adds a value_set expectation for a given column.

Parameters
  • profile_dataset – A GE Dataset

  • column – The column for which to add an expectation

  • **kwargs

Returns

The GE Dataset

_build_expectations_numeric(self, profile_dataset, column, **kwargs)

Adds a set of numeric expectations for a given column.

Parameters
  • profile_dataset – A GE Dataset

  • column – The column for which to add expectations

  • **kwargs

Returns

The GE Dataset

_build_expectations_primary_or_compound_key(self, profile_dataset, column_list, **kwargs)

Adds a uniqueness expectation for a given column or set of columns.

Parameters
  • profile_dataset – A GE Dataset

  • column_list – A list containing one or more columns for which to add a uniqueness expectation

  • **kwargs

Returns

The GE Dataset
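
What a uniqueness expectation over one or more columns checks can be shown with a small stand-alone sketch. It is not the expectation the profiler adds: `columns_are_unique_sketch` is a hypothetical helper, and rows are assumed to be plain dicts keyed by column name.

```python
def columns_are_unique_sketch(rows, column_list):
    """Return True when no two rows share the same values across column_list."""
    seen = set()
    for row in rows:
        # A single column yields a 1-tuple; several columns form a compound key
        key = tuple(row[column] for column in column_list)
        if key in seen:
            return False
        seen.add(key)
    return True


rows = [
    {"c_custkey": 1, "c_nationkey": 10},
    {"c_custkey": 2, "c_nationkey": 10},
    {"c_custkey": 2, "c_nationkey": 11},
]
single_key_ok = columns_are_unique_sketch(rows, ["c_custkey"])
compound_key_ok = columns_are_unique_sketch(rows, ["c_custkey", "c_nationkey"])
```

Here `c_custkey` alone repeats, while the pair of columns together identifies each row.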

_build_expectations_string(self, profile_dataset, column, **kwargs)

Adds a set of string expectations for a given column. Currently does not do anything. With the 0.12 API there is no quick way to introspect value lengths; if there were, a potentially useful value_lengths expectation could be built here.

Parameters
  • profile_dataset – A GE Dataset

  • column – The column for which to add expectations

  • **kwargs

Returns

The GE Dataset

_build_expectations_datetime(self, profile_dataset, column, **kwargs)

Adds expect_column_values_to_be_between for a given column.

Parameters
  • profile_dataset – A GE Dataset

  • column – The column for which to add the expectation

  • **kwargs

Returns

The GE Dataset

_build_expectations_for_all_column_types(self, profile_dataset, column, **kwargs)

Adds the following expectations for all included columns, irrespective of type:
  • expect_column_values_to_not_be_null (or expect_column_values_to_be_null)

  • expect_column_proportion_of_unique_values_to_be_between

  • expect_column_values_to_be_in_type_list

Parameters
  • profile_dataset – A GE Dataset

  • column – The column for which to add the expectations

  • **kwargs

Returns

The GE Dataset

_build_expectations_table(self, profile_dataset, **kwargs)

Adds two table-level expectations to the dataset.

Parameters
  • profile_dataset – A GE Dataset

  • **kwargs

Returns

The GE Dataset

_is_nan(self, value)

If value is an array, tests element-wise for NaN and returns the result as a boolean array. If value is a scalar, returns a boolean.

Parameters
  • value – The value to test

Returns

The results of the test
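
The scalar and element-wise behaviour can be illustrated with the standard library alone. This sketch swaps in `math.isnan` and Python lists where the method works on arrays, so it is an approximation of the described behaviour under that assumption; `is_nan_sketch` is a hypothetical name.

```python
import math


def is_nan_sketch(value):
    """Test for NaN: element-wise for lists and tuples, directly for scalars."""
    if isinstance(value, (list, tuple)):
        return [is_nan_sketch(item) for item in value]
    # Non-float values such as strings or None cannot be NaN
    return isinstance(value, float) and math.isnan(value)


scalar_result = is_nan_sketch(float("nan"))
list_result = is_nan_sketch([1.0, float("nan"), "a"])
```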