great_expectations.profile

Package Contents

Classes

BasicDatasetProfiler()

BasicDatasetProfiler is inspired by the beloved pandas_profiling project.

BasicSuiteBuilderProfiler()

This profiler helps build coarse expectations for columns you care about.

ColumnsExistProfiler()

class great_expectations.profile.BasicDatasetProfiler

Bases: great_expectations.profile.basic_dataset_profiler.BasicDatasetProfilerBase

BasicDatasetProfiler is inspired by the beloved pandas_profiling project.

The profiler examines a batch of data and creates a report that answers the basic questions most data practitioners would ask about a dataset during exploratory data analysis. For each column, the profiler reports how unique its values are and what percentage of them are empty. Based on the column's type, it then describes the column by computing statistics such as min, max, mean and median for numeric columns, and the distribution of values when appropriate.
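To make the description above concrete, the following is a minimal standard-library sketch of the kind of per-column summary it mentions. It is not the profiler's implementation: the function name `describe_column`, the use of `None` for empty values, and the sample data are all assumptions made for illustration.

```python
import statistics


def describe_column(values):
    """Summarize one column given as a list; None marks an empty value.

    Hypothetical helper for illustration only, not part of great_expectations.
    """
    total = len(values)
    present = [v for v in values if v is not None]
    summary = {
        # Share of entries that are empty, as a percentage.
        "percent_empty": 100.0 * (total - len(present)) / total if total else 0.0,
        # Distinct non-empty values divided by all non-empty values.
        "proportion_unique": len(set(present)) / len(present) if present else 0.0,
    }
    # Descriptive statistics only make sense for numeric columns.
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        summary.update(
            min=min(present),
            max=max(present),
            mean=statistics.mean(present),
            median=statistics.median(present),
        )
    return summary


# Invented sample data: one numeric column and one string column.
age_summary = describe_column([34, 51, None, 34, 29])
name_summary = describe_column(["ann", "bob", None])
```

In this sketch the numeric column gains `min`, `max`, `mean` and `median` entries, while the string column only carries the uniqueness and emptiness figures.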

classmethod _profile(cls, dataset, configuration=None)

class great_expectations.profile.BasicSuiteBuilderProfiler

Bases: great_expectations.profile.basic_dataset_profiler.BasicDatasetProfilerBase

This profiler helps build coarse expectations for columns you care about.

The goal of this profiler is to expedite the process of authoring an expectation suite by building possibly relevant expectations for columns that you care about. You can then edit the suite, adjusting or deleting these expectations to hone your new suite.

Ranges of acceptable values in the expectations created by this profiler (for example, the min/max of the value in expect_column_values_to_be_between) are created only to demonstrate the functionality and should not be taken as the actual ranges. You should definitely edit this coarse suite.
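As an illustration of that editing step, the sketch below replaces the profiled bounds of one expectation with bounds chosen by a reviewer. It works on a plain dictionary with `expectation_type` and `kwargs` keys, which is assumed here to mirror how an expectation appears in a serialized suite. The helper name `set_between_range`, the `age` column, and all numbers are hypothetical.

```python
import copy


def set_between_range(expectation, min_value, max_value):
    """Return a copy of a between-style expectation with reviewed bounds.

    Hypothetical helper for illustration; it edits a plain dict and does
    not call great_expectations.
    """
    updated = copy.deepcopy(expectation)  # leave the profiled original untouched
    updated["kwargs"]["min_value"] = min_value
    updated["kwargs"]["max_value"] = max_value
    return updated


# Invented example: bounds the profiler might derive from one small batch.
profiled = {
    "expectation_type": "expect_column_values_to_be_between",
    "kwargs": {"column": "age", "min_value": 29, "max_value": 51},
}

# Bounds chosen from domain knowledge rather than from the observed batch.
reviewed = set_between_range(profiled, 0, 120)
```

The point of the sketch is only that the observed min/max of one batch is a starting value, and the acceptable range should come from what you know about the data.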

Configuration is optional, and if not provided, this profiler will create expectations for all columns.

Configuration is a dictionary with a columns key containing a list of the column names you want coarse expectations created for. This dictionary can also contain an excluded_expectations key with a list of expectation names you do not want created, or an included_expectations key with a list of expectation names you want created (if applicable).
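The shape of that dictionary can be sketched with a small builder. The helper `build_profiler_configuration` is hypothetical and not part of great_expectations; it only assembles the three keys named above and rejects values that are not lists of names. Returning `None` when nothing is given reflects the statement that configuration is optional.

```python
def build_profiler_configuration(
    columns=None, excluded_expectations=None, included_expectations=None
):
    """Assemble a configuration dict from the three documented keys.

    Hypothetical helper for illustration. Returns None when no option is
    supplied, since the configuration is optional.
    """
    candidates = {
        "columns": columns,
        "excluded_expectations": excluded_expectations,
        "included_expectations": included_expectations,
    }
    configuration = {}
    for key, names in candidates.items():
        if names is None:
            continue  # omit keys the caller did not set
        # A bare string would otherwise be split into characters by list().
        if isinstance(names, str) or not all(isinstance(n, str) for n in names):
            raise TypeError(f"{key} must be a list of names")
        configuration[key] = list(names)
    return configuration or None


three_columns = build_profiler_configuration(columns=["id", "username", "address"])
no_configuration = build_profiler_configuration()
```

The resulting dictionary is what would be passed as the second argument to `profile`.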

For example, if you have a wide patients table and want expectations on only three columns, you'd do this:

    suite, validation_result = BasicSuiteBuilderProfiler().profile(
        dataset, {"columns": ["id", "username", "address"]}
    )

If you have a wide patients table and want expectations on all columns, excluding three statistical expectations, you'd do this:

    suite, validation_result = BasicSuiteBuilderProfiler().profile(
        dataset,
        {
            "excluded_expectations": [
                "expect_column_mean_to_be_between",
                "expect_column_median_to_be_between",
                "expect_column_quantile_values_to_be_between",
            ],
        },
    )

If you have a wide patients table and want only two types of expectations on all applicable columns, you'd do this:

    suite, validation_result = BasicSuiteBuilderProfiler().profile(
        dataset,
        {
            "included_expectations": [
                "expect_column_values_to_not_be_null",
                "expect_column_values_to_be_in_set",
            ],
        },
    )

The profiler can also be used to generate an expectation suite that contains one instance of every interesting expectation type.

When used in this "demo" mode, the suite is intended to demonstrate the expressive power of expectations and to provide a service similar to the expectations glossary documentation page, but on a user's own data.

    suite, validation_result = BasicSuiteBuilderProfiler().profile(dataset, configuration="demo")

classmethod _get_column_type_with_caching(cls, dataset, column_name, cache)
classmethod _get_column_cardinality_with_caching(cls, dataset, column_name, cache)
classmethod _create_expectations_for_low_card_column(cls, dataset, column, column_cache, excluded_expectations=None, included_expectations=None)
classmethod _create_non_nullity_expectations(cls, dataset, column, excluded_expectations=None, included_expectations=None)
classmethod _create_expectations_for_numeric_column(cls, dataset, column, excluded_expectations=None, included_expectations=None)
classmethod _create_expectations_for_string_column(cls, dataset, column, excluded_expectations=None, included_expectations=None)
classmethod _find_next_low_card_column(cls, dataset, columns, profiled_columns, column_cache)
classmethod _find_next_numeric_column(cls, dataset, columns, profiled_columns, column_cache)
classmethod _find_next_string_column(cls, dataset, columns, profiled_columns, column_cache)
classmethod _find_next_datetime_column(cls, dataset, columns, profiled_columns, column_cache)
classmethod _create_expectations_for_datetime_column(cls, dataset, column, excluded_expectations=None, included_expectations=None)
classmethod _profile(cls, dataset, configuration=None)
classmethod _demo_profile(cls, dataset)
classmethod _build_table_row_count_expectation(cls, dataset, tolerance=0.1, excluded_expectations=None, included_expectations=None)
classmethod _build_table_column_expectations(cls, dataset, excluded_expectations=None, included_expectations=None)
classmethod _build_column_description_metadata(cls, dataset)

class great_expectations.profile.ColumnsExistProfiler

Bases: great_expectations.profile.base.DatasetProfiler

classmethod _profile(cls, dataset, configuration=None)

This function takes a dataset and adds, for each column currently present in it, an expectation that the column exists.

Parameters
  • dataset (great_expectations.dataset) – The dataset to profile and to which to add expectations.

  • configuration – Configuration for select profilers.
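The result of this profiler can be pictured as one existence expectation per column. The sketch below builds such a list as plain dictionaries, assuming the `expectation_type`/`kwargs` layout of a serialized suite. The helper `column_existence_expectations` and the column names are hypothetical, and the code does not call great_expectations.

```python
def column_existence_expectations(column_names):
    """Build one expect_column_to_exist entry per column name.

    Hypothetical helper for illustration; it returns plain dicts rather
    than adding expectations to a real dataset.
    """
    return [
        {"expectation_type": "expect_column_to_exist", "kwargs": {"column": name}}
        for name in column_names
    ]


# Invented column names for a patients table.
expectations = column_existence_expectations(["id", "username", "address"])
```

A suite built this way acts as a schema check: validation against a later batch would flag any of these columns that has gone missing.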