great_expectations.profile
Submodules
great_expectations.profile.base
great_expectations.profile.basic_dataset_profiler
great_expectations.profile.basic_suite_builder_profiler
great_expectations.profile.columns_exist
great_expectations.profile.json_schema_profiler
great_expectations.profile.metrics_utils
great_expectations.profile.multi_batch_validation_meta_analysis
Package Contents
Classes
BasicDatasetProfiler | BasicDatasetProfiler is inspired by the beloved pandas_profiling project.
BasicSuiteBuilderProfiler | This profiler helps build coarse expectations for columns you care about.
class great_expectations.profile.BasicDatasetProfiler
Bases: great_expectations.profile.basic_dataset_profiler.BasicDatasetProfilerBase
BasicDatasetProfiler is inspired by the beloved pandas_profiling project.
The profiler examines a batch of data and creates a report that answers the basic questions most data practitioners would ask about a dataset during exploratory data analysis. For each column, it reports how unique the values are and what percentage of them are empty. Based on the column's type, it also describes the column with statistics such as min, max, mean, and median for numeric columns, and with the distribution of values when appropriate.
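To make the per-column report more concrete, here is a minimal standard-library sketch of the kind of summary described above. It is not the profiler's implementation: the function name `summarize_column` and the keys of the returned dictionary are hypothetical, and it assumes that empty values are represented as `None`.

```python
import statistics


def summarize_column(values):
    """Hypothetical helper: summarize one column, treating None as empty."""
    total = len(values)
    present = [v for v in values if v is not None]
    summary = {
        # Share of entries that are empty (None), as a percentage.
        "percent_empty": 100.0 * (total - len(present)) / total if total else 0.0,
        # Share of non-empty entries that are distinct, as a percentage.
        "percent_unique": 100.0 * len(set(present)) / len(present) if present else 0.0,
    }
    # Numeric statistics are only computed when every present value is a number.
    is_numeric = present and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        summary.update(
            min=min(present),
            max=max(present),
            mean=statistics.mean(present),
            median=statistics.median(present),
        )
    return summary


ages = [34, 51, None, 34, 29]
print(summarize_column(ages))
```

For a string column the sketch returns only the emptiness and uniqueness figures, which mirrors the idea that the description depends on the column's type.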
classmethod _profile(cls, dataset, configuration=None)
class great_expectations.profile.BasicSuiteBuilderProfiler
Bases: great_expectations.profile.basic_dataset_profiler.BasicDatasetProfilerBase
This profiler helps build coarse expectations for columns you care about.
The goal of this profiler is to expedite the process of authoring an expectation suite by building possibly relevant expectations for columns that you care about. You can then easily edit the suite and adjust or delete these expectations to hone your new suite.
Ranges of acceptable values in the expectations created by this profiler (for example, the min/max of the value in expect_column_values_to_be_between) are created only to demonstrate the functionality and should not be taken as the actual ranges. You should definitely edit this coarse suite.
Configuration is optional, and if not provided, this profiler will create expectations for all columns.
Configuration is a dictionary with a columns key containing a list of the column names you want coarse expectations created for. This dictionary can also contain an excluded_expectations key with a list of expectation names you do not want created, or an included_expectations key with a list of expectation names you want created (if applicable).
For example, if you had a wide patients table and you want expectations on three columns, you’d do this:
    suite, validation_result = BasicSuiteBuilderProfiler().profile(
        dataset, {"columns": ["id", "username", "address"]}
    )
For example, if you had a wide patients table and you want expectations on all columns, excluding three statistical expectations, you’d do this:
    suite, validation_result = BasicSuiteBuilderProfiler().profile(
        dataset,
        {
            "excluded_expectations": [
                "expect_column_mean_to_be_between",
                "expect_column_median_to_be_between",
                "expect_column_quantile_values_to_be_between",
            ],
        },
    )
For example, if you had a wide patients table and you want only two types of expectations on all applicable columns you’d do this:
    suite, validation_result = BasicSuiteBuilderProfiler().profile(
        dataset,
        {
            "included_expectations": [
                "expect_column_values_to_not_be_null",
                "expect_column_values_to_be_in_set",
            ],
        },
    )
It can also be used to generate an expectation suite that contains one instance of every interesting expectation type.
When used in this "demo" mode, the suite is intended to demonstrate the expressive power of expectations and to provide a service similar to the expectations glossary documentation page, but on a user's own data.
    suite, validation_result = BasicSuiteBuilderProfiler().profile(dataset, configuration="demo")
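The three configuration keys described above can be assembled programmatically. The following is a small sketch under stated assumptions: `build_profiler_configuration` is a hypothetical helper, not part of great_expectations, and it only builds the dictionary; it does not check whether the profiler accepts a given combination of keys.

```python
def build_profiler_configuration(
    columns=None, excluded_expectations=None, included_expectations=None
):
    """Hypothetical helper: assemble the optional configuration dictionary."""
    configuration = {}
    if columns is not None:
        configuration["columns"] = list(columns)
    if excluded_expectations is not None:
        configuration["excluded_expectations"] = list(excluded_expectations)
    if included_expectations is not None:
        configuration["included_expectations"] = list(included_expectations)
    # With nothing requested, return None so the caller passes no configuration
    # and the profiler falls back to creating expectations for all columns.
    return configuration or None


config = build_profiler_configuration(columns=["id", "username", "address"])
print(config)
```

The resulting dictionary can be passed as the second argument to `profile`, in the same position as the literal dictionaries in the examples above.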
classmethod _get_column_type_with_caching(cls, dataset, column_name, cache)
classmethod _get_column_cardinality_with_caching(cls, dataset, column_name, cache)
classmethod _create_expectations_for_low_card_column(cls, dataset, column, column_cache)
classmethod _create_non_nullity_expectations(cls, dataset, column)
classmethod _create_expectations_for_numeric_column(cls, dataset, column)
classmethod _create_expectations_for_string_column(cls, dataset, column)
classmethod _find_next_low_card_column(cls, dataset, columns, profiled_columns, column_cache)
classmethod _find_next_numeric_column(cls, dataset, columns, profiled_columns, column_cache)
classmethod _find_next_string_column(cls, dataset, columns, profiled_columns, column_cache)
classmethod _find_next_datetime_column(cls, dataset, columns, profiled_columns, column_cache)
classmethod _create_expectations_for_datetime_column(cls, dataset, column)
classmethod _profile(cls, dataset, configuration=None)
classmethod _demo_profile(cls, dataset)
classmethod _build_table_row_count_expectation(cls, dataset, tolerance=0.1)
classmethod _build_table_column_expectations(cls, dataset)
classmethod _build_column_description_metadata(cls, dataset)
class great_expectations.profile.ColumnsExistProfiler
Bases: great_expectations.profile.base.DatasetProfiler
classmethod _profile(cls, dataset, configuration=None)
Takes a dataset and adds, for each column present in it, an expectation that the column exists.
Parameters
dataset (great_expectations.dataset) – The dataset to profile and to which to add expectations.
configuration – Configuration for select profilers.
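As an illustration of what "one existence expectation per column" amounts to, the sketch below builds one entry per column name using the `expect_column_to_exist` expectation type. The helper `column_existence_expectations` is hypothetical, and the dictionary layout (`expectation_type` plus `kwargs`) is assumed here for illustration; the profiler itself works on a dataset object rather than on a plain list of names.

```python
def column_existence_expectations(column_names):
    """Hypothetical helper: one 'column exists' entry per column name."""
    return [
        {"expectation_type": "expect_column_to_exist", "kwargs": {"column": name}}
        for name in column_names
    ]


for expectation in column_existence_expectations(["id", "username", "address"]):
    print(expectation)
```

A suite built this way acts as a schema check: validation fails if a later batch is missing any column that was present when the profiler ran.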