great_expectations.dataset.pandas_dataset
Module Contents
Classes
MetaPandasDataset is a thin layer between Dataset and PandasDataset.
PandasDataset instantiates the great_expectations Expectations API as a subclass of a pandas.DataFrame.
-
great_expectations.dataset.pandas_dataset.
logger
-
class
great_expectations.dataset.pandas_dataset.
MetaPandasDataset
(*args, **kwargs)¶ Bases:
great_expectations.dataset.dataset.Dataset
MetaPandasDataset is a thin layer between Dataset and PandasDataset.
This two-layer inheritance is required to make @classmethod decorators work.
Practically speaking, that means that MetaPandasDataset implements expectation decorators, like column_map_expectation and column_aggregate_expectation, and PandasDataset implements the expectation methods themselves.
-
classmethod
column_map_expectation
(cls, func)¶ Constructs an expectation using column-map semantics.
The MetaPandasDataset implementation replaces the “column” parameter supplied by the user with a pandas Series object containing the actual column from the relevant pandas dataframe. This simplifies the implementing expectation logic while preserving the standard Dataset signature and expected behavior.
See column_map_expectation for full documentation of this function.
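The split between a decorator layer and an implementation layer can be sketched in plain Python. The names below (MetaSketch, ToyDataset, expect_values_to_be_positive) are hypothetical, the data is a dict of lists rather than a DataFrame, and the result dictionary is heavily simplified. This illustrates the pattern under those assumptions; it is not the library's implementation.

```python
from functools import wraps


class MetaSketch:
    """Hypothetical stand-in for the decorator layer."""

    @classmethod
    def column_map_expectation(cls, func):
        # Wrap func so callers pass a column *name* while func receives the values.
        @wraps(func)
        def wrapper(self, column, mostly=None):
            values = self.data[column]  # swap the name for the actual column
            flags = func(self, values)  # one boolean per row
            unexpected = [v for v, ok in zip(values, flags) if not ok]
            fraction_ok = 1.0 if not values else 1 - len(unexpected) / len(values)
            threshold = 1.0 if mostly is None else mostly
            return {"success": fraction_ok >= threshold, "unexpected_list": unexpected}

        return wrapper


class ToyDataset(MetaSketch):
    """Hypothetical stand-in for the layer that implements expectations."""

    def __init__(self, data):
        self.data = data  # dict mapping column name -> list of values

    # The decorator is usable in this class body because it lives on a parent class.
    @MetaSketch.column_map_expectation
    def expect_values_to_be_positive(self, column):
        return [value > 0 for value in column]


toy = ToyDataset({"x": [1, 2, -3]})
result = toy.expect_values_to_be_positive("x")
print(result)
```

The expectation body only has to return one boolean per row; the wrapper handles the column lookup and the bookkeeping.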
-
classmethod
column_pair_map_expectation
(cls, func)¶ The column_pair_map_expectation decorator handles boilerplate issues surrounding the common pattern of evaluating truthiness of some condition on a per row basis across a pair of columns.
-
classmethod
multicolumn_map_expectation
(cls, func)¶ The multicolumn_map_expectation decorator handles boilerplate issues surrounding the common pattern of evaluating truthiness of some condition on a per row basis across a set of columns.
-
class
great_expectations.dataset.pandas_dataset.
PandasDataset
(*args, **kwargs)¶ Bases:
great_expectations.dataset.pandas_dataset.MetaPandasDataset, pandas.DataFrame
PandasDataset instantiates the great_expectations Expectations API as a subclass of a pandas.DataFrame.
For the full API reference, please see Dataset.
Notes
Samples and subsets of a PandasDataset have ALL the expectations of the original data frame unless the user specifies the discard_subset_failing_expectations = True property on the original data frame.
Concatenations, joins, and merges of PandasDatasets contain NO expectations (since no autoinspection is performed by default).
Validation Engine - Pandas - How-to Guide
Use Pandas DataFrame to validate data
Maturity: Production
Details:
API Stability: Stable
Implementation Completeness: Complete
Unit Test Coverage: Complete
Integration Infrastructure/Test Coverage: N/A -> see relevant Datasource evaluation
Documentation Completeness: Complete
Bug Risk: Low
Expectation Completeness: Complete
-
_internal_names
-
_internal_names_set
-
property
_constructor
(self)¶ Used when a manipulation result has the same dimensions as the original.
-
__finalize__
(self, other, method=None, **kwargs)¶ Propagate metadata from other to self.
- Parameters
other – the object from which to get the attributes that we are going to propagate
method (optional) – a passed method name; possibly to take different types of propagation actions based on this
-
get_row_count
(self)¶ Returns: int, table row count
-
get_column_count
(self)¶ Returns: int, table column count
-
get_table_columns
(self)¶ Returns: List[str], list of column names
-
get_column_sum
(self, column)¶ Returns: float
-
get_column_max
(self, column, parse_strings_as_datetimes=False)¶ Returns: any
-
get_column_min
(self, column, parse_strings_as_datetimes=False)¶ Returns: any
-
get_column_mean
(self, column)¶ Returns: float
-
get_column_nonnull_count
(self, column)¶ Returns: int
-
get_column_value_counts
(self, column, sort='value', collate=None)¶ Get a series containing the frequency counts of unique values from the named column.
- Parameters
column – the column for which to obtain value_counts
sort (string) – must be one of “value”, “count”, or “none”. If “value”, values in the resulting partition object are sorted lexicographically; if “count”, values are sorted by descending count (frequency); if “none”, values are not sorted.
collate (string) – the collate (sort) method to be used on supported backends (SqlAlchemy only)
- Returns
pd.Series of value counts for a column, sorted according to the value requested in sort
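The three sort modes can be illustrated with a small standard-library sketch. The function name value_counts and the list-of-pairs return type are hypothetical (the method itself returns a pd.Series), and skipping None values is an assumption of this sketch.

```python
from collections import Counter


def value_counts(values, sort="value"):
    """Return (value, count) pairs ordered according to `sort`."""
    if sort not in ("value", "count", "none"):
        raise ValueError("sort must be one of 'value', 'count', or 'none'")
    # Assumption of this sketch: None entries are not counted.
    counts = Counter(v for v in values if v is not None)
    pairs = list(counts.items())
    if sort == "value":
        pairs.sort(key=lambda pair: pair[0])  # order by the value itself
    elif sort == "count":
        pairs.sort(key=lambda pair: pair[1], reverse=True)  # most frequent first
    return pairs


sample = ["b", "a", "b", "c", "b", "a"]
by_value = value_counts(sample, sort="value")
by_count = value_counts(sample, sort="count")
print(by_value)
print(by_count)
```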
-
get_column_unique_count
(self, column)¶ Returns: int
-
get_column_modes
(self, column)¶ Returns: List[any], list of modes (ties OK)
-
get_column_median
(self, column)¶ Returns: any
-
get_column_quantiles
(self, column, quantiles, allow_relative_error=False)¶ Get the values in column closest to the requested quantiles.
- Parameters
column (string) – name of column
quantiles (tuple of float) – the quantiles to return. quantiles must be a tuple to ensure caching is possible
- Returns
the nearest values in the dataset to those quantiles
- Return type
List[any]
-
get_column_stdev
(self, column)¶ Returns: float
-
get_column_hist
(self, column, bins)¶ Get a histogram of column values.
- Parameters
column – the column for which to generate the histogram
bins (tuple) – the bins to slice the histogram. bins must be a tuple to ensure caching is possible
- Returns
List[int], a list of counts corresponding to bins
-
get_column_count_in_range
(self, column, min_val=None, max_val=None, strict_min=False, strict_max=True)¶ Returns: int
-
expect_column_values_to_be_unique
(self, column, mostly=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect each column value to be unique.
This expectation detects duplicates. All duplicated values are counted as exceptions.
For example, [1, 2, 3, 3, 3] will return [3, 3, 3] in result.exceptions_list, with unexpected_percent = 60.0.
expect_column_values_to_be_unique is a column_map_expectation.
- Parameters
column (str) – The column name.
- Keyword Arguments
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format and include_config, catch_exceptions, and meta.
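The duplicate-counting example given for expect_column_values_to_be_unique ([1, 2, 3, 3, 3]) can be worked through in plain Python. The helper check_unique and its simplified result dictionary are hypothetical, and the sketch assumes a column without nulls; it shows the arithmetic, not the library's code path.

```python
from collections import Counter


def check_unique(values, mostly=None):
    """Flag every occurrence of a duplicated value as unexpected."""
    counts = Counter(values)
    unexpected = [v for v in values if counts[v] > 1]
    total = len(values)
    fraction_ok = 1.0 if total == 0 else (total - len(unexpected)) / total
    threshold = 1.0 if mostly is None else mostly
    return {
        "success": fraction_ok >= threshold,
        "unexpected_list": unexpected,
        "unexpected_percent": 0.0 if total == 0 else 100.0 * len(unexpected) / total,
    }


strict = check_unique([1, 2, 3, 3, 3])
lenient = check_unique([1, 2, 3, 3, 3], mostly=0.3)
print(strict)
print(lenient["success"])
```

With mostly left as None every value must be unique; passing mostly=0.3 lets the same column succeed because 40% of its values are not duplicated.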
-
expect_column_values_to_not_be_null
(self, column, mostly=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None, include_nulls=True)¶ Expect column values to not be null.
To be counted as an exception, values must be explicitly null or missing, such as a NULL in PostgreSQL or an np.NaN in pandas. Empty strings don’t count as null unless they have been coerced to a null type.
expect_column_values_to_not_be_null is a column_map_expectation.
- Parameters
column (str) – The column name.
- Keyword Arguments
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format and include_config, catch_exceptions, and meta.
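The distinction between explicit nulls and empty strings described for expect_column_values_to_not_be_null can be sketched as follows. The helper is_null is hypothetical and covers only None and float NaN; it is a simplified illustration, not the null test the library applies.

```python
import math


def is_null(value):
    """Treat None and float NaN as null; everything else, including '', is not."""
    return value is None or (isinstance(value, float) and math.isnan(value))


column = ["a", "", None, float("nan"), 0]
unexpected_count = sum(1 for value in column if is_null(value))
print(unexpected_count)
```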
-
expect_column_values_to_be_null
(self, column, mostly=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect column values to be null.
expect_column_values_to_be_null is a column_map_expectation.
- Parameters
column (str) – The column name.
- Keyword Arguments
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format and include_config, catch_exceptions, and meta.
-
expect_column_values_to_be_of_type
(self, column, type_, **kwargs)¶ The pandas implementation of this expectation takes the kwargs mostly, result_format, include_config, catch_exceptions, and meta, as other expectations do. However, it declares **kwargs because it needs to be able to fork into either aggregate or map semantics depending on the column type (see below).
In Pandas, columns may be typed, or they may be of the generic “object” type which can include rows with different storage types in the same column.
To respect that implementation, the expect_column_values_to_be_of_type expectation will first attempt to use the column dtype information to determine whether the column is restricted to the provided type. If that is possible, then expect_column_values_to_be_of_type will return aggregate information including an observed_value, similarly to other backends.
If it is not possible (because the column dtype is “object” but a more specific type was specified), then PandasDataset will use column map semantics: it will return map expectation results and check each value individually, which can be substantially slower.
Unfortunately, the “object” type is also used to contain any string-type columns (including ‘str’ and numpy ‘string_’ (bytes)); consequently, it is not possible to test for string columns using aggregate semantics.
-
_expect_column_values_to_be_of_type__aggregate
(self, column, type_, mostly=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶
-
static
_native_type_type_map
(type_)¶
-
_expect_column_values_to_be_of_type__map
(self, column, type_, mostly=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶
-
expect_column_values_to_be_in_type_list
(self, column, type_list, **kwargs)¶ The pandas implementation of this expectation takes the kwargs mostly, result_format, include_config, catch_exceptions, and meta, as other expectations do. However, it declares **kwargs because it needs to be able to fork into either aggregate or map semantics depending on the column type (see below).
In Pandas, columns may be typed, or they may be of the generic “object” type which can include rows with different storage types in the same column.
To respect that implementation, the expect_column_values_to_be_in_type_list expectation will first attempt to use the column dtype information to determine whether the column is restricted to one of the provided types. If that is possible, then it will return aggregate information including an observed_value, similarly to other backends.
If it is not possible (because the column dtype is “object” but a more specific type was specified), then PandasDataset will use column map semantics: it will return map expectation results and check each value individually, which can be substantially slower.
Unfortunately, the “object” type is also used to contain any string-type columns (including ‘str’ and numpy ‘string_’ (bytes)); consequently, it is not possible to test for string columns using aggregate semantics.
-
_expect_column_values_to_be_in_type_list__aggregate
(self, column, type_list, mostly=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶
-
_expect_column_values_to_be_in_type_list__map
(self, column, type_list, mostly=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶
-
expect_column_values_to_be_in_set
(self, column, value_set, mostly=None, parse_strings_as_datetimes=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect each column value to be in a given set.
For example:
    # my_df.my_col = [1,2,2,3,3,3]
    >>> my_df.expect_column_values_to_be_in_set("my_col", [2,3])
    {
      "success": false,
      "result": {
        "unexpected_count": 1,
        "unexpected_percent": 16.66666666666666666,
        "unexpected_percent_nonmissing": 16.66666666666666666,
        "partial_unexpected_list": [
          1
        ]
      }
    }
expect_column_values_to_be_in_set is a column_map_expectation.
- Parameters
column (str) – The column name.
value_set (set-like) – A set of objects used for comparison.
- Keyword Arguments
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
parse_strings_as_datetimes (boolean or None) – If True values provided in value_set will be parsed as datetimes before making comparisons.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format and include_config, catch_exceptions, and meta.
-
expect_column_values_to_not_be_in_set
(self, column, value_set, mostly=None, parse_strings_as_datetimes=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect column entries to not be in the set.
For example:
    # my_df.my_col = [1,2,2,3,3,3]
    >>> my_df.expect_column_values_to_not_be_in_set("my_col", [1,2])
    {
      "success": false,
      "result": {
        "unexpected_count": 3,
        "unexpected_percent": 50.0,
        "unexpected_percent_nonmissing": 50.0,
        "partial_unexpected_list": [
          1, 2, 2
        ]
      }
    }
expect_column_values_to_not_be_in_set is a column_map_expectation.
- Parameters
column (str) – The column name.
value_set (set-like) – A set of objects used for comparison.
- Keyword Arguments
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format and include_config, catch_exceptions, and meta.
-
expect_column_values_to_be_between
(self, column, min_value=None, max_value=None, strict_min=False, strict_max=False, parse_strings_as_datetimes=None, output_strftime_format=None, allow_cross_type_comparisons=None, mostly=None, row_condition=None, condition_parser=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect column entries to be between a minimum value and a maximum value (inclusive).
expect_column_values_to_be_between is a column_map_expectation.
- Parameters
column (str) – The column name.
min_value (comparable type or None) – The minimum value for a column entry.
max_value (comparable type or None) – The maximum value for a column entry.
- Keyword Arguments
strict_min (boolean) – If True, values must be strictly larger than min_value, default=False
strict_max (boolean) – If True, values must be strictly smaller than max_value, default=False
allow_cross_type_comparisons (boolean or None) – If True, allow comparisons between types (e.g. integer and string). Otherwise, attempting such comparisons will raise an exception.
parse_strings_as_datetimes (boolean or None) – If True, parse min_value, max_value, and all non-null column values to datetimes before making comparisons.
output_strftime_format (str or None) – A valid strftime format for datetime output. Only used if parse_strings_as_datetimes=True.
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format and include_config, catch_exceptions, and meta.
Notes
min_value and max_value are both inclusive unless strict_min or strict_max are set to True.
If min_value is None, then max_value is treated as an upper bound, and there is no minimum value checked.
If max_value is None, then min_value is treated as a lower bound, and there is no maximum value checked.
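The bound-handling rules in the notes above can be expressed as a per-value check. The function is_between is a hypothetical helper written for this illustration, and raising ValueError when both bounds are None is an assumption of the sketch rather than documented behavior.

```python
def is_between(value, min_value=None, max_value=None, strict_min=False, strict_max=False):
    """Return True if value satisfies the (optionally strict) bounds."""
    if min_value is None and max_value is None:
        # Assumption of this sketch: at least one bound must be supplied.
        raise ValueError("min_value and max_value cannot both be None")
    if min_value is not None:
        if value < min_value or (strict_min and value == min_value):
            return False
    if max_value is not None:
        if value > max_value or (strict_max and value == max_value):
            return False
    return True


flags = [is_between(v, min_value=1, max_value=3, strict_max=True) for v in [0, 1, 2, 3]]
print(flags)
```

A bound left as None is simply not checked, so is_between(100, min_value=5) passes regardless of how large the value is.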
-
expect_column_values_to_be_increasing
(self, column, strictly=None, parse_strings_as_datetimes=None, output_strftime_format=None, mostly=None, row_condition=None, condition_parser=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect column values to be increasing.
By default, this expectation only works for numeric or datetime data. When parse_strings_as_datetimes=True, it can also parse strings to datetimes.
If strictly=True, then this expectation is only satisfied if each consecutive value is strictly increasing; equal values are treated as failures.
expect_column_values_to_be_increasing is a column_map_expectation.
- Parameters
column (str) – The column name.
- Keyword Arguments
strictly (Boolean or None) – If True, values must be strictly greater than previous values
parse_strings_as_datetimes (boolean or None) – If True, parse all non-null column values to datetimes before making comparisons
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format and include_config, catch_exceptions, and meta.
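The effect of strictly on expect_column_values_to_be_increasing can be sketched by comparing each value with its predecessor. The helper increasing_flags is hypothetical, and treating the first value as always passing is an assumption of this sketch.

```python
def increasing_flags(values, strictly=False):
    """Return one boolean per value: does it continue an increasing run?"""
    if not values:
        return []
    flags = [True]  # assumption: the first value has no predecessor and passes
    for previous, current in zip(values, values[1:]):
        flags.append(current > previous if strictly else current >= previous)
    return flags


data = [1, 2, 2, 3, 1]
loose = increasing_flags(data)
strict = increasing_flags(data, strictly=True)
print(loose)
print(strict)
```

The repeated 2 passes in the default mode and fails when strictly=True; the final drop to 1 fails in both.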
-
expect_column_values_to_be_decreasing
(self, column, strictly=None, parse_strings_as_datetimes=None, output_strftime_format=None, mostly=None, row_condition=None, condition_parser=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect column values to be decreasing.
By default, this expectation only works for numeric or datetime data. When parse_strings_as_datetimes=True, it can also parse strings to datetimes.
If strictly=True, then this expectation is only satisfied if each consecutive value is strictly decreasing; equal values are treated as failures.
expect_column_values_to_be_decreasing is a column_map_expectation.
- Parameters
column (str) – The column name.
- Keyword Arguments
strictly (Boolean or None) – If True, values must be strictly less than previous values
parse_strings_as_datetimes (boolean or None) – If True, parse all non-null column values to datetimes before making comparisons
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format and include_config, catch_exceptions, and meta.
-
expect_column_value_lengths_to_be_between
(self, column, min_value=None, max_value=None, mostly=None, row_condition=None, condition_parser=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect column entries to be strings with length between a minimum value and a maximum value (inclusive).
This expectation only works for string-type values. Invoking it on ints or floats will raise a TypeError.
expect_column_value_lengths_to_be_between is a column_map_expectation.
- Parameters
column (str) – The column name.
- Keyword Arguments
min_value (int or None) – The minimum value for a column entry length.
max_value (int or None) – The maximum value for a column entry length.
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format and include_config, catch_exceptions, and meta.
Notes
min_value and max_value are both inclusive.
If min_value is None, then max_value is treated as an upper bound, and there is no minimum length checked.
If max_value is None, then min_value is treated as a lower bound, and there is no maximum length checked.
-
expect_column_value_lengths_to_equal
(self, column, value, mostly=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect column entries to be strings with length equal to the provided value.
This expectation only works for string-type values. Invoking it on ints or floats will raise a TypeError.
expect_column_value_lengths_to_equal is a column_map_expectation.
- Parameters
column (str) – The column name.
value (int or None) – The expected value for a column entry length.
- Keyword Arguments
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format and include_config, catch_exceptions, and meta.
-
expect_column_values_to_match_regex
(self, column, regex, mostly=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect column entries to be strings that match a given regular expression. Valid matches can be found anywhere in the string. For example, “[at]+” will identify the following strings as expected: “cat”, “hat”, “aa”, “a”, and “t”, and the following strings as unexpected: “fish”, “dog”.
expect_column_values_to_match_regex is a column_map_expectation.
- Parameters
column (str) – The column name.
regex (str) – The regular expression the column entries should match.
- Keyword Arguments
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format and include_config, catch_exceptions, and meta.
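The "match anywhere in the string" behavior described for expect_column_values_to_match_regex corresponds to search semantics rather than full-string matching. The sketch below reproduces the “[at]+” example with Python's re module; the variable names are illustrative only.

```python
import re

pattern = re.compile("[at]+")
candidates = ["cat", "hat", "aa", "a", "t", "fish", "dog"]

# re.search succeeds if the pattern matches at any position in the string.
expected = [s for s in candidates if pattern.search(s)]
unexpected = [s for s in candidates if not pattern.search(s)]
print(expected)
print(unexpected)
```

To require that the whole string match, anchor the pattern, for example “^[at]+$”.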
-
expect_column_values_to_not_match_regex
(self, column, regex, mostly=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect column entries to be strings that do NOT match a given regular expression. The regex must not match any portion of the provided string. For example, “[at]+” would identify the following strings as expected: “fish”, “dog”, and the following as unexpected: “cat”, “hat”.
expect_column_values_to_not_match_regex is a column_map_expectation.
- Parameters
column (str) – The column name.
regex (str) – The regular expression the column entries should NOT match.
- Keyword Arguments
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format and include_config, catch_exceptions, and meta.
-
expect_column_values_to_match_regex_list
(self, column, regex_list, match_on='any', mostly=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect the column entries to be strings that can be matched to either any of or all of a list of regular expressions. Matches can be anywhere in the string.
expect_column_values_to_match_regex_list is a column_map_expectation.
- Parameters
column (str) – The column name.
regex_list (list) – The list of regular expressions which the column entries should match
- Keyword Arguments
match_on (string) – “any” or “all”. Use “any” if the value should match at least one regular expression in the list. Use “all” if it should match each regular expression in the list.
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format and include_config, catch_exceptions, and meta.
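The difference between match_on="any" and match_on="all" can be sketched for a single value. The helper matches_regex_list is hypothetical and uses re.search so that matches may occur anywhere in the string, as the description above states.

```python
import re


def matches_regex_list(value, regex_list, match_on="any"):
    """Check one string against a list of patterns using 'any' or 'all' logic."""
    hits = [re.search(regex, value) is not None for regex in regex_list]
    if match_on == "any":
        return any(hits)
    if match_on == "all":
        return all(hits)
    raise ValueError("match_on must be either 'any' or 'all'")


any_result = matches_regex_list("cat", ["^c", "z"], match_on="any")
all_result = matches_regex_list("cat", ["^c", "z"], match_on="all")
print(any_result, all_result)
```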
-
expect_column_values_to_not_match_regex_list
(self, column, regex_list, mostly=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect the column entries to be strings that do not match any of a list of regular expressions. Matches can be anywhere in the string.
expect_column_values_to_not_match_regex_list is a column_map_expectation.
- Parameters
column (str) – The column name.
regex_list (list) – The list of regular expressions which the column entries should not match
- Keyword Arguments
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
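To make the interaction between the regex list and mostly concrete, here is a small standard-library sketch. It assumes all values are non-missing strings; the helper `fraction_not_matching` and the threshold value are hypothetical and only mirror the rule stated above (a value passes when none of the patterns matches anywhere in it).

```python
import re


def fraction_not_matching(values, regex_list):
    """Fraction of values that match none of the patterns anywhere in the string."""
    passed = [
        not any(re.search(pattern, value) for pattern in regex_list)
        for value in values
    ]
    return sum(passed) / len(passed)


values = ["apple", "banana", "cherry", "x-ray"]
# "x-ray" contains a hyphen, so it matches the second pattern and fails.
fraction = fraction_not_matching(values, [r"\d", r"-"])
mostly = 0.7  # hypothetical threshold
success = fraction >= mostly
```

Here three of the four values pass, so the sketch reports success for any mostly value up to 0.75.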
-
expect_column_values_to_match_strftime_format
(self, column, strftime_format, mostly=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect column entries to be strings representing a date or time with a given format.
expect_column_values_to_match_strftime_format is a column_map_expectation.
- Parameters
column (str) – The column name.
strftime_format (str) – A strftime format string to use for matching
- Keyword Arguments
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
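A per-value check of this kind can be approximated with `datetime.strptime`. The sketch below is an assumption about how one might test a single string against a strftime format; the helper name `matches_strftime_format` is hypothetical and it does not handle non-string input.

```python
from datetime import datetime


def matches_strftime_format(value, strftime_format):
    """Return True if the string `value` parses with the given format."""
    try:
        datetime.strptime(value, strftime_format)
        return True
    except ValueError:
        # Raised for a format mismatch and for impossible dates such as February 30.
        return False


iso_ok = matches_strftime_format("2020-01-31", "%Y-%m-%d")
wrong_layout = matches_strftime_format("31/01/2020", "%Y-%m-%d")
impossible_date = matches_strftime_format("2020-02-30", "%Y-%m-%d")
```

Note that a string in the right layout can still fail if it names a date that does not exist.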
-
expect_column_values_to_be_dateutil_parseable
(self, column, mostly=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect column entries to be parsable using dateutil.
expect_column_values_to_be_dateutil_parseable is a column_map_expectation.
- Parameters
column (str) – The column name.
- Keyword Arguments
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
-
expect_column_values_to_be_json_parseable
(self, column, mostly=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect column entries to be data written in JavaScript Object Notation.
expect_column_values_to_be_json_parseable is a column_map_expectation.
- Parameters
column (str) – The column name.
- Keyword Arguments
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
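What counts as JSON-parseable can be illustrated with the standard `json` module. The helper `is_json_parseable` below is hypothetical, and treating non-string input as a failure is an assumption of this sketch rather than documented behavior.

```python
import json


def is_json_parseable(value):
    """Return True if `value` is a string containing valid JSON."""
    try:
        json.loads(value)
        return True
    except (ValueError, TypeError):
        # ValueError covers malformed JSON; TypeError covers input such as None.
        return False


valid_object = is_json_parseable('{"a": 1}')
single_quotes = is_json_parseable("{'a': 1}")  # JSON requires double quotes
truncated = is_json_parseable("[1, 2")
```

Bare scalars such as "123" are also valid JSON documents, so they pass this check.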
-
expect_column_values_to_match_json_schema
(self, column, json_schema, mostly=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect column entries to be JSON objects matching a given JSON schema.
expect_column_values_to_match_json_schema is a column_map_expectation.
- Parameters
column (str) – The column name.
json_schema (dict) – The JSON schema that each column entry is validated against.
- Keyword Arguments
mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
-
expect_column_parameterized_distribution_ks_test_p_value_to_be_greater_than
(self, column, distribution, p_value=0.05, params=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect the column values to be distributed similarly to a scipy distribution. This expectation compares the provided column to the specified continuous distribution with a parametric Kolmogorov-Smirnov test. The K-S test compares the provided column to the cumulative distribution function (CDF) of the specified scipy distribution. If you don’t know the desired distribution shape parameters, use the ge.dataset.util.infer_distribution_parameters() utility function to estimate them.
It returns ‘success’=True if the p-value from the K-S test is greater than or equal to the provided p-value.
expect_column_parameterized_distribution_ks_test_p_value_to_be_greater_than is a column_aggregate_expectation.
- Parameters
column (str) – The column name.
distribution (str) – The scipy distribution name. See: https://docs.scipy.org/doc/scipy/reference/stats.html Currently supported distributions are listed in the Notes section below.
p_value (float) – The threshold p-value for a passing test. Default is 0.05.
params (dict or list) – A dictionary or positional list of shape parameters that describe the distribution you want to test the data against. Include key values specific to the distribution from the appropriate scipy distribution CDF function. ‘loc’ and ‘scale’ are used as translational parameters. See https://docs.scipy.org/doc/scipy/reference/stats.html#continuous-distributions
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
Notes
These fields in the result object are customized for this expectation:
{
    "details":
        "expected_params" (dict): The specified or inferred parameters of the distribution to test against
        "ks_results" (dict): The raw result of stats.kstest()
}
The Kolmogorov-Smirnov test’s null hypothesis is that the column is similar to the provided distribution.
Supported scipy distributions:
norm
beta
gamma
uniform
chi2
expon
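For intuition about what the K-S test measures, the statistic itself can be computed with the standard library. This is a minimal sketch, not the library's code path (the Notes above indicate the result comes from stats.kstest()): the helpers `normal_cdf` and `ks_statistic` are hypothetical, and the sketch stops at the statistic without deriving a p-value.

```python
import math


def normal_cdf(x, loc=0.0, scale=1.0):
    """CDF of a normal distribution, written with the standard-library error function."""
    return 0.5 * (1.0 + math.erf((x - loc) / (scale * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """One-sample K-S statistic: the largest gap between the empirical CDF and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Compare three points against a uniform(0, 1) CDF, F(x) = x.
d_uniform = ks_statistic([0.1, 0.4, 0.7], lambda x: x)
# Compare a single point at the mean against a standard normal CDF.
d_normal = ks_statistic([0.0], normal_cdf)
```

A larger statistic corresponds to a smaller p-value, which is what the expectation compares against the p_value threshold.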
-
expect_column_bootstrapped_ks_test_p_value_to_be_greater_than
(self, column, partition_object=None, p=0.05, bootstrap_samples=None, bootstrap_sample_size=None, result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect column values to be distributed similarly to the provided continuous partition. This expectation compares continuous distributions using a bootstrapped Kolmogorov-Smirnov test. It returns success=True if values in the column match the distribution of the provided partition.
The expected cumulative distribution function (CDF) is constructed as a linear interpolation between the bins, using the provided weights. Consequently the test expects a piecewise uniform distribution using the bins from the provided partition object.
expect_column_bootstrapped_ks_test_p_value_to_be_greater_than is a column_aggregate_expectation.
- Parameters
column (str) – The column name.
partition_object (dict) – The expected partition object (see Partition Objects).
p (float) – The p-value threshold for the Kolmogorov-Smirnov test. For values below the specified threshold the expectation will return success=False, rejecting the null hypothesis that the distributions are the same. Defaults to 0.05.
- Keyword Arguments
bootstrap_samples (int) – The number of bootstrap rounds. Defaults to 1000.
bootstrap_sample_size (int) – The number of samples to take from the column for each bootstrap. A larger sample will increase the specificity of the test. Defaults to 2 * len(partition_object['weights']).
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
Notes
These fields in the result object are customized for this expectation:
{
    "observed_value": (float) The true p-value of the KS test
    "details": {
        "bootstrap_samples": The number of bootstrap rounds used
        "bootstrap_sample_size": The number of samples taken from the column in each bootstrap round
        "observed_cdf": The cumulative distribution function observed in the data, a dict containing 'x' values and cdf_values (suitable for plotting)
        "expected_cdf" (dict): The cumulative distribution function expected based on the partition object, a dict containing 'x' values and cdf_values (suitable for plotting)
        "observed_partition" (dict): The partition observed on the data, using the provided bins but also expanding from min(column) to max(column)
        "expected_partition" (dict): The partition expected from the data. For KS test, this will always be the partition_object parameter
    }
}
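The expected CDF described for this expectation, a linear interpolation between bin edges using the partition weights, can be sketched as follows. The helper names are hypothetical, and the partition layout (a "bins" list of edges with one more entry than "weights") is an assumption based on the description above rather than a statement of the library's internals.

```python
def expected_cdf_from_partition(partition_object):
    """Return (edges, cdf_values) with the CDF evaluated at each bin edge."""
    bins = partition_object["bins"]
    weights = partition_object["weights"]
    if len(bins) != len(weights) + 1:
        raise ValueError("expected one more bin edge than weights")
    cdf_values = [0.0]
    for weight in weights:
        # Each bin adds its weight to the running total.
        cdf_values.append(cdf_values[-1] + weight)
    return list(bins), cdf_values


def interpolate_cdf(x, edges, cdf_values):
    """Linearly interpolate the CDF inside a bin (piecewise uniform density)."""
    if x <= edges[0]:
        return 0.0
    if x >= edges[-1]:
        return cdf_values[-1]
    for left, right, low, high in zip(edges, edges[1:], cdf_values, cdf_values[1:]):
        if left <= x <= right:
            return low + (high - low) * (x - left) / (right - left)


partition = {"bins": [0, 1, 2, 4], "weights": [0.25, 0.5, 0.25]}
edges, cdf = expected_cdf_from_partition(partition)
midpoint_value = interpolate_cdf(1.5, edges, cdf)  # halfway through the second bin
```

Because the interpolation is linear within each bin, the implied density is constant inside a bin, which is what "piecewise uniform" refers to.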
-
expect_column_pair_values_to_be_equal
(self, column_A, column_B, ignore_row_if='both_values_are_missing', result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect the values in column A to be the same as column B.
- Parameters
column_A (str) – The first column name
column_B (str) – The second column name
- Keyword Arguments
ignore_row_if (str) – “both_values_are_missing”, “either_value_is_missing”, “never”
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
-
expect_column_pair_values_A_to_be_greater_than_B
(self, column_A, column_B, or_equal=None, parse_strings_as_datetimes=None, allow_cross_type_comparisons=None, ignore_row_if='both_values_are_missing', result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect values in column A to be greater than column B.
- Parameters
column_A (str) – The first column name
column_B (str) – The second column name
or_equal (boolean or None) – If True, then values can be equal, not strictly greater
- Keyword Arguments
allow_cross_type_comparisons (boolean or None) – If True, allow comparisons between types (e.g. integer and string). Otherwise, attempting such comparisons will raise an exception.
ignore_row_if (str) – “both_values_are_missing”, “either_value_is_missing”, “never”
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
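The effect of or_equal on the row-wise comparison is easy to show in plain Python. This sketch uses a list of (A, B) tuples in place of two dataframe columns; the helper `a_greater_than_b` is hypothetical and ignores missing values and datetime parsing.

```python
def a_greater_than_b(pairs, or_equal=False):
    """Return one boolean per (a, b) row: a > b, or a >= b when or_equal is True."""
    if or_equal:
        return [a >= b for a, b in pairs]
    return [a > b for a, b in pairs]


rows = [(3, 1), (2, 2), (1, 5)]
strict = a_greater_than_b(rows)                 # the tied row fails
relaxed = a_greater_than_b(rows, or_equal=True)  # the tied row passes
```

Only the tied row changes outcome between the two modes.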
-
expect_column_pair_values_to_be_in_set
(self, column_A, column_B, value_pairs_set, ignore_row_if='both_values_are_missing', result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect paired values from columns A and B to belong to a set of valid pairs.
- Parameters
column_A (str) – The first column name
column_B (str) – The second column name
value_pairs_set (list of tuples) – All the valid pairs to be matched
- Keyword Arguments
ignore_row_if (str) – “both_values_are_missing”, “either_value_is_missing”, “never”
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
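The three ignore_row_if modes listed above decide which rows are evaluated at all. The following sketch illustrates that filtering together with the pair-membership check. It is an assumption-laden illustration: `None` stands in for a missing value, rows are plain tuples, and the helper `pair_in_set_results` is hypothetical.

```python
def pair_in_set_results(rows, value_pairs_set, ignore_row_if="both_values_are_missing"):
    """Return True/False for each evaluated row; ignored rows are skipped."""
    valid = set(value_pairs_set)
    results = []
    for a, b in rows:
        if ignore_row_if == "both_values_are_missing":
            ignore = a is None and b is None
        elif ignore_row_if == "either_value_is_missing":
            ignore = a is None or b is None
        elif ignore_row_if == "never":
            ignore = False
        else:
            raise ValueError("unknown ignore_row_if: %r" % (ignore_row_if,))
        if not ignore:
            results.append((a, b) in valid)
    return results


rows = [("US", "USD"), ("FR", "EUR"), ("US", "EUR"), (None, None), ("FR", None)]
pairs = [("US", "USD"), ("FR", "EUR")]
default_mode = pair_in_set_results(rows, pairs)
either_mode = pair_in_set_results(rows, pairs, "either_value_is_missing")
never_mode = pair_in_set_results(rows, pairs, "never")
```

In this sketch the row with one missing value is evaluated (and fails) under the default mode, but is skipped under “either_value_is_missing”.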
-
expect_multicolumn_values_to_be_unique
(self, column_list, ignore_row_if='all_values_are_missing', result_format=None, row_condition=None, condition_parser=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect the values for each row to be unique across the columns listed.
- Parameters
column_list (tuple or list) – The names of the columns whose values should be unique within each row
- Keyword Arguments
ignore_row_if (str) – “all_values_are_missing”, “any_value_is_missing”, “never”
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
An ExpectationSuiteValidationResult
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
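Row-wise uniqueness across columns means that no value repeats within a single row; values may still repeat between rows. A minimal sketch, assuming hashable values and using tuples in place of dataframe rows (the helper `row_values_are_unique` is hypothetical):

```python
def row_values_are_unique(row):
    """Return True if no value appears more than once in the row."""
    return len(set(row)) == len(row)


rows = [(1, 2, 3), (1, 1, 3), ("a", "b", "a"), (1, 2, 3)]
flags = [row_values_are_unique(row) for row in rows]
```

The first and last rows are identical to each other, yet both pass, because the check is applied within each row rather than across rows.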