great_expectations.dataset.sparkdf_dataset

Module Contents

Classes

MetaSparkDFDataset(*args, **kwargs)

MetaSparkDFDataset is a thin layer between Dataset and SparkDFDataset.

SparkDFDataset(spark_df, *args, **kwargs)

This class holds an attribute spark_df which is a spark.sql.DataFrame.

great_expectations.dataset.sparkdf_dataset.logger
class great_expectations.dataset.sparkdf_dataset.MetaSparkDFDataset(*args, **kwargs)

Bases: great_expectations.dataset.dataset.Dataset

MetaSparkDFDataset is a thin layer between Dataset and SparkDFDataset. This two-layer inheritance is required to make @classmethod decorators work. Practically speaking, that means that MetaSparkDFDataset implements expectation decorators, like column_map_expectation and column_aggregate_expectation, and SparkDFDataset implements the expectation methods themselves.
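The reason for the split can be shown in plain Python. A decorator applied inside a class body must already exist when that body runs. A class cannot reach its own classmethods while its body is still executing, so the decorator has to live on a parent class. The sketch below uses hypothetical stand-ins: MetaDataset, ListDataset, and a dict of lists in place of a Spark DataFrame. It illustrates the pattern and is not Great Expectations code.

```python
import functools


class MetaDataset:
    """Hypothetical stand-in for MetaSparkDFDataset: it only defines decorators."""

    @classmethod
    def column_map_expectation(cls, func):
        # Wrap an expectation method that a subclass defines.
        @functools.wraps(func)
        def wrapper(self, column, *args, **kwargs):
            # Pass the column's values (not its name) to the implementation.
            flags = func(self, self.data[column], *args, **kwargs)
            return {"success": all(flags)}

        return wrapper


class ListDataset(MetaDataset):
    """Hypothetical stand-in for SparkDFDataset: it implements expectations."""

    def __init__(self, data):
        self.data = data  # dict mapping column name -> list of values

    # MetaDataset is fully defined at this point, so its classmethod is usable here.
    # Writing @ListDataset.column_map_expectation instead would raise NameError,
    # because ListDataset does not exist until its class body finishes running.
    @MetaDataset.column_map_expectation
    def expect_column_values_to_be_positive(self, values):
        return [v > 0 for v in values]
```

For example, ListDataset({"a": [1, 2, 3]}).expect_column_values_to_be_positive("a") returns {"success": True}.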

classmethod column_map_expectation(cls, func)

Constructs an expectation using column-map semantics.

The MetaSparkDFDataset implementation replaces the “column” parameter supplied by the user with a Spark DataFrame containing the actual column data. The current approach for functions implementing expectation logic is to append a column named “__success” to this DataFrame and return it to this decorator.

See column_map_expectation for full documentation of this function.
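Below is a minimal pure-Python sketch of the “__success” convention. It assumes rows are plain dicts rather than a Spark DataFrame. The implementation receives a one-column view, returns it with a boolean “__success” field on each row, and the decorator turns those flags into a result that honors mostly. The decorator, in_set_sketch, and the result keys shown are simplified illustrations, not the library's actual code.

```python
def column_map_expectation(func):
    """Hypothetical decorator that aggregates per-row "__success" flags."""

    def wrapper(rows, column, *args, mostly=None, **kwargs):
        # Give the implementation a one-column view of the data.
        column_rows = [{column: row[column]} for row in rows]
        # The implementation returns those rows with a boolean "__success" field added.
        flagged = func(column_rows, column, *args, **kwargs)
        total = len(flagged)
        unexpected = [r[column] for r in flagged if not r["__success"]]
        # mostly=None means every value must pass.
        threshold = 1.0 if mostly is None else mostly
        passed_fraction = (total - len(unexpected)) / total if total else 1.0
        return {
            "success": passed_fraction >= threshold,
            "result": {
                "unexpected_count": len(unexpected),
                "unexpected_percent": 100.0 * len(unexpected) / total if total else 0.0,
                "partial_unexpected_list": unexpected[:20],  # truncated for display
            },
        }

    return wrapper


@column_map_expectation
def in_set_sketch(column_rows, column, value_set):
    # Append the "__success" flag instead of aggregating here.
    return [{**r, "__success": r[column] in value_set} for r in column_rows]
```

Take the data from the expect_column_values_to_be_in_set example in this module ([1,2,2,3,3,3] checked against [2,3]). There, in_set_sketch fails with one unexpected value, but it succeeds with mostly=0.8 because 5 of 6 values match.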

classmethod column_pair_map_expectation(cls, func)

The column_pair_map_expectation decorator handles boilerplate issues surrounding the common pattern of evaluating the truthiness of some condition on a per-row basis across a pair of columns.

classmethod multicolumn_map_expectation(cls, func)

The multicolumn_map_expectation decorator handles boilerplate issues surrounding the common pattern of evaluating the truthiness of some condition on a per-row basis across a set of columns.
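Both decorators reduce to the same per-row idea. The helper below is hypothetical and uses plain dicts as rows. For each row, it collects the values of the named columns, applies a condition, and records the rows for which the condition is false. The pair case is simply the two-column instance.

```python
def multicolumn_map(rows, columns, condition):
    """Hypothetical helper: evaluate `condition` row by row across `columns`."""
    # Collect the relevant values of each row as a tuple, in column order.
    row_values = [tuple(row[c] for c in columns) for row in rows]
    # A row is unexpected when the condition is falsy for its values.
    unexpected = [values for values in row_values if not condition(*values)]
    return {"success": not unexpected, "unexpected_list": unexpected}
```

For a pair, the condition might be lambda a, b: a > b. For a set of columns, it might be lambda x, y, z: x + y == z.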

class great_expectations.dataset.sparkdf_dataset.SparkDFDataset(spark_df, *args, **kwargs)

Bases: great_expectations.dataset.sparkdf_dataset.MetaSparkDFDataset

This class holds an attribute spark_df which is a spark.sql.DataFrame.

Feature Maturity

Validation Engine - pyspark - Self-Managed - How-to Guide
Use Spark DataFrame to validate data
Maturity: Production
Details:
API Stability: Stable
Implementation Completeness: Moderate
Unit Test Coverage: Complete
Integration Infrastructure/Test Coverage: N/A -> see relevant Datasource evaluation
Documentation Completeness: Complete
Bug Risk: Low/Moderate
Expectation Completeness: Moderate
Validation Engine - Databricks - How-to Guide
Use Spark DataFrame in a Databricks cluster to validate data
Maturity: Beta
Details:
API Stability: Stable
Implementation Completeness: Low (dbfs-specific handling)
Unit Test Coverage: N/A -> implementation not different
Integration Infrastructure/Test Coverage: Minimal (we’ve tested a bit, know others have used it)
Documentation Completeness: Moderate (need docs on managing project configuration via dbfs/etc.)
Bug Risk: Low/Moderate
Expectation Completeness: Moderate
Validation Engine - EMR - Spark - How-to Guide
Use Spark DataFrame in an EMR cluster to validate data
Maturity: Experimental
Details:
API Stability: Stable
Implementation Completeness: Low (need to provide guidance on “known good” paths, and we know there are many “knobs” to tune that we have not explored/tested)
Unit Test Coverage: N/A -> implementation not different
Integration Infrastructure/Test Coverage: Unknown
Documentation Completeness: Low (must install specific/latest version but do not have docs to that effect or of known useful paths)
Bug Risk: Low/Moderate
Expectation Completeness: Moderate
Validation Engine - Spark - Other - How-to Guide
Use Spark DataFrame to validate data
Maturity: Experimental
Details:
API Stability: Stable
Implementation Completeness: Other (we haven’t tested this possibility; there is a known Glue deployment)
Unit Test Coverage: N/A -> implementation not different
Integration Infrastructure/Test Coverage: Unknown
Documentation Completeness: Low (must install specific/latest version but do not have docs to that effect or of known useful paths)
Bug Risk: Low/Moderate
Expectation Completeness: Moderate
classmethod from_dataset(cls, dataset=None)

This base implementation naively passes arguments on to the real constructor, which is suitable only when a constructor knows how to take its own type. In general, this should be overridden.

head(self, n=5)

Returns a PandasDataset with the first n rows of the given Dataset.

get_row_count(self)

Returns: int, table row count

get_column_count(self)

Returns: int, table column count

get_table_columns(self)

Returns: List[str], list of column names

get_column_nonnull_count(self, column)

Returns: int

get_column_mean(self, column)

Returns: float

get_column_sum(self, column)

Returns: float

_describe_column(self, column)
get_column_max(self, column, parse_strings_as_datetimes=False)

Returns: Any

get_column_min(self, column, parse_strings_as_datetimes=False)

Returns: Any

get_column_value_counts(self, column, sort='value', collate=None)

Get a series containing the frequency counts of unique values from the named column.

Parameters
  • column – the column for which to obtain value_counts

  • sort (string) – must be one of “value”, “count”, or “none”:
      - if “value”, values in the resulting partition object will be sorted lexicographically
      - if “count”, values will be sorted according to descending count (frequency)
      - if “none”, values will not be sorted

  • collate (string) – the collate (sort) method to be used on supported backends (SqlAlchemy only)

Returns

pd.Series of value counts for a column, sorted according to the value requested in sort
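The three sort modes can be sketched with collections.Counter. This is a hypothetical illustration. It returns an ordered dict instead of a pd.Series, and it drops null values; how nulls are treated is an assumption here.

```python
from collections import Counter


def value_counts(values, sort="value"):
    """Hypothetical sketch of the three `sort` modes; nulls are dropped (assumption)."""
    counts = Counter(v for v in values if v is not None)
    if sort == "value":
        items = sorted(counts.items())  # ascending by value
    elif sort == "count":
        # Descending frequency; ties keep first-seen order because sorted() is stable.
        items = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    elif sort == "none":
        items = list(counts.items())  # no particular ordering is promised
    else:
        raise ValueError("sort must be 'value', 'count', or 'none'")
    return dict(items)
```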

get_column_unique_count(self, column)

Returns: int

get_column_modes(self, column)

Returns the mode(s) of the column, leveraging computation done in _get_column_value_counts.
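Once value counts exist, the modes are the values whose count equals the maximum count. Here is a hypothetical sketch. Ties return every tied value, and nulls are assumed to be ignored.

```python
from collections import Counter


def column_modes(values):
    """Hypothetical sketch: all values sharing the highest count."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return []
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]
```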

get_column_median(self, column)

Returns: Any

get_column_quantiles(self, column, quantiles, allow_relative_error=False)

Get the values in column closest to the requested quantiles.

Parameters
  • column (string) – name of column

  • quantiles (tuple of float) – the quantiles to return. quantiles must be a tuple to ensure caching is possible

Returns

the nearest values in the dataset to those quantiles

Return type

List[Any]
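One way to read “the nearest values in the dataset” is the nearest-rank method, which always returns an actual element of the column. The sketch below assumes that method. The Spark backend may compute quantiles differently, particularly when allow_relative_error permits approximation. A tuple is hashable, so it can serve as part of a cache key where a list cannot.

```python
import math


def nearest_rank_quantiles(values, quantiles):
    """Hypothetical nearest-rank quantiles; each result is an actual column value."""
    data = sorted(v for v in values if v is not None)  # nulls ignored (assumption)
    if not data:
        raise ValueError("no non-null values")
    result = []
    for q in quantiles:
        if not 0.0 <= q <= 1.0:
            raise ValueError(f"quantile out of range: {q}")
        # 1-based nearest rank, clamped so that q = 0.0 maps to the minimum.
        rank = max(1, math.ceil(q * len(data)))
        result.append(data[rank - 1])
    return result
```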

get_column_stdev(self, column)

Returns: float

get_column_hist(self, column, bins)

Returns a list of counts corresponding to bins.
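Here is a sketch of the counting. It assumes bins is a list of ascending bin edges and follows the numpy.histogram convention: every bin is half-open [left, right) except the last, which also includes its right edge. Values outside the edges and nulls are dropped. Both conventions are assumptions, not documented behavior of get_column_hist.

```python
def column_hist(values, bins):
    """Hypothetical histogram: counts per bin, with `bins` as ascending edges."""
    counts = [0] * (len(bins) - 1)
    for v in values:
        if v is None:
            continue  # nulls are skipped
        for i in range(len(bins) - 1):
            is_last = i == len(bins) - 2
            # Half-open [left, right), except the last bin, which also includes its right edge.
            if bins[i] <= v < bins[i + 1] or (is_last and v == bins[-1]):
                counts[i] += 1
                break
    return counts
```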

get_column_count_in_range(self, column, min_val=None, max_val=None, strict_min=False, strict_max=True)

Returns: int

static _apply_dateutil_parse(column)
expect_column_values_to_be_in_set(self, column, value_set, mostly=None, parse_strings_as_datetimes=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect each column value to be in a given set.

For example:

# my_df.my_col = [1,2,2,3,3,3]
>>> my_df.expect_column_values_to_be_in_set(
...     "my_col",
...     [2,3]
... )
{
  "success": false,
  "result": {
    "unexpected_count": 1,
    "unexpected_percent": 16.66666666666666666,
    "unexpected_percent_nonmissing": 16.66666666666666666,
    "partial_unexpected_list": [
      1
    ]
  }
}

expect_column_values_to_be_in_set is a column_map_expectation.

Parameters
  • column (str) – The column name.

  • value_set (set-like) – A set of objects used for comparison.

Keyword Arguments
  • mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

  • parse_strings_as_datetimes (boolean or None) – If True, values provided in value_set will be parsed as datetimes before making comparisons.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_values_to_not_be_in_set(self, column, value_set, mostly=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect column entries to not be in the set.

For example:

# my_df.my_col = [1,2,2,3,3,3]
>>> my_df.expect_column_values_to_not_be_in_set(
...     "my_col",
...     [1,2]
... )
{
  "success": false,
  "result": {
    "unexpected_count": 3,
    "unexpected_percent": 50.0,
    "unexpected_percent_nonmissing": 50.0,
    "partial_unexpected_list": [
      1, 2, 2
    ]
  }
}

expect_column_values_to_not_be_in_set is a column_map_expectation.

Parameters
  • column (str) – The column name.

  • value_set (set-like) – A set of objects used for comparison.

Keyword Arguments

mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_values_to_be_between(self, column, min_value=None, max_value=None, strict_min=False, strict_max=False, parse_strings_as_datetimes=None, output_strftime_format=None, allow_cross_type_comparisons=None, mostly=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect column entries to be between a minimum value and a maximum value (inclusive).

expect_column_values_to_be_between is a column_map_expectation.

Parameters
  • column (str) – The column name.

  • min_value (comparable type or None) – The minimum value for a column entry.

  • max_value (comparable type or None) – The maximum value for a column entry.

Keyword Arguments
  • strict_min (boolean) – If True, values must be strictly larger than min_value, default=False

  • strict_max (boolean) – If True, values must be strictly smaller than max_value, default=False

  • allow_cross_type_comparisons (boolean or None) – If True, allow comparisons between types (e.g. integer and string). Otherwise, attempting such comparisons will raise an exception.

  • parse_strings_as_datetimes (boolean or None) – If True, parse min_value, max_value, and all non-null column values to datetimes before making comparisons.

  • output_strftime_format (str or None) – A valid strftime format for datetime output. Only used if parse_strings_as_datetimes=True.

  • mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

Notes

  • min_value and max_value are both inclusive unless strict_min or strict_max are set to True.

  • If min_value is None, then max_value is treated as an upper bound, and there is no minimum value checked.

  • If max_value is None, then min_value is treated as a lower bound, and there is no maximum value checked.
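The notes above amount to a per-value check, sketched here in plain Python. The function name is hypothetical. A column map expectation would apply an equivalent check to each value and then aggregate the results with mostly.

```python
def is_between(value, min_value=None, max_value=None, strict_min=False, strict_max=False):
    """Hypothetical per-value check following the bound and strictness rules."""
    # A missing bound is simply not checked (both None: nothing is checked here).
    if min_value is not None:
        if value < min_value or (strict_min and value == min_value):
            return False
    if max_value is not None:
        if value > max_value or (strict_max and value == max_value):
            return False
    return True
```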

expect_column_value_lengths_to_be_between(self, column, min_value=None, max_value=None, mostly=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect column entries to be strings with length between a minimum value and a maximum value (inclusive).

This expectation only works for string-type values. Invoking it on ints or floats will raise a TypeError.

expect_column_value_lengths_to_be_between is a column_map_expectation.

Parameters

column (str) – The column name.

Keyword Arguments
  • min_value (int or None) – The minimum value for a column entry length.

  • max_value (int or None) – The maximum value for a column entry length.

  • mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

Notes

  • min_value and max_value are both inclusive.

  • If min_value is None, then max_value is treated as an upper bound, and there is no minimum length checked.

  • If max_value is None, then min_value is treated as a lower bound, and there is no maximum length checked.

expect_column_values_to_be_unique(self, column, mostly=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect each column value to be unique.

This expectation detects duplicates. All duplicated values are counted as exceptions.

For example, [1, 2, 3, 3, 3] will return [3, 3, 3] in result.unexpected_list, with unexpected_percent = 60.0.
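The arithmetic in this example can be reproduced with a short sketch. Every occurrence of a value that appears more than once is unexpected, and the percentage is taken over all values. The helper name is hypothetical, and nulls are not handled.

```python
from collections import Counter


def duplicated_values(values):
    """Hypothetical helper: every occurrence of a repeated value is unexpected."""
    counts = Counter(values)
    unexpected = [v for v in values if counts[v] > 1]
    unexpected_percent = 100.0 * len(unexpected) / len(values) if values else 0.0
    return unexpected, unexpected_percent
```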

expect_column_values_to_be_unique is a column_map_expectation.

Parameters

column (str) – The column name.

Keyword Arguments

mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_value_lengths_to_equal(self, column, value, mostly=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect column entries to be strings with length equal to the provided value.

This expectation only works for string-type values. Invoking it on ints or floats will raise a TypeError.

expect_column_value_lengths_to_equal is a column_map_expectation.

Parameters
  • column (str) – The column name.

  • value (int or None) – The expected value for a column entry length.

Keyword Arguments

mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_values_to_match_strftime_format(self, column, strftime_format, mostly=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect column entries to be strings representing a date or time with a given format.

expect_column_values_to_match_strftime_format is a column_map_expectation.

Parameters
  • column (str) – The column name.

  • strftime_format (str) – A strftime format string to use for matching.

Keyword Arguments

mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_values_to_not_be_null(self, column, mostly=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect column values to not be null.

To be counted as an exception, values must be explicitly null or missing, such as a NULL in PostgreSQL or an np.NaN in pandas. Empty strings don’t count as null unless they have been coerced to a null type.

expect_column_values_to_not_be_null is a column_map_expectation.

Parameters

column (str) – The column name.

Keyword Arguments

mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_values_to_be_null(self, column, mostly=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect column values to be null.

expect_column_values_to_be_null is a column_map_expectation.

Parameters

column (str) – The column name.

Keyword Arguments

mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_values_to_match_json_schema(self, column, json_schema, mostly=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect column entries to be JSON objects matching a given JSON schema.

expect_column_values_to_match_json_schema is a column_map_expectation.

Parameters

  • column (str) – The column name.

  • json_schema – The JSON schema that each column entry should match.

Keyword Arguments

mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_values_to_be_json_parseable(self, column, mostly=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect column entries to be data written in JavaScript Object Notation.

expect_column_values_to_be_json_parseable is a column_map_expectation.

Parameters

column (str) – The column name.

Keyword Arguments

mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_values_to_be_of_type(self, column, type_, mostly=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect a column to contain values of a specified data type.

expect_column_values_to_be_of_type is a column_aggregate_expectation for typed-column backends, and also for PandasDataset where the column dtype and provided type_ are unambiguous constraints (any dtype except ‘object’ or dtype of ‘object’ with type_ specified as ‘object’).

For PandasDataset columns with dtype of ‘object’, expect_column_values_to_be_of_type is a column_map_expectation and will independently check each row’s type.

Parameters
  • column (str) – The column name.

  • type_ (str) – A string representing the data type that each column entry should have. Valid types are defined by the current backend implementation and are dynamically loaded. For example, valid types for PandasDataset include any numpy dtype values (such as ‘int64’) or native python types (such as ‘int’), whereas valid types for a SqlAlchemyDataset include types named by the current driver such as ‘INTEGER’ in most SQL dialects and ‘TEXT’ in dialects such as PostgreSQL. Valid types for SparkDFDataset include ‘StringType’, ‘BooleanType’ and other pyspark-defined type names.

Keyword Arguments

mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_values_to_be_in_type_list(self, column, type_list: List[str], mostly=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect a column to contain values from a specified type list.

expect_column_values_to_be_in_type_list is a column_aggregate_expectation for typed-column backends, and also for PandasDataset where the column dtype provides an unambiguous constraint (any dtype except ‘object’). For PandasDataset columns with dtype of ‘object’, expect_column_values_to_be_in_type_list is a column_map_expectation and will independently check each row’s type.

Parameters
  • column (str) – The column name.

  • type_list (list of str) – A list of strings representing the data types that column entries may have. Valid types are defined by the current backend implementation and are dynamically loaded. For example, valid types for PandasDataset include any numpy dtype values (such as ‘int64’) or native python types (such as ‘int’), whereas valid types for a SqlAlchemyDataset include types named by the current driver such as ‘INTEGER’ in most SQL dialects and ‘TEXT’ in dialects such as PostgreSQL. Valid types for SparkDFDataset include ‘StringType’, ‘BooleanType’ and other pyspark-defined type names.

Keyword Arguments

mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_values_to_match_regex(self, column, regex, mostly=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect column entries to be strings that match a given regular expression. Valid matches can be found anywhere in the string, for example “[at]+” will identify the following strings as expected: “cat”, “hat”, “aa”, “a”, and “t”, and the following strings as unexpected: “fish”, “dog”.
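The “found anywhere in the string” rule corresponds to a search rather than a full match. The sketch below uses Python's re.search to reproduce the examples above. The function name is hypothetical. A Spark backend would typically evaluate patterns with Java regular expressions (for example via rlike), whose syntax differs from Python's in some corners.

```python
import re


def matches_anywhere(value, regex):
    """True if `regex` matches any substring of `value` (search, not full match)."""
    return re.search(regex, value) is not None
```

By contrast, re.fullmatch("[at]+", "cat") returns None, because the leading “c” is outside the character class.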

expect_column_values_to_match_regex is a column_map_expectation.

Parameters
  • column (str) – The column name.

  • regex (str) – The regular expression the column entries should match.

Keyword Arguments

mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_values_to_not_match_regex(self, column, regex, mostly=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect column entries to be strings that do NOT match a given regular expression. The regex must not match any portion of the provided string. For example, “[at]+” would identify the following strings as expected: “fish”, “dog”, and the following as unexpected: “cat”, “hat”.

expect_column_values_to_not_match_regex is a column_map_expectation.

Parameters
  • column (str) – The column name.

  • regex (str) – The regular expression the column entries should NOT match.

Keyword Arguments

mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_values_to_match_regex_list(self, column, regex_list, match_on='any', mostly=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect the column entries to be strings that match either any of or all of a list of regular expressions, as selected by match_on. Matches can be anywhere in the string.

expect_column_values_to_match_regex_list is a column_map_expectation.
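The per-value effect of match_on can be sketched in plain Python. The helper matches_regex_list is hypothetical and only models a single value. It does not model the mostly threshold or the Spark execution path. It uses re.search so that a match can occur anywhere in the string.

```python
import re
from typing import List


def matches_regex_list(value: str, regex_list: List[str], match_on: str = "any") -> bool:
    """Hypothetical per-value rule for match_on='any' or match_on='all'."""
    if match_on not in ("any", "all"):
        raise ValueError("match_on must be 'any' or 'all'")
    # One boolean per pattern: does it match anywhere in the value?
    hits = [re.search(r, value) is not None for r in regex_list]
    return any(hits) if match_on == "any" else all(hits)


print(matches_regex_list("abc123", [r"\d+", r"^xyz"], match_on="any"))  # True
print(matches_regex_list("abc123", [r"\d+", r"^xyz"], match_on="all"))  # False
```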

Parameters
  • column (str) – The column name.

  • regex_list (list) – The list of regular expressions which the column entries should match

Keyword Arguments
  • match_on (str) – “any” or “all”. Use “any” if the value should match at least one regular expression in the list. Use “all” if it should match every regular expression in the list.

  • mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_values_to_not_match_regex_list(self, column, regex_list, mostly=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect the column entries to be strings that do not match any of a list of regular expressions. Matches can be anywhere in the string.

expect_column_values_to_not_match_regex_list is a column_map_expectation.
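A value meets this expectation only if none of the listed patterns matches anywhere in it. The plain-Python sketch below illustrates that per-value rule. The helper avoids_regex_list is hypothetical, and the mostly threshold is not modeled.

```python
import re
from typing import List


def avoids_regex_list(value: str, regex_list: List[str]) -> bool:
    """Hypothetical per-value rule: True only if no pattern matches anywhere."""
    return not any(re.search(r, value) for r in regex_list)


values = ["alpha", "beta", "gamma99"]
print([avoids_regex_list(v, [r"\d", r"^b"]) for v in values])  # [True, False, False]
```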

Parameters
  • column (str) – The column name.

  • regex_list (list) – The list of regular expressions which the column entries should not match

Keyword Arguments

mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_pair_values_to_be_equal(self, column_A, column_B, ignore_row_if='both_values_are_missing', result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect the values in column A to be equal to the corresponding values in column B.
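The row-wise comparison and the three ignore_row_if options can be sketched in plain Python. The function pair_equal_flags is hypothetical, and None stands in for a missing value. A row skipped by ignore_row_if is reported as None rather than True or False. Python's == is used for the comparison. Spark's SQL equality treats nulls differently, so the library's handling of missing values under "neither" may not match this sketch.

```python
from typing import Any, List, Optional, Tuple

IGNORE_OPTIONS = ("both_values_are_missing", "either_value_is_missing", "neither")


def pair_equal_flags(rows: List[Tuple[Any, Any]],
                     ignore_row_if: str = "both_values_are_missing") -> List[Optional[bool]]:
    """Hypothetical sketch: True/False per evaluated row, None for skipped rows."""
    if ignore_row_if not in IGNORE_OPTIONS:
        raise ValueError(f"unknown ignore_row_if: {ignore_row_if!r}")
    flags: List[Optional[bool]] = []
    for a, b in rows:
        a_missing, b_missing = a is None, b is None
        if ignore_row_if == "both_values_are_missing" and a_missing and b_missing:
            flags.append(None)
        elif ignore_row_if == "either_value_is_missing" and (a_missing or b_missing):
            flags.append(None)
        else:
            flags.append(a == b)  # plain Python equality, not Spark null semantics
    return flags


rows = [(1, 1), (2, 3), (None, None), (None, 4)]
print(pair_equal_flags(rows))                                           # [True, False, None, False]
print(pair_equal_flags(rows, ignore_row_if="either_value_is_missing"))  # [True, False, None, None]
```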

Parameters
  • column_A (str) – The first column name

  • column_B (str) – The second column name

Keyword Arguments

ignore_row_if (str) – “both_values_are_missing”, “either_value_is_missing”, “neither”

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_pair_values_A_to_be_greater_than_B(self, column_A, column_B, or_equal=None, parse_strings_as_datetimes=None, allow_cross_type_comparisons=None, ignore_row_if='both_values_are_missing', result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect values in column A to be greater than the corresponding values in column B.
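The effect of or_equal on a single row can be sketched in plain Python. The helper a_greater_than_b is hypothetical and ignores ignore_row_if and parse_strings_as_datetimes. The final lines show that Python 3 raises TypeError when asked to order an int against a str. This only loosely parallels the documented exception when allow_cross_type_comparisons is not set, and it is not the library's own error.

```python
from typing import Any, Optional


def a_greater_than_b(a: Any, b: Any, or_equal: Optional[bool] = None) -> bool:
    """Hypothetical per-row rule: a > b, or a >= b when or_equal is True."""
    return a >= b if or_equal else a > b


pairs = [(5, 3), (4, 4), (2, 7)]
print([a_greater_than_b(a, b) for a, b in pairs])                 # [True, False, False]
print([a_greater_than_b(a, b, or_equal=True) for a, b in pairs])  # [True, True, False]

# Mixed types cannot be ordered in Python 3.
try:
    a_greater_than_b(1, "a")
except TypeError as exc:
    print("TypeError:", exc)
```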

Parameters
  • column_A (str) – The first column name

  • column_B (str) – The second column name

  • or_equal (boolean or None) – If True, values in column A may also be equal to the values in column B, rather than strictly greater.

Keyword Arguments
  • allow_cross_type_comparisons (boolean or None) – If True, allow comparisons between types (e.g. integer and string). Otherwise, attempting such comparisons will raise an exception.

  • ignore_row_if (str) – “both_values_are_missing”, “either_value_is_missing”, “neither”

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_pair_values_to_be_in_set(self, column_A, column_B, value_pairs_set, ignore_row_if='both_values_are_missing', result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect paired values from columns A and B to belong to a set of valid pairs.
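The plain-Python sketch below shows that membership is checked on ordered (A, B) pairs. The helper pairs_in_set is hypothetical, and ignore_row_if is not modeled. The sketch converts value_pairs_set to a Python set of tuples so that each lookup is a hash check.

```python
from typing import Any, Iterable, List, Sequence, Set, Tuple


def pairs_in_set(rows: Iterable[Tuple[Any, Any]],
                 value_pairs_set: Iterable[Sequence[Any]]) -> List[bool]:
    """Hypothetical per-row rule: is the ordered pair (A, B) one of the valid pairs?"""
    valid: Set[Tuple[Any, ...]] = {tuple(p) for p in value_pairs_set}
    return [(a, b) in valid for a, b in rows]


valid_pairs = [("US", "USD"), ("DE", "EUR"), ("JP", "JPY")]
rows = [("US", "USD"), ("DE", "USD"), ("JP", "JPY")]
print(pairs_in_set(rows, valid_pairs))  # [True, False, True]
```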

Parameters
  • column_A (str) – The first column name

  • column_B (str) – The second column name

  • value_pairs_set (list of tuples) – All the valid pairs to be matched

Keyword Arguments

ignore_row_if (str) – “both_values_are_missing”, “either_value_is_missing”, “neither”

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_multicolumn_values_to_be_unique(self, column_list, mostly=None, ignore_row_if='all_values_are_missing', result_format=None, include_config=True, catch_exceptions=None, meta=None)

NOTE: This method is deprecated. Please use expect_select_column_values_to_be_unique_within_record instead.

Expect the values for each record to be unique across the columns listed. Note that records can be duplicated: identical rows are allowed, as long as each row's own values are distinct.

For example:

A  B  C  Result
1  1  2  Fail
1  2  3  Pass
8  2  7  Pass
1  2  3  Pass
4  4  4  Fail

Parameters

column_list (tuple or list) – The column names to evaluate

Keyword Arguments

ignore_row_if (str) – “all_values_are_missing”, “any_value_is_missing”, “never”

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_select_column_values_to_be_unique_within_record(self, column_list, mostly=None, ignore_row_if='all_values_are_missing', result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect the values for each record to be unique across the columns listed. Note that records can be duplicated: identical rows are allowed, as long as each row's own values are distinct.

For example:

A  B  C  Result
1  1  2  Fail
1  2  3  Pass
8  2  7  Pass
1  2  3  Pass
4  4  4  Fail

Parameters

column_list (tuple or list) – The column names to evaluate

Keyword Arguments

ignore_row_if (str) – “all_values_are_missing”, “any_value_is_missing”, “never”

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
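The per-record rule for expect_select_column_values_to_be_unique_within_record can be sketched in plain Python. The sketch reproduces the Pass/Fail column of the example table for this method. The helper unique_within_record is hypothetical. It assumes hashable values and does not model ignore_row_if or missing values. Each record is checked on its own, so two identical records can both pass.

```python
from typing import Any, Sequence


def unique_within_record(record: Sequence[Any]) -> bool:
    """Hypothetical rule: a record passes when no value repeats across its columns."""
    return len(set(record)) == len(record)


table = [(1, 1, 2), (1, 2, 3), (8, 2, 7), (1, 2, 3), (4, 4, 4)]
print(["Pass" if unique_within_record(r) else "Fail" for r in table])
# ['Fail', 'Pass', 'Pass', 'Pass', 'Fail']
```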

expect_compound_columns_to_be_unique(self, column_list, mostly=None, ignore_row_if='all_values_are_missing', result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect that the columns are unique together (e.g., a multi-column primary key). Note that all instances of any duplicate are considered failed, not only the repeats after the first occurrence.

For example:

A  B  C  Result
1  1  2  Fail
1  2  3  Pass
1  1  2  Fail
2  2  2  Pass
3  2  3  Pass

Parameters

column_list (tuple or list) – The column names to evaluate

Keyword Arguments

ignore_row_if (str) – “all_values_are_missing”, “any_value_is_missing”, “never”

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
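The rule for expect_compound_columns_to_be_unique can be sketched in plain Python by counting each combination of column values. A row fails if its combination appears more than once, so every copy of a duplicate fails. The helper compound_unique_flags is hypothetical and does not model ignore_row_if. The sketch reproduces the example table for this method. Unlike the within-record check, the row (2, 2, 2) passes here because only the combination across rows matters.

```python
from collections import Counter
from typing import Any, List, Tuple


def compound_unique_flags(rows: List[Tuple[Any, ...]]) -> List[bool]:
    """Hypothetical rule: True only for combinations that occur exactly once."""
    counts = Counter(rows)
    return [counts[r] == 1 for r in rows]


table = [(1, 1, 2), (1, 2, 3), (1, 1, 2), (2, 2, 2), (3, 2, 3)]
print(["Pass" if ok else "Fail" for ok in compound_unique_flags(table)])
# ['Fail', 'Pass', 'Fail', 'Pass', 'Pass']
```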

expect_column_values_to_be_increasing(self, column, strictly=False, mostly=None, parse_strings_as_datetimes=None, output_strftime_format=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect column values to be increasing.

By default, this expectation only works for numeric or datetime data. When parse_strings_as_datetimes=True, it can also parse strings to datetimes.

If strictly=True, then this expectation is only satisfied if each consecutive value is strictly increasing; equal values are treated as failures.

expect_column_values_to_be_increasing is a column_map_expectation.
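The consecutive-value comparison and the strictly flag can be sketched in plain Python. The helper increasing_flags is hypothetical. It compares values in the order they are given, and it assumes that the first value, which has no predecessor, counts as passing. Neither choice is taken from the library's implementation.

```python
from typing import List, Sequence


def increasing_flags(values: Sequence[float], strictly: bool = False) -> List[bool]:
    """Hypothetical sketch: flag each value against its predecessor, in the order given."""
    # Assumption: the first value has no predecessor and is treated as passing.
    flags = [True] * min(1, len(values))
    for prev, cur in zip(values, values[1:]):
        flags.append(cur > prev if strictly else cur >= prev)
    return flags


values = [1, 2, 2, 5]
print(increasing_flags(values))                 # [True, True, True, True]
print(increasing_flags(values, strictly=True))  # [True, True, False, True]
```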

Parameters

column (str) – The column name.

Keyword Arguments
  • strictly (boolean or None) – If True, values must be strictly greater than previous values.

  • parse_strings_as_datetimes (boolean or None) – If True, convert all non-null column values to datetimes before making comparisons.

  • mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_column_values_to_be_decreasing(self, column, strictly=False, mostly=None, parse_strings_as_datetimes=None, output_strftime_format=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect column values to be decreasing.

By default, this expectation only works for numeric or datetime data. When parse_strings_as_datetimes=True, it can also parse strings to datetimes.

If strictly=True, then this expectation is only satisfied if each consecutive value is strictly decreasing; equal values are treated as failures.

expect_column_values_to_be_decreasing is a column_map_expectation.
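The plain-Python sketch below illustrates the effect of parse_strings_as_datetimes: date strings are converted to datetimes before they are compared. It uses datetime.strptime with a fixed format as a stand-in. The library's own date parsing may accept more formats. The helper decreasing_flags_from_strings and its fmt parameter are hypothetical. As in the increasing case, the first value is assumed to pass.

```python
from datetime import datetime
from typing import List


def decreasing_flags_from_strings(values: List[str], fmt: str = "%Y-%m-%d",
                                  strictly: bool = False) -> List[bool]:
    """Hypothetical sketch: parse strings to datetimes, then compare consecutive values."""
    parsed = [datetime.strptime(v, fmt) for v in values]
    # Assumption: the first value has no predecessor and is treated as passing.
    flags = [True] * min(1, len(parsed))
    for prev, cur in zip(parsed, parsed[1:]):
        flags.append(cur < prev if strictly else cur <= prev)
    return flags


dates = ["2021-03-01", "2021-02-15", "2021-02-15", "2021-01-31"]
print(decreasing_flags_from_strings(dates))                 # [True, True, True, True]
print(decreasing_flags_from_strings(dates, strictly=True))  # [True, True, False, True]
```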

Parameters

column (str) – The column name.

Keyword Arguments
  • strictly (boolean or None) – If True, values must be strictly less than previous values.

  • parse_strings_as_datetimes (boolean or None) – If True, convert all non-null column values to datetimes before making comparisons.

  • mostly (None or a float between 0 and 1) – Return “success”: True if at least mostly fraction of values match the expectation. For more detail, see mostly.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

An ExpectationSuiteValidationResult

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_multicolumn_sum_to_equal(self, column_list, sum_total, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Multi-Column Map Expectation

Expect that, for each row, the sum of the values in the columns specified in column_list equals the given value, sum_total.
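The per-row check can be sketched in plain Python. Rows are modeled as dicts keyed by column name, and only the columns in column_list are summed. The helper rows_sum_to is hypothetical. It uses exact equality, so floating-point sums may need a tolerance that this sketch does not provide.

```python
from typing import Dict, List


def rows_sum_to(rows: List[Dict[str, int]], column_list: List[str], sum_total: int) -> List[bool]:
    """Hypothetical per-row rule: do the listed columns add up to sum_total?"""
    return [sum(row[c] for c in column_list) == sum_total for row in rows]


rows = [{"a": 1, "b": 2, "c": 7}, {"a": 3, "b": 0, "c": 9}, {"a": 2, "b": 2, "c": 1}]
print(rows_sum_to(rows, ["a", "b"], 3))  # [True, True, False]
```

Column c is ignored because it is not listed in column_list.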

Parameters
  • column_list (List[str]) – Set of columns to be checked

  • sum_total (int) – The expected sum of the listed columns for each row.