great_expectations.data_asset

Package Contents

Classes

DataAsset(*args, **kwargs)

FileDataAsset(file_path=None, *args, **kwargs)

FileDataAsset instantiates the great_expectations Expectations API as a subclass of a python file object.

class great_expectations.data_asset.DataAsset(*args, **kwargs)
_data_asset_type = DataAsset
list_available_expectation_types(self)
autoinspect(self, profiler)

Deprecated: use profile instead.

Use the provided profiler to evaluate this data_asset and assign the resulting expectation suite as its own.

Parameters

profiler – The profiler to use

Returns

tuple(expectation_suite, validation_results)

profile(self, profiler, profiler_configuration=None)

Use the provided profiler to evaluate this data_asset and assign the resulting expectation suite as its own.

Parameters
  • profiler – The profiler to use

  • profiler_configuration – Optional profiler configuration dict

Returns

tuple(expectation_suite, validation_results)

edit_expectation_suite(self)
classmethod expectation(cls, method_arg_names)

Manages configuration and running of expectation objects.

Expectation builds and saves a new expectation configuration to the DataAsset object. It is the core decorator used by great expectations to manage expectation configurations.

Parameters

method_arg_names (List) – An ordered list of the arguments used by the method implementing the expectation (typically the result of inspection). Positional arguments are explicitly mapped to keyword arguments when the expectation is run.

Notes

Intermediate decorators that call the core @expectation decorator will most likely need to pass their decorated methods’ signature up to the expectation decorator. For example, the MetaPandasDataset column_map_expectation decorator relies on the DataAsset expectation decorator, but will pass through the signature from the implementing method.

@expectation intercepts and takes action based on the following parameters:
  • include_config (boolean or None) : If True, then include the generated expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) : If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • result_format (str or None) : Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • meta (dict or None): A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
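Example (illustrative sketch): the behavior described above can be approximated in plain Python. This is a simplified stand-in and not the library's implementation. The names sketch_expectation, ToyAsset, expect_value_to_be_positive, and the exception_message key are hypothetical, and the BASIC fallback for result_format is an assumption.

```python
from functools import wraps

# Parameters the decorator intercepts instead of passing to the method body.
INTERCEPTED = ("include_config", "catch_exceptions", "result_format", "meta")


def sketch_expectation(method_arg_names):
    """Hypothetical, simplified stand-in for an expectation-style decorator."""

    def outer(func):
        @wraps(func)
        def wrapper(self, *args, **kwargs):
            # Map positional arguments onto their names (skipping "self").
            names = [n for n in method_arg_names if n != "self"]
            all_args = dict(zip(names, args))
            all_args.update(kwargs)

            # Pull out the intercepted parameters.
            options = {k: all_args.pop(k, None) for k in INTERCEPTED}
            if options["result_format"] is None:
                options["result_format"] = "BASIC"  # assumed default

            config = {"expectation_type": func.__name__, "kwargs": dict(all_args)}
            try:
                result = dict(func(self, **all_args))
            except Exception as exc:
                if not options["catch_exceptions"]:
                    raise
                # Hypothetical key name for the captured exception.
                result = {"success": False, "exception_message": str(exc)}

            if options["include_config"]:
                result["expectation_config"] = config
            if options["meta"] is not None:
                result["meta"] = options["meta"]
            # Save the new expectation configuration on the asset.
            self.expectations.append(config)
            return result

        return wrapper

    return outer


class ToyAsset:
    def __init__(self):
        self.expectations = []

    @sketch_expectation(["self", "value", "result_format", "include_config",
                         "catch_exceptions", "meta"])
    def expect_value_to_be_positive(self, value):
        return {"success": value > 0}


asset = ToyAsset()
result = asset.expect_value_to_be_positive(3, include_config=True, meta={"owner": "docs"})
```

Here the positional argument 3 is mapped to the keyword value, while include_config and meta are consumed by the wrapper and never reach the method body.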

_initialize_expectations(self, expectation_suite: Union[Dict, ExpectationSuite, None] = None, expectation_suite_name: Optional[str] = None)

Instantiates _expectation_suite as empty by default or with a specified expectation config. In addition, this always sets the default_expectation_args to:

include_config: False, catch_exceptions: False, result_format: 'BASIC'

By default, initializes data_asset_type to the name of the implementing class, but subclasses that have interoperable semantics (e.g. Dataset) may override that parameter to clarify their interoperability.

Parameters
  • expectation_suite (json) – A JSON-serializable expectation config. If None, creates a default _expectation_suite with an empty list of expectations and the key data_asset_name set to data_asset_name.

  • expectation_suite_name (string) – The name to assign to the expectation_suite.expectation_suite_name

Returns

None

append_expectation(self, expectation_config)

This method is a thin wrapper for ExpectationSuite.append_expectation

find_expectation_indexes(self, expectation_configuration: ExpectationConfiguration, match_type: str = 'domain')

This method is a thin wrapper for ExpectationSuite.find_expectation_indexes

find_expectations(self, expectation_configuration: ExpectationConfiguration, match_type: str = 'domain')

This method is a thin wrapper for ExpectationSuite.find_expectations()

remove_expectation(self, expectation_configuration: ExpectationConfiguration, match_type: str = 'domain', remove_multiple_matches: bool = False)

This method is a thin wrapper for ExpectationSuite.remove()

set_config_value(self, key, value)
get_config_value(self, key)
property batch_kwargs(self)
property batch_id(self)
property batch_markers(self)
property batch_parameters(self)
discard_failing_expectations(self)
get_default_expectation_arguments(self)

Fetch default expectation arguments for this data_asset

Returns

A dictionary containing all the current default expectation arguments for a data_asset

Ex:

{
    "include_config" : True,
    "catch_exceptions" : False,
    "result_format" : 'BASIC'
}

See also

set_default_expectation_argument

set_default_expectation_argument(self, argument, value)

Set a default expectation argument for this data_asset

Parameters
  • argument (string) – The argument to be replaced

  • value – The new value to use for the argument

Returns

None

See also

get_default_expectation_arguments

get_expectations_config(self, discard_failed_expectations=True, discard_result_format_kwargs=True, discard_include_config_kwargs=True, discard_catch_exceptions_kwargs=True, suppress_warnings=False)
get_expectation_suite(self, discard_failed_expectations=True, discard_result_format_kwargs=True, discard_include_config_kwargs=True, discard_catch_exceptions_kwargs=True, suppress_warnings=False, suppress_logging=False)

Returns _expectation_config as a JSON object, performing some cleaning along the way.

Parameters
  • discard_failed_expectations (boolean) – Only include expectations with success_on_last_run=True in the exported config. Defaults to True.

  • discard_result_format_kwargs (boolean) – In returned expectation objects, suppress the result_format parameter. Defaults to True.

  • discard_include_config_kwargs (boolean) – In returned expectation objects, suppress the include_config parameter. Defaults to True.

  • discard_catch_exceptions_kwargs (boolean) – In returned expectation objects, suppress the catch_exceptions parameter. Defaults to True.

  • suppress_warnings (boolean) – If true, do not include warnings in logging information about the operation.

  • suppress_logging (boolean) – If true, do not create a log entry (useful when using get_expectation_suite programmatically)

Returns

An expectation suite.

Note

get_expectation_suite does not affect the underlying expectation suite at all. The returned suite is a copy of _expectation_suite, not the original object.
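Example (illustrative sketch): the copy-and-clean behavior of the discard_* flags can be pictured as below. The suite is modeled as a plain dict whose layout, including the success_on_last_run flag on each entry, is an assumption made for illustration; export_suite is a hypothetical helper, not part of the library.

```python
import copy


def export_suite(suite,
                 discard_failed_expectations=True,
                 discard_result_format_kwargs=True,
                 discard_include_config_kwargs=True,
                 discard_catch_exceptions_kwargs=True):
    """Return a cleaned deep copy of a suite-like dict (hypothetical layout)."""
    exported = copy.deepcopy(suite)  # the caller's suite is never modified
    kept = []
    for expectation in exported["expectations"]:
        # Assumption: each entry records whether it passed on its last run.
        if discard_failed_expectations and expectation.get("success_on_last_run") is False:
            continue
        kwargs = expectation["kwargs"]
        if discard_result_format_kwargs:
            kwargs.pop("result_format", None)
        if discard_include_config_kwargs:
            kwargs.pop("include_config", None)
        if discard_catch_exceptions_kwargs:
            kwargs.pop("catch_exceptions", None)
        kept.append(expectation)
    exported["expectations"] = kept
    return exported


suite = {
    "expectation_suite_name": "default",
    "expectations": [
        {"expectation_type": "expect_file_to_exist",
         "kwargs": {"result_format": "BASIC"},
         "success_on_last_run": True},
        {"expectation_type": "expect_file_size_to_be_between",
         "kwargs": {"minsize": 10, "catch_exceptions": True},
         "success_on_last_run": False},
    ],
}
exported = export_suite(suite)
```

The failing expectation and the result_format kwarg are absent from exported, while suite itself still holds both entries unchanged.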

save_expectation_suite(self, filepath=None, discard_failed_expectations=True, discard_result_format_kwargs=True, discard_include_config_kwargs=True, discard_catch_exceptions_kwargs=True, suppress_warnings=False)

Writes _expectation_config to a JSON file.

Writes the DataAsset’s expectation config to the specified JSON filepath. Failing expectations can be excluded from the JSON expectations config with discard_failed_expectations. The kwarg key-value pairs result_format, include_config, and catch_exceptions are optionally excluded from the JSON expectations config.

Parameters
  • filepath (string) – The location and name to write the JSON config file to.

  • discard_failed_expectations (boolean) – If True, excludes expectations that do not return success = True. If False, all expectations are written to the JSON config file.

  • discard_result_format_kwargs (boolean) – If True, the result_format attribute for each expectation is not written to the JSON config file.

  • discard_include_config_kwargs (boolean) – If True, the include_config attribute for each expectation is not written to the JSON config file.

  • discard_catch_exceptions_kwargs (boolean) – If True, the catch_exceptions attribute for each expectation is not written to the JSON config file.

  • suppress_warnings (boolean) – If True, all warnings raised by Great Expectations as a result of dropped expectations are suppressed.

validate(self, expectation_suite=None, run_id=None, data_context=None, evaluation_parameters=None, catch_exceptions=True, result_format=None, only_return_failures=False, run_name=None, run_time=None)

Generates a JSON-formatted report describing the outcome of all expectations.

Use the default expectation_suite=None to validate the expectations config associated with the DataAsset.

Parameters
  • expectation_suite (json or None) – If None, uses the expectations config generated with the DataAsset during the current session. If a JSON file, validates those expectations.

  • run_name (str) – Used to identify this validation result as part of a collection of validations. See DataContext for more information.

  • data_context (DataContext) – A datacontext object to use as part of validation for binding evaluation parameters and registering validation results.

  • evaluation_parameters (dict or None) – If None, uses the evaluation_parameters from the expectation_suite provided or as part of the data_asset. If a dict, uses the evaluation parameters in the dictionary.

  • catch_exceptions (boolean) – If True, exceptions raised by tests will not end validation and will be described in the returned report.

  • result_format (string or None) – If None, uses the default value ('BASIC' or as specified). If a string, the returned expectation output follows the specified format ('BOOLEAN_ONLY', 'BASIC', etc.).

  • only_return_failures (boolean) – If True, expectation results are only returned when success = False

Returns

A JSON-formatted dictionary containing a list of the validation results. An example of the returned format:

{
  "results": [
    {
      "unexpected_list": [unexpected_value_1, unexpected_value_2],
      "expectation_type": "expect_*",
      "kwargs": {
        "column": "Column_Name",
        "output_format": "SUMMARY"
      },
      "success": true,
      "raised_exception": false,
      "exception_traceback": null
    },
    {
      ... (Second expectation results)
    },
    ... (More expectations results)
  ],
  "success": true,
  "statistics": {
    "evaluated_expectations": n,
    "successful_expectations": m,
    "unsuccessful_expectations": n - m,
    "success_percent": m / n
  }
}

Notes

A warning is issued if the configuration object was built with a different version of great expectations than the current environment, or if no version was found in the configuration file.

Raises

AttributeError - if 'catch_exceptions'=None and an expectation throws an AttributeError
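Example (illustrative sketch): the statistics block in the returned report follows directly from the per-expectation results, using the formulas shown in the example format above. summarize_results is a hypothetical helper. Treating the overall success as "every expectation succeeded" and returning None for success_percent when nothing was evaluated are assumptions.

```python
def summarize_results(results):
    """Build the top-level success flag and statistics from a list of results."""
    evaluated = len(results)                                   # n
    successful = sum(1 for r in results if r["success"])       # m
    return {
        "results": results,
        # Assumption: the report succeeds only if every expectation succeeded.
        "success": successful == evaluated,
        "statistics": {
            "evaluated_expectations": evaluated,
            "successful_expectations": successful,
            "unsuccessful_expectations": evaluated - successful,
            # Assumption: undefined (None) when no expectations were evaluated.
            "success_percent": successful / evaluated if evaluated else None,
        },
    }


report = summarize_results(
    [{"success": True}, {"success": False}, {"success": True}, {"success": True}]
)
```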

get_evaluation_parameter(self, parameter_name, default_value=None)

Get an evaluation parameter value that has been stored in meta.

Parameters
  • parameter_name (string) – The name of the parameter to retrieve.

  • default_value (any) – The default value to be returned if the parameter is not found.

Returns

The current value of the evaluation parameter.

set_evaluation_parameter(self, parameter_name, parameter_value)

Provide a value to be stored in the data_asset evaluation_parameters object and used to evaluate parameterized expectations.

Parameters
  • parameter_name (string) – The name of the kwarg to be replaced at evaluation time

  • parameter_value (any) – The value to be used
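Example (illustrative sketch): the getter and setter pair behaves like a keyed store with a fallback value. ParameterStore and the parameter name upstream_row_count are hypothetical; where the library actually keeps these values is not modeled here.

```python
class ParameterStore:
    """Hypothetical holder for evaluation parameters, keyed by name."""

    def __init__(self):
        self._evaluation_parameters = {}

    def set_evaluation_parameter(self, parameter_name, parameter_value):
        # Store (or overwrite) the value under the given name.
        self._evaluation_parameters[parameter_name] = parameter_value

    def get_evaluation_parameter(self, parameter_name, default_value=None):
        # Fall back to default_value when the parameter was never set.
        return self._evaluation_parameters.get(parameter_name, default_value)


store = ParameterStore()
store.set_evaluation_parameter("upstream_row_count", 1000)
found = store.get_evaluation_parameter("upstream_row_count")
missing = store.get_evaluation_parameter("not_set", default_value=0)
```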

add_citation(self, comment, batch_kwargs=None, batch_markers=None, batch_parameters=None, citation_date=None)
property expectation_suite_name(self)

Gets the current expectation_suite name of this data_asset as stored in the expectations configuration.

_format_map_output(self, result_format, success, element_count, nonnull_count, unexpected_count, unexpected_list, unexpected_index_list)

Helper function to construct expectation result objects for map_expectations (such as column_map_expectation and file_lines_map_expectation).

Expectations support four result_formats: BOOLEAN_ONLY, BASIC, SUMMARY, and COMPLETE. In each case, the object returned has a different set of populated fields. See result_format for more information.

This function handles the logic for mapping those fields for column_map_expectations.

_calc_map_expectation_success(self, success_count, nonnull_count, mostly)

Calculate success and percent_success for column_map_expectations

Parameters
  • success_count (int) – The number of successful values in the column

  • nonnull_count (int) – The number of nonnull values in the column

  • mostly (float or None) – A value between 0 and 1 (or None), indicating the fraction of successes required to pass the expectation as a whole. If mostly=None, then all values must succeed in order for the expectation as a whole to succeed.

Returns

success (boolean), percent_success (float)
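Example (illustrative sketch): the calculation described above, written as a standalone function. This is a sketch of the documented semantics rather than the library's code. Returning success=True and percent_success=None when there are no non-null values is an assumption.

```python
def calc_map_expectation_success(success_count, nonnull_count, mostly=None):
    """Return (success, percent_success) for a map-style expectation."""
    if nonnull_count > 0:
        percent_success = success_count / nonnull_count
        if mostly is not None:
            # Pass when at least the required fraction of values succeeded.
            success = percent_success >= mostly
        else:
            # Without mostly, every non-null value must succeed.
            success = success_count == nonnull_count
    else:
        # Assumption: with no non-null values there is nothing to fail.
        success = True
        percent_success = None
    return success, percent_success


with_mostly = calc_map_expectation_success(9, 10, mostly=0.9)
without_mostly = calc_map_expectation_success(9, 10)
empty = calc_map_expectation_success(0, 0)
```

With 9 of 10 values succeeding, the expectation passes at mostly=0.9 but fails when mostly is None.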

test_expectation_function(self, function, *args, **kwargs)

Test a generic expectation function

Parameters
  • function (func) – The function to be tested. (Must be a valid expectation function.)

  • *args – Positional arguments to be passed to the function

  • **kwargs – Keyword arguments to be passed to the function

Returns

A JSON-serializable expectation result object.

Notes

This function is a thin layer to allow quick testing of new expectation functions, without having to define custom classes, etc. To use developed expectations from the command-line tool, you will still need to define custom classes, etc.

Check out How to create custom Expectations for more information.

class great_expectations.data_asset.FileDataAsset(file_path=None, *args, **kwargs)

Bases: great_expectations.data_asset.file_data_asset.MetaFileDataAsset

FileDataAsset instantiates the great_expectations Expectations API as a subclass of a python file object. For the full API reference, please see DataAsset.

_data_asset_type = FileDataAsset
expect_file_line_regex_match_count_to_be_between(self, regex, expected_min_count=0, expected_max_count=None, skip=None, mostly=None, null_lines_regex='^\s*$', result_format=None, include_config=True, catch_exceptions=None, meta=None, _lines=None)

Expect the number of times a regular expression appears on each line of a file to be between a maximum and minimum value.

Parameters
  • regex – A string that can be compiled as valid regular expression to match

  • expected_min_count (None or nonnegative integer) – Specifies the minimum number of times regex is expected to appear on each line of the file

  • expected_max_count (None or nonnegative integer) – Specifies the maximum number of times regex is expected to appear on each line of the file

Keyword Arguments
  • skip (None or nonnegative integer) – Number of lines at the start of the file to skip before assessing expectations

  • mostly (None or number between 0 and 1) – The fraction of lines that must meet the expectation for it to succeed as a whole. If at least this fraction of lines meet the criteria, the method returns success even if not every line does.

  • null_lines_regex (valid regular expression or None) – If not None, a regex identifying lines to treat as null and skip. Defaults to empty or whitespace-only lines.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

  • _lines (list) – The lines over which to operate (provided by the file_lines_map_expectation decorator)

Returns

A JSON-serializable expectation result object.

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
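Example (illustrative sketch): the per-line check can be pictured as counting regex matches on each non-null line and comparing the count with the bounds. line_match_counts_between is a hypothetical standalone function operating on a list of lines, not the library method. Treating the bounds as inclusive and returning the list of unexpected lines are assumptions.

```python
import re


def line_match_counts_between(lines, regex, expected_min_count=0,
                              expected_max_count=None, skip=None, mostly=None,
                              null_lines_regex=r"^\s*$"):
    """Return (success, unexpected_lines) for a per-line match-count check."""
    pattern = re.compile(regex)
    null_pattern = re.compile(null_lines_regex) if null_lines_regex is not None else None

    # Drop the leading lines, then the lines considered null.
    lines = lines[skip:] if skip else lines
    nonnull = [ln for ln in lines
               if null_pattern is None or not null_pattern.match(ln)]

    def in_range(count):
        # Assumption: both bounds are inclusive; None means unbounded.
        if expected_min_count is not None and count < expected_min_count:
            return False
        if expected_max_count is not None and count > expected_max_count:
            return False
        return True

    unexpected = [ln for ln in nonnull if not in_range(len(pattern.findall(ln)))]
    if not nonnull:
        return True, unexpected
    fraction_ok = (len(nonnull) - len(unexpected)) / len(nonnull)
    success = fraction_ok >= mostly if mostly is not None else not unexpected
    return success, unexpected


lines = ["id,name", "1,a", "", "2,b,c", "3,d"]
strict_success, strict_unexpected = line_match_counts_between(
    lines, ",", expected_min_count=1, expected_max_count=1, skip=1)
lenient_success, _ = line_match_counts_between(
    lines, ",", expected_min_count=1, expected_max_count=1, skip=1, mostly=0.6)
```

After skipping the header and the empty line, two of the three remaining lines contain exactly one comma, so the check fails strictly but passes with mostly=0.6.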

expect_file_line_regex_match_count_to_equal(self, regex, expected_count=0, skip=None, mostly=None, nonnull_lines_regex='^\s*$', result_format=None, include_config=True, catch_exceptions=None, meta=None, _lines=None)

Expect the number of times a regular expression appears on each line of a file to equal the expected count.

Parameters
  • regex – A string that can be compiled as valid regular expression to match

  • expected_count (None or nonnegative integer) – Specifies the number of times regex is expected to appear on each line of the file

Keyword Arguments
  • skip (None or nonnegative integer) – Number of lines at the start of the file to skip before assessing expectations

  • mostly (None or number between 0 and 1) – The fraction of lines that must meet the expectation for it to succeed as a whole. If at least this fraction of lines meet the criteria, the method returns success even if not every line does.

  • nonnull_lines_regex (valid regular expression or None) – If not None, a regex identifying lines to treat as null and skip. Defaults to empty or whitespace-only lines.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

  • _lines (list) – The lines over which to operate (provided by the file_lines_map_expectation decorator)

Returns

A JSON-serializable expectation result object.

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_file_hash_to_equal(self, value, hash_alg='md5', result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect computed file hash to equal some given value.

Parameters

value – A string to compare with the computed hash value

Keyword Arguments
  • hash_alg (string) – Indicates the hash algorithm to use

  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

A JSON-serializable expectation result object.

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
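Example (illustrative sketch): a hash comparison of this kind can be written with the standard hashlib module. file_hash_equals is a hypothetical standalone function. Comparing against the hexadecimal digest is an assumption, and the demonstration uses sha256 rather than the default md5 purely for portability.

```python
import hashlib
import os
import tempfile


def file_hash_equals(path, value, hash_alg="md5"):
    """Return True if the file's hex digest under hash_alg equals value."""
    hasher = hashlib.new(hash_alg)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    # Assumption: value is a hexadecimal digest string.
    return hasher.hexdigest() == value


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "wb") as handle:
        handle.write(b"hello\n")
    expected = hashlib.sha256(b"hello\n").hexdigest()
    matches = file_hash_equals(path, expected, hash_alg="sha256")
    mismatch = file_hash_equals(path, "0" * 64, hash_alg="sha256")
```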

expect_file_size_to_be_between(self, minsize=0, maxsize=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect file size to be between a user specified maxsize and minsize.

Parameters
  • minsize (integer) – minimum expected file size

  • maxsize (integer) – maximum expected file size

Keyword Arguments
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

A JSON-serializable expectation result object.

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
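Example (illustrative sketch): a size check of this kind reduces to comparing os.path.getsize against the bounds. file_size_between is a hypothetical standalone function. Measuring in bytes and treating both bounds as inclusive are assumptions.

```python
import os
import tempfile


def file_size_between(path, minsize=0, maxsize=None):
    """Return True if the file size lies within [minsize, maxsize]."""
    size = os.path.getsize(path)  # assumption: size is measured in bytes
    if minsize is not None and size < minsize:
        return False
    if maxsize is not None and size > maxsize:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as handle:
        handle.write(b"x" * 100)
    in_range = file_size_between(path, minsize=50, maxsize=200)
    too_small = file_size_between(path, minsize=101)
    no_upper_bound = file_size_between(path, minsize=0)
```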

expect_file_to_exist(self, filepath=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Checks whether a file specified by the user actually exists.

Parameters

filepath (str or None) – The filepath to evaluate. If None, checks the currently configured path of this FileDataAsset.

Keyword Arguments
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

A JSON-serializable expectation result object.

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_file_to_have_valid_table_header(self, regex, skip=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Checks whether a file has a line with unique delimited values. Such a line may be used as a table header.

Keyword Arguments
  • skip (nonnegative integer) – Number of lines at the start of the file to skip before assessing expectations

  • regex (string) – A string that can be compiled as valid regular expression. Used to specify the elements of the table header (the column headers)

  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

A JSON-serializable expectation result object.

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
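Example (illustrative sketch): one way to test for a line of unique delimited values is to split the candidate line and compare the number of distinct fields with the total. has_valid_table_header is a hypothetical standalone function. Using regex as the delimiter pattern and taking the first line after the skipped ones as the header are assumptions.

```python
import re


def has_valid_table_header(lines, regex, skip=None):
    """Return True if the header line splits into unique values."""
    # Assumption: the header is the first line after the skipped ones.
    header_line = lines[skip or 0].strip()
    # Assumption: regex describes the delimiter between column names.
    names = re.split(regex, header_line)
    return len(set(names)) == len(names)


unique_header = has_valid_table_header(["# export", "id,name,age"], ",", skip=1)
duplicate_header = has_valid_table_header(["id,name,id"], ",")
```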

expect_file_to_be_valid_json(self, schema=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)
Parameters
  • schema (string) – Optional JSON schema file against which the JSON data file is validated

  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

A JSON-serializable expectation result object.

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
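Example (illustrative sketch): the schema-free case amounts to attempting to parse the file with the standard json module. is_valid_json is a hypothetical standalone function. Validation against a schema is deliberately left out of this sketch because it needs a separate schema-validation library.

```python
import json
import os
import tempfile


def is_valid_json(path):
    """Return True if the file at path parses as JSON (no schema check)."""
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    except ValueError:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError.
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.json")
    bad_path = os.path.join(tmp, "bad.json")
    with open(good_path, "w", encoding="utf-8") as handle:
        handle.write('{"a": 1}')
    with open(bad_path, "w", encoding="utf-8") as handle:
        handle.write('{"a": ')
    good = is_valid_json(good_path)
    bad = is_valid_json(bad_path)
```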