great_expectations.data_asset.file_data_asset

Module Contents

Classes

MetaFileDataAsset(*args, **kwargs)

MetaFileDataAsset is a thin layer above FileDataAsset.

FileDataAsset(file_path=None, *args, **kwargs)

FileDataAsset instantiates the great_expectations Expectations API as a subclass of a Python file object.

class great_expectations.data_asset.file_data_asset.MetaFileDataAsset(*args, **kwargs)

Bases: great_expectations.data_asset.data_asset.DataAsset

MetaFileDataAsset is a thin layer above FileDataAsset. This two-layer inheritance is required to make @classmethod decorators work. In practice, MetaFileDataAsset implements expectation decorators, such as file_lines_map_expectation, and FileDataAsset implements the expectation methods themselves.

classmethod file_lines_map_expectation(cls, func)

Constructs an expectation using file lines map semantics. The file_lines_map_expectation decorator handles the boilerplate surrounding the common pattern of evaluating the truthiness of some condition on a line-by-line basis in a file.

Parameters

func (function) – The function implementing an expectation that will be applied line by line across a file. The function should take a file and return information about how many lines met expectations.

Notes

Users can specify a skip value k that causes the expectation function to disregard the first k lines of the file.

file_lines_map_expectation adds a kwarg _lines to the called function, containing the non-null lines to process.

null_lines_regex defines the regex used to identify null lines to skip; the default can be overridden.

See also

expect_file_line_regex_match_count_to_be_between for an example of a file_lines_map_expectation
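The behaviour described in the notes above (skipping the first k lines, dropping null lines, passing the rest as _lines) can be illustrated with a minimal standalone sketch. This is not the library's implementation: the names lines_map_expectation and expect_line_to_start_with_digit, the per-line boolean return convention, and the shape of the result dictionary are all assumptions made for illustration.

```python
import re
from functools import wraps


def lines_map_expectation(func):
    """Hypothetical stand-in for a file-lines-map decorator (not library code)."""

    @wraps(func)
    def wrapper(path, *args, skip=None, mostly=None,
                null_lines_regex=r"^\s*$", **kwargs):
        with open(path) as f:
            lines = f.readlines()
        # Disregard the first `skip` lines of the file.
        lines = lines[(skip or 0):]
        # Drop lines treated as null, unless the regex is disabled with None.
        if null_lines_regex is not None:
            null_re = re.compile(null_lines_regex)
            lines = [ln for ln in lines if not null_re.match(ln)]
        # Assumption: the wrapped function returns one boolean per line.
        flags = func(*args, _lines=lines, **kwargs)
        matched = sum(flags)
        fraction = matched / len(lines) if lines else 1.0
        threshold = 1.0 if mostly is None else mostly
        return {
            "success": fraction >= threshold,
            "element_count": len(lines),
            "unexpected_count": len(lines) - matched,
        }

    return wrapper


@lines_map_expectation
def expect_line_to_start_with_digit(_lines=None):
    # Only the per-line condition lives here; the decorator does the rest.
    return [bool(re.match(r"\d", ln)) for ln in _lines]
```

The point of the pattern is that the decorated function contains only the per-line condition, while file reading, skipping, and null-line filtering live in one place.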

class great_expectations.data_asset.file_data_asset.FileDataAsset(file_path=None, *args, **kwargs)

Bases: great_expectations.data_asset.file_data_asset.MetaFileDataAsset

FileDataAsset instantiates the great_expectations Expectations API as a subclass of a Python file object. For the full API reference, see DataAsset.

_data_asset_type = FileDataAsset

expect_file_line_regex_match_count_to_be_between(self, regex, expected_min_count=0, expected_max_count=None, skip=None, mostly=None, null_lines_regex='^\s*$', result_format=None, include_config=True, catch_exceptions=None, meta=None, _lines=None)

Expect the number of times a regular expression appears on each line of a file to be between a maximum and minimum value.

Parameters
  • regex – A string that can be compiled as a valid regular expression to match

  • expected_min_count (None or nonnegative integer) – Specifies the minimum number of times regex is expected to appear on each line of the file

  • expected_max_count (None or nonnegative integer) – Specifies the maximum number of times regex is expected to appear on each line of the file

Keyword Arguments
  • skip (None or nonnegative integer) – Number of lines at the start of the file that the method should skip before assessing expectations

  • mostly (None or number between 0 and 1) – The minimum fraction of lines that must meet the expectation. If at least this fraction of lines matches, the expectation still succeeds even if not every line meets the expectation criteria.

  • null_lines_regex (valid regular expression or None) – If not None, a regex identifying lines to skip as null. Defaults to empty or whitespace-only lines.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

  • _lines (list) – The lines over which to operate (provided by the file_lines_map_expectation decorator)

Returns

A JSON-serializable expectation result object.

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
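As a rough illustration of the per-line check described above, the following standalone sketch counts regex matches on each line with re.findall and compares the count against the bounds. The function name line_match_counts_between and the plain boolean return value are assumptions for illustration; the library method returns an expectation result object instead.

```python
import re


def line_match_counts_between(lines, regex, expected_min_count=0,
                              expected_max_count=None, mostly=None):
    """Hypothetical helper: True if enough lines have a match count in range."""
    pattern = re.compile(regex)
    ok = []
    for line in lines:
        count = len(pattern.findall(line))
        # A bound of None is treated as "no limit" on that side.
        above_min = expected_min_count is None or count >= expected_min_count
        below_max = expected_max_count is None or count <= expected_max_count
        ok.append(above_min and below_max)
    fraction = sum(ok) / len(ok) if ok else 1.0
    # Without `mostly`, every line must pass.
    return fraction >= (1.0 if mostly is None else mostly)
```

For example, with regex "," and both bounds set to 2, a line passes only if it contains exactly two commas, which is a simple way to check for a fixed number of delimited fields.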

expect_file_line_regex_match_count_to_equal(self, regex, expected_count=0, skip=None, mostly=None, nonnull_lines_regex='^\s*$', result_format=None, include_config=True, catch_exceptions=None, meta=None, _lines=None)

Expect the number of times a regular expression appears on each line of a file to equal the expected count.

Parameters
  • regex – A string that can be compiled as a valid regular expression to match

  • expected_count (None or nonnegative integer) – Specifies the number of times regex is expected to appear on each line of the file

Keyword Arguments
  • skip (None or nonnegative integer) – Number of lines at the start of the file that the method should skip before assessing expectations

  • mostly (None or number between 0 and 1) – The minimum fraction of lines that must meet the expectation. If at least this fraction of lines matches, the expectation still succeeds even if not every line meets the expectation criteria.

  • nonnull_lines_regex (valid regular expression or None) – If not None, a regex identifying lines to skip as null. Defaults to empty or whitespace-only lines.

Other Parameters
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

  • _lines (list) – The lines over which to operate (provided by the file_lines_map_expectation decorator)

Returns

A JSON-serializable expectation result object.

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.

expect_file_hash_to_equal(self, value, hash_alg='md5', result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect computed file hash to equal some given value.

Parameters

value – A string to compare with the computed hash value

Keyword Arguments
  • hash_alg (string) – Indicates the hash algorithm to use

  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

A JSON-serializable expectation result object.

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
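The comparison this expectation performs can be sketched with the standard hashlib module. The helper name file_hash_equals, the chunk_size parameter, and the assumption that value is a hexadecimal digest string are illustrative and not taken from the library.

```python
import hashlib


def file_hash_equals(path, value, hash_alg="md5", chunk_size=65536):
    """Hypothetical helper: compare a file's hex digest with `value`."""
    digest = hashlib.new(hash_alg)
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # Assumption: `value` is a lowercase hexadecimal digest string.
    return digest.hexdigest() == value
```

Any algorithm name accepted by hashlib.new, such as "sha256", can be passed as hash_alg in this sketch.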

expect_file_size_to_be_between(self, minsize=0, maxsize=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Expect file size to be between a user-specified minsize and maxsize.

Parameters
  • minsize (integer) – minimum expected file size

  • maxsize (integer) – maximum expected file size

Keyword Arguments
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

A JSON-serializable expectation result object.

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
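A minimal sketch of the size check, using os.path.getsize. The helper name file_size_between is hypothetical, and measuring sizes in bytes with inclusive bounds is an assumption, since the reference above does not state units.

```python
import os


def file_size_between(path, minsize=0, maxsize=None):
    """Hypothetical helper: True if the file size lies within the bounds."""
    size = os.path.getsize(path)  # size in bytes (assumed unit)
    # A maxsize of None means there is no upper bound.
    return size >= minsize and (maxsize is None or size <= maxsize)
```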

expect_file_to_exist(self, filepath=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Checks to see if a file specified by the user actually exists.

Parameters

filepath (str or None) – The filepath to evaluate. If None, will check the currently-configured path object of this FileDataAsset.

Keyword Arguments
  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

A JSON-serializable expectation result object.

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
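The fallback from an explicit filepath to the configured path can be sketched as follows. PathHolder and file_exists are hypothetical names standing in for the asset and its method, and the sketch returns a plain boolean rather than a result object.

```python
import os


class PathHolder:
    """Hypothetical stand-in for an asset that remembers a configured path."""

    def __init__(self, file_path=None):
        self._path = file_path

    def file_exists(self, filepath=None):
        # Prefer the explicit argument; otherwise use the configured path.
        target = filepath if filepath is not None else self._path
        return target is not None and os.path.isfile(target)
```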

expect_file_to_have_valid_table_header(self, regex, skip=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Checks to see if a file has a line with unique delimited values. Such a line may be used as a table header.

Keyword Arguments
  • skip (nonnegative integer) – Number of lines at the start of the file that the method should skip before assessing expectations

  • regex (string) – A string that can be compiled as a valid regular expression. Used to specify the elements of the table header (the column headers)

  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

A JSON-serializable expectation result object.

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
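One way to read "a line with unique delimited values" is sketched below. It assumes that regex is used as the delimiter for splitting the header line and that the header is the first line after the skipped ones; both are assumptions, as is the helper name has_valid_table_header.

```python
import re


def has_valid_table_header(lines, regex, skip=0):
    """Hypothetical helper: True if the header's delimited values are unique."""
    remaining = lines[skip:]
    if not remaining:
        return False
    # Assumption: `regex` acts as the delimiter between column names.
    header_fields = re.split(regex, remaining[0].rstrip("\r\n"))
    return len(set(header_fields)) == len(header_fields)
```

Under these assumptions, a header such as "id,name,email" passes with regex ",", while "id,name,id" fails because the column name id is repeated.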

expect_file_to_be_valid_json(self, schema=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)

Parameters
  • schema (string, optional) – JSON schema file against which the JSON data file is validated

  • result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.

  • include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.

  • catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.

  • meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.

Returns

A JSON-serializable expectation result object.

Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
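The basic validity check, without a schema, can be sketched with the standard json module. The helper name is_valid_json_file is hypothetical, and schema validation is deliberately left out because it would need a third-party validator.

```python
import json


def is_valid_json_file(path):
    """Hypothetical helper: True if the file parses as JSON."""
    try:
        with open(path) as f:
            json.load(f)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    return True
```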