great_expectations.data_asset.file_data_asset
Module Contents
Classes
| MetaFileDataAsset | A thin layer above FileDataAsset. |
| FileDataAsset | Instantiates the great_expectations Expectations API as a subclass of a Python file object. |
-
class great_expectations.data_asset.file_data_asset.MetaFileDataAsset(*args, **kwargs)
Bases: great_expectations.data_asset.data_asset.DataAsset
MetaFileDataAsset is a thin layer above FileDataAsset. This two-layer inheritance is required to make @classmethod decorators work. In practice, MetaFileDataAsset implements the expectation decorators, such as file_lines_map_expectation, and FileDataAsset implements the expectation methods themselves.
-
classmethod file_lines_map_expectation(cls, func)
Constructs an expectation using file lines map semantics. The file_lines_map_expectation decorator handles the boilerplate surrounding the common pattern of evaluating the truthiness of some condition on a line-by-line basis in a file.
- Parameters
func (function) – The function implementing an expectation that will be applied line by line across a file. The function should take a file and return information about how many lines met expectations.
Notes
Users can specify a skip value k that causes the expectation function to disregard the first k lines of the file.
file_lines_map_expectation adds a kwarg _lines to the called function, containing the non-null lines to process.
null_lines_regex defines the regex used to skip lines as null; the default can be overridden.
See also
expect_file_line_regex_match_count_to_be_between, for an example of a file_lines_map_expectation.
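The notes above can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: the name `lines_map_expectation_sketch` is hypothetical, the wrapped function receives a plain list of lines rather than a file, and the returned dictionary is a simplified stand-in for a real expectation result. Treating an input with no non-null lines as passing is also an assumption.

```python
import re
from functools import wraps


def lines_map_expectation_sketch(func):
    """Hypothetical stand-in for file_lines_map_expectation (not the library code)."""

    @wraps(func)
    def wrapper(lines, *args, skip=None, mostly=None,
                null_lines_regex=r"^\s*$", **kwargs):
        # Disregard the first `skip` lines, if requested.
        if skip is not None:
            lines = lines[skip:]
        # Drop lines that match the null-line pattern.
        if null_lines_regex is not None:
            null_re = re.compile(null_lines_regex)
            lines = [line for line in lines if not null_re.match(line)]
        # The wrapped function returns one boolean per non-null line.
        outcomes = func(*args, _lines=lines, **kwargs)
        passed = sum(1 for ok in outcomes if ok)
        # Assumption: an input with no non-null lines counts as passing.
        fraction = passed / len(lines) if lines else 1.0
        threshold = 1.0 if mostly is None else mostly
        return {"success": fraction >= threshold,
                "unexpected_count": len(lines) - passed}

    return wrapper


@lines_map_expectation_sketch
def every_line_has_comma(_lines=None):
    # Example expectation body: one boolean per line.
    return ["," in line for line in _lines]
```

The decorated function only states the per-line condition; skipping, null-line filtering, and the `mostly` threshold all live in the wrapper, which is the division of labor the decorator is described as providing.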
-
class great_expectations.data_asset.file_data_asset.FileDataAsset(file_path=None, *args, **kwargs)
Bases: great_expectations.data_asset.file_data_asset.MetaFileDataAsset
FileDataAsset instantiates the great_expectations Expectations API as a subclass of a Python file object. For the full API reference, see DataAsset.
-
_data_asset_type = FileDataAsset
-
expect_file_line_regex_match_count_to_be_between(self, regex, expected_min_count=0, expected_max_count=None, skip=None, mostly=None, null_lines_regex='^\s*$', result_format=None, include_config=True, catch_exceptions=None, meta=None, _lines=None)
Expect the number of times a regular expression appears on each line of a file to be between a minimum and a maximum value.
- Parameters
regex – A string that can be compiled as a valid regular expression to match
expected_min_count (None or nonnegative integer) – Specifies the minimum number of times regex is expected to appear on each line of the file
expected_max_count (None or nonnegative integer) – Specifies the maximum number of times regex is expected to appear on each line of the file
- Keyword Arguments
skip (None or nonnegative integer) – Number of lines at the start of the file that the method should skip before assessing expectations
mostly (None or number between 0 and 1) – Fraction of non-null lines that must meet the expectation for it to succeed. If at least this fraction of lines match, the method returns success even if some lines do not meet the expectation criteria.
null_lines_regex (valid regular expression or None) – If not None, a regex used to skip lines as null. Defaults to matching empty or whitespace-only lines.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
_lines (list) – The lines over which to operate (provided by the file_lines_map_expectation decorator)
- Returns
A JSON-serializable expectation result object.
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
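As a rough illustration of the per-line check described above, the following sketch counts regex matches on each line and compares the count with the bounds. The helper name `line_match_counts_in_range` is hypothetical, and two details are assumptions rather than documented behavior: the bounds are treated as inclusive, and a bound of None is treated as unbounded.

```python
import re


def line_match_counts_in_range(lines, regex, expected_min_count=0,
                               expected_max_count=None):
    """Return one boolean per line: is the regex match count within bounds?"""
    pattern = re.compile(regex)
    results = []
    for line in lines:
        # findall returns every non-overlapping match on the line.
        count = len(pattern.findall(line))
        above_min = expected_min_count is None or count >= expected_min_count
        below_max = expected_max_count is None or count <= expected_max_count
        results.append(above_min and below_max)
    return results
```

For a comma-delimited file, calling it with `regex=","` and equal minimum and maximum values checks that every line has the same number of delimiters.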
-
expect_file_line_regex_match_count_to_equal(self, regex, expected_count=0, skip=None, mostly=None, nonnull_lines_regex='^\s*$', result_format=None, include_config=True, catch_exceptions=None, meta=None, _lines=None)
Expect the number of times a regular expression appears on each line of a file to equal expected_count.
- Parameters
regex – A string that can be compiled as a valid regular expression to match
expected_count (None or nonnegative integer) – Specifies the number of times regex is expected to appear on each line of the file
- Keyword Arguments
skip (None or nonnegative integer) – Number of lines at the start of the file that the method should skip before assessing expectations
mostly (None or number between 0 and 1) – Fraction of non-null lines that must meet the expectation for it to succeed. If at least this fraction of lines match, the method returns success even if some lines do not meet the expectation criteria.
nonnull_lines_regex (valid regular expression or None) – If not None, a regex used to skip lines as null. Defaults to matching empty or whitespace-only lines.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
_lines (list) – The lines over which to operate (provided by the file_lines_map_expectation decorator)
- Returns
A JSON-serializable expectation result object.
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
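To show how an exact per-line count might interact with mostly, here is a small sketch that returns the fraction of lines whose match count equals expected_count. The function name `fraction_with_exact_count` is hypothetical, and comparing that fraction against mostly with `>=` is an assumption about the threshold semantics, not a statement of the library's code.

```python
import re


def fraction_with_exact_count(lines, regex, expected_count=0):
    """Fraction of lines on which `regex` matches exactly `expected_count` times."""
    pattern = re.compile(regex)
    if not lines:
        # Assumption: with nothing to check, treat every line as matching.
        return 1.0
    hits = sum(1 for line in lines
               if len(pattern.findall(line)) == expected_count)
    return hits / len(lines)
```

A caller would then compare the result with the chosen threshold, for example `fraction_with_exact_count(lines, ",", 2) >= 0.7`.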
-
expect_file_hash_to_equal(self, value, hash_alg='md5', result_format=None, include_config=True, catch_exceptions=None, meta=None)
Expect the computed file hash to equal a given value.
- Parameters
value – A string to compare with the computed hash value
- Keyword Arguments
hash_alg (string) – Indicates the hash algorithm to use
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
A JSON-serializable expectation result object.
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
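The value to compare against can be produced with Python's standard hashlib module. The sketch below is one way to compute a hex digest for a file; the helper name `compute_file_hash` and the `block_size` parameter are hypothetical, and it is an assumption that the expectation compares against a hex digest string of this form.

```python
import hashlib


def compute_file_hash(path, hash_alg="md5", block_size=65536):
    """Return the hex digest of the file at `path` using `hash_alg`."""
    hasher = hashlib.new(hash_alg)
    with open(path, "rb") as handle:
        # Read in fixed-size blocks so large files are not loaded at once.
        for block in iter(lambda: handle.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()
```

Any algorithm name accepted by `hashlib.new`, such as "sha256", can be passed as `hash_alg` in this sketch.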
-
expect_file_size_to_be_between(self, minsize=0, maxsize=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)
Expect the file size to be between a user-specified minsize and maxsize.
- Parameters
minsize (integer) – minimum expected file size
maxsize (integer) – maximum expected file size
- Keyword Arguments
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
A JSON-serializable expectation result object.
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
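A size check of this kind can be sketched with `os.path.getsize`. The helper name `file_size_in_range` is hypothetical, and the sketch makes three assumptions that the text above does not state: sizes are measured in bytes, both bounds are inclusive, and a maxsize of None means no upper limit.

```python
import os


def file_size_in_range(path, minsize=0, maxsize=None):
    """Return True if the file size (in bytes) lies within [minsize, maxsize]."""
    size = os.path.getsize(path)
    if size < minsize:
        return False
    # Assumption: maxsize=None means there is no upper bound.
    if maxsize is not None and size > maxsize:
        return False
    return True
```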
-
expect_file_to_exist(self, filepath=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)
Checks whether a file specified by the user actually exists.
- Parameters
filepath (str or None) – The filepath to evaluate. If None, checks the currently configured path of this FileDataAsset.
- Keyword Arguments
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
A JSON-serializable expectation result object.
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
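The fallback described for filepath can be sketched as follows. The function `file_exists` and its `configured_path` argument are hypothetical stand-ins for the asset's configured path, and using `os.path.isfile` (which is False for directories) is an assumption about what "exists" means here.

```python
import os


def file_exists(filepath=None, configured_path=None):
    """Check `filepath` if given, otherwise fall back to `configured_path`."""
    target = filepath if filepath is not None else configured_path
    # With no path at all there is nothing to find.
    return target is not None and os.path.isfile(target)
```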
-
expect_file_to_have_valid_table_header(self, regex, skip=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)
Checks whether a file has a line of unique delimited values. Such a line may be used as a table header.
- Keyword Arguments
skip (nonnegative integer) – Number of lines at the start of the file that the method should skip before assessing expectations
regex (string) – A string that can be compiled as a valid regular expression. Used to specify the elements of the table header (the column headers)
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
A JSON-serializable expectation result object.
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
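One way to test for a line of unique delimited values is sketched below. The helper name `header_fields_are_unique` is hypothetical, and the sketch assumes that regex acts as the delimiter pattern passed to `re.split` and that the header is the first line remaining after skip. Both are assumptions for illustration, not confirmed behavior of the method.

```python
import re


def header_fields_are_unique(lines, regex, skip=None):
    """Return True if the first line after `skip` splits into unique fields."""
    if skip is not None:
        lines = lines[skip:]
    if not lines:
        return False  # no header line to inspect
    header_line = lines[0].strip()
    # Assumption: `regex` describes the delimiter between column headers.
    fields = re.split(regex, header_line)
    return len(set(fields)) == len(fields)
```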
-
expect_file_to_be_valid_json(self, schema=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)
- Parameters
schema (str or None) – Optional JSON schema file against which the JSON data file is validated
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
A JSON-serializable expectation result object.
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
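The schema-free case can be sketched with the standard json module alone. The helper name `is_valid_json_file` is hypothetical, and UTF-8 decoding is an assumption. Validation against a schema is left out because it would need a third-party JSON Schema validator.

```python
import json


def is_valid_json_file(path):
    """Return True if the file at `path` parses as JSON."""
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    except ValueError:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError.
        return False
    return True
```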
-