DataAsset Module¶
-
class
great_expectations.data_asset.data_asset.
DataAsset
(*args, **kwargs)¶ -
autoinspect
(profiler)¶ Deprecated: use profile instead.
Use the provided profiler to evaluate this data_asset and assign the resulting expectation suite as its own.
- Parameters
profiler – The profiler to use
- Returns
tuple(expectation_suite, validation_results)
-
profile
(profiler)¶ Use the provided profiler to evaluate this data_asset and assign the resulting expectation suite as its own.
- Parameters
profiler – The profiler to use
- Returns
tuple(expectation_suite, validation_results)
-
edit_expectation_suite
()¶
-
classmethod
expectation
(method_arg_names)¶ Manages configuration and running of expectation objects.
Expectation builds and saves a new expectation configuration to the DataAsset object. It is the core decorator used by great expectations to manage expectation configurations.
- Parameters
method_arg_names (List) – An ordered list of the arguments used by the method implementing the expectation (typically the result of inspection). Positional arguments are explicitly mapped to keyword arguments when the expectation is run.
Notes
Intermediate decorators that call the core @expectation decorator will most likely need to pass their decorated methods’ signature up to the expectation decorator. For example, the MetaPandasDataset column_map_expectation decorator relies on the DataAsset expectation decorator, but will pass through the signature from the implementing method.
- @expectation intercepts and takes action based on the following parameters:
include_config (boolean or None) : If True, then include the generated expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) : If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
result_format (str or None) : Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
meta (dict or None): A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
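The way positional arguments are mapped to keyword arguments and recorded as a configuration can be illustrated with a simplified, standalone decorator. This is only a sketch, not the library's implementation: `toy_expectation`, `ToyAsset`, and `expect_values_to_be_at_least` are hypothetical names, and only `include_config` and `meta` are intercepted here.

```python
from functools import wraps


def toy_expectation(method_arg_names):
    """Hypothetical stand-in for the @expectation decorator (not the real one)."""
    def outer(func):
        @wraps(func)
        def wrapper(self, *args, **kwargs):
            # Map positional arguments onto their names, in order.
            all_kwargs = dict(zip(method_arg_names, args))
            all_kwargs.update(kwargs)
            # Intercept the parameters handled by the decorator itself.
            include_config = all_kwargs.pop("include_config", False)
            meta = all_kwargs.pop("meta", None)
            config = {"expectation_type": func.__name__, "kwargs": dict(all_kwargs)}
            if meta is not None:
                config["meta"] = meta
            result = func(self, **all_kwargs)
            # Save the configuration on the asset.
            self.expectations.append(config)
            if include_config:
                result["expectation_config"] = config
            return result
        return wrapper
    return outer


class ToyAsset:
    def __init__(self, values):
        self.values = values
        self.expectations = []

    @toy_expectation(["minimum"])
    def expect_values_to_be_at_least(self, minimum):
        return {"success": all(v >= minimum for v in self.values)}


asset = ToyAsset([3, 5, 8])
result = asset.expect_values_to_be_at_least(2, include_config=True)
```

Calling the method with a positional `2` stores `{"minimum": 2}` as the expectation's kwargs, which is the kind of explicit mapping described above.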
-
find_expectation_indexes
(expectation_type=None, column=None, expectation_kwargs=None)¶ Find matching expectations within _expectation_config.
- Parameters
expectation_type – The name of the expectation type to be matched. Defaults to None.
column – The name of the column to be matched. Defaults to None.
expectation_kwargs – A dictionary of kwargs to match against. Defaults to None.
- Returns
A list of indexes for matching expectation objects. If there are no matches, the list will be empty.
-
find_expectations
(expectation_type=None, column=None, expectation_kwargs=None, discard_result_format_kwargs=True, discard_include_config_kwargs=True, discard_catch_exceptions_kwargs=True)¶ Find matching expectations within _expectation_config.
- Parameters
expectation_type – The name of the expectation type to be matched. Defaults to None.
column – The name of the column to be matched. Defaults to None.
expectation_kwargs – A dictionary of kwargs to match against. Defaults to None.
discard_result_format_kwargs – In returned expectation object(s), suppress the result_format parameter. Defaults to True.
discard_include_config_kwargs – In returned expectation object(s), suppress the include_config parameter. Defaults to True.
discard_catch_exceptions_kwargs – In returned expectation object(s), suppress the catch_exceptions parameter. Defaults to True.
- Returns
A list of matching expectation objects. If there are no matches, the list will be empty.
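The matching behavior described for find_expectation_indexes and find_expectations can be sketched over a plain list of dictionaries. This is an illustration under assumptions, not the library's code: `find_matching_indexes` is a hypothetical helper, and the dictionary layout (`expectation_type` plus `kwargs`) is assumed for the example.

```python
def find_matching_indexes(expectations, expectation_type=None, column=None,
                          expectation_kwargs=None):
    """Return indexes of expectation dicts matching every filter that was given."""
    expectation_kwargs = expectation_kwargs or {}
    matches = []
    for index, exp in enumerate(expectations):
        kwargs = exp.get("kwargs", {})
        if expectation_type is not None and exp.get("expectation_type") != expectation_type:
            continue
        if column is not None and kwargs.get("column") != column:
            continue
        if any(key not in kwargs or kwargs[key] != value
               for key, value in expectation_kwargs.items()):
            continue
        matches.append(index)
    return matches


suite = [
    {"expectation_type": "expect_column_to_exist", "kwargs": {"column": "id"}},
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "id", "mostly": 0.9}},
    {"expectation_type": "expect_column_to_exist", "kwargs": {"column": "name"}},
]

by_type = find_matching_indexes(suite, expectation_type="expect_column_to_exist")
by_column = find_matching_indexes(suite, column="id")
no_match = find_matching_indexes(suite, column="missing")
```

As in the documented behavior, a search with no matches yields an empty list rather than an error.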
-
remove_expectation
(expectation_type=None, column=None, expectation_kwargs=None, remove_multiple_matches=False, dry_run=False)¶ Remove matching expectation(s) from _expectation_config.
- Parameters
expectation_type – The name of the expectation type to be matched. Defaults to None.
column – The name of the column to be matched. Defaults to None.
expectation_kwargs – A dictionary of kwargs to match against. Defaults to None.
remove_multiple_matches – If True, remove every matching expectation instead of raising an error when more than one matches. Defaults to False.
dry_run – If True, return the matching expectation(s) without removing them. Defaults to False.
- Returns
None, unless dry_run=True. If dry_run=True and remove_multiple_matches=False then return the expectation that would be removed. If dry_run=True and remove_multiple_matches=True then return a list of expectations that would be removed.
Note
If remove_expectation doesn’t find any matches, it raises a ValueError. If remove_expectation finds more than one match and remove_multiple_matches is not True, it raises a ValueError. If dry_run=True, then remove_expectation acts as a thin layer to find_expectations, with the default values for discard_result_format_kwargs, discard_include_config_kwargs, and discard_catch_exceptions_kwargs.
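The return and error rules above can be summarized in a small standalone function. This is a sketch of the documented rules only; `remove_matching` is a hypothetical helper that receives the already-found match indexes, and the real method operates on the DataAsset's own configuration.

```python
def remove_matching(expectations, matches, remove_multiple_matches=False, dry_run=False):
    """Apply the documented rules to a list of already-found match indexes."""
    if not matches:
        raise ValueError("No matching expectation found.")
    if len(matches) > 1 and not remove_multiple_matches:
        raise ValueError("Multiple expectations matched; pass remove_multiple_matches=True.")
    if dry_run:
        found = [expectations[i] for i in matches]
        # A single expectation, or a list when multiple matches are allowed.
        return found if remove_multiple_matches else found[0]
    # Delete from the end so earlier indexes stay valid.
    for i in sorted(matches, reverse=True):
        del expectations[i]
    return None


suite = [{"name": "a"}, {"name": "b"}, {"name": "c"}]
preview = remove_matching(suite, [1], dry_run=True)
removed = remove_matching(suite, [0, 2], remove_multiple_matches=True)
```

After these calls, `preview` holds the expectation that would have been removed, and `suite` contains only the middle entry.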
-
set_config_value
(key, value)¶
-
get_config_value
(key)¶
-
property
batch_kwargs
¶
-
property
batch_id
¶
-
property
batch_markers
¶
-
property
batch_parameters
¶
-
discard_failing_expectations
()¶
-
get_default_expectation_arguments
()¶ Fetch default expectation arguments for this data_asset
- Returns
A dictionary containing all the current default expectation arguments for a data_asset
Ex:
{ "include_config" : True, "catch_exceptions" : False, "result_format" : 'BASIC' }
See also
set_default_expectation_argument
-
set_default_expectation_argument
(argument, value)¶ Set a default expectation argument for this data_asset
- Parameters
argument (string) – The argument to be replaced
value – The new value to use for the argument
- Returns
None
See also
get_default_expectation_arguments
-
get_expectations_config
(discard_failed_expectations=True, discard_result_format_kwargs=True, discard_include_config_kwargs=True, discard_catch_exceptions_kwargs=True, suppress_warnings=False)¶
-
get_expectation_suite
(discard_failed_expectations=True, discard_result_format_kwargs=True, discard_include_config_kwargs=True, discard_catch_exceptions_kwargs=True, suppress_warnings=False)¶ Returns _expectation_config as a JSON object, and performs some cleaning along the way.
- Parameters
discard_failed_expectations (boolean) – Only include expectations with success_on_last_run=True in the exported config. Defaults to True.
discard_result_format_kwargs (boolean) – In returned expectation objects, suppress the result_format parameter. Defaults to True.
discard_include_config_kwargs (boolean) – In returned expectation objects, suppress the include_config parameter. Defaults to True.
discard_catch_exceptions_kwargs (boolean) – In returned expectation objects, suppress the catch_exceptions parameter. Defaults to True.
- Returns
An expectation suite.
Note
get_expectation_suite does not affect the underlying expectation suite at all. The returned suite is a copy of _expectation_suite, not the original object.
-
save_expectation_suite
(filepath=None, discard_failed_expectations=True, discard_result_format_kwargs=True, discard_include_config_kwargs=True, discard_catch_exceptions_kwargs=True, suppress_warnings=False)¶ Writes _expectation_config to a JSON file.
Writes the DataAsset’s expectation config to the specified JSON filepath. Failing expectations can be excluded from the JSON expectations config with discard_failed_expectations. The kwarg key-value pairs result_format, include_config, and catch_exceptions are optionally excluded from the JSON expectations config.
- Parameters
filepath (string) – The location and name to write the JSON config file to.
discard_failed_expectations (boolean) – If True, excludes expectations that do not return success = True. If False, all expectations are written to the JSON config file.
discard_result_format_kwargs (boolean) – If True, the result_format attribute for each expectation is not written to the JSON config file.
discard_include_config_kwargs (boolean) – If True, the include_config attribute for each expectation is not written to the JSON config file.
discard_catch_exceptions_kwargs (boolean) – If True, the catch_exceptions attribute for each expectation is not written to the JSON config file.
suppress_warnings (boolean) – If True, all warnings raised by Great Expectations as a result of dropped expectations are suppressed.
-
validate
(expectation_suite=None, run_id=None, data_context=None, evaluation_parameters=None, catch_exceptions=True, result_format=None, only_return_failures=False)¶ Generates a JSON-formatted report describing the outcome of all expectations.
Use the default expectation_suite=None to validate the expectations config associated with the DataAsset.
- Parameters
expectation_suite (json or None) – If None, uses the expectations config generated with the DataAsset during the current session. If a JSON file, validates those expectations.
run_id (str) – A string used to identify this validation result as part of a collection of validations. See DataContext for more information.
data_context (DataContext) – A datacontext object to use as part of validation for binding evaluation parameters and registering validation results.
evaluation_parameters (dict or None) – If None, uses the evaluation_parameters from the expectation_suite provided or as part of the data_asset. If a dict, uses the evaluation parameters in the dictionary.
catch_exceptions (boolean) – If True, exceptions raised by tests will not end validation and will be described in the returned report.
result_format (string or None) – If None, uses the default value (‘BASIC’ or as specified). If string, the returned expectation output follows the specified format (‘BOOLEAN_ONLY’, ‘BASIC’, etc.).
only_return_failures (boolean) – If True, expectation results are only returned when success = False.
- Returns
A JSON-formatted dictionary containing a list of the validation results. An example of the returned format:
{
  "results": [
    {
      "unexpected_list": [unexpected_value_1, unexpected_value_2],
      "expectation_type": "expect_*",
      "kwargs": {
        "column": "Column_Name",
        "output_format": "SUMMARY"
      },
      "success": true,
      "raised_exception": false,
      "exception_traceback": null
    },
    {
      ... (Second expectation results)
    },
    ... (More expectations results)
  ],
  "success": true,
  "statistics": {
    "evaluated_expectations": n,
    "successful_expectations": m,
    "unsuccessful_expectations": n - m,
    "success_percent": m / n
  }
}
Notes
A warning is issued if the configuration object was built with a different version of great expectations than the current environment, or if no version was found in the configuration file.
- Raises
AttributeError – If catch_exceptions=None and an expectation throws an AttributeError.
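The "statistics" block in the returned report can be derived from the list of per-expectation results. The sketch below assumes each result is a dict with a boolean "success" key and follows the m / n form shown in the example above; `Stats` and `summarize` are hypothetical names, and the library may scale or round success_percent differently.

```python
from collections import namedtuple

# Field order mirrors the ValidationStatistics signature documented on this page.
Stats = namedtuple("Stats", [
    "evaluated_expectations",
    "successful_expectations",
    "unsuccessful_expectations",
    "success_percent",
    "success",
])


def summarize(results):
    """Aggregate per-expectation results into overall statistics."""
    evaluated = len(results)
    successful = sum(1 for r in results if r["success"])
    unsuccessful = evaluated - successful
    # Assumption: report None when nothing was evaluated, to avoid dividing by zero.
    percent = successful / evaluated if evaluated else None
    return Stats(evaluated, successful, unsuccessful, percent, unsuccessful == 0)


stats = summarize([
    {"success": True},
    {"success": False},
    {"success": True},
    {"success": True},
])
```

Because `Stats` is a named tuple, `stats.success_percent` and `stats[3]` refer to the same value, matching the "alias for field number" wording used for ValidationStatistics.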
-
get_evaluation_parameter
(parameter_name, default_value=None)¶ Get an evaluation parameter value that has been stored in meta.
- Parameters
parameter_name (string) – The name of the parameter to retrieve.
default_value (any) – The default value to be returned if the parameter is not found.
- Returns
The current value of the evaluation parameter.
-
set_evaluation_parameter
(parameter_name, parameter_value)¶ Provide a value to be stored in the data_asset evaluation_parameters object and used to evaluate parameterized expectations.
- Parameters
parameter_name (string) – The name of the kwarg to be replaced at evaluation time
parameter_value (any) – The value to be used
-
property
expectation_suite_name
¶ Gets the current expectation_suite name of this data_asset as stored in the expectations configuration.
-
test_expectation_function
(function, *args, **kwargs)¶ Test a generic expectation function
- Parameters
function (func) – The function to be tested. (Must be a valid expectation function.)
*args – Positional arguments to be passed to the function
**kwargs – Keyword arguments to be passed to the function
- Returns
A JSON-serializable expectation result object.
Notes
This function is a thin layer to allow quick testing of new expectation functions, without having to define custom classes, etc. To use developed expectations from the command-line tool, you will still need to define custom classes, etc.
Check out Building Custom Expectations for more information.
-
-
class
great_expectations.data_asset.data_asset.
ValidationStatistics
(evaluated_expectations, successful_expectations, unsuccessful_expectations, success_percent, success)¶ Bases:
tuple
-
property
evaluated_expectations
¶ Alias for field number 0
-
property
success
¶ Alias for field number 4
-
property
success_percent
¶ Alias for field number 3
-
property
successful_expectations
¶ Alias for field number 1
-
property
unsuccessful_expectations
¶ Alias for field number 2
FileDataAsset¶
-
class
great_expectations.data_asset.file_data_asset.
MetaFileDataAsset
(*args, **kwargs)¶ Bases:
great_expectations.data_asset.data_asset.DataAsset
MetaFileDataAsset is a thin layer above FileDataAsset. This two-layer inheritance is required to make @classmethod decorators work. Practically speaking, that means that MetaFileDataAsset implements expectation decorators, like file_lines_map_expectation, and FileDataAsset implements the expectation methods themselves.
-
classmethod
file_lines_map_expectation
(func)¶ Constructs an expectation using file lines map semantics. The file_lines_map_expectation decorator handles boilerplate issues surrounding the common pattern of evaluating truthiness of some condition on a line-by-line basis in a file.
- Parameters
func (function) – The function implementing an expectation that will be applied line by line across a file. The function should take a file and return information about how many lines met expectations.
Notes
Users can specify a skip value k that causes the expectation function to disregard the first k lines of the file.
file_lines_map_expectation adds a kwarg _lines to the called function, containing the non-null lines to process.
null_lines_regex defines the regex used to skip lines as null; its default can be overridden.
See also
expect_file_line_regex_match_count_to_be_between
for an example of a file_lines_map_expectation
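The boilerplate that such a decorator takes care of (skipping the first k lines, dropping null lines, and passing the remainder as `_lines`) can be sketched as follows. This is a simplified illustration, not the library's code: `toy_file_lines_map_expectation` and `expect_lines_to_start_with` are hypothetical, the sketch works on a list of strings rather than a file, and the wrapped function is assumed to return one boolean per line.

```python
import re
from functools import wraps


def toy_file_lines_map_expectation(func):
    """Hypothetical sketch of a line-by-line decorator (not the library's code)."""
    @wraps(func)
    def wrapper(lines, *args, skip=None, null_lines_regex=r"^\s*$", **kwargs):
        # Disregard the first `skip` lines.
        lines = lines[skip:] if skip else lines
        # Drop lines considered null.
        if null_lines_regex is not None:
            null_re = re.compile(null_lines_regex)
            lines = [line for line in lines if not null_re.match(line)]
        # The wrapped function returns one boolean per remaining line.
        outcomes = func(*args, _lines=lines, **kwargs)
        unexpected = [line for line, ok in zip(lines, outcomes) if not ok]
        return {"success": not unexpected, "unexpected_list": unexpected}
    return wrapper


@toy_file_lines_map_expectation
def expect_lines_to_start_with(prefix, _lines=None):
    return [line.startswith(prefix) for line in _lines]


text = ["# header", "row 1", "", "row 2", "bad"]
result = expect_lines_to_start_with(text, "row", skip=1)
```

Here the header is skipped, the empty line is treated as null, and only the line "bad" is reported as unexpected.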
-
class
great_expectations.data_asset.file_data_asset.
FileDataAsset
(file_path=None, *args, **kwargs)¶ Bases:
great_expectations.data_asset.file_data_asset.MetaFileDataAsset
FileDataAsset instantiates the great_expectations Expectations API as a subclass of a python file object. For the full API reference, please see DataAsset.
-
expect_file_line_regex_match_count_to_be_between
(regex, expected_min_count=0, expected_max_count=None, skip=None, mostly=None, null_lines_regex='^\\s*$', result_format=None, include_config=True, catch_exceptions=None, meta=None, _lines=None)¶ Expect the number of times a regular expression appears on each line of a file to be between a maximum and minimum value.
- Parameters
regex – A string that can be compiled as valid regular expression to match
expected_min_count (None or nonnegative integer) – Specifies the minimum number of times regex is expected to appear on each line of the file
expected_max_count (None or nonnegative integer) – Specifies the maximum number of times regex is expected to appear on each line of the file
- Keyword Arguments
skip (None or nonnegative integer) – Number of lines at the start of the file that the method should skip before assessing expectations
mostly (None or number between 0 and 1) – Specifies the fraction of lines that must meet the expectation. If at least that fraction of lines match, the method still returns true even if not all lines meet the expectation criteria.
null_lines_regex (valid regular expression or None) – If not None, a regex used to skip lines as null. Defaults to empty or whitespace-only lines.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
_lines (list) – The lines over which to operate (provided by the file_lines_map_expectation decorator)
- Returns
A JSON-serializable expectation result object.
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
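The per-line check behind this expectation can be sketched with the standard `re` module. The helper name `line_match_counts_between` is hypothetical, and the sketch assumes the bounds are inclusive and that None means "no bound"; the library's exact rules may differ.

```python
import re


def line_match_counts_between(lines, regex, expected_min_count=0, expected_max_count=None):
    """Return, per line, whether the number of regex matches is within bounds."""
    pattern = re.compile(regex)
    outcomes = []
    for line in lines:
        count = len(pattern.findall(line))
        ok = count >= (expected_min_count or 0)
        if expected_max_count is not None:
            ok = ok and count <= expected_max_count
        outcomes.append(ok)
    return outcomes


rows = ["a,b,c", "a,b", "a,b,c,d"]
outcomes = line_match_counts_between(rows, ",", expected_min_count=2, expected_max_count=2)
```

Requiring exactly two commas per line is one way to check that every row of a delimited file has three fields.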
-
expect_file_line_regex_match_count_to_equal
(regex, expected_count=0, skip=None, mostly=None, nonnull_lines_regex='^\\s*$', result_format=None, include_config=True, catch_exceptions=None, meta=None, _lines=None)¶ Expect the number of times a regular expression appears on each line of a file to equal a given value.
- Parameters
regex – A string that can be compiled as valid regular expression to match
expected_count (None or nonnegative integer) – Specifies the number of times regex is expected to appear on each line of the file
- Keyword Arguments
skip (None or nonnegative integer) – Number of lines at the start of the file that the method should skip before assessing expectations
mostly (None or number between 0 and 1) – Specifies the fraction of lines that must meet the expectation. If at least that fraction of lines match, the method still returns true even if not all lines meet the expectation criteria.
null_lines_regex (valid regular expression or None) – If not None, a regex used to skip lines as null. Defaults to empty or whitespace-only lines.
- Other Parameters
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
_lines (list) – The lines over which to operate (provided by the file_lines_map_expectation decorator)
- Returns
A JSON-serializable expectation result object.
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
-
expect_file_hash_to_equal
(value, hash_alg='md5', result_format=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect computed file hash to equal some given value.
- Parameters
value – A string to compare with the computed hash value
- Keyword Arguments
hash_alg (string) – Indicates the hash algorithm to use
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
A JSON-serializable expectation result object.
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
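A hash comparison of this kind can be sketched with the standard `hashlib` module. `file_hash_equals` is a hypothetical helper, and comparing against the hexadecimal digest is an assumption; the demonstration uses SHA-256 on a temporary file, while the default mirrors the documented hash_alg='md5'.

```python
import hashlib
import os
import tempfile


def file_hash_equals(path, value, hash_alg="md5"):
    """Hash the file in chunks and compare the hex digest with `value`."""
    digest = hashlib.new(hash_alg)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == value


# Write a small temporary file to hash.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")

expected = hashlib.sha256(b"hello").hexdigest()
matches = file_hash_equals(tmp.name, expected, hash_alg="sha256")
mismatch = file_hash_equals(tmp.name, "0" * 64, hash_alg="sha256")
os.remove(tmp.name)
```

Reading in chunks keeps memory use flat for large files.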
-
expect_file_size_to_be_between
(minsize=0, maxsize=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)¶ Expect file size to be between a user specified maxsize and minsize.
- Parameters
minsize (integer) – minimum expected file size
maxsize (integer) – maximum expected file size
- Keyword Arguments
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
A JSON-serializable expectation result object.
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
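A size check along these lines can be written with `os.path.getsize`. This is a sketch under assumptions: `file_size_between` is a hypothetical helper, sizes are taken to be in bytes, both bounds are treated as inclusive, and maxsize=None is treated as "no upper bound".

```python
import os
import tempfile


def file_size_between(path, minsize=0, maxsize=None):
    """Check the size in bytes against inclusive bounds (inclusivity is assumed)."""
    size = os.path.getsize(path)
    if size < minsize:
        return False
    return maxsize is None or size <= maxsize


# Write a five-byte temporary file to measure.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"12345")

within = file_size_between(tmp.name, minsize=1, maxsize=10)
too_small = file_size_between(tmp.name, minsize=6)
os.remove(tmp.name)
```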
-
expect_file_to_exist
(filepath=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)¶ Checks to see if a file specified by the user actually exists
- Parameters
filepath (str or None) – The filepath to evaluate. If None, will check the currently-configured path object of this FileDataAsset.
- Keyword Arguments
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
A JSON-serializable expectation result object.
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
-
expect_file_to_have_valid_table_header
(regex, skip=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)¶ Checks to see if a file has a line with unique delimited values; such a line may be used as a table header.
- Keyword Arguments
skip (nonnegative integer) – Number of lines at the start of the file that the method should skip before assessing expectations
regex (string) – A string that can be compiled as valid regular expression. Used to specify the elements of the table header (the column headers)
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
A JSON-serializable expectation result object.
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
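The uniqueness check on a header line can be sketched as below. This rests on an assumption that is not stated above: that `regex` is used to split the first unskipped line into column names. `has_unique_header` is a hypothetical helper operating on a list of strings rather than a file.

```python
import re


def has_unique_header(lines, regex, skip=None):
    """Assumption: `regex` separates the column names on the first unskipped line."""
    remaining = lines[skip:] if skip else lines
    if not remaining:
        return False
    names = re.split(regex, remaining[0].strip())
    # A usable header has no repeated column names.
    return len(set(names)) == len(names)


valid = has_unique_header(["# comment", "id,name,age", "1,Ann,30"], ",", skip=1)
duplicated = has_unique_header(["id,name,id"], ",")
```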
-
expect_file_to_be_valid_json
(schema=None, result_format=None, include_config=True, catch_exceptions=None, meta=None)¶
- Parameters
schema (string) – Optional JSON schema file against which the JSON data file is validated
result_format (str or None) – Which output mode to use: BOOLEAN_ONLY, BASIC, COMPLETE, or SUMMARY. For more detail, see result_format.
include_config (boolean) – If True, then include the expectation config as part of the result object. For more detail, see include_config.
catch_exceptions (boolean or None) – If True, then catch exceptions and include them as part of the result object. For more detail, see catch_exceptions.
meta (dict or None) – A JSON-serializable dictionary (nesting allowed) that will be included in the output without modification. For more detail, see meta.
- Returns
A JSON-serializable expectation result object.
Exact fields vary depending on the values passed to result_format, include_config, catch_exceptions, and meta.
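The basic "does this file parse as JSON" part of such a check can be sketched with the standard `json` module. `file_is_valid_json` and `write_temp` are hypothetical helpers, and validation against a schema is deliberately left out of this sketch.

```python
import json
import os
import tempfile


def file_is_valid_json(path):
    """Return True if the file parses as JSON (schema validation is omitted here)."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            json.load(handle)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def write_temp(text):
    """Write text to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False,
                                     encoding="utf-8") as tmp:
        tmp.write(text)
    return tmp.name


good_path = write_temp('{"a": [1, 2, 3]}')
bad_path = write_temp("{not json}")
good = file_is_valid_json(good_path)
bad = file_is_valid_json(bad_path)
os.remove(good_path)
os.remove(bad_path)
```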
-
util¶
-
great_expectations.data_asset.util.
parse_result_format
(result_format)¶ This is a simple helper utility that can be used to parse a string result_format into the dict format used internally by great_expectations. It is not necessary but allows shorthand for result_format in cases where there is no need to specify a custom partial_unexpected_count.
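The shorthand expansion described here can be illustrated with a small function. This is a sketch, not the library's code: `to_result_format_dict` is a hypothetical name, the "result_format" key is assumed, and the default count of 20 is an assumption chosen for the example.

```python
def to_result_format_dict(result_format, default_count=20):
    """Expand a string shorthand into a dict; dict inputs get the default filled in."""
    if isinstance(result_format, str):
        return {"result_format": result_format,
                "partial_unexpected_count": default_count}
    expanded = dict(result_format)  # copy so the caller's dict is left untouched
    expanded.setdefault("partial_unexpected_count", default_count)
    return expanded


from_string = to_result_format_dict("SUMMARY")
from_dict = to_result_format_dict({"result_format": "BASIC",
                                   "partial_unexpected_count": 5})
```

Passing a dict is only needed when a custom partial_unexpected_count is wanted; otherwise the string form is enough.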
-
class
great_expectations.data_asset.util.
DocInherit
(mthd)¶ Bases:
object
-
great_expectations.data_asset.util.
recursively_convert_to_json_serializable
(test_obj)¶ Helper function to convert a dict object to one that is serializable
- Parameters
test_obj – an object to attempt to convert to a corresponding JSON-serializable object
- Returns
(dict) A converted test_object
Warning
test_obj may also be converted in place.
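A recursive conversion of this kind can be sketched as follows. `to_json_serializable` is a hypothetical helper that handles only a few standard-library types, and unlike the function documented above it always returns a new object instead of converting in place.

```python
import datetime
import decimal
import json


def to_json_serializable(obj):
    """Recursively convert common non-JSON types; returns a new object."""
    if isinstance(obj, dict):
        return {str(key): to_json_serializable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_json_serializable(item) for item in obj]
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    if isinstance(obj, decimal.Decimal):
        return float(obj)
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


raw = {
    "when": datetime.date(2020, 1, 2),
    "amounts": (decimal.Decimal("1.5"), 2),
    3: None,
}
converted = to_json_serializable(raw)
encoded = json.dumps(converted)
```

The original `raw` dictionary is left unchanged, which makes it easy to compare the input with the converted output.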