Expectation output formats¶
All Expectations accept an output_format parameter. Great Expectations defines four values for output_format: BOOLEAN_ONLY, BASIC, COMPLETE, and SUMMARY. The API also allows you to define new formats that mix, match, extend this initial set.
>> print list(my_df.my_var)
['A', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D', 'D', 'E', 'E', 'E', 'E', 'E', 'F', 'F', 'F', 'F', 'F', 'F', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H']
>> my_df.expect_column_values_to_be_in_set(
"my_var",
["B", "C", "D", "F", "G", "H"],
output_format="BOOLEAN_ONLY"
)
False
>> my_df.expect_column_values_to_be_in_set(
"my_var",
["B", "C", "D", "F", "G", "H"],
output_format="BASIC"
)
{
'success': False,
'summary_obj': {
'exception_count': 6,
'exception_percent': 0.16666666666666666,
'exception_percent_nonmissing': 0.16666666666666666,
'partial_exception_list': ['A', 'E', 'E', 'E', 'E', 'E']
}
}
>> my_df.expect_column_values_to_be_in_set(
"my_var",
["B", "C", "D", "F", "G", "H"],
output_format="COMPLETE"
)
{
'exception_index_list': [0, 10, 11, 12, 13, 14],
'exception_list': ['A', 'E', 'E', 'E', 'E', 'E'],
'success': False
}
>> expect_column_values_to_match_regex(
"my_column",
"[A-Z][a-z]+",
output_format="SUMMARY"
)
{
'success': False,
'summary_obj': {
'element_count': 36,
'exception_count': 6,
'exception_percent': 0.16666666666666666,
'exception_percent_nonmissing': 0.16666666666666666,
'missing_count': 0,
'missing_percent': 0.0,
'partial_exception_counts': {'A': 1, 'E': 5},
'partial_exception_index_list': [0, 10, 11, 12, 13, 14],
'partial_exception_list': ['A', 'E', 'E', 'E', 'E', 'E']
}
}
The out-of-the-box default is output_format=BASIC.
Note: accepting a single parameter for output_format should make the library of formats relatively easy to extend in the future.
Behavior for BOOLEAN_ONLY result objects¶
…is simple: if the expectation is satisfied, it returns True. Otherwise it returns False.
>> my_df.expect_column_values_to_be_in_set(
"possible_benefactors",
["Joe Gargery", "Mrs. Gargery", "Mr. Pumblechook", "Ms. Havisham", "Mr. Jaggers"]
output_format="BOOLEAN_ONLY"
)
False
>> my_df.expect_column_values_to_be_in_set(
"possible_benefactors",
["Joe Gargery", "Mrs. Gargery", "Mr. Pumblechook", "Ms. Havisham", "Mr. Jaggers", "Mr. Magwitch"]
output_format="BOOLEAN_ONLY"
)
False
Behavior for BASIC result objects¶
…depends on the expectation. Great Expectations has native support for three types of Expectations: column_map_expectation, column_aggregate_expectation, and a base type expectation.
column_map_expectations apply a boolean test function to each element within a column. This format is intended for quick, at-a-glance feedback. For example, it tends to work well in jupyter notebooks.
The basic format is:
{
"success" : Boolean,
"summary_obj" : {
"partial_exception_list" : [A list of up to 20 values that violate the expectation]
"exception_count" : The total count of exceptions in the column
"exception_percent" : The overall percent of exceptions
"exception_percent_nonmissing" : The percent of exceptions, excluding mising values from the denominator
}
}
Note: when exception values are duplicated, exception_list will contain multiple copies of the value.
[1,2,2,3,3,3,None,None,None,None]
expect_column_values_to_be_unique
{
"success" : Boolean,
"summary_obj" : {
"exception_list" : [2,2,3,3,3]
"exception_index_list" : [1,2,3,4,5]
"exception_count" : 5,
"exception_percent" : 0.5,
"exception_percent_nonmissing" : 0.8333333,
}
}
column_aggregate_expectations compute a single value for the column and put it into true_value.
Format:
{
"success" : Boolean,
"true_value" : Depends
}
For example:
expect_table_row_count_to_be_between
{
"success" : true,
"true_value" : 7
}
expect_column_stdev_to_be_between
{
"success" : false
"true_value" : 3.04
}
expect_column_most_common_value_to_be
{
"success" : ...
"true_value" : ...
}
Behavior for SUMMARY result objects¶
SUMMARY provides a summary_obj with values usef of common exception values. For column_map_expectations, the standard format is:
{
'success': False,
'summary_obj': {
'element_count': 36,
'exception_count': 6,
'exception_percent': 0.16666666666666666,
'exception_percent_nonmissing': 0.16666666666666666,
'missing_count': 0,
'missing_percent': 0.0,
'partial_exception_counts': {'A': 1, 'E': 5},
'partial_exception_index_list': [0, 10, 11, 12, 13, 14],
'partial_exception_list': ['A', 'E', 'E', 'E', 'E', 'E']
}
}
For column_aggregate_expectations, SUMMARY output is the same as BASIC output, plus a summary_obj.
{
'success': False,
'true_value': 3.04,
'summary_obj': {
'element_count': 77,
'missing_count': 7,
'missing_percent': 0.1,
}
}
Quick reference¶
Expectation result fields | BASIC | SUMMARY | COMPLETE |
---|---|---|---|
success (boolean) | Included for all 3 output_formats | ||
expectation_type (string) | Included if and only if include_config=True | ||
expectation_kwargs (dict) | Included if and only if include_config=True | ||
raised_exception (boolean) | Included if and only if catch_exceptions=True | ||
exception_traceback (string or None) | Included if and only if catch_exceptions=True | ||
meta (dict) | Included if and only if meta=True | ||
true_value (depends) | Included for all column_aggregate_expectations | ||
exception_index_list (list) | no | no | yes |
exception_list (list) | no | no | yes |
summary_obj (dict) | yes | yes | no |
Fields within summary_obj | BASIC | SUMMARY |
---|---|---|
partial_exception_list | yes* | yes* |
partial_exception_index_list | no | yes* |
exception_count | yes* | yes* |
exception_percent | yes* | yes* |
exception_percent_nonmissing | yes* | yes* |
element_count | no | yes |
missing_count | no | yes |
missing_percent | no | yes |
partial_exception_counts | no | yes* |
Other… | Defined on a case by case basis. |
yes* : These variables are only defined for column_map_expectations.