great_expectations.util

Module Contents

Classes

bidict(*args: List[Any], **kwargs: Dict[str, Any])

Bi-directional hashmap: https://stackoverflow.com/a/21894086

Functions

camel_to_snake(name: str)

underscore(word: str)

Borrowed from inflection.underscore

hyphen(txt: str)

profile(func: Callable)

measure_execution_time(execution_time_holder_object_reference_name: str = 'execution_time_holder', execution_time_property_name: str = 'execution_time', method: str = 'process_time', pretty_print: bool = True, include_arguments: bool = True)

Parameterizes template “execution_time_decorator” function with options, supplied as arguments.

get_project_distribution()

get_currently_executing_function()

get_currently_executing_function_call_arguments(include_module_name: bool = False, include_caller_names: bool = False, **kwargs)

param include_module_name

bool If True, module name will be determined and included in output dictionary (default is False)

verify_dynamic_loading_support(module_name: str, package_name: Optional[str] = None)

param module_name

a possibly-relative name of a module

import_library_module(module_name: str)

param module_name

a fully-qualified name of a module (e.g., “great_expectations.dataset.sqlalchemy_dataset”)

is_library_loadable(library_name: str)

load_class(class_name: str, module_name: str)

_convert_to_dataset_class(df, dataset_class, expectation_suite=None, profiler=None)

Convert a (pandas) dataframe to a great_expectations dataset, with (optional) expectation_suite

_load_and_convert_to_dataset_class(df, class_name, module_name, expectation_suite=None, profiler=None)

Convert a (pandas) dataframe to a great_expectations dataset, with (optional) expectation_suite

read_csv(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_csv and return a great_expectations dataset.

read_json(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, accessor_func=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_json and return a great_expectations dataset.

read_excel(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_excel and return a great_expectations dataset.

read_table(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_table and return a great_expectations dataset.

read_feather(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_feather and return a great_expectations dataset.

read_parquet(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_parquet and return a great_expectations dataset.

from_pandas(pandas_df, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None)

Read a Pandas data frame and return a great_expectations dataset.

read_pickle(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_pickle and return a great_expectations dataset.

read_sas(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_sas and return a great_expectations dataset.

build_in_memory_runtime_context()

Create generic in-memory “BaseDataContext” context for manipulations as required by tests.

validate(data_asset, expectation_suite=None, data_asset_name=None, expectation_suite_name=None, data_context=None, data_asset_class_name=None, data_asset_module_name='great_expectations.dataset', data_asset_class=None, *args, **kwargs)

Validate the provided data asset. Validate can accept an optional data_asset_name to apply, data_context to use

gen_directory_tree_str(startpath)

Print the structure of a directory as a tree:

lint_code(code: str)

Lint strings of code passed in. Optional dependency “black” must be installed.

convert_json_string_to_be_python_compliant(code: str)

Cleans JSON-formatted string to adhere to Python syntax

_convert_nulls_to_None(code: str)

_convert_json_bools_to_python_bools(code: str)

filter_properties_dict(properties: Optional[dict] = None, keep_fields: Optional[Set[str]] = None, delete_fields: Optional[Set[str]] = None, clean_nulls: bool = True, clean_falsy: bool = False, keep_falsy_numerics: bool = True, inplace: bool = False)

Filter the entries of the source dictionary according to directives concerning the existing keys and values.

deep_filter_properties_iterable(properties: dict, keep_fields: Optional[Set[str]] = …, delete_fields: Optional[Set[str]] = …, clean_nulls: bool = …, clean_falsy: bool = …, keep_falsy_numerics: bool = …, inplace: bool = …)

deep_filter_properties_iterable(properties: list, keep_fields: Optional[Set[str]] = …, delete_fields: Optional[Set[str]] = …, clean_nulls: bool = …, clean_falsy: bool = …, keep_falsy_numerics: bool = …, inplace: bool = …)

deep_filter_properties_iterable(properties: set, keep_fields: Optional[Set[str]] = …, delete_fields: Optional[Set[str]] = …, clean_nulls: bool = …, clean_falsy: bool = …, keep_falsy_numerics: bool = …, inplace: bool = …)

deep_filter_properties_iterable(properties: tuple, keep_fields: Optional[Set[str]] = …, delete_fields: Optional[Set[str]] = …, clean_nulls: bool = …, clean_falsy: bool = …, keep_falsy_numerics: bool = …, inplace: bool = …)

deep_filter_properties_iterable(properties: None, keep_fields: Optional[Set[str]] = …, delete_fields: Optional[Set[str]] = …, clean_nulls: bool = …, clean_falsy: bool = …, keep_falsy_numerics: bool = …, inplace: bool = …)

deep_filter_properties_iterable(properties: Union[dict, list, set, tuple, None] = None, keep_fields: Optional[Set[str]] = None, delete_fields: Optional[Set[str]] = None, clean_nulls: bool = True, clean_falsy: bool = False, keep_falsy_numerics: bool = True, inplace: bool = False)

_is_to_be_removed_from_deep_filter_properties_iterable(value: Any, clean_nulls: bool, clean_falsy: bool, keep_falsy_numerics: bool)

is_truthy(value: Any)

is_numeric(value: Any)

is_int(value: Any)

is_float(value: Any)

is_nan(value: Any)

If value is an array, test element-wise for NaN and return result as a boolean array.

convert_decimal_to_float(d: decimal.Decimal)

This method converts “decimal.Decimal” to standard “float” type.

requires_lossy_conversion(d: decimal.Decimal)

This method determines whether conversion from “decimal.Decimal” to standard “float” type would be lossy (i.e., cannot be lossless).

isclose(operand_a: Union[datetime.datetime, datetime.timedelta, Number], operand_b: Union[datetime.datetime, datetime.timedelta, Number], rtol: float = 1e-05, atol: float = 1e-08, equal_nan: bool = False)

Checks whether or not two numbers (or timestamps) are approximately close to one another.

is_candidate_subset_of_target(candidate: Any, target: Any)

This method checks whether or not the candidate object is a subset of the target object.

is_parseable_date(value: Any, fuzzy: bool = False)

is_ndarray_datetime_dtype(data: np.ndarray, parse_strings_as_datetimes: bool = False, fuzzy: bool = False)

Determine whether or not all elements of 1-D “np.ndarray” argument are “datetime.datetime” type objects.

convert_ndarray_to_datetime_dtype_best_effort(data: np.ndarray, datetime_detected: bool = False, parse_strings_as_datetimes: bool = False, fuzzy: bool = False)

Attempt to parse all elements of 1-D “np.ndarray” argument into “datetime.datetime” type objects.

convert_ndarray_datetime_to_float_dtype_utc_timezone(data: np.ndarray)

Convert all elements of 1-D “np.ndarray” argument from “datetime.datetime” type to “timestamp” “float” type objects.

convert_ndarray_float_to_datetime_dtype(data: np.ndarray)

Convert all elements of 1-D “np.ndarray” argument from “float” type to “datetime.datetime” type objects.

convert_ndarray_float_to_datetime_tuple(data: np.ndarray)

Convert all elements of 1-D “np.ndarray” argument from “float” type to “datetime.datetime” type tuple elements.

is_ndarray_decimal_dtype(data: npt.NDArray)

Determine whether or not all elements of 1-D “np.ndarray” argument are “decimal.Decimal” type objects.

convert_ndarray_decimal_to_float_dtype(data: np.ndarray)

Convert all elements of N-D “np.ndarray” argument from “decimal.Decimal” type to “float” type objects.

get_context(project_config: Optional[Union['DataContextConfig', Mapping]] = None, context_root_dir: Optional[str] = None, runtime_environment: Optional[dict] = None, cloud_base_url: Optional[str] = None, cloud_access_token: Optional[str] = None, cloud_organization_id: Optional[str] = None, cloud_mode: Optional[bool] = None, ge_cloud_base_url: Optional[str] = None, ge_cloud_access_token: Optional[str] = None, ge_cloud_organization_id: Optional[str] = None, ge_cloud_mode: Optional[bool] = None)

Method to return the appropriate DataContext depending on parameters and environment.

is_sane_slack_webhook(url: str)

Really basic sanity checking.

is_list_of_strings(_list)

generate_library_json_from_registered_expectations()

Generate the JSON object used to populate the public gallery

delete_blank_lines(text: str)

generate_temporary_table_name(default_table_name_prefix: str = 'ge_temp_', num_digits: int = 8)

get_sqlalchemy_inspector(engine)

get_sqlalchemy_url(drivername, **credentials)

get_sqlalchemy_selectable(selectable: Union[Table, Select])

Beginning from SQLAlchemy 1.4, a select() can no longer be embedded inside of another select() directly,

get_sqlalchemy_subquery_type()

Beginning from SQLAlchemy 1.4, sqlalchemy.sql.Alias has been deprecated in favor of sqlalchemy.sql.Subquery.

get_sqlalchemy_domain_data(domain_data)

import_make_url()

Beginning from SQLAlchemy 1.4, make_url is accessed from sqlalchemy.engine; earlier versions must

get_pyathena_potential_type(type_module, type_)

get_trino_potential_type(type_module: ModuleType, type_: str)

Leverages the Trino package to return the SQLAlchemy SQL type

pandas_series_between_inclusive(series: pd.Series, min_value: int, max_value: int)

As of Pandas 1.3.0, the ‘inclusive’ arg in between() is an enum: {“left”, “right”, “neither”, “both”}

numpy_quantile(a: np.ndarray, q: float, method: str, axis: Optional[int] = None)

As of NumPy 1.21.0, the ‘interpolation’ arg in quantile() has been renamed to ‘method’.

great_expectations.util.black
great_expectations.util.logger
great_expectations.util.sa
great_expectations.util.p1
great_expectations.util.p2
class great_expectations.util.bidict(*args: List[Any], **kwargs: Dict[str, Any])

Bases: dict

Bi-directional hashmap: https://stackoverflow.com/a/21894086

__setitem__(self, key: str, value: Any)

Set self[key] to value.

__delitem__(self, key: str)

Delete self[key].
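
The pattern linked above keeps a reverse index alongside the normal dict storage. A minimal, self-contained sketch of that idea (an illustration of the linked pattern, not the exact great_expectations implementation):

from typing import Any


class BiDictSketch(dict):
    # Dict that also maintains a value -> [keys] reverse index.
    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.inverse: dict = {}
        for key, value in self.items():
            self.inverse.setdefault(value, []).append(key)

    def __setitem__(self, key: Any, value: Any) -> None:
        # Drop any stale reverse entry before overwriting the key.
        if key in self:
            self.inverse[self[key]].remove(key)
        super().__setitem__(key, value)
        self.inverse.setdefault(value, []).append(key)

    def __delitem__(self, key: Any) -> None:
        value = self[key]
        self.inverse[value].remove(key)
        if not self.inverse[value]:
            del self.inverse[value]
        super().__delitem__(key)


d = BiDictSketch(a=1)
d["b"] = 1
assert d.inverse[1] == ["a", "b"]
del d["a"]
assert d.inverse[1] == ["b"]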

great_expectations.util.camel_to_snake(name: str) → str
great_expectations.util.underscore(word: str) → str

Borrowed from inflection.underscore Make an underscored, lowercase form from the expression in the string.

Example:

>>> underscore("DeviceType")
'device_type'

As a rule of thumb you can think of underscore() as the inverse of camelize(), though there are cases where that does not hold:

>>> camelize(underscore("IOError"))
'IoError'
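
The conversion above is commonly implemented as a two-pass regex. A rough sketch of that inflection-style approach (the actual implementation may differ):

import re


def underscore_sketch(word: str) -> str:
    # Split a run of capitals from a following capitalized word: "IOError" -> "IO_Error".
    word = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", word)
    # Split a lowercase letter/digit from a following capital: "DeviceType" -> "Device_Type".
    word = re.sub(r"([a-z\d])([A-Z])", r"\1_\2", word)
    return word.replace("-", "_").lower()


assert underscore_sketch("DeviceType") == "device_type"
assert underscore_sketch("IOError") == "io_error"
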
great_expectations.util.hyphen(txt: str)
great_expectations.util.profile(func: Callable) → Callable
great_expectations.util.measure_execution_time(execution_time_holder_object_reference_name: str = 'execution_time_holder', execution_time_property_name: str = 'execution_time', method: str = 'process_time', pretty_print: bool = True, include_arguments: bool = True) → Callable

Parameterizes template “execution_time_decorator” function with options, supplied as arguments.

Parameters
  • execution_time_holder_object_reference_name – Handle, provided in “kwargs”, holds execution time property setter.

  • execution_time_property_name – Property attribute name, provided in “kwargs”, sets execution time value.

  • method – Name of method in “time” module (default: “process_time”) to be used for recording timestamps.

  • pretty_print – If True (default), prints execution time summary to standard output; if False, “silent” mode.

  • include_arguments – If True (default), prints arguments of function, whose execution time is measured.

Note: “time.perf_counter()” keeps counting during sleep, while “time.process_time()” does not; this makes “time.process_time()” the better-suited method for measuring code computational efficiency.

Returns

Callable – configured “execution_time_decorator” function.
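
A hedged sketch of the decorator pattern described above: the wrapper reads a clock function from the “time” module by name, times the call, and stores the elapsed time on a holder object found in kwargs. Parameter names mirror the documented ones; this is an illustration, not the actual source.

import functools
import time
from typing import Any, Callable


def measure_execution_time_sketch(
    holder_name: str = "execution_time_holder",
    property_name: str = "execution_time",
    method: str = "process_time",
) -> Callable:
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            clock = getattr(time, method)  # e.g., time.process_time
            begin = clock()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = clock() - begin
                holder = kwargs.get(holder_name)
                if holder is not None:
                    setattr(holder, property_name, elapsed)

        return wrapper

    return decorator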

great_expectations.util.get_project_distribution() → Optional[Distribution]
great_expectations.util.get_currently_executing_function() → Callable
great_expectations.util.get_currently_executing_function_call_arguments(include_module_name: bool = False, include_caller_names: bool = False, **kwargs) → dict
Parameters
  • include_module_name – bool If True, module name will be determined and included in output dictionary (default is False)

  • include_caller_names – bool If True, arguments, such as “self” and “cls”, if present, will be included in output dictionary (default is False)

  • kwargs

Returns

dict Output dictionary, consisting of call arguments as attribute “name: value” pairs.

Example usage:

# Gather the call arguments of the present function (include the "module_name" and add the
# "class_name"), filter out the Falsy values, and set the instance "_config" variable equal
# to the resulting dictionary.
self._config = get_currently_executing_function_call_arguments(
    include_module_name=True,
    **{
        "class_name": self.__class__.__name__,
    },
)
filter_properties_dict(properties=self._config, clean_falsy=True, inplace=True)

great_expectations.util.verify_dynamic_loading_support(module_name: str, package_name: Optional[str] = None) → None
Parameters
  • module_name – a possibly-relative name of a module

  • package_name – the name of a package, to which the given module belongs

great_expectations.util.import_library_module(module_name: str) → Optional[ModuleType]
Parameters

module_name – a fully-qualified name of a module (e.g., “great_expectations.dataset.sqlalchemy_dataset”)

Returns

the imported module object (if it can be retrieved); otherwise, None
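
A plausible sketch of this behavior using importlib, inferred from the Optional[ModuleType] return annotation (the real function may differ in detail):

import importlib
from types import ModuleType
from typing import Optional


def import_library_module_sketch(module_name: str) -> Optional[ModuleType]:
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None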

great_expectations.util.is_library_loadable(library_name: str) → bool
great_expectations.util.load_class(class_name: str, module_name: str)
great_expectations.util._convert_to_dataset_class(df, dataset_class, expectation_suite=None, profiler=None)

Convert a (pandas) dataframe to a great_expectations dataset, with (optional) expectation_suite

Parameters
  • df – the DataFrame object to convert

  • dataset_class – the class to which to convert the existing DataFrame

  • expectation_suite – the expectation suite that should be attached to the resulting dataset

  • profiler – the profiler to use to generate baseline expectations, if any

Returns

A new Dataset object

great_expectations.util._load_and_convert_to_dataset_class(df, class_name, module_name, expectation_suite=None, profiler=None)

Convert a (pandas) dataframe to a great_expectations dataset, with (optional) expectation_suite

Parameters
  • df – the DataFrame object to convert

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • expectation_suite – the expectation suite that should be attached to the resulting dataset

  • profiler – the profiler to use to generate baseline expectations, if any

Returns

A new Dataset object

great_expectations.util.read_csv(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_csv and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset
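
Hedged usage sketch (file path and column name are illustrative):

from great_expectations.util import read_csv

batch = read_csv("my_data.csv")  # returns a PandasDataset by default
result = batch.expect_column_values_to_not_be_null("id")
print(result.success)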

great_expectations.util.read_json(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, accessor_func=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_json and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • accessor_func (Callable) – functions to transform the json object in the file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.read_excel(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_excel and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset or ordered dict of great_expectations datasets, if multiple worksheets are imported

great_expectations.util.read_table(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_table and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.read_feather(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_feather and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.read_parquet(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_parquet and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.from_pandas(pandas_df, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None)

Read a Pandas data frame and return a great_expectations dataset.

Parameters
  • pandas_df (Pandas df) – Pandas data frame

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (profiler class) – The profiler that should be run on the dataset to establish a baseline expectation suite.

Returns

great_expectations dataset
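
Hedged usage sketch, wrapping an existing DataFrame:

import pandas as pd

from great_expectations.util import from_pandas

df = pd.DataFrame({"id": [1, 2, 3]})
dataset = from_pandas(df)
print(dataset.expect_column_values_to_be_between("id", 1, 3).success)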

great_expectations.util.read_pickle(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_pickle and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.read_sas(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_sas and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.build_in_memory_runtime_context() → 'BaseDataContext'

Create generic in-memory “BaseDataContext” context for manipulations as required by tests.

great_expectations.util.validate(data_asset, expectation_suite=None, data_asset_name=None, expectation_suite_name=None, data_context=None, data_asset_class_name=None, data_asset_module_name='great_expectations.dataset', data_asset_class=None, *args, **kwargs)

Validate the provided data asset. Validate can accept an optional data_asset_name to apply, data_context to use to fetch an expectation_suite if one is not provided, and data_asset_class_name/data_asset_module_name or data_asset_class to use to provide custom expectations.

Parameters
  • data_asset – the asset to validate

  • expectation_suite – the suite to use, or None to fetch one using a DataContext

  • data_asset_name – the name of the data asset to use

  • expectation_suite_name – the name of the expectation_suite to use

  • data_context – data context to use to fetch an expectation suite, or the path from which to obtain one

  • data_asset_class_name – the name of a class to dynamically load a DataAsset class

  • data_asset_module_name – the name of the module to dynamically load a DataAsset class

  • data_asset_class – a class to use; overrides data_asset_class_name/data_asset_module_name if provided

  • *args

  • **kwargs

Returns:

great_expectations.util.gen_directory_tree_str(startpath)

Print the structure of a directory as a tree:

Ex:

project_dir0/
    AAA/
    BBB/
        aaa.txt
        bbb.txt

Note: files and directories are sorted alphabetically, so that this method can be used for testing.

great_expectations.util.lint_code(code: str) → str

Lint strings of code passed in. Optional dependency “black” must be installed.

great_expectations.util.convert_json_string_to_be_python_compliant(code: str) → str

Cleans JSON-formatted string to adhere to Python syntax

Substitutes instances of ‘null’ with ‘None’ in string representations of Python dictionaries. Additionally, substitutes instances of ‘true’ or ‘false’ with their Python equivalents.

Parameters

code – JSON string to update

Returns

Clean, Python-compliant string
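
A plausible regex-based sketch of the substitution described above (word boundaries keep identifiers such as "nullable" untouched); the actual helpers may be implemented differently:

import re


def convert_json_literals_sketch(code: str) -> str:
    code = re.sub(r"\bnull\b", "None", code)
    code = re.sub(r"\btrue\b", "True", code)
    code = re.sub(r"\bfalse\b", "False", code)
    return code


assert convert_json_literals_sketch('{"a": null, "b": true}') == '{"a": None, "b": True}'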

great_expectations.util._convert_nulls_to_None(code: str) → str
great_expectations.util._convert_json_bools_to_python_bools(code: str) → str
great_expectations.util.filter_properties_dict(properties: Optional[dict] = None, keep_fields: Optional[Set[str]] = None, delete_fields: Optional[Set[str]] = None, clean_nulls: bool = True, clean_falsy: bool = False, keep_falsy_numerics: bool = True, inplace: bool = False) → Optional[dict]

Filter the entries of the source dictionary according to directives concerning the existing keys and values.

Parameters
  • properties – source dictionary to be filtered according to the supplied filtering directives

  • keep_fields – list of keys that must be retained, with the understanding that all other entries will be deleted

  • delete_fields – list of keys that must be deleted, with the understanding that all other entries will be retained

  • clean_nulls – If True, then in addition to other filtering directives, delete entries whose values are None

  • clean_falsy – If True, then in addition to other filtering directives, delete entries whose values are Falsy (if “clean_falsy” is specified as “True”, then “clean_nulls” is assumed to be “True” as well)

  • inplace – If True, then modify the source properties dictionary; otherwise, make a copy for filtering purposes

  • keep_falsy_numerics – If True, then in addition to other filtering directives, do not delete zero-valued numerics

Returns

The (possibly) filtered properties dictionary (or None if no entries remain after filtering is performed)
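
Hedged usage sketch based on the documented directives (the expected output shown is an assumption):

from great_expectations.util import filter_properties_dict

props = {"name": "my_asset", "comment": None, "count": 0, "tags": []}
# clean_nulls (default True) removes "comment"; clean_falsy=True also drops
# the empty "tags" list, while keep_falsy_numerics (default True) keeps the
# zero-valued "count".
filtered = filter_properties_dict(properties=props, clean_falsy=True)
print(filtered)  # expected: {'name': 'my_asset', 'count': 0}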

great_expectations.util.deep_filter_properties_iterable(properties: dict, keep_fields: Optional[Set[str]] = ..., delete_fields: Optional[Set[str]] = ..., clean_nulls: bool = ..., clean_falsy: bool = ..., keep_falsy_numerics: bool = ..., inplace: bool = ...) → dict
great_expectations.util.deep_filter_properties_iterable(properties: list, keep_fields: Optional[Set[str]] = ..., delete_fields: Optional[Set[str]] = ..., clean_nulls: bool = ..., clean_falsy: bool = ..., keep_falsy_numerics: bool = ..., inplace: bool = ...) → list
great_expectations.util.deep_filter_properties_iterable(properties: set, keep_fields: Optional[Set[str]] = ..., delete_fields: Optional[Set[str]] = ..., clean_nulls: bool = ..., clean_falsy: bool = ..., keep_falsy_numerics: bool = ..., inplace: bool = ...) → set
great_expectations.util.deep_filter_properties_iterable(properties: tuple, keep_fields: Optional[Set[str]] = ..., delete_fields: Optional[Set[str]] = ..., clean_nulls: bool = ..., clean_falsy: bool = ..., keep_falsy_numerics: bool = ..., inplace: bool = ...) → tuple
great_expectations.util.deep_filter_properties_iterable(properties: None, keep_fields: Optional[Set[str]] = ..., delete_fields: Optional[Set[str]] = ..., clean_nulls: bool = ..., clean_falsy: bool = ..., keep_falsy_numerics: bool = ..., inplace: bool = ...) → None
great_expectations.util.deep_filter_properties_iterable(properties: Union[dict, list, set, tuple, None] = None, keep_fields: Optional[Set[str]] = None, delete_fields: Optional[Set[str]] = None, clean_nulls: bool = True, clean_falsy: bool = False, keep_falsy_numerics: bool = True, inplace: bool = False) → Union[dict, list, set, tuple, None]
great_expectations.util._is_to_be_removed_from_deep_filter_properties_iterable(value: Any, clean_nulls: bool, clean_falsy: bool, keep_falsy_numerics: bool) → bool
great_expectations.util.is_truthy(value: Any) → bool
great_expectations.util.is_numeric(value: Any) → bool
great_expectations.util.is_int(value: Any) → bool
great_expectations.util.is_float(value: Any) → bool
great_expectations.util.is_nan(value: Any) → bool

If value is an array, test element-wise for NaN and return result as a boolean array. If value is a scalar, return boolean.

Parameters

value – The value to test

Returns

The results of the test

great_expectations.util.convert_decimal_to_float(d: decimal.Decimal) → float

This method converts “decimal.Decimal” to standard “float” type.

great_expectations.util.requires_lossy_conversion(d: decimal.Decimal) → bool

This method determines whether conversion from “decimal.Decimal” to standard “float” type would be lossy (i.e., cannot be lossless).
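
A plausible sketch of the test: a Decimal conversion is lossless exactly when the value survives a round trip through float (illustration only, not necessarily the actual implementation):

import decimal


def requires_lossy_conversion_sketch(d: decimal.Decimal) -> bool:
    return decimal.Decimal(float(d)) != d


assert not requires_lossy_conversion_sketch(decimal.Decimal("0.5"))  # exactly representable in binary
assert requires_lossy_conversion_sketch(decimal.Decimal("0.1"))  # binary float only approximates 0.1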

great_expectations.util.isclose(operand_a: Union[datetime.datetime, datetime.timedelta, Number], operand_b: Union[datetime.datetime, datetime.timedelta, Number], rtol: float = 1e-05, atol: float = 1e-08, equal_nan: bool = False) → bool

Checks whether or not two numbers (or timestamps) are approximately close to one another.

According to “https://numpy.org/doc/stable/reference/generated/numpy.isclose.html”,

For finite values, isclose uses the following equation to test whether two floating point values are equivalent: “absolute(a - b) <= (atol + rtol * absolute(b))”.

This translates to:

“absolute(operand_a - operand_b) <= (atol + rtol * absolute(operand_b))”, where “operand_a” is the “target” quantity under evaluation for being close to a “control” value, and “operand_b” serves as the “control” (“reference”) value.

The value of the absolute tolerance (“atol”) parameter is chosen as a sufficiently small constant for most floating point machine representations (e.g., 1.0e-8), so that even if the “control” value is small in magnitude and “target” and “control” are close in absolute value, the accuracy of the assessment can still be high up to the precision of the “atol” value (here, 8 digits as the default). However, when the “control” value is large in magnitude, the relative tolerance (“rtol”) parameter carries a greater weight in the comparison assessment, because the acceptable deviation between the two quantities can be relatively larger for them to be deemed “close enough” in this case.
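
Worked example of the documented inequality “absolute(a - b) <= (atol + rtol * absolute(b))” with the default tolerances:

rtol, atol = 1e-05, 1e-08

# Large magnitudes: the tolerance is dominated by rtol * |b| = 0.01, so a
# deviation of 0.001 counts as close.
a, b = 1000.001, 1000.0
assert abs(a - b) <= atol + rtol * abs(b)

# Small magnitudes: the tolerance shrinks to about 1.0e-05, so the same
# absolute deviation of 0.001 is no longer close.
a, b = 1.001, 1.0
assert not abs(a - b) <= atol + rtol * abs(b)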

great_expectations.util.is_candidate_subset_of_target(candidate: Any, target: Any) → bool

This method checks whether or not the candidate object is a subset of the target object.

great_expectations.util.is_parseable_date(value: Any, fuzzy: bool = False) → bool
great_expectations.util.is_ndarray_datetime_dtype(data: np.ndarray, parse_strings_as_datetimes: bool = False, fuzzy: bool = False) → bool

Determine whether or not all elements of 1-D “np.ndarray” argument are “datetime.datetime” type objects.

great_expectations.util.convert_ndarray_to_datetime_dtype_best_effort(data: np.ndarray, datetime_detected: bool = False, parse_strings_as_datetimes: bool = False, fuzzy: bool = False) → Tuple[bool, bool, np.ndarray]

Attempt to parse all elements of 1-D “np.ndarray” argument into “datetime.datetime” type objects.

Returns

  • Boolean flag – True if all elements of original “data” were “datetime.datetime” type objects; False, otherwise.

  • Boolean flag – True if conversion was performed; False, otherwise.

  • Output “np.ndarray” (converted, if necessary).

great_expectations.util.convert_ndarray_datetime_to_float_dtype_utc_timezone(data: np.ndarray) → np.ndarray

Convert all elements of 1-D “np.ndarray” argument from “datetime.datetime” type to “timestamp” “float” type objects.

Note: Conversion of “datetime.datetime” to “float” uses “UTC” TimeZone to normalize all “datetime.datetime” values.
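
A hedged sketch of the UTC normalization described in the note (naive values are assumed to be UTC); the real function may differ:

import datetime

import numpy as np


def to_utc_timestamps_sketch(data: np.ndarray) -> np.ndarray:
    return np.asarray(
        [d.replace(tzinfo=d.tzinfo or datetime.timezone.utc).timestamp() for d in data]
    )


arr = np.asarray([datetime.datetime(2021, 1, 1, tzinfo=datetime.timezone.utc)])
print(to_utc_timestamps_sketch(arr))  # [1.6094592e+09]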

great_expectations.util.convert_ndarray_float_to_datetime_dtype(data: np.ndarray) → np.ndarray

Convert all elements of 1-D “np.ndarray” argument from “float” type to “datetime.datetime” type objects.

Note: Converts to “naive” “datetime.datetime” values (assumes “UTC” TimeZone based floating point timestamps).

great_expectations.util.convert_ndarray_float_to_datetime_tuple(data: np.ndarray) → Tuple[datetime.datetime, ...]

Convert all elements of 1-D “np.ndarray” argument from “float” type to “datetime.datetime” type tuple elements.

Note: Converts to “naive” “datetime.datetime” values (assumes “UTC” TimeZone based floating point timestamps).

great_expectations.util.is_ndarray_decimal_dtype(data: npt.NDArray) → TypeGuard['npt.NDArray']

Determine whether or not all elements of 1-D “np.ndarray” argument are “decimal.Decimal” type objects.

great_expectations.util.convert_ndarray_decimal_to_float_dtype(data: np.ndarray) → np.ndarray

Convert all elements of N-D “np.ndarray” argument from “decimal.Decimal” type to “float” type objects.

great_expectations.util.get_context(project_config: Optional[Union['DataContextConfig', Mapping]] = None, context_root_dir: Optional[str] = None, runtime_environment: Optional[dict] = None, cloud_base_url: Optional[str] = None, cloud_access_token: Optional[str] = None, cloud_organization_id: Optional[str] = None, cloud_mode: Optional[bool] = None, ge_cloud_base_url: Optional[str] = None, ge_cloud_access_token: Optional[str] = None, ge_cloud_organization_id: Optional[str] = None, ge_cloud_mode: Optional[bool] = None) → Union['DataContext', 'BaseDataContext', 'CloudDataContext']

Method to return the appropriate DataContext depending on parameters and environment.

Usage:

import great_expectations as gx

my_context = gx.get_context([parameters])

  1. If gx.get_context() is run in a filesystem where great_expectations init has been run, it will return a DataContext.

  2. If gx.get_context() is passed a context_root_dir (which contains great_expectations.yml), it will return a DataContext.

  3. If gx.get_context() is passed an in-memory project_config, it will return a BaseDataContext. context_root_dir can also be passed in, but the configurations from the in-memory config will override the configurations in the great_expectations.yml file.

  4. If GX is being run in the cloud, and the information needed for ge_cloud_config (i.e., ge_cloud_base_url, ge_cloud_access_token, ge_cloud_organization_id) is passed to get_context() as parameters, configured as environment variables, or set in a .conf file, then get_context() will return a CloudDataContext.

get_context params    Env Not Config’d    Env Config’d
()                    Local               Cloud
(cloud_mode=True)     Exception!          Cloud
(cloud_mode=False)    Local               Local

TODO: This method will eventually return FileDataContext and EphemeralDataContext, rather than DataContext and BaseDataContext.

Parameters
  • project_config (dict or DataContextConfig) – In-memory configuration for DataContext.

  • context_root_dir (str) – Path to directory that contains great_expectations.yml file

  • runtime_environment (dict) – A dictionary of values can be passed to a DataContext when it is instantiated. These values will override both values from the config variables file and from environment variables.

The following parameters are relevant when running ge_cloud:

  • cloud_base_url (str) – url for ge_cloud endpoint.

  • cloud_access_token (str) – access_token for ge_cloud account.

  • cloud_organization_id (str) – org_id for ge_cloud account.

  • cloud_mode (bool) – bool flag to specify whether to run GX in cloud mode (default is None).

Returns

DataContext – either a DataContext, BaseDataContext, or CloudDataContext, depending on environment and/or parameters
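
Hedged usage sketches for two of the cases above (paths and credentials are placeholders):

import great_expectations as gx

# Cases 1/2: filesystem-backed DataContext.
context = gx.get_context(context_root_dir="./great_expectations")

# Case 4: Cloud context via explicit parameters.
cloud_context = gx.get_context(
    cloud_mode=True,
    cloud_base_url="<ge-cloud-base-url>",
    cloud_access_token="<token>",
    cloud_organization_id="<org-id>",
)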

great_expectations.util.is_sane_slack_webhook(url: str) → bool

Really basic sanity checking.

great_expectations.util.is_list_of_strings(_list) → TypeGuard[List[str]]
great_expectations.util.generate_library_json_from_registered_expectations()

Generate the JSON object used to populate the public gallery

great_expectations.util.delete_blank_lines(text: str) → str
great_expectations.util.generate_temporary_table_name(default_table_name_prefix: str = 'ge_temp_', num_digits: int = 8) → str
great_expectations.util.get_sqlalchemy_inspector(engine)
great_expectations.util.get_sqlalchemy_url(drivername, **credentials)
great_expectations.util.get_sqlalchemy_selectable(selectable: Union[Table, Select]) → Union[Table, Select]

Beginning from SQLAlchemy 1.4, a select() can no longer be embedded inside of another select() directly, without explicitly turning the inner select() into a subquery first. This helper method ensures that this conversion takes place.

For versions of SQLAlchemy < 1.4 the implicit conversion to a subquery may not always work, so that also needs to be handled here, using the old equivalent method.

https://docs.sqlalchemy.org/en/14/changelog/migration_14.html#change-4617
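
A hedged sketch of this version shim (the real helper may handle more cases): subquery() exists on Select in SQLAlchemy 1.4+, while older versions used alias() for the same purpose.

import sqlalchemy as sa
from sqlalchemy.sql import Select

_SA_GE_1_4 = tuple(int(p) for p in sa.__version__.split(".")[:2]) >= (1, 4)


def get_sqlalchemy_selectable_sketch(selectable):
    # Tables (and other non-Select selectables) can be embedded directly.
    if isinstance(selectable, Select):
        return selectable.subquery() if _SA_GE_1_4 else selectable.alias()
    return selectable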

great_expectations.util.get_sqlalchemy_subquery_type()

Beginning from SQLAlchemy 1.4, sqlalchemy.sql.Alias has been deprecated in favor of sqlalchemy.sql.Subquery. This helper method ensures that the appropriate type is returned.

https://docs.sqlalchemy.org/en/14/changelog/migration_14.html#change-4617

great_expectations.util.get_sqlalchemy_domain_data(domain_data)
great_expectations.util.import_make_url()

Beginning from SQLAlchemy 1.4, make_url is accessed from sqlalchemy.engine; earlier versions must still be accessed from sqlalchemy.engine.url to avoid import errors.
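
A hedged sketch of the import shim described above:

import sqlalchemy as sa


def import_make_url_sketch():
    if tuple(int(p) for p in sa.__version__.split(".")[:2]) >= (1, 4):
        from sqlalchemy.engine import make_url
    else:
        from sqlalchemy.engine.url import make_url
    return make_url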

great_expectations.util.get_pyathena_potential_type(type_module, type_) → str
great_expectations.util.get_trino_potential_type(type_module: ModuleType, type_: str) → object

Leverages the Trino package to return the SQLAlchemy SQL type

great_expectations.util.pandas_series_between_inclusive(series: pd.Series, min_value: int, max_value: int) → pd.Series

As of Pandas 1.3.0, the ‘inclusive’ arg in between() is an enum: {“left”, “right”, “neither”, “both”}
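
A plausible sketch of the compatibility shim (the version handling shown is an assumption):

import pandas as pd

_PD_GE_1_3 = tuple(int(p) for p in pd.__version__.split(".")[:2]) >= (1, 3)


def between_inclusive_sketch(series: pd.Series, min_value, max_value) -> pd.Series:
    if _PD_GE_1_3:
        return series.between(min_value, max_value, inclusive="both")
    return series.between(min_value, max_value, inclusive=True)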

great_expectations.util.numpy_quantile(a: np.ndarray, q: float, method: str, axis: Optional[int] = None) → Union[np.float64, np.ndarray]

As of NumPy 1.21.0, the ‘interpolation’ arg in quantile() has been renamed to ‘method’. Source: https://numpy.org/doc/stable/reference/generated/numpy.quantile.html
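
A plausible sketch that selects the keyword by introspection rather than pinning a version (illustration only):

import inspect

import numpy as np


def numpy_quantile_sketch(a: np.ndarray, q: float, method: str, axis=None):
    # Prefer the new "method" keyword when this NumPy exposes it; otherwise
    # fall back to the old "interpolation" name.
    if "method" in inspect.signature(np.quantile).parameters:
        return np.quantile(a, q=q, axis=axis, method=method)
    return np.quantile(a, q=q, axis=axis, interpolation=method)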