great_expectations.util

Module Contents

Functions

pluralize(singular_ge_noun)

Pluralizes a Great Expectations singular noun

singularize(plural_ge_noun)

Singularizes a Great Expectations plural noun

underscore(word: str)

Borrowed from inflection.underscore

hyphen(txt: str)

profile(func: Callable = None)

measure_execution_time(pretty_print: bool = False)

get_project_distribution()

get_currently_executing_function()

get_currently_executing_function_call_arguments(include_module_name: bool = False, include_caller_names: bool = False, **kwargs)

include_module_name – bool. If True, module name will be determined and included in output dictionary (default is False)

verify_dynamic_loading_support(module_name: str, package_name: str = None)

module_name – a possibly-relative name of a module

import_library_module(module_name: str)

module_name – a fully-qualified name of a module (e.g., “great_expectations.dataset.sqlalchemy_dataset”)

is_library_loadable(library_name: str)

load_class(class_name: str, module_name: str)

_convert_to_dataset_class(df, dataset_class, expectation_suite=None, profiler=None)

Convert a (pandas) dataframe to a great_expectations dataset, with (optional) expectation_suite

_load_and_convert_to_dataset_class(df, class_name, module_name, expectation_suite=None, profiler=None)

Convert a (pandas) dataframe to a great_expectations dataset, with (optional) expectation_suite

read_csv(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_csv and return a great_expectations dataset.

read_json(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, accessor_func=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_json and return a great_expectations dataset.

read_excel(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_excel and return a great_expectations dataset.

read_table(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_table and return a great_expectations dataset.

read_feather(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_feather and return a great_expectations dataset.

read_parquet(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_parquet and return a great_expectations dataset.

from_pandas(pandas_df, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None)

Read a Pandas data frame and return a great_expectations dataset.

read_pickle(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_pickle and return a great_expectations dataset.

read_sas(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_sas and return a great_expectations dataset.

validate(data_asset, expectation_suite=None, data_asset_name=None, expectation_suite_name=None, data_context=None, data_asset_class_name=None, data_asset_module_name='great_expectations.dataset', data_asset_class=None, *args, **kwargs)

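Validate the provided data asset. Validate can accept an optional data_asset_name to apply, data_context to use to fetch an expectation_suite if one is not provided, and data_asset_class_name/data_asset_module_name or data_asset_class to use to provide custom expectations.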

gen_directory_tree_str(startpath)

Print the structure of a directory as a tree.

lint_code(code: str)

Lint strings of code passed in. Optional dependency “black” must be installed.

convert_json_string_to_be_python_compliant(code: str)

Cleans JSON-formatted string to adhere to Python syntax

_convert_nulls_to_None(code: str)

_convert_json_bools_to_python_bools(code: str)

filter_properties_dict(properties: Optional[dict] = None, keep_fields: Optional[Set[str]] = None, delete_fields: Optional[Set[str]] = None, clean_nulls: bool = True, clean_falsy: bool = False, keep_falsy_numerics: bool = True, inplace: bool = False)

Filter the entries of the source dictionary according to directives concerning the existing keys and values.

deep_filter_properties_iterable(properties: Optional[Union[dict, list, set, tuple]] = None, keep_fields: Optional[Set[str]] = None, delete_fields: Optional[Set[str]] = None, clean_nulls: bool = True, clean_falsy: bool = False, keep_falsy_numerics: bool = True, inplace: bool = False)

is_truthy(value: Any)

is_numeric(value: Any)

is_int(value: Any)

is_float(value: Any)

is_nan(value: Any)

If value is an array, test element-wise for NaN and return result as a boolean array.

is_parseable_date(value: Any, fuzzy: bool = False)

get_context()

is_sane_slack_webhook(url: str)

Really basic sanity checking.

is_list_of_strings(_list)

generate_library_json_from_registered_expectations()

Generate the JSON object used to populate the public gallery

delete_blank_lines(text: str)

generate_temporary_table_name(default_table_name_prefix: str = 'ge_temp_', num_digits: int = 8)

get_sqlalchemy_inspector(engine)

get_sqlalchemy_url(drivername, **credentials)

get_sqlalchemy_selectable(selectable: Union[Table, Select])

Beginning from SQLAlchemy 1.4, a select() can no longer be embedded inside of another select() directly, without explicitly turning the inner select() into a subquery first.

get_sqlalchemy_domain_data(domain_data)

import_make_url()

Beginning from SQLAlchemy 1.4, make_url is accessed from sqlalchemy.engine; in earlier versions it must still be accessed from sqlalchemy.engine.url to avoid import errors.

great_expectations.util.black
great_expectations.util.logger
great_expectations.util.sa
great_expectations.util.SINGULAR_TO_PLURAL_LOOKUP_DICT
great_expectations.util.PLURAL_TO_SINGULAR_LOOKUP_DICT
great_expectations.util.pluralize(singular_ge_noun)

Pluralizes a Great Expectations singular noun

great_expectations.util.singularize(plural_ge_noun)

Singularizes a Great Expectations plural noun

great_expectations.util.underscore(word: str) → str

Borrowed from inflection.underscore. Makes an underscored, lowercase form from the expression in the string.

Example:

>>> underscore("DeviceType")
'device_type'

As a rule of thumb you can think of underscore() as the inverse of camelize(), though there are cases where that does not hold:

>>> camelize(underscore("IOError"))
'IoError'
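
The conversion shown in the examples above can be approximated with two regular-expression passes. The following is a minimal standalone sketch modeled on the approach of the inflection package; the name underscore_sketch is hypothetical and is not part of great_expectations.util.

```python
import re


def underscore_sketch(word: str) -> str:
    """Hypothetical stand-in illustrating CamelCase -> snake_case conversion."""
    # Split an acronym from a following capitalized word: "IOError" -> "IO_Error".
    word = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", word)
    # Split a lowercase letter or digit from a following capital: "DeviceType" -> "Device_Type".
    word = re.sub(r"([a-z\d])([A-Z])", r"\1_\2", word)
    # Treat hyphens as word separators, then lowercase everything.
    return word.replace("-", "_").lower()
```

Lowercasing discards the acronym boundary, so a later camel-casing step cannot tell that "io" was originally "IO"; this is the asymmetry noted above.
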
great_expectations.util.hyphen(txt: str)
great_expectations.util.profile(func: Callable = None) → Callable
great_expectations.util.measure_execution_time(pretty_print: bool = False) → Callable
great_expectations.util.get_project_distribution() → Optional[Distribution]
great_expectations.util.get_currently_executing_function() → Callable
great_expectations.util.get_currently_executing_function_call_arguments(include_module_name: bool = False, include_caller_names: bool = False, **kwargs) → dict
Parameters
  • include_module_name (bool) – If True, module name will be determined and included in output dictionary (default is False)

  • include_caller_names (bool) – If True, arguments, such as “self” and “cls”, if present, will be included in output dictionary (default is False)

  • kwargs

Returns

dict. Output dictionary, consisting of call arguments as attribute “name: value” pairs.

Example usage:

    # Gather the call arguments of the present function (include the "module_name" and add the "class_name"),
    # filter out the Falsy values, and set the instance "_config" variable equal to the resulting dictionary.
    self._config = get_currently_executing_function_call_arguments(
        include_module_name=True,
        **{
            "class_name": self.__class__.__name__,
        },
    )
    filter_properties_dict(properties=self._config, clean_falsy=True, inplace=True)
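
One way such a helper could gather the caller's arguments is by inspecting the calling frame. The sketch below is an assumption about the general technique, not the library's implementation: call_arguments_sketch and Demo are hypothetical names, and the include_module_name option is omitted for brevity.

```python
import inspect


def call_arguments_sketch(include_caller_names: bool = False, **kwargs) -> dict:
    """Hypothetical helper: collect the arguments of the function that called it."""
    caller_frame = inspect.currentframe().f_back
    arg_info = inspect.getargvalues(caller_frame)
    # Named parameters of the caller, with their current values.
    arguments = {name: arg_info.locals[name] for name in arg_info.args}
    # Entries captured by the caller's own **kwargs parameter, if it has one.
    if arg_info.keywords:
        arguments.update(arg_info.locals[arg_info.keywords])
    if not include_caller_names:
        arguments.pop("self", None)
        arguments.pop("cls", None)
    # Extra name/value pairs supplied by the caller, such as "class_name".
    arguments.update(kwargs)
    return arguments


class Demo:
    def __init__(self, name, retries=3, **options):
        # Record how this instance was constructed.
        self._config = call_arguments_sketch(class_name=type(self).__name__)
```

Here Demo("a", x=1)._config holds the entries for name, retries, x, and class_name, while "self" is dropped because include_caller_names defaults to False.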

great_expectations.util.verify_dynamic_loading_support(module_name: str, package_name: str = None) → None
Parameters
  • module_name – a possibly-relative name of a module

  • package_name – the name of a package, to which the given module belongs

great_expectations.util.import_library_module(module_name: str) → Optional[ModuleType]
Parameters

module_name – a fully-qualified name of a module (e.g., “great_expectations.dataset.sqlalchemy_dataset”)

Returns

the imported module object (if it can be imported); otherwise None, as indicated by the Optional[ModuleType] return type
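
Given the Optional[ModuleType] return annotation, a plausible reading is that a failed import yields None rather than an exception. The following sketch shows that pattern with importlib; import_module_or_none is a hypothetical name and the exact error handling in the library is an assumption.

```python
import importlib
from types import ModuleType
from typing import Optional


def import_module_or_none(module_name: str) -> Optional[ModuleType]:
    """Hypothetical helper: import a module by name, returning None on failure."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # The module (or one of its dependencies) is not installed or not importable.
        return None
```

Callers can then test the result against None to decide whether an optional dependency is available.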

great_expectations.util.is_library_loadable(library_name: str) → bool
great_expectations.util.load_class(class_name: str, module_name: str)
great_expectations.util._convert_to_dataset_class(df, dataset_class, expectation_suite=None, profiler=None)

Convert a (pandas) dataframe to a great_expectations dataset, with (optional) expectation_suite

Parameters
  • df – the DataFrame object to convert

  • dataset_class – the class to which to convert the existing DataFrame

  • expectation_suite – the expectation suite that should be attached to the resulting dataset

  • profiler – the profiler to use to generate baseline expectations, if any

Returns

A new Dataset object

great_expectations.util._load_and_convert_to_dataset_class(df, class_name, module_name, expectation_suite=None, profiler=None)

Convert a (pandas) dataframe to a great_expectations dataset, with (optional) expectation_suite

Parameters
  • df – the DataFrame object to convert

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • expectation_suite – the expectation suite that should be attached to the resulting dataset

  • profiler – the profiler to use to generate baseline expectations, if any

Returns

A new Dataset object

great_expectations.util.read_csv(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_csv and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset
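
The class_name, module_name, and dataset_class parameters describe a lookup order that recurs in the read_* functions and from_pandas: an explicit class wins, otherwise the class is loaded dynamically by name. A minimal sketch of that resolution step, using only importlib, might look as follows; load_class_sketch and resolve_dataset_class are hypothetical names, and the error type raised is an assumption.

```python
import importlib
from typing import Optional


def load_class_sketch(class_name: str, module_name: str) -> type:
    """Hypothetical helper: import module_name and return its attribute class_name."""
    module = importlib.import_module(module_name)
    try:
        return getattr(module, class_name)
    except AttributeError:
        raise ImportError(
            f"Module {module_name!r} has no class named {class_name!r}"
        ) from None


def resolve_dataset_class(
    dataset_class: Optional[type], class_name: str, module_name: str
) -> type:
    """Prefer an explicitly supplied class; otherwise fall back to the named one."""
    if dataset_class is not None:
        return dataset_class
    return load_class_sketch(class_name, module_name)
```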

great_expectations.util.read_json(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, accessor_func=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_json and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • accessor_func (Callable) – function to transform the JSON object in the file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.read_excel(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_excel and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset or ordered dict of great_expectations datasets, if multiple worksheets are imported

great_expectations.util.read_table(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_table and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.read_feather(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_feather and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.read_parquet(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_parquet and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.from_pandas(pandas_df, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None)

Read a Pandas data frame and return a great_expectations dataset.

Parameters
  • pandas_df (Pandas df) – Pandas data frame

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (profiler class) – The profiler that should be run on the dataset to establish a baseline expectation suite.

Returns

great_expectations dataset

great_expectations.util.read_pickle(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_pickle and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.read_sas(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_sas and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.validate(data_asset, expectation_suite=None, data_asset_name=None, expectation_suite_name=None, data_context=None, data_asset_class_name=None, data_asset_module_name='great_expectations.dataset', data_asset_class=None, *args, **kwargs)

Validate the provided data asset. Validate can accept an optional data_asset_name to apply, data_context to use to fetch an expectation_suite if one is not provided, and data_asset_class_name/data_asset_module_name or data_asset_class to use to provide custom expectations.

Parameters
  • data_asset – the asset to validate

  • expectation_suite – the suite to use, or None to fetch one using a DataContext

  • data_asset_name – the name of the data asset to use

  • expectation_suite_name – the name of the expectation_suite to use

  • data_context – data context to use to fetch an expectation suite, or the path from which to obtain one

  • data_asset_class_name – the name of the DataAsset class to load dynamically

  • data_asset_module_name – the name of the module from which to dynamically load the DataAsset class

  • data_asset_class – a class to use; overrides data_asset_class_name/data_asset_module_name if provided

  • *args

  • **kwargs

Returns:

great_expectations.util.gen_directory_tree_str(startpath)

Print the structure of a directory as a tree:

Ex:
project_dir0/
    AAA/
    BBB/
        aaa.txt
        bbb.txt

Note: files and directories are sorted alphabetically, so that this method can be used for testing.
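
A deterministic tree listing of this kind can be built with os.walk by sorting directory and file names at each level. The sketch below is an illustration under assumptions: directory_tree_sketch is a hypothetical name, and the four-space indentation is a choice made here, not a documented detail.

```python
import os


def directory_tree_sketch(startpath: str) -> str:
    """Hypothetical helper: render a directory as an indented, sorted tree."""
    startpath = os.path.normpath(startpath)
    lines = []
    for root, dirs, files in os.walk(startpath):
        dirs.sort()  # sorting in place makes os.walk descend alphabetically
        relative = os.path.relpath(root, startpath)
        # Depth of the current directory below startpath (0 for startpath itself).
        level = 0 if relative == os.curdir else relative.count(os.sep) + 1
        lines.append(" " * 4 * level + os.path.basename(root) + "/")
        for filename in sorted(files):
            lines.append(" " * 4 * (level + 1) + filename)
    return "\n".join(lines) + "\n"
```

Because the output does not depend on filesystem ordering, the returned string can be compared directly against an expected value in a test.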

great_expectations.util.lint_code(code: str) → str

Lint strings of code passed in. Optional dependency “black” must be installed.

great_expectations.util.convert_json_string_to_be_python_compliant(code: str) → str

Cleans JSON-formatted string to adhere to Python syntax

Substitutes instances of ‘null’ with ‘None’ in string representations of Python dictionaries. Additionally, substitutes instances of ‘true’ or ‘false’ with their Python equivalents.

Parameters

code – JSON string to update

Returns

Clean, Python-compliant string
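
The substitution described above can be sketched with regular expressions that only touch bare literals appearing directly after a key. This is a simplified illustration, not the library's code: json_literals_to_python_sketch is a hypothetical name, and literals inside arrays or unusual string contents are deliberately not handled.

```python
import re


def json_literals_to_python_sketch(code: str) -> str:
    """Hypothetical helper: rewrite JSON null/true/false values as Python literals."""
    # Each pattern requires the closing quote and colon of a key before the
    # literal, so a quoted string value such as "null" is left alone.
    code = re.sub(r'(":\s*)null\b', r"\1None", code)
    code = re.sub(r'(":\s*)true\b', r"\1True", code)
    code = re.sub(r'(":\s*)false\b', r"\1False", code)
    return code
```

After conversion, a string of this shape can be parsed as a Python literal, for example with ast.literal_eval.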

great_expectations.util._convert_nulls_to_None(code: str) → str
great_expectations.util._convert_json_bools_to_python_bools(code: str) → str
great_expectations.util.filter_properties_dict(properties: Optional[dict] = None, keep_fields: Optional[Set[str]] = None, delete_fields: Optional[Set[str]] = None, clean_nulls: bool = True, clean_falsy: bool = False, keep_falsy_numerics: bool = True, inplace: bool = False) → Optional[dict]

Filter the entries of the source dictionary according to directives concerning the existing keys and values.

Parameters
  • properties – source dictionary to be filtered according to the supplied filtering directives

  • keep_fields – set of keys that must be retained, with the understanding that all other entries will be deleted

  • delete_fields – set of keys that must be deleted, with the understanding that all other entries will be retained

  • clean_nulls – If True, then in addition to other filtering directives, delete entries whose values are None

  • clean_falsy – If True, then in addition to other filtering directives, delete entries whose values are Falsy. (If the "clean_falsy" argument is specified as "True", then "clean_nulls" is assumed to be "True" as well.)

  • inplace – If True, then modify the source properties dictionary; otherwise, make a copy for filtering purposes

  • keep_falsy_numerics – If True, then in addition to other filtering directives, do not delete zero-valued numerics

Returns

The (possibly) filtered properties dictionary (or None if no entries remain after filtering is performed)
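
To make the interaction of these directives concrete, here is a simplified sketch of the filtering rules as described in the parameter list. It is an illustration under assumptions, not the library's implementation: filter_properties_sketch is a hypothetical name, it always works on a copy (no inplace option), and rejecting keep_fields together with delete_fields is an assumption made here.

```python
from typing import Optional, Set


def filter_properties_sketch(
    properties: Optional[dict] = None,
    keep_fields: Optional[Set[str]] = None,
    delete_fields: Optional[Set[str]] = None,
    clean_nulls: bool = True,
    clean_falsy: bool = False,
    keep_falsy_numerics: bool = True,
) -> Optional[dict]:
    """Hypothetical helper: filter a copy of a dictionary by key and by value."""
    if keep_fields and delete_fields:
        raise ValueError("Specify keep_fields or delete_fields, not both.")
    result = dict(properties or {})
    if keep_fields:
        result = {k: v for k, v in result.items() if k in keep_fields}
    if delete_fields:
        result = {k: v for k, v in result.items() if k not in delete_fields}
    if clean_falsy:
        clean_nulls = True  # removing falsy values implies removing None

    def is_zero_numeric(value) -> bool:
        # bool is a subclass of int, so exclude it explicitly.
        return (
            isinstance(value, (int, float))
            and not isinstance(value, bool)
            and value == 0
        )

    def keep(value) -> bool:
        if value is None:
            return not clean_nulls
        if clean_falsy and not value:
            return keep_falsy_numerics and is_zero_numeric(value)
        return True

    result = {k: v for k, v in result.items() if keep(v)}
    return result or None  # None when nothing survives the filtering
```

With the defaults only None values are removed; enabling clean_falsy also removes empty strings and containers while zero-valued numerics survive unless keep_falsy_numerics is False.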

great_expectations.util.deep_filter_properties_iterable(properties: Optional[Union[dict, list, set, tuple]] = None, keep_fields: Optional[Set[str]] = None, delete_fields: Optional[Set[str]] = None, clean_nulls: bool = True, clean_falsy: bool = False, keep_falsy_numerics: bool = True, inplace: bool = False) → Optional[Union[dict, list, set]]
great_expectations.util.is_truthy(value: Any) → bool
great_expectations.util.is_numeric(value: Any) → bool
great_expectations.util.is_int(value: Any) → bool
great_expectations.util.is_float(value: Any) → bool
great_expectations.util.is_nan(value: Any) → bool

If value is an array, test element-wise for NaN and return result as a boolean array. If value is a scalar, return boolean.

Parameters

value – The value to test

Returns

The results of the test
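
The scalar versus element-wise behavior can be illustrated without any array library. In the sketch below, plain lists and tuples stand in for arrays, and treating non-numeric values as "not NaN" is an assumption; is_nan_sketch is a hypothetical name.

```python
import math
from typing import Any


def is_nan_sketch(value: Any):
    """Hypothetical helper: NaN test for scalars, element-wise for lists/tuples."""
    if isinstance(value, (list, tuple)):
        # Element-wise test: one boolean per item, in the same order.
        return [is_nan_sketch(item) for item in value]
    try:
        return math.isnan(value)
    except (TypeError, OverflowError):
        # Non-numeric values (and integers too large for a float) are not NaN.
        return False
```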

great_expectations.util.is_parseable_date(value: Any, fuzzy: bool = False) → bool
great_expectations.util.get_context()
great_expectations.util.is_sane_slack_webhook(url: str) → bool

Really basic sanity checking.

great_expectations.util.is_list_of_strings(_list) → bool
great_expectations.util.generate_library_json_from_registered_expectations()

Generate the JSON object used to populate the public gallery

great_expectations.util.delete_blank_lines(text: str) → str
great_expectations.util.generate_temporary_table_name(default_table_name_prefix: str = 'ge_temp_', num_digits: int = 8) → str
great_expectations.util.get_sqlalchemy_inspector(engine)
great_expectations.util.get_sqlalchemy_url(drivername, **credentials)
great_expectations.util.get_sqlalchemy_selectable(selectable: Union[Table, Select]) → Union[Table, Select]

Beginning from SQLAlchemy 1.4, a select() can no longer be embedded inside of another select() directly, without explicitly turning the inner select() into a subquery first. This helper method ensures that this conversion takes place.

https://docs.sqlalchemy.org/en/14/changelog/migration_14.html#change-4617

great_expectations.util.get_sqlalchemy_domain_data(domain_data)
great_expectations.util.import_make_url()

Beginning from SQLAlchemy 1.4, make_url is accessed from sqlalchemy.engine; in earlier versions it must still be accessed from sqlalchemy.engine.url to avoid import errors.
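
A version-tolerant import of this kind can be sketched as "try each known location in order". This is a generic illustration that swaps in a fallback-import technique rather than reproducing how the library selects the location; import_first_available is a hypothetical name.

```python
import importlib
from typing import Any, Sequence


def import_first_available(module_names: Sequence[str], attribute: str) -> Any:
    """Hypothetical helper: return `attribute` from the first module that has it."""
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # this location does not exist in the installed version
        if hasattr(module, attribute):
            return getattr(module, attribute)
    raise ImportError(f"{attribute!r} not found in any of {list(module_names)}")


# With SQLAlchemy installed, the two locations named above could be tried like so:
# make_url = import_first_available(
#     ["sqlalchemy.engine", "sqlalchemy.engine.url"], "make_url"
# )
```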