great_expectations

Subpackages

Package Contents

Classes

CloudMigrator(context: BaseDataContext, cloud_base_url: Optional[str] = None, cloud_access_token: Optional[str] = None, cloud_organization_id: Optional[str] = None)

DataContext(context_root_dir: Optional[str] = None, runtime_environment: Optional[dict] = None, cloud_mode: bool = False, cloud_base_url: Optional[str] = None, cloud_access_token: Optional[str] = None, cloud_organization_id: Optional[str] = None, ge_cloud_mode: bool = False, ge_cloud_base_url: Optional[str] = None, ge_cloud_access_token: Optional[str] = None, ge_cloud_organization_id: Optional[str] = None)

A DataContext represents a Great Expectations project. It is the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components.

Functions

get_versions()

Get version information or return default if unable to do so.

from_pandas(pandas_df, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None)

Read a Pandas data frame and return a great_expectations dataset.

get_context(project_config: Optional[Union['DataContextConfig', Mapping]] = None, context_root_dir: Optional[str] = None, runtime_environment: Optional[dict] = None, cloud_base_url: Optional[str] = None, cloud_access_token: Optional[str] = None, cloud_organization_id: Optional[str] = None, cloud_mode: Optional[bool] = None, ge_cloud_base_url: Optional[str] = None, ge_cloud_access_token: Optional[str] = None, ge_cloud_organization_id: Optional[str] = None, ge_cloud_mode: Optional[bool] = None)

Method to return the appropriate DataContext depending on parameters and environment.

read_csv(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_csv and return a great_expectations dataset.

read_excel(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_excel and return a great_expectations dataset.

read_feather(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_feather and return a great_expectations dataset.

read_json(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, accessor_func=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_json and return a great_expectations dataset.

read_parquet(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_parquet and return a great_expectations dataset.

read_pickle(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_pickle and return a great_expectations dataset.

read_sas(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_sas and return a great_expectations dataset.

read_table(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_table and return a great_expectations dataset.

validate(data_asset, expectation_suite=None, data_asset_name=None, expectation_suite_name=None, data_context=None, data_asset_class_name=None, data_asset_module_name='great_expectations.dataset', data_asset_class=None, *args, **kwargs)

Validate the provided data asset. Validate can accept an optional data_asset_name to apply, data_context to use to fetch an expectation_suite if one is not provided, and data_asset_class_name/data_asset_module_name or data_asset_class to use to provide custom expectations.

great_expectations.get_versions()

Get version information or return default if unable to do so.

great_expectations.__version__
class great_expectations.CloudMigrator(context: BaseDataContext, cloud_base_url: Optional[str] = None, cloud_access_token: Optional[str] = None, cloud_organization_id: Optional[str] = None)
classmethod migrate(cls, context: BaseDataContext, test_migrate: bool, cloud_base_url: Optional[str] = None, cloud_access_token: Optional[str] = None, cloud_organization_id: Optional[str] = None)

Migrate your Data Context to GX Cloud.

Parameters
  • context – The Data Context you wish to migrate.

  • test_migrate – True if this is a test, False if you want to perform the migration.

  • cloud_base_url – Optional, you may provide this alternatively via environment variable GX_CLOUD_BASE_URL

  • cloud_access_token – Optional, you may provide this alternatively via environment variable GX_CLOUD_ACCESS_TOKEN

  • cloud_organization_id – Optional, you may provide this alternatively via environment variable GX_CLOUD_ORGANIZATION_ID

Returns

CloudMigrator instance
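
A minimal sketch of a dry run, assuming a local project on disk; the token and organization ID below are placeholders:

import great_expectations as gx

context = gx.get_context()  # load the local Data Context

# Dry run: with test_migrate=True the migrator reports what would be sent
# to GX Cloud without performing the migration.
migrator = gx.CloudMigrator.migrate(
    context=context,
    test_migrate=True,
    cloud_access_token="<YOUR_ACCESS_TOKEN>",  # placeholder
    cloud_organization_id="<YOUR_ORG_ID>",     # placeholder
)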

retry_migrate_validation_results(self)
_migrate_to_cloud(self, test_migrate: bool)
_emit_log_stmts(self, configuration_bundle: ConfigurationBundle, test_migrate: bool)
_log_about_test_migrate(self)
_log_about_usage_stats_disabled(self)
_log_about_bundle_contains_datasources(self)
_print_configuration_bundle_summary(self, configuration_bundle: ConfigurationBundle)
_print_object_summary(self, obj_name: str, obj_collection: List[AbstractConfig])
_serialize_configuration_bundle(self, configuration_bundle: ConfigurationBundle)
_prepare_validation_results(self, serialized_bundle: dict)
_send_configuration_bundle(self, serialized_bundle: dict, test_migrate: bool)
_send_validation_results(self, serialized_validation_results: Dict[str, dict], test_migrate: bool)
_process_validation_results(self, serialized_validation_results: Dict[str, dict], test_migrate: bool)
_post_to_cloud_backend(self, resource_name: str, resource_type: str, attributes_key: str, attributes_value: dict)
_print_unsuccessful_validation_message(self)
_print_migration_introduction_message(self)
_print_migration_conclusion_message(self, test_migrate: bool)
class great_expectations.DataContext(context_root_dir: Optional[str] = None, runtime_environment: Optional[dict] = None, cloud_mode: bool = False, cloud_base_url: Optional[str] = None, cloud_access_token: Optional[str] = None, cloud_organization_id: Optional[str] = None, ge_cloud_mode: bool = False, ge_cloud_base_url: Optional[str] = None, ge_cloud_access_token: Optional[str] = None, ge_cloud_organization_id: Optional[str] = None)

Bases: great_expectations.data_context.data_context.base_data_context.BaseDataContext

A DataContext represents a Great Expectations project. It is the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components.

The DataContext is configured via a yml file stored in a directory called great_expectations; this configuration file, as well as managed Expectation Suites, should be stored in version control. There are other ways to create a Data Context that may be better suited to your particular deployment, e.g., ephemeral or backed by GX Cloud (coming soon). Please refer to our documentation for more details.

You can Validate data or generate Expectations using Execution Engines including:

  • SQL (multiple dialects supported)

  • Spark

  • Pandas

Your data can be stored in common locations including:

  • databases / data warehouses

  • files in S3, GCS, Azure, or local storage

  • dataframes (Spark and Pandas) loaded into memory

Please see our documentation for examples on how to set up Great Expectations, connect to your data, create Expectations, and Validate data.

Beyond data access, other configuration options you can apply to a DataContext include where to store Expectations, Profilers, Checkpoints, Metrics, Validation Results, and Data Docs, and how those Stores are configured. Take a look at our documentation for more configuration options.
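
For example, an existing filesystem-backed project can be loaded by pointing the constructor at the directory that contains great_expectations.yml; the path below is illustrative:

import great_expectations as gx

# Load an existing project from disk; context_root_dir is the directory
# holding great_expectations.yml.
context = gx.DataContext(context_root_dir="/path/to/great_expectations")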

classmethod create(cls, project_root_dir: Optional[str] = None, usage_statistics_enabled: bool = True, runtime_environment: Optional[dict] = None)

Build a new great_expectations directory and DataContext object in the provided project_root_dir.

create builds a new great_expectations directory in the provided folder if one does not already exist. It then initializes a new DataContext in that folder and writes the resulting config.

Parameters
  • project_root_dir – path to the root directory in which to create a new great_expectations directory

  • usage_statistics_enabled – boolean directive specifying whether or not to gather usage statistics

  • runtime_environment – a dictionary of config variables that override both those set in config_variables.yml and the environment

Returns

DataContext
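
For example (the project path is illustrative):

import great_expectations as gx

# Scaffold great_expectations/ under the given root (if it does not already
# exist) and return the initialized DataContext.
context = gx.DataContext.create(
    project_root_dir="/path/to/my_project",
    usage_statistics_enabled=False,
)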

classmethod all_uncommitted_directories_exist(cls, ge_dir: str)

Check if all uncommitted directories exist.

classmethod config_variables_yml_exist(cls, ge_dir: str)

Check if the config_variables.yml file exists.

classmethod write_config_variables_template_to_disk(cls, uncommitted_dir: str)
classmethod write_project_template_to_disk(cls, ge_dir: str, usage_statistics_enabled: bool = True)
classmethod scaffold_directories(cls, base_dir: str)

Safely create GX directories for a new project.

classmethod scaffold_custom_data_docs(cls, plugins_dir: str)

Copy custom Data Docs templates.

_save_project_config(self)

See parent AbstractDataContext._save_project_config() for more information.

Explicitly override base class implementation to retain legacy behavior.

_attach_datasource_to_context(self, datasource: XDatasource)
property sources(self)
_init_cloud_config(self, cloud_mode: bool, cloud_base_url: Optional[str], cloud_access_token: Optional[str], cloud_organization_id: Optional[str])
_init_context_root_directory(self, context_root_dir: Optional[str])
_check_for_usage_stats_sync(self, project_config: DataContextConfig)

If there are differences between the DataContextConfig used to instantiate the DataContext and the DataContextConfig assigned to self.config, we want to save those changes to disk so that subsequent instantiations will utilize the same values.

A small caveat is that if that difference stems from a global override (env var or conf file), we don't want to write to disk, because those mechanisms allow for dynamic values and saving them would make them static.

Parameters

project_config – The DataContextConfig used to instantiate the DataContext.

Returns

A boolean signifying whether or not the current DataContext’s config needs to be persisted in order to recognize changes made to usage statistics.

_load_project_config(self)

Reads the project configuration from the project configuration file. The file may contain ${SOME_VARIABLE} variables - see self.project_config_with_variables_substituted for how these are substituted.

For Data Contexts in GX Cloud mode, a user-specific template is retrieved from the Cloud API - see CloudDataContext.retrieve_data_context_config_from_cloud for more details.

Returns

the configuration object read from the file or template

add_store(self, store_name, store_config)

Add a new Store to the DataContext and (for convenience) return the instantiated Store object.

Parameters
  • store_name (str) – a key for the new Store in self._stores

  • store_config (dict) – a config for the Store to add

Returns

store (Store)
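
A minimal sketch, assuming a filesystem-backed Expectations store; the store name and base_directory are illustrative:

# Register an additional Expectations store on the local filesystem.
store = context.add_store(
    store_name="shared_expectations_store",
    store_config={
        "class_name": "ExpectationsStore",
        "store_backend": {
            "class_name": "TupleFilesystemStoreBackend",
            "base_directory": "shared/expectations/",
        },
    },
)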

add_datasource(self, name: str, **kwargs: dict)

Add named datasource, with options to initialize (and return) the datasource and save_config.

The current version calls super(), which preserves the usage_statistics decorator in the current method. A subsequent refactor will migrate usage_statistics to parent and sibling classes.

Parameters
  • name (str) – Name of Datasource

  • initialize (bool) – Should GX add and initialize the Datasource? If True, the method returns the initialized Datasource.

  • save_changes (Optional[bool]) – Should GX save the Datasource config?

  • **kwargs (Optional[dict]) – Additional kwargs that define the Datasource initialization kwargs

Returns

Datasource that was added
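
A minimal sketch using the block-config style, assuming a Pandas execution engine; the datasource name and connector config are illustrative:

# Add a Pandas-backed datasource with a runtime data connector.
datasource = context.add_datasource(
    name="my_pandas_datasource",
    class_name="Datasource",
    execution_engine={"class_name": "PandasExecutionEngine"},
    data_connectors={
        "runtime_connector": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["batch_id"],
        }
    },
)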

update_datasource(self, datasource: Union[LegacyDatasource, BaseDatasource])

See parent BaseDataContext.update_datasource for more details. Note that this method persists changes using an underlying Store.

delete_datasource(self, name: str)

Delete a datasource.

Parameters
  • datasource_name – The name of the datasource to delete.

  • save_changes – Whether or not to save changes to disk.

Raises

ValueError – If the datasource name isn’t provided or cannot be found.

classmethod find_context_root_dir(cls)
classmethod get_ge_config_version(cls, context_root_dir: Optional[str] = None)
classmethod set_ge_config_version(cls, config_version: Union[int, float], context_root_dir: Optional[str] = None, validate_config_version: bool = True)
classmethod find_context_yml_file(cls, search_start_dir: Optional[str] = None)

Search for the yml file starting here and moving upward.

classmethod does_config_exist_on_disk(cls, context_root_dir: str)

Return True if the great_expectations.yml exists on disk.

classmethod is_project_initialized(cls, ge_dir: str)

Return True if the project is initialized.

To be considered initialized, all of the following must be true:

  • all project directories exist (including uncommitted directories)

  • a valid great_expectations.yml is on disk

  • a config_variables.yml is on disk

  • the project has at least one datasource

  • the project has at least one suite

classmethod does_project_have_a_datasource_in_config_file(cls, ge_dir: str)
classmethod _does_context_have_at_least_one_datasource(cls, ge_dir: str)
classmethod _does_context_have_at_least_one_suite(cls, ge_dir: str)
classmethod _attempt_context_instantiation(cls, ge_dir: str)
great_expectations.from_pandas(pandas_df, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None)

Read a Pandas data frame and return a great_expectations dataset.

Parameters
  • pandas_df (Pandas df) – Pandas data frame

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (profiler class) – The profiler that should be run on the dataset to establish a baseline expectation suite.

Returns

great_expectations dataset
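
For example:

import pandas as pd
import great_expectations as gx

df = pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})

# Wrap the DataFrame so expectation methods become available on it.
ge_df = gx.from_pandas(df)
result = ge_df.expect_column_values_to_not_be_null("id")
print(result.success)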

great_expectations.get_context(project_config: Optional[Union['DataContextConfig', Mapping]] = None, context_root_dir: Optional[str] = None, runtime_environment: Optional[dict] = None, cloud_base_url: Optional[str] = None, cloud_access_token: Optional[str] = None, cloud_organization_id: Optional[str] = None, cloud_mode: Optional[bool] = None, ge_cloud_base_url: Optional[str] = None, ge_cloud_access_token: Optional[str] = None, ge_cloud_organization_id: Optional[str] = None, ge_cloud_mode: Optional[bool] = None) → Union['DataContext', 'BaseDataContext', 'CloudDataContext']

Method to return the appropriate DataContext depending on parameters and environment.

Usage:

import great_expectations as gx
my_context = gx.get_context([parameters])

  1. If gx.get_context() is run in a filesystem where great_expectations init has been run, it will return a DataContext.

  2. If gx.get_context() is passed a context_root_dir (which contains great_expectations.yml), it will return a DataContext.

  3. If gx.get_context() is passed an in-memory project_config, it will return a BaseDataContext. context_root_dir can also be passed in, but the configurations from the in-memory config will override the configurations in the great_expectations.yml file.

  4. If GX is being run in the cloud, and the information needed for the cloud config (i.e. ge_cloud_base_url, ge_cloud_access_token, ge_cloud_organization_id) is passed to get_context() as parameters, configured as environment variables, or set in a .conf file, then get_context() will return a CloudDataContext.

get_context params    Env Not Config'd    Env Config'd
()                    Local               Cloud
(cloud_mode=True)     Exception!          Cloud
(cloud_mode=False)    Local               Local

TODO: This method will eventually return FileDataContext and EphemeralDataContext, rather than DataContext and BaseDataContext.

Parameters
  • project_config (dict or DataContextConfig) – In-memory configuration for DataContext.

  • context_root_dir (str) – Path to directory that contains great_expectations.yml file

  • runtime_environment (dict) – A dictionary of values can be passed to a DataContext when it is instantiated. These values will override both values from the config variables file and from environment variables.

The following parameters are relevant when running GX in cloud mode:

  • cloud_base_url (str) – URL for the GX Cloud endpoint.

  • cloud_access_token (str) – access token for the GX Cloud account.

  • cloud_organization_id (str) – organization ID for the GX Cloud account.

  • cloud_mode (bool) – whether to run GX in cloud mode (default is None).

Returns

Either a DataContext, BaseDataContext, or CloudDataContext, depending on environment and/or parameters.
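
A sketch of the three common call patterns; the project path is illustrative:

import great_expectations as gx

# 1. Local project on disk (great_expectations init has been run):
context = gx.get_context()

# 2. Explicit project directory:
context = gx.get_context(context_root_dir="/path/to/great_expectations")

# 3. Cloud mode, with credentials supplied via the environment variables
#    GX_CLOUD_BASE_URL, GX_CLOUD_ACCESS_TOKEN, GX_CLOUD_ORGANIZATION_ID:
context = gx.get_context(cloud_mode=True)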

great_expectations.read_csv(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_csv and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset
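
For example (the file path is illustrative; extra arguments are forwarded to pandas.read_csv):

import great_expectations as gx

# Read the CSV and get back a dataset with expectation methods attached.
ge_df = gx.read_csv("data/events.csv", sep=",")
result = ge_df.expect_table_row_count_to_be_between(min_value=1, max_value=1_000_000)
print(result.success)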

great_expectations.read_excel(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_excel and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset or ordered dict of great_expectations datasets, if multiple worksheets are imported
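
For example, passing sheet_name=None (forwarded to pandas.read_excel) reads every worksheet, so the call returns a dict of datasets keyed by sheet name; the file path is illustrative:

import great_expectations as gx

datasets = gx.read_excel("data/report.xlsx", sheet_name=None)
for sheet, ge_df in datasets.items():
    print(sheet, len(ge_df))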

great_expectations.read_feather(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_feather and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.read_json(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, accessor_func=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_json and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • accessor_func (Callable) – function to transform the JSON object in the file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset
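
For example, accessor_func can drill into a nested document before it is converted to a DataFrame; the file path and the "records" key are illustrative:

import great_expectations as gx

# The parsed JSON object is passed to accessor_func; its return value is
# what gets converted into the dataset.
ge_df = gx.read_json(
    "data/payload.json",
    accessor_func=lambda doc: doc["records"],
)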

great_expectations.read_parquet(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_parquet and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.read_pickle(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_pickle and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.read_sas(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_sas and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.read_table(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_table and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.validate(data_asset, expectation_suite=None, data_asset_name=None, expectation_suite_name=None, data_context=None, data_asset_class_name=None, data_asset_module_name='great_expectations.dataset', data_asset_class=None, *args, **kwargs)

Validate the provided data asset. Validate can accept an optional data_asset_name to apply, data_context to use to fetch an expectation_suite if one is not provided, and data_asset_class_name/data_asset_module_name or data_asset_class to use to provide custom expectations.

Parameters
  • data_asset – the asset to validate

  • expectation_suite – the suite to use, or None to fetch one using a DataContext

  • data_asset_name – the name of the data asset to use

  • expectation_suite_name – the name of the expectation_suite to use

  • data_context – data context to use to fetch an expectation suite, or the path from which to obtain one

  • data_asset_class_name – the name of a class to dynamically load a DataAsset class

  • data_asset_module_name – the name of the module to dynamically load a DataAsset class

  • data_asset_class – a class to use; overrides data_asset_class_name/data_asset_module_name if provided

  • *args

  • **kwargs

Returns

The validation result for the provided data asset.
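
A minimal sketch, assuming a local Data Context with a previously saved suite; the suite name is illustrative:

import pandas as pd
import great_expectations as gx

df = pd.DataFrame({"id": [1, 2, 3]})

# Fetch a saved suite from the Data Context and validate the DataFrame
# against it.
context = gx.get_context()
suite = context.get_expectation_suite("demo_suite")
results = gx.validate(df, expectation_suite=suite)
print(results.success)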

great_expectations.rtd_url_ge_version