great_expectations.data_context
¶
Subpackages¶
great_expectations.data_context.config_validator
great_expectations.data_context.data_context
great_expectations.data_context.data_context.abstract_data_context
great_expectations.data_context.data_context.base_data_context
great_expectations.data_context.data_context.cloud_data_context
great_expectations.data_context.data_context.data_context
great_expectations.data_context.data_context.ephemeral_data_context
great_expectations.data_context.data_context.explorer_data_context
great_expectations.data_context.data_context.file_data_context
great_expectations.data_context.migrator
great_expectations.data_context.store
great_expectations.data_context.store._store_backend
great_expectations.data_context.store.checkpoint_store
great_expectations.data_context.store.configuration_store
great_expectations.data_context.store.data_context_store
great_expectations.data_context.store.database_store_backend
great_expectations.data_context.store.datasource_store
great_expectations.data_context.store.expectations_store
great_expectations.data_context.store.ge_cloud_store_backend
great_expectations.data_context.store.gx_cloud_store_backend
great_expectations.data_context.store.html_site_store
great_expectations.data_context.store.in_memory_store_backend
great_expectations.data_context.store.inline_store_backend
great_expectations.data_context.store.json_site_store
great_expectations.data_context.store.metric_store
great_expectations.data_context.store.profiler_store
great_expectations.data_context.store.query_store
great_expectations.data_context.store.store
great_expectations.data_context.store.store_backend
great_expectations.data_context.store.tuple_store_backend
great_expectations.data_context.store.validations_store
great_expectations.data_context.types
Submodules¶
Package Contents¶
Classes¶
AbstractDataContext – Base class for all DataContexts that contain all context-agnostic data context operations.
BaseDataContext – This class implements most of the functionality of DataContext, with a few exceptions.
CloudDataContext – Subclass of AbstractDataContext that contains functionality necessary to hydrate state from the cloud.
DataContext – A DataContext represents a Great Expectations project. It is the primary entry point for a Great Expectations deployment.
EphemeralDataContext – Will contain functionality to create a DataContext at runtime (i.e. from a passed-in config object or from stores).
ExplorerDataContext – A DataContext represents a Great Expectations project. It is the primary entry point for a Great Expectations deployment.
FileDataContext – Extends AbstractDataContext; contains only functionality necessary to hydrate state from disk.
-
class
great_expectations.data_context.
AbstractDataContext
(runtime_environment: Optional[dict] = None)¶ Bases:
abc.ABC
Base class for all DataContexts that contain all context-agnostic data context operations.
The class encapsulates most store / core components and convenience methods used to access them, meaning the majority of DataContext functionality lives here.
-
FALSEY_STRINGS
= ['FALSE', 'false', 'False', 'f', 'F', '0']¶
-
GLOBAL_CONFIG_PATHS
¶
-
DOLLAR_SIGN_ESCAPE_STRING
= \$¶
-
MIGRATION_WEBSITE
:str = https://docs.greatexpectations.io/docs/guides/miscellaneous/migration_guide#migrating-to-the-batch-request-v3-api¶
-
PROFILING_ERROR_CODE_TOO_MANY_DATA_ASSETS
= 2¶
-
PROFILING_ERROR_CODE_SPECIFIED_DATA_ASSETS_NOT_FOUND
= 3¶
-
PROFILING_ERROR_CODE_NO_BATCH_KWARGS_GENERATORS_FOUND
= 4¶
-
PROFILING_ERROR_CODE_MULTIPLE_BATCH_KWARGS_GENERATORS_FOUND
= 5¶
-
_init_config_provider
(self)¶
-
_register_providers
(self, config_provider: _ConfigurationProvider)¶ Registers any relevant ConfigurationProvider instances to self._config_provider.
Note that order matters here - if there is a namespace collision, later providers will overwrite the values derived from previous ones. The order of precedence is as follows:
Config variables
Environment variables
Runtime environment
-
abstract
_init_variables
(self)¶
-
_save_project_config
(self)¶ Each DataContext will define how its project_config will be saved through its internal ‘variables’.
- FileDataContext: filesystem
- CloudDataContext: Cloud endpoint
- Ephemeral: not saved; a logging message is output instead
-
save_expectation_suite
(self, expectation_suite: ExpectationSuite, expectation_suite_name: Optional[str] = None, overwrite_existing: bool = True, include_rendered_content: Optional[bool] = None, **kwargs: Optional[dict])¶ Each DataContext will define how ExpectationSuite will be saved.
-
property
instance_id
(self)¶
-
property
config_variables
(self)¶ Loads config variables into cache, by calling _load_config_variables()
Returns: A dictionary containing config_variables from file or empty dictionary.
-
property
config
(self)¶ Returns current DataContext’s project_config
-
property
config_provider
(self)¶
-
property
root_directory
(self)¶ The root directory for configuration objects in the data context; the location in which
great_expectations.yml
is located.
-
property
project_config_with_variables_substituted
(self)¶
-
property
plugins_directory
(self)¶ The directory in which custom plugin modules should be placed.
-
property
stores
(self)¶ A single holder for all Stores in this context
-
property
expectations_store_name
(self)¶
-
property
expectations_store
(self)¶
-
property
evaluation_parameter_store_name
(self)¶
-
property
evaluation_parameter_store
(self)¶
-
property
validations_store_name
(self)¶
-
property
validations_store
(self)¶
-
property
checkpoint_store_name
(self)¶
-
property
checkpoint_store
(self)¶
-
property
profiler_store_name
(self)¶
-
property
profiler_store
(self)¶
-
property
concurrency
(self)¶
-
property
assistants
(self)¶
-
set_config
(self, project_config: DataContextConfig)¶
-
save_datasource
(self, datasource: Union[LegacyDatasource, BaseDatasource])¶ Save a Datasource to the configured DatasourceStore.
Stores the underlying DatasourceConfig in the store and Data Context config, updates the cached Datasource and returns the Datasource. The cached and returned Datasource is re-constructed from the config that was stored as some store implementations make edits to the stored config (e.g. adding identifiers).
- Parameters
datasource – Datasource to store.
- Returns
The datasource, after storing and retrieving the stored config.
-
add_datasource
(self, name: str, initialize: bool = True, save_changes: Optional[bool] = None, **kwargs: Optional[dict])¶ Add a new datasource to the data context, with configuration provided as kwargs.
- Parameters
name – the name of the new datasource to add
initialize – if False, add the datasource to the config but do not initialize it, for example if a user needs to debug database connectivity
save_changes (bool) – should GX save the Datasource config?
kwargs (keyword arguments) – the configuration for the new datasource
- Returns
datasource (Datasource)
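As a rough illustration of the kwargs-based configuration, the sketch below adds a Pandas-backed Datasource with a runtime data connector; the datasource, connector, and identifier names are placeholders, and the exact keys can vary between Great Expectations versions.
    import great_expectations as gx

    context = gx.get_context()
    datasource = context.add_datasource(
        name="my_pandas_datasource",  # placeholder name
        class_name="Datasource",
        module_name="great_expectations.datasource",
        execution_engine={"class_name": "PandasExecutionEngine"},
        data_connectors={
            "default_runtime_connector": {
                "class_name": "RuntimeDataConnector",
                "batch_identifiers": ["run_id"],
            }
        },
    )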
-
update_datasource
(self, datasource: Union[LegacyDatasource, BaseDatasource], save_changes: Optional[bool] = None)¶ Updates a DatasourceConfig that already exists in the store.
- Parameters
datasource_config – The config object to persist using the DatasourceStore.
save_changes – whether to save changes to disk
-
get_site_names
(self)¶ Get a list of configured site names.
-
get_config_with_variables_substituted
(self, config: Optional[DataContextConfig] = None)¶ Substitute vars in config of form ${var} or $(var) with values found in the following places, in order of precedence: ge_cloud_config (for Data Contexts in GX Cloud mode), runtime_environment, environment variables, config_variables, or ge_cloud_config_variable_defaults (allows certain variables to be optional in GX Cloud mode).
-
get_batch
(self, arg1: Any = None, arg2: Any = None, arg3: Any = None, **kwargs)¶ Get exactly one batch, based on a variety of flexible input types. The method get_batch is the main user-facing method for getting batches; it supports both the new (V3) and the Legacy (V2) Datasource schemas. The version-specific implementations are contained in “_get_batch_v2()” and “_get_batch_v3()”, respectively, both of which are in the present module.
For the V3 API parameters, please refer to the signature and parameter description of method “_get_batch_v3()”. For the Legacy usage, please refer to the signature and parameter description of the method “_get_batch_v2()”.
- Parameters
arg1 – the first positional argument (can take on various types)
arg2 – the second positional argument (can take on various types)
arg3 – the third positional argument (can take on various types)
**kwargs – variable arguments
- Returns
Batch (V3) or DataAsset (V2) – the requested batch
Processing Steps: 1. Determine the version (possible values are “v3” or “v2”). 2. Convert the positional arguments to the appropriate named arguments, based on the version. 3. Package the remaining arguments as variable keyword arguments (applies only to V3). 4. Call the version-specific method (“_get_batch_v3()” or “_get_batch_v2()”) with the appropriate arguments.
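A minimal V3-style sketch, assuming the runtime-connector datasource from the add_datasource sketch above and an in-memory pandas DataFrame; all names are placeholders.
    import pandas as pd
    from great_expectations.core.batch import RuntimeBatchRequest

    df = pd.DataFrame({"id": [1, 2, 3]})  # placeholder data
    batch_request = RuntimeBatchRequest(
        datasource_name="my_pandas_datasource",
        data_connector_name="default_runtime_connector",
        data_asset_name="my_asset",  # placeholder asset name
        runtime_parameters={"batch_data": df},
        batch_identifiers={"run_id": "run_2023_01_01"},
    )
    batch = context.get_batch(batch_request=batch_request)  # context from gx.get_context()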
-
_get_data_context_version
(self, arg1: Any, **kwargs)¶ arg1: the first positional argument (can take on various types)
**kwargs: variable arguments
First check: Returns “v3” if the “0.13” entities are specified in the **kwargs.
Otherwise: Returns None if no datasources have been configured (or if there is an exception while getting the datasource). Returns “v3” if the datasource is a subclass of the BaseDatasource class. Returns “v2” if the datasource is an instance of the LegacyDatasource class.
-
_get_batch_v2
(self, batch_kwargs: Union[dict, BatchKwargs], expectation_suite_name: Union[str, ExpectationSuite], data_asset_type=None, batch_parameters=None)¶ Build a batch of data using batch_kwargs, and return a DataAsset with expectation_suite_name attached. If batch_parameters are included, they will be available as attributes of the batch.
- Parameters
batch_kwargs – the batch_kwargs to use; must include a datasource key
expectation_suite_name – the ExpectationSuite or the name of the expectation_suite to get
data_asset_type – the type of data_asset to build, with associated expectation implementations; this can generally be inferred from the datasource
batch_parameters – optional parameters to store as the reference description of the batch. They should reflect parameters that would provide the passed BatchKwargs.
- Returns
DataAsset
-
_get_batch_v3
(self, datasource_name: Optional[str] = None, data_connector_name: Optional[str] = None, data_asset_name: Optional[str] = None, *, batch_request: Optional[BatchRequestBase] = None, batch_data: Optional[Any] = None, data_connector_query: Optional[Union[IDDict, dict]] = None, batch_identifiers: Optional[dict] = None, limit: Optional[int] = None, index: Optional[Union[int, list, tuple, slice, str]] = None, custom_filter_function: Optional[Callable] = None, batch_spec_passthrough: Optional[dict] = None, sampling_method: Optional[str] = None, sampling_kwargs: Optional[dict] = None, splitter_method: Optional[str] = None, splitter_kwargs: Optional[dict] = None, runtime_parameters: Optional[dict] = None, query: Optional[str] = None, path: Optional[str] = None, batch_filter_parameters: Optional[dict] = None, **kwargs)¶ Get exactly one batch, based on a variety of flexible input types.
- Parameters
datasource_name –
data_connector_name –
data_asset_name –
batch_request –
batch_data –
data_connector_query –
batch_identifiers –
batch_filter_parameters –
limit –
index –
custom_filter_function –
batch_spec_passthrough –
sampling_method –
sampling_kwargs –
splitter_method –
splitter_kwargs –
**kwargs –
- Returns
(Batch) The requested batch
This method does not require typed or nested inputs. Instead, it is intended to help the user pick the right parameters.
This method attempts to return exactly one batch. If 0 or more than 1 batches would be returned, it raises an error.
-
list_stores
(self)¶ List currently-configured Stores on this context
-
list_active_stores
(self)¶ List active Stores on this context. Active stores are identified by setting the following parameters: expectations_store_name, validations_store_name, evaluation_parameter_store_name, checkpoint_store_name, profiler_store_name.
-
list_checkpoints
(self)¶
-
list_profilers
(self)¶
-
save_profiler
(self, profiler: RuleBasedProfiler)¶
-
_determine_key_for_profiler_save
(self, name: str, id: Optional[str])¶
-
get_datasource
(self, datasource_name: str = 'default')¶ Get the named datasource
- Parameters
datasource_name (str) – the name of the datasource from the configuration
- Returns
datasource (Datasource)
-
_serialize_substitute_and_sanitize_datasource_config
(self, serializer: AbstractConfigSerializer, datasource_config: DatasourceConfig)¶ Serialize, then make substitutions and sanitize config (mask passwords), return as dict.
- Parameters
serializer – Serializer to use when converting config to dict for substitutions.
datasource_config – Datasource config to process.
- Returns
Dict of config with substitutions and sanitizations applied.
-
add_store
(self, store_name: str, store_config: dict)¶ Add a new Store to the DataContext and (for convenience) return the instantiated Store object.
- Parameters
store_name (str) – a key for the new Store in self._stores
store_config (dict) – a config for the Store to add
- Returns
store (Store)
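A hedged sketch of adding a filesystem-backed Expectations Store; the store key and base_directory are placeholders, and the config follows the usual class_name/store_backend shape used in great_expectations.yml.
    store = context.add_store(
        store_name="local_expectations_store",  # placeholder key in self._stores
        store_config={
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleFilesystemStoreBackend",
                "base_directory": "expectations/",  # placeholder directory
            },
        },
    )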
-
list_datasources
(self)¶ List currently-configured datasources on this context. Masks passwords.
- Returns
each dictionary includes “name”, “class_name”, and “module_name” keys
- Return type
List(dict)
-
delete_datasource
(self, datasource_name: Optional[str], save_changes: Optional[bool] = None)¶ Delete a datasource.
- Parameters
datasource_name – The name of the datasource to delete.
- Raises
ValueError – If the datasource name isn’t provided or cannot be found.
-
add_checkpoint
(self, name: str, config_version: Optional[Union[int, float]] = None, template_name: Optional[str] = None, module_name: Optional[str] = None, class_name: Optional[str] = None, run_name_template: Optional[str] = None, expectation_suite_name: Optional[str] = None, batch_request: Optional[dict] = None, action_list: Optional[List[dict]] = None, evaluation_parameters: Optional[dict] = None, runtime_configuration: Optional[dict] = None, validations: Optional[List[dict]] = None, profilers: Optional[List[dict]] = None, validation_operator_name: Optional[str] = None, batches: Optional[List[dict]] = None, site_names: Optional[Union[str, List[str]]] = None, slack_webhook: Optional[str] = None, notify_on: Optional[str] = None, notify_with: Optional[Union[str, List[str]]] = None, ge_cloud_id: Optional[str] = None, expectation_suite_ge_cloud_id: Optional[str] = None, default_validation_id: Optional[str] = None)¶
-
get_checkpoint
(self, name: Optional[str] = None, ge_cloud_id: Optional[str] = None)¶
-
delete_checkpoint
(self, name: Optional[str] = None, ge_cloud_id: Optional[str] = None)¶
-
run_checkpoint
(self, checkpoint_name: Optional[str] = None, ge_cloud_id: Optional[str] = None, template_name: Optional[str] = None, run_name_template: Optional[str] = None, expectation_suite_name: Optional[str] = None, batch_request: Optional[BatchRequestBase] = None, action_list: Optional[List[dict]] = None, evaluation_parameters: Optional[dict] = None, runtime_configuration: Optional[dict] = None, validations: Optional[List[dict]] = None, profilers: Optional[List[dict]] = None, run_id: Optional[Union[str, int, float]] = None, run_name: Optional[str] = None, run_time: Optional[datetime.datetime] = None, result_format: Optional[str] = None, expectation_suite_ge_cloud_id: Optional[str] = None, **kwargs)¶ Validate against a pre-defined Checkpoint. (Experimental)
- Parameters
checkpoint_name – The name of a Checkpoint defined via the CLI or by manually creating a yml file
template_name – The name of a Checkpoint template to retrieve from the CheckpointStore
run_name_template – The template to use for run_name
expectation_suite_name – Expectation suite to be used by Checkpoint run
batch_request – Batch request to be used by Checkpoint run
action_list – List of actions to be performed by the Checkpoint
evaluation_parameters – $parameter_name syntax references to be evaluated at runtime
runtime_configuration – Runtime configuration override parameters
validations – Validations to be performed by the Checkpoint run
profilers – Profilers to be used by the Checkpoint run
run_id – The run_id for the validation; if None, a default value will be used
run_name – The run_name for the validation; if None, a default value will be used
run_time – The date/time of the run
result_format – One of several supported formatting directives for expectation validation results
ge_cloud_id – Great Expectations Cloud id for the checkpoint
expectation_suite_ge_cloud_id – Great Expectations Cloud id for the expectation suite
**kwargs – Additional kwargs to pass to the validation operator
- Returns
CheckpointResult
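A minimal sketch, assuming the batch_request from the get_batch sketch above and an Expectation Suite named "my_suite" already exist; the Checkpoint name and the choice of class_name="SimpleCheckpoint" are placeholders.
    context.add_checkpoint(
        name="my_checkpoint",
        config_version=1.0,
        class_name="SimpleCheckpoint",
    )
    result = context.run_checkpoint(
        checkpoint_name="my_checkpoint",
        validations=[
            {"batch_request": batch_request, "expectation_suite_name": "my_suite"}
        ],
    )
    print(result.success)  # overall validation success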
-
store_evaluation_parameters
(self, validation_results, target_store_name=None)¶ Stores ValidationResult EvaluationParameters to defined store
-
list_expectation_suite_names
(self)¶ Lists the available expectation suite names.
-
list_expectation_suites
(self)¶ Return a list of available expectation suite keys.
-
get_validator
(self, datasource_name: Optional[str] = None, data_connector_name: Optional[str] = None, data_asset_name: Optional[str] = None, batch: Optional[Batch] = None, batch_list: Optional[List[Batch]] = None, batch_request: Optional[BatchRequestBase] = None, batch_request_list: Optional[List[BatchRequestBase]] = None, batch_data: Optional[Any] = None, data_connector_query: Optional[Union[IDDict, dict]] = None, batch_identifiers: Optional[dict] = None, limit: Optional[int] = None, index: Optional[Union[int, list, tuple, slice, str]] = None, custom_filter_function: Optional[Callable] = None, sampling_method: Optional[str] = None, sampling_kwargs: Optional[dict] = None, splitter_method: Optional[str] = None, splitter_kwargs: Optional[dict] = None, runtime_parameters: Optional[dict] = None, query: Optional[str] = None, path: Optional[str] = None, batch_filter_parameters: Optional[dict] = None, expectation_suite_ge_cloud_id: Optional[str] = None, batch_spec_passthrough: Optional[dict] = None, expectation_suite_name: Optional[str] = None, expectation_suite: Optional[ExpectationSuite] = None, create_expectation_suite_with_name: Optional[str] = None, include_rendered_content: Optional[bool] = None, **kwargs: Optional[dict])¶ This method applies only to the new (V3) Datasource schema.
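A short sketch of the interactive workflow this method enables, assuming the batch_request from the get_batch sketch above; the suite and column names are placeholders.
    validator = context.get_validator(
        batch_request=batch_request,
        create_expectation_suite_with_name="my_suite",  # or expectation_suite_name= for an existing suite
    )
    validator.expect_column_values_to_not_be_null("id")  # placeholder column
    validator.save_expectation_suite(discard_failed_expectations=False)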
-
get_validator_using_batch_list
(self, expectation_suite: ExpectationSuite, batch_list: Sequence[Union[Batch, XBatch]], include_rendered_content: Optional[bool] = None, **kwargs: Optional[dict])¶ - Parameters
expectation_suite –
batch_list –
include_rendered_content –
**kwargs –
Returns:
-
get_batch_list
(self, datasource_name: Optional[str] = None, data_connector_name: Optional[str] = None, data_asset_name: Optional[str] = None, batch_request: Optional[BatchRequestBase] = None, batch_data: Optional[Any] = None, data_connector_query: Optional[dict] = None, batch_identifiers: Optional[dict] = None, limit: Optional[int] = None, index: Optional[Union[int, list, tuple, slice, str]] = None, custom_filter_function: Optional[Callable] = None, sampling_method: Optional[str] = None, sampling_kwargs: Optional[dict] = None, splitter_method: Optional[str] = None, splitter_kwargs: Optional[dict] = None, runtime_parameters: Optional[dict] = None, query: Optional[str] = None, path: Optional[str] = None, batch_filter_parameters: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None, **kwargs: Optional[dict])¶ Get the list of zero or more batches, based on a variety of flexible input types. This method applies only to the new (V3) Datasource schema.
- Parameters
datasource_name –
data_connector_name –
data_asset_name –
batch_request –
batch_data –
query –
path –
runtime_parameters –
data_connector_query –
batch_identifiers –
batch_filter_parameters –
limit –
index –
custom_filter_function –
sampling_method –
sampling_kwargs –
splitter_method –
splitter_kwargs –
batch_spec_passthrough –
**kwargs –
- Returns
(List[Batch]) The requested list of batches
get_batch_list is the main user-facing API for getting batches. In contrast to virtually all other methods in the class, it does not require typed or nested inputs. Instead, this method is intended to help the user pick the right parameters.
This method attempts to return any number of batches, including an empty list.
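A minimal sketch using the batch_request from the get_batch sketch above.
    batch_list = context.get_batch_list(batch_request=batch_request)
    print(len(batch_list))  # zero or more batches may be returned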
-
create_expectation_suite
(self, expectation_suite_name: str, overwrite_existing: bool = False, **kwargs: Optional[dict])¶ Build a new expectation suite and save it into the data_context expectation store.
- Parameters
expectation_suite_name – The name of the expectation_suite to create
overwrite_existing (boolean) – Whether to overwrite expectation suite if expectation suite with given name already exists.
- Returns
A new (empty) expectation suite.
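A small sketch of the suite lifecycle around this method; "my_suite" is a placeholder name.
    suite = context.create_expectation_suite("my_suite", overwrite_existing=True)
    suite = context.get_expectation_suite("my_suite")   # retrieve it again later
    context.save_expectation_suite(suite)               # persist any edits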
-
delete_expectation_suite
(self, expectation_suite_name: Optional[str] = None, ge_cloud_id: Optional[str] = None)¶ Delete specified expectation suite from data_context expectation store.
- Parameters
expectation_suite_name – The name of the expectation_suite to delete
- Returns
True for Success and False for Failure.
-
get_expectation_suite
(self, expectation_suite_name: Optional[str] = None, include_rendered_content: Optional[bool] = None, ge_cloud_id: Optional[str] = None)¶ Get an Expectation Suite by name or GX Cloud ID.
- Parameters
expectation_suite_name (str) – The name of the Expectation Suite.
include_rendered_content (bool) – Whether or not to re-populate rendered_content for each ExpectationConfiguration.
ge_cloud_id (str) – The GX Cloud ID for the Expectation Suite.
- Returns
An existing ExpectationSuite
-
add_profiler
(self, name: str, config_version: float, rules: Dict[str, dict], variables: Optional[dict] = None)¶
-
get_profiler
(self, name: Optional[str] = None, ge_cloud_id: Optional[str] = None)¶
-
delete_profiler
(self, name: Optional[str] = None, ge_cloud_id: Optional[str] = None)¶
-
run_profiler_with_dynamic_arguments
(self, batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequestBase, dict]] = None, name: Optional[str] = None, ge_cloud_id: Optional[str] = None, variables: Optional[dict] = None, rules: Optional[dict] = None)¶ Retrieve a RuleBasedProfiler from a ProfilerStore and run it with rules/variables supplied at runtime.
- Parameters
batch_list – Explicit list of Batch objects to supply data at runtime
batch_request – Explicit batch_request used to supply data at runtime
name – Identifier used to retrieve the profiler from a store.
ge_cloud_id – Identifier used to retrieve the profiler from a store (GX Cloud specific).
variables – Attribute name/value pairs (overrides)
rules – Key-value pairs of name/configuration-dictionary (overrides)
- Returns
Set of rule evaluation results in the form of a RuleBasedProfilerResult
- Raises
AssertionError if both a name and ge_cloud_id are provided. –
AssertionError if both an expectation_suite and expectation_suite_name are provided. –
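A hedged sketch of running a stored profiler with runtime overrides, assuming a Rule-Based Profiler was previously saved under the placeholder name "my_profiler" and reusing the batch_request from the get_batch sketch above; the variables override shown is purely illustrative.
    result = context.run_profiler_with_dynamic_arguments(
        name="my_profiler",                        # placeholder: a previously stored profiler
        batch_request=batch_request,
        variables={"false_positive_rate": 0.05},   # placeholder variable override
    )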
-
run_profiler_on_data
(self, batch_list: Optional[List[Batch]] = None, batch_request: Optional[BatchRequestBase] = None, name: Optional[str] = None, ge_cloud_id: Optional[str] = None)¶ Retrieve a RuleBasedProfiler from a ProfilerStore and run it with a batch request supplied at runtime.
- Parameters
batch_list – Explicit list of Batch objects to supply data at runtime.
batch_request – Explicit batch_request used to supply data at runtime.
name – Identifier used to retrieve the profiler from a store.
ge_cloud_id – Identifier used to retrieve the profiler from a store (GX Cloud specific).
- Returns
Set of rule evaluation results in the form of a RuleBasedProfilerResult
- Raises
ProfilerConfigurationError if both "batch_list" and "batch_request" arguments are specified. –
AssertionError if both a name and ge_cloud_id are provided. –
AssertionError if both an expectation_suite and expectation_suite_name are provided. –
-
add_validation_operator
(self, validation_operator_name: str, validation_operator_config: dict)¶ Add a new ValidationOperator to the DataContext and (for convenience) return the instantiated object.
- Parameters
validation_operator_name (str) – a key for the new ValidationOperator in self._validation_operators
validation_operator_config (dict) – a config for the ValidationOperator to add
- Returns
validation_operator (ValidationOperator)
-
run_validation_operator
(self, validation_operator_name: str, assets_to_validate: List, run_id: Optional[Union[str, RunIdentifier]] = None, evaluation_parameters: Optional[dict] = None, run_name: Optional[str] = None, run_time: Optional[Union[str, datetime.datetime]] = None, result_format: Optional[Union[str, dict]] = None, **kwargs)¶ Run a validation operator to validate data assets and to perform the business logic around validation that the operator implements.
- Parameters
validation_operator_name – name of the operator, as appears in the context’s config file
assets_to_validate – a list that specifies the data assets that the operator will validate. The members of the list can be either batches, or a tuple that will allow the operator to fetch the batch: (batch_kwargs, expectation_suite_name)
evaluation_parameters – $parameter_name syntax references to be evaluated at runtime
run_id – The run_id for the validation; if None, a default value will be used
run_name – The run_name for the validation; if None, a default value will be used
run_time – The date/time of the run
result_format – one of several supported formatting directives for expectation validation results
**kwargs – Additional kwargs to pass to the validation operator
- Returns
ValidationOperatorResult
-
list_validation_operators
(self)¶ List currently-configured Validation Operators on this context
-
list_validation_operator_names
(self)¶
-
profile_data_asset
(self, datasource_name, batch_kwargs_generator_name=None, data_asset_name=None, batch_kwargs=None, expectation_suite_name=None, profiler=BasicDatasetProfiler, profiler_configuration=None, run_id=None, additional_batch_kwargs=None, run_name=None, run_time=None)¶ Profile a data asset
- Parameters
datasource_name – the name of the datasource to which the profiled data asset belongs
batch_kwargs_generator_name – the name of the batch kwargs generator to use to get batches (only if batch_kwargs are not provided)
data_asset_name – the name of the profiled data asset
batch_kwargs – optional; if set, the method will use the value to fetch the batch to be profiled. If not passed, the batch kwargs generator (batch_kwargs_generator_name arg) will choose a batch
profiler – the profiler class to use
profiler_configuration – Optional profiler configuration dict
run_name – optional - if set, the validation result created by the profiler will be under the provided run_name
additional_batch_kwargs –
- Returns
A dictionary:
{ "success": True/False, "results": List of (expectation_suite, EVR) tuples for each of the data_assets found in the datasource }
When success = False, the error details are under “error” key
-
add_batch_kwargs_generator
(self, datasource_name, batch_kwargs_generator_name, class_name, **kwargs)¶ Add a batch kwargs generator to the named datasource, using the provided configuration.
- Parameters
datasource_name – name of datasource to which to add the new batch kwargs generator
batch_kwargs_generator_name – name of the generator to add
class_name – class of the batch kwargs generator to add
**kwargs – batch kwargs generator configuration, provided as kwargs
Returns:
-
get_available_data_asset_names
(self, datasource_names=None, batch_kwargs_generator_names=None)¶ Inspect datasource and batch kwargs generators to provide available data_asset objects.
- Parameters
datasource_names – list of datasources for which to provide available data_asset_name objects. If None, return available data assets for all datasources.
batch_kwargs_generator_names – list of batch kwargs generators for which to provide available data_asset_name objects.
- Returns
Dictionary describing available data assets
{ datasource_name: { batch_kwargs_generator_name: [ data_asset_1, data_asset_2, ... ] ... } ... }
- Return type
data_asset_names (dict)
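A quick sketch, assuming context is a DataContext (e.g. from gx.get_context()); the nested keys follow whatever datasource and connector/generator names are configured in your project.
    asset_names = context.get_available_data_asset_names()
    # e.g. {"my_pandas_datasource": {"default_runtime_connector": [...]}}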
-
build_batch_kwargs
(self, datasource, batch_kwargs_generator, data_asset_name=None, partition_id=None, **kwargs)¶ Builds batch kwargs using the provided datasource, batch kwargs generator, and batch_parameters.
- Parameters
datasource (str) – the name of the datasource for which to build batch_kwargs
batch_kwargs_generator (str) – the name of the batch kwargs generator to use to build batch_kwargs
data_asset_name (str) – an optional name batch_parameter
**kwargs – additional batch_parameters
- Returns
BatchKwargs
-
open_data_docs
(self, resource_identifier: Optional[str] = None, site_name: Optional[str] = None, only_if_exists: bool = True)¶ A stdlib cross-platform way to open a file in a browser.
- Parameters
resource_identifier – ExpectationSuiteIdentifier, ValidationResultIdentifier or any other type’s identifier. The argument is optional - when not supplied, the method returns the URL of the index page.
site_name – Optionally specify which site to open. If not specified, open all docs found in the project.
only_if_exists – Optionally specify flag to pass to “self.get_docs_sites_urls()”.
-
get_docs_sites_urls
(self, resource_identifier=None, site_name: Optional[str] = None, only_if_exists=True, site_names: Optional[List[str]] = None)¶ Get URLs for a resource for all data docs sites.
This function will return URLs for any configured site even if the sites have not been built yet.
- Parameters
resource_identifier (object) – optional. It can be an identifier of ExpectationSuites, ValidationResults, and other resources that have typed identifiers. If not provided, the method will return the URLs of the index page.
site_name – Optionally specify which site to open. If not specified, return all urls in the project.
site_names – Optionally specify which sites are active. Sites not in this list are not processed, even if specified in site_name.
- Returns
- a list of URLs. Each item is the URL for the resource for a
data docs site
- Return type
list
-
_load_site_builder_from_site_config
(self, site_config)¶
-
clean_data_docs
(self, site_name=None)¶ Clean a given data docs site.
This removes all files from the configured Store.
- Parameters
site_name (str) – Optional, the name of the site to clean. If not specified, all sites will be cleaned.
-
_clean_data_docs_site
(self, site_name: str)¶
-
static
_default_profilers_exist
(directory_path: Optional[str])¶ Helper method. Do default profilers exist in directory_path?
-
static
_get_global_config_value
(environment_variable: str, conf_file_section: Optional[str] = None, conf_file_option: Optional[str] = None)¶ Method to retrieve config value. Looks for config value in environment_variable and config file section
- Parameters
environment_variable (str) – name of environment_variable to retrieve
conf_file_section (str) – section of config
conf_file_option (str) – key in section
- Returns
Optional string representing config value
-
static
_get_metric_configuration_tuples
(metric_configuration: Union[str, dict], base_kwargs: Optional[dict] = None)¶
-
classmethod
get_or_create_data_context_config
(cls, project_config: Union[DataContextConfig, Mapping])¶
-
_normalize_absolute_or_relative_path
(self, path: Optional[str])¶ Why does this exist in AbstractDataContext? CloudDataContext and FileDataContext both use it
-
_apply_global_config_overrides
(self, config: DataContextConfig)¶ - Applies global configuration overrides for
usage_statistics being enabled
data_context_id for usage_statistics
global_usage_statistics_url
- Parameters
config (DataContextConfig) – Config that is passed into the DataContext constructor
- Returns
DataContextConfig with the appropriate overrides
-
_load_config_variables
(self)¶
-
static
_is_usage_stats_enabled
()¶ - Checks whether usage_statistics is disabled in any of the following locations:
GE_USAGE_STATS, which is an environment_variable
GLOBAL_CONFIG_PATHS
If GE_USAGE_STATS exists AND its value is one of the FALSEY_STRINGS, usage_statistics is disabled (returns False). GLOBAL_CONFIG_PATHS is also checked to see if a config file contains an override for anonymous_usage_statistics. Returns True otherwise.
- Returns
bool that tells you whether usage_statistics is on or off
-
_get_data_context_id_override
(self)¶ Return data_context_id from environment variable.
- Returns
Optional string that represents data_context_id
-
_get_usage_stats_url_override
(self)¶ Return GE_USAGE_STATISTICS_URL from environment variable if it exists
- Returns
Optional string that represents GE_USAGE_STATISTICS_URL
-
_build_store_from_config
(self, store_name: str, store_config: dict)¶
-
property
variables
(self)¶
-
property
usage_statistics_handler
(self)¶
-
property
anonymous_usage_statistics
(self)¶
-
property
progress_bars
(self)¶
-
property
include_rendered_content
(self)¶
-
property
notebooks
(self)¶
-
property
datasources
(self)¶ A single holder for all Datasources in this context
-
property
data_context_id
(self)¶
-
_init_stores
(self, store_configs: Dict[str, dict])¶ Initialize all Stores for this DataContext.
- Stores are a good fit for reading/writing objects that:
follow a clear key-value pattern, and
are usually edited programmatically, using the Context
Note that stores do NOT manage plugins.
-
abstract
_init_datasource_store
(self)¶ Internal utility responsible for creating a DatasourceStore to persist and manage a user’s Datasources.
Please note that the DatasourceStore lacks the same extensibility that other analogous Stores do; a default implementation is provided based on the user’s environment but is not customizable.
-
_update_config_variables
(self)¶ Updates config_variables cache by re-calling _load_config_variables(). Necessary after running methods that modify config AND could contain config_variables for credentials (example is add_datasource())
-
_initialize_usage_statistics
(self, usage_statistics_config: AnonymizedUsageStatisticsConfig)¶ Initialize the usage statistics system.
-
_init_datasources
(self)¶ Initialize the datasources in store
-
_instantiate_datasource_from_config
(self, raw_config: DatasourceConfig, substituted_config: DatasourceConfig)¶ Instantiate a new datasource. :param config: Datasource config.
- Returns
Datasource instantiated from config.
- Raises
-
_build_datasource_from_config
(self, raw_config: DatasourceConfig, substituted_config: DatasourceConfig)¶ Instantiate a Datasource from a config.
- Parameters
config – DatasourceConfig object defining the datasource to instantiate.
- Returns
Datasource instantiated from config.
- Raises
-
_perform_substitutions_on_datasource_config
(self, config: DatasourceConfig)¶ Substitute variables in a datasource config e.g. from env vars, config_vars.yml
Config must be persisted with ${VARIABLES} syntax but hydrated at time of use.
- Parameters
config – Datasource Config
- Returns
Datasource Config with substitutions performed.
-
_instantiate_datasource_from_config_and_update_project_config
(self, config: DatasourceConfig, initialize: bool, save_changes: bool)¶ Perform substitutions and optionally initialize the Datasource and/or store the config.
- Parameters
config – Datasource Config to initialize and/or store.
initialize – Whether to initialize the datasource, alternatively you can store without initializing.
save_changes – Whether to store the configuration in your configuration store (GX cloud or great_expectations.yml)
- Returns
Datasource object if initialized.
- Raises
-
_construct_data_context_id
(self)¶
-
_compile_evaluation_parameter_dependencies
(self)¶
-
get_validation_result
(self, expectation_suite_name, run_id=None, batch_identifier=None, validations_store_name=None, failed_only=False, include_rendered_content=None)¶ Get validation results from a configured store.
- Parameters
expectation_suite_name – expectation_suite name for which to get validation result (default: “default”)
run_id – run_id for which to get validation result (if None, fetch the latest result by alphanumeric sort)
validations_store_name – the name of the store from which to get validation results
failed_only – if True, filter the result to return only failed expectations
include_rendered_content – whether to re-populate the validation_result rendered_content
- Returns
validation_result
-
store_validation_result_metrics
(self, requested_metrics, validation_results, target_store_name)¶
-
_store_metrics
(self, requested_metrics, validation_results, target_store_name)¶ requested_metrics is a dictionary like this:
requested_metrics:
  "*":  # the asterisk matches any expectation suite name
    # use the 'kwargs' key to request metrics that are defined by kwargs,
    # for example because they are defined only for a particular column
    - column:
        Age:
          - expect_column_min_to_be_between.result.observed_value
    - statistics.evaluated_expectations
    - statistics.successful_expectations
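A hedged sketch of the same structure expressed as a Python dictionary; the column name and metric names are the illustrative values from the docstring above, and the exact nesting may differ between versions.
    requested_metrics = {
        "*": [  # "*" matches any expectation suite name
            "statistics.evaluated_expectations",
            "statistics.successful_expectations",
            {"column": {"Age": ["expect_column_min_to_be_between.result.observed_value"]}},
        ]
    }
    # This dictionary can then be passed as the requested_metrics argument of
    # store_validation_result_metrics(), together with validation results and a MetricStore name.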
-
send_usage_message
(self, event: str, event_payload: Optional[dict], success: Optional[bool] = None)¶ - Helper method to send a usage message using the DataContext. Used when sending usage events from classes like ExpectationSuite.
- Parameters
event (str) – str representation of event
event_payload (dict) – optional event payload
success (bool) – optional success param
- Returns
None
-
_determine_if_expectation_suite_include_rendered_content
(self, include_rendered_content: Optional[bool] = None)¶
-
_determine_if_expectation_validation_result_include_rendered_content
(self, include_rendered_content: Optional[bool] = None)¶
-
static
_determine_save_changes_flag
(save_changes: Optional[bool])¶ This method is meant to enable the gradual deprecation of the save_changes boolean flag on various Datasource CRUD methods. Moving forward, we will always persist changes made by these CRUD methods (a.k.a. the behavior created by save_changes=True).
As part of this effort, save_changes has been set to None as a default value and will be automatically converted to True within this method. If a user passes in a boolean value (thereby bypassing the default arg of None), a deprecation warning will be raised.
-
test_yaml_config
(self, yaml_config: str, name: Optional[str] = None, class_name: Optional[str] = None, runtime_environment: Optional[dict] = None, pretty_print: bool = True, return_mode: Literal['instantiated_class', 'report_object'] = 'instantiated_class', shorten_tracebacks: bool = False)¶ Convenience method for testing yaml configs
test_yaml_config is a convenience method for configuring the moving parts of a Great Expectations deployment. It allows you to quickly test out configs for system components, especially Datasources, Checkpoints, and Stores.
For many deployments of Great Expectations, these components (plus Expectations) are the only ones you’ll need.
test_yaml_config is mainly intended for use within notebooks and tests.
- Parameters
yaml_config – A string containing the yaml config to be tested
name – (Optional) A string containing the name of the component to instantiate
pretty_print – Determines whether to print human-readable output
return_mode – Determines what type of object test_yaml_config will return. Valid modes are “instantiated_class” and “report_object”
shorten_tracebacks – If true, catch any errors during instantiation and print only the last element of the traceback stack. This can be helpful for rapid iteration on configs in a notebook, because it can remove the need to scroll up and down a lot.
- Returns
The instantiated component (e.g. a Datasource) OR a json object containing metadata from the component’s self_check method. The returned object is determined by return_mode.
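A minimal sketch testing a Datasource config in YAML before committing it to great_expectations.yml; the YAML mirrors the add_datasource sketch above, and all names are placeholders.
    yaml_config = """
    name: my_pandas_datasource
    class_name: Datasource
    execution_engine:
      class_name: PandasExecutionEngine
    data_connectors:
      default_runtime_connector:
        class_name: RuntimeDataConnector
        batch_identifiers:
          - run_id
    """
    context.test_yaml_config(yaml_config=yaml_config)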
-
profile_datasource
(self, datasource_name, batch_kwargs_generator_name=None, data_assets=None, max_data_assets=20, profile_all_data_assets=True, profiler=BasicDatasetProfiler, profiler_configuration=None, dry_run=False, run_id=None, additional_batch_kwargs=None, run_name=None, run_time=None)¶ Profile the named datasource using the named profiler.
- Parameters
datasource_name – the name of the datasource for which to profile data_assets
batch_kwargs_generator_name – the name of the batch kwargs generator to use to get batches
data_assets – list of data asset names to profile
max_data_assets – if the number of data assets the batch kwargs generator yields is greater than this max_data_assets, profile_all_data_assets=True is required to profile all
profile_all_data_assets – when True, all data assets are profiled, regardless of their number
profiler – the profiler class to use
profiler_configuration – Optional profiler configuration dict
dry_run – when True, the method checks the arguments and reports whether profiling can proceed, or specifies the arguments that are missing
additional_batch_kwargs – Additional keyword arguments to be provided to get_batch when loading the data asset.
- Returns
A dictionary:
{ "success": True/False, "results": List of (expectation_suite, EVR) tuples for each of the data_assets found in the datasource }
When success = False, the error details are under “error” key
-
build_data_docs
(self, site_names=None, resource_identifiers=None, dry_run=False, build_index: bool = True)¶ Build Data Docs for your project.
Data Docs make it simple to visualize data quality in your project. They include Expectations, Validations & Profiles, and are built for all Datasources from JSON artifacts in the local repo, including validations & profiles from the uncommitted directory.
- Parameters
site_names – if specified, build data docs only for these sites, otherwise, build all the sites specified in the context’s config
resource_identifiers – a list of resource identifiers (ExpectationSuiteIdentifier, ValidationResultIdentifier). If specified, rebuild HTML (or other views the data docs sites are rendering) only for the resources in this list. This supports incremental build of data docs sites (e.g., when a new validation result is created) and avoids full rebuild.
dry_run – a flag; if True, the method returns a structure containing the URLs of the sites that would be built, but it does not build these sites. The motivation for adding this flag was to allow the CLI to display the URLs before building and to let users confirm.
build_index – a flag; if False, skips building the index page
- Returns
A dictionary with the names of the updated data documentation sites as keys and the location info of their index.html files as values
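A two-line sketch of the typical build-and-view loop, assuming context is a DataContext.
    context.build_data_docs()                   # render HTML for all configured sites
    context.open_data_docs(only_if_exists=True)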
-
_init_site_builder_for_data_docs_site_creation
(self, site_name: str, site_config: dict)¶
-
escape_all_config_variables
(self, value: T, dollar_sign_escape_string: str = DOLLAR_SIGN_ESCAPE_STRING, skip_if_substitution_variable: bool = True)¶ Replace all $ characters with the DOLLAR_SIGN_ESCAPE_STRING
- Parameters
value – config variable value
dollar_sign_escape_string – replaces instances of $
skip_if_substitution_variable – skip if the value is of the form ${MYVAR} or $MYVAR
- Returns
input value with all $ characters replaced with the escape string
-
save_config_variable
(self, config_variable_name: str, value: Any, skip_if_substitution_variable: bool = True)¶ Save a config variable value. Escapes $ unless they are used in substitution variables, e.g. the $ characters in ${SOME_VAR} or $SOME_VAR are not escaped.
- Parameters
config_variable_name – name of the property
value – the value to save for the property
skip_if_substitution_variable – set to False to escape $ in values in substitution variable form e.g. ${SOME_VAR} -> r”${SOME_VAR}” or $SOME_VAR -> r”$SOME_VAR”
- Returns
None
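A small sketch of saving a credential to config_variables and escaping a literal dollar sign; the variable name and values are placeholders.
    context.save_config_variable("my_db_password", "s3cr3t-value")  # placeholder credential
    # The saved value can then be referenced elsewhere in the project config as ${my_db_password}.
    escaped = context.escape_all_config_variables("pa$$word", skip_if_substitution_variable=True)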
-
-
class
great_expectations.data_context.
BaseDataContext
(project_config: Union[DataContextConfig, Mapping], context_root_dir: Optional[str] = None, runtime_environment: Optional[dict] = None, cloud_mode: bool = False, cloud_config: Optional[GXCloudConfig] = None, ge_cloud_mode: bool = False, ge_cloud_config: Optional[GXCloudConfig] = None)¶ Bases:
great_expectations.data_context.data_context.ephemeral_data_context.EphemeralDataContext
,great_expectations.core.config_peer.ConfigPeer
This class implements most of the functionality of DataContext, with a few exceptions.
BaseDataContext does not attempt to keep its project_config in sync with a file on disk.
- BaseDataContext doesn’t attempt to “guess” paths or object types. Instead, that logic is pushed into the DataContext class.
Together, these changes make the BaseDataContext class more testable.
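A hedged sketch of instantiating a BaseDataContext entirely in memory from a programmatic config, with no great_expectations.yml on disk; InMemoryStoreBackendDefaults keeps all stores ephemeral, and the import paths reflect the 0.15.x package layout assumed here.
    from great_expectations.data_context import BaseDataContext
    from great_expectations.data_context.types.base import (
        DataContextConfig,
        InMemoryStoreBackendDefaults,
    )

    project_config = DataContextConfig(
        store_backend_defaults=InMemoryStoreBackendDefaults(),  # keep all stores in memory
    )
    context = BaseDataContext(project_config=project_config)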
Feature maturity:
OS - Linux (How-to Guide; TODO: OS - Linux description). Maturity: Production. API Stability: N/A; Implementation Completeness: N/A; Unit Test Coverage: Complete; Integration Infrastructure/Test Coverage: Complete; Documentation Completeness: Complete; Bug Risk: Low.
OS - MacOS (How-to Guide; TODO: OS - MacOS description). Maturity: Production. API Stability: N/A; Implementation Completeness: N/A; Unit Test Coverage: Complete (local only); Integration Infrastructure/Test Coverage: Complete (local only); Documentation Completeness: Complete; Bug Risk: Low.
OS - Windows (How-to Guide; TODO: OS - Windows description). Maturity: Beta. API Stability: N/A; Implementation Completeness: N/A; Unit Test Coverage: Minimal; Integration Infrastructure/Test Coverage: Minimal; Documentation Completeness: Complete; Bug Risk: Moderate.
Create and Edit Expectations - suite scaffold (How-to Guide: Creating Expectation Suites through an interactive development loop using suite scaffold). Maturity: Experimental (expect exciting changes to Profiler capability). API Stability: N/A; Implementation Completeness: N/A; Unit Test Coverage: N/A; Integration Infrastructure/Test Coverage: Partial; Documentation Completeness: Complete; Bug Risk: Low.
Create and Edit Expectations - CLI (How-to Guide: Creating an Expectation Suite with the great_expectations suite new command). Maturity: Experimental (expect exciting changes to Profiler and Suite Renderer capability). API Stability: N/A; Implementation Completeness: N/A; Unit Test Coverage: N/A; Integration Infrastructure/Test Coverage: Partial; Documentation Completeness: Complete; Bug Risk: Low.
Create and Edit Expectations - Json schema (How-to Guide: Creating a new Expectation Suite using the JsonSchemaProfiler function and a json schema file). Maturity: Experimental (expect exciting changes to Profiler capability). API Stability: N/A; Implementation Completeness: N/A; Unit Test Coverage: N/A; Integration Infrastructure/Test Coverage: Partial; Documentation Completeness: Complete; Bug Risk: Low.
-
UNCOMMITTED_DIRECTORIES
= ['data_docs', 'validations']¶
-
GX_UNCOMMITTED_DIR
= uncommitted¶
-
BASE_DIRECTORIES
¶
-
GX_DIR
= great_expectations¶
-
GX_YML
= great_expectations.yml¶
-
GX_EDIT_NOTEBOOK_DIR
¶
-
DOLLAR_SIGN_ESCAPE_STRING
= \$¶
-
property
ge_cloud_config
(self)¶
-
property
cloud_mode
(self)¶
-
property
ge_cloud_mode
(self)¶
-
_synchronize_self_with_underlying_data_context
(self)¶ Helper method that only exists during the DataContext refactor occurring in 2022-06.
Until the composition pattern is complete for BaseDataContext, the private properties of the underlying self._data_context object have to be loaded into properties on self; this method performs that loading.
-
delete_datasource
(self, datasource_name: str, save_changes: Optional[bool] = None)¶ Delete a datasource.
- Parameters
datasource_name – The name of the datasource to delete.
save_changes – Whether or not to save changes to disk.
- Raises
ValueError – If the datasource name isn’t provided or cannot be found.
-
add_datasource
(self, name: str, initialize: bool = True, save_changes: Optional[bool] = None, **kwargs: dict)¶ Add named datasource, with options to initialize (and return) the datasource and save_config.
The current version calls super(), which preserves the usage_statistics decorator in this method. A subsequent refactor will migrate the usage_statistics handling to parent and sibling classes.
- Parameters
name (str) – Name of Datasource
initialize (bool) – Should GX add and initialize the Datasource? If true then current method will return initialized Datasource
save_changes (Optional[bool]) – should GX save the Datasource config?
**kwargs (Optional[dict]) – Additional kwargs that define Datasource initialization kwargs
- Returns
Datasource that was added
-
create_expectation_suite
(self, expectation_suite_name: str, overwrite_existing: bool = False, **kwargs)¶ See AbstractDataContext.create_expectation_suite for more information.
-
get_expectation_suite
(self, expectation_suite_name: Optional[str] = None, include_rendered_content: Optional[bool] = None, ge_cloud_id: Optional[str] = None)¶ - Parameters
expectation_suite_name (str) – The name of the Expectation Suite
include_rendered_content (bool) – Whether or not to re-populate rendered_content for each ExpectationConfiguration.
ge_cloud_id (str) – The GX Cloud ID for the Expectation Suite.
- Returns
An existing ExpectationSuite
-
delete_expectation_suite
(self, expectation_suite_name: Optional[str] = None, ge_cloud_id: Optional[str] = None)¶ See AbstractDataContext.delete_expectation_suite for more information.
-
property
root_directory
(self)¶ The root directory for configuration objects in the data context; the location in which
great_expectations.yml
is located.
-
add_checkpoint
(self, name: str, config_version: Optional[Union[int, float]] = None, template_name: Optional[str] = None, module_name: Optional[str] = None, class_name: Optional[str] = None, run_name_template: Optional[str] = None, expectation_suite_name: Optional[str] = None, batch_request: Optional[dict] = None, action_list: Optional[List[dict]] = None, evaluation_parameters: Optional[dict] = None, runtime_configuration: Optional[dict] = None, validations: Optional[List[dict]] = None, profilers: Optional[List[dict]] = None, validation_operator_name: Optional[str] = None, batches: Optional[List[dict]] = None, site_names: Optional[Union[str, List[str]]] = None, slack_webhook: Optional[str] = None, notify_on: Optional[str] = None, notify_with: Optional[Union[str, List[str]]] = None, ge_cloud_id: Optional[str] = None, expectation_suite_ge_cloud_id: Optional[str] = None, default_validation_id: Optional[str] = None)¶ See parent ‘AbstractDataContext.add_checkpoint()’ for more information
-
save_expectation_suite
(self, expectation_suite: ExpectationSuite, expectation_suite_name: Optional[str] = None, overwrite_existing: bool = True, include_rendered_content: Optional[bool] = None, **kwargs: Optional[dict])¶ Each DataContext will define how ExpectationSuite will be saved.
-
list_checkpoints
(self)¶
-
list_profilers
(self)¶
-
list_expectation_suites
(self)¶ See parent ‘AbstractDataContext.list_expectation_suites()’ for more information.
-
list_expectation_suite_names
(self)¶ See parent ‘AbstractDataContext.list_expectation_suite_names()’ for more information.
-
_instantiate_datasource_from_config_and_update_project_config
(self, config: DatasourceConfig, initialize: bool, save_changes: bool)¶ Instantiate datasource and optionally persist datasource config to store and/or initialize datasource for use.
- Parameters
config – Config for the datasource.
initialize – Whether to initialize the datasource or return None.
save_changes – Whether to save the datasource config to the configured Datasource store.
- Returns
If initialize=True return an instantiated Datasource object, else None.
-
_determine_key_for_profiler_save
(self, name: str, id: Optional[str])¶
-
class
great_expectations.data_context.
CloudDataContext
(project_config: Optional[Union[DataContextConfig, Mapping]] = None, context_root_dir: Optional[str] = None, runtime_environment: Optional[dict] = None, cloud_base_url: Optional[str] = None, cloud_access_token: Optional[str] = None, cloud_organization_id: Optional[str] = None, ge_cloud_base_url: Optional[str] = None, ge_cloud_access_token: Optional[str] = None, ge_cloud_organization_id: Optional[str] = None)¶ Bases:
great_expectations.data_context.data_context.abstract_data_context.AbstractDataContext
Subclass of AbstractDataContext that contains functionality necessary to hydrate state from cloud
-
_register_providers
(self, config_provider: _ConfigurationProvider)¶ To ensure that Cloud credentials are accessible downstream, we want to ensure that we register a CloudConfigurationProvider.
Note that it is registered last as it takes the highest precedence.
-
classmethod
is_cloud_config_available
(cls, cloud_base_url: Optional[str] = None, cloud_access_token: Optional[str] = None, cloud_organization_id: Optional[str] = None)¶ Helper method called by gx.get_context() to determine whether all the information needed to build a cloud_config is available.
If provided as explicit arguments, cloud_base_url, cloud_access_token and cloud_organization_id will use runtime values instead of environment variables or conf files.
If any of the values are missing, the method will return False. It will return True otherwise.
- Parameters
cloud_base_url – Optional, you may provide this alternatively via environment variable GX_CLOUD_BASE_URL or within a config file.
cloud_access_token – Optional, you may provide this alternatively via environment variable GX_CLOUD_ACCESS_TOKEN or within a config file.
cloud_organization_id – Optional, you may provide this alternatively via environment variable GX_CLOUD_ORGANIZATION_ID or within a config file.
- Returns
Whether all the information needed to build a cloud_config is available.
- Return type
bool
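A brief sketch of how a caller might use this check before constructing a Cloud-backed context; it assumes the GX_CLOUD_BASE_URL, GX_CLOUD_ACCESS_TOKEN, and GX_CLOUD_ORGANIZATION_ID environment variables are already set, and constructing CloudDataContext with no arguments is shown only as one plausible path.
    from great_expectations.data_context import CloudDataContext

    if CloudDataContext.is_cloud_config_available():
        # credentials were found in env vars or the global great_expectations.conf file
        context = CloudDataContext()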
-
classmethod
determine_context_root_directory
(cls, context_root_dir: Optional[str])¶
-
classmethod
retrieve_data_context_config_from_cloud
(cls, cloud_config: GXCloudConfig)¶ Utilizes the GXCloudConfig instantiated in the constructor to create a request to the Cloud API. Given proper authorization, the request retrieves a data context config that is pre-populated with GX objects specific to the user’s Cloud environment (datasources, data connectors, etc).
Please note that substitution for ${VAR} variables is performed in GX Cloud before being sent over the wire.
- Returns
the configuration object retrieved from the Cloud API
-
classmethod
get_cloud_config
(cls, cloud_base_url: Optional[str] = None, cloud_access_token: Optional[str] = None, cloud_organization_id: Optional[str] = None)¶ Build a GXCloudConfig object. Config attributes are collected from any combination of args passed in at runtime, environment variables, or a global great_expectations.conf file (in order of precedence).
If provided as explicit arguments, cloud_base_url, cloud_access_token and cloud_organization_id will use runtime values instead of environment variables or conf files.
- Parameters
cloud_base_url – Optional, you may provide this alternatively via environment variable GX_CLOUD_BASE_URL or within a config file.
cloud_access_token – Optional, you may provide this alternatively via environment variable GX_CLOUD_ACCESS_TOKEN or within a config file.
cloud_organization_id – Optional, you may provide this alternatively via environment variable GX_CLOUD_ORGANIZATION_ID or within a config file.
- Returns
GXCloudConfig
- Raises
GXCloudError if a GX Cloud variable is missing –
-
classmethod
_get_cloud_config_dict
(cls, cloud_base_url: Optional[str] = None, cloud_access_token: Optional[str] = None, cloud_organization_id: Optional[str] = None)¶
-
classmethod
_get_cloud_env_var
(cls, primary_environment_variable: GXCloudEnvironmentVariable, deprecated_environment_variable: GXCloudEnvironmentVariable, conf_file_section: str, conf_file_option: str)¶ Utility to gradually deprecate environment variables prefixed with GE.
This method first attempts retrieval with the GX prefix and only falls back to the deprecated value if that is unsuccessful.
-
_init_datasource_store
(self)¶ Internal utility responsible for creating a DatasourceStore to persist and manage a user’s Datasources.
Please note that the DatasourceStore lacks the same extensibility that other analogous Stores do; a default implementation is provided based on the user’s environment but is not customizable.
-
list_expectation_suite_names
(self)¶ Lists the available expectation suite names. If in ge_cloud_mode, a list of GX Cloud ids is returned instead.
-
property
ge_cloud_config
(self)¶
-
property
cloud_mode
(self)¶
-
property
ge_cloud_mode
(self)¶
-
_init_variables
(self)¶
-
_construct_data_context_id
(self)¶ Choose the id of the currently-configured expectations store, if available and a persistent store. If not, it should choose the id stored in DataContextConfig.
- Returns
UUID to use as the data_context_id
-
get_config_with_variables_substituted
(self, config: Optional[DataContextConfig] = None)¶ Substitute vars in config of form ${var} or $(var) with values found in the following places, in order of precedence: ge_cloud_config (for Data Contexts in GX Cloud mode), runtime_environment, environment variables, config_variables, or ge_cloud_config_variable_defaults (allows certain variables to be optional in GX Cloud mode).
-
create_expectation_suite
(self, expectation_suite_name: str, overwrite_existing: bool = False, **kwargs: Optional[dict])¶ Build a new expectation suite and save it into the data_context expectation store.
- Parameters
expectation_suite_name – The name of the expectation_suite to create
overwrite_existing (boolean) – Whether to overwrite an existing expectation suite with the given name.
- Returns
A new (empty) expectation suite.
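A hedged usage sketch, assuming context is an already-instantiated DataContext; the suite name is illustrative:

    suite = context.create_expectation_suite(
        expectation_suite_name="taxi_data.warning",  # illustrative name
        overwrite_existing=True,  # replace any existing suite with the same name
    )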
-
delete_expectation_suite
(self, expectation_suite_name: Optional[str] = None, ge_cloud_id: Optional[str] = None)¶ Delete specified expectation suite from data_context expectation store.
- Parameters
expectation_suite_name – The name of the expectation_suite to delete
- Returns
True for Success and False for Failure.
-
get_expectation_suite
(self, expectation_suite_name: Optional[str] = None, include_rendered_content: Optional[bool] = None, ge_cloud_id: Optional[str] = None)¶ Get an Expectation Suite by name or GX Cloud ID.
- Parameters
expectation_suite_name (str) – The name of the Expectation Suite.
include_rendered_content – Whether or not to re-populate rendered_content for each ExpectationConfiguration.
ge_cloud_id (str) – The GX Cloud ID for the Expectation Suite.
- Returns
An existing ExpectationSuite
-
save_expectation_suite
(self, expectation_suite: ExpectationSuite, expectation_suite_name: Optional[str] = None, overwrite_existing: bool = True, include_rendered_content: Optional[bool] = None, **kwargs: Optional[dict])¶ Save the provided expectation suite into the DataContext.
- Parameters
expectation_suite – The suite to save.
expectation_suite_name – The name of this Expectation Suite. If no name is provided, the name will be read from the suite.
overwrite_existing – Whether to overwrite the suite if it already exists.
include_rendered_content – Whether to save the prescriptive rendered content for each expectation.
- Returns
None
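A round-trip sketch pairing get_expectation_suite with save_expectation_suite, again with an illustrative suite name and an existing context:

    suite = context.get_expectation_suite(expectation_suite_name="taxi_data.warning")
    # ... add or modify expectations on the suite here ...
    context.save_expectation_suite(expectation_suite=suite, overwrite_existing=True)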
-
_validate_suite_unique_constaints_before_save
(self, key: GXCloudIdentifier)¶
-
property
root_directory
(self)¶ The root directory for configuration objects in the data context; the location in which
great_expectations.yml
is located. Why does this exist in AbstractDataContext? CloudDataContext and FileDataContext both use it.
-
add_checkpoint
(self, name: str, config_version: Optional[Union[int, float]] = None, template_name: Optional[str] = None, module_name: Optional[str] = None, class_name: Optional[str] = None, run_name_template: Optional[str] = None, expectation_suite_name: Optional[str] = None, batch_request: Optional[dict] = None, action_list: Optional[List[dict]] = None, evaluation_parameters: Optional[dict] = None, runtime_configuration: Optional[dict] = None, validations: Optional[List[dict]] = None, profilers: Optional[List[dict]] = None, validation_operator_name: Optional[str] = None, batches: Optional[List[dict]] = None, site_names: Optional[Union[str, List[str]]] = None, slack_webhook: Optional[str] = None, notify_on: Optional[str] = None, notify_with: Optional[Union[str, List[str]]] = None, ge_cloud_id: Optional[str] = None, expectation_suite_ge_cloud_id: Optional[str] = None, default_validation_id: Optional[str] = None)¶ See AbstractDataContext.add_checkpoint for more information.
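A minimal sketch of add_checkpoint; the checkpoint name, suite name, and the SimpleCheckpoint class are illustrative choices, not defaults:

    checkpoint = context.add_checkpoint(
        name="my_checkpoint",  # illustrative
        config_version=1.0,
        class_name="SimpleCheckpoint",
        expectation_suite_name="taxi_data.warning",  # illustrative
    )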
-
list_checkpoints
(self)¶
-
list_profilers
(self)¶
-
_init_site_builder_for_data_docs_site_creation
(self, site_name: str, site_config: dict)¶ Note that this explicitly overrides the AbstractDataContext helper method called in self.build_data_docs().
The only difference here is the inclusion of ge_cloud_mode in the runtime_environment used in SiteBuilder instantiation.
-
_determine_key_for_profiler_save
(self, name: str, id: Optional[str])¶ Note that this explicitly overrides the AbstractDataContext helper method called in self.save_profiler().
The only difference here is the creation of a Cloud-specific GXCloudIdentifier instead of the usual ConfigurationIdentifier for Store interaction.
-
-
class
great_expectations.data_context.
DataContext
(context_root_dir: Optional[str] = None, runtime_environment: Optional[dict] = None, cloud_mode: bool = False, cloud_base_url: Optional[str] = None, cloud_access_token: Optional[str] = None, cloud_organization_id: Optional[str] = None, ge_cloud_mode: bool = False, ge_cloud_base_url: Optional[str] = None, ge_cloud_access_token: Optional[str] = None, ge_cloud_organization_id: Optional[str] = None)¶ Bases:
great_expectations.data_context.data_context.base_data_context.BaseDataContext
A DataContext represents a Great Expectations project. It is the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components.
The DataContext is configured via a yml file stored in a directory called great_expectations; this configuration file as well as managed Expectation Suites should be stored in version control. There are other ways to create a Data Context that may be better suited for your particular deployment e.g. ephemerally or backed by GX Cloud (coming soon). Please refer to our documentation for more details.
You can Validate data or generate Expectations using Execution Engines including:
SQL (multiple dialects supported)
Spark
Pandas
Your data can be stored in common locations including:
databases / data warehouses
files in s3, GCS, Azure, local storage
dataframes (spark and pandas) loaded into memory
Please see our documentation for examples on how to set up Great Expectations, connect to your data, create Expectations, and Validate data.
Other configuration options you can apply to a DataContext besides how to access data include things like where to store Expectations, Profilers, Checkpoints, Metrics, Validation Results and Data Docs and how those Stores are configured. Take a look at our documentation for more configuration options.
–Public API–
- --Documentation--
-
classmethod
create
(cls, project_root_dir: Optional[str] = None, usage_statistics_enabled: bool = True, runtime_environment: Optional[dict] = None)¶ Build a new great_expectations directory and DataContext object in the provided project_root_dir.
create will create a new “great_expectations” directory in the provided folder, provided one does not already exist. Then, it will initialize a new DataContext in that folder and write the resulting config.
- --Documentation--
- Parameters
project_root_dir – path to the root directory in which to create a new great_expectations directory
usage_statistics_enabled – boolean directive specifying whether or not to gather usage statistics
runtime_environment – a dictionary of config variables that override both those set in config_variables.yml and the environment
- Returns
DataContext
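A sketch of scaffolding a new project on disk, assuming /path/to/project is a writable directory:

    from great_expectations.data_context import DataContext

    context = DataContext.create(
        project_root_dir="/path/to/project",  # a great_expectations/ directory is created here
        usage_statistics_enabled=False,  # opt out of anonymous usage statistics
    )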
-
classmethod
all_uncommitted_directories_exist
(cls, ge_dir: str)¶ Check if all uncommitted directories exist.
-
classmethod
config_variables_yml_exist
(cls, ge_dir: str)¶ Check if the config_variables.yml file exists.
-
classmethod
write_config_variables_template_to_disk
(cls, uncommitted_dir: str)¶
-
classmethod
write_project_template_to_disk
(cls, ge_dir: str, usage_statistics_enabled: bool = True)¶
-
classmethod
scaffold_directories
(cls, base_dir: str)¶ Safely create GX directories for a new project.
-
classmethod
scaffold_custom_data_docs
(cls, plugins_dir: str)¶ Copy custom data docs templates
-
_save_project_config
(self)¶ See parent AbstractDataContext._save_project_config() for more information.
Explicitly override base class implementation to retain legacy behavior.
-
_attach_datasource_to_context
(self, datasource: XDatasource)¶
-
property
sources
(self)¶
-
_init_cloud_config
(self, cloud_mode: bool, cloud_base_url: Optional[str], cloud_access_token: Optional[str], cloud_organization_id: Optional[str])¶
-
_init_context_root_directory
(self, context_root_dir: Optional[str])¶
-
_check_for_usage_stats_sync
(self, project_config: DataContextConfig)¶ If there are differences between the DataContextConfig used to instantiate the DataContext and the DataContextConfig assigned to self.config, we want to save those changes to disk so that subsequent instantiations will utilize the same values.
A small caveat: if the difference stems from a global override (env var or conf file), we don’t want to write to disk, because those mechanisms allow for dynamic values and saving them would make them static.
- Parameters
project_config – The DataContextConfig used to instantiate the DataContext.
- Returns
A boolean signifying whether or not the current DataContext’s config needs to be persisted in order to recognize changes made to usage statistics.
-
_load_project_config
(self)¶ Reads the project configuration from the project configuration file. The file may contain ${SOME_VARIABLE} variables - see self.project_config_with_variables_substituted for how these are substituted.
For Data Contexts in GX Cloud mode, a user-specific template is retrieved from the Cloud API - see CloudDataContext.retrieve_data_context_config_from_cloud for more details.
- Returns
the configuration object read from the file or template
-
add_store
(self, store_name, store_config)¶ Add a new Store to the DataContext and (for convenience) return the instantiated Store object.
- Parameters
store_name (str) – a key for the new Store in self._stores
store_config (dict) – a config for the Store to add
- Returns
store (Store)
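An illustrative add_store call for a filesystem-backed Expectations Store; the store name and base_directory are assumptions, not project defaults:

    store = context.add_store(
        store_name="local_expectations_store",  # illustrative key in self._stores
        store_config={
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleFilesystemStoreBackend",
                "base_directory": "expectations/",
            },
        },
    )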
-
add_datasource
(self, name: str, **kwargs: dict)¶ Add named datasource, with options to initialize (and return) the datasource and save_config.
The current version calls super(), which preserves the usage_statistics decorator in the current method. A subsequent refactor will migrate the usage_statistics handling to parent and sibling classes.
- Parameters
name (str) – Name of Datasource
initialize (bool) – Whether GX should add and initialize the Datasource. If True, the method returns the initialized Datasource.
save_changes (Optional[bool]) – Whether GX should save the Datasource config.
**kwargs (Optional[dict]) – Additional kwargs that define Datasource initialization kwargs.
- Returns
Datasource that was added
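An illustrative add_datasource call using a Pandas execution engine with a runtime data connector (V3 API); the datasource and connector names are placeholders:

    datasource = context.add_datasource(
        name="my_pandas_datasource",  # placeholder name
        class_name="Datasource",
        execution_engine={"class_name": "PandasExecutionEngine"},
        data_connectors={
            "default_runtime_data_connector": {
                "class_name": "RuntimeDataConnector",
                "batch_identifiers": ["run_id"],
            }
        },
    )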
-
update_datasource
(self, datasource: Union[LegacyDatasource, BaseDatasource])¶ See parent BaseDataContext.update_datasource for more details. Note that this method persists changes using an underlying Store.
-
delete_datasource
(self, name: str)¶ Delete a data source.
- Parameters
datasource_name – The name of the datasource to delete.
save_changes – Whether or not to save changes to disk.
- Raises
ValueError – If the datasource name isn’t provided or cannot be found.
-
classmethod
find_context_root_dir
(cls)¶
-
classmethod
get_ge_config_version
(cls, context_root_dir: Optional[str] = None)¶
-
classmethod
set_ge_config_version
(cls, config_version: Union[int, float], context_root_dir: Optional[str] = None, validate_config_version: bool = True)¶
-
classmethod
find_context_yml_file
(cls, search_start_dir: Optional[str] = None)¶ Search for the yml file starting here and moving upward.
-
classmethod
does_config_exist_on_disk
(cls, context_root_dir: str)¶ Return True if the great_expectations.yml exists on disk.
-
classmethod
is_project_initialized
(cls, ge_dir: str)¶ Return True if the project is initialized.
To be considered initialized, all of the following must be true:
- all project directories exist (including uncommitted directories)
- a valid great_expectations.yml is on disk
- a config_variables.yml is on disk
- the project has at least one datasource
- the project has at least one suite
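A small sketch combining this check with create, assuming the default great_expectations/ layout under the current working directory:

    from great_expectations.data_context import DataContext

    if not DataContext.is_project_initialized(ge_dir="great_expectations"):
        DataContext.create(project_root_dir=".")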
-
classmethod
does_project_have_a_datasource_in_config_file
(cls, ge_dir: str)¶
-
classmethod
_does_context_have_at_least_one_datasource
(cls, ge_dir: str)¶
-
classmethod
_does_context_have_at_least_one_suite
(cls, ge_dir: str)¶
-
classmethod
_attempt_context_instantiation
(cls, ge_dir: str)¶
-
class
great_expectations.data_context.
EphemeralDataContext
(project_config: DataContextConfig, runtime_environment: Optional[dict] = None)¶ Bases:
great_expectations.data_context.data_context.abstract_data_context.AbstractDataContext
Will contain functionality to create a DataContext at runtime (i.e. from a passed-in config object or from stores). Users will be able to use EphemeralDataContext as a temporary or in-memory DataContext.
TODO: Most of the BaseDataContext code will be migrated to this class, which will continue to exist for backwards compatibility reasons.
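A sketch of constructing an in-memory context, assuming DataContextConfig and InMemoryStoreBackendDefaults from great_expectations.data_context.types.base:

    from great_expectations.data_context import EphemeralDataContext
    from great_expectations.data_context.types.base import (
        DataContextConfig,
        InMemoryStoreBackendDefaults,
    )

    project_config = DataContextConfig(
        store_backend_defaults=InMemoryStoreBackendDefaults()
    )
    context = EphemeralDataContext(project_config=project_config)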
-
_init_variables
(self)¶
-
_init_datasource_store
(self)¶ Internal utility responsible for creating a DatasourceStore to persist and manage a user’s Datasources.
Please note that the DatasourceStore lacks the same extensibility that other analogous Stores do; a default implementation is provided based on the user’s environment but is not customizable.
-
-
class
great_expectations.data_context.
ExplorerDataContext
(context_root_dir=None, expectation_explorer=True)¶ Bases:
great_expectations.data_context.data_context.data_context.DataContext
A DataContext represents a Great Expectations project. It is the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components.
The DataContext is configured via a yml file stored in a directory called great_expectations; this configuration file as well as managed Expectation Suites should be stored in version control. There are other ways to create a Data Context that may be better suited for your particular deployment e.g. ephemerally or backed by GX Cloud (coming soon). Please refer to our documentation for more details.
You can Validate data or generate Expectations using Execution Engines including:
SQL (multiple dialects supported)
Spark
Pandas
Your data can be stored in common locations including:
databases / data warehouses
files in s3, GCS, Azure, local storage
dataframes (spark and pandas) loaded into memory
Please see our documentation for examples on how to set up Great Expectations, connect to your data, create Expectations, and Validate data.
Other configuration options you can apply to a DataContext besides how to access data include things like where to store Expectations, Profilers, Checkpoints, Metrics, Validation Results and Data Docs and how those Stores are configured. Take a look at our documentation for more configuration options.
–Public API–
- --Documentation--
-
update_return_obj
(self, data_asset, return_obj)¶ Helper called by data_asset.
- Parameters
data_asset – The data_asset whose validation produced the current return object
return_obj – the return object to update
- Returns
the return object, potentially changed into a widget by the configured expectation explorer
- Return type
return_obj
-
class
great_expectations.data_context.
FileDataContext
(project_config: DataContextConfig, context_root_dir: str, runtime_environment: Optional[dict] = None)¶ Bases:
great_expectations.data_context.data_context.abstract_data_context.AbstractDataContext
Extends AbstractDataContext, contains only functionality necessary to hydrate state from disk.
TODO: Most of the functionality in DataContext will be refactored into this class, and the current DataContext class will exist only for backwards-compatibility reasons.
-
GX_YML
= great_expectations.yml¶
-
_init_datasource_store
(self)¶ Internal utility responsible for creating a DatasourceStore to persist and manage a user’s Datasources.
Please note that the DatasourceStore lacks the same extensibility that other analogous Stores do; a default implementation is provided based on the user’s environment but is not customizable.
-
property
root_directory
(self)¶ The root directory for configuration objects in the data context; the location in which
great_expectations.yml
is located. Why does this exist in AbstractDataContext? CloudDataContext and FileDataContext both use it.
-
_init_variables
(self)¶
-