Data Context Module¶
great_expectations.data_context.DataContext¶
- class
great_expectations.data_context.
DataContext
(context_root_dir=None, expectation_explorer=False, data_asset_name_delimiter='/')¶Bases:
object
A DataContext represents a Great Expectations project. It organizes storage and access for expectation suites, datasources, notification settings, and data fixtures.
The DataContext is configured via a yml file stored in a directory called great_expectations; the configuration file as well as managed expectation suites should be stored in version control.
Use the create classmethod to create a new empty config, or instantiate the DataContext by passing the path to an existing data context root directory.
DataContexts use data sources you’re already familiar with. Generators help introspect data stores and data execution frameworks (such as Airflow, NiFi, dbt, or Dagster) to describe and produce batches of data ready for analysis. This enables fetching, validation, profiling, and documentation of your data in a way that is meaningful within your existing infrastructure and work environment.
DataContexts use a datasource-based namespace, where each accessible type of data has a three-part normalized data_asset_name, consisting of datasource/generator/generator_asset.
The datasource actually connects to a source of materialized data and returns Great Expectations DataAssets connected to a compute environment and ready for validation.
The Generator knows how to introspect datasources and produce identifying “batch_kwargs” that define particular slices of data.
The generator_asset is a specific name – often a table name or other name familiar to users – that generators can slice into batches.
An expectation suite is a collection of expectations ready to be applied to a batch of data. Since in many projects it is useful to have different expectations evaluate in different contexts–profiling vs. testing; warning vs. error; high vs. low compute; ML model or dashboard–suites provide a namespace option for selecting which expectations a DataContext returns.
In many simple projects, the datasource or generator name may be omitted and the DataContext will infer the correct name when there is no ambiguity.
Similarly, if no expectation suite name is provided, the DataContext will assume the name “default”.
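To make the three-part naming scheme concrete, the following is a minimal sketch of splitting a fully qualified name on the delimiter. It is not the library's implementation: NormalizedName and split_data_asset_name are hypothetical names used only for illustration, and the inference the DataContext performs for shorter names is not reproduced.

```python
from collections import namedtuple

# Hypothetical stand-in for the library's NormalizedDataAssetName (illustration only).
NormalizedName = namedtuple(
    "NormalizedName", ["datasource", "generator", "generator_asset"]
)


def split_data_asset_name(name, delimiter="/"):
    """Split a fully qualified data asset name into its three parts.

    Only the unambiguous three-part case is handled here.
    """
    parts = name.split(delimiter)
    if len(parts) != 3:
        raise ValueError(
            "expected three parts separated by %r, got %r" % (delimiter, name)
        )
    return NormalizedName(*parts)


# "my_db", "default" and "users" are made-up example names.
normalized = split_data_asset_name("my_db/default/users")
```

A different delimiter can be passed to match the data_asset_name_delimiter the context was constructed with.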
PROFILING_ERROR_CODE_TOO_MANY_DATA_ASSETS
= 2¶
PROFILING_ERROR_CODE_SPECIFIED_DATA_ASSETS_NOT_FOUND
= 3¶
- classmethod
create
(project_root_dir=None)¶Build a new great_expectations directory and DataContext object in the provided project_root_dir.
create will create a new “great_expectations” directory in the provided folder, provided one does not already exist. Then, it will initialize a new DataContext in that folder and write the resulting config.
- Parameters
project_root_dir – path to the root directory in which to create a new great_expectations directory
- Returns
DataContext
- property
root_directory
¶The root directory for configuration objects in the data context; the location in which
great_expectations.yml
is located.
- property
plugins_directory
¶The directory in which custom plugin modules should be placed.
- property
expectations_directory
¶The directory in which expectation suites are stored.
- property
validations_store
¶The configuration for the store where validations should be stored
- property
data_asset_name_delimiter
¶Configurable delimiter character used to parse data asset name strings into
NormalizedDataAssetName
objects.
get_validation_location
(data_asset_name, expectation_suite_name, run_id, validations_store=None)¶Get the local path where a validation result is stored, given full asset name and run id
- Parameters
data_asset_name – name of data asset for which to get validation location
expectation_suite_name – name of expectation suite for which to get validation location
run_id – run_id of the validation to get. If no run_id is specified, fetch the latest run_id according to alphanumeric sort (by default, this is the most recent run when run_ids are ISO 8601 formatted timestamps)
validations_store – the store in which validations are located
- Returns
path to the validation location for the specified data_asset, expectation_suite and run_id
- Return type
path (str)
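The "latest run_id according to alphanumeric sort" rule can be illustrated with plain string comparison. This is a sketch of the rule as described, not the library's code; latest_run_id is a hypothetical helper and the run_ids are made up.

```python
def latest_run_id(run_ids):
    """Return the run_id that sorts last as a string, or None if there are none."""
    return max(run_ids) if run_ids else None


# ISO 8601 timestamps sort chronologically when compared as strings.
iso_run_ids = [
    "2019-08-01T12:00:00Z",
    "2019-08-03T09:00:00Z",
    "2019-08-02T23:59:59Z",
]
latest = latest_run_id(iso_run_ids)

# Counter-style ids do not: "run_9" sorts after "run_10" as a string.
surprising = latest_run_id(["run_9", "run_10"])
```

This is why ISO 8601 formatted timestamps are the natural choice for run_id when the latest run should be picked up by default.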
get_validation_doc_filepath
(data_asset_name, expectation_suite_name)¶Get the local path where the rendered HTML doc for a validation result is stored, given the full asset name.
- Parameters
data_asset_name – name of data asset for which to get documentation filepath
expectation_suite_name – name of expectation suite for which to get validation location
- Returns
Path to the location
- Return type
path (str)
move_validation_to_fixtures
(data_asset_name, expectation_suite_name, run_id)¶Move validation results from uncommitted to fixtures/validations to make available for the data doc renderer
- Parameters
data_asset_name – name of the data asset whose validation results should be moved
expectation_suite_name – name of the expectation suite whose validation results should be moved
run_id – run_id of the validation to move. If no run_id is specified, use the latest run_id according to alphanumeric sort (by default, this is the most recent run when run_ids are ISO 8601 formatted timestamps)
- Returns
None
get_project_config
()¶
get_profile_credentials
(profile_name)¶Get named profile credentials.
- Parameters
profile_name (str) – name of the profile for which to get credentials
- Returns
dictionary of credentials
- Return type
credentials (dict)
add_profile_credentials
(profile_name, **kwargs)¶Add named profile credentials.
- Parameters
profile_name – name of the profile for which to add credentials
**kwargs – credential key-value pairs
- Returns
None
get_datasource_config
(datasource_name)¶Get the configuration for a configured datasource
- Parameters
datasource_name – The datasource for which to get the config
- Returns
dictionary containing datasource configuration
- Return type
datasource_config (dict)
get_available_data_asset_names
(datasource_names=None, generator_names=None)¶Inspect datasource and generators to provide available data_asset objects.
- Parameters
datasource_names – list of datasources for which to provide available data_asset_name objects. If None, return available data assets for all datasources.
generator_names – list of generators for which to provide available data_asset_name objects.
- Returns
Dictionary describing available data assets
{ datasource_name: { generator_name: [ data_asset_1, data_asset_2, ... ] ... } ... }
- Return type
data_asset_names (dict)
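As an illustration of the nested return structure, the sketch below walks a dictionary of that shape and builds datasource/generator/generator_asset names. The input is a hand-written example, not output from a real context, and flatten_asset_names is a hypothetical helper.

```python
def flatten_asset_names(available, delimiter="/"):
    """Turn {datasource: {generator: [assets]}} into a flat list of full names."""
    names = []
    for datasource_name, generators in sorted(available.items()):
        for generator_name, assets in sorted(generators.items()):
            for asset in assets:
                names.append(
                    delimiter.join([datasource_name, generator_name, asset])
                )
    return names


# Hand-written example in the documented shape.
available = {
    "my_db": {"default": ["users", "orders"]},
    "files": {"default": ["events"]},
}
full_names = flatten_asset_names(available)
```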
get_batch
(data_asset_name, expectation_suite_name='default', batch_kwargs=None, **kwargs)¶Get a batch of data from the specified data_asset_name. Attaches the named expectation_suite, and uses the provided batch_kwargs.
- Parameters
data_asset_name – name of the data asset. The name will be normalized. (See _normalize_data_asset_name().)
expectation_suite_name – name of the expectation suite to attach to the data_asset returned
batch_kwargs – key-value pairs describing the batch of data the datasource should fetch. (See BatchGenerator.) If no batch_kwargs are specified, then the context will get the next available batch_kwargs for the data_asset.
**kwargs – additional key-value pairs to pass to the datasource when fetching the batch.
- Returns
Great Expectations data_asset with attached expectation_suite and DataContext
add_datasource
(name, type_, **kwargs)¶Add a new datasource to the data context.
The type_ parameter must match one of the recognized types for the DataContext
- Parameters
name (str) – the name for the new datasource to add
type_ (str) – the type of datasource to add
- Returns
datasource (Datasource)
get_config
()¶
get_datasource
(datasource_name='default')¶Get the named datasource
- Parameters
datasource_name (str) – the name of the datasource from the configuration
- Returns
datasource (Datasource)
list_expectation_suites
()¶Returns currently-defined expectation suites available in a nested dictionary structure reflecting the namespace provided by this DataContext.
- Returns
Dictionary of currently-defined expectation suites:
{ datasource: { generator: { generator_asset: [list_of_expectation_suites] } } ... }
list_datasources
()¶List currently-configured datasources on this context.
- Returns
list of dictionaries; each dictionary includes “name” and “type” keys
- Return type
List(dict)
get_expectation_suite
(data_asset_name, expectation_suite_name='default')¶Get or create a named expectation suite for the provided data_asset_name.
- Parameters
data_asset_name (str or NormalizedDataAssetName) – the data asset name to which the expectation suite belongs
expectation_suite_name (str) – the name for the expectation suite
- Returns
expectation_suite
save_expectation_suite
(expectation_suite, data_asset_name=None, expectation_suite_name=None)¶Save the provided expectation suite into the DataContext.
- Parameters
expectation_suite – the suite to save
data_asset_name – the data_asset_name for this expectation suite. If no name is provided, the name will be read from the suite
expectation_suite_name – the name of this expectation suite. If no name is provided the name will be read from the suite
- Returns
None
bind_evaluation_parameters
(run_id)¶Return current evaluation parameters stored for the provided run_id, ready to be bound to parameterized expectation values.
- Parameters
run_id – the run_id for which to return evaluation parameters
- Returns
evaluation_parameters (dict)
register_validation_results
(run_id, validation_results, data_asset=None)¶Process results of a validation run. This method is called by data_asset objects that are connected to a DataContext during validation. It performs several actions:
store the validation results to a validations_store, if one is configured
store a snapshot of the data_asset, if so configured and a compatible data_asset is available
perform a callback action using the validation results, if one is configured
retrieve validation results referenced in other parameterized expectations and store them in the evaluation parameter store.
- Parameters
run_id – the run_id for which to register validation results
validation_results – the validation results object
data_asset – the data_asset to snapshot, if snapshot is configured
- Returns
Validation results object, with updated meta information including references to stored data, if appropriate
- Return type
validation_results
store_validation_param
(run_id, key, value)¶Store a new validation parameter.
- Parameters
run_id – current run_id
key – parameter key
value – parameter value
- Returns
None
get_validation_param
(run_id, key)¶Get a stored validation parameter.
- Parameters
run_id – run_id for desired value
key – parameter key
- Returns
value stored in evaluation_parameter_store for the provided run_id and key
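The relationship between store_validation_param, get_validation_param and bind_evaluation_parameters can be pictured as a store keyed first by run_id and then by parameter key. The class below is a simplified in-memory sketch under that assumption; it is not the library's evaluation parameter store, and returning None for a missing key is an assumption of this sketch.

```python
from collections import defaultdict


class InMemoryParameterStore(object):
    """Hypothetical stand-in: parameters grouped by run_id, then by key."""

    def __init__(self):
        self._params = defaultdict(dict)

    def store(self, run_id, key, value):
        # Mirrors the idea of store_validation_param(run_id, key, value).
        self._params[run_id][key] = value

    def get(self, run_id, key):
        # Mirrors get_validation_param(run_id, key); None if absent (assumption).
        return self._params.get(run_id, {}).get(key)

    def bind(self, run_id):
        # Mirrors bind_evaluation_parameters(run_id): all parameters for one run.
        return dict(self._params.get(run_id, {}))


store = InMemoryParameterStore()
store.store("2019-08-01T12:00:00Z", "upstream_row_count", 1000)  # made-up key
bound = store.bind("2019-08-01T12:00:00Z")
```

Keying by run_id keeps parameters from different validation runs separate, so values recorded in one run are not bound into another.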
write_resource
(resource, resource_name, resource_store, resource_namespace=None, data_asset_name=None, expectation_suite_name=None, run_id=None)¶Writes the bytes in “resource” according to the resource_store’s writing method, with a name constructed as follows:
resource_namespace/run_id/data_asset_name/expectation_suite_name/resource_name
If any of those components is None, it is omitted from the namespace.
- Parameters
resource –
resource_name –
resource_store –
resource_namespace –
data_asset_name –
expectation_suite_name –
run_id –
- Returns
A dictionary describing how to locate the resource (specific to resource_store type)
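The name construction described for write_resource (components joined in a fixed order, with None components omitted) can be sketched as follows. build_resource_key is a hypothetical helper written for illustration, and the "/" separator is taken from the pattern shown above.

```python
def build_resource_key(
    resource_name,
    resource_namespace=None,
    run_id=None,
    data_asset_name=None,
    expectation_suite_name=None,
):
    """Join the non-None components in the documented order."""
    components = [
        resource_namespace,
        run_id,
        data_asset_name,
        expectation_suite_name,
        resource_name,
    ]
    return "/".join(c for c in components if c is not None)


# Made-up values for illustration.
full_key = build_resource_key(
    "result.json",
    resource_namespace="validations",
    run_id="2019-08-01T12:00:00Z",
    data_asset_name="my_db/default/users",
    expectation_suite_name="default",
)
short_key = build_resource_key("result.json", resource_namespace="validations")
```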
list_validation_results
(validations_store=None)¶
- Returns
A dictionary describing validation results in the following format:
{ "run_id": { "datasource": { "generator": { "generator_asset": [expectation_suite_1, expectation_suite_2, ...] } } } }
get_validation_result
(data_asset_name, expectation_suite_name='default', run_id=None, validations_store=None, failed_only=False)¶Get validation results from a configured store.
- Parameters
data_asset_name – name of data asset for which to get validation result
expectation_suite_name – expectation_suite name for which to get validation result (default: “default”)
run_id – run_id for which to get validation result (if None, fetch the latest result by alphanumeric sort)
validations_store – the store from which to get validation results
failed_only – if True, filter the result to return only failed expectations
- Returns
validation_result
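The effect of failed_only can be sketched as a filter over the individual expectation results. The sketch assumes the validation result is a dictionary with a "results" list whose entries carry a boolean "success" key; that layout is an assumption of this example, and keep_failed_only is a hypothetical helper rather than the library's code.

```python
import copy


def keep_failed_only(validation_result):
    """Return a copy of the result containing only the failed expectations."""
    filtered = copy.deepcopy(validation_result)
    filtered["results"] = [
        r for r in filtered.get("results", []) if not r.get("success")
    ]
    return filtered


# Hand-written example result; the key names are assumed, not taken from the docs above.
example_result = {
    "success": False,
    "results": [
        {"expectation_type": "expect_column_to_exist", "success": True},
        {"expectation_type": "expect_column_values_to_not_be_null", "success": False},
    ],
}
failed = keep_failed_only(example_result)
```

Working on a copy leaves the stored result untouched, so the full result remains available to other callers.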
update_return_obj
(data_asset, return_obj)¶Helper called by data_asset.
- Parameters
data_asset – The data_asset whose validation produced the current return object
return_obj – the return object to update
- Returns
the return object, potentially changed into a widget by the configured expectation explorer
- Return type
return_obj
build_data_documentation
(site_names=None, data_asset_name=None)¶Build the data documentation sites for this project.
- Returns
A dictionary with the names of the updated data documentation sites as keys and the location info of their index.html files as values
get_absolute_path
(path)¶
profile_datasource
(datasource_name, generator_name=None, data_assets=None, max_data_assets=20, profile_all_data_assets=True, profiler=<class 'great_expectations.profile.basic_dataset_profiler.BasicDatasetProfiler'>, dry_run=False, additional_batch_kwargs=None)¶Profile the named datasource using the named profiler.
- Parameters
datasource_name – the name of the datasource for which to profile data_assets
generator_name – the name of the generator to use to get batches
data_assets – list of data asset names to profile
max_data_assets – if the generator yields more data assets than this limit, profile_all_data_assets=True is required to profile all of them
profile_all_data_assets – when True, all data assets are profiled, regardless of their number
profiler – the profiler class to use
dry_run – when True, the method checks the arguments and reports whether it can profile, or specifies the arguments that are missing
additional_batch_kwargs – Additional keyword arguments to be provided to get_batch when loading the data asset.
- Returns
A dictionary:
{ "success": True/False, "results": List of (expectation_suite, EVR) tuples for each of the data_assets found in the datasource }
When success = False, the error details are under the “error” key.
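A caller might branch on the returned dictionary along these lines. The sketch uses the two error code values listed for the class (2 and 3), but the presence of a "code" entry inside "error" is an assumption of this example, since the layout of the error details is not specified above. summarize_profiling is a hypothetical helper.

```python
# Values copied from the class constants documented above.
TOO_MANY_DATA_ASSETS = 2
SPECIFIED_DATA_ASSETS_NOT_FOUND = 3


def summarize_profiling(result):
    """Turn a profile_datasource-style result into a short message."""
    if result.get("success"):
        return "profiled %d data asset(s)" % len(result.get("results", []))
    error = result.get("error", {})
    code = error.get("code")  # assumed key, not confirmed by the docs above
    if code == TOO_MANY_DATA_ASSETS:
        return "too many data assets; pass data_assets or raise max_data_assets"
    if code == SPECIFIED_DATA_ASSETS_NOT_FOUND:
        return "some of the requested data assets were not found"
    return "profiling failed"


# Hand-written example results in the documented shape.
ok_message = summarize_profiling({"success": True, "results": [("suite", "evr")]})
error_message = summarize_profiling({"success": False, "error": {"code": 2}})
```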
-
great_expectations.data_context.util.
build_slack_notification_request
(validation_json=None)¶
-
great_expectations.data_context.util.
get_slack_callback
(webhook)¶
-
great_expectations.data_context.util.
safe_mmkdir
(directory, exist_ok=True)¶Simple wrapper for creating directories, since the exist_ok argument of os.makedirs is not available in Python 2
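For readers unfamiliar with the pattern, a wrapper of this kind typically catches the "already exists" error raised by os.makedirs. The following is a generic sketch of that idiom, not the library's implementation; make_dirs is a hypothetical name.

```python
import errno
import os
import tempfile


def make_dirs(directory, exist_ok=True):
    """Create directory (and parents), tolerating an existing one if exist_ok."""
    try:
        os.makedirs(directory)
    except OSError as e:
        already_there = e.errno == errno.EEXIST and os.path.isdir(directory)
        if not (exist_ok and already_there):
            raise


# Demonstration in a throwaway temporary directory.
base = tempfile.mkdtemp()
target = os.path.join(base, "great_expectations", "uncommitted")
make_dirs(target)
make_dirs(target)  # second call finds the directory and does nothing
```

The isdir check keeps the wrapper from silently accepting a regular file that happens to occupy the requested path.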