Data Context Module

great_expectations.data_context.DataContext

class great_expectations.data_context.DataContext(context_root_dir=None, expectation_explorer=False, data_asset_name_delimiter='/')

Bases: object

A DataContext represents a Great Expectations project. It organizes storage and access for expectation suites, datasources, notification settings, and data fixtures.

The DataContext is configured via a yml file stored in a directory called great_expectations; the configuration file as well as managed expectation suites should be stored in version control.

Use the create classmethod to create a new empty config, or instantiate the DataContext by passing the path to an existing data context root directory.
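
For example, a minimal sketch of opening an existing project (the path is a placeholder for your own project layout):

from great_expectations.data_context import DataContext

# Point the DataContext at the directory that contains great_expectations.yml.
context = DataContext(context_root_dir="/path/to/my_project/great_expectations")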

DataContexts use data sources you’re already familiar with. Generators help introspect data stores and data execution frameworks (such as Airflow, NiFi, dbt, or Dagster) to describe and produce batches of data ready for analysis. This enables fetching, validation, profiling, and documentation of your data in a way that is meaningful within your existing infrastructure and work environment.

DataContexts use a datasource-based namespace, where each accessible type of data has a three-part normalized data_asset_name, consisting of datasource/generator/generator_asset.

  • The datasource actually connects to a source of materialized data and returns Great Expectations DataAssets connected to a compute environment and ready for validation.

  • The Generator knows how to introspect datasources and produce identifying “batch_kwargs” that define particular slices of data.

  • The generator_asset is a specific name – often a table name or other name familiar to users – that generators can slice into batches.
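
For instance, with the default “/” delimiter, a fully-qualified data_asset_name might look like the following (the specific names are hypothetical):

# datasource       / generator / generator_asset
# "my_postgres_db" / "default" / "public.orders"
data_asset_name = "my_postgres_db/default/public.orders"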

An expectation suite is a collection of expectations ready to be applied to a batch of data. Since in many projects it is useful to have different expectations evaluate in different contexts (profiling vs. testing; warning vs. error; high vs. low compute; ML model or dashboard), suites provide a namespace option for selecting which expectations a DataContext returns.

In many simple projects, the datasource or generator name may be omitted and the DataContext will infer the correct name when there is no ambiguity.

Similarly, if no expectation suite name is provided, the DataContext will assume the name “default”.

PROFILING_ERROR_CODE_TOO_MANY_DATA_ASSETS = 2
PROFILING_ERROR_CODE_SPECIFIED_DATA_ASSETS_NOT_FOUND = 3
classmethod create(project_root_dir=None)

Build a new great_expectations directory and DataContext object in the provided project_root_dir.

create will create a new “great_expectations” directory in the provided folder, provided one does not already exist. Then, it will initialize a new DataContext in that folder and write the resulting config.

Parameters

project_root_dir – path to the root directory in which to create a new great_expectations directory

Returns

DataContext
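
A minimal sketch (the project path is a placeholder):

from great_expectations.data_context import DataContext

# Scaffold a new great_expectations directory inside an existing project root
# and return a DataContext bound to it.
context = DataContext.create(project_root_dir="/path/to/my_project")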

property root_directory

The root directory for configuration objects in the data context; that is, the directory in which great_expectations.yml is located.

property plugins_directory

The directory in which custom plugin modules should be placed.

property expectations_directory

The directory in which expectation suites should be stored.

property validations_store

The configuration for the store where validations should be stored.

property data_asset_name_delimiter

Configurable delimiter character used to parse data asset name strings into NormalizedDataAssetName objects.

get_validation_location(data_asset_name, expectation_suite_name, run_id, validations_store=None)

Get the local path where a validation result is stored, given the full asset name and run_id.

Parameters
  • data_asset_name – name of data asset for which to get validation location

  • expectation_suite_name – name of expectation suite for which to get validation location

  • run_id – run_id of validation to get. If no run_id is specified, fetch the latest run_id according to alphanumeric sort (by default, this is the latest run_id when using ISO 8601 formatted timestamps for run_id).

  • validations_store – the store in which validations are located

Returns

path to the validation location for the specified data_asset, expectation_suite and run_id

Return type

path (str)

get_validation_doc_filepath(data_asset_name, expectation_suite_name)

Get the local path where the rendered HTML doc for a validation result is stored, given the full asset name.

Parameters
  • data_asset_name – name of data asset for which to get documentation filepath

  • expectation_suite_name – name of expectation suite for which to get validation location

Returns

Path to the location

Return type

path (str)

move_validation_to_fixtures(data_asset_name, expectation_suite_name, run_id)

Move validation results from uncommitted to fixtures/validations to make them available to the data doc renderer.

Parameters
  • data_asset_name – name of the data asset for which to move validation results

  • expectation_suite_name – name of the expectation suite for which to move validation results

  • run_id – run_id of validation to move. If no run_id is specified, fetch the latest run_id according to alphanumeric sort (by default, this is the latest run_id when using ISO 8601 formatted timestamps for run_id).

Returns

None

get_project_config()
get_profile_credentials(profile_name)

Get named profile credentials.

Parameters

profile_name (str) – name of the profile for which to get credentials

Returns

dictionary of credentials

Return type

credentials (dict)

add_profile_credentials(profile_name, **kwargs)

Add named profile credentials.

Parameters
  • profile_name – name of the profile for which to add credentials

  • **kwargs – credential key-value pairs

Returns

None
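
As a sketch, credentials are stored as arbitrary key-value pairs under the profile name; the keys shown below are illustrative rather than required by the method:

# Store connection details under the profile "my_postgres_profile".
context.add_profile_credentials(
    "my_postgres_profile",
    drivername="postgresql",
    host="localhost",
    port="5432",
    username="ge_user",
    password="***",
    database="analytics",
)

# Retrieve them later as a dict.
credentials = context.get_profile_credentials("my_postgres_profile")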

get_datasource_config(datasource_name)

Get the configuration for a configured datasource

Parameters

datasource_name – The datasource for which to get the config

Returns

dictionary containing datasource configuration

Return type

datasource_config (dict)

get_available_data_asset_names(datasource_names=None, generator_names=None)

Inspect datasources and generators to provide available data_asset_name objects.

Parameters
  • datasource_names – list of datasources for which to provide available data_asset_name objects. If None, return available data assets for all datasources.

  • generator_names – list of generators for which to provide available data_asset_name objects.

Returns

Dictionary describing available data assets

{
  datasource_name: {
    generator_name: [ data_asset_1, data_asset_2, ... ]
    ...
  }
  ...
}

Return type

data_asset_names (dict)
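
A sketch of walking the returned dictionary (the datasource and generator names depend entirely on your configuration):

# Print every data asset the context can see, grouped by datasource and generator.
available = context.get_available_data_asset_names()
for datasource_name, generators in available.items():
    for generator_name, asset_names in generators.items():
        for asset_name in asset_names:
            print(datasource_name, generator_name, asset_name)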

get_batch(data_asset_name, expectation_suite_name='default', batch_kwargs=None, **kwargs)

Get a batch of data from the specified data_asset_name. Attaches the named expectation_suite, and uses the provided batch_kwargs.

Parameters
  • data_asset_name – name of the data asset. The name will be normalized. (See _normalize_data_asset_name() )

  • expectation_suite_name – name of the expectation suite to attach to the data_asset returned

  • batch_kwargs – key-value pairs describing the batch of data the datasource should fetch. (See BatchGenerator.) If no batch_kwargs are specified, the context will get the next available batch_kwargs for the data_asset.

  • **kwargs – additional key-value pairs to pass to the datasource when fetching the batch.

Returns

Great Expectations data_asset with attached expectation_suite and DataContext
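
A sketch of fetching and validating a batch (the asset and suite names are placeholders):

# Fetch a batch of the named asset and attach the "warning" expectation suite.
batch = context.get_batch(
    "my_postgres_db/default/public.orders",
    expectation_suite_name="warning",
)

# The returned object is a Great Expectations data asset, so expectations can be
# declared and validated on it directly.
batch.expect_column_values_to_not_be_null("order_id")
results = batch.validate()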

add_datasource(name, type_, **kwargs)

Add a new datasource to the data context.

The type_ parameter must match one of the recognized types for the DataContext.

Parameters
  • name (str) – the name for the new datasource to add

  • type_ (str) – the type of datasource to add

Returns

datasource (Datasource)
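
A sketch of adding a pandas-backed datasource; the name is arbitrary, "pandas" is one of the recognized type strings, and the base_directory keyword is assumed to be forwarded to the datasource configuration:

# Register a datasource that reads local files with pandas.
context.add_datasource(
    "my_local_files",          # name
    "pandas",                  # type_
    base_directory="../data",  # assumed datasource-specific option
)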

get_config()
get_datasource(datasource_name='default')

Get the named datasource

Parameters

datasource_name (str) – the name of the datasource from the configuration

Returns

datasource (Datasource)

list_expectation_suites()

Returns currently-defined expectation suites available in a nested dictionary structure reflecting the namespace provided by this DataContext.

Returns

Dictionary of currently-defined expectation suites:

{
  datasource: {
    generator: {
      generator_asset: [list_of_expectation_suites]
    }
  }
  ...
}

list_datasources()

List currently-configured datasources on this context.

Returns

each dictionary includes “name” and “type” keys

Return type

List(dict)
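
For example:

# Each entry is a dictionary containing at least "name" and "type" keys.
for datasource in context.list_datasources():
    print(datasource["name"], datasource["type"])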

get_expectation_suite(data_asset_name, expectation_suite_name='default')

Get or create a named expectation suite for the provided data_asset_name.

Parameters
  • data_asset_name (str or NormalizedDataAssetName) – the data asset name to which the expectation suite belongs

  • expectation_suite_name (str) – the name for the expectation suite

Returns

expectation_suite

save_expectation_suite(expectation_suite, data_asset_name=None, expectation_suite_name=None)

Save the provided expectation suite into the DataContext.

Parameters
  • expectation_suite – the suite to save

  • data_asset_name – the data_asset_name for this expectation suite. If no name is provided, the name will be read from the suite

  • expectation_suite_name – the name of this expectation suite. If no name is provided the name will be read from the suite

Returns

None
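
A sketch of round-tripping a suite (asset and suite names are placeholders):

# Get a batch with the "warning" suite attached, add an expectation, then persist
# the updated suite back into the DataContext.
batch = context.get_batch("my_postgres_db/default/public.orders", "warning")
batch.expect_table_row_count_to_be_between(min_value=1, max_value=1000000)
context.save_expectation_suite(
    batch.get_expectation_suite(),
    data_asset_name="my_postgres_db/default/public.orders",
    expectation_suite_name="warning",
)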

bind_evaluation_parameters(run_id)

Return current evaluation parameters stored for the provided run_id, ready to be bound to parameterized expectation values.

Parameters

run_id – the run_id for which to return evaluation parameters

Returns

evaluation_parameters (dict)

register_validation_results(run_id, validation_results, data_asset=None)

Process results of a validation run. This method is called by data_asset objects that are connected to a DataContext during validation. It performs several actions:
  • store the validation results to a validations_store, if one is configured

  • store a snapshot of the data_asset, if so configured and a compatible data_asset is available

  • perform a callback action using the validation results, if one is configured

  • retrieve validation results referenced in other parameterized expectations and store them in the evaluation parameter store.

Parameters
  • run_id – the run_id for which to register validation results

  • validation_results – the validation results object

  • data_asset – the data_asset to snapshot, if snapshot is configured

Returns

Validation results object, with updated meta information including references to stored data, if appropriate

Return type

validation_results

store_validation_param(run_id, key, value)

Store a new validation parameter.

Parameters
  • run_id – current run_id

  • key – parameter key

  • value – parameter value

Returns

None

get_validation_param(run_id, key)

Get a stored validation parameter.

Parameters
  • run_id – run_id for desired value

  • key – parameter key

Returns

value stored in evaluation_parameter_store for the provided run_id and key
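
A sketch of sharing a value between validations in the same run (the run_id format and parameter key are illustrative):

run_id = "2019-06-25T140000.000000Z"

# Stash an upstream row count so downstream parameterized expectations can bind to it.
context.store_validation_param(run_id, "expected_row_count", 10000)
value = context.get_validation_param(run_id, "expected_row_count")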

write_resource(resource, resource_name, resource_store, resource_namespace=None, data_asset_name=None, expectation_suite_name=None, run_id=None)

Writes the bytes in “resource” according to the resource_store’s writing method, with a name constructed as follows:

resource_namespace/run_id/data_asset_name/expectation_suite_name/resource_name

If any of those components is None, it is omitted from the namespace.

Parameters
  • resource

  • resource_name

  • resource_store

  • resource_namespace

  • data_asset_name

  • expectation_suite_name

  • run_id

Returns

A dictionary describing how to locate the resource (specific to resource_store type)

list_validation_results(validations_store=None)

Returns

A dictionary describing validation results in the following format:

{
  "run_id": {
    "datasource": {
      "generator": {
        "generator_asset": [expectation_suite_1, expectation_suite_2, ...]
      }
    }
  }
}

get_validation_result(data_asset_name, expectation_suite_name='default', run_id=None, validations_store=None, failed_only=False)

Get validation results from a configured store.

Parameters
  • data_asset_name – name of data asset for which to get validation result

  • expectation_suite_name – expectation_suite name for which to get validation result (default: “default”)

  • run_id – run_id for which to get validation result (if None, fetch the latest result by alphanumeric sort)

  • validations_store – the store from which to get validation results

  • failed_only – if True, filter the result to return only failed expectations

Returns

validation_result
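
For example (names are placeholders; with run_id=None the most recent result is returned):

# Fetch only the failed expectations from the latest validation of this suite.
result = context.get_validation_result(
    "my_postgres_db/default/public.orders",
    expectation_suite_name="warning",
    failed_only=True,
)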

update_return_obj(data_asset, return_obj)

Helper called by data_asset.

Parameters
  • data_asset – The data_asset whose validation produced the current return object

  • return_obj – the return object to update

Returns

the return object, potentially changed into a widget by the configured expectation explorer

Return type

return_obj

build_data_documentation(site_names=None, data_asset_name=None)

Build data documentation for the configured sites, optionally restricting the build to the named sites or to a single data asset.

Returns

A dictionary with the names of the updated data documentation sites as keys and the location info of their index.html files as values
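
For example, rebuilding every configured site (this assumes at least one documentation site is configured for the context):

# Maps each updated site name to the location of its index.html file.
sites = context.build_data_documentation()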

get_absolute_path(path)
profile_datasource(datasource_name, generator_name=None, data_assets=None, max_data_assets=20, profile_all_data_assets=True, profiler=<class 'great_expectations.profile.basic_dataset_profiler.BasicDatasetProfiler'>, dry_run=False, additional_batch_kwargs=None)

Profile the named datasource using the named profiler.

Parameters
  • datasource_name – the name of the datasource for which to profile data_assets

  • generator_name – the name of the generator to use to get batches

  • data_assets – list of data asset names to profile

  • max_data_assets – if the number of data assets the generator yields is greater than max_data_assets, profile_all_data_assets=True is required to profile them all

  • profile_all_data_assets – when True, all data assets are profiled, regardless of their number

  • profiler – the profiler class to use

  • dry_run – when True, the method checks its arguments and reports whether profiling can proceed, or which arguments are missing

  • additional_batch_kwargs – Additional keyword arguments to be provided to get_batch when loading the data asset.

Returns

A dictionary:

{
    "success": True/False,
    "results": List of (expectation_suite, EVR) tuples for each of the data_assets found in the datasource
}

When success = False, the error details are under the “error” key
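
A sketch of profiling a datasource with the default BasicDatasetProfiler (the datasource name is a placeholder):

# Profile the data assets found in "my_postgres_db" and inspect the outcome.
profiling_results = context.profile_datasource("my_postgres_db")
if not profiling_results["success"]:
    print(profiling_results["error"])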

great_expectations.data_context.util.build_slack_notification_request(validation_json=None)
great_expectations.data_context.util.get_slack_callback(webhook)
great_expectations.data_context.util.safe_mmkdir(directory, exist_ok=True)

Simple wrapper around os.makedirs, since exist_ok is not available in Python 2
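
For example (a sketch; the directory path is arbitrary):

import os
from great_expectations.data_context.util import safe_mmkdir

# Create the directory if it is missing; do nothing if it already exists.
safe_mmkdir(os.path.join("uncommitted", "samples"))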