How to instantiate a Data Context without a yml file
This guide will help you instantiate a Data Context without a yml file, aka configure a Data Context in code. If you are working in an environment without easy access to a local filesystem (e.g. AWS Spark EMR, Databricks, etc.) you may wish to configure your Data Context in code, within your notebook or workflow tool (e.g. Airflow DAG node).
Prerequisites: This how-to guide assumes you have already:
- Set up a working deployment of Great Expectations
Note
See also our companion video for this guide: Data Contexts In Code.
Steps
1. Create a DataContextConfig
The DataContextConfig holds all of the configuration parameters needed to build a DataContext. Defaults are set for you to minimize configuration in typical cases, but every parameter is configurable and every default can be overridden. Note that DatasourceConfig also has defaults, which can likewise be overridden.
Here we will show a few examples of common configurations using the store_backend_defaults parameter. You can continue with the existing API sans defaults by omitting this parameter, and you can override all of the parameters, as shown in the last example. Note that a parameter set in DataContextConfig will override a parameter set in store_backend_defaults if both are used.

The following store_backend_defaults are currently available (a minimal sketch using the in-memory variant follows this list):

- InMemoryStoreBackendDefaults
- FilesystemStoreBackendDefaults
- S3StoreBackendDefaults
- GCSStoreBackendDefaults
- DatabaseStoreBackendDefaults
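For example, a Data Context whose metadata stores all live in memory can be configured with no further parameters, which can be handy in ephemeral or hosted environments. This is a minimal sketch, assuming your version of Great Expectations ships InMemoryStoreBackendDefaults:

```python
from great_expectations.data_context import BaseDataContext
from great_expectations.data_context.types.base import (
    DataContextConfig,
    InMemoryStoreBackendDefaults,
)

# All metadata stores are kept in memory; nothing is persisted to disk or a bucket.
data_context_config = DataContextConfig(
    store_backend_defaults=InMemoryStoreBackendDefaults()
)
context = BaseDataContext(project_config=data_context_config)
```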
The following example shows a Data Context configuration with an SQLAlchemy datasource and an AWS S3 bucket for all metadata stores, using the default prefixes. Note that you can still substitute environment variables, as in the YAML-based configuration, to keep sensitive credentials out of your code.
```python
from great_expectations.data_context import BaseDataContext
from great_expectations.data_context.types.base import (
    DataContextConfig,
    DatasourceConfig,
    S3StoreBackendDefaults,
)

data_context_config = DataContextConfig(
    datasources={
        "my_sqlalchemy_datasource": DatasourceConfig(
            class_name="SqlAlchemyDatasource",
            credentials={
                "drivername": "custom_drivername",
                "host": "custom_host",
                "port": "custom_port",
                "username": "${USERNAME_FROM_ENVIRONMENT_VARIABLE}",
                "password": "${PASSWORD_FROM_ENVIRONMENT_VARIABLE}",
                "database": "custom_database",
            },
        )
    },
    store_backend_defaults=S3StoreBackendDefaults(default_bucket_name="my_default_bucket"),
)
```
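The ${...} placeholders are resolved when the context is instantiated. One way to supply them, shown here as a sketch with placeholder values, is to set environment variables before building the context:

```python
import os

# Supply the credential placeholders used above; in production these would
# typically be set by your environment or secrets manager, not in code.
os.environ["USERNAME_FROM_ENVIRONMENT_VARIABLE"] = "my_username"
os.environ["PASSWORD_FROM_ENVIRONMENT_VARIABLE"] = "my_password"
```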
The following example shows a Data Context configuration with a Pandas datasource and local filesystem defaults for the metadata stores. Imports are omitted in the remaining examples. You may add an optional root_directory parameter to set the base location for the Store Backends.
```python
data_context_config = DataContextConfig(
    datasources={
        "my_pandas_datasource": DatasourceConfig(
            class_name="PandasDatasource",
            batch_kwargs_generators={
                "subdir_reader": {
                    "class_name": "SubdirReaderBatchKwargsGenerator",
                    "base_directory": "/path/to/data",
                }
            },
        )
    },
    store_backend_defaults=FilesystemStoreBackendDefaults(
        root_directory="optional/absolute/path/for/stores"
    ),
)
```
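As a quick sanity check that the batch kwargs generator is wired up, you can ask the context to build batch_kwargs for you. This is a sketch: "my_data_asset" is a hypothetical name for a file or subdirectory under base_directory.

```python
from great_expectations.data_context import BaseDataContext

context = BaseDataContext(project_config=data_context_config)
# "my_data_asset" is hypothetical; it should match a data asset found by the
# subdir_reader generator under /path/to/data.
batch_kwargs = context.build_batch_kwargs(
    "my_pandas_datasource", "subdir_reader", "my_data_asset"
)
print(batch_kwargs)
```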
The following example shows a Data Context configuration with an SQLAlchemy datasource and two GCS buckets for metadata stores, using some custom and some default prefixes. Note that you can still substitute environment variables, as in the YAML-based configuration, to keep sensitive credentials out of your code.
default_bucket_name and default_project_name set the default values for all stores that are not specified individually. The resulting DataContextConfig from the following example creates the Expectations store and Data Docs using the my_default_bucket and my_default_project parameters, since their bucket and project are not specified explicitly. The validations store is created using the explicitly specified my_validations_bucket and my_validations_project. Further, custom prefixes are set for the Expectations and validations stores, while Data Docs use the default data_docs prefix.

```python
data_context_config = DataContextConfig(
    datasources={
        "my_sqlalchemy_datasource": DatasourceConfig(
            class_name="SqlAlchemyDatasource",
            credentials={
                "drivername": "custom_drivername",
                "host": "custom_host",
                "port": "custom_port",
                "username": "${USERNAME_FROM_ENVIRONMENT_VARIABLE}",
                "password": "${PASSWORD_FROM_ENVIRONMENT_VARIABLE}",
                "database": "custom_database",
            },
        )
    },
    store_backend_defaults=GCSStoreBackendDefaults(
        default_bucket_name="my_default_bucket",
        default_project_name="my_default_project",
        validations_store_bucket_name="my_validations_bucket",
        validations_store_project_name="my_validations_project",
        validations_store_prefix="my_validations_store_prefix",
        expectations_store_prefix="my_expectations_store_prefix",
    ),
)
```
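To confirm how the defaults were resolved, you can inspect the stores mapping that the store_backend_defaults generated on the config. The exact store names vary by version, so this sketch simply prints whatever is there:

```python
# Each entry maps a store name to its config, including the resolved
# bucket, project, and prefix in the store_backend section.
for store_name, store_config in data_context_config.stores.items():
    print(store_name, store_config.get("store_backend", {}))
```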
The following example sets overrides for many of the parameters available to you when creating a DataContextConfig and a Datasource:
```python
project_config = DataContextConfig(
    config_version=2,
    plugins_directory=None,
    config_variables_file_path=None,
    datasources={
        "my_spark_datasource": {
            "data_asset_type": {
                "class_name": "SparkDFDataset",
                "module_name": "great_expectations.dataset",
            },
            "class_name": "SparkDFDatasource",
            "module_name": "great_expectations.datasource",
            "batch_kwargs_generators": {},
        }
    },
    stores={
        "expectations_S3_store": {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": "my_expectations_store_bucket",
                "prefix": "my_expectations_store_prefix",
            },
        },
        "validations_S3_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": "my_validations_store_bucket",
                "prefix": "my_validations_store_prefix",
            },
        },
        "evaluation_parameter_store": {"class_name": "EvaluationParameterStore"},
    },
    expectations_store_name="expectations_S3_store",
    validations_store_name="validations_S3_store",
    evaluation_parameter_store_name="evaluation_parameter_store",
    data_docs_sites={
        "s3_site": {
            "class_name": "SiteBuilder",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": "my_data_docs_bucket",
                "prefix": "my_optional_data_docs_prefix",
            },
            "site_index_builder": {
                "class_name": "DefaultSiteIndexBuilder",
                "show_cta_footer": True,
            },
        }
    },
    validation_operators={
        "action_list_operator": {
            "class_name": "ActionListValidationOperator",
            "action_list": [
                {
                    "name": "store_validation_result",
                    "action": {"class_name": "StoreValidationResultAction"},
                },
                {
                    "name": "store_evaluation_params",
                    "action": {"class_name": "StoreEvaluationParametersAction"},
                },
                {
                    "name": "update_data_docs",
                    "action": {"class_name": "UpdateDataDocsAction"},
                },
            ],
        }
    },
    anonymous_usage_statistics={"enabled": True},
)
```
2. Pass this DataContextConfig as a project_config to BaseDataContext
```python
context = BaseDataContext(project_config=data_context_config)
```
3. Use this BaseDataContext instance as your DataContext
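You can now use the context exactly as you would one backed by a great_expectations.yml. As an illustrative sketch, assuming the Pandas datasource configured earlier; the suite name, file path, and column name are hypothetical:

```python
# Create a suite, load a batch against it, add an expectation, and validate.
context.create_expectation_suite("my_suite", overwrite_existing=True)
batch = context.get_batch(
    {"path": "/path/to/data/my_file.csv", "datasource": "my_pandas_datasource"},
    "my_suite",
)
batch.expect_column_values_to_not_be_null("my_column")
results = batch.validate()  # validate the batch against its suite
print(results.success)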
If you are using Airflow, you may wish to pass this Data Context to your GreatExpectationsOperator as a parameter; see the Airflow deployment guide for more details.
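As a hedged sketch of that hand-off: the import path and parameter names below follow an early release of the great_expectations_provider package and may differ in your version, and the suite name and batch_kwargs are hypothetical.

```python
from great_expectations_provider.operators.great_expectations import (
    GreatExpectationsOperator,
)

validate_data = GreatExpectationsOperator(
    task_id="validate_data",
    data_context=context,  # the BaseDataContext built above
    expectation_suite_name="my_suite",  # hypothetical suite name
    batch_kwargs={
        "path": "/path/to/data/my_file.csv",
        "datasource": "my_pandas_datasource",
    },
)
```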