Configure Data Docs
Data Docs translate Expectations, Validation Results, and other metadata into human-readable documentation that is saved as static web pages. Automatically compiling your data documentation from your data tests in the form of Data Docs keeps your documentation current. This guide covers how to configure Data Docs.
Prerequisites:
- Python version 3.9 to 3.12.
- An installation of GX Core.
- A preconfigured File Data Context. This guide assumes the variable context contains your Data Context.
Procedure
- Define a configuration dictionary for your new Data Docs site.
GX writes Data Docs sites to the directory specified by the base_directory key of the configuration dictionary. Configuring other keys of the dictionary is not supported, and they may be removed in a future release.
A local or networked filesystem Data Docs site requires the following store_backend information:
  - base_directory: A path to the folder where the static sites should be created. This can be an absolute path, or a path relative to the root folder of the Data Context.
  - class_name: This value must be TupleFilesystemStoreBackend, and is not user-configurable.
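To illustrate how a relative base_directory is interpreted, here is a minimal sketch that uses only the standard library. The helper name resolve_site_directory and the example root folder /projects/demo/gx are hypothetical and chosen for illustration. GX performs its own path resolution internally; this sketch only mirrors the rule described above.

```python
from pathlib import PurePosixPath


def resolve_site_directory(context_root: str, base_directory: str) -> PurePosixPath:
    """Return the folder a Data Docs site would be written to.

    Hypothetical helper: it mirrors the rule stated in this guide
    (absolute paths are used as-is, relative paths are joined to the
    root folder of the Data Context). It does not call GX.
    """
    candidate = PurePosixPath(base_directory)
    if candidate.is_absolute():
        return candidate
    return PurePosixPath(context_root) / candidate


# Assumed example root folder of a File Data Context.
# PurePosixPath keeps the example platform-independent; code that touches
# the real filesystem would normally use pathlib.Path instead.
context_root = "/projects/demo/gx"

relative_site = resolve_site_directory(context_root, "uncommitted/data_docs/local_site/")
absolute_site = resolve_site_directory(context_root, "/srv/shared/data_docs/")

print(relative_site)
print(absolute_site)
```

The relative path ends up under the Data Context root, while the absolute path is left unchanged.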
To define a Data Docs site configuration for a local or networked filesystem environment, update the value of base_directory in the following code and execute it:

    # This is the default path (relative to the root folder of the Data Context); change it as required.
    base_directory = "uncommitted/data_docs/local_site/"
    site_config = {
        "class_name": "SiteBuilder",
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
        "store_backend": {
            "class_name": "TupleFilesystemStoreBackend",
            "base_directory": base_directory,
        },
    }
- Add your configuration to your Data Context.
All Data Docs sites have a unique name within a Data Context. Once your Data Docs site configuration has been defined, add it to the Data Context by updating the value of site_name in the following code to something more descriptive and then executing it:

    site_name = "my_data_docs_site"
    context.add_data_docs_site(site_name=site_name, site_config=site_config)
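Because site names must be unique within a Data Context, one way to manage several sites is to keep their configurations in a dictionary keyed by site name. The sketch below is an illustration rather than a GX feature: the helper filesystem_site_config, the site names, and the directories are all hypothetical, while the dictionary layout is copied from the configuration in this guide. The registration loop is left as a comment because it needs a real Data Context.

```python
def filesystem_site_config(base_directory: str) -> dict:
    """Build a filesystem Data Docs site configuration (hypothetical helper).

    Only base_directory varies; the other keys repeat the values shown
    in this guide.
    """
    return {
        "class_name": "SiteBuilder",
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
        "store_backend": {
            "class_name": "TupleFilesystemStoreBackend",
            "base_directory": base_directory,
        },
    }


# Hypothetical site names mapped to their output folders.
# Dictionary keys cannot repeat, which matches the uniqueness requirement.
site_directories = {
    "team_a_site": "uncommitted/data_docs/team_a/",
    "team_b_site": "uncommitted/data_docs/team_b/",
}

site_configs = {
    name: filesystem_site_config(path) for name, path in site_directories.items()
}

# With a File Data Context in `context`, each site could then be registered:
# for name, config in site_configs.items():
#     context.add_data_docs_site(site_name=name, site_config=config)
```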
- Optional. Build your Data Docs sites manually.
You can manually build a Data Docs site by executing the following code:

    context.build_data_docs(site_names=[site_name])
- Optional. Automate Data Docs site updates with Checkpoint Actions.
You can automate the creation and update of Data Docs sites by including the UpdateDataDocsAction in your Checkpoints. This Action automatically triggers a Data Docs site build whenever the Checkpoint it is included in completes its run() method.

    import great_expectations as gx

    checkpoint_name = "my_checkpoint"
    validation_definition_name = "my_validation_definition"
    # Retrieve an existing Validation Definition by name.
    validation_definition = context.validation_definitions.get(validation_definition_name)
    actions = [
        gx.checkpoint.actions.UpdateDataDocsAction(
            name="update_my_site", site_names=[site_name]
        )
    ]
    checkpoint = context.checkpoints.add(
        gx.Checkpoint(
            name=checkpoint_name,
            validation_definitions=[validation_definition],
            actions=actions,
        )
    )
    # Running the Checkpoint also rebuilds the listed Data Docs sites.
    result = checkpoint.run()
- Optional. View your Data Docs.
Once your Data Docs have been created, you can view them with:

    context.open_data_docs()
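On a machine without a browser, it can help to locate the generated pages directly. The following is a minimal sketch under two assumptions: that the site's landing page is a file named index.html at the top of the site's base_directory, and that the helper site_index_uri (a hypothetical name) is given the already-resolved site folder. The temporary folder stands in for a real Data Docs site so the example runs without GX.

```python
import tempfile
from pathlib import Path
from typing import Optional


def site_index_uri(site_directory: Path) -> Optional[str]:
    """Return a file:// URI for the site's landing page, or None if it is missing.

    Hypothetical helper; assumes the landing page is index.html at the
    top of the site folder.
    """
    index_file = site_directory / "index.html"
    if not index_file.is_file():
        return None
    return index_file.resolve().as_uri()


# Stand-in for a built Data Docs site, created in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    site_directory = Path(tmp) / "local_site"

    # Before anything is written, there is no landing page to report.
    missing = site_index_uri(site_directory)

    site_directory.mkdir()
    (site_directory / "index.html").write_text("<html></html>")
    found = site_index_uri(site_directory)

print(missing)
print(found)
```

The returned URI can be pasted into a browser or passed to a tool of your choice.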
Sample code

    import great_expectations as gx

    context = gx.get_context(mode="file")

    # Define a Data Docs site configuration dictionary.
    # This is the default path (relative to the root folder of the Data Context); change it as required.
    base_directory = "uncommitted/data_docs/local_site/"
    site_config = {
        "class_name": "SiteBuilder",
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
        "store_backend": {
            "class_name": "TupleFilesystemStoreBackend",
            "base_directory": base_directory,
        },
    }

    # Add the Data Docs configuration to the Data Context.
    site_name = "my_data_docs_site"
    context.add_data_docs_site(site_name=site_name, site_config=site_config)

    # Manually build the Data Docs.
    context.build_data_docs(site_names=[site_name])

    # Automate Data Docs updates with a Checkpoint Action.
    checkpoint_name = "my_checkpoint"
    validation_definition_name = "my_validation_definition"
    validation_definition = context.validation_definitions.get(validation_definition_name)
    actions = [
        gx.checkpoint.actions.UpdateDataDocsAction(
            name="update_my_site", site_names=[site_name]
        )
    ]
    checkpoint = context.checkpoints.add(
        gx.Checkpoint(
            name=checkpoint_name,
            validation_definitions=[validation_definition],
            actions=actions,
        )
    )
    result = checkpoint.run()

    # View the Data Docs.
    context.open_data_docs()