Limit Validation Results in Data Docs
As you use Great Expectations (GX), the number of Validation Results that are generated, stored, and rendered for your Data Docs grows. This accumulation can degrade performance and increase the computational resources needed to update or render your Data Docs.
Both the number of validations you run and the volume of your data contribute to Data Doc accumulation and degraded Checkpoint performance. Whenever you run a Checkpoint, the default GX behavior is to re-render all Data Docs within a deployment, regardless of whether they are associated with the run in progress.
Use one of the following options to reduce the likelihood that Data Doc accumulation will degrade GX performance.
The UpdateDataDocs Checkpoint Action
The UpdateDataDocs Checkpoint Action renders new Validations only for the Checkpoints containing the Action. GX recommends using the UpdateDataDocs Checkpoint Action in the following circumstances:
- You're experiencing a performance degradation that is caused by too many Validations.
- The Validations running in your Checkpoints are smaller than the other Validations in your environment in terms of the Expectation or data volumes.
An important consideration is that the UpdateDataDocs Checkpoint Action is limited to the active Checkpoint, and changes made to GX or to your local environment might not be captured in other Data Docs. Also, the UpdateDataDocs Checkpoint Action might be unsuitable if you're displaying Data Docs live.
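As a rough illustration of where the Action sits, the following Python sketch models a Checkpoint configuration as a plain dictionary and checks whether it includes the Action. The Checkpoint name my_checkpoint and the helper has_update_data_docs are hypothetical, and the action_list layout with an UpdateDataDocsAction class name is an assumption based on the dictionary-style Checkpoint configuration; confirm the exact keys against the GX version you use.

```python
# A Checkpoint configuration modeled as a plain dictionary.
# Assumption: actions are listed under "action_list", each with a "name"
# and an "action" mapping that carries a "class_name".
checkpoint_config = {
    "name": "my_checkpoint",  # hypothetical Checkpoint name
    "action_list": [
        {
            "name": "update_data_docs",
            "action": {"class_name": "UpdateDataDocsAction"},
        },
    ],
}


def has_update_data_docs(config: dict) -> bool:
    """Return True if any action in the config uses UpdateDataDocsAction."""
    return any(
        entry.get("action", {}).get("class_name") == "UpdateDataDocsAction"
        for entry in config.get("action_list", [])
    )


print(has_update_data_docs(checkpoint_config))
```

A check like this can help you audit which Checkpoints update Data Docs before you decide whether this option fits your deployment.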
The validation_results_limit option
When the UpdateDataDocs Checkpoint Action is not suitable, you can use the validation_results_limit option to specify the number of historical Data Docs that GX retains.
The validation_results_limit is an option for the site_index_builder parameter that is part of the larger data_docs_sites settings in your great_expectations.yml file.
The following example limits the Validation Results on a local Data Docs site to the five most recent:
data_docs_sites:
  local_site:
    class_name: SiteBuilder
    show_how_to_buttons: true
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
      validation_results_limit: 5
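
If you manage this configuration from a script, the same setting can be applied to an in-memory copy of the settings. The following is a minimal sketch that assumes the data_docs_sites settings have already been loaded into a Python dictionary (for example, with a YAML parser). The helper with_results_limit is hypothetical and is not part of GX.

```python
import copy


def with_results_limit(config: dict, site_name: str, limit: int) -> dict:
    """Return a copy of config with validation_results_limit set for one site.

    Hypothetical helper: it only edits a dictionary that mirrors the
    data_docs_sites section of great_expectations.yml.
    """
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    site = updated["data_docs_sites"][site_name]
    builder = site.setdefault(
        "site_index_builder", {"class_name": "DefaultSiteIndexBuilder"}
    )
    builder["validation_results_limit"] = limit
    return updated


config = {
    "data_docs_sites": {
        "local_site": {
            "class_name": "SiteBuilder",
            "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
        }
    }
}
limited = with_results_limit(config, "local_site", 5)
print(limited["data_docs_sites"]["local_site"]["site_index_builder"])
```

The helper returns a new dictionary instead of mutating its input, so the original settings remain available for comparison before you write anything back to disk.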
When you use the validation_results_limit option, Validation Results from previous Checkpoints are rendered and indexed only up to the defined limit. If your GX performance issue is due to a historical accumulation of Data Docs, using validation_results_limit can help improve performance without sacrificing Data Doc creation in your environment.
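To make the effect of the limit concrete, the following Python sketch keeps only the most recent entries from a list of results. It is a conceptual illustration and not GX's implementation: the function most_recent_results, the tuple layout, and the sample run names are all hypothetical.

```python
from datetime import datetime


def most_recent_results(results, limit):
    """Return the newest results first, truncated to `limit` entries.

    `results` is a list of (run_time, identifier) tuples. A limit of None
    keeps every result.
    """
    ordered = sorted(results, key=lambda item: item[0], reverse=True)
    return ordered if limit is None else ordered[:limit]


# Seven hypothetical daily runs; only the five newest survive the limit.
runs = [(datetime(2024, 1, day), f"run_{day}") for day in range(1, 8)]
kept = most_recent_results(runs, limit=5)
print([name for _, name in kept])
```

With a limit of 5, the two oldest runs are left out of the result, which corresponds to older Validation Results no longer being rendered or indexed.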
The validation_results_limit option doesn't limit the number of HTML documents contained in your Data Docs site. If HTML documents other than Validation Results are contributing to performance degradation, the validation_results_limit option won't help.