Version: 0.18.21

Limit Validation Results in Data Docs

As you use Great Expectations (GX), the number of Validation Results that are generated, stored, and rendered for your Data Docs grows. This accumulation can degrade performance and increase the computational resources needed to update or render your Data Docs.

Both the number of validations you run and the volume of your data contribute to Data Doc accumulation and degraded Checkpoint performance. By default, whenever you run a Checkpoint, GX re-renders all Data Docs within a deployment, regardless of whether they are associated with the run in progress.

Use one of the following options to reduce the likelihood that Data Doc accumulation will degrade GX performance.

The UpdateDataDocs Checkpoint Action

The UpdateDataDocs Checkpoint Action renders new Validations only for the Checkpoints containing the Action. GX recommends using the UpdateDataDocs Checkpoint Action in the following circumstances:

  • You're experiencing a performance degradation that is caused by too many Validations.

  • The Validations running in your Checkpoints are smaller than the other Validations in your environment in terms of the number of Expectations or the volume of data.

Keep in mind that the UpdateDataDocs Checkpoint Action is limited to the active Checkpoint, so changes made to GX or to your local environment might not be reflected in other Data Docs. The Action might also be unsuitable if you're displaying Data Docs live.
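As a rough illustration, a Checkpoint's action list that includes this Action could be written as a Python structure like the one below. This is a minimal sketch, not a definitive configuration: it assumes the GX 0.18 class names StoreValidationResultAction and UpdateDataDocsAction, a Data Docs site named local_site, and the optional site_names key. The helper has_update_data_docs is hypothetical and only exists here to check the structure.

```python
# Illustrative action list for a Checkpoint.
# Assumption: GX 0.18 action class names; adjust to match your installed version.
action_list = [
    {
        "name": "store_validation_result",
        "action": {"class_name": "StoreValidationResultAction"},
    },
    {
        "name": "update_data_docs",
        "action": {
            "class_name": "UpdateDataDocsAction",
            # Assumption: restrict the update to a single site named "local_site".
            "site_names": ["local_site"],
        },
    },
]


def has_update_data_docs(actions):
    """Hypothetical helper: True if any entry is an UpdateDataDocsAction."""
    return any(
        entry["action"]["class_name"] == "UpdateDataDocsAction"
        for entry in actions
    )


print(has_update_data_docs(action_list))
```

A list like this would typically be supplied as the action_list of a Checkpoint when you create or update it. Check the Checkpoint reference for your GX version before relying on the exact keys.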

The validation_results_limit option

When the UpdateDataDocs Checkpoint Action is not suitable, you can use the validation_results_limit option to limit the number of historical Validation Results that GX renders and indexes in your Data Docs.

The validation_results_limit is an option for the site_index_builder parameter that is part of the larger data_docs_sites settings in your great_expectations.yml file.

The following example limits the Validation Results on a local Data Docs site to the five most recent:

YAML
data_docs_sites:
  local_site:
    class_name: SiteBuilder
    show_how_to_buttons: true
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
      validation_results_limit: 5

When you use the validation_results_limit option, Validation Results from previous Checkpoints are rendered and indexed only up to the defined limit. If your GX performance issue is due to a historical accumulation of Data Docs, using validation_results_limit can help improve performance without sacrificing Data Doc creation in your environment.
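If you keep the data_docs_sites settings in a Python dictionary (for example, after loading great_expectations.yml with a YAML parser), the same limit can be applied to every site in one pass. The following is a minimal sketch under that assumption. The function set_validation_results_limit is a hypothetical helper, not part of the GX API, and it only edits the dictionary; writing the result back to your project configuration is left out.

```python
import copy


def set_validation_results_limit(data_docs_sites, limit):
    """Hypothetical helper: return a copy of the data_docs_sites config
    with validation_results_limit set on every site's site_index_builder."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    updated = copy.deepcopy(data_docs_sites)  # leave the caller's dict untouched
    for site in updated.values():
        # Create the index builder entry if the site does not define one.
        index_builder = site.setdefault(
            "site_index_builder", {"class_name": "DefaultSiteIndexBuilder"}
        )
        index_builder["validation_results_limit"] = limit
    return updated


# Dictionary form of the local_site settings shown in the YAML example.
sites = {
    "local_site": {
        "class_name": "SiteBuilder",
        "show_how_to_buttons": True,
        "store_backend": {
            "class_name": "TupleFilesystemStoreBackend",
            "base_directory": "uncommitted/data_docs/local_site/",
        },
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
    }
}

limited = set_validation_results_limit(sites, 5)
print(limited["local_site"]["site_index_builder"])
```

Returning a copy rather than mutating the input keeps the original settings available for comparison before you commit the change.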

The validation_results_limit option doesn’t limit the number of HTML documents contained in your Data Docs site. If HTML documents other than Validation Results are contributing to performance degradation, the validation_results_limit option won't help.