Data Docs
Data Docs translate Expectations, Validation Results, and other metadata into human-readable documentation. Automatically compiling your data documentation from your data tests in the form of Data Docs keeps your documentation current.
Relationship to other objects
Data Docs can be used to view Expectation Suites and Validation Results. With a customized Renderer, you can extend what they display and how. You can issue a command to update your Data Docs from your Data Context. Alternatively, you can include the UpdateDataDocsAction Action in a Checkpoint's action_list to trigger an update of your Data Docs with the Validation Results generated when that Checkpoint runs.
Use cases
You can configure multiple Data Docs sites while setting up your Great Expectations project. This allows you to tailor the information that is displayed by Data Docs as well as how they are hosted. To host and share your Data Docs, see Host and share Data Docs.
You can view your saved Expectation Suites in Data Docs.
Saved Validation Results will be displayed in any Data Docs site that is configured to show them. If you build your Data Docs from the Data Context, the process will render Data Docs for all of your Validation Results. Alternatively, you can use the UpdateDataDocsAction Action in a Checkpoint's action_list to update your Data Docs with just the Validation Results generated by that Checkpoint.
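As a rough illustration of the Checkpoint approach, the sketch below writes a Checkpoint configuration as a plain Python dictionary that mirrors the YAML layout. The Checkpoint name, the site name local_site, and the helper updates_data_docs are assumptions made for this example, and the exact keys accepted may differ between Great Expectations versions.

```python
# Sketch of a Checkpoint configuration expressed as a plain Python dict.
# "my_checkpoint" and "local_site" are hypothetical names for illustration.
checkpoint_config = {
    "name": "my_checkpoint",
    "action_list": [
        {
            # Persist the Validation Result so Data Docs can render it.
            "name": "store_validation_result",
            "action": {"class_name": "StoreValidationResultAction"},
        },
        {
            # Rebuild Data Docs with the results of this Checkpoint run.
            "name": "update_data_docs",
            "action": {
                "class_name": "UpdateDataDocsAction",
                # Assumed: restrict the update to one configured site.
                "site_names": ["local_site"],
            },
        },
    ],
}


def updates_data_docs(config: dict) -> bool:
    """Return True if any action in the action_list is UpdateDataDocsAction."""
    return any(
        entry.get("action", {}).get("class_name") == "UpdateDataDocsAction"
        for entry in config.get("action_list", [])
    )
```

A check such as updates_data_docs(checkpoint_config) can be useful when auditing several Checkpoints to see which ones refresh Data Docs on each run.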
Versatility
Multiple sites can be configured inside a project, each suitable for a particular data documentation use case.
- Visualize all Great Expectations artifacts from the local repository of a project as HTML: Expectation Suites, Validation Results and profiling results.
- Maintain a "shared source of truth" for a team working on a data project. Such documentation renders all the artifacts committed in the source control system (Expectation Suites and profiling results) and a continuously updating data quality report, built from a chronological list of validations by run id.
- Share a spec of a dataset with a client or a partner. This is similar to API documentation in software development. This documentation would include profiling results of the dataset to give the reader a quick way to grasp what the data looks like, and one or more Expectation Suites that encode what is expected from the data to be considered valid.
Access
Data Docs are rendered as HTML files. As such, you can open them with any browser.
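If you prefer to open a locally rendered site from a script, a minimal sketch follows. It assumes the site is written to great_expectations/uncommitted/data_docs/local_site/ under the project root, which is a common default but depends on your configuration and version. The function names are hypothetical helpers, not part of the Great Expectations API.

```python
import webbrowser
from pathlib import Path


def local_site_index(project_root: str, site_name: str = "local_site") -> Path:
    """Build the assumed path to a site's index.html under the project root."""
    return (
        Path(project_root)
        / "great_expectations"  # assumed project folder name
        / "uncommitted"
        / "data_docs"
        / site_name
        / "index.html"
    )


def open_local_site(project_root: str, site_name: str = "local_site") -> bool:
    """Open the rendered index page in the default browser."""
    index = local_site_index(project_root, site_name)
    if not index.is_file():
        # Data Docs have probably not been built yet, or live elsewhere.
        raise FileNotFoundError(f"No rendered Data Docs found at {index}")
    return webbrowser.open(index.resolve().as_uri())
```

Calling open_local_site(".") from the project root would open the page if it exists and raise FileNotFoundError otherwise.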
Create
If your Data Docs have not yet been rendered, you can create them from your Data Context.
From the root folder of your project (where you initialized your Data Context), you can build your Data Docs with the CLI command:
great_expectations docs build
Alternatively, you can use your Data Context to build your Data Docs in Python:
import great_expectations as gx
context = gx.get_context()
context.build_data_docs()
Configure
Data Docs sites are configured under the data_docs_sites key in your deployment's great_expectations.yml file. Users can specify:
- which Data Sources to document (by default, all)
- whether to include Expectations, validations and profiling results sections
- where the Expectations and validations should be read from (filesystem, S3, Azure, or GCS)
- where the HTML files should be written (filesystem, S3, Azure, or GCS)
- which Renderer and view class should be used to render each section
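To make the options above more concrete, here is a sketch of a data_docs_sites configuration with two sites, written as a Python dictionary that mirrors the YAML structure. The site name team_site and the bucket name my-data-docs-bucket are hypothetical, the helper site_backends is only for illustration, and the exact keys should be checked against the version of Great Expectations you are running.

```python
# Sketch of the data_docs_sites section, mirroring great_expectations.yml.
data_docs_sites = {
    "local_site": {
        "class_name": "SiteBuilder",
        # Where the HTML files are written: local filesystem.
        "store_backend": {
            "class_name": "TupleFilesystemStoreBackend",
            "base_directory": "uncommitted/data_docs/local_site/",
        },
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
    },
    "team_site": {  # hypothetical second site shared with a team
        "class_name": "SiteBuilder",
        # Where the HTML files are written: an S3 bucket (name is assumed).
        "store_backend": {
            "class_name": "TupleS3StoreBackend",
            "bucket": "my-data-docs-bucket",
        },
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
    },
}


def site_backends(sites: dict) -> dict:
    """Map each site name to the store backend class that hosts its HTML."""
    return {
        name: site["store_backend"]["class_name"]
        for name, site in sites.items()
    }
```

Keeping one site on the local filesystem and another on shared storage is one way to separate day-to-day inspection from documentation meant for a wider audience.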
For more information, see Host and share Data Docs.