Data Docs compile Great Expectations objects such as Expectations and Validations into structured, formatted documents. In these documents, they attempt to capture the key characteristics of a dataset.
One example of Data Docs is HTML documentation, which takes expectation suites and validation results and produces clear, functional, and self-updating documentation of expected and observed data characteristics. Together with profiling, it can help to rapidly create a clearer picture of your data, and keep your entire team on the same page as data evolves.
For example, the default BasicDatasetProfiler in GE produces validation_results that compile to a page for each table or DataFrame. Each page includes an overview section, followed by detailed statistics for each column.
The Great Expectations DataContext uses a configurable “data documentation site” to define which artifacts to compile and how to render them as documentation. Multiple sites can be configured inside a project, each suitable for a particular data documentation use case.
For example, we have identified three common use cases for using documentation in a data project. They are to:
1. Visualize all Great Expectations artifacts from the local repository of a project as HTML: expectation suites, validation results and profiling results.
2. Maintain a “shared source of truth” for a team working on a data project. Such documentation renders all the artifacts committed in the source control system (expectation suites and profiling results) and a continuously updating data quality report, built from a chronological list of validations by run id.
3. Share a spec of a dataset with a client or a partner. This is similar to API documentation in software development. This documentation would include profiling results of the dataset to give the reader a quick way to grasp what the data looks like, and one or more expectation suites that encode what is expected from the data to be considered valid.
To support these (and possibly other) use cases, Great Expectations relies on the concept of a “data documentation site”: each use case maps to its own site with its own configuration.
The behavior of a site is controlled by configuration in the DataContext’s great_expectations.yml file.
Users can specify:
- which datasources to document (by default, all)
- whether to include expectations, validations and profiling results sections
- where the expectations and validations should be read from (filesystem, S3, or GCS)
- where the HTML files should be written (filesystem, S3, or GCS)
- which renderer and view class should be used to render each section
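To make these options concrete, the following is a minimal sketch that models a site as a plain Python dictionary and checks it against the choices listed above. It does not use the Great Expectations API, and every key name, class name, and path in it (for example `datasources`, `sections`, `HypotheticalSuiteRenderer`, `example-bucket`) is an assumption made for illustration. The actual schema of great_expectations.yml differs between releases, so consult the documentation for your version before writing a real configuration.

```python
from typing import Any, Dict, List

# Storage backends named in the text above.
ALLOWED_BACKENDS = {"filesystem", "s3", "gcs"}
# Sections a site may include.
SECTION_NAMES = ("expectations", "validations", "profiling")


def make_site_config() -> Dict[str, Any]:
    """Return a hypothetical site description (NOT the real YAML schema)."""
    return {
        # Which datasources to document; "*" stands for "all" (the default).
        "datasources": "*",
        "sections": {
            "expectations": {
                "enabled": True,
                "source": {"backend": "filesystem", "location": "expectations/"},
                "renderer": "HypotheticalSuiteRenderer",  # illustrative name
                "view": "HypotheticalPageView",  # illustrative name
            },
            "validations": {
                "enabled": True,
                "source": {"backend": "s3", "location": "example-bucket/validations/"},
                "renderer": "HypotheticalValidationRenderer",  # illustrative name
                "view": "HypotheticalPageView",  # illustrative name
            },
            # Profiling results are switched off for this site.
            "profiling": {"enabled": False},
        },
        # Where the rendered HTML files are written.
        "output": {"backend": "filesystem", "location": "docs/local_site/"},
    }


def validate_site_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    output_backend = config.get("output", {}).get("backend")
    if output_backend not in ALLOWED_BACKENDS:
        problems.append(f"output: unsupported backend {output_backend!r}")
    for name, section in config.get("sections", {}).items():
        if name not in SECTION_NAMES:
            problems.append(f"{name}: unknown section")
            continue
        if not section.get("enabled", True):
            continue  # disabled sections need no source, renderer, or view
        backend = section.get("source", {}).get("backend")
        if backend not in ALLOWED_BACKENDS:
            problems.append(f"{name}: unsupported source backend {backend!r}")
        for key in ("renderer", "view"):
            if not section.get(key):
                problems.append(f"{name}: missing {key} class")
    return problems


def enabled_sections(config: Dict[str, Any]) -> List[str]:
    """List the sections this site would render, in a fixed order."""
    sections = config.get("sections", {})
    return [n for n in SECTION_NAMES if sections.get(n, {}).get("enabled", False)]
```

Calling `validate_site_config(make_site_config())` returns an empty list for the sample above. A second site for a different use case, such as a client-facing spec with only expectations and profiling enabled, would be another dictionary of the same shape.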