Data Docs Reference

Data Docs make it simple to visualize data quality in your project. They cover Expectations, Validations, and Profiles, and are built for all Datasources from JSON artifacts in the local repo, including validations and profiles from the uncommitted directory.

Users have full control over configuring Data Documentation for their project - they can modify the pre-configured site (or remove it altogether) and add new sites with a configuration that meets the project’s needs. The easiest way to add a new site to the configuration is to copy the “local_site” configuration block in great_expectations.yml, give the copy a new name and modify the details as needed.
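
As a rough illustration of the copy-and-rename approach, the sketch below treats the data_docs_sites section as a plain Python dictionary, as it might look once the YAML has been parsed. The helper add_site_from_template, the site name team_site, and its directory are all hypothetical; none of them is part of the Great Expectations API.

```python
import copy


def add_site_from_template(sites, template_name, new_name, base_directory):
    """Return a new sites dict containing a renamed copy of an existing site."""
    if new_name in sites:
        raise ValueError("site {!r} already exists".format(new_name))
    updated = copy.deepcopy(sites)
    new_site = copy.deepcopy(sites[template_name])
    # Point the copy at its own output directory so the sites do not collide.
    new_site["store_backend"]["base_directory"] = base_directory
    updated[new_name] = new_site
    return updated


# The default configuration, mirroring great_expectations.yml.
data_docs_sites = {
    "local_site": {
        "class_name": "SiteBuilder",
        "store_backend": {
            "class_name": "TupleFilesystemStoreBackend",
            "base_directory": "uncommitted/data_docs/local_site/",
        },
    }
}

sites = add_site_from_template(
    data_docs_sites,
    template_name="local_site",
    new_name="team_site",  # hypothetical site name
    base_directory="uncommitted/data_docs/team_site/",  # hypothetical path
)
print(sorted(sites))
```

The deep copy matters here: a shallow copy would share the nested store_backend dictionary, so editing the new site would silently change local_site as well.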

Data Docs Site Configuration

The default Data Docs site configuration looks like this:

data_docs_sites:
  local_site:
    class_name: SiteBuilder
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/

Here is an example of a site configuration from great_expectations.yml with defaults defined explicitly:

data_docs_sites:
  local_site: # site name
    module_name: great_expectations.render.renderer.site_builder
    class_name: SiteBuilder
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
    site_section_builders:
      expectations: # if not present, expectation suites are not rendered
        class_name: DefaultSiteSectionBuilder
        source_store_name: expectations_store
        renderer:
          module_name: great_expectations.render.renderer
          class_name: ExpectationSuitePageRenderer

      validations: # if not present, validation results are not rendered
        class_name: DefaultSiteSectionBuilder
        source_store_name: validations_store
        run_id_filter:
          ne: profiling # exclude validations with run id "profiling" - reserved for profiling results
        renderer:
          module_name: great_expectations.render.renderer
          class_name: ValidationResultsPageRenderer

      profiling: # if not present, profiling results are not rendered
        class_name: DefaultSiteSectionBuilder
        source_store_name: validations_store
        run_id_filter:
          eq: profiling
        renderer:
          module_name: great_expectations.render.renderer
          class_name: ProfilingResultsPageRenderer

validations_store in the example above specifies the name of a store configured in the stores section. Validation and profiling results from that store will be included in the documentation. The optional run_id_filter attribute lets you include (eq, for an exact match) or exclude (ne) validation results with a particular run id.
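
The eq/ne semantics described above can be sketched in a few lines of Python. This is only an illustration of the filtering rule, not the library's implementation; the function passes_run_id_filter and the sample run ids are made up for the example.

```python
def passes_run_id_filter(run_id, run_id_filter=None):
    """Return True if a result with this run id should be rendered."""
    if not run_id_filter:
        return True  # no filter configured: keep everything
    if "eq" in run_id_filter:
        return run_id == run_id_filter["eq"]  # include exact matches only
    if "ne" in run_id_filter:
        return run_id != run_id_filter["ne"]  # exclude exact matches
    return True


run_ids = ["profiling", "20200813T120000", "nightly"]  # made-up run ids

# The two filters from the configuration above split one store into two sections.
validations = [r for r in run_ids if passes_run_id_filter(r, {"ne": "profiling"})]
profiling = [r for r in run_ids if passes_run_id_filter(r, {"eq": "profiling"})]
print(validations, profiling)
```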

Limiting Validation Results

If you would like to limit rendered Validation Results to the n most-recent, you may do so by setting the validation_results_limit key in your Data Docs configuration:

data_docs_sites:
  local_site:
    class_name: SiteBuilder
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
      show_cta_footer: true
      validation_results_limit: 5
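
To make the effect of validation_results_limit concrete, here is a minimal sketch of "keep the n most recent". It assumes each result carries a run time to sort on; the most_recent helper and the sample records are hypothetical and only illustrate the idea.

```python
from datetime import datetime


def most_recent(results, limit=None):
    """Keep the `limit` newest results, newest first; None keeps all of them."""
    ordered = sorted(results, key=lambda r: r["run_time"], reverse=True)
    return ordered if limit is None else ordered[:limit]


# Made-up validation results: one per day for the first week of August 2020.
results = [
    {"suite": "default", "run_time": datetime(2020, 8, day)}
    for day in range(1, 8)
]

shown = most_recent(results, limit=5)
print([r["run_time"].day for r in shown])
```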

Automatically Publishing Data Docs

You can publish Data Docs sites directly to a shared location, such as a static site hosted in S3, and keep them continuously updated, simply by changing the store_backend in the site configuration. If we point the store backend in the example above at an S3 bucket of our choosing, the SiteBuilder will automatically save the resulting site to that bucket.

store_backend:
  class_name: TupleS3StoreBackend
  bucket: data-docs.my_org.org
  prefix:

See the tutorial on publishing data docs to S3 for more information.

More advanced configuration

It is possible to extend renderers and views and customize the particular class used to render any of the objects in your documentation. In this more advanced configuration, a “CustomTableContentBlockRenderer” is used only for the validations renderer, and no profiling results are rendered at all.

data_docs_sites:
  # Data Docs make it simple to visualize data quality in your project. These
  # include Expectations, Validations & Profiles. They are built for all
  # Datasources from JSON artifacts in the local repo including validations &
  # profiles from the uncommitted directory. Read more at
  # https://docs.greatexpectations.io/en/latest/features/data_docs.html
  local_site:
    class_name: SiteBuilder
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_section_builders:
      expectations:
        class_name: DefaultSiteSectionBuilder
        source_store_name: expectations_store
        renderer:
          module_name: great_expectations.render.renderer
          class_name: ExpectationSuitePageRenderer

      validations:
        class_name: DefaultSiteSectionBuilder
        source_store_name: validations_store
        run_id_filter:
          ne: profiling
        renderer:
          module_name: great_expectations.render.renderer
          class_name: ValidationResultsPageRenderer
          column_section_renderer:
            class_name: ValidationResultsColumnSectionRenderer
            table_renderer:
              module_name: custom_renderers.custom_table_content_block
              class_name: CustomTableContentBlockRenderer

To support that custom renderer, we need to ensure the implementation is available in our plugins/ directory. We can use a subdirectory and standard Python submodule notation, but we must include an __init__.py file in our custom_renderers package.

plugins/
├── custom_renderers
│   ├── __init__.py
│   └── custom_table_content_block.py
└── additional_ge_plugin.py
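
The sketch below shows how a module_name / class_name pair like the one in the configuration above could be resolved against this layout. It rebuilds the package in a temporary directory and assumes the plugins directory is on the Python import path when the configuration is loaded; the placeholder class body is invented for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build the custom_renderers package inside a throwaway plugins directory.
plugins_dir = Path(tempfile.mkdtemp()) / "plugins"
package_dir = plugins_dir / "custom_renderers"
package_dir.mkdir(parents=True)
(package_dir / "__init__.py").write_text("")
(package_dir / "custom_table_content_block.py").write_text(
    "class CustomTableContentBlockRenderer(object):\n"
    "    pass  # placeholder standing in for a real renderer\n"
)

# Assumption: the plugins directory is importable; here we add it by hand.
sys.path.insert(0, str(plugins_dir))
importlib.invalidate_caches()

# Resolve the dotted module name and class name used in the configuration.
module = importlib.import_module("custom_renderers.custom_table_content_block")
renderer_class = getattr(module, "CustomTableContentBlockRenderer")
print(renderer_class.__name__)
```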

Building Data Docs

Using the CLI

The great_expectations CLI can build comprehensive Data Docs from expectation suites available to the configured context and validations available in the great_expectations/uncommitted directory.

great_expectations docs build

When called without additional arguments, this command renders all the Data Docs sites specified in the great_expectations.yml configuration file into HTML and opens them in a web browser.

The command prints the location of the index.html file for each site.

To keep the command from opening a web browser, use the --no-view option.

To render just one site, use the --site-name SITE_NAME option.
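
If you need to trigger the build from a script, one option is to shell out to the CLI. The following is a minimal sketch using only the options described above; the helper names docs_build_command and build_docs are made up for this example, and build_docs assumes the great_expectations executable is on your PATH and is run from your project directory.

```python
import subprocess


def docs_build_command(site_name=None, view=True):
    """Assemble the argument list for `great_expectations docs build`."""
    command = ["great_expectations", "docs", "build"]
    if site_name is not None:
        command += ["--site-name", site_name]
    if not view:
        command.append("--no-view")
    return command


def build_docs(site_name=None, view=True):
    """Run the build; raises CalledProcessError if the CLI exits non-zero."""
    return subprocess.run(docs_build_command(site_name, view), check=True)


# Only print the command here, so the sketch runs without a project.
print(" ".join(docs_build_command(site_name="local_site", view=False)))
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems with site names.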

Run the docs build command in the following situations:

  • when you want to fully rebuild a Data Docs site

  • after a new expectation suite is added or an existing one is edited

  • after new data is profiled (only if you declined the prompt to build data docs when running the profiling command)

When a new validation result is generated by running a Validation Operator, the Data Docs sites add this result automatically if the operator has the UpdateDataDocsAction action configured (see Actions).

Using the raw API

The underlying Python API for rendering documentation is still new and evolving. Use the following snippet as a guide for how to profile a single batch of data and build documentation from the resulting validation_result.

import os
import great_expectations as ge

from great_expectations.profile.basic_dataset_profiler import BasicDatasetProfiler
from great_expectations.render.renderer import ProfilingResultsPageRenderer, ExpectationSuitePageRenderer
from great_expectations.data_context.util import safe_mmkdir
from great_expectations.render.view import DefaultJinjaPageView

profiling_html_filepath = '/path/into/which/to/save/results.html'

# obtain the DataContext object
context = ge.data_context.DataContext()

# load a batch to profile
context.create_expectation_suite('default')
batch = context.get_batch(
    batch_kwargs=context.build_batch_kwargs("my_datasource", "my_batch_kwargs_generator", "my_asset"),
    expectation_suite_name='default',
)

# run the profiler on the batch - this returns an expectation suite and validation results for this suite
expectation_suite, validation_result = BasicDatasetProfiler().profile(batch)

# use a renderer to produce a document model from the validation results
document_model = ProfilingResultsPageRenderer().render(validation_result)

# use a view to render the document model (produced by the renderer) into a HTML document
safe_mmkdir(os.path.dirname(profiling_html_filepath))
with open(profiling_html_filepath, 'w') as writer:
    writer.write(DefaultJinjaPageView().render(document_model))

Customizing Data Docs

Introduction

Data Docs uses the Jinja template engine to generate HTML pages. The built-in Jinja templates used to compile Data Docs pages are implemented in the great_expectations.render.view module and are tied to View classes. Views determine how page content is displayed. The content that Views specify and consume is generated by Renderer classes, which are implemented in the great_expectations.render.renderer module. Renderers take Great Expectations objects as input and return typed dictionaries; Views take these dictionaries as input and output rendered HTML.
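
The division of labor between Renderers and Views can be illustrated with a toy pipeline. Everything in this sketch is a stand-in: ToySuiteRenderer and ToyPageView are invented names, the input is a made-up dictionary rather than a real Great Expectations object, and string.Template takes the place of Jinja so that the example has no dependencies.

```python
from html import escape
from string import Template


class ToySuiteRenderer(object):
    """Turns a (made-up) expectation suite dict into a content dictionary."""

    @classmethod
    def render(cls, suite):
        return {
            "page_title": suite["name"],
            "bullets": [e["expectation_type"] for e in suite["expectations"]],
        }


class ToyPageView(object):
    """Turns a content dictionary into HTML; stands in for a Jinja template."""

    _page = Template("<h1>$title</h1><ul>$items</ul>")

    @classmethod
    def render(cls, content):
        items = "".join("<li>{}</li>".format(escape(b)) for b in content["bullets"])
        return cls._page.substitute(title=escape(content["page_title"]), items=items)


suite = {
    "name": "default",
    "expectations": [
        {"expectation_type": "expect_column_to_exist"},
        {"expectation_type": "expect_column_values_to_not_be_null"},
    ],
}

# Renderer output feeds straight into the View.
html = ToyPageView.render(ToySuiteRenderer.render(suite))
print(html)
```

Because the View only sees the dictionary, either half can be swapped out independently: a new Renderer can reuse an existing View as long as it produces the structure that View expects.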

Built-In Views and Renderers

Out of the box, Data Docs supports two top-level Views (i.e. pages), great_expectations.render.view.DefaultJinjaIndexPageView, for a site index page, and great_expectations.render.view.DefaultJinjaPageView for all other pages. Pages are broken into sections - great_expectations.render.view.DefaultJinjaSectionView - which are composed of UI components - great_expectations.render.view.DefaultJinjaComponentView. Each of these Views references a single base Jinja template, which can incorporate any number of other templates through inheritance.

Data Docs comes with the following built-in site page Renderers:

  • great_expectations.render.renderer.SiteIndexPageRenderer (index page)

  • great_expectations.render.renderer.ProfilingResultsPageRenderer (Profiling Results pages)

  • great_expectations.render.renderer.ExpectationSuitePageRenderer (Expectation Suite pages)

  • great_expectations.render.renderer.ValidationResultsPageRenderer (Validation Results pages)

Analogous to the base View templates referenced above, these Renderers can be thought of as base Renderers for the primary Data Docs pages, and may call on many other ancillary Renderers.

It is possible to extend Renderers and Views and customize the particular class used to render any of the objects in your documentation.

Other Tools

In addition to Jinja, Data Docs draws on the following libraries to compile HTML documentation, which you can use to further customize HTML documentation:

  • Bootstrap

  • Font Awesome

  • jQuery

  • Altair

Getting Started

Making Minor Adjustments

Many of the HTML elements in the default Data Docs pages have pre-configured classes that you may use to make minor adjustments using your own custom CSS. By default, when you run great_expectations init, Great Expectations creates a scaffold within the plugins directory for customizing Data Docs. Within this scaffold is a file called data_docs_custom_styles.css - this CSS file contains all the pre-configured classes you may use to customize the look and feel of the default Data Docs pages. All custom CSS, applied to these pre-configured classes or any other HTML elements, must be placed in this file.

Scaffolded directory tree:

plugins
└── custom_data_docs
    ├── renderers
    ├── styles
    │   └── data_docs_custom_styles.css
    └── views

Using Custom Views and Renderers

Suppose you start a new Great Expectations project by running great_expectations init and compile your first Data Docs site. After looking over the local site, you decide you want to implement the following changes:

  1. A completely new Expectation Suite page, requiring a new View and Renderer

  2. A smaller modification to the default Validation page, swapping out a child renderer for a custom version

  3. Remove Profiling Results pages from the documentation

To make these changes, you must first implement the custom View and Renderers and ensure they are available in the plugins directory specified in your project configuration (plugins/ by default). Note that you can use a subdirectory and standard Python submodule notation, but you must include an __init__.py file in your custom package. By default, when you run great_expectations init, Great Expectations creates placeholder directories for your custom views, renderers, and CSS stylesheets within the plugins directory. If you wish, you may save your custom views and renderers in an alternate location; however, any CSS stylesheets must be saved to plugins/custom_data_docs/styles.

Scaffolded directory tree:

plugins
└── custom_data_docs
    ├── renderers
    ├── styles
    │   └── data_docs_custom_styles.css
    └── views

When you are done with your implementations, your plugins/ directory has the following structure:

plugins
└── custom_data_docs
    ├── renderers
    │   ├── __init__.py
    │   ├── custom_table_renderer.py
    │   └── custom_expectation_suite_page_renderer.py
    ├── styles
    │   └── data_docs_custom_styles.css
    └── views
        ├── __init__.py
        └── custom_expectation_suite_view.py

For Data Docs to use your custom Views and Renderers when compiling your local Data Docs site, you must specify where to use them in the data_docs_sites section of your project configuration.

Before modifying your project configuration, the relevant section looks like this:

data_docs_sites:
  local_site:
    class_name: SiteBuilder
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/

This is what it looks like after your changes are added:

data_docs_sites:
  local_site:
    class_name: SiteBuilder
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_section_builders:
      expectations:
        class_name: DefaultSiteSectionBuilder
        source_store_name: expectations_store
        renderer:
          module_name: custom_data_docs.renderers.custom_expectation_suite_page_renderer
          class_name: CustomExpectationSuitePageRenderer
        view:
          module_name: custom_data_docs.views.custom_expectation_suite_view
          class_name: CustomExpectationSuiteView
      validations:
        class_name: DefaultSiteSectionBuilder
        source_store_name: validations_store
        run_id_filter:
          ne: profiling
        renderer:
          module_name: great_expectations.render.renderer
          class_name: ValidationResultsPageRenderer
          column_section_renderer:
            class_name: ValidationResultsColumnSectionRenderer
            table_renderer:
              module_name: custom_data_docs.renderers.custom_table_renderer
              class_name: CustomTableRenderer

Note that if your data_docs_sites configuration contains a site_section_builders key, you must now explicitly provide defaults for anything you would like rendered. By omitting the profiling key within site_section_builders, your third goal is achieved and Data Docs will no longer render Profiling Results pages.
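
The rule in the note above can be sketched as follows: when the site_section_builders key is absent, every default section is rendered; once it is present, only the sections it lists are. The rendered_sections helper is hypothetical and only models the behavior described here, not the SiteBuilder's actual code.

```python
DEFAULT_SECTIONS = ("expectations", "validations", "profiling")


def rendered_sections(site_config):
    """List the sections a site renders, following the rule described above."""
    builders = site_config.get("site_section_builders")
    if builders is None:
        return list(DEFAULT_SECTIONS)  # key absent: all defaults apply
    return [name for name in DEFAULT_SECTIONS if name in builders]


default_site = {"class_name": "SiteBuilder"}
custom_site = {
    "class_name": "SiteBuilder",
    "site_section_builders": {
        "expectations": {"class_name": "DefaultSiteSectionBuilder"},
        "validations": {"class_name": "DefaultSiteSectionBuilder"},
        # no "profiling" key, so Profiling Results pages are dropped
    },
}

print(rendered_sections(default_site))
print(rendered_sections(custom_site))
```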

Lastly, to compile your newly-customized Data Docs local site, you run great_expectations docs build from the command line.

site_section_builders defaults:

site_section_builders:
  expectations: # if not present, expectation suites are not rendered
    class_name: DefaultSiteSectionBuilder
    source_store_name: expectations_store
    renderer:
      module_name: great_expectations.render.renderer
      class_name: ExpectationSuitePageRenderer

  validations: # if not present, validation results are not rendered
    class_name: DefaultSiteSectionBuilder
    source_store_name: validations_store
    run_id_filter:
      ne: profiling # exclude validations with run id "profiling" - reserved for profiling results
    renderer:
      module_name: great_expectations.render.renderer
      class_name: ValidationResultsPageRenderer

  profiling: # if not present, profiling results are not rendered
    class_name: DefaultSiteSectionBuilder
    source_store_name: validations_store
    run_id_filter:
      eq: profiling
    renderer:
      module_name: great_expectations.render.renderer
      class_name: ProfilingResultsPageRenderer

Re-Purposing Built-In Views

If you would like to re-purpose a built-in View, you may do so by implementing a custom renderer that outputs an appropriately typed and structured dictionary for that View.

Built-in Views and corresponding input types:

  • great_expectations.render.view.DefaultJinjaPageView: great_expectations.render.types.RenderedDocumentContent

  • great_expectations.render.view.DefaultJinjaSectionView: great_expectations.render.types.RenderedSectionContent

  • great_expectations.render.view.DefaultJinjaComponentView: great_expectations.render.types.RenderedComponentContent

An example of a custom page Renderer that uses all of the built-in UI elements is provided below.

Custom Page Renderer Example

import altair as alt
import pandas as pd

from great_expectations.render.renderer.renderer import Renderer
from great_expectations.render.types import (
    RenderedBulletListContent,
    RenderedComponentContent,
    RenderedDocumentContent,
    RenderedGraphContent,
    RenderedHeaderContent,
    RenderedSectionContent,
    RenderedStringTemplateContent,
    RenderedTableContent,
    ValueListContent,
)


class CustomPageRenderer(Renderer):
    @classmethod
    def _get_header_content_block(cls, header="", subheader="", highlight=True):
        return RenderedHeaderContent(**{
            "content_block_type": "header",
            "header": RenderedStringTemplateContent(**{
                "content_block_type": "string_template",
                "string_template": {
                    "template": header,
                }
            }),
            "subheader": subheader,
            "styling": {
                "classes": ["col-12"],
                "header": {
                    "classes": ["alert", "alert-secondary"] if highlight else []
                }
            }
        })
    
    @classmethod
    def _get_bullet_list_content_block(cls, header="", subheader="", col=12):
        return RenderedBulletListContent(**{
            "content_block_type": "bullet_list",
            "header": header,
            "subheader": subheader,
            "bullet_list": [
                "Aenean porttitor turpis.",
                "Curabitur ligula urna.",
                cls._get_header_content_block(header="nested header content block", subheader="subheader",
                                              highlight=False)
            ],
            "styling": {
                "classes": ["col-{}".format(col)],
                "styles": {
                    "margin-top": "20px"
                },
            },
        })
    
    @classmethod
    def _get_table_content_block(cls, header="", subheader="", col=12):
        return RenderedTableContent(**{
            "content_block_type": "table",
            "header": header,
            "subheader": subheader,
            "table": [
                ["", "column_1", "column_2"],
                [
                    "row_1",
                    cls._get_bullet_list_content_block(subheader="Nested Bullet List Content Block"),
                    "buffalo"
                ],
                ["row_2", "crayon", "derby"],
            ],
            "styling": {
                "classes": ["col-{}".format(col), "table-responsive"],
                "styles": {
                    "margin-top": "20px"
                },
                "body": {
                    "classes": ["table", "table-sm"]
                }
            },
        })
    
    @classmethod
    def _get_graph_content_block(cls, header="", subheader="", col=12):
        df = pd.DataFrame({
            "value": [1, 2, 3, 4, 5, 6],
            "count": [123, 232, 543, 234, 332, 888]
        })
        bars = alt.Chart(df).mark_bar(size=20).encode(
            y='count:Q',
            x="value:O"
        ).properties(height=200, width=200, autosize="fit")
        chart = bars.to_json()
        
        return RenderedGraphContent(**{
            "content_block_type": "graph",
            "header": header,
            "subheader": subheader,
            "graph": chart,
            "styling": {
                "classes": ["col-{}".format(col)],
                "styles": {
                    "margin-top": "20px"
                },
            },
        })
    
    @classmethod
    def _get_tooltip_string_template_content_block(cls):
        return RenderedStringTemplateContent(**{
            "content_block_type": "string_template",
            "string_template": {
                "template": "This is a string template with tooltip, using a top-level custom tag.",
                "tag": "code",
                "tooltip": {
                    "content": "This is the tooltip content."
                }
            },
            "styling": {
                "classes": ["col-12"],
                "styles": {
                    "margin-top": "20px"
                },
            },
        })
    
    @classmethod
    def _get_string_template_content_block(cls):
        return RenderedStringTemplateContent(**{
            "content_block_type": "string_template",
            "string_template": {
                "template": "$icon This is a Font Awesome Icon, using a param-level custom tag\n$red_text\n$bold_serif",
                "params": {
                    "icon": "",
                    "red_text": "And this is red text!",
                    "bold_serif": "And this is big, bold serif text using style attribute..."
                },
                "styling": {
                    "params": {
                        "icon": {
                            "classes": ["fas", "fa-check-circle", "text-success"],
                            "tag": "i"
                        },
                        "red_text": {
                            "classes": ["text-danger"]
                        },
                        "bold_serif": {
                            "styles": {
                                "font-size": "22px",
                                "font-weight": "bold",
                                "font-family": "serif"
                            }
                        }
                    }
                }
            },
            "styling": {
                "classes": ["col-12"],
                "styles": {
                    "margin-top": "20px"
                },
            },
        })
    
    @classmethod
    def _get_value_list_content_block(cls, header="", subheader="", col=12):
        return ValueListContent(**{
            "content_block_type": "value_list",
            "header": header,
            "subheader": subheader,
            "value_list": [{
                "content_block_type": "string_template",
                "string_template": {
                    "template": "$value",
                    "params": {
                        "value": value
                    },
                    "styling": {
                        "default": {
                            "classes": ["badge", "badge-info"],
                        },
                    }
                }
            } for value in [
                "Andrew",
                "Elijah",
                "Matthew",
                "Cindy",
                "Pam"
            ]],
            "styling": {
                "classes": ["col-{}".format(col)]
            },
        })
    
    @classmethod
    def render(cls, ge_dict=None):  # ge_dict is unused here; None avoids a mutable default
        return RenderedDocumentContent(**{
            "renderer_type": "CustomValidationResultsPageRenderer",
            "data_asset_name": "my_data_asset_name",
            "full_data_asset_identifier": "my_datasource/my_generator/my_generator_asset",
            "page_title": "My Page Title",
            "sections": [
                RenderedSectionContent(**{
                    "section_name": "Header Content Block",
                    "content_blocks": [
                        cls._get_header_content_block(header="Header Content Block", subheader="subheader")]
                }),
                RenderedSectionContent(**{
                    "section_name": "Bullet List Content Block",
                    "content_blocks": [
                        cls._get_header_content_block(header="Bullet List Content Block"),
                        cls._get_bullet_list_content_block(header="My Important List",
                                                           subheader="Unremarkable Subheader")
                    ]
                }),
                RenderedSectionContent(**{
                    "section_name": "Table Content Block",
                    "content_blocks": [
                        cls._get_header_content_block(header="Table Content Block"),
                        cls._get_table_content_block(header="My Big Data Table"),
                    ]
                }),
                RenderedSectionContent(**{
                    "section_name": "Value List Content Block",
                    "content_blocks": [
                        cls._get_header_content_block(header="Value List Content Block"),
                        cls._get_value_list_content_block(header="My Name Value List"),
                    ]
                }),
                RenderedSectionContent(**{
                    "section_name": "Graph Content Block",
                    "content_blocks": [
                        cls._get_header_content_block(header="Graph Content Block"),
                        cls._get_graph_content_block(header="My Big Data Graph"),
                    ]
                }),
                RenderedSectionContent(**{
                    "section_name": "String Template Content Block With Icon",
                    "content_blocks": [
                        cls._get_header_content_block(header="String Template Content Block With Icon"),
                        cls._get_string_template_content_block()
                    ]
                }),
                RenderedSectionContent(**{
                    "section_name": "String Template Content Block With Tooltip",
                    "content_blocks": [
                        cls._get_header_content_block(header="String Template Content Block With Tooltip"),
                        cls._get_tooltip_string_template_content_block()
                    ]
                }),
                RenderedSectionContent(**{
                    "section_name": "Multiple Content Block Section",
                    "content_blocks": [
                        cls._get_header_content_block(header="Multiple Content Block Section"),
                        cls._get_graph_content_block(header="My col-4 Graph", col=4),
                        cls._get_graph_content_block(header="My col-4 Graph", col=4),
                        cls._get_graph_content_block(header="My col-4 Graph", col=4),
                        cls._get_table_content_block(header="My col-6 Table", col=6),
                        cls._get_bullet_list_content_block(header="My col-6 List", subheader="subheader", col=6)
                    ]
                }),
            ]
        })

[Figure: ../_images/customizing_data_docs.png]

Dependencies

  • Font Awesome 5.10.1

  • Bootstrap 4.3.1

  • jQuery 3.2.1

  • altair 3.1.0

  • Vega 5.3.5

  • Vega-Lite 3.2.1

  • Vega-Embed 4.0.0

Data Docs is implemented in the great_expectations.render module.

last updated: Aug 13, 2020