Data Docs Reference¶
Data Docs make it simple to visualize data quality in your project. These include Expectations, Validations & Profiles. They are built for all Datasources from JSON artifacts in the local repo including validations & profiles from the uncommitted directory.
Users have full control over configuring Data Documentation for their project - they can modify the pre-configured site (or remove it altogether) and add new sites with a configuration that meets the project’s needs. The easiest way to add a new site to the configuration is to copy the “local_site” configuration block in great_expectations.yml, give the copy a new name and modify the details as needed.
Data Docs Site Configuration¶
The default Data Docs site configuration looks like this:
data_docs_sites:
local_site:
class_name: SiteBuilder
store_backend:
class_name: TupleFilesystemStoreBackend
base_directory: uncommitted/data_docs/local_site/
Here is an example of a site configuration from great_expectations.yml with defaults defined explicitly:
data_docs_sites:
local_site: # site name
module_name: great_expectations.render.renderer.site_builder
class_name: SiteBuilder
store_backend:
class_name: TupleFilesystemStoreBackend
base_directory: uncommitted/data_docs/local_site/
site_index_builder:
class_name: DefaultSiteIndexBuilder
site_section_builders:
expectations: # if not present, expectation suites are not rendered
class_name: DefaultSiteSectionBuilder
source_store_name: expectations_store
renderer:
module_name: great_expectations.render.renderer
class_name: ExpectationSuitePageRenderer
validations: # if not present, validation results are not rendered
class_name: DefaultSiteSectionBuilder
source_store_name: validations_store
run_id_filter:
ne: profiling # exclude validations with run id "profiling" - reserved for profiling results
renderer:
module_name: great_expectations.render.renderer
class_name: ValidationResultsPageRenderer
profiling: # if not present, profiling results are not rendered
class_name: DefaultSiteSectionBuilder
source_store_name: validations_store
run_id_filter:
eq: profiling
renderer:
module_name: great_expectations.render.renderer
class_name: ProfilingResultsPageRenderer
validations_store
in the example above specifies the name of a store configured in the stores section.
Validation and profiling results from that store will be included in the documentation. The optional run_id_filter
attribute allows to include (eq
for exact match) or exclude (ne
) validation results with a particular run id.
Limiting Validation Results¶
If you would like to limit rendered Validation Results to the n most-recent, you may do so by setting the validation_results_limit key in your Data Docs configuration:
data_docs_sites:
local_site:
class_name: SiteBuilder
store_backend:
class_name: TupleFilesystemStoreBackend
base_directory: uncommitted/data_docs/local_site/
site_index_builder:
class_name: DefaultSiteIndexBuilder
show_cta_footer: true
validation_results_limit: 5
Automatically Publishing Data Docs¶
It is possible to directly publish (continuously updating!) data docs sites to a shared location such as a static site hosted in S3 by simply updating the store_backend configuration in the site configuration. If we modify the configuration in the example above to adjust the store backend to an S3 bucket of our choosing, our SiteBuilder will automatically save the resulting site to that bucket.
store_backend:
class_name: TupleS3StoreBackend
bucket: data-docs.my_org.org
prefix:
See the tutorial on publishing data docs to S3 for more information.
More advanced configuration¶
It is possible to extend renderers and views and customize the particular class used to render any of the objects in your documentation. In this more advanced configuration, a “CustomTableContentBlockRenderer” is used only for the validations renderer, and no profiling results are rendered at all.
data_docs_sites:
# Data Docs make it simple to visualize data quality in your project. These
# include Expectations, Validations & Profiles. The are built for all
# Datasources from JSON artifacts in the local repo including validations &
# profiles from the uncommitted directory. Read more at
# https://docs.greatexpectations.io/en/latestfeatures/data_docs.html
local_site:
class_name: SiteBuilder
store_backend:
class_name: TupleFilesystemStoreBackend
base_directory: uncommitted/data_docs/local_site/
site_section_builders:
expectations:
class_name: DefaultSiteSectionBuilder
source_store_name: expectations_store
renderer:
module_name: great_expectations.render.renderer
class_name: ExpectationSuitePageRenderer
validations:
class_name: DefaultSiteSectionBuilder
source_store_name: validations_store
run_id_filter:
ne: profiling
renderer:
module_name: great_expectations.render.renderer
class_name: ValidationResultsPageRenderer
column_section_renderer:
class_name: ValidationResultsColumnSectionRenderer
table_renderer:
module_name: custom_renderers.custom_table_content_block
class_name: CustomTableContentBlockRenderer
To support that custom renderer, we need to ensure the implementation is available in our plugins/ directory. Note that we can use a subdirectory and standard python submodule notation, but that we need to include an __init__.py file in our custom_renderers package.
plugins/
├── custom_renderers
│ ├── __init__.py
│ └── custom_table_content_block.py
└── additional_ge_plugin.py
Building Data Docs¶
Using the CLI¶
The great_expectations CLI can build comprehensive Data Docs from expectation
suites available to the configured context and validations available in the
great_expectations/uncommitted
directory.
great_expectations docs build
When called without additional arguments, this command will render all the Data Docs sites specified in great_expectations.yml configuration file into HTML and open them in a web browser.
The command will print out the locations of index.html file for each site.
To disable the web browser opening behavior use –no-view option.
To render just one site, use –site-name SITE_NAME option.
Here is when the docs build
command should be called:
when you want to fully rebuild a Data Docs site
after a new expectation suite is added or an existing one is edited
after new data is profiled (only if you declined the prompt to build data docs when running the profiling command)
When a new validation result is generated after running a Validation Operator, the Data Docs sites will add this result automatically if the operator has the UpdateDataDocsAction action configured (read Actions).
Using the raw API¶
The underlying python API for rendering documentation is still new and evolving. Use the following snippet as a guide for how to profile a single batch of data and build documentation from the validation_result.
import os
import great_expectations as ge
from great_expectations.profile.basic_dataset_profiler import BasicDatasetProfiler
from great_expectations.render.renderer import ProfilingResultsPageRenderer, ExpectationSuitePageRenderer
from great_expectations.data_context.util import safe_mmkdir
from great_expectations.render.view import DefaultJinjaPageView
profiling_html_filepath = '/path/into/which/to/save/results.html'
# obtain the DataContext object
context = ge.data_context.DataContext()
# load a batch to profile
context.create_expectation_suite('default')
batch = context.get_batch(
batch_kwargs=context.build_batch_kwargs("my_datasource", "my_batch_kwargs_generator", "my_asset")
expectation_suite_name='default',
)
# run the profiler on the batch - this returns an expectation suite and validation results for this suite
expectation_suite, validation_result = BasicDatasetProfiler().profile(batch)
# use a renderer to produce a document model from the validation results
document_model = ProfilingResultsPageRenderer().render(validation_result)
# use a view to render the document model (produced by the renderer) into a HTML document
safe_mmkdir(os.path.dirname(profiling_html_filepath))
with open(profiling_html_filepath, 'w') as writer:
writer.write(DefaultJinjaPageView().render(document_model))
Customizing Data Docs¶
Introduction¶
Data Docs uses the Jinja template engine to generate HTML pages.
The built-in Jinja templates used to compile Data Docs pages are implemented in the great_expectations.render.view
module
and are tied to View classes. Views determine how page content is displayed. The content data that Views
specify and consume is generated by Renderer classes and are implemented in the great_expectations.render.renderer
module.
Renderers take Great Expectations objects as input and return typed dictionaries - Views take these dictionaries as input and
output rendered HTML.
Built-In Views and Renderers¶
Out of the box, Data Docs supports two top-level Views (i.e. pages), great_expectations.render.view.DefaultJinjaIndexPageView
,
for a site index page, and great_expectations.render.view.DefaultJinjaPageView
for all other pages. Pages
are broken into sections - great_expectations.render.view.DefaultJinjaSectionView
- which are composed of
UI components - great_expectations.render.view.DefaultJinjaComponentView
. Each of these Views references a
single base Jinja template, which can incorporate any number of other templates through inheritance.
Data Docs comes with the following built-in site page Renderers:
great_expectations.render.renderer.SiteIndexPageRenderer
(index page)
great_expectations.render.renderer.ProfilingResultsPageRenderer
(Profiling Results pages)
great_expectations.render.renderer.ExpectationSuitePageRenderer
(Expectation Suite pages)
great_expectations.render.renderer.ValidationResultsPageRenderer
(Validation Results pages)
Analogous to the base View templates referenced above, these Renderers can be thought of as base Renderers for the primary Data Docs pages, and may call on many other ancillary Renderers.
It is possible to extend Renderers and Views and customize the particular class used to render any of the objects in your documentation.
Other Tools¶
In addition to Jinja, Data Docs draws on the following libraries to compile HTML documentation, which you can use to further customize HTML documentation:
Bootstrap
Font Awesome
jQuery
Altair
Getting Started¶
Making Minor Adjustments¶
Many of the HTML elements in the default Data Docs pages have pre-configured classes that you may use to make
minor adjustments using your own custom CSS. By default, when you run great_expectations init
,
Great Expectations creates a scaffold within the plugins
directory for customizing Data Docs.
Within this scaffold is a file called data_docs_custom_styles.css
- this CSS file contains all the pre-configured classes
you may use to customize the look and feel of the default Data Docs pages. All custom CSS, applied to these pre-configured classes
or any other HTML elements, must be placed in this file.
Scaffolded directory tree:
plugins
└── custom_data_docs
├── renderers
├── styles
│ └── data_docs_custom_styles.css
└── views
Using Custom Views and Renderers¶
Suppose you start a new Great Expectations project by running great_expectations init
and compile your first Data Docs
site. After looking over the local site, you decide you want to implement the following changes:
A completely new Expectation Suite page, requiring a new View and Renderer
A smaller modification to the default Validation page, swapping out a child renderer for a custom version
Remove Profiling Results pages from the documentation
To make these changes, you must first implement the custom View and Renderers and ensure they are available in the plugins directory specified
in your project configuration (plugins/
by default). Note that you can use a subdirectory and standard python submodule
notation, but you must include an __init__.py file in your custom package. By default, when you run great_expectations init
,
Great Expectations creates placeholder directories for your custom views, renderers, and CSS stylesheets within the plugins
directory. If you wish, you may save
your custom views and renderers in an alternate location however, any CSS stylesheets must be saved to plugins/custom_data_docs/styles
.
Scaffolded directory tree:
plugins
└── custom_data_docs
├── renderers
├── styles
│ └── data_docs_custom_styles.css
└── views
When you are done with your implementations, your plugins/
directory has the following structure:
plugins
└── custom_data_docs
├── renderers
├── __init__.py
├── custom_table_renderer.py
└── custom_expectation_suite_page_renderer.py
├── styles
│ └── data_docs_custom_styles.css
└── views
├── __init__.py
└── custom_expectation_suite_view.py
For Data Docs to use your custom Views and Renderers when compiling your local Data Docs site, you must specify
where to use them in the data_docs_sites
section of your project configuration.
Before modifying your project configuration, the relevant section looks like this:
data_docs_sites:
local_site:
class_name: SiteBuilder
store_backend:
class_name: TupleFilesystemStoreBackend
base_directory: uncommitted/data_docs/local_site/
This is what it looks like after your changes are added:
data_docs_sites:
local_site:
class_name: SiteBuilder
store_backend:
class_name: TupleFilesystemStoreBackend
base_directory: uncommitted/data_docs/local_site/
site_section_builders:
expectations:
class_name: DefaultSiteSectionBuilder
source_store_name: expectations_store
renderer:
module_name: custom_data_docs.renderers.custom_expectation_suite_page_renderer
class_name: CustomExpectationSuitePageRenderer
view:
module_name: custom_data_docs.views.custom_expectation_suite_view
class_name: CustomExpectationSuiteView
validations:
class_name: DefaultSiteSectionBuilder
source_store_name: validations_store
run_id_filter:
ne: profiling
renderer:
module_name: great_expectations.render.renderer
class_name: ValidationResultsPageRenderer
column_section_renderer:
class_name: ValidationResultsColumnSectionRenderer
table_renderer:
module_name: custom_data_docs.renderers.custom_table_renderer
class_name: CustomTableRenderer
Note that if your data_docs_sites
configuration contains a site_section_builders
key, you must now explicitly provide
defaults for anything you would like rendered. By omitting the profiling
key within site_section_builders
, your third goal
is achieved and Data Docs will no longer render Profiling Results pages.
Lastly, to compile your newly-customized Data Docs local site, you run great_expectations docs build
from the command line.
site_section_builders
defaults:
site_section_builders:
expectations: # if not present, expectation suites are not rendered
class_name: DefaultSiteSectionBuilder
source_store_name: expectations_store
renderer:
module_name: great_expectations.render.renderer
class_name: ExpectationSuitePageRenderer
validations: # if not present, validation results are not rendered
class_name: DefaultSiteSectionBuilder
source_store_name: validations_store
run_id_filter:
ne: profiling # exclude validations with run id "profiling" - reserved for profiling results
renderer:
module_name: great_expectations.render.renderer
class_name: ValidationResultsPageRenderer
profiling: # if not present, profiling results are not rendered
class_name: DefaultSiteSectionBuilder
source_store_name: validations_store
run_id_filter:
eq: profiling
renderer:
module_name: great_expectations.render.renderer
class_name: ProfilingResultsPageRenderer
Re-Purposing Built-In Views¶
If you would like to re-purpose a built-in View, you may do so by implementing a custom renderer that outputs an appropriately typed and structured dictionary for that View.
Built-in Views and corresponding input types:
great_expectations.render.view.DefaultJinjaPageView
:great_expectations.render.types.RenderedDocumentContent
great_expectations.render.view.DefaultJinjaSectionView
:great_expectations.render.types.RenderedSectionContent
great_expectations.render.view.DefaultJinjaComponentView
:great_expectations.render.types.RenderedComponentContent
An example of a custom page Renderer, using all built-in UI elements is provided below.
Custom Page Renderer Example¶
import altair as alt
import pandas as pd
from great_expectations.render.renderer.renderer import Renderer
from great_expectations.render.types import (
RenderedDocumentContent,
RenderedSectionContent,
RenderedComponentContent,
RenderedHeaderContent, RenderedBulletListContent, RenderedTableContent, RenderedStringTemplateContent,
RenderedGraphContent, ValueListContent)
class CustomPageRenderer(Renderer):
@classmethod
def _get_header_content_block(cls, header="", subheader="", highlight=True):
return RenderedHeaderContent(**{
"content_block_type": "header",
"header": RenderedStringTemplateContent(**{
"content_block_type": "string_template",
"string_template": {
"template": header,
}
}),
"subheader": subheader,
"styling": {
"classes": ["col-12"],
"header": {
"classes": ["alert", "alert-secondary"] if highlight else []
}
}
})
@classmethod
def _get_bullet_list_content_block(cls, header="", subheader="", col=12):
return RenderedBulletListContent(**{
"content_block_type": "bullet_list",
"header": header,
"subheader": subheader,
"bullet_list": [
"Aenean porttitor turpis.",
"Curabitur ligula urna.",
cls._get_header_content_block(header="nested header content block", subheader="subheader",
highlight=False)
],
"styling": {
"classes": ["col-{}".format(col)],
"styles": {
"margin-top": "20px"
},
},
})
@classmethod
def _get_table_content_block(cls, header="", subheader="", col=12):
return RenderedTableContent(**{
"content_block_type": "table",
"header": header,
"subheader": subheader,
"table": [
["", "column_1", "column_2"],
[
"row_1",
cls._get_bullet_list_content_block(subheader="Nested Bullet List Content Block"),
"buffalo"
],
["row_2", "crayon", "derby"],
],
"styling": {
"classes": ["col-{}".format(col), "table-responsive"],
"styles": {
"margin-top": "20px"
},
"body": {
"classes": ["table", "table-sm"]
}
},
})
@classmethod
def _get_graph_content_block(cls, header="", subheader="", col=12):
df = pd.DataFrame({
"value": [1, 2, 3, 4, 5, 6],
"count": [123, 232, 543, 234, 332, 888]
})
bars = alt.Chart(df).mark_bar(size=20).encode(
y='count:Q',
x="value:O"
).properties(height=200, width=200, autosize="fit")
chart = bars.to_json()
return RenderedGraphContent(**{
"content_block_type": "graph",
"header": header,
"subheader": subheader,
"graph": chart,
"styling": {
"classes": ["col-{}".format(col)],
"styles": {
"margin-top": "20px"
},
},
})
@classmethod
def _get_tooltip_string_template_content_block(cls):
return RenderedStringTemplateContent(**{
"content_block_type": "string_template",
"string_template": {
"template": "This is a string template with tooltip, using a top-level custom tag.",
"tag": "code",
"tooltip": {
"content": "This is the tooltip content."
}
},
"styling": {
"classes": ["col-12"],
"styles": {
"margin-top": "20px"
},
},
})
@classmethod
def _get_string_template_content_block(cls):
return RenderedStringTemplateContent(**{
"content_block_type": "string_template",
"string_template": {
"template": "$icon This is a Font Awesome Icon, using a param-level custom tag\n$red_text\n$bold_serif",
"params": {
"icon": "",
"red_text": "And this is red text!",
"bold_serif": "And this is big, bold serif text using style attribute..."
},
"styling": {
"params": {
"icon": {
"classes": ["fas", "fa-check-circle", "text-success"],
"tag": "i"
},
"red_text": {
"classes": ["text-danger"]
},
"bold_serif": {
"styles": {
"font-size": "22px",
"font-weight": "bold",
"font-family": "serif"
}
}
}
}
},
"styling": {
"classes": ["col-12"],
"styles": {
"margin-top": "20px"
},
},
})
@classmethod
def _get_value_list_content_block(cls, header="", subheader="", col=12):
return ValueListContent(**{
"content_block_type": "value_list",
"header": header,
"subheader": subheader,
"value_list": [{
"content_block_type": "string_template",
"string_template": {
"template": "$value",
"params": {
"value": value
},
"styling": {
"default": {
"classes": ["badge", "badge-info"],
},
}
}
} for value in [
"Andrew",
"Elijah",
"Matthew",
"Cindy",
"Pam"
]],
"styling": {
"classes": ["col-{}".format(col)]
},
})
@classmethod
def render(cls, ge_dict={}):
return RenderedDocumentContent(**{
"renderer_type": "CustomValidationResultsPageRenderer",
"data_asset_name": "my_data_asset_name",
"full_data_asset_identifier": "my_datasource/my_generator/my_generator_asset",
"page_title": "My Page Title",
"sections": [
RenderedSectionContent(**{
"section_name": "Header Content Block",
"content_blocks": [
cls._get_header_content_block(header="Header Content Block", subheader="subheader")]
}),
RenderedSectionContent(**{
"section_name": "Bullet List Content Block",
"content_blocks": [
cls._get_header_content_block(header="Bullet List Content Block"),
cls._get_bullet_list_content_block(header="My Important List",
subheader="Unremarkable Subheader")
]
}),
RenderedSectionContent(**{
"section_name": "Table Content Block",
"content_blocks": [
cls._get_header_content_block(header="Table Content Block"),
cls._get_table_content_block(header="My Big Data Table"),
]
}),
RenderedSectionContent(**{
"section_name": "Value List Content Block",
"content_blocks": [
cls._get_header_content_block(header="Value List Content Block"),
cls._get_value_list_content_block(header="My Name Value List"),
]
}),
RenderedSectionContent(**{
"section_name": "Graph Content Block",
"content_blocks": [
cls._get_header_content_block(header="Graph Content Block"),
cls._get_graph_content_block(header="My Big Data Graph"),
]
}),
RenderedSectionContent(**{
"section_name": "String Template Content Block With Icon",
"content_blocks": [
cls._get_header_content_block(header="String Template Content Block With Icon"),
cls._get_string_template_content_block()
]
}),
RenderedSectionContent(**{
"section_name": "String Template Content Block With Tooltip",
"content_blocks": [
cls._get_header_content_block(header="String Template Content Block With Tooltip"),
cls._get_tooltip_string_template_content_block()
]
}),
RenderedSectionContent(**{
"section_name": "Multiple Content Block Section",
"content_blocks": [
cls._get_header_content_block(header="Multiple Content Block Section"),
cls._get_graph_content_block(header="My col-4 Graph", col=4),
cls._get_graph_content_block(header="My col-4 Graph", col=4),
cls._get_graph_content_block(header="My col-4 Graph", col=4),
cls._get_table_content_block(header="My col-6 Table", col=6),
cls._get_bullet_list_content_block(header="My col-6 List", subheader="subheader", col=6)
]
}),
]
})

Dependencies¶
Font Awesome 5.10.1
Bootstrap 4.3.1
jQuery 3.2.1
altair 3.1.0
Vega 5.3.5
Vega-Lite 3.2.1
Vega-Embed 4.0.0
Data Docs is implemented in the great_expectations.render
module.
last updated: Aug 13, 2020