How to host and share Data Docs on Azure Blob Storage
This guide will explain how to host and share Data Docs on Azure Blob Storage. Data Docs will be served using an Azure Blob Storage static website with restricted access.
Prerequisites: This how-to guide assumes you:
Have permission to create and configure an Azure storage account
Steps
Create an Azure Blob Storage static website.
Create a storage account.
In Settings, select Static website to display the configuration page for static websites.
Select Enabled to enable static website hosting for the storage account.
Enter “index.html” in Index document.
Note the Primary endpoint URL. Your team will be able to view your Data Docs at this URL once you have finished this guide. You can also map a custom domain to this endpoint. A container called $web should have been created in your storage account.
Configure the config_variables.yml file with your Azure storage credentials.
Get the Connection string of the storage account you have just created.
We recommend that Azure storage credentials be stored in the config_variables.yml file, which is located in the uncommitted/ folder by default, and is not part of source control. The following lines add Azure storage credentials under the key AZURE_STORAGE_CONNECTION_STRING. Additional options for configuring the config_variables.yml file or additional environment variables can be found here.

    AZURE_STORAGE_CONNECTION_STRING: "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=<YOUR-STORAGE-ACCOUNT-NAME>;AccountKey=<YOUR-STORAGE-ACCOUNT-KEY==>"
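Before saving the value, it can help to check that the connection string is well formed. The following is a minimal sketch using only the Python standard library. The helper names parse_connection_string and missing_keys, the choice of required keys, and the example account values are assumptions made for illustration; they are not part of Great Expectations or the Azure SDK.

```python
# Hypothetical helpers for sanity-checking an Azure Storage connection string.
# They are illustrative only and are not provided by Great Expectations or Azure.

# Assumption: these are the keys we expect in a key-based connection string.
REQUIRED_KEYS = ("AccountName", "AccountKey")


def parse_connection_string(conn_str: str) -> dict:
    """Split a 'Key=Value;Key=Value' connection string into a dictionary."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only, because account keys often end in '=='.
        key, separator, value = segment.partition("=")
        if not separator:
            raise ValueError(f"Segment without '=': {segment!r}")
        parts[key.strip()] = value
    return parts


def missing_keys(conn_str: str) -> list:
    """Return the required keys that are absent or empty."""
    parts = parse_connection_string(conn_str)
    return [key for key in REQUIRED_KEYS if not parts.get(key)]


# Placeholder values, not real credentials.
example = (
    "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;"
    "AccountName=mystorageaccount;AccountKey=abc123=="
)
print(missing_keys(example))
```

An empty list means both keys were found. This only checks the shape of the string; it does not verify that the credentials are valid.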
Add a new Azure site to the data_docs_sites section of your great_expectations.yml.

    data_docs_sites:
      local_site:
        class_name: SiteBuilder
        show_how_to_buttons: true
        store_backend:
          class_name: TupleFilesystemStoreBackend
          base_directory: uncommitted/data_docs/local_site/
        site_index_builder:
          class_name: DefaultSiteIndexBuilder
      az_site:  # this is a user-selected name - you may select your own
        class_name: SiteBuilder
        store_backend:
          class_name: TupleAzureBlobStoreBackend
          container: \$web
          # must match the key defined in config_variables.yml
          connection_string: ${AZURE_STORAGE_CONNECTION_STRING}
        site_index_builder:
          class_name: DefaultSiteIndexBuilder

You may also replace the default local_site if you would only like to maintain a single Azure Data Docs site.
Note
Since the container is called $web, if we simply set container: $web in great_expectations.yml, then Great Expectations would unsuccessfully try to find the variable called web in config_variables.yml. We use an escape character \ before the $ so that the substitute_config_variable method will allow us to reach the $web container.
You may also configure Great Expectations to store your expectations and validations in this Azure Storage account. You can follow the documentation from the guides for expectations and validations, but be sure to set container: \$web in place of the other container name.
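To make the escaping behaviour described in the note concrete, here is a simplified sketch of variable substitution with a backslash escape. It is not the Great Expectations substitute_config_variable implementation; the function substitute, its regular expression, and the use of KeyError for a missing variable are assumptions chosen only to illustrate why \$web reaches the container while $web triggers a variable lookup.

```python
import re

# Match ${NAME} or $NAME, but only when the '$' is not preceded by a backslash.
_VARIABLE = re.compile(r"(?<!\\)\$\{(\w+)\}|(?<!\\)\$(\w+)")


def substitute(value: str, variables: dict) -> str:
    """Illustrative substitution: resolve variables, then unescape '\\$'."""

    def lookup(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        if name not in variables:
            # An unescaped '$web' ends up here: 'web' is looked up and not found.
            raise KeyError(f"Variable {name!r} is not defined")
        return variables[name]

    resolved = _VARIABLE.sub(lookup, value)
    # Escaped dollar signs were skipped above; turn '\$' into a literal '$'.
    return resolved.replace("\\$", "$")


# Placeholder configuration values for the example.
config_variables = {"AZURE_STORAGE_CONNECTION_STRING": "<connection-string>"}

print(substitute(r"\$web", config_variables))
print(substitute("${AZURE_STORAGE_CONNECTION_STRING}", config_variables))
```

In this sketch, the escaped value resolves to the literal container name, whereas calling substitute("$web", config_variables) raises an error because no variable named web exists.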
Build the Azure Blob Data Docs site.
You can create or modify a suite, and this will build the Data Docs website. Alternatively, you can use the following CLI command: great_expectations docs build --site-name az_site.

    > great_expectations docs build --site-name az_site

    The following Data Docs sites will be built:

     - az_site: https://<your-storage-account>.blob.core.windows.net/$web/index.html

    Would you like to proceed? [Y/n]: y

    Building Data Docs...

    Done building Data Docs

If successful, the CLI will provide the object URL of the index page. You may secure access to your website using an IP filtering mechanism.
Limit access to your company.
In your Azure storage account settings, click Networking.
Allow access from Selected networks.
You can grant access to a virtual network.
You can add IP ranges to the firewall.
More details are available here.
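When preparing IP ranges for the firewall, it can be useful to confirm locally that a given client address actually falls inside the ranges you plan to allow. The sketch below uses Python's standard ipaddress module. The function is_allowed is a hypothetical helper, and the ranges shown are reserved documentation addresses rather than real company ranges; the actual filtering is performed by Azure, not by this code.

```python
import ipaddress


def is_allowed(client_ip: str, allowed_ranges: list) -> bool:
    """Return True if client_ip falls inside any of the CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(cidr_range)
        for cidr_range in allowed_ranges
    )


# Hypothetical ranges, taken from the address blocks reserved for documentation.
company_ranges = ["203.0.113.0/24"]

print(is_allowed("203.0.113.25", company_ranges))
print(is_allowed("198.51.100.7", company_ranges))
```

ipaddress.ip_network raises a ValueError for a malformed range, so the same helper also catches typos before the ranges are entered in the portal.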