Publishing Data Docs to S3

In this tutorial we will cover publishing a data docs site directly to S3. Publishing a site this way makes it easy for a team to review and act on validation results, and provides a central location to review the expectations currently configured for data assets under test.

Configuring data docs requires three simple steps:

1. Configure an S3 bucket.

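Modify the bucket name and region for your situation. For regions other than us-east-1, the command also needs --create-bucket-configuration LocationConstraint=<region>.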

> aws s3api create-bucket --bucket data-docs.my_org --region us-east-1
{
    "Location": "/data-docs.my_org"
}

Configure your bucket policy to enable appropriate access. IMPORTANT: your policy should grant access only to appropriate users. Data docs can include critical information about raw data and should generally not be publicly accessible. The example policy below restricts access by source IP address.

Modify the bucket name and IP addresses below for your situation.

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "Allow only based on source IP",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": [
      "arn:aws:s3:::data-docs.my_org",
      "arn:aws:s3:::data-docs.my_org/*"
    ],
    "Condition": {
      "IpAddress": {
        "aws:SourceIp": [
          "192.168.0.1/32",
          "2001:db8:1234:1234::/64"
        ]
      }
    }
  }
  ]
}

Modify the policy above and save it to a file called ip-policy.json in your local directory. Then, run:

> aws s3api put-bucket-policy --bucket data-docs.my_org --policy file://ip-policy.json
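
If you maintain several buckets or a changing list of allowed addresses, the policy file can also be generated rather than edited by hand. The following is a minimal sketch, assuming Python 3 with only the standard library. The function names build_ip_policy and write_ip_policy are hypothetical helpers, not part of Great Expectations or the AWS CLI. The policy structure mirrors the example above, and each address range is validated before it is written.

```python
import ipaddress
import json


def build_ip_policy(bucket, cidr_blocks):
    """Return a bucket policy dict that allows s3:GetObject only from cidr_blocks.

    Hypothetical helper: the structure copies the example policy in this tutorial.
    """
    if not cidr_blocks:
        raise ValueError("at least one CIDR block is required")
    # ip_network() raises ValueError for malformed ranges, so typos are
    # caught before the policy is uploaded.
    networks = [str(ipaddress.ip_network(block)) for block in cidr_blocks]
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "Allow only based on source IP",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
            ],
            "Condition": {"IpAddress": {"aws:SourceIp": networks}},
        }],
    }


def write_ip_policy(bucket, cidr_blocks, path="ip-policy.json"):
    """Write the policy to path so it can be passed to put-bucket-policy."""
    with open(path, "w") as handle:
        json.dump(build_ip_policy(bucket, cidr_blocks), handle, indent=2)
    return path
```

Calling write_ip_policy("data-docs.my_org", ["192.168.0.1/32"]) would produce an ip-policy.json file suitable for the put-bucket-policy command shown above.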

2. Edit your great_expectations.yml file to change the data_docs_sites configuration for the site you will publish. Add the `s3_site` section below the existing site configuration.

# ... additional configuration above
data_docs_sites:
  local_site:
    class_name: SiteBuilder
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
  s3_site:
    class_name: SiteBuilder
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: data-docs.my_org  # UPDATE the bucket name here to match the bucket you configured above.
# ... additional configuration below

3. Build your documentation:

> great_expectations docs build
Building...

You’re now ready to visit the site! Your site will be available at the following URL: http://data-docs.my_org.s3.amazonaws.com/index.html
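
To confirm that the published site is reachable from your machine, a quick check along the following lines may help. This is a minimal sketch, assuming Python 3 and the URL pattern shown above. site_url and check_site are hypothetical helper names. A 403 status most likely means the request came from an address outside the ranges allowed by your bucket policy.

```python
import urllib.error
import urllib.request


def site_url(bucket, page="index.html"):
    """Build the site URL using the pattern shown in this tutorial."""
    return f"http://{bucket}.s3.amazonaws.com/{page}"


def check_site(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered with an error status such as 403 or 404.
        return error.code
    except OSError:
        # DNS failure, refused connection, or timeout.
        return None
```

For example, check_site(site_url("data-docs.my_org")) returns the status code if the bucket responds.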

Additional Resources

You may wish to update the static hosting settings for your bucket so that AWS automatically serves your index.html file or a custom error file:

> aws s3 website s3://data-docs.my_org/ --index-document index.html
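
The same setting can also be applied from Python. The sketch below assumes the third-party boto3 package is installed and AWS credentials are configured. website_configuration and enable_website_hosting are hypothetical helper names, and error.html is only an illustrative file name for a custom error page.

```python
def website_configuration(index_document="index.html", error_document=None):
    """Build the WebsiteConfiguration mapping expected by put_bucket_website."""
    config = {"IndexDocument": {"Suffix": index_document}}
    if error_document is not None:
        config["ErrorDocument"] = {"Key": error_document}
    return config


def enable_website_hosting(bucket, index_document="index.html", error_document=None):
    """Apply static website settings to bucket (requires boto3 and credentials)."""
    # Imported lazily so that website_configuration() works without boto3.
    import boto3

    client = boto3.client("s3")
    client.put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration=website_configuration(index_document, error_document),
    )
```

For example, enable_website_hosting("data-docs.my_org", error_document="error.html") would configure both an index document and a custom error page.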

For more information on static site hosting in AWS, see the AWS documentation on hosting a static website on Amazon S3.

last updated: Aug 13, 2020