Configure Validation Result Stores
A Validation Results Store is a connector that is used to store and retrieve information about objects generated when data is Validated against an Expectation Suite. By default, Validation Results are stored in JSON format in the uncommitted/validations/ subdirectory of your gx/ folder. Use the information provided here to configure a store for your Validation Results.
Validation Results can include sensitive or regulated data that should not be committed to a source control system.
- Amazon S3
- Microsoft Azure Blob Storage
- Google Cloud Storage (GCS)
- Filesystem
- PostgreSQL
Amazon S3
Use the information provided here to configure a new storage location for Validation Results in Amazon S3.
Prerequisites
- A Data Context.
- An Expectation Suite.
- A Checkpoint.
- Permissions to install boto3 in your local environment.
- An S3 bucket and prefix for the Validation Results.
Install boto3 in your local environment
Python interacts with AWS through the boto3 library. Great Expectations makes use of this library in the background when working with AWS. Although you won't use boto3 directly, you'll need to install it in your virtual environment.
Run one of the following pip commands to install boto3 in your virtual environment:
python -m pip install boto3
or
python3 -m pip install boto3
To set up boto3 with AWS and use boto3 within Python, see the Boto3 documentation.
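To confirm that boto3 is visible to the Python interpreter that runs Great Expectations, you can use a minimal stdlib-only check like the following. The `is_installed` helper is a hypothetical name for illustration. It isn't part of GX or boto3.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be imported.

    importlib.util.find_spec locates the module without importing it.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run this with the same interpreter (virtual environment) you use for GX.
    if is_installed("boto3"):
        print("boto3 is installed")
    else:
        print("boto3 is missing; run: python -m pip install boto3")
```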
Verify your AWS credentials are properly configured
Run the following command in the AWS CLI to verify that your AWS credentials are properly configured:
aws sts get-caller-identity
When your credentials are properly configured, your UserId, Account, and Arn are returned. If your credentials are not configured correctly, an error message appears. If you received an error message, or you couldn't verify your credentials, see Configuring the AWS CLI.
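You can run the same check from Python with boto3's STS client. Like the AWS CLI, it uses the standard AWS credential chain (environment variables, shared config files, and so on). The `caller_identity` helper is hypothetical. Because you can pass in a client, you can exercise the helper without network access. If boto3 can't resolve credentials, the call raises an exception.

```python
from typing import Any, Dict, Optional


def caller_identity(sts_client: Optional[Any] = None) -> Dict[str, str]:
    """Return the UserId, Account, and Arn of the active AWS credentials.

    Python counterpart of `aws sts get-caller-identity`.
    """
    if sts_client is None:
        import boto3  # imported lazily so a stand-in client can be used in tests

        sts_client = boto3.client("sts")
    response = sts_client.get_caller_identity()
    # Drop ResponseMetadata and keep only the identity fields.
    return {key: response[key] for key in ("UserId", "Account", "Arn")}


# Usage (requires boto3 and configured credentials):
# print(caller_identity())
```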
Identify your Data Context Validation Results Store
Your Validation Results Store configuration is in your Data Context.
The following section in your Data Context great_expectations.yml file tells Great Expectations to look for Validation Results in a Store named validations_store. It also creates a ValidationsStore named validations_store that is backed by a Filesystem and stores Validation Results under the base_directory uncommitted/validations (the default).
stores:
  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/
validations_store_name: validations_store
Update your configuration file to include a new Store for Validation Results
To manually add a Validation Results Store, add the following configuration to the stores section of your great_expectations.yml file:
stores:
  validations_S3_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: '<your_s3_bucket_name>'
      prefix: '<your_s3_bucket_folder_name>'  # Bucket and prefix in combination must be unique across all stores
As shown in the previous example, you need to change the default store_backend settings to make the Store work with S3. The class_name is set to TupleS3StoreBackend, bucket is the name of your S3 bucket, and prefix is the folder in your S3 bucket where Validation Results are stored.
The following example shows the additional options that are available to customize TupleS3StoreBackend:
stores:
  validations_S3_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: '<your_s3_bucket_name>'
      prefix: '<your_s3_bucket_folder_name>'  # Bucket and prefix in combination must be unique across all stores
      boto3_options:
        endpoint_url: ${S3_ENDPOINT}  # Uses the S3_ENDPOINT environment variable to determine which endpoint to use.
        region_name: '<your_aws_region_name>'
In the examples above, the Store name is validations_S3_store. If you use a personalized Store name, you must also update the value of the top-level validations_store_name key to match the Store name. For example:
validations_store_name: validations_S3_store
When you update the validations_store_name key value, Great Expectations uses the new Store for Validation Results.
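A quick way to catch a mismatch between validations_store_name and the Store names is to inspect the parsed configuration. This sketch assumes you have already loaded great_expectations.yml into a dict, for example with PyYAML's `yaml.safe_load`. The `check_validations_store` helper is hypothetical.

```python
from typing import Any, Mapping


def check_validations_store(config: Mapping[str, Any]) -> str:
    """Return the active Validation Results Store name after basic sanity checks.

    `config` is a dict shaped like a parsed great_expectations.yml.
    """
    name = config.get("validations_store_name")
    stores = config.get("stores") or {}
    if not isinstance(name, str) or name not in stores:
        raise ValueError(
            f"validations_store_name {name!r} does not match any entry under 'stores'"
        )
    if stores[name].get("class_name") != "ValidationsStore":
        raise ValueError(f"Store {name!r} is not a ValidationsStore")
    return name
```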
To authenticate with IAM user credentials, add the following configuration to great_expectations.yml:
stores:
  validations_S3_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: '<your_s3_bucket_name>'
      prefix: '<your_s3_bucket_folder_name>'  # Bucket and prefix in combination must be unique across all stores
      boto3_options:
        aws_access_key_id: ${AWS_ACCESS_KEY_ID}  # Uses the AWS_ACCESS_KEY_ID environment variable to get aws_access_key_id.
        aws_secret_access_key: ${AWS_SECRET_ACCESS_KEY}  # Uses the AWS_SECRET_ACCESS_KEY environment variable.
        aws_session_token: ${AWS_SESSION_TOKEN}  # Uses the AWS_SESSION_TOKEN environment variable; only needed for temporary credentials.
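Before you run a Checkpoint, you can confirm that every ${...} reference in the Store configuration resolves. This sketch only scans for ${NAME} patterns and checks environment variables. It ignores config_variables.yml, which GX can also use for substitution, so treat it as a partial check. The `missing_env_vars` helper is hypothetical.

```python
import os
import re
from typing import Iterable, List, Mapping, Optional

# Matches ${NAME} references such as ${AWS_ACCESS_KEY_ID}.
VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(values: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the ${NAME} references in `values` that are not set in `env`.

    `env` defaults to os.environ; names are reported once, in order of first use.
    """
    env = os.environ if env is None else env
    missing: List[str] = []
    for value in values:
        for name in VAR_PATTERN.findall(value):
            if name not in env and name not in missing:
                missing.append(name)
    return missing


# Usage:
# print(missing_env_vars(["${AWS_ACCESS_KEY_ID}", "${AWS_SECRET_ACCESS_KEY}"]))
```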
To authenticate by assuming an IAM role, add the following configuration to great_expectations.yml:
stores:
  validations_S3_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: '<your_s3_bucket_name>'
      prefix: '<your_s3_bucket_folder_name>'  # Bucket and prefix in combination must be unique across all stores
      boto3_options:
        assume_role_arn: '<your_role_to_assume>'
        region_name: '<your_aws_region_name>'
        assume_role_duration: <session_duration_in_seconds>  # Integer number of seconds
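You can confirm that your identity is allowed to assume the role before you point GX at it by calling STS directly. This sketch shows the underlying boto3 `assume_role` call with RoleArn, RoleSessionName, and DurationSeconds. It doesn't describe how TupleS3StoreBackend handles assume_role_arn internally. The helper name and the session name `gx-validations-store` are hypothetical.

```python
from typing import Any, Dict, Optional


def assume_role_credentials(
    role_arn: str,
    duration_seconds: int,
    session_name: str = "gx-validations-store",  # hypothetical session name
    sts_client: Optional[Any] = None,
) -> Dict[str, str]:
    """Assume `role_arn` with STS and return the temporary credentials.

    The keys match boto3 client keyword arguments.
    """
    if sts_client is None:
        import boto3  # imported lazily so a stand-in client can be used in tests

        sts_client = boto3.client("sts")
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=duration_seconds,
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```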
If you are also storing Expectations in S3 (see How to configure an Expectation store to use Amazon S3) or Data Docs in S3 (see How to host and share Data Docs), make sure the prefix values are disjoint and that neither is a substring of the other.
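You can check the disjoint-prefix rule mechanically. In this sketch, you describe each Store by hand as a (store_name, bucket, prefix) tuple. The tuple shape and helper name are assumptions for illustration. The sketch compares only Stores in the same bucket, because bucket and prefix must be unique only in combination.

```python
from itertools import combinations
from typing import List, Tuple


def conflicting_prefixes(stores: List[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Return pairs of Store names whose prefixes collide.

    Two Stores collide when they share a bucket and one prefix is a substring
    of the other (identical prefixes included).
    """
    conflicts = []
    for (name_a, bucket_a, prefix_a), (name_b, bucket_b, prefix_b) in combinations(stores, 2):
        if bucket_a == bucket_b and (prefix_a in prefix_b or prefix_b in prefix_a):
            conflicts.append((name_a, name_b))
    return conflicts
```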
Copy existing Validation Results to the S3 bucket (Optional)
If you are converting an existing local Great Expectations deployment to one that works in AWS, you might have saved Validation Results that you want to transfer to your S3 bucket.
To copy Validation Results into Amazon S3, use the aws s3 sync command as shown in the following example:
aws s3 sync <base_directory> s3://<your_s3_bucket_name>/<your_s3_bucket_folder_name>
The base_directory is set to uncommitted/validations/ by default.
In the following example, the Validation Results val1 and val2 are copied to Amazon S3 and a confirmation message is returned:
upload: uncommitted/validations/val1/val1.json to s3://<your_s3_bucket_name>/<your_s3_bucket_folder_name>/val1/val1.json
upload: uncommitted/validations/val2/val2.json to s3://<your_s3_bucket_name>/<your_s3_bucket_folder_name>/val2/val2.json
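To copy from Python instead of the AWS CLI, you can walk base_directory and upload each JSON file with boto3's `S3.Client.upload_file(Filename, Bucket, Key)`. Keys keep each file's path relative to base_directory. The function names are hypothetical. Pass `boto3.client("s3")` as s3_client. Unlike aws s3 sync, this sketch uploads every file each time rather than only changed files.

```python
from pathlib import Path
from typing import Any, List, Tuple


def plan_s3_upload(base_directory: str, prefix: str) -> List[Tuple[str, str]]:
    """Return (local_path, s3_key) pairs for every JSON file under base_directory."""
    base = Path(base_directory)
    clean_prefix = prefix.strip("/")
    pairs = []
    for path in sorted(base.rglob("*.json")):
        relative = path.relative_to(base).as_posix()
        key = f"{clean_prefix}/{relative}" if clean_prefix else relative
        pairs.append((str(path), key))
    return pairs


def upload_to_s3(pairs: List[Tuple[str, str]], bucket: str, s3_client: Any) -> None:
    """Upload each planned file with S3.Client.upload_file(Filename, Bucket, Key)."""
    for local_path, key in pairs:
        s3_client.upload_file(local_path, bucket, key)


# Usage (requires boto3 and configured credentials):
# import boto3
# pairs = plan_s3_upload("uncommitted/validations/", "<your_s3_bucket_folder_name>")
# upload_to_s3(pairs, "<your_s3_bucket_name>", boto3.client("s3"))
```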
Confirm the configuration
Run a Checkpoint to store results in the new Validation Results Store on S3, and then visualize the results by re-building Data Docs.
Microsoft Azure Blob Storage
Use the information provided here to configure a new storage location for Validation Results in Azure Blob Storage.
Prerequisites
- A Data Context.
- An Expectation Suite.
- A Checkpoint.
- An Azure Storage account and its connection string.
- An Azure Blob container. If you want to host and share Data Docs on Azure Blob Storage, you can set this up first and then use the existing $web container to store your Validation Results.
- A prefix (folder) to store Validation Results. You don't need to create the folder; the prefix is just part of the Blob name.
Configure the config_variables.yml file with your Azure Storage credentials
GX recommends that you store Azure Storage credentials in the config_variables.yml file, which is located in the uncommitted/ folder by default and is not part of source control. The following code adds Azure Storage credentials under the key AZURE_STORAGE_CONNECTION_STRING:
AZURE_STORAGE_CONNECTION_STRING: "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=<YOUR-STORAGE-ACCOUNT-NAME>;AccountKey=<YOUR-STORAGE-ACCOUNT-KEY==>"
To learn more about the additional options for configuring the config_variables.yml file, or additional environment variables, see How to configure credentials.
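To catch copy-and-paste mistakes in the connection string before GX uses it, you can split the string into its settings and check for the settings that a key-based connection string normally contains. The helper names and the required-setting list are illustrative assumptions. A connection string that uses a SAS token instead of an AccountKey would need a different check.

```python
from typing import Dict, Iterable, List


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split an Azure Storage connection string into its key/value settings.

    Splits on ';' and then on the first '=' only, because AccountKey values
    are base64 and usually end in '=' padding.
    """
    settings: Dict[str, str] = {}
    for part in conn_str.strip().split(";"):
        if not part:
            continue  # tolerate a trailing ';'
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {part!r}")
        settings[key] = value
    return settings


def missing_settings(conn_str: str, required: Iterable[str] = ("AccountName", "AccountKey")) -> List[str]:
    """Return the required settings that are absent or empty."""
    settings = parse_connection_string(conn_str)
    return [name for name in required if not settings.get(name)]
```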
Identify your Validation Results Store
Your Validation Results Store configuration is provided in your Data Context. Open great_expectations.yml and find the following entry:
validations_store_name: validations_store

stores:
  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/
This configuration tells Great Expectations to look for Validation Results in a Store named validations_store. The default base_directory for validations_store is uncommitted/validations/.
Update your configuration file to include a new Store for Validation Results on your Azure Storage account
In the following example, validations_store_name is set to validations_AZ_store, but it can be personalized. You also need to change the store_backend settings. The class_name is TupleAzureBlobStoreBackend, container is the name of the blob container where Validation Results are stored, prefix is the folder in the container where Validation Result files are located, and connection_string is ${AZURE_STORAGE_CONNECTION_STRING}, which references the corresponding key in the config_variables.yml file.
validations_store_name: validations_AZ_store

stores:
  validations_AZ_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleAzureBlobStoreBackend
      container: <blob-container>
      prefix: validations
      connection_string: ${AZURE_STORAGE_CONNECTION_STRING}
If the container for hosting and sharing Data Docs on Azure Blob Storage is named $web, use container: \$web to allow access to the $web container. The backslash escapes the $ so that it isn't treated as a variable reference.
Additional authentication and configuration options are available. See Host and Share Data Docs on Azure Blob Storage.
Copy existing Validation Results JSON files to the Azure blob (Optional)
You can use the az storage blob upload command to copy Validation Results into Azure Blob Storage. The following command copies one Validation Result from a local folder to the Azure blob:
export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=<YOUR-STORAGE-ACCOUNT-NAME>;AccountKey=<YOUR-STORAGE-ACCOUNT-KEY==>"
az storage blob upload -f <local/path/to/validation.json> -c <GREAT-EXPECTATION-DEDICATED-AZURE-BLOB-CONTAINER-NAME> -n <PREFIX>/<validation.json>
For example, the following command uploads a Validation Result for the exp1 Expectation Suite:
az storage blob upload -f gx/uncommitted/validations/exp1/20210306T104406.877327Z/20210306T104406.877327Z/8313fb37ca59375eb843adf388d4f882.json -c <blob-container> -n validations/exp1/20210306T104406.877327Z/20210306T104406.877327Z/8313fb37ca59375eb843adf388d4f882.json
Finished[#############################################################] 100.0000%
{
"etag": "\"0x8D8E09F894650C7\"",
"lastModified": "2021-03-06T12:58:28+00:00"
}
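To upload a whole folder of Validation Results, you can generate one az storage blob upload command per file. This sketch only builds the command strings and doesn't run them. Each blob name is the prefix followed by the file's path relative to base_directory, which follows the layout of the single-file example above. The helper name is hypothetical. The Azure CLI also provides az storage blob upload-batch for directory uploads.

```python
import shlex
from pathlib import Path
from typing import List


def az_upload_commands(base_directory: str, container: str, prefix: str) -> List[str]:
    """Build one `az storage blob upload` command per JSON file under base_directory."""
    base = Path(base_directory)
    clean_prefix = prefix.strip("/")
    commands = []
    for path in sorted(base.rglob("*.json")):
        relative = path.relative_to(base).as_posix()
        blob_name = f"{clean_prefix}/{relative}" if clean_prefix else relative
        # shlex.join quotes any argument that contains shell-special characters.
        commands.append(shlex.join([
            "az", "storage", "blob", "upload",
            "-f", str(path), "-c", container, "-n", blob_name,
        ]))
    return commands


# Usage: print the commands, review them, then run them in a shell.
# for cmd in az_upload_commands("gx/uncommitted/validations", "<blob-container>", "validations"):
#     print(cmd)
```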
To learn more about other methods that are available to copy Validation Result JSON files into Azure Blob Storage, see Quickstart: Upload, download, and list blobs with the Azure portal.
Reference the new configuration
To make Great Expectations look for Validation Results on the Azure store, set the validations_store_name variable to the name of your Azure Validations Store. In the previous example, this was validations_AZ_store.
Confirm that the Validation Results Store has been correctly configured
Run a Checkpoint to store results in the new Validation Results Store on Azure Blob Storage, and then visualize the results by re-building Data Docs.
GCS
Use the information provided here to configure a new storage location for Validation Results in GCS.
To view all the code used in this topic, see how_to_configure_a_validation_result_store_in_gcs.py.
Prerequisites
- A Data Context.
- An Expectation Suite.
- A Checkpoint.
- A GCP service account with credentials that allow access to GCP resources such as Storage Objects.
- A GCP project, GCS bucket, and prefix to store Validation Results.
Configure your GCP credentials
Confirm that your environment is configured with the appropriate authentication credentials needed to connect to the GCS bucket where Validation Results will be stored. This includes the following:
- A GCP service account.
- Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable.
- Verifying authentication by running a Google Cloud Storage client library script.
For more information about validating your GCP authentication credentials, see Authenticate to Cloud services using client libraries.
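Before you run a client library script, a lightweight local check of the GOOGLE_APPLICATION_CREDENTIALS setting can rule out common problems. This sketch assumes a service account key file, which is JSON with type set to service_account. It doesn't contact Google Cloud, so it can't confirm that the account can access your bucket. The helper name is hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def check_gcp_credentials_file(env: Optional[Mapping[str, str]] = None) -> str:
    """Validate GOOGLE_APPLICATION_CREDENTIALS and return the service account email.

    Checks only that the variable is set, the file exists, and the file looks
    like a service account key.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"Credentials file not found: {key_file}")
    data = json.loads(key_file.read_text())
    if data.get("type") != "service_account":
        raise ValueError("Credentials file is not a service account key")
    return data.get("client_email", "")
```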
Identify your Data Context Validation Results Store
The configuration for your Validation Results Store is available in your Data Context. Open great_expectations.yml and find the following entry:
stores:
  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/
validations_store_name: validations_store
This configuration tells Great Expectations to look for Validation Results in the validations_store Store. The default base_directory for validations_store is uncommitted/validations/.
Update your configuration file to include a new Store for Validation Results
In the following example, validations_store_name is set to validations_GCS_store, but it can be personalized. You also need to change the store_backend settings. The class_name is TupleGCSStoreBackend, project is your GCP project, bucket is the address of your GCS bucket, and prefix is the folder on GCS where Validation Result files are stored.
stores:
  validations_GCS_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleGCSStoreBackend
      project: <your_GCP_project_name>
      bucket: <your_GCS_bucket_name>
      prefix: <your_GCS_folder_name>
validations_store_name: validations_GCS_store
If you are also storing Expectations or Data Docs in GCS, make sure the prefix values are disjoint and that neither is a substring of the other.
Copy existing Validation Results to the GCS bucket (Optional)
Use the gsutil cp command to copy Validation Results into GCS. For example, the following commands copy the Validation Results validation_1 and validation_2 into a GCS bucket:
gsutil cp uncommitted/validations/my_expectation_suite/validation_1.json gs://<your_GCS_bucket_name>/<your_GCS_folder_name>/validation_1.json
gsutil cp uncommitted/validations/my_expectation_suite/validation_2.json gs://<your_GCS_bucket_name>/<your_GCS_folder_name>/validation_2.json
The following confirmation message is returned:
Operation completed over 2 objects
Additional methods for copying Validation Results into GCS are available. See Upload objects from a filesystem.
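As an alternative to gsutil, you can upload the files with the google-cloud-storage client library. This sketch assumes that google-cloud-storage is installed and that client is a `google.cloud.storage.Client`. It uses the library's `bucket()`, `blob()`, and `upload_from_filename()` methods. Unlike the gsutil example above, this sketch keeps each file's path relative to base_directory (for example, my_expectation_suite/validation_1.json). Adjust the blob name if you want the flat layout. The helper name is hypothetical.

```python
from pathlib import Path
from typing import Any, List


def upload_validations_to_gcs(base_directory: str, bucket_name: str, prefix: str, client: Any) -> List[str]:
    """Upload every JSON file under base_directory to gs://bucket_name/prefix/...

    Returns the uploaded blob names.
    """
    base = Path(base_directory)
    bucket = client.bucket(bucket_name)
    clean_prefix = prefix.strip("/")
    uploaded = []
    for path in sorted(base.rglob("*.json")):
        relative = path.relative_to(base).as_posix()
        blob_name = f"{clean_prefix}/{relative}" if clean_prefix else relative
        bucket.blob(blob_name).upload_from_filename(str(path))
        uploaded.append(blob_name)
    return uploaded


# Usage (requires google-cloud-storage and GCP credentials):
# from google.cloud import storage
# upload_validations_to_gcs("uncommitted/validations", "<your_GCS_bucket_name>",
#                           "<your_GCS_folder_name>", storage.Client(project="<your_GCP_project_name>"))
```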
Reference the new configuration
To make Great Expectations look for Validation Results on the GCS store, set the validations_store_name variable to the name of your GCS Validations Store. In the previous example, this was validations_GCS_store.
Confirm that the Validation Results Store has been correctly configured
Run a Checkpoint to store results in the new Validation Results Store on GCS, and then visualize the results by re-building Data Docs.
Filesystem
Use the information provided here to configure a new storage location for Validation Results in your filesystem. You'll learn how to use an Action to update Data Docs sites with new Validation Results from Checkpoint runs.
Prerequisites
- A Data Context.
- An Expectation Suite.
- A Checkpoint.
- A new storage location to store Validation Results. This can be a local path, or a path to a secure network filesystem.
Create a new folder for Validation Results
Run the following commands to create a new folder for your Validation Results and move your existing Validation Results to the new folder:
# in the gx/ folder
mkdir uncommitted/shared_validations
mv uncommitted/validations/npi_validations/ uncommitted/shared_validations/
In this example, the Validation Results to move are in the npi_validations folder, and the path to the new storage location is uncommitted/shared_validations/.
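You can also script the same move in Python, which is useful on systems without mkdir and mv. This sketch assumes that you pass the path of your gx/ folder. The `move_validations` helper is hypothetical, and it refuses to overwrite an existing destination folder.

```python
import shutil
from pathlib import Path


def move_validations(gx_root: str, folder_name: str, new_base: str = "uncommitted/shared_validations") -> Path:
    """Move uncommitted/validations/<folder_name> into new_base (relative to gx_root).

    Returns the new path of the moved folder.
    """
    root = Path(gx_root)
    source = root / "uncommitted" / "validations" / folder_name
    if not source.is_dir():
        raise FileNotFoundError(f"No such folder: {source}")
    target_dir = root / new_base
    target_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    destination = target_dir / folder_name
    if destination.exists():
        raise FileExistsError(f"{destination} already exists")
    shutil.move(str(source), str(destination))
    return destination


# Usage:
# move_validations("gx", "npi_validations")
```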
Identify your Data Context Validation Results Store
The configuration for your Validation Results Store is available in your Data Context. Open great_expectations.yml and find the following entry:
validations_store_name: validations_store

stores:
  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/
This configuration tells Great Expectations to look for Validation Results in the validations_store Store. The default base_directory for validations_store is uncommitted/validations/.
Update your configuration file to include a new Store for Validation Results
In the following example, validations_store_name is set to shared_validations_filesystem_store, but it can be personalized. Also, base_directory is set to uncommitted/shared_validations/, but you can set it to another path that is accessible by Great Expectations.
validations_store_name: shared_validations_filesystem_store

stores:
  shared_validations_filesystem_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/shared_validations/
Confirm that the Validation Results Store has been correctly configured
Run a Checkpoint to store results in the new Validation Results Store in your new location, and then visualize the results by re-building Data Docs.
PostgreSQL
Use the information provided here to configure Great Expectations to store Validation Results in a PostgreSQL database.
Prerequisites
- A Data Context.
- An Expectation Suite.
- A Checkpoint.
- A PostgreSQL database with appropriate credentials.
Configure the config_variables.yml file with your database credentials
GX recommends storing database credentials in the config_variables.yml file, which is located in the uncommitted/ folder by default and is not part of source control.
- To add database credentials, open config_variables.yml and add the following entry under the db_creds key:
  db_creds:
    drivername: postgresql
    host: '<your_host_name>'
    port: '<your_port>'
    username: '<your_username>'
    password: '<your_password>'
    database: '<your_database_name>'
  To learn more about configuring the config_variables.yml file, or additional environment variables, see How to configure credentials.
- Optional. To use a specific schema as the backend, specify schema as an additional keyword argument. For example:
  db_creds:
    drivername: postgresql
    host: '<your_host_name>'
    port: '<your_port>'
    username: '<your_username>'
    password: '<your_password>'
    database: '<your_database_name>'
    schema: '<your_schema_name>'
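To sanity-check the credentials outside GX, for example with psql or SQLAlchemy's create_engine, you can assemble them into a connection URL. This sketch builds a postgresql://user:password@host:port/database string and URL-encodes the username and password. It is only an illustration, and GX's DatabaseStoreBackend may build its connection differently. The schema key isn't part of the URL, so the sketch ignores it. The helper name is hypothetical.

```python
from typing import Mapping
from urllib.parse import quote_plus


def db_creds_to_url(creds: Mapping[str, str]) -> str:
    """Build a URL such as postgresql://user:pass@host:5432/db from a db_creds mapping.

    The username and password are URL-encoded so that characters like '@' or
    '/' don't break the URL.
    """
    user = quote_plus(str(creds["username"]))
    password = quote_plus(str(creds["password"]))
    return (
        f"{creds['drivername']}://{user}:{password}"
        f"@{creds['host']}:{creds['port']}/{creds['database']}"
    )
```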
Identify your Data Context Validation Results Store
The configuration for your Validation Results Store is available in your Data Context. Open great_expectations.yml and find the following entry:
validations_store_name: validations_store

stores:
  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/
This configuration tells Great Expectations to look for Validation Results in the validations_store Store. The default base_directory for validations_store is uncommitted/validations/.
Update your configuration file to include a new Validation Results Store
Add the following entry to your great_expectations.yml:
validations_store_name: validations_postgres_store

stores:
  validations_postgres_store:
    class_name: ValidationsStore
    store_backend:
      class_name: DatabaseStoreBackend
      credentials: ${db_creds}
In the previous example, validations_store_name is set to validations_postgres_store, but it can be personalized. Also, class_name is set to DatabaseStoreBackend, and credentials is set to ${db_creds}, which references the corresponding key in the config_variables.yml file.
Confirm the addition of the new Validation Results Store
In the previous example, a validations_store on the local filesystem and a validations_postgres_store are configured. Great Expectations looks for Validation Results in PostgreSQL when the validations_store_name variable is set to validations_postgres_store. You can remove the validations_store entry if you no longer need it. To list the configured Stores and confirm the validations_postgres_store configuration, run the following command:
great_expectations store list
 - name: validations_store
   class_name: ValidationsStore
   store_backend:
     class_name: TupleFilesystemStoreBackend
     base_directory: uncommitted/validations/

 - name: validations_postgres_store
   class_name: ValidationsStore
   store_backend:
     class_name: DatabaseStoreBackend
     credentials:
       database: '<your_db_name>'
       drivername: postgresql
       host: '<your_host_name>'
       password: ******
       port: '<your_port>'
       username: '<your_username>'
Confirm the Validation Results Store is configured correctly
Run a Checkpoint to store results in the new Validation Results store in PostgreSQL, and then visualize the results by re-building Data Docs.
Great Expectations creates a new table in your database named ge_validations_store and populates it with information from the Validation Results.