How to configure an Expectation store in GCS
By default, newly profiled Expectations are stored in JSON format in the expectations/ subdirectory of your great_expectations/ folder. This guide will help you configure Great Expectations to store them in a Google Cloud Storage (GCS) bucket.
Prerequisites: This how-to guide assumes that you have already:
Configured a Data Context.
Configured an Expectation Suite.
Configured a Google Cloud Platform (GCP) service account with credentials that can access the appropriate GCP resources, which include Storage Objects.
Identified the GCP project, GCS bucket, and prefix where Expectations will be stored.
Steps
Docs for V2 (Batch Kwargs) API
Configure your GCP credentials
Check that your environment is configured with the appropriate authentication credentials needed to connect to the GCS bucket where Expectations will be stored.
The Google Cloud Platform documentation describes how to verify your authentication for the Google Cloud API, which includes:
Creating a Google Cloud Platform (GCP) service account,
Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable,
Verifying authentication by running a simple Google Cloud Storage client library script.
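Before running a client library script, a quick local check can catch a missing or mistyped credentials path. The following is a minimal sketch using only the Python standard library. The helper name check_gcp_credentials_env is hypothetical, and the check only confirms that the variable points at a file that parses as JSON; it does not contact GCP, so it cannot tell you whether the credentials are actually accepted.

```python
import json
import os


def check_gcp_credentials_env(environ=None):
    """Return the credentials path if GOOGLE_APPLICATION_CREDENTIALS points at a JSON file.

    Hypothetical helper: it never contacts GCP, so a passing check only means
    the file exists and parses as JSON.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"Credentials file not found: {path}")
    with open(path, encoding="utf-8") as handle:
        json.load(handle)  # raises ValueError if the key file is not valid JSON
    return path
```

Calling check_gcp_credentials_env() with no arguments reads the real process environment; passing a dict makes the check easy to try out without touching your shell.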
Identify your Data Context Expectations Store
In your great_expectations.yml, look for the following lines. The configuration tells Great Expectations to look for Expectations in a store called expectations_store. The base_directory for expectations_store is set to expectations/ by default.

    expectations_store_name: expectations_store

    stores:
      expectations_store:
        class_name: ExpectationsStore
        store_backend:
          class_name: TupleFilesystemStoreBackend
          base_directory: expectations/
Update your configuration file to include a new store for Expectations on GCS
In our case, the name is set to expectations_GCS_store, but it can be any name you like. We also need to make some changes to the store_backend settings. The class_name will be set to TupleGCSStoreBackend, project will be set to your GCP project, bucket will be set to the name of your GCS bucket, and prefix will be set to the folder on GCS where Expectation files will be located.

Warning
If you are also storing Validations in GCS or Data Docs in GCS, please ensure that the prefix values are disjoint and one is not a substring of the other.

    expectations_store_name: expectations_GCS_store

    stores:
      expectations_GCS_store:
        class_name: ExpectationsStore
        store_backend:
          class_name: TupleGCSStoreBackend
          project: '<your_GCP_project_name>'
          bucket: '<your_GCS_bucket_name>'
          prefix: '<your_GCS_folder_name>'
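The substring rule in the warning is easy to break by accident, for example with prefixes such as ge and ge/validations. The sketch below shows one way to check a set of prefixes before editing the configuration. The function name find_overlapping_prefixes and the store names in the example are hypothetical; the check is a plain substring comparison, which is a strict reading of the warning, not something Great Expectations runs for you.

```python
from itertools import combinations


def find_overlapping_prefixes(prefixes):
    """Return pairs of store names whose prefixes are equal or contain one another.

    `prefixes` maps a store name to the prefix configured for that store.
    """
    overlaps = []
    for (name_a, prefix_a), (name_b, prefix_b) in combinations(sorted(prefixes.items()), 2):
        # A plain substring test in both directions, as described in the warning.
        if prefix_a in prefix_b or prefix_b in prefix_a:
            overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical store names and prefixes, for illustration only.
stores = {
    "expectations_GCS_store": "ge/expectations",
    "validations_GCS_store": "ge/validations",
}
print(find_overlapping_prefixes(stores))  # prints []
```

An empty prefix is a substring of every other prefix, so a store with no prefix is reported as overlapping with all the others.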
Copy existing Expectation JSON files to the GCS bucket. (This step is optional).
One way to copy Expectations into GCS is by using the gsutil cp command, which is part of the Google Cloud SDK. The following example will copy one Expectation Suite, exp1, from a local folder to the GCS bucket. The destination spells out the object name so that the file lands inside the folder even if the folder does not exist yet. Information on other ways to copy Expectation JSON files, like the Cloud Storage browser in the Google Cloud Console, can be found in the documentation for Google Cloud.

    gsutil cp exp1.json gs://'<your_GCS_bucket_name>'/'<your_GCS_folder_name>'/exp1.json
    Operation completed over 1 objects/58.8 KiB.
Confirm that the new Expectations store has been added by running great_expectations store list.
Notice the output contains two Expectation stores: the original expectations_store on the local filesystem and the expectations_GCS_store we just configured. This is fine, since Great Expectations will look for Expectations in GCS as long as the expectations_store_name variable is set to expectations_GCS_store. The config for expectations_store can be removed if you would like.

    great_expectations store list

    - name: expectations_store
      class_name: ExpectationsStore
      store_backend:
        class_name: TupleFilesystemStoreBackend
        base_directory: expectations/

    - name: expectations_GCS_store
      class_name: ExpectationsStore
      store_backend:
        class_name: TupleGCSStoreBackend
        project: '<your_GCP_project_name>'
        bucket: '<your_GCS_bucket_name>'
        prefix: '<your_GCS_folder_name>'
Confirm that Expectations can be accessed from GCS by running great_expectations suite list.
If you followed Step 4 (copying existing Expectation JSON files), the output should include the Expectation Suite we copied to GCS: exp1. If you did not copy Expectations to the new Store, you will see a message saying no Expectations were found.

    great_expectations suite list

    1 Expectation Suite found:
     - exp1
Docs for V3 (Batch Request) API
Configure your GCP credentials
Check that your environment is configured with the appropriate authentication credentials needed to connect to the GCS bucket where Expectations will be stored.
The Google Cloud Platform documentation describes how to verify your authentication for the Google Cloud API, which includes:
Creating a Google Cloud Platform (GCP) service account,
Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable,
Verifying authentication by running a simple Google Cloud Storage client library script.
Identify your Data Context Expectations Store
In your great_expectations.yml, look for the following lines. The configuration tells Great Expectations to look for Expectations in a store called expectations_store. The base_directory for expectations_store is set to expectations/ by default.

    expectations_store_name: expectations_store

    stores:
      expectations_store:
        class_name: ExpectationsStore
        store_backend:
          class_name: TupleFilesystemStoreBackend
          base_directory: expectations/
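The lookup described here has two parts: expectations_store_name names a store, and the matching entry under stores supplies its backend. As an illustration, the sketch below performs that lookup on a plain Python dict that mirrors the default configuration. The function name active_expectations_backend is hypothetical, and the dict is written out by hand; in practice it would come from parsing great_expectations.yml with a YAML library of your choice.

```python
def active_expectations_backend(config):
    """Return (store_name, store_backend) for the store named by expectations_store_name."""
    store_name = config["expectations_store_name"]
    try:
        store = config["stores"][store_name]
    except KeyError:
        raise KeyError(
            f"expectations_store_name refers to unknown store {store_name!r}"
        ) from None
    return store_name, store["store_backend"]


# Hand-written dict equivalent of the default configuration in this step.
default_config = {
    "expectations_store_name": "expectations_store",
    "stores": {
        "expectations_store": {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleFilesystemStoreBackend",
                "base_directory": "expectations/",
            },
        }
    },
}

name, backend = active_expectations_backend(default_config)
print(name, backend["class_name"])  # prints: expectations_store TupleFilesystemStoreBackend
```

The same lookup explains why a store can stay listed under stores without being used: only the entry named by expectations_store_name is consulted.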
Update your configuration file to include a new store for Expectations on GCS
In our case, the name is set to expectations_GCS_store, but it can be any name you like. We also need to make some changes to the store_backend settings. The class_name will be set to TupleGCSStoreBackend, project will be set to your GCP project, bucket will be set to the name of your GCS bucket, and prefix will be set to the folder on GCS where Expectation files will be located.

Warning
If you are also storing Validations in GCS or Data Docs in GCS, please ensure that the prefix values are disjoint and one is not a substring of the other.

    expectations_store_name: expectations_GCS_store

    stores:
      expectations_GCS_store:
        class_name: ExpectationsStore
        store_backend:
          class_name: TupleGCSStoreBackend
          project: '<your_GCP_project_name>'
          bucket: '<your_GCS_bucket_name>'
          prefix: '<your_GCS_folder_name>'
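A common slip at this step is leaving one of the '<your_...>' placeholders in place. The sketch below flags the three settings this guide fills in (project, bucket, prefix) when they are absent, empty, or still look like a placeholder. The names GUIDE_GCS_BACKEND_KEYS and unfilled_gcs_backend_keys are hypothetical, and the list reflects what this guide sets, not a statement about which keys the backend itself requires.

```python
# The store_backend settings this guide fills in for GCS (not an official schema).
GUIDE_GCS_BACKEND_KEYS = ("project", "bucket", "prefix")


def unfilled_gcs_backend_keys(store_backend):
    """Return the guide's GCS keys that are absent, empty, or still '<...>' placeholders."""
    unfilled = []
    for key in GUIDE_GCS_BACKEND_KEYS:
        value = str(store_backend.get(key, ""))
        if not value or (value.startswith("<") and value.endswith(">")):
            unfilled.append(key)
    return unfilled


# Example with one placeholder left in place; the other values are made up.
backend = {
    "class_name": "TupleGCSStoreBackend",
    "project": "my-project",
    "bucket": "<your_GCS_bucket_name>",
    "prefix": "expectations",
}
print(unfilled_gcs_backend_keys(backend))  # prints ['bucket']
```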
Copy existing Expectation JSON files to the GCS bucket. (This step is optional).
One way to copy Expectations into GCS is by using the gsutil cp command, which is part of the Google Cloud SDK. The following example will copy one Expectation Suite, exp1, from a local folder to the GCS bucket. The destination spells out the object name so that the file lands inside the folder even if the folder does not exist yet. Information on other ways to copy Expectation JSON files, like the Cloud Storage browser in the Google Cloud Console, can be found in the documentation for Google Cloud.

    gsutil cp exp1.json gs://'<your_GCS_bucket_name>'/'<your_GCS_folder_name>'/exp1.json
    Operation completed over 1 objects/58.8 KiB.
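When copying more than one file, it can help to compute each destination explicitly instead of typing it by hand. The sketch below builds the gs:// URI for a local Expectation JSON file from the bucket and prefix values in the store configuration. The function name gcs_destination and the example bucket and folder names are hypothetical, and the sketch assumes each suite file sits directly under the prefix, so it does not handle nested folder layouts.

```python
import posixpath


def gcs_destination(bucket, prefix, local_path):
    """Build the gs:// URI for a local Expectation JSON file placed directly under prefix."""
    # Normalize Windows separators so basename works for either path style.
    filename = posixpath.basename(local_path.replace("\\", "/"))
    # Strip stray slashes so 'folder', 'folder/' and '/folder' give the same result.
    object_name = posixpath.join(prefix.strip("/"), filename)
    return f"gs://{bucket}/{object_name}"


# Made-up bucket and folder names, for illustration only.
print(gcs_destination("my-bucket", "my_folder/", "expectations/exp1.json"))
# prints: gs://my-bucket/my_folder/exp1.json
```

The resulting string can be passed as the destination argument of gsutil cp.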
Confirm that the new Expectations store has been added by running great_expectations --v3-api store list.
Notice the output contains two Expectation stores: the original expectations_store on the local filesystem and the expectations_GCS_store we just configured. This is fine, since Great Expectations will look for Expectations in GCS as long as the expectations_store_name variable is set to expectations_GCS_store. The config for expectations_store can be removed if you would like.

    great_expectations --v3-api store list

    - name: expectations_store
      class_name: ExpectationsStore
      store_backend:
        class_name: TupleFilesystemStoreBackend
        base_directory: expectations/

    - name: expectations_GCS_store
      class_name: ExpectationsStore
      store_backend:
        class_name: TupleGCSStoreBackend
        project: '<your_GCP_project_name>'
        bucket: '<your_GCS_bucket_name>'
        prefix: '<your_GCS_folder_name>'
Confirm that Expectations can be accessed from GCS by running great_expectations --v3-api suite list.
If you followed Step 4 (copying existing Expectation JSON files), the output should include the Expectation Suite we copied to GCS: exp1. If you did not copy Expectations to the new Store, you will see a message saying no Expectations were found.

    great_expectations --v3-api suite list

    1 Expectation Suite found:
     - exp1