How to configure an Expectation store in Amazon S3
By default, newly profiled Expectations are stored in JSON format in the expectations/ subdirectory of your great_expectations/ folder. This guide will help you configure Great Expectations to store them in an Amazon S3 bucket.
Prerequisites: This how-to guide assumes that you have already:
Configure boto3 to connect to the Amazon S3 bucket where Expectations will be stored.
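Before moving on, it can help to check that boto3 really can reach the bucket. The snippet below is a minimal sketch, not part of Great Expectations: the helper name can_reach_bucket is hypothetical, and it assumes your AWS credentials are already available to boto3 (for example through environment variables or ~/.aws/credentials). It relies only on the boto3 S3 client call head_bucket.

```python
def can_reach_bucket(bucket, client=None):
    """Return True if the S3 bucket exists and the current credentials can access it.

    `can_reach_bucket` is a hypothetical helper for this guide, not a Great Expectations API.
    """
    if client is None:
        # Imported lazily so the helper can also be exercised with a stub client.
        import boto3

        client = boto3.client("s3")
    try:
        # head_bucket fails if the bucket is missing or access is denied.
        client.head_bucket(Bucket=bucket)
    except Exception as error:  # botocore raises ClientError for 403/404 responses
        print(f"Cannot reach bucket {bucket!r}: {error}")
        return False
    return True
```

Calling can_reach_bucket('<your_s3_bucket_name>') should return True once boto3 is configured correctly.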
Identify your Data Context Expectations Store
In your great_expectations.yml, look for the following lines. The configuration tells Great Expectations to look for Expectations in a store called expectations_store. The base_directory for expectations_store is set to expectations/ by default.
    expectations_store_name: expectations_store

    stores:
      expectations_store:
        class_name: ExpectationsStore
        store_backend:
          class_name: TupleFilesystemStoreBackend
          base_directory: expectations/
Update your configuration file to include a new store for Expectations on S3.
In our case, the name is set to expectations_S3_store, but it can be any name you like. We also need to make some changes to the store_backend settings: class_name will be set to TupleS3StoreBackend, bucket will be set to the name of your S3 bucket, and prefix will be set to the folder where Expectation files will be located.
    expectations_store_name: expectations_S3_store

    stores:
      expectations_S3_store:
        class_name: ExpectationsStore
        store_backend:
          class_name: TupleS3StoreBackend
          bucket: '<your_s3_bucket_name>'
          prefix: '<your_s3_bucket_folder_name>'
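To make the role of prefix concrete, here is a small sketch of where a single Expectation file is expected to land in the bucket. It assumes the simple layout used in this guide, with one <name>.json object directly under the prefix folder; the function expectation_object_key is a hypothetical illustration, not the library's own code.

```python
import posixpath


def expectation_object_key(prefix, suite_name):
    """Return the S3 object key assumed for one Expectation JSON file.

    Assumption: each file is stored as <prefix>/<suite_name>.json.
    """
    # S3 keys always use forward slashes, so posixpath is used instead of os.path.
    return posixpath.join(prefix.strip("/"), f"{suite_name}.json")


print(expectation_object_key("<your_s3_bucket_folder_name>", "exp1"))
# prints: <your_s3_bucket_folder_name>/exp1.json
```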
Copy existing Expectation JSON files to the S3 bucket. (This step is optional).
One way to copy Expectations into Amazon S3 is by using the aws s3 sync command. As mentioned earlier, the base_directory is set to expectations/ by default. In the example below, two Expectations, exp1 and exp2, are copied to Amazon S3. Your output should look something like this:
    aws s3 sync '<base_directory>' s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'

    upload: ./exp1.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/exp1.json
    upload: ./exp2.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/exp2.json
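If you would rather copy the files from Python, the same step can be sketched with boto3. This is an illustrative alternative rather than an equivalent of aws s3 sync: it uploads every JSON file unconditionally instead of only new or changed ones. The helper name upload_expectations is hypothetical; the only boto3 call it uses is the S3 client's upload_file.

```python
from pathlib import Path


def upload_expectations(base_directory, bucket, prefix, client=None):
    """Upload every *.json file under base_directory to s3://bucket/prefix/.

    Returns the list of object keys that were written.
    """
    if client is None:
        # Imported lazily so the helper can also be exercised with a stub client.
        import boto3

        client = boto3.client("s3")

    base = Path(base_directory)
    folder = prefix.strip("/")
    uploaded = []
    for path in sorted(base.rglob("*.json")):
        # Keep the path relative to base_directory, using forward slashes for S3.
        relative = path.relative_to(base).as_posix()
        key = f"{folder}/{relative}" if folder else relative
        client.upload_file(str(path), bucket, key)
        print(f"upload: {path} to s3://{bucket}/{key}")
        uploaded.append(key)
    return uploaded
```

For example, upload_expectations('<base_directory>', '<your_s3_bucket_name>', '<your_s3_bucket_folder_name>') would upload exp1.json and exp2.json from the example above.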
Confirm that the new Expectations store has been added by running great_expectations store list.
Notice the output contains two Expectation stores: the original expectations_store on the local filesystem and the expectations_S3_store we just configured. This is ok, since Great Expectations will look for Expectations in the S3 bucket as long as we set the expectations_store_name variable to expectations_S3_store.
    great_expectations store list

    - name: expectations_store
      class_name: ExpectationsStore
      store_backend:
        class_name: TupleFilesystemStoreBackend
        base_directory: expectations/

    - name: expectations_S3_store
      class_name: ExpectationsStore
      store_backend:
        class_name: TupleS3StoreBackend
        bucket: '<your_s3_bucket_name>'
        prefix: '<your_s3_bucket_folder_name>'
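The way the active store is selected when several are defined can be sketched in a few lines of Python. The dictionary below simply mirrors the YAML configuration from this guide, and the function active_expectations_backend is a hypothetical illustration of the lookup, not Great Expectations code.

```python
# A plain-dict mirror of the relevant part of great_expectations.yml.
config = {
    "expectations_store_name": "expectations_S3_store",
    "stores": {
        "expectations_store": {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleFilesystemStoreBackend",
                "base_directory": "expectations/",
            },
        },
        "expectations_S3_store": {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": "<your_s3_bucket_name>",
                "prefix": "<your_s3_bucket_folder_name>",
            },
        },
    },
}


def active_expectations_backend(config):
    """Return (store name, backend settings) for the store named by expectations_store_name."""
    name = config["expectations_store_name"]
    return name, config["stores"][name]["store_backend"]


name, backend = active_expectations_backend(config)
print(name, backend["class_name"])
# prints: expectations_S3_store TupleS3StoreBackend
```

Both stores stay in the stores section, but only the one named by expectations_store_name is looked up.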
Confirm that Expectations can be accessed from Amazon S3 by running great_expectations suite list.
If you followed Step 4 (copying the existing Expectation JSON files), the output should include the two Expectations we copied to Amazon S3: exp1 and exp2. If you did not copy Expectations to the new store, you will see a message saying no Expectations were found.
    great_expectations suite list

    2 Expectation Suites found:
     - exp1
     - exp2
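As an independent cross-check, you can also list the JSON objects under the prefix directly with boto3. This is a minimal sketch with a hypothetical helper name, list_suite_files; it assumes the Expectation files sit under the configured prefix as <name>.json and uses the boto3 list_objects_v2 paginator.

```python
def list_suite_files(bucket, prefix, client=None):
    """Return the names (without .json) of the JSON objects stored under the prefix."""
    if client is None:
        # Imported lazily so the helper can also be exercised with a stub client.
        import boto3

        client = boto3.client("s3")

    folder = prefix.strip("/")
    folder = f"{folder}/" if folder else ""
    names = []
    # The paginator follows continuation tokens, so more than 1000 objects are handled.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=folder):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith(".json"):
                names.append(key[len(folder):-len(".json")])
    return sorted(names)
```

With the two files from this guide in place, list_suite_files('<your_s3_bucket_name>', '<your_s3_bucket_folder_name>') would be expected to return exp1 and exp2.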