Deploy Great Expectations in hosted environments without a file system
The components in the great_expectations.yml file define the Validation Results Stores, Data Source connections, and Data Docs hosts for a Data Context. These components might be inaccessible in hosted environments, such as Databricks, Amazon EMR, and Google Cloud Composer. The information provided here is intended to help you use Great Expectations in hosted environments.
Configure your Data Context
To use code to create a Data Context, see Instantiate an Ephemeral Data Context.
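As a rough illustration of what creating a Data Context in code can look like, the following sketch builds an in-memory (Ephemeral) Data Context with no great_expectations.yml file. It assumes a pre-1.0 release of Great Expectations (roughly 0.15 through 0.18) in which `EphemeralDataContext`, `DataContextConfig`, and `InMemoryStoreBackendDefaults` are importable from the paths shown. The function name `build_ephemeral_context` is hypothetical. Check the import paths against the version installed in your environment.

```python
def build_ephemeral_context():
    """Return a Data Context that keeps all of its Stores in memory.

    Assumption: a pre-1.0 Great Expectations release where these import
    paths exist. The imports live inside the function so that the module
    can be loaded even before the library is installed on the cluster.
    """
    from great_expectations.data_context import EphemeralDataContext
    from great_expectations.data_context.types.base import (
        DataContextConfig,
        InMemoryStoreBackendDefaults,
    )

    # In-memory store defaults mean nothing is written to a file system.
    project_config = DataContextConfig(
        store_backend_defaults=InMemoryStoreBackendDefaults()
    )
    return EphemeralDataContext(project_config=project_config)
```

Calling `context = build_ephemeral_context()` at the start of a script or notebook gives you a context that lasts only for that Python session.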
To configure a Data Context for a specific environment, see one of the following resources:
- How to instantiate a Data Context on an EMR Spark cluster
- How to use Great Expectations in Databricks
Create Expectation Suites and add Expectations
To add a Data Source and an Expectation Suite, see How to connect to a PostgreSQL database.
To add Expectations to your Suite individually, use the following code:
validator.expect_column_values_to_not_be_null("my_column")
validator.save_expectation_suite(discard_failed_expectations=False)
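The two calls above assume that a `validator` already exists. One possible way to obtain it is sketched below, assuming a pre-1.0 Data Context that provides `add_or_update_expectation_suite` and `get_validator`, and a `batch_request` that you have already built from your Data Source. The helper name `add_not_null_expectation` and its parameters are hypothetical.

```python
def add_not_null_expectation(context, batch_request, suite_name, column):
    """Attach a not-null Expectation for `column` to the named suite.

    Assumptions: `context` is a pre-1.0 Data Context and `batch_request`
    was built from an existing Data Source. Confirm how your version's
    add_or_update_expectation_suite treats a suite that already exists.
    """
    # Register the suite name with the context's Expectation Store.
    context.add_or_update_expectation_suite(expectation_suite_name=suite_name)

    # A Validator ties a batch of data to an Expectation Suite.
    validator = context.get_validator(
        batch_request=batch_request,
        expectation_suite_name=suite_name,
    )

    # Same two calls as in the snippet above.
    validator.expect_column_values_to_not_be_null(column)
    validator.save_expectation_suite(discard_failed_expectations=False)
    return validator
```

With in-memory Stores, the saved suite is available only until the Python session ends.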
To configure your Expectation store to load a Suite at a later time, see Configure Expectation Stores.
Run validation
To create and run a Checkpoint in code, see How to create a new Checkpoint. In a hosted environment, you cannot store the Checkpoint for repeated use across Python sessions, but you can recreate it each time your scripts run.
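Recreating the Checkpoint on every run can be as small as one helper that your script calls each time it starts. The sketch below assumes a pre-1.0 Data Context that provides `add_or_update_checkpoint`, plus a `batch_request` and an Expectation Suite name that you have already defined. The helper name `run_validation` and the default Checkpoint name `hosted_checkpoint` are hypothetical.

```python
def run_validation(context, batch_request, suite_name,
                   checkpoint_name="hosted_checkpoint"):
    """Rebuild a Checkpoint for this session, run it, and report success.

    Assumptions: `context` is a pre-1.0 Data Context exposing
    add_or_update_checkpoint, and the suite named `suite_name` exists.
    """
    # The Checkpoint lives only in this session's Stores, so it is
    # defined again every time the script runs.
    checkpoint = context.add_or_update_checkpoint(
        name=checkpoint_name,
        validations=[
            {
                "batch_request": batch_request,
                "expectation_suite_name": suite_name,
            }
        ],
    )
    result = checkpoint.run()
    # True only if every validation in the run passed.
    return result.success
```

A script might call `run_validation(context, batch_request, "my_suite")` and fail the job when it returns `False`.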
Use Data Docs
To build and view Data Docs in your environment, see Options for hosting Data Docs.