How to configure DataContext components using test_yaml_config
test_yaml_config is a convenience method for configuring the moving parts of a Great Expectations deployment. It allows you to quickly test out configs for Datasources and Stores. For many deployments of Great Expectations, these components (plus Expectations) are the only ones you’ll need.
Prerequisites: This how-to guide assumes you have already:
test_yaml_config is primarily intended for use within a notebook, where you can iterate through an edit-run-check loop in seconds.
Steps
Instantiate a DataContext
Create a new Jupyter Notebook and instantiate a DataContext by running the following lines:
    import great_expectations as gx

    context = gx.get_context()
Create or copy a yaml config
You can create your own, or copy an example. For this example, we’ll demonstrate using a Datasource that connects to postgresql.
    config = """
    class_name: SimpleSqlalchemyDatasource
    credentials:
      drivername: postgresql
      username: postgres
      password: ""
      host: localhost
      port: 5432
      database: test_ci
    introspection:
      whole_table: {}
    """
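Because the edit-run-check loop usually means changing one or two values at a time, it can help to generate the YAML string from a small function. The following is a minimal sketch under that assumption: build_postgres_config is a hypothetical helper (not part of Great Expectations) that only reproduces the example config above with its values filled in. It does no YAML escaping, so values containing quotes or colons would need extra care.

```python
def build_postgres_config(host="localhost", port=5432, database="test_ci",
                          username="postgres", password=""):
    """Return a YAML string shaped like the example config above.

    Hypothetical helper for illustration only; values are inserted verbatim,
    with no YAML escaping.
    """
    return f"""
class_name: SimpleSqlalchemyDatasource
credentials:
  drivername: postgresql
  username: {username}
  password: "{password}"
  host: {host}
  port: {port}
  database: {database}
introspection:
  whole_table: {{}}
"""


# Same values as the example config; change one argument and re-run to iterate.
config = build_postgres_config(port=5432)
print(config)
```

The resulting string can be passed as yaml_config in the next step, exactly like the hand-written one.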
Run context.test_yaml_config.
    context.test_yaml_config(
        name="my_datasource",
        yaml_config=config,
    )
When executed, test_yaml_config will instantiate the component and run through a self_check procedure to verify that the component works as expected. In the case of a Datasource, this means
confirming that the connection works,
gathering a list of available DataAssets (e.g. tables in SQL; files or folders in a filesystem), and
verifying that it can successfully fetch at least one Batch from the source.
The output will look something like this:
    Attempting to instantiate class from config...
        Instantiating as a Datasource, since class_name is SimpleSqlalchemyDatasource
        Successfully instantiated SimpleSqlalchemyDatasource

    Execution engine: SqlAlchemyExecutionEngine
    Data connectors:
        whole_table : InferredAssetSqlDataConnector

        Available data_asset_names (3 of 14440):
            expect_table_row_count_to_equal_other_table_data_1 (1 of 1): [{}]
            expect_table_row_count_to_equal_other_table_data_2 (1 of 1): [{}]
            expect_table_row_count_to_equal_other_table_data_3 (1 of 1): [{}]

        Unmatched data_references (0 of 0): []

        Choosing an example data reference...
            Reference chosen: {}

            Fetching batch data...

            Showing 5 rows
       c1 c2    c3   c4
    0   4  a  None  4.0
    1   5  b  None  3.0
    2   6  c  None  3.5
    3   7  d  None  1.2

    <great_expectations.datasource.simple_sqlalchemy_datasource.SimpleSqlalchemyDatasource at 0x12c1e4d50>
If something about your configuration wasn’t set up correctly, test_yaml_config will raise an error. Whenever possible, test_yaml_config provides helpful warnings and error messages. It can’t solve every problem, but it can solve many. For example, when the database server cannot be reached:

    Attempting to instantiate class from config...
        Instantiating as a Datasource, since class_name is SimpleSqlalchemyDatasource
    ---------------------------------------------------------------------------
    OperationalError                          Traceback (most recent call last)
    ~/anaconda2/envs/py3/lib/python3.7/site-packages/sqlalchemy/engine/base.py in _wrap_pool_connect(self, fn, connection)
       2338         try:
    -> 2339             return fn()
       2340         except dialect.dbapi.Error as e:
    ...
    OperationalError: (psycopg2.OperationalError) could not connect to server: Connection refused
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5433?
    could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5433?

    (Background on this error at: http://sqlalche.me/e/13/e3q8)
Iterate as necessary.
From here, iterate by editing your config and re-running test_yaml_config, adding config blocks for additional introspection, data assets, sampling, etc. Please see <doc> for options and ideas.
(Optional:) Test additional methods.
Note that when test_yaml_config runs successfully, it adds the specified Datasource to your DataContext. This means that you can also test other methods, such as context.get_validator, right within your notebook:

    validator = context.get_validator(
        datasource_name="my_datasource",
        data_connector_name="whole_table",
        data_asset_name="my_table",
        create_expectation_suite_with_name="my_expectation_suite",
    )
    validator.expect_column_values_to_be_in_set("c1", [4, 5, 6])
Save the config.
Once you are satisfied with your config, you can make it a permanent part of your Great Expectations setup by copying it into the appropriate section of your great_expectations/great_expectations.yml file.

    datasources:
      my_datasource:
        class_name: SimpleSqlalchemyDatasource
        credentials:
          drivername: postgresql
          username: postgres
          password: ""
          host: localhost
          port: 5432
          database: test_ci
        introspection:
          whole_table: {}
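Copying the config by hand means re-indenting it under the datasources key and the Datasource name. As a minimal sketch of that step, the hypothetical helper below (as_datasources_block is not part of Great Expectations) nests a tested config string using only the standard library. It prints text for you to paste; it does not edit great_expectations.yml, and if that file already has a datasources section you would merge the entry into it rather than add a second key.

```python
import textwrap


def as_datasources_block(name, config):
    """Nest a tested YAML config string under datasources -> name."""
    # Drop the blank lines left by the triple-quoted string, then indent
    # every remaining line by four spaces (two levels of two-space YAML).
    body = textwrap.indent(config.strip("\n"), "    ")
    return f"datasources:\n  {name}:\n{body}\n"


example = "class_name: SimpleSqlalchemyDatasource\nintrospection:\n  whole_table: {}\n"
print(as_datasources_block("my_datasource", example))
```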
Check your modified config.
In a fresh notebook, test your edited config file by re-instantiating your DataContext:
    import great_expectations as gx

    # Re-create the DataContext so it reads the edited great_expectations.yml.
    context = gx.get_context()

    validator = context.get_validator(
        datasource_name="my_datasource",
        data_connector_name="whole_table",
        data_asset_name="my_table",
        create_expectation_suite_with_name="my_expectation_suite",
    )
    validator.expect_column_values_to_be_in_set("c1", [4, 5, 6])