How to load a database table, view, or query result as a batch

This guide shows how to get a batch of data that Great Expectations can validate from a SQL database. When you validate data using SQL, all compute is managed by your database, and only Validation Results are returned.

Prerequisites – this how-to guide assumes you have already:

  • Set up a working deployment of Great Expectations

  • Configured a SQL datasource

  • Identified a table name, view name, or query that you would like to use as the data to validate.

  • Optionally, configured a Batch Kwargs Generator to support inspecting your database and dynamically building Batch Kwargs.

Steps

  1. Construct Batch Kwargs that describe the data you plan to validate.

    For example, if your configured datasource is named “my_db” and you plan to validate data in a view called “my_view” located in the “my_schema” schema:

    • To manually build table or view-based Batch Kwargs, create a dictionary:

      batch_kwargs = {
          "datasource": "my_db",
          "schema": "my_schema",  # schema is optional; default schema will be used if it is omitted
          "table": "my_view"  # note that the "table" key is used even to validate a view
      }
      
    • To use a Batch Kwargs Generator, call the build_batch_kwargs method:

      For example, if your configured generator is called my_generator and you wish to access the my_view asset, you can call:

      batch_kwargs = context.build_batch_kwargs("my_db", "my_generator", "my_view")
      
    • To manually build query-based Batch Kwargs, create a dictionary:

      batch_kwargs = {
          "datasource": "my_db",
          "query": "SELECT * FROM my_schema.my_table WHERE '1988-01-01' <= date AND date < '1989-01-01';
      }
      
    • To use a Query Batch Kwargs Generator, call the build_batch_kwargs method:

      batch_kwargs = context.build_batch_kwargs(
          "my_db",
          "queries",
          "movies_by_date",
          query_parameters={
              "start": "1988-01-01",
              "end": "1989-01-01"
          }
      )
      
  2. Obtain an Expectation Suite to use to validate your batch.

    expectation_suite_name = "npi.warning"  # choose an appropriate name for your suite
    

    If you have not already created a suite, you can do so now.

    # Note, you can add the "overwrite_existing" flag to the below command if the suite
    # exists but you would like to replace it.
    context.create_expectation_suite(expectation_suite_name)
    
  3. Get the batch to validate.

    batch = context.get_batch(
        batch_kwargs=batch_kwargs,
        expectation_suite_name=expectation_suite_name
    )
    

Now that you have a Batch, you can use it to create Expectations or validate the data.

Additional Notes

  • For more information on configuring a Batch Kwargs generator, please see the relevant guides. The above code snippets use the following configuration:

    my_db:
      class_name: SqlAlchemyDatasource
      credentials: ${rds_movies_db}
      data_asset_type:
        class_name: SqlAlchemyDataset
        module_name: great_expectations.dataset
      batch_kwargs_generators:
        tables:
          class_name: TableBatchKwargsGenerator
        queries:
          class_name: QueryBatchKwargsGenerator
          query_store_backend:
            class_name: TupleFilesystemStoreBackend
            filepath_suffix: .sql
            base_directory: queries
    
    great_expectations/
        queries/
            movies_by_date.sql
    
    SELECT * FROM movies WHERE '$start'::date <= release_date AND release_date <= '$end'::date;