How to load a database table, view, or query result as a batch

This guide shows how to get a batch of data that Great Expectations can validate from a SQL database. When you validate data using SQL, all compute is managed by your database, and only Validation Results are returned.

Prerequisites – this how-to guide assumes you have already:

Steps

  1. Construct Batch Kwargs that describe the data you plan to validate.

    For example, if your configured datasource is named “my_db” and you plan to validate data in a view called “my_view” located in the “my_schema” schema:

    • To manually build table or view-based Batch Kwargs, create a dictionary:

      batch_kwargs = {
          "datasource": "my_db",
          "schema": "my_schema",  # schema is optional; default schema will be used if it is omitted
          "table": "my_view"  # note that the "table" key is used even to validate a view
      }
      
    • To use a Batch Kwargs Generator, call the build_batch_kwargs method:

      For example, if your configured generator is called my_generator and you wish to access the my_view asset, you can call:

      batch_kwargs = context.build_batch_kwargs("my_db", "my_generator", "my_view")
      
    • To manually build query-based Batch Kwargs, create a dictionary:

      batch_kwargs = {
          "datasource": "my_db",
          "query": "SELECT * FROM my_schema.my_table WHERE '1988-01-01' <= date AND date < '1989-01-01';
      }
      
    • To use a Query Batch Kwargs Generator, call the build_batch_kwargs method:

      batch_kwargs = context.build_batch_kwargs(
          "my_db",
          "queries",
          "movies_by_date",
          query_parameters={
              "start": "1988-01-01",
              "end": "1989-01-01"
          }
      )
      
  2. Obtain an Expectation Suite to use to validate your batch.

    expectation_suite_name = "npi.warning"  # choose an appropriate name for your suite
    

    If you have not already created a suite, you can do so now.

    # Note, you can add the "overwrite_existing" flag to the below command if the suite
    # exists but you would like to replace it.
    context.create_expectation_suite(expectation_suite_name)
    
  3. Get the batch to validate.

    batch = context.get_batch(
        batch_kwargs=batch_kwargs,
        expectation_suite_name=expectation_suite_name
    )
    

Now that you have a Batch, you can use it to create Expectations or validate the data.

Additional Notes

  • If you are using Snowflake, and you have lowercase table or column names: * If you are loading your batch with a table, you can use pass “use_quoted_name”:True into your batch_kwargs dictionary. This will use the SQL Alchemy quoted_name method to ensure case sensitivity for your table and column names. * If you are loading your batch with a query, if you have lowercase column names, you still need to pass “use_quoted_name”:True into your batch_kwargs dictionary. You will also need to wrap your query in single quotes, and your table or column name in double quotes like so:

    batch_kwargs = {
        ...
        "use_quoted_name": True,
        "query: 'select "lowercase_column_one", "lowercase_column_two" from "lowercase_table_name" limit 100'
        ...
    }
    
  • For more information on configuring a Batch Kwargs generator, please see the relevant guides. The above code snippets use the following configuration:

    my_db:
      class_name: SqlAlchemyDatasource
      credentials: ${rds_movies_db}
      data_asset_type:
        class_name: SqlAlchemyDataset
        module_name: great_expectations.dataset
      batch_kwargs_generators:
        tables:
          class_name: TableBatchKwargsGenerator
        queries:
          class_name: QueryBatchKwargsGenerator
          query_store_backend:
            class_name: TupleFilesystemStoreBackend
            filepath_suffix: .sql
            base_directory: queries
    
    great_expectations/
        queries/
            movies_by_date.sql
    
    SELECT * FROM movies WHERE '$start'::date <= release_date AND release_date <= '$end'::date;