Module Contents



great_expectations.datasource.datasource.default_flow_style = False
class great_expectations.datasource.datasource.LegacyDatasource(name, data_context=None, data_asset_type=None, batch_kwargs_generators=None, **kwargs)

A Datasource connects to a compute environment and one or more storage environments and produces batches of data that Great Expectations can validate in that compute environment.

Each Datasource provides Batches connected to a specific compute environment, such as a SQL database, a Spark cluster, or a local in-memory Pandas DataFrame.

Datasources use Batch Kwargs to specify instructions for how to access data from relevant sources such as an existing object from a DAG runner, a SQL database, an S3 bucket, or a local filesystem.

To bridge the gap between those worlds, Datasources interact closely with batch kwargs generators, which are aware of a source of data and can produce identifying information, called “batch_kwargs”, that datasources can use to get individual batches of data. Generators add flexibility in how data is obtained, for example through time-based partitioning, downsampling, or other techniques appropriate for the datasource.

For example, a batch kwargs generator could produce a SQL query that logically represents “rows in the Events table with a timestamp on February 7, 2012,” which a SqlAlchemyDatasource could use to materialize a SqlAlchemyDataset corresponding to that batch of data and ready for validation.
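The date-partitioned example above can be sketched as a small function that turns a day into batch kwargs carrying a SQL query. This is an illustration only: the function name, the `events` table, the `event_timestamp` column, and the use of a `query` key are assumptions, not details confirmed by this page.

```python
from datetime import date, timedelta


def events_batch_kwargs_for_day(day: date) -> dict:
    """Hypothetical generator step: batch kwargs for one day of the Events table.

    Assumes a table named "events" with an "event_timestamp" column and a
    datasource that reads the SQL text from a "query" key.
    """
    start = day.isoformat()
    end = (day + timedelta(days=1)).isoformat()
    # Half-open interval [start, end) covers every timestamp on `day`.
    query = (
        "SELECT * FROM events "
        f"WHERE event_timestamp >= '{start}' AND event_timestamp < '{end}'"
    )
    return {"query": query}


print(events_batch_kwargs_for_day(date(2012, 2, 7))["query"])
```

Interpolating the dates is acceptable here only because they come from `date.isoformat()`; values supplied by users would call for bound query parameters instead.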

Opinionated DAG managers such as Airflow, dbt, and Dagster can also act as datasources and/or as batch kwargs generators for a more generic datasource.

When adding custom expectations by subclassing an existing DataAsset type, use the data_asset_type parameter to configure the datasource to load and return DataAssets of the custom type.
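A ClassConfig dictionary identifies a class by module and class name. The sketch below shows one plausible way such a dictionary could be resolved with `importlib`; `load_class_from_config` is a hypothetical helper rather than Great Expectations' own loader, and the `module_name`/`class_name` keys are assumed to mirror the parameters of `build_configuration`.

```python
import importlib


def load_class_from_config(class_config: dict) -> type:
    """Hypothetical helper: resolve {"module_name": ..., "class_name": ...} to a class.

    Imports the module, then looks up the class attribute on it.
    """
    module = importlib.import_module(class_config["module_name"])
    try:
        return getattr(module, class_config["class_name"])
    except AttributeError as exc:
        raise ImportError(
            f"{class_config['class_name']} not found in {class_config['module_name']}"
        ) from exc


# A standard-library class stands in for a custom DataAsset subclass.
asset_type = load_class_from_config({"module_name": "collections", "class_name": "OrderedDict"})
print(asset_type.__name__)  # OrderedDict
```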

Feature Maturity

Datasource - S3 - How-to Guide
Support for connecting to Amazon Web Services S3 as an external datasource.
Maturity: Production
API Stability: Medium
Implementation Completeness: Complete
Unit Test Coverage: Complete
Integration Infrastructure/Test Coverage: None
Documentation Completeness: Minimal/Spotty
Bug Risk: Low
Datasource - Filesystem - How-to Guide
Support for using a mounted filesystem as an external datasource.
Maturity: Production
API Stability: Medium
Implementation Completeness: Complete
Unit Test Coverage: Complete
Integration Infrastructure/Test Coverage: Partial
Documentation Completeness: Partial
Bug Risk: Low (Moderate for Windows users because of path issues)
Datasource - GCS - How-to Guide
Support for Google Cloud Storage as an external datasource.
Maturity: Experimental
API Stability: Medium (supported via native ‘gs://’ syntax in Pandas and PySpark; medium because we expect configuration to evolve)
Implementation Completeness: Medium (works via passthrough, not via CLI)
Unit Test Coverage: Minimal
Integration Infrastructure/Test Coverage: Minimal
Documentation Completeness: Minimal
Bug Risk: Moderate
Datasource - Azure Blob Storage - How-to Guide
Support for Microsoft Azure Blob Storage as an external datasource.
Maturity: In Roadmap (Sub-Experimental - “Not Impossible”)
API Stability: N/A (supported on Databricks Spark via ‘wasb://’ / ‘wasbs://’ URL; requires local download first for Pandas)
Implementation Completeness: Minimal
Unit Test Coverage: N/A
Integration Infrastructure/Test Coverage: N/A
Documentation Completeness: Minimal
Bug Risk: Unknown
classmethod from_configuration(cls, **kwargs)

Build a new datasource from a configuration dictionary.

Parameters

**kwargs – configuration key-value pairs

Returns

the newly-created datasource

Return type

datasource (Datasource)
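A minimal sketch of the from_configuration pattern, assuming each configuration key maps directly onto a constructor parameter. `ToyDatasource` is hypothetical and only stands in for a real Datasource subclass.

```python
class ToyDatasource:
    """Hypothetical stand-in for a Datasource, used to illustrate from_configuration."""

    def __init__(self, name, data_asset_type=None, batch_kwargs_generators=None):
        self.name = name
        self.data_asset_type = data_asset_type
        self.batch_kwargs_generators = batch_kwargs_generators or {}

    @classmethod
    def from_configuration(cls, **kwargs):
        # Every configuration key is forwarded unchanged as a constructor kwarg.
        return cls(**kwargs)


config = {"name": "events_db", "data_asset_type": {"class_name": "SqlAlchemyDataset"}}
ds = ToyDatasource.from_configuration(**config)
print(ds.name)  # events_db
```

With this design, an unexpected configuration key raises a `TypeError` from the constructor, so typos surface as soon as the datasource is built.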

classmethod build_configuration(cls, class_name, module_name='great_expectations.datasource', data_asset_type=None, batch_kwargs_generators=None, **kwargs)

Build a full configuration object for a datasource, potentially including batch kwargs generators with defaults.

Parameters

  • class_name – The name of the class for which to build the config

  • module_name – The name of the module in which the datasource class is located

  • data_asset_type – A ClassConfig dictionary

  • batch_kwargs_generators – BatchKwargGenerators configuration dictionary

  • **kwargs – Additional kwargs to be part of the datasource constructor’s initialization

Returns
A complete datasource configuration.
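The function below sketches the general shape of a configuration that `build_configuration` might assemble. The helper name, the key layout, and the example class names are assumptions for illustration, and unlike the documented method this sketch does not fill in any default batch kwargs generators.

```python
def build_datasource_config(
    class_name,
    module_name="great_expectations.datasource",
    data_asset_type=None,
    batch_kwargs_generators=None,
    **kwargs,
):
    """Hypothetical helper: assemble a datasource configuration dictionary.

    Unlike the documented method, this sketch adds no default generators.
    """
    config = {"class_name": class_name, "module_name": module_name}
    if data_asset_type is not None:
        config["data_asset_type"] = data_asset_type
    if batch_kwargs_generators is not None:
        config["batch_kwargs_generators"] = batch_kwargs_generators
    # Any remaining keyword arguments are kept for the datasource constructor.
    config.update(kwargs)
    return config


cfg = build_datasource_config(
    "PandasDatasource",
    batch_kwargs_generators={
        "subdir_reader": {
            "class_name": "SubdirReaderBatchKwargsGenerator",
            "base_directory": "data/",
        }
    },
)
print(sorted(cfg))  # ['batch_kwargs_generators', 'class_name', 'module_name']
```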

property name(self)

Property for datasource name

property config(self)

property data_context(self)

Property for attached DataContext


Build batch kwargs generator objects from the datasource configuration.



add_batch_kwargs_generator(self, name, class_name, **kwargs)

Add a BatchKwargGenerator to the datasource.

Parameters

  • name (str) – the name of the new BatchKwargGenerator to add

  • class_name – class of the BatchKwargGenerator to add

  • kwargs – additional keyword arguments will be passed directly to the new BatchKwargGenerator’s constructor

Returns
BatchKwargGenerator (BatchKwargGenerator)
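One way to picture `add_batch_kwargs_generator` is a registry that resolves `class_name` to a generator class, forwards the remaining keyword arguments to that class's constructor, and stores the result under `name`. Everything in this sketch (`GlobGenerator`, `GENERATOR_CLASSES`, `ToyGeneratorHost`) is hypothetical; the lookup table in particular stands in for whatever class-loading mechanism the library actually uses.

```python
class GlobGenerator:
    """Hypothetical batch kwargs generator that only remembers a glob pattern."""

    def __init__(self, name, pattern="*.csv"):
        self.name = name
        self.pattern = pattern


# Hypothetical lookup table from class_name strings to generator classes.
GENERATOR_CLASSES = {"GlobGenerator": GlobGenerator}


class ToyGeneratorHost:
    """Hypothetical datasource-like object that owns named generators."""

    def __init__(self):
        self._generators = {}

    def add_batch_kwargs_generator(self, name, class_name, **kwargs):
        # An unknown class_name raises KeyError from the lookup table.
        generator_class = GENERATOR_CLASSES[class_name]
        # Extra kwargs go straight to the generator's constructor.
        generator = generator_class(name=name, **kwargs)
        self._generators[name] = generator
        return generator

    def get_batch_kwargs_generator(self, name):
        return self._generators[name]


host = ToyGeneratorHost()
gen = host.add_batch_kwargs_generator("daily", "GlobGenerator", pattern="events_*.csv")
print(gen.pattern)  # events_*.csv
```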

_build_batch_kwargs_generator(self, **kwargs)

Build a BatchKwargGenerator using the provided configuration and return the newly-built generator.

get_batch_kwargs_generator(self, name)

Get the (named) BatchKwargGenerator from a datasource.

Parameters

name (str) – name of BatchKwargGenerator (default value is ‘default’)

Returns
BatchKwargGenerator (BatchKwargGenerator)


List the currently configured BatchKwargGenerators for this datasource.

Returns
each dictionary includes “name” and “type” keys

Return type


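The list of generators described above can be pictured as one small dictionary per configured generator. This hypothetical helper assumes each generator configuration stores its class under `class_name` and reports that value as the `type` key; it is not the library's implementation.

```python
def list_generator_summaries(generator_configs: dict) -> list:
    """Hypothetical helper: summarize generator configs as name/type dicts.

    Assumes each config stores its generator class under "class_name".
    """
    return [
        {"name": name, "type": config["class_name"]}
        for name, config in generator_configs.items()
    ]


configs = {
    "subdir_reader": {"class_name": "SubdirReaderBatchKwargsGenerator", "base_directory": "data/"},
    "queries": {"class_name": "QueryBatchKwargsGenerator"},
}
print(list_generator_summaries(configs))
```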
process_batch_parameters(self, limit=None, dataset_options=None)

Use datasource-specific configuration to translate any batch parameters into batch kwargs at the datasource level.

Parameters

  • limit (int) – a parameter all datasources must accept to allow limiting a batch to a smaller number of rows.

  • dataset_options (dict) – a set of kwargs that will be passed to the constructor of a dataset built using these batch_kwargs

Returns
Result will include both parameters passed via argument and configured parameters.

Return type


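A hedged sketch of the merge that `process_batch_parameters` describes: start from datasource-level batch kwargs, then layer in the `limit` and `dataset_options` given at call time. The function name `merge_batch_parameters`, the decision that call-time options override configured ones, and the `caching` option used in the example are assumptions.

```python
import copy


def merge_batch_parameters(configured_batch_kwargs, limit=None, dataset_options=None):
    """Hypothetical sketch: merge call-time parameters into configured batch kwargs."""
    # Deep-copy so repeated calls never mutate the datasource-level defaults.
    batch_kwargs = copy.deepcopy(configured_batch_kwargs)
    if limit is not None:
        batch_kwargs["limit"] = limit
    if dataset_options is not None:
        merged = batch_kwargs.get("dataset_options", {})
        # Call-time options win over configured ones (an assumption of this sketch).
        merged.update(dataset_options)
        batch_kwargs["dataset_options"] = merged
    return batch_kwargs


configured = {"dataset_options": {"caching": True}}
result = merge_batch_parameters(configured, limit=100, dataset_options={"caching": False})
print(result)  # {'dataset_options': {'caching': False}, 'limit': 100}
```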
abstract get_batch(self, batch_kwargs, batch_parameters=None)

Get a batch of data from the datasource.

Parameters

  • batch_kwargs – the BatchKwargs to use to construct the batch

  • batch_parameters – optional parameters to store as the reference description of the batch. They should reflect parameters that would provide the passed BatchKwargs.



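Because `get_batch` is abstract, each concrete datasource supplies its own implementation. The sketch below uses Python's `abc` module to show that contract with a hypothetical in-memory subclass; the `rows` batch kwarg and the plain dictionary returned as the "batch" are illustrative assumptions, not Great Expectations' Batch type.

```python
from abc import ABC, abstractmethod


class SketchDatasource(ABC):
    """Hypothetical base class mirroring the abstract get_batch contract."""

    @abstractmethod
    def get_batch(self, batch_kwargs, batch_parameters=None):
        """Return a batch built from batch_kwargs."""


class InMemoryDatasource(SketchDatasource):
    """Hypothetical subclass that serves rows already held in memory."""

    def get_batch(self, batch_kwargs, batch_parameters=None):
        rows = batch_kwargs["rows"]
        limit = batch_kwargs.get("limit")
        if limit is not None:
            rows = rows[:limit]
        # Keep batch_parameters next to the data as the batch's reference description.
        return {
            "data": rows,
            "batch_kwargs": batch_kwargs,
            "batch_parameters": batch_parameters,
        }


source = InMemoryDatasource()
batch = source.get_batch({"rows": [1, 2, 3, 4], "limit": 2}, batch_parameters={"limit": 2})
print(batch["data"])  # [1, 2]
```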
get_available_data_asset_names(self, batch_kwargs_generator_names=None)

Returns a dictionary of data_asset_names that the specified batch kwargs generators can provide. Note that some batch kwargs generators may not be capable of describing specific named data assets, and some (such as filesystem glob batch kwargs generators) require the user to configure data asset names.

Parameters

batch_kwargs_generator_names – the BatchKwargGenerators for which to get available data asset names.

Returns

{
  generator_name: {
    names: [ (data_asset_1, data_asset_1_type), (data_asset_2, data_asset_2_type) ... ]
  }
}

Return type

dictionary consisting of sets of generator assets available for the specified generators
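A hypothetical sketch that produces the return shape documented above from a plain mapping of generator names to `(asset_name, asset_type)` tuples. The helper name, the input mapping, the asset type strings, and the choice to accept a single generator name as a string are assumptions.

```python
def available_data_asset_names(generators, generator_names=None):
    """Hypothetical sketch of the documented return shape.

    `generators` maps a generator name to the (asset_name, asset_type) tuples
    it can describe; `generator_names` restricts which generators are reported.
    """
    if generator_names is None:
        generator_names = list(generators)
    elif isinstance(generator_names, str):
        generator_names = [generator_names]
    return {name: {"names": list(generators[name])} for name in generator_names}


generators = {
    "subdir_reader": [("events", "file"), ("users", "file")],
    "queries": [("daily_events", "query")],
}
print(available_data_asset_names(generators, "queries"))
# {'queries': {'names': [('daily_events', 'query')]}}
```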

build_batch_kwargs(self, batch_kwargs_generator, data_asset_name=None, partition_id=None, **kwargs)