Datasources are responsible for connecting data and compute infrastructure. Each Datasource provides Great Expectations Data Assets connected to a specific compute environment, such as a SQL database, a Spark cluster, or a local in-memory Pandas DataFrame. Datasources know how to access data from relevant sources such as an existing object from a DAG runner, a SQL database, an S3 bucket, GCS, or a local filesystem.

To bridge the gap between those worlds, Datasources can interact closely with Batch Kwargs Generators, which are aware of a source of data and can produce identifying information, called “batch_kwargs”, that Datasources can use to get individual batches of data.
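The division of labor described above can be illustrated with a small conceptual sketch. This is not the Great Expectations API: `ToyDirectoryGenerator`, `ToyDatasource`, and `demo` are hypothetical names invented for illustration, and the sketch assumes a batch is a single CSV file identified by a `path` key. It only shows the pattern of a generator producing a batch_kwargs dictionary that a datasource then uses to load one batch.

```python
import csv
import tempfile
from pathlib import Path


class ToyDirectoryGenerator:
    """Hypothetical stand-in for a Batch Kwargs Generator: it knows where data lives."""

    def __init__(self, base_directory):
        self.base_directory = Path(base_directory)

    def yield_batch_kwargs(self, data_asset_name):
        # Assumption: each data asset is one CSV file named after the asset.
        path = self.base_directory / f"{data_asset_name}.csv"
        return {"path": str(path)}


class ToyDatasource:
    """Hypothetical stand-in for a Datasource: it knows how to load a batch."""

    def get_batch(self, batch_kwargs):
        # Use the identifying information to read exactly one batch of rows.
        with open(batch_kwargs["path"], newline="") as f:
            return list(csv.DictReader(f))


def demo():
    # Create a throwaway directory with one small CSV file to act as the data source.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "orders.csv").write_text("id,amount\n1,10\n2,25\n")
        generator = ToyDirectoryGenerator(tmp)
        batch_kwargs = generator.yield_batch_kwargs("orders")
        rows = ToyDatasource().get_batch(batch_kwargs)
        return batch_kwargs, rows


if __name__ == "__main__":
    kwargs, batch = demo()
    print(kwargs)
    print(batch)
```

Keeping the identifying information in a plain dictionary means the generator never has to load data itself, and the datasource never has to know how batches were discovered.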

See Datasource Reference for more detail about configuring and using datasources in your DataContext.

See the Datasource Module docs for more detail about available datasources.

last updated: Aug 13, 2020