Module Contents



great_expectations.datasource.data_connector.runtime_data_connector.DEFAULT_DELIMITER :str = -
class great_expectations.datasource.data_connector.runtime_data_connector.RuntimeDataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, batch_identifiers: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

Bases: great_expectations.datasource.data_connector.data_connector.DataConnector

A DataConnector that allows users to specify a Batch’s data directly using a RuntimeBatchRequest that contains either an in-memory Pandas or Spark DataFrame, a filesystem or S3 path, or an arbitrary SQL query.

Parameters

  • name (str) – The name of this DataConnector

  • datasource_name (str) – The name of the Datasource that contains it

  • execution_engine (ExecutionEngine) – An ExecutionEngine

  • batch_identifiers (list) – A list of keys that must be defined in the batch_identifiers dict of a RuntimeBatchRequest

  • batch_spec_passthrough (dict) – A dictionary whose keys are added directly to batch_spec
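
For illustration, a connector of this class is typically declared inside a Datasource configuration. The sketch below is a plain Python dictionary, not a call into the library. The datasource name, connector name, identifier key, and passthrough entry are all hypothetical, and the surrounding layout is an assumption that may differ between versions. The final check, which treats request keys outside the configured list as invalid, is likewise an assumption about how batch_identifiers are validated.

```python
# Hypothetical Datasource configuration containing a RuntimeDataConnector.
# Every name below ("my_datasource", "my_runtime_connector", "run_id") is made up.
datasource_config = {
    "name": "my_datasource",
    "class_name": "Datasource",
    "execution_engine": {"class_name": "PandasExecutionEngine"},
    "data_connectors": {
        "my_runtime_connector": {
            "class_name": "RuntimeDataConnector",
            # Keys that a request's batch_identifiers dict may use.
            "batch_identifiers": ["run_id"],
            # Illustrative passthrough entry; copied into batch_spec as-is.
            "batch_spec_passthrough": {"reader_method": "read_csv"},
        }
    },
}

connector_config = datasource_config["data_connectors"]["my_runtime_connector"]

# batch_identifiers as they might appear on a request.
request_batch_identifiers = {"run_id": "2021-03-01"}

# Assumed check: any key not in the configured list is reported as unknown.
unknown_keys = set(request_batch_identifiers) - set(connector_config["batch_identifiers"])
```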

_get_data_reference_list(self, data_asset_name: Optional[str] = None)

List objects in the cache to create a list of data_references. If data_asset_name is passed in, the method returns all data_references for the named data_asset. If no data_asset_name is passed in, it returns a list of all data_references for all data_assets in the cache.

_get_data_reference_list_from_cache_by_data_asset_name(self, data_asset_name: str)

Fetch data_references corresponding to data_asset_name from the cache.


Get the number of data_references corresponding to all data_asset_names in the cache. When the RuntimeDataConnector has been passed BatchRequests with the same data_asset_name but different batch_identifiers, a data_asset can have more than one data_reference.
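
The listing and counting behavior described above can be sketched with a nested dictionary. This is a simplified model, not the library's code: the cache layout (data_asset_name, then data_reference, then a list of batch definitions) and the function names are assumptions, and the results are sorted only to make the output deterministic.

```python
from typing import Dict, List, Optional

# Assumed cache layout: data_asset_name -> data_reference -> batch definitions.
Cache = Dict[str, Dict[str, List[dict]]]


def list_data_references(cache: Cache, data_asset_name: Optional[str] = None) -> List[str]:
    """Return references for one asset, or for every asset when no name is given."""
    if data_asset_name is not None:
        return sorted(cache.get(data_asset_name, {}))
    references: List[str] = []
    for references_for_asset in cache.values():
        references.extend(references_for_asset)
    return sorted(references)


def count_data_references(cache: Cache) -> int:
    """Count references across all assets; one asset may contribute several."""
    return sum(len(references_for_asset) for references_for_asset in cache.values())


# Two requests for "taxi_trips" with different identifiers, one for "weather".
cache: Cache = {
    "taxi_trips": {"2021-01": [{}], "2021-02": [{}]},
    "weather": {"2021-01": [{}]},
}

taxi_references = list_data_references(cache, "taxi_trips")
all_references = list_data_references(cache)
total = count_data_references(cache)
```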


Please see the note in _get_batch_definition_list_from_batch_request().

get_batch_data_and_metadata(self, batch_definition: BatchDefinition, runtime_parameters: dict)

Uses batch_definition to retrieve batch_data and batch_markers: builds a batch_spec from batch_definition, then uses execution_engine to return batch_data and batch_markers.


Parameters

batch_definition (BatchDefinition) – required batch_definition parameter for retrieval
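
The two-step flow (build a batch_spec, then hand it to the execution engine) might look roughly like the sketch below. StubExecutionEngine, make_spec, and the dictionary shapes are hypothetical stand-ins written for this example. The spec is returned alongside the data and markers purely so the caller can inspect it here.

```python
from typing import Any, Dict, Tuple


class StubExecutionEngine:
    """Hypothetical stand-in for an ExecutionEngine."""

    def get_batch_data_and_markers(self, batch_spec: Dict[str, Any]) -> Tuple[Any, Dict[str, Any]]:
        # A real engine would load data; the stub returns what the spec carries.
        batch_markers = {"row_count": len(batch_spec["batch_data"])}
        return batch_spec["batch_data"], batch_markers


def make_spec(batch_definition: Dict[str, Any], runtime_parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Step 1 (simplified): derive a spec from the definition and runtime parameters."""
    spec = {"data_asset_name": batch_definition["data_asset_name"]}
    spec.update(runtime_parameters)
    return spec


def get_batch_data_and_metadata(engine, batch_definition, runtime_parameters):
    """Step 2: let the engine turn the spec into batch_data and batch_markers."""
    batch_spec = make_spec(batch_definition, runtime_parameters)
    batch_data, batch_markers = engine.get_batch_data_and_markers(batch_spec)
    return batch_data, batch_spec, batch_markers


data, spec, markers = get_batch_data_and_metadata(
    StubExecutionEngine(),
    {"data_asset_name": "taxi_trips"},
    {"batch_data": [10, 20, 30]},
)
```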

get_batch_definition_list_from_batch_request(self, batch_request: RuntimeBatchRequest)
_get_batch_definition_list_from_batch_request(self, batch_request: RuntimeBatchRequest)

<Will> 202103. The following behavior of the _data_references_cache follows a pattern that we are using for other data_connectors, including variations of FilePathDataConnector. When a BatchRequest contains batch_data that is passed in as an in-memory dataframe, the cache will contain the names of all data_assets (and data_references) that have been passed into the RuntimeDataConnector in this session, even though technically only the most recent batch_data is available. This can be misleading. However, allowing the RuntimeDataConnector to keep a record of all data_assets (and data_references) that have been passed in will allow for the proposed behavior of RuntimeBatchRequest, which will allow paths and queries to be passed in as part of the BatchRequest. Therefore, this behavior will be revisited when the design of RuntimeBatchRequest and related classes is complete.
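
The caveat in this note can be made concrete with a toy model. TinyRuntimeConnector and its attributes are hypothetical and exist only for this example. The point is that the reference cache keeps growing across requests while only the most recent in-memory data remains reachable.

```python
from typing import Any, Dict, Optional


class TinyRuntimeConnector:
    """Toy model: remembers every reference, keeps only the latest batch_data."""

    def __init__(self) -> None:
        self.references_cache: Dict[str, Dict[str, dict]] = {}
        self.latest_batch_data: Optional[Any] = None

    def register(self, data_asset_name: str, data_reference: str, batch_data: Any) -> None:
        # The cache records names only, not the data itself.
        asset_entry = self.references_cache.setdefault(data_asset_name, {})
        asset_entry[data_reference] = {"data_asset_name": data_asset_name}
        # Earlier in-memory data is overwritten by the newest request.
        self.latest_batch_data = batch_data


connector = TinyRuntimeConnector()
connector.register("asset_a", "run_1", [1, 2, 3])
connector.register("asset_b", "run_2", [4, 5])

known_assets = sorted(connector.references_cache)
```

After both calls the cache still lists asset_a, even though its data can no longer be retrieved from the connector in this model.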

_update_data_references_cache(self, data_asset_name: str, batch_definition_list: List, batch_identifiers: IDDict)
_self_check_fetch_batch(self, pretty_print, example_data_reference, data_asset_name)

Helper function for self_check() to retrieve batch using example_data_reference and data_asset_name, all while printing helpful messages. First 5 rows of batch_data are printed by default.

Parameters

  • pretty_print (bool) – print to console?

  • example_data_reference (Any) – data_reference to retrieve

  • data_asset_name (str) – data_asset_name to retrieve

_generate_batch_spec_parameters_from_batch_definition(self, batch_definition: BatchDefinition)
build_batch_spec(self, batch_definition: BatchDefinition, runtime_parameters: dict)

Builds batch_spec from batch_definition by generating batch_spec parameters and adding any passthrough parameters.


Parameters

batch_definition (BatchDefinition) – required batch_definition parameter for retrieval


Returns

BatchSpec object built from BatchDefinition
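
As a rough sketch of how generated parameters and passthrough parameters could be combined, consider the following. The function names, the dictionary-based spec, and the merge order (passthrough first, runtime parameters last) are assumptions made for this example. The actual precedence rules are not stated here.

```python
from typing import Any, Dict, Optional


def generate_batch_spec_parameters(batch_definition: Dict[str, Any]) -> Dict[str, Any]:
    """Derive the base spec parameters from the batch definition (simplified)."""
    return {"data_asset_name": batch_definition["data_asset_name"]}


def build_batch_spec(
    batch_definition: Dict[str, Any],
    runtime_parameters: Dict[str, Any],
    batch_spec_passthrough: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    spec = generate_batch_spec_parameters(batch_definition)
    # Passthrough keys are copied into the spec unchanged.
    if batch_spec_passthrough:
        spec.update(batch_spec_passthrough)
    # Assumed order: runtime parameters are applied last.
    spec.update(runtime_parameters)
    return spec


batch_spec = build_batch_spec(
    {"data_asset_name": "taxi_trips"},
    {"path": "data/trips.csv"},
    batch_spec_passthrough={"reader_method": "read_csv"},
)
```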

static _get_data_reference_name(batch_identifiers: IDDict)
static _validate_runtime_parameters(runtime_parameters: Union[dict, type(None)])
_validate_batch_request(self, batch_request: RuntimeBatchRequest)

Validate batch_request by checking:

  1. if configured datasource_name matches batch_request’s datasource_name

  2. if current data_connector_name matches batch_request’s data_connector_name


Parameters

batch_request (BatchRequestBase) – batch_request object to validate
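
The two checks listed above can be sketched as follows. SimpleBatchRequest is a hypothetical stand-in for the request object, and raising ValueError is an assumption. The library may use its own exception type.

```python
from dataclasses import dataclass


@dataclass
class SimpleBatchRequest:
    """Hypothetical stand-in carrying only the two fields being validated."""

    datasource_name: str
    data_connector_name: str


def validate_batch_request(
    batch_request: SimpleBatchRequest,
    datasource_name: str,
    data_connector_name: str,
) -> None:
    # Check 1: the request must target the datasource this connector belongs to.
    if batch_request.datasource_name != datasource_name:
        raise ValueError(
            f"datasource_name {batch_request.datasource_name!r} does not match {datasource_name!r}"
        )
    # Check 2: the request must target this connector by name.
    if batch_request.data_connector_name != data_connector_name:
        raise ValueError(
            f"data_connector_name {batch_request.data_connector_name!r} "
            f"does not match {data_connector_name!r}"
        )


# A matching request passes silently.
validate_batch_request(
    SimpleBatchRequest("my_datasource", "my_runtime_connector"),
    "my_datasource",
    "my_runtime_connector",
)
```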

_validate_batch_identifiers(self, batch_identifiers: dict)
_validate_batch_identifiers_configuration(self, batch_identifiers: List[str])
self_check(self, pretty_print=True, max_examples=3)

Overrides the self_check method for RuntimeDataConnector. Normally the self_check() method will check the configuration of the DataConnector by doing the following:

  1. refresh or create data_reference_cache

  2. print batch_definition_count and example_data_references for each data_asset_name

  3. also print unmatched data_references, and allow the user to modify the regex or glob configuration if necessary

However, in the case of the RuntimeDataConnector there are no example data_asset_names until data is passed in through a RuntimeBatchRequest. Therefore, a note is displayed to the user saying that the RuntimeDataConnector will not have data_asset_names until they are passed in through a RuntimeBatchRequest.

Parameters

  • pretty_print (bool) – should the output be printed?

  • max_examples (int) – how many data_references should be printed?


Returns

dictionary containing self_check output

Return type

report_obj (dict)
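
A minimal sketch of the overridden behavior is shown below: a report dictionary that always carries the explanatory note. The function name, the report keys, and the exact note wording are hypothetical and chosen for this example. They are not taken from the library's output.

```python
from typing import Any, Dict

# Note wording paraphrased from the description above.
NOTE = (
    "RuntimeDataConnector will not have data_asset_names until they are "
    "passed in through RuntimeBatchRequest"
)


def runtime_self_check(
    cache: Dict[str, Dict[str, Any]],
    pretty_print: bool = True,
    max_examples: int = 3,
) -> Dict[str, Any]:
    """Build a report from whatever the cache holds, always including the note."""
    asset_names = sorted(cache)
    report_obj = {
        "class_name": "RuntimeDataConnector",
        "data_asset_count": len(asset_names),
        "example_data_asset_names": asset_names[:max_examples],
        "note": NOTE,
    }
    if pretty_print:
        print(f"Available data_asset_names ({len(asset_names)}): {asset_names[:max_examples]}")
        print(f"Note: {NOTE}")
    return report_obj


# Before any RuntimeBatchRequest has been processed the cache is empty.
report = runtime_self_check({}, pretty_print=False)
```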