great_expectations.datasource.data_connector.runtime_data_connector
Module Contents
Classes
RuntimeDataConnector | A DataConnector that allows users to specify a Batch’s data directly using a RuntimeBatchRequest.
-
great_expectations.datasource.data_connector.runtime_data_connector.
logger
-
great_expectations.datasource.data_connector.runtime_data_connector.
DEFAULT_DELIMITER
: str = "-"
-
class
great_expectations.datasource.data_connector.runtime_data_connector.
RuntimeDataConnector
(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, batch_identifiers: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)
Bases:
great_expectations.datasource.data_connector.data_connector.DataConnector
A DataConnector that allows users to specify a Batch’s data directly using a RuntimeBatchRequest that contains one of the following: an in-memory Pandas or Spark DataFrame, a filesystem or S3 path, or an arbitrary SQL query.
- Parameters
name (str) – The name of this DataConnector
datasource_name (str) – The name of the Datasource that contains it
execution_engine (ExecutionEngine) – An ExecutionEngine
batch_identifiers (list) – A list of keys that must be defined in the batch_identifiers dict of a RuntimeBatchRequest
batch_spec_passthrough (dict) – A dictionary whose keys are added directly to batch_spec
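As a minimal sketch of how these parameters are commonly supplied, the dictionaries below show a Datasource configuration that declares a RuntimeDataConnector, together with keyword arguments for a matching RuntimeBatchRequest. All names (my_datasource, my_runtime_connector, my_asset, run_id, the file path) are hypothetical, and the configuration layout and the runtime_parameters key are assumptions based on typical usage rather than something stated in this reference.

```python
# Hypothetical names throughout; the layout is an illustrative assumption.
datasource_config = {
    "name": "my_datasource",
    "class_name": "Datasource",
    "execution_engine": {"class_name": "PandasExecutionEngine"},
    "data_connectors": {
        "my_runtime_connector": {
            "class_name": "RuntimeDataConnector",
            # Keys that a RuntimeBatchRequest is expected to define.
            "batch_identifiers": ["run_id"],
        }
    },
}

# Keyword arguments for a RuntimeBatchRequest that targets the connector above.
runtime_batch_request_kwargs = {
    "datasource_name": "my_datasource",
    "data_connector_name": "my_runtime_connector",
    "data_asset_name": "my_asset",
    # The data itself: an in-memory DataFrame, a path, or a query.
    # A path is assumed here.
    "runtime_parameters": {"path": "data/my_file.csv"},
    "batch_identifiers": {"run_id": "20210301"},
}
```

The batch_identifiers keys in the request mirror the list configured on the connector, and the datasource and connector names match the configuration.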
-
_refresh_data_references_cache
(self)
-
_get_data_reference_list
(self, data_asset_name: Optional[str] = None)
Lists objects in the cache to create a list of data_references. If data_asset_name is passed in, the method returns all data_references for the named data_asset. Otherwise, it returns a list of all data_references for all data_assets in the cache.
-
_get_data_reference_list_from_cache_by_data_asset_name
(self, data_asset_name: str)
Fetch data_references corresponding to data_asset_name from the cache.
-
get_data_reference_list_count
(self)
Get the number of data_references across all data_asset_names in the cache. If the RuntimeDataConnector has been passed BatchRequests with the same data_asset_name but different batch_identifiers, a single data_asset can have more than one data_reference.
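The listing and counting behavior described above can be sketched with a nested dictionary. This is a simplified stand-in, not the library's implementation: the cache layout (data_asset_name to data_reference to batch definitions) and the helper names add_reference, list_references, and count_references are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Assumed cache layout: data_asset_name -> data_reference -> batch definitions.
Cache = Dict[str, Dict[str, List[dict]]]


def add_reference(
    cache: Cache, data_asset_name: str, data_reference: str, batch_definition: dict
) -> None:
    """Record a data_reference for a data_asset, keeping earlier entries."""
    cache.setdefault(data_asset_name, {})[data_reference] = [batch_definition]


def list_references(cache: Cache, data_asset_name: Optional[str] = None) -> List[str]:
    """Return references for one data_asset, or for all of them if no name is given."""
    if data_asset_name is not None:
        return sorted(cache.get(data_asset_name, {}))
    return sorted(ref for refs in cache.values() for ref in refs)


def count_references(cache: Cache) -> int:
    """Count data_references across every data_asset in the cache."""
    return sum(len(refs) for refs in cache.values())


cache: Cache = {}
# Same data_asset_name, different batch_identifiers: two data_references.
add_reference(cache, "my_asset", "run_1", {"run_id": "run_1"})
add_reference(cache, "my_asset", "run_2", {"run_id": "run_2"})
add_reference(cache, "other_asset", "run_1", {"run_id": "run_1"})

print(list_references(cache, "my_asset"))  # ['run_1', 'run_2']
print(count_references(cache))  # 3
```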
-
get_unmatched_data_references
(self)
-
get_available_data_asset_names
(self)
See the note in _get_batch_definition_list_from_batch_request().
-
get_batch_data_and_metadata
(self, batch_definition: BatchDefinition, runtime_parameters: dict)
Uses batch_definition to retrieve batch_data and batch_markers: builds a batch_spec from batch_definition, then uses execution_engine to return batch_data and batch_markers.
- Parameters
batch_definition (BatchDefinition) – required batch_definition parameter for retrieval
-
get_batch_definition_list_from_batch_request
(self, batch_request: RuntimeBatchRequest)
-
_get_batch_definition_list_from_batch_request
(self, batch_request: RuntimeBatchRequest)
Note (<Will>, 202103): The behavior of the _data_references_cache follows a pattern that we are using for other data_connectors, including variations of FilePathDataConnector. When a BatchRequest contains batch_data that is passed in as an in-memory dataframe, the cache will contain the names of all data_assets (and data_references) that have been passed into the RuntimeDataConnector in this session, even though technically only the most recent batch_data is available. This can be misleading. However, letting the RuntimeDataConnector keep a record of all data_assets (and data_references) that have been passed in supports the proposed behavior of RuntimeBatchRequest, which will allow paths and queries to be passed in as part of the BatchRequest. This behavior will therefore be revisited when the design of RuntimeBatchRequest and related classes is complete.
-
_update_data_references_cache
(self, data_asset_name: str, batch_definition_list: List, batch_identifiers: IDDict)
-
_self_check_fetch_batch
(self, pretty_print, example_data_reference, data_asset_name)
Helper function for self_check() that retrieves a batch using example_data_reference and data_asset_name, printing helpful messages along the way. The first 5 rows of batch_data are printed by default.
- Parameters
pretty_print (bool) – print to console?
example_data_reference (Any) – data_reference to retrieve
data_asset_name (str) – data_asset_name to retrieve
-
_generate_batch_spec_parameters_from_batch_definition
(self, batch_definition: BatchDefinition)
-
build_batch_spec
(self, batch_definition: BatchDefinition, runtime_parameters: dict)
Builds a batch_spec from batch_definition by generating batch_spec parameters and adding any pass-through parameters.
- Parameters
batch_definition (BatchDefinition) – required batch_definition parameter for retrieval
- Returns
BatchSpec object built from BatchDefinition
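The merge that build_batch_spec describes can be sketched with plain dictionaries. This is a simplified stand-in rather than the library's code: the function name build_batch_spec_sketch, the order in which the dictionaries are merged, and the example keys (data_asset_name, reader_method, path) are assumptions, and the real method returns a BatchSpec object rather than a dict.

```python
from typing import Optional


def build_batch_spec_sketch(
    batch_spec_params: dict,
    runtime_parameters: dict,
    batch_spec_passthrough: Optional[dict] = None,
) -> dict:
    """Combine generated parameters, pass-through keys, and runtime parameters."""
    # Start from the parameters generated from the batch definition.
    batch_spec = dict(batch_spec_params)
    # Add pass-through keys directly to the spec (assumed to come next).
    if batch_spec_passthrough:
        batch_spec.update(batch_spec_passthrough)
    # Attach the data source supplied at runtime (assumed to be applied last).
    batch_spec.update(runtime_parameters)
    return batch_spec


spec = build_batch_spec_sketch(
    batch_spec_params={"data_asset_name": "my_asset"},
    runtime_parameters={"path": "data/my_file.csv"},
    batch_spec_passthrough={"reader_method": "read_csv"},
)
print(spec)
```

Copying batch_spec_params before updating keeps the caller's dictionary unchanged, which makes the sketch safe to call repeatedly with a shared configuration.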
-
static
_get_data_reference_name
(batch_identifiers: IDDict)
-
static
_validate_runtime_parameters
(runtime_parameters: Union[dict, type(None)])
-
_validate_batch_request
(self, batch_request: RuntimeBatchRequest)
Validate batch_request by checking:
whether the configured datasource_name matches batch_request’s datasource_name
whether the current data_connector_name matches batch_request’s data_connector_name
- Parameters
batch_request (BatchRequestBase) – batch_request object to validate
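The two checks listed above can be sketched as follows. RequestStandIn and validate_batch_request_sketch are hypothetical names used only for illustration, and ValueError is a stand-in for whatever exception the library actually raises.

```python
from dataclasses import dataclass


@dataclass
class RequestStandIn:
    """Hypothetical stand-in carrying only the two fields being checked."""

    datasource_name: str
    data_connector_name: str


def validate_batch_request_sketch(
    request: RequestStandIn, datasource_name: str, data_connector_name: str
) -> None:
    """Raise if the request targets a different datasource or data connector."""
    if request.datasource_name != datasource_name:
        raise ValueError(
            f"datasource_name '{request.datasource_name}' does not match "
            f"configured datasource '{datasource_name}'"
        )
    if request.data_connector_name != data_connector_name:
        raise ValueError(
            f"data_connector_name '{request.data_connector_name}' does not match "
            f"configured data connector '{data_connector_name}'"
        )


# A matching request passes silently.
validate_batch_request_sketch(
    RequestStandIn("my_datasource", "my_runtime_connector"),
    "my_datasource",
    "my_runtime_connector",
)
```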
-
_validate_batch_identifiers
(self, batch_identifiers: dict)
-
_validate_batch_identifiers_configuration
(self, batch_identifiers: List[str])
-
self_check
(self, pretty_print=True, max_examples=3)
Overrides the self_check method for RuntimeDataConnector. Normally, self_check() checks the configuration of the DataConnector by doing the following:
refresh or create the data_reference_cache
print batch_definition_count and example_data_references for each data_asset_name
print unmatched data_references, and allow the user to modify the regex or glob configuration if necessary
However, a RuntimeDataConnector has no example data_asset_names until data is passed in through a RuntimeBatchRequest. Therefore, a note is displayed telling the user that the RuntimeDataConnector will not have data_asset_names until they are passed in through a RuntimeBatchRequest.
- Parameters
pretty_print (bool) – should the output be printed?
max_examples (int) – how many data_references should be printed?
- Returns
report_obj (dict) – dictionary containing self_check output