great_expectations.datasource.data_connector.file_path_data_connector

Module Contents

Classes

FilePathDataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

Base class for DataConnectors that are designed for connecting to filesystem-like data, which can include files on disk, but also S3 and GCS.

great_expectations.datasource.data_connector.file_path_data_connector.logger
class great_expectations.datasource.data_connector.file_path_data_connector.FilePathDataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

Bases: great_expectations.datasource.data_connector.data_connector.DataConnector

Base class for DataConnectors that are designed for connecting to filesystem-like data, which can include files on disk, but also S3 and GCS.

Note: FilePathDataConnector is not meant to be used on its own; it is meant to be extended. Currently, ConfiguredAssetFilePathDataConnector and InferredAssetFilePathDataConnector are subclasses of FilePathDataConnector.

property sorters(self)
_get_data_reference_list_from_cache_by_data_asset_name(self, data_asset_name: str)

Fetch data_references corresponding to data_asset_name from the cache.

get_batch_definition_list_from_batch_request(self, batch_request: BatchRequest)

Retrieve batch_definitions that match batch_request.

First retrieves all batch_definitions that match batch_request, then:
  • if batch_request also has a batch_filter, selects the batch_definitions that match batch_filter.

  • if data_connector has sorters configured, sorts the batch_definition list before returning.

Parameters

batch_request (BatchRequest) – BatchRequest (containing previously validated attributes) to process

Returns

A list of BatchDefinition objects that match BatchRequest
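
The match, filter, and sort flow described above can be sketched in plain Python. This is a minimal illustration only, not the library's implementation: the `select_batches` helper, the dict-based stand-in for a batch definition, and the example regex are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a batch definition: the data reference (a path)
# plus the identifiers captured from it by the regex.
Batch = Dict[str, str]


def select_batches(
    paths: List[str],
    pattern: str,
    batch_filter: Optional[Callable[[Batch], bool]] = None,
    sort_key: Optional[Callable[[Batch], str]] = None,
) -> List[Batch]:
    """Match paths against a regex, then optionally filter and sort."""
    regex = re.compile(pattern)
    batches: List[Batch] = []
    for path in paths:
        match = regex.fullmatch(path)
        if match:  # step 1: keep only references that match the request
            batches.append({"path": path, **match.groupdict()})
    if batch_filter is not None:  # step 2: apply the optional filter
        batches = [b for b in batches if batch_filter(b)]
    if sort_key is not None:  # step 3: apply the optional sorter
        batches = sorted(batches, key=sort_key)
    return batches


paths = ["sales_2021.csv", "sales_2019.csv", "notes.txt", "sales_2020.csv"]
result = select_batches(
    paths,
    pattern=r"sales_(?P<year>\d{4})\.csv",
    batch_filter=lambda b: b["year"] >= "2020",
    sort_key=lambda b: b["year"],
)
print([b["path"] for b in result])  # ['sales_2020.csv', 'sales_2021.csv']
```

Filtering before sorting keeps the sort limited to the batches that will actually be returned.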

_get_batch_definition_list_from_batch_request(self, batch_request: BatchRequestBase)

Retrieve batch_definitions that match batch_request.

First retrieves all batch_definitions that match batch_request, then:
  • if batch_request also has a batch_filter, selects the batch_definitions that match batch_filter.

  • if data_connector has sorters configured, sorts the batch_definition list before returning.

Parameters

batch_request (BatchRequestBase) – BatchRequestBase (BatchRequest without attribute validation) to process

Returns

A list of BatchDefinition objects that match BatchRequest

_sort_batch_definition_list(self, batch_definition_list: List[BatchDefinition])

Use the configured sorters to sort batch_definition_list.

Parameters

batch_definition_list (list) – list of batch_definitions to sort

Returns

sorted list of batch_definitions
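
When several sorters are configured, one common way to combine them is to apply stable sorts from the lowest-priority key to the highest. The sketch below shows that technique in plain Python. It is an assumption about how such sorting can be done, not a statement of how this method is implemented; `sort_batches` and the `(key, reverse)` sorter tuples are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical sorter configuration: (key name, reverse?) in priority order.
SorterSpec = Tuple[str, bool]


def sort_batches(
    batches: List[Dict[str, str]], sorters: List[SorterSpec]
) -> List[Dict[str, str]]:
    """Sort by several keys using repeated stable sorts."""
    ordered = list(batches)  # copy so the caller's list is left untouched
    # Apply the lowest-priority sorter first. Python's sort is stable, so
    # each later pass preserves the relative order of ties from earlier ones.
    for key, reverse in reversed(sorters):
        ordered.sort(key=lambda b, k=key: b[k], reverse=reverse)
    return ordered


batches = [
    {"name": "b", "year": "2020"},
    {"name": "a", "year": "2021"},
    {"name": "a", "year": "2020"},
]
# Sort by name ascending, then by year descending within each name.
ordered = sort_batches(batches, [("name", False), ("year", True)])
print([(b["name"], b["year"]) for b in ordered])
# [('a', '2021'), ('a', '2020'), ('b', '2020')]
```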

_map_data_reference_to_batch_definition_list(self, data_reference: str, data_asset_name: str = None)
_map_batch_definition_to_data_reference(self, batch_definition: BatchDefinition)
build_batch_spec(self, batch_definition: BatchDefinition)

Build BatchSpec from batch_definition by calling DataConnector’s build_batch_spec function.

Parameters

batch_definition (BatchDefinition) – to be used to build batch_spec

Returns

BatchSpec built from batch_definition

static sanitize_prefix(text: str)

Takes a user-provided prefix and cleans it so that it works with file-system traversal methods (e.g. adds ‘/’ to the end of a string meant to represent a directory).
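
A minimal sketch of this kind of prefix cleanup follows. The function name `sanitize_prefix_sketch` is hypothetical, and the rule it uses to tell a file from a directory (presence of a file extension) is an assumption made for illustration; the library's actual logic may differ.

```python
import posixpath


def sanitize_prefix_sketch(text: str) -> str:
    """Append '/' to directory-like prefixes; leave file-like ones alone."""
    _, ext = posixpath.splitext(text)
    if ext:
        # Assumption: a prefix with an extension names a file, so it is
        # returned unchanged.
        return text
    # Joining with an empty last component adds a trailing '/' only when
    # the prefix is non-empty and does not already end with one.
    return posixpath.join(text, "")


print(sanitize_prefix_sketch("data/raw"))           # data/raw/
print(sanitize_prefix_sketch("data/raw/"))          # data/raw/
print(sanitize_prefix_sketch("data/raw/file.csv"))  # data/raw/file.csv
```

`posixpath` is used rather than `os.path` so the separator is always '/', which is what S3- and GCS-style prefixes expect regardless of the host operating system.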

_generate_batch_spec_parameters_from_batch_definition(self, batch_definition: BatchDefinition)
_validate_batch_request(self, batch_request: BatchRequestBase)

Validate batch_request by checking:
  1. whether the configured datasource_name matches batch_request’s datasource_name

  2. whether the current data_connector_name matches batch_request’s data_connector_name

Parameters

batch_request (BatchRequestBase) – batch_request object to validate
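
The two checks can be illustrated with stand-in classes. This is a sketch under stated assumptions: `RequestSketch` and `ConnectorSketch` are hypothetical, and raising `ValueError` on a mismatch is an illustrative choice rather than the exception type the library uses.

```python
from dataclasses import dataclass


@dataclass
class RequestSketch:
    """Hypothetical stand-in for a batch request."""
    datasource_name: str
    data_connector_name: str


@dataclass
class ConnectorSketch:
    """Hypothetical stand-in for a data connector."""
    name: str
    datasource_name: str

    def validate(self, request: RequestSketch) -> None:
        # Check 1: the request must target this connector's datasource.
        if request.datasource_name != self.datasource_name:
            raise ValueError(
                f"datasource_name {request.datasource_name!r} does not match "
                f"{self.datasource_name!r}"
            )
        # Check 2: the request must target this connector by name.
        if request.data_connector_name != self.name:
            raise ValueError(
                f"data_connector_name {request.data_connector_name!r} does "
                f"not match {self.name!r}"
            )


connector = ConnectorSketch(name="my_connector", datasource_name="my_datasource")
connector.validate(RequestSketch("my_datasource", "my_connector"))  # passes

try:
    connector.validate(RequestSketch("other_datasource", "my_connector"))
except ValueError as err:
    print(f"rejected: {err}")
```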

_validate_sorters_configuration(self, data_asset_name: Optional[str] = None)
abstract _get_batch_definition_list_from_cache(self)
abstract _get_regex_config(self, data_asset_name: Optional[str] = None)
abstract _get_full_file_path(self, path: str, data_asset_name: Optional[str] = None)