great_expectations.datasource.data_connector.inferred_asset_gcs_data_connector

Module Contents

Classes

InferredAssetGCSDataConnector(name: str, datasource_name: str, bucket_or_name: str, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, prefix: Optional[str] = None, delimiter: Optional[str] = None, max_results: Optional[int] = None, gcs_options: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None, id: Optional[str] = None)

Extension of InferredAssetFilePathDataConnector used to connect to GCS

great_expectations.datasource.data_connector.inferred_asset_gcs_data_connector.logger
great_expectations.datasource.data_connector.inferred_asset_gcs_data_connector.storage
class great_expectations.datasource.data_connector.inferred_asset_gcs_data_connector.InferredAssetGCSDataConnector(name: str, datasource_name: str, bucket_or_name: str, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, prefix: Optional[str] = None, delimiter: Optional[str] = None, max_results: Optional[int] = None, gcs_options: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None, id: Optional[str] = None)

Bases: great_expectations.datasource.data_connector.inferred_asset_file_path_data_connector.InferredAssetFilePathDataConnector

Extension of InferredAssetFilePathDataConnector used to connect to GCS

DataConnectors produce identifying information, called “batch_spec”, that ExecutionEngines can use to get individual batches of data. They add flexibility in how data is obtained, supporting techniques such as time-based partitioning, splitting, and sampling.

The InferredAssetGCSDataConnector is one of two classes (ConfiguredAssetGCSDataConnector being the other one) designed for connecting to data on GCS.

An InferredAssetGCSDataConnector uses regular expressions to traverse GCS buckets and implicitly determine data_asset_names, as sketched in the configuration below. Please note that, in order to maintain consistency with Google’s official SDK, we use terms like “bucket_or_name” and “max_results”. Because these keys are converted from YAML to Python and passed directly to the GCS connection object, keeping the names consistent is necessary for proper usage.
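For illustration, a hypothetical datasource configuration using this connector might look as follows; the datasource name, bucket, prefix, and regex pattern are placeholders rather than values prescribed by the library:

    datasource_config = {
        "name": "my_gcs_datasource",  # placeholder name
        "class_name": "Datasource",
        "execution_engine": {"class_name": "PandasExecutionEngine"},
        "data_connectors": {
            "default_inferred_data_connector_name": {
                "class_name": "InferredAssetGCSDataConnector",
                "bucket_or_name": "my_bucket",  # forwarded to the GCS client, hence the SDK-style key
                "prefix": "data/taxi/",         # optional: narrows the objects that are listed
                "default_regex": {
                    # capture groups are used to infer data_asset_names and batch identifiers
                    "pattern": r"data/taxi/(.*)_(\d{4})\.csv",
                    "group_names": ["data_asset_name", "year"],
                },
            }
        },
    }

The regex groups named in group_names determine how matched object paths are mapped to data assets and batches; object paths that do not match the pattern are reported as unmatched data references.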

This DataConnector supports the following methods of authentication:
  1. Standard gcloud auth / GOOGLE_APPLICATION_CREDENTIALS environment variable workflow

  2. Manual creation of credentials from google.oauth2.service_account.Credentials.from_service_account_file

  3. Manual creation of credentials from google.oauth2.service_account.Credentials.from_service_account_info

Because most interaction with the SDK happens through a GCS Storage Client, please refer to the official docs for a deeper understanding of the supported authentication methods and general functionality: https://googleapis.dev/python/google-api-core/latest/auth.html
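As a minimal sketch of the second option above, explicit credentials can be supplied through gcs_options; the key file path is a placeholder, and remaining keys are passed through to the underlying GCS storage client:

    gcs_options = {
        # converted internally via
        # google.oauth2.service_account.Credentials.from_service_account_file
        "filename": "/path/to/service_account_key.json",  # placeholder path
        # for the third option, pass a parsed key dict under "info" instead, which uses
        # Credentials.from_service_account_info
    }

If gcs_options is omitted entirely, the standard gcloud auth / GOOGLE_APPLICATION_CREDENTIALS workflow (the first option) is used.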

build_batch_spec(self, batch_definition: BatchDefinition)

Build BatchSpec from batch_definition by calling DataConnector’s build_batch_spec function.

Parameters

batch_definition (BatchDefinition) – the batch definition used to build the batch_spec

Returns

BatchSpec built from batch_definition
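As a hypothetical usage sketch (the datasource, connector, and asset names are placeholders, and data_connector is assumed to be a configured InferredAssetGCSDataConnector instance):

    from great_expectations.core.batch import BatchDefinition
    from great_expectations.core.id_dict import IDDict

    batch_definition = BatchDefinition(
        datasource_name="my_gcs_datasource",
        data_connector_name="default_inferred_data_connector_name",
        data_asset_name="yellow_tripdata",
        batch_identifiers=IDDict({"year": "2019"}),
    )
    # For this connector the resulting BatchSpec is GCS-specific, pointing the
    # ExecutionEngine at the matching object in the bucket.
    batch_spec = data_connector.build_batch_spec(batch_definition=batch_definition)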

_get_data_reference_list(self, data_asset_name: Optional[str] = None)

List objects in the underlying data store to create a list of data_references. This method is used by classes that extend this base DataConnector class to refresh the cache.

Parameters

data_asset_name (str) – optional data_asset_name used to narrow the results to a single asset

_get_full_file_path(self, path: str, data_asset_name: Optional[str] = None)