great_expectations.datasource.data_connector.configured_asset_gcs_data_connector

Module Contents

Classes

ConfiguredAssetGCSDataConnector(name: str, datasource_name: str, bucket_or_name: str, assets: dict, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, prefix: Optional[str] = None, delimiter: Optional[str] = None, max_results: Optional[int] = None, gcs_options: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None)

Extension of ConfiguredAssetFilePathDataConnector used to connect to GCS

great_expectations.datasource.data_connector.configured_asset_gcs_data_connector.logger
great_expectations.datasource.data_connector.configured_asset_gcs_data_connector.storage
class great_expectations.datasource.data_connector.configured_asset_gcs_data_connector.ConfiguredAssetGCSDataConnector(name: str, datasource_name: str, bucket_or_name: str, assets: dict, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, prefix: Optional[str] = None, delimiter: Optional[str] = None, max_results: Optional[int] = None, gcs_options: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None)

Bases: great_expectations.datasource.data_connector.configured_asset_file_path_data_connector.ConfiguredAssetFilePathDataConnector

Extension of ConfiguredAssetFilePathDataConnector used to connect to GCS

DataConnectors produce identifying information, called a “batch_spec”, that ExecutionEngines can use to get individual batches of data. They add flexibility in how data is obtained, for example through time-based partitioning, splitting and sampling, or other techniques appropriate for obtaining batches of data.

The ConfiguredAssetGCSDataConnector is one of two classes (InferredAssetGCSDataConnector being the other one) designed for connecting to data on GCS.

A ConfiguredAssetGCSDataConnector requires an explicit specification of each DataAsset you want to connect to. This allows more fine-tuning, but also requires more setup. To stay consistent with Google’s official SDK, this connector uses the SDK’s own parameter names, such as “bucket_or_name” and “max_results”. These keys are converted from YAML to Python and passed directly to the GCS connection object, so they must match the SDK’s names exactly.
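To make the "explicit specification" concrete, here is a minimal sketch of a configuration written as a Python dict whose keys mirror the constructor arguments documented above. The bucket name, asset name, and regex are illustrative placeholders, and the "pattern" / "group_names" keys inside default_regex are an assumption about the shape of that dict, since this page does not spell it out. The identify_batch helper is hypothetical and is not part of Great Expectations; it only illustrates how a regex with named groups can turn a blob name into batch identifiers.

```python
import re

# Hypothetical configuration; every value here is a placeholder.
data_connector_config = {
    "class_name": "ConfiguredAssetGCSDataConnector",
    "bucket_or_name": "my-example-bucket",  # name follows Google's SDK
    "max_results": 1000,                    # name follows Google's SDK
    "default_regex": {
        # Assumed keys: a regex plus one name per capture group.
        "pattern": r"yellow_tripdata_(\d{4})-(\d{2})\.csv",
        "group_names": ["year", "month"],
    },
    # Each DataAsset must be listed explicitly.
    "assets": {
        "yellow_tripdata": {},
    },
}


def identify_batch(blob_name, regex_config):
    """Map a blob name to named identifiers, or None if it does not match.

    Illustrative helper only; not the library's implementation.
    """
    match = re.fullmatch(regex_config["pattern"], blob_name)
    if match is None:
        return None
    return dict(zip(regex_config["group_names"], match.groups()))


identifiers = identify_batch(
    "yellow_tripdata_2019-01.csv", data_connector_config["default_regex"]
)
print(identifiers)
```

Blob names that do not match the pattern yield None in this sketch, which is one simple way to model files that belong to no batch.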

This DataConnector supports the following methods of authentication:

  1. Standard gcloud auth / GOOGLE_APPLICATION_CREDENTIALS environment variable workflow

  2. Manual creation of credentials from google.oauth2.service_account.Credentials.from_service_account_file

  3. Manual creation of credentials from google.oauth2.service_account.Credentials.from_service_account_info
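The three authentication paths listed above can be thought of as a simple dispatch on the contents of gcs_options. The sketch below is a hypothetical helper, not the connector's actual code. It assumes that a "filename" key selects the service account file path and an "info" key selects the service account info path; neither key name is stated on this page, so treat both as assumptions. Any remaining options are returned unchanged, on the assumption that they would be forwarded to the GCS Storage Client.

```python
def select_auth_method(gcs_options=None):
    """Return (method, remaining_options) for a gcs_options dict.

    Hypothetical illustration; the "filename" and "info" keys are assumed.
    """
    options = dict(gcs_options or {})  # copy so the caller's dict is untouched
    if "filename" in options:
        # Would correspond to Credentials.from_service_account_file(...)
        options.pop("filename")
        return "from_service_account_file", options
    if "info" in options:
        # Would correspond to Credentials.from_service_account_info(...)
        options.pop("info")
        return "from_service_account_info", options
    # No explicit credentials: fall back to gcloud auth or the
    # GOOGLE_APPLICATION_CREDENTIALS environment variable.
    return "default_environment", options


method, remaining = select_auth_method(
    {"filename": "/path/to/key.json", "project": "my-example-project"}
)
print(method, remaining)
```

Copying the dict before removing keys keeps the caller's configuration intact, which matters if the same gcs_options dict is reused elsewhere.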

Most interaction with the SDK goes through a GCS Storage Client. Refer to the official documentation for details on the supported authentication methods and general functionality. Source: https://googleapis.dev/python/google-api-core/latest/auth.html

build_batch_spec(self, batch_definition: BatchDefinition)

Build BatchSpec from batch_definition by calling DataConnector’s build_batch_spec function.

Parameters

batch_definition (BatchDefinition) – to be used to build batch_spec

Returns

BatchSpec built from batch_definition

_get_data_reference_list_for_asset(self, asset: Optional[Asset])
_get_full_file_path_for_asset(self, path: str, asset: Optional[Asset] = None)