great_expectations.execution_engine.sqlalchemy_execution_engine

Module Contents

Classes

SqlAlchemyExecutionEngine(name: Optional[str] = None, credentials: Optional[dict] = None, data_context: Optional[Any] = None, engine=None, connection_string: Optional[str] = None, url: Optional[str] = None, batch_data_dict: Optional[dict] = None, create_temp_table: bool = True, concurrency: Optional[ConcurrencyConfig] = None, **kwargs)

Helper class that provides a standard way to create an ABC using inheritance.

Functions

_get_dialect_type_module(dialect)

Given a dialect, returns the dialect type, which defines the engine/system used to communicate with the database.

great_expectations.execution_engine.sqlalchemy_execution_engine.__version__
great_expectations.execution_engine.sqlalchemy_execution_engine.logger
great_expectations.execution_engine.sqlalchemy_execution_engine.make_url
great_expectations.execution_engine.sqlalchemy_execution_engine.reflection
great_expectations.execution_engine.sqlalchemy_execution_engine.sqlalchemy_psycopg2
great_expectations.execution_engine.sqlalchemy_execution_engine.sqlalchemy_redshift
great_expectations.execution_engine.sqlalchemy_execution_engine.sqlalchemy_dremio
great_expectations.execution_engine.sqlalchemy_execution_engine.snowflake
great_expectations.execution_engine.sqlalchemy_execution_engine._BIGQUERY_MODULE_NAME = sqlalchemy_bigquery
great_expectations.execution_engine.sqlalchemy_execution_engine.bigquery_types_tuple
great_expectations.execution_engine.sqlalchemy_execution_engine.teradatasqlalchemy
great_expectations.execution_engine.sqlalchemy_execution_engine._get_dialect_type_module(dialect)

Given a dialect, returns the dialect type, which defines the engine/system used to communicate with the database/database implementation. Currently checks for Redshift/BigQuery dialects.

class great_expectations.execution_engine.sqlalchemy_execution_engine.SqlAlchemyExecutionEngine(name: Optional[str] = None, credentials: Optional[dict] = None, data_context: Optional[Any] = None, engine=None, connection_string: Optional[str] = None, url: Optional[str] = None, batch_data_dict: Optional[dict] = None, create_temp_table: bool = True, concurrency: Optional[ConcurrencyConfig] = None, **kwargs)

Bases: great_expectations.execution_engine.ExecutionEngine

Helper class that provides a standard way to create an ABC using inheritance.

property credentials(self)
property connection_string(self)
property url(self)
_build_engine(self, credentials: dict, **kwargs)

Using a set of given credentials, constructs an Execution Engine, connecting to a database using a URL or a private key path.

_get_sqlalchemy_key_pair_auth_url(self, drivername: str, credentials: dict)

Utilizing a private key path and a passphrase in a given credentials dictionary, attempts to encode the provided values into a private key. If passphrase is incorrect, this will fail and an exception is raised.

Parameters
  • drivername (str) –

  • credentials (dict) –

Returns

a tuple consisting of a url with the serialized key-pair authentication, and a dictionary of engine kwargs.
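The return contract above can be sketched as follows. This is an illustrative stand-in, not the library's implementation: the real method decodes `private_key_path` with the passphrase via a cryptography library, while here the decoded key bytes are faked and the URL shape (`drivername://user@account`) is an assumption for a Snowflake-style driver.

```python
# Hedged sketch: build (url, engine_kwargs) in the shape the method documents.
# The decoded private key is faked; the real code derives it from
# private_key_path and private_key_passphrase.
def sketch_key_pair_auth_url(drivername: str, credentials: dict):
    """Return (url, engine_kwargs) as the Returns section describes."""
    creds = dict(credentials)
    creds.pop("private_key_path", None)        # consumed while decoding the key
    creds.pop("private_key_passphrase", None)  # consumed while decoding the key
    fake_private_key = b"decoded-key-bytes"    # stand-in for the decoded key
    url = "{}://{}@{}".format(drivername, creds["user"], creds["account"])
    # The serialized key travels to the driver through engine kwargs.
    engine_kwargs = {"connect_args": {"private_key": fake_private_key}}
    return url, engine_kwargs

url, engine_kwargs = sketch_key_pair_auth_url(
    "snowflake",
    {"user": "alice", "account": "acme",
     "private_key_path": "/keys/rsa_key.p8",
     "private_key_passphrase": "s3cret"},
)
print(url, engine_kwargs)
```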

get_domain_records(self, domain_kwargs: Dict)

Uses the given domain kwargs (which include row_condition, condition_parser, and ignore_row_if directives) to obtain and/or query a batch. Returns in the format of an SqlAlchemy table/column(s) object.

Parameters

domain_kwargs (dict) –

Returns

An SqlAlchemy table/column(s) (the selectable object for obtaining data on which to compute)
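Conceptually, `get_domain_records` turns the `row_condition` and `ignore_row_if` directives into filters on the batch's selectable. The sketch below shows that idea with stdlib `sqlite3` and raw SQL conditions; the real engine builds a SQLAlchemy selectable, and the directive values here are simplified assumptions, not the library's exact syntax.

```python
import sqlite3

# Hedged sketch: domain kwargs reduced to a WHERE clause over the batch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batch (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO batch VALUES (?, ?)", [(1, 10), (2, None), (3, 30)])

# Simplified directives; the real engine parses these via condition_parser.
domain_kwargs = {"row_condition": "a > 1", "ignore_row_if": "b IS NULL"}
sql = "SELECT * FROM batch WHERE ({cond}) AND NOT ({ignore})".format(
    cond=domain_kwargs["row_condition"],
    ignore=domain_kwargs["ignore_row_if"],
)
rows = conn.execute(sql).fetchall()
print(rows)  # rows matching the condition, with ignored rows dropped
```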

get_compute_domain(self, domain_kwargs: Dict, domain_type: Union[str, MetricDomainTypes], accessor_keys: Optional[Iterable[str]] = None)

Uses a given batch dictionary and domain kwargs to obtain a SqlAlchemy column object.

Parameters
  • domain_kwargs (dict) –

  • domain_type (str or MetricDomainTypes) – an Enum value indicating which metric domain the user would like to be using, or a corresponding string value representing it. String types include "identity", "column", "column_pair", "table" and "other". Enum types include capitalized versions of these from the class MetricDomainTypes.

  • accessor_keys (str iterable) – keys that are part of the compute domain but should be ignored when describing the domain and simply transferred with their associated values into accessor_domain_kwargs.

Returns

SqlAlchemy column

resolve_metric_bundle(self, metric_fn_bundle: Iterable[Tuple[MetricConfiguration, Any, dict, dict]])

For every metric in a set of Metrics to resolve, obtains necessary metric keyword arguments and builds bundles of the metrics into one large query dictionary so that they are all executed simultaneously. Will fail if bundling the metrics together is not possible.

Args:
metric_fn_bundle (Iterable[Tuple[MetricConfiguration, Callable, dict]]): An iterable of tuples, each containing a MetricProvider's MetricConfiguration (its unique identifier), its metric provider function (the function that actually executes the metric), and the arguments to pass to that function.

Returns:

A dictionary of metric names and their corresponding now-queried values.
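The bundling idea can be sketched without the library: rather than issuing one query per metric, aggregate expressions are combined into a single SELECT so they all execute in one pass. The tuples below stand in for MetricConfiguration bundles; the real engine composes SQLAlchemy column expressions instead of raw SQL strings.

```python
import sqlite3

# Hedged sketch of metric bundling: one combined query for several metrics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batch (x REAL)")
conn.executemany("INSERT INTO batch VALUES (?)", [(1.0,), (2.0,), (3.0,)])

# (metric_name, SQL aggregate) pairs standing in for the metric bundle.
bundle = [
    ("column.min", "MIN(x)"),
    ("column.max", "MAX(x)"),
    ("table.row_count", "COUNT(*)"),
]
select = "SELECT " + ", ".join(expr for _, expr in bundle) + " FROM batch"
row = conn.execute(select).fetchone()  # all metrics resolved in one pass
resolved = {name: value for (name, _), value in zip(bundle, row)}
print(resolved)
```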

close(self)

Note (Will, 2021-07-29):

This is a helper function that will close and dispose Sqlalchemy objects that are used to connect to a database. Databases like Snowflake require the connection and engine to be instantiated and closed separately, and not doing so has caused problems with hanging connections.

Currently the ExecutionEngine does not support handling connections and engines separately, and will actually override the engine with a connection in some cases, obfuscating which object is actually used by the ExecutionEngine to connect to the external database. This will be handled in an upcoming refactor, which will allow this function to eventually become:

self.connection.close()
self.engine.dispose()

More background can be found here: https://github.com/great-expectations/great_expectations/pull/3104/

_split_on_whole_table(self, table_name: str, batch_identifiers: dict)

‘Split’ by returning the whole table

_split_on_column_value(self, table_name: str, column_name: str, batch_identifiers: dict)

Split using the values in the named column

_split_on_converted_datetime(self, table_name: str, column_name: str, batch_identifiers: dict, date_format_string: str = '%Y-%m-%d')

Convert the values in the named column to the given date_format, and split on that

_split_on_divided_integer(self, table_name: str, column_name: str, divisor: int, batch_identifiers: dict)

Divide the values in the named column by divisor, and split on that

_split_on_mod_integer(self, table_name: str, column_name: str, mod: int, batch_identifiers: dict)

Take the values in the named column modulo mod, and split on that

_split_on_multi_column_values(self, table_name: str, column_names: List[str], batch_identifiers: dict)

Split on the joint values in the named columns

_split_on_hashed_column(self, table_name: str, column_name: str, hash_digits: int, batch_identifiers: dict)

Split on the hashed value of the named column
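Each of the `_split_on_*` helpers above reduces to a WHERE clause built from `batch_identifiers`. The sketch below shows the `_split_on_mod_integer`-style case with stdlib `sqlite3`; the real engine builds equivalent SQLAlchemy expressions, and the table and column names here are illustrative.

```python
import sqlite3

# Hedged sketch: select one batch out of a mod-based split.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")
conn.executemany("INSERT INTO events VALUES (?)", [(i,) for i in range(10)])

mod = 4
batch_identifiers = {"id": 1}  # keep the batch where id % 4 == 1
rows = conn.execute(
    "SELECT id FROM events WHERE id % ? = ?",
    (mod, batch_identifiers["id"]),
).fetchall()
batch_ids = [r[0] for r in rows]
print(batch_ids)  # → [1, 5, 9]
```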

_sample_using_mod(self, column_name: str, mod: int, value: int)

Take the mod of the named column, and only keep rows that match the given value

_sample_using_a_list(self, column_name: str, value_list: list)

Match the values in the named column against value_list, and only keep the matches

_sample_using_md5(self, column_name: str, hash_digits: int = 1, hash_value: str = 'f')

Hash the values in the named column, and only keep rows whose hash ends with the given hash_value
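The md5 sampler can be sketched in pure Python: hash each value in the named column and keep only rows whose digest's last `hash_digits` hex characters equal `hash_value`. In the engine this becomes a SQL expression on the selectable; the function name and the string conversion below are illustrative assumptions.

```python
import hashlib

# Hedged pure-Python stand-in for md5-based sampling.
def sample_using_md5(values, hash_digits=1, hash_value="f"):
    """Keep values whose md5 hex digest ends with hash_value."""
    kept = []
    for v in values:
        digest = hashlib.md5(str(v).encode()).hexdigest()
        if digest[-hash_digits:] == hash_value:
            kept.append(v)
    return kept

sample = sample_using_md5(range(100))
print(sample)  # a deterministic subset, roughly 1/16 of rows for one hex digit
```

Because the hash is deterministic, the same rows are kept on every run, which is what makes this usable for reproducible sampling.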

_build_selectable_from_batch_spec(self, batch_spec: BatchSpec)
get_batch_data_and_markers(self, batch_spec: BatchSpec)