great_expectations.execution_engine.sqlalchemy_execution_engine

Module Contents

Classes

SqlAlchemyBatchData(engine, record_set_name: str = None, schema_name: str = None, table_name: str = None, query: str = None, selectable=None, create_temp_table: bool = True, temp_table_name: str = None, temp_table_schema_name: str = None, use_quoted_name: bool = False)

A class that represents a SQLAlchemy batch, with properties including the construction of the batch itself and several getters used to access various properties.

SqlAlchemyExecutionEngine(name=None, credentials=None, data_context=None, engine=None, connection_string=None, url=None, batch_data_dict=None, **kwargs)

Functions

_get_dialect_type_module(dialect)

Given a dialect, returns the dialect type, which defines the engine/system that is used to communicate with the database/database implementation.

great_expectations.execution_engine.sqlalchemy_execution_engine.logger
great_expectations.execution_engine.sqlalchemy_execution_engine.sa
great_expectations.execution_engine.sqlalchemy_execution_engine.reflection
great_expectations.execution_engine.sqlalchemy_execution_engine.sqlalchemy_psycopg2
great_expectations.execution_engine.sqlalchemy_execution_engine.sqlalchemy_redshift
great_expectations.execution_engine.sqlalchemy_execution_engine.snowflake
great_expectations.execution_engine.sqlalchemy_execution_engine.bigquery_types_tuple
great_expectations.execution_engine.sqlalchemy_execution_engine._get_dialect_type_module(dialect)

Given a dialect, returns the dialect type, which defines the engine/system that is used to communicate with the database/database implementation. Currently checks for Redshift/BigQuery dialects.

class great_expectations.execution_engine.sqlalchemy_execution_engine.SqlAlchemyBatchData(engine, record_set_name: str = None, schema_name: str = None, table_name: str = None, query: str = None, selectable=None, create_temp_table: bool = True, temp_table_name: str = None, temp_table_schema_name: str = None, use_quoted_name: bool = False)

A class that represents a SQLAlchemy batch, with properties including the construction of the batch itself and several getters used to access various properties.

property sql_engine_dialect(self)

Returns the batch's current engine dialect

property record_set_name(self)
property selectable(self)
property use_quoted_name(self)
_create_temporary_table(self, temp_table_name, query, temp_table_schema_name=None)

Create a temporary table based on a SQL query. This will be used as a basis for executing expectations.

Parameters
  • query –

head(self, n=5, fetch_all=False)

Fetches the head of the table

row_count(self)

Gets the number of rows
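The temporary-table, head, and row-count behavior described above can be sketched with the standard-library sqlite3 module. This is a minimal illustration, not the class's actual implementation: the events table, the ge_tmp_batch name, and the standalone head and row_count helpers are all hypothetical.

```python
import sqlite3

# Hypothetical in-memory database and table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, "click" if i % 2 else "view") for i in range(10)],
)

# Materialize a query as a temporary table so later statements can target it.
query = "SELECT id, kind FROM events WHERE kind = 'click'"
conn.execute(f"CREATE TEMPORARY TABLE ge_tmp_batch AS {query}")


def head(n=5, fetch_all=False):
    """Return the first n rows, or every row when fetch_all is True."""
    sql = "SELECT * FROM ge_tmp_batch ORDER BY id"
    if not fetch_all:
        sql += f" LIMIT {int(n)}"
    return conn.execute(sql).fetchall()


def row_count():
    """Return the number of rows in the temporary table."""
    return conn.execute("SELECT COUNT(*) FROM ge_tmp_batch").fetchone()[0]


first_rows = head(n=3)
total = row_count()
```

Working against a materialized table means each later query reads the same fixed set of rows instead of re-running the original query.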

class great_expectations.execution_engine.sqlalchemy_execution_engine.SqlAlchemyExecutionEngine(name=None, credentials=None, data_context=None, engine=None, connection_string=None, url=None, batch_data_dict=None, **kwargs)

Bases: great_expectations.execution_engine.ExecutionEngine

property credentials(self)
property connection_string(self)
property url(self)
_build_engine(self, credentials, **kwargs)

Using a set of given credentials, constructs an Execution Engine, connecting to a database using a URL or a private key path.

_get_sqlalchemy_key_pair_auth_url(self, drivername: str, credentials: dict)

Utilizing a private key path and a passphrase in a given credentials dictionary, attempts to encode the provided values into a private key. If the passphrase is incorrect, this will fail and an exception is raised.

Parameters
  • drivername (str) –

  • credentials (dict) –

Returns

A tuple consisting of a URL with the serialized key-pair authentication, and a dictionary of engine kwargs.

get_compute_domain(self, domain_kwargs: Dict, domain_type: Union[str, 'MetricDomainTypes'], accessor_keys: Optional[Iterable[str]] = None)

Uses a given batch dictionary and domain kwargs to obtain a SqlAlchemy column object.

Parameters
  • domain_kwargs (dict) –

  • domain_type (str or "MetricDomainTypes") – An enum value indicating which metric domain to use, or a corresponding string value representing it. String types include "identity", "column", "column_pair", "table" and "other". Enum types include capitalized versions of these from the class MetricDomainTypes.

  • accessor_keys (str iterable) – Keys that are excluded from the description of the domain and simply transferred with their associated values into accessor_domain_kwargs.

Returns

SqlAlchemy column
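The handling of accessor_keys can be pictured as a partition of the domain kwargs. The sketch below is an assumption about the general idea only: split_domain_kwargs is a hypothetical helper, not part of the library, and the real method also resolves the batch and the column object.

```python
def split_domain_kwargs(domain_kwargs, accessor_keys=None):
    """Separate accessor keys from the kwargs that describe the compute domain.

    Hypothetical helper: keys named in accessor_keys are moved, with their
    values, into a second dictionary; everything else stays in the first.
    """
    accessor_keys = set(accessor_keys or ())
    compute_domain_kwargs = {
        key: value for key, value in domain_kwargs.items() if key not in accessor_keys
    }
    accessor_domain_kwargs = {
        key: value for key, value in domain_kwargs.items() if key in accessor_keys
    }
    return compute_domain_kwargs, accessor_domain_kwargs


compute, accessor = split_domain_kwargs(
    {"batch_id": "abc", "column": "price"},
    accessor_keys=["column"],
)
```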

resolve_metric_bundle(self, metric_fn_bundle: Iterable[Tuple[MetricConfiguration, Any, dict, dict]])

For every metric in a set of metrics to resolve, obtains the necessary metric keyword arguments and bundles the metrics into one large query dictionary so that they are all executed simultaneously. Will fail if bundling the metrics together is not possible.

Parameters
  • metric_fn_bundle (Iterable[Tuple[MetricConfiguration, Callable, dict]]) – An iterable of tuples, each containing a MetricProvider’s MetricConfiguration (its unique identifier), its metric provider function (the function that actually executes the metric), and the arguments to pass to the metric provider function.

  • metrics (Dict[Tuple, Any]) – A dictionary of metrics defined in the registry and corresponding arguments

Returns

A dictionary of metric names and their corresponding now-queried values.
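The bundling idea, several metrics evaluated in one query and then mapped back to their names, can be sketched with sqlite3. The metric names, the table t, and the resolve_bundle helper are hypothetical and stand in for the MetricConfiguration machinery; this is an illustration of the technique, not the method's actual code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in (3, 1, 4, 1, 5)])

# Hypothetical metric names mapped to the SQL expression that computes each one.
metric_expressions = {
    "table.row_count": "COUNT(*)",
    "column.min": "MIN(x)",
    "column.max": "MAX(x)",
}


def resolve_bundle(connection, table, expressions):
    """Evaluate every expression in one SELECT and return {metric name: value}."""
    names = list(expressions)
    select_list = ", ".join(expressions[name] for name in names)
    row = connection.execute(f"SELECT {select_list} FROM {table}").fetchone()
    return dict(zip(names, row))


resolved = resolve_bundle(conn, "t", metric_expressions)
```

A single SELECT means one round trip to the database instead of one per metric, which is the motivation for bundling.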

_split_on_whole_table(self, table_name: str, partition_definition: dict)

‘Split’ by returning the whole table

_split_on_column_value(self, table_name: str, column_name: str, partition_definition: dict)

Split using the values in the named column

_split_on_converted_datetime(self, table_name: str, column_name: str, partition_definition: dict, date_format_string: str = '%Y-%m-%d')

Convert the values in the named column to the given date_format_string, and split on that
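As a rough illustration of splitting on a converted datetime, the sketch below groups in-memory rows by their formatted timestamp. The rows and the standalone function are hypothetical; the real method builds a SQL expression rather than iterating in Python.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows standing in for a table with a timestamp column.
rows = [
    {"id": 1, "ts": datetime(2021, 1, 5, 9, 30)},
    {"id": 2, "ts": datetime(2021, 1, 5, 17, 0)},
    {"id": 3, "ts": datetime(2021, 1, 6, 8, 15)},
]


def split_on_converted_datetime(rows, column_name, date_format_string="%Y-%m-%d"):
    """Group row ids by the formatted value of the named datetime column."""
    groups = defaultdict(list)
    for row in rows:
        key = row[column_name].strftime(date_format_string)
        groups[key].append(row["id"])
    return dict(groups)


by_day = split_on_converted_datetime(rows, "ts")
by_month = split_on_converted_datetime(rows, "ts", date_format_string="%Y-%m")
```

A coarser format string yields fewer, larger partitions: here the default format gives one partition per day, while "%Y-%m" gives one per month.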

_split_on_divided_integer(self, table_name: str, column_name: str, divisor: int, partition_definition: dict)

Divide the values in the named column by divisor, and split on that

_split_on_mod_integer(self, table_name: str, column_name: str, mod: int, partition_definition: dict)

Take the values in the named column modulo mod, and split on the remainder
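The difference between splitting on a divided integer and splitting on a modulus is easiest to see side by side. The helper names below are hypothetical, and the sketch assumes non-negative integers, where Python's floor division agrees with SQL integer division.

```python
values = [0, 1, 2, 10, 11, 12, 20]


def partition_key_divided(value, divisor):
    """Key used when splitting on a divided integer: contiguous ranges share a key."""
    return value // divisor


def partition_key_mod(value, mod):
    """Key used when splitting on a modulus: values a multiple of mod apart share a key."""
    return value % mod


by_division = {v: partition_key_divided(v, 10) for v in values}
by_mod = {v: partition_key_mod(v, 10) for v in values}
```

Division groups neighbouring values (0, 1, 2 land together), while the modulus interleaves them (0, 10, 20 land together).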

_split_on_multi_column_values(self, table_name: str, column_names: List[str], partition_definition: dict)

Split on the joint values in the named columns
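Splitting on joint values means that each distinct combination of the named columns defines one partition. A minimal in-memory sketch, with hypothetical rows and a hypothetical standalone function:

```python
from collections import defaultdict

# Hypothetical rows standing in for a table.
rows = [
    {"id": 1, "country": "US", "year": 2020},
    {"id": 2, "country": "US", "year": 2021},
    {"id": 3, "country": "DE", "year": 2020},
    {"id": 4, "country": "US", "year": 2020},
]


def split_on_multi_column_values(rows, column_names):
    """Group row ids by the tuple of values found in the named columns."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in column_names)
        groups[key].append(row["id"])
    return dict(groups)


partitions = split_on_multi_column_values(rows, ["country", "year"])
```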

_split_on_hashed_column(self, table_name: str, column_name: str, hash_digits: int, partition_definition: dict)

Split on the hashed value of the named column

_sample_using_random(self, p: float = 0.1)

Take a random sample of rows, retaining proportion p

Note: the Random function behaves differently on different dialects of SQL
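Random sampling with proportion p amounts to keeping each row independently with probability p. The sketch below shows that idea in plain Python; the seed parameter is an addition for reproducibility and is not part of the documented signature. In SQL the random function is spelled differently across dialects (for example RAND() in MySQL and RANDOM() in PostgreSQL), which is one reason behavior varies.

```python
import random


def sample_using_random(rows, p=0.1, seed=None):
    """Keep each row independently with probability p."""
    rng = random.Random(seed)  # seed is a hypothetical extra for repeatable runs
    return [row for row in rows if rng.random() < p]


rows = list(range(1000))
sample = sample_using_random(rows, p=0.1, seed=42)
```

Because each row is kept independently, the sample size is only approximately p times the row count and changes from run to run unless the generator is seeded.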

_sample_using_mod(self, column_name, mod: int, value: int)

Take the mod of the named column, and only keep rows that match the given value

_sample_using_a_list(self, column_name: str, value_list: list)

Match the values in the named column against value_list, and only keep the matches
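The mod-based and list-based samplers are both row filters. The sketch below applies the same predicates to in-memory rows; the data and the standalone functions are hypothetical, and the real methods produce SQL conditions rather than Python lists.

```python
# Hypothetical rows standing in for a table.
colors = ["red", "green", "blue", "red", "green", "blue"]
rows = [{"id": i, "color": color} for i, color in enumerate(colors)]


def sample_using_mod(rows, column_name, mod, value):
    """Keep rows where the named column modulo mod equals value."""
    return [row for row in rows if row[column_name] % mod == value]


def sample_using_a_list(rows, column_name, value_list):
    """Keep rows whose value in the named column appears in value_list."""
    allowed = set(value_list)
    return [row for row in rows if row[column_name] in allowed]


mod_sample_ids = [row["id"] for row in sample_using_mod(rows, "id", mod=3, value=1)]
list_sample_ids = [row["id"] for row in sample_using_a_list(rows, "color", ["red", "blue"])]
```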

_sample_using_md5(self, column_name: str, hash_digits: int = 1, hash_value: str = 'f')

Hash the values in the named column, and only keep rows whose hash matches hash_value
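Hash-based sampling gives a deterministic, roughly uniform subset: the same value always hashes the same way, so a row is either always in the sample or never. The sketch below assumes the comparison is made against the trailing hash_digits characters of the MD5 hex digest; that detail, the md5_suffix helper, and the in-memory rows are assumptions for illustration.

```python
import hashlib


def md5_suffix(value, hash_digits=1):
    """Return the last hash_digits hex characters of the MD5 digest of value."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return digest[-hash_digits:]


def sample_using_md5(rows, column_name, hash_digits=1, hash_value="f"):
    """Keep rows whose hashed column value ends with hash_value."""
    return [
        row for row in rows
        if md5_suffix(row[column_name], hash_digits) == hash_value
    ]


rows = [{"user_id": i} for i in range(200)]
sample = sample_using_md5(rows, "user_id")
```

With one hex digit there are 16 possible suffixes, so the defaults keep roughly one row in sixteen; increasing hash_digits makes the sample smaller.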

_build_selectable_from_batch_spec(self, batch_spec)
get_batch_data_and_markers(self, batch_spec: BatchSpec)