great_expectations.datasource.batch_kwargs_generator.query_batch_kwargs_generator
Module Contents
Classes
QueryBatchKwargsGenerator – Produce query-style batch_kwargs from SQL files or defined queries.
- great_expectations.datasource.batch_kwargs_generator.query_batch_kwargs_generator.logger
- great_expectations.datasource.batch_kwargs_generator.query_batch_kwargs_generator.sqlalchemy
- class great_expectations.datasource.batch_kwargs_generator.query_batch_kwargs_generator.QueryBatchKwargsGenerator(name='default', datasource=None, query_store_backend=None, queries=None)
Bases: great_expectations.datasource.batch_kwargs_generator.batch_kwargs_generator.BatchKwargsGenerator
Produce query-style batch_kwargs from SQL files or defined queries.
By default, a QueryBatchKwargsGenerator looks for queries in the datasources/datasource_name/generators/generator_name directory and picks up files ending in .sql. For example, a file stored at datasources/datasource_name/generators/generator_name/movies_by_date.sql would allow you to access an asset called movies_by_date.
Queries can be parameterized using $substitution.
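A minimal sketch of the default lookup convention, assuming each file ending in .sql directly inside the generator directory becomes one asset named after the file stem. The helper list_query_assets is hypothetical and is not part of the Great Expectations API.

```python
import tempfile
from pathlib import Path


def list_query_assets(directory):
    """Return sorted asset names for every *.sql file directly inside directory.

    Hypothetical helper: mirrors the convention that movies_by_date.sql
    is exposed as the asset "movies_by_date".
    """
    # glob("*.sql") also matches directories named *.sql, so keep files only.
    return sorted(p.stem for p in Path(directory).glob("*.sql") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate datasources/<datasource_name>/generators/<generator_name>/
        gen_dir = Path(tmp, "datasources", "my_db", "generators", "query_generator")
        gen_dir.mkdir(parents=True)
        (gen_dir / "movies_by_date.sql").write_text("SELECT 1;")
        (gen_dir / "notes.txt").write_text("ignored: not a .sql file")
        print(list_query_assets(gen_dir))  # ['movies_by_date']
```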
Example configuration:
- queries:
      class_name: QueryBatchKwargsGenerator
      query_store_backend:
          class_name: TupleFilesystemStoreBackend
          filepath_suffix: .sql
          base_directory: queries
Example query template, to be stored in queries/movies_by_date.sql:
    SELECT * FROM movies where '$start'::date <= release_date AND release_date <= '$end'::date;
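Assuming $substitution refers to Python's string.Template placeholder syntax (an assumption; this page does not name the mechanism), the template above would render as shown in this sketch. The function render_query is hypothetical.

```python
from string import Template

QUERY_TEMPLATE = (
    "SELECT * FROM movies where '$start'::date <= release_date "
    "AND release_date <= '$end'::date;"
)


def render_query(template_text, query_parameters):
    """Substitute $name placeholders with values from query_parameters.

    safe_substitute leaves unknown placeholders untouched instead of raising.
    """
    return Template(template_text).safe_substitute(query_parameters)


if __name__ == "__main__":
    print(render_query(QUERY_TEMPLATE, {"start": "2020-01-01", "end": "2020-02-01"}))
    # SELECT * FROM movies where '2020-01-01'::date <= release_date AND release_date <= '2020-02-01'::date;
```

Because this is plain text substitution, values are inserted without any SQL escaping, so a sketch like this should only be fed trusted parameters.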
Example usage:
- context.build_batch_kwargs(
      "my_db",
      "query_generator",
      "movies_by_date",
      query_parameters={
          "start": "2020-01-01",
          "end": "2020-02-01",
      },
  )
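As a rough model of what a call like the one above assembles, the sketch below pairs the stored raw query for an asset with the supplied query_parameters in a plain dict. The function build_query_batch_kwargs, the queries mapping, and the exact dict keys are illustrative assumptions, not the library's internal batch kwargs class.

```python
def build_query_batch_kwargs(queries, data_asset_name, query_parameters=None):
    """Hypothetical: combine a stored (unrendered) query with its parameters.

    queries maps data asset names to raw SQL template text.
    """
    if data_asset_name not in queries:
        raise KeyError(f"No query registered for asset {data_asset_name!r}")
    batch_kwargs = {"query": queries[data_asset_name]}
    if query_parameters:
        # Copy so later changes by the caller do not leak into the batch kwargs.
        batch_kwargs["query_parameters"] = dict(query_parameters)
    return batch_kwargs


if __name__ == "__main__":
    queries = {"movies_by_date": "SELECT * FROM movies WHERE release_date >= '$start'::date;"}
    print(build_query_batch_kwargs(queries, "movies_by_date", {"start": "2020-01-01"}))
```

Keeping the raw template and its parameters separate, rather than rendering early, leaves the substitution step to whatever later executes the query.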
- recognized_batch_parameters
- _get_raw_query(self, data_asset_name)
- _get_iterator(self, data_asset_name, query_parameters=None)
- add_query(self, generator_asset=None, query=None, data_asset_name=None)
- get_available_data_asset_names(self)
Return the list of asset names known by this batch kwargs generator.
Returns: A list of available names.
- _build_batch_kwargs(self, batch_parameters)
Build batch kwargs from a partition id.
- get_available_partition_ids(self, generator_asset=None, data_asset_name=None)
Applies the current _partitioner to the batches available on data_asset_name and returns a list of valid partition_id strings that can be used to identify batches of data.
Parameters: data_asset_name – the data asset whose partitions should be returned.
Returns: A list of partition_id strings.
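The member signatures above suggest a simple registry: add_query records a query under an asset name, get_available_data_asset_names lists those names, and _get_raw_query looks one up. The class below is a hypothetical in-memory stand-in under that reading. It is not the library implementation (which may also consult its query store backend), and the sorted-list return shape and the "exactly one name argument" rule are assumptions.

```python
class InMemoryQueryRegistry:
    """Hypothetical stand-in for the add_query / lookup behaviour."""

    def __init__(self, queries=None):
        # Mirrors the queries=None constructor argument: an optional
        # mapping of asset name -> raw SQL text.
        self._queries = dict(queries or {})

    def add_query(self, generator_asset=None, query=None, data_asset_name=None):
        # Assumption: exactly one of the two name arguments is supplied.
        if (generator_asset is None) == (data_asset_name is None):
            raise ValueError("Provide exactly one of generator_asset or data_asset_name.")
        if query is None:
            raise ValueError("query is required")
        name = data_asset_name if data_asset_name is not None else generator_asset
        self._queries[name] = query

    def get_available_data_asset_names(self):
        # Assumption: a sorted list of names; the real return shape may differ.
        return sorted(self._queries)

    def _get_raw_query(self, data_asset_name):
        try:
            return self._queries[data_asset_name]
        except KeyError:
            raise KeyError(f"Unknown data asset {data_asset_name!r}") from None


if __name__ == "__main__":
    registry = InMemoryQueryRegistry()
    registry.add_query(data_asset_name="movies_by_date", query="SELECT * FROM movies;")
    print(registry.get_available_data_asset_names())  # ['movies_by_date']
```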