great_expectations.datasource¶
Subpackages¶
great_expectations.datasource.batch_kwargs_generator
great_expectations.datasource.batch_kwargs_generator.batch_kwargs_generator
great_expectations.datasource.batch_kwargs_generator.databricks_batch_kwargs_generator
great_expectations.datasource.batch_kwargs_generator.glob_reader_batch_kwargs_generator
great_expectations.datasource.batch_kwargs_generator.manual_batch_kwargs_generator
great_expectations.datasource.batch_kwargs_generator.query_batch_kwargs_generator
great_expectations.datasource.batch_kwargs_generator.s3_batch_kwargs_generator
great_expectations.datasource.batch_kwargs_generator.s3_subdir_reader_batch_kwargs_generator
great_expectations.datasource.batch_kwargs_generator.subdir_reader_batch_kwargs_generator
great_expectations.datasource.batch_kwargs_generator.table_batch_kwargs_generator
great_expectations.datasource.data_connector
great_expectations.datasource.data_connector.asset
great_expectations.datasource.data_connector.sorter
great_expectations.datasource.data_connector.sorter.custom_list_sorter
great_expectations.datasource.data_connector.sorter.date_time_sorter
great_expectations.datasource.data_connector.sorter.lexicographic_sorter
great_expectations.datasource.data_connector.sorter.numeric_sorter
great_expectations.datasource.data_connector.sorter.sorter
great_expectations.datasource.data_connector.configured_asset_file_path_data_connector
great_expectations.datasource.data_connector.configured_asset_filesystem_data_connector
great_expectations.datasource.data_connector.configured_asset_s3_data_connector
great_expectations.datasource.data_connector.configured_asset_sql_data_connector
great_expectations.datasource.data_connector.data_connector
great_expectations.datasource.data_connector.file_path_data_connector
great_expectations.datasource.data_connector.inferred_asset_file_path_data_connector
great_expectations.datasource.data_connector.inferred_asset_filesystem_data_connector
great_expectations.datasource.data_connector.inferred_asset_s3_data_connector
great_expectations.datasource.data_connector.inferred_asset_sql_data_connector
great_expectations.datasource.data_connector.partition_query
great_expectations.datasource.data_connector.runtime_data_connector
great_expectations.datasource.data_connector.util
great_expectations.datasource.types
Submodules¶
great_expectations.datasource.datasource
great_expectations.datasource.new_datasource
great_expectations.datasource.pandas_datasource
great_expectations.datasource.simple_sqlalchemy_datasource
great_expectations.datasource.sparkdf_datasource
great_expectations.datasource.sqlalchemy_datasource
great_expectations.datasource.util
Package Contents¶
Classes¶
DataConnector – DataConnectors produce identifying information, called “batch_spec”, that ExecutionEngines can use to get individual batches of data.
LegacyDatasource – A Datasource connects to a compute environment and one or more storage environments and produces batches of data that Great Expectations can validate in that compute environment.
BaseDatasource – A Datasource is the glue between an ExecutionEngine and a DataConnector.
Datasource – A Datasource is the glue between an ExecutionEngine and a DataConnector.
PandasDatasource – The PandasDatasource produces PandasDataset objects and supports generators capable of interacting with the local filesystem and with existing in-memory dataframes.
SimpleSqlalchemyDatasource – A specialized Datasource for SQL backends.
SparkDFDatasource – The SparkDFDatasource produces SparkDFDatasets and supports generators capable of interacting with the local filesystem and databricks notebooks.
SqlAlchemyDatasource – A SqlAlchemyDatasource will provide data_assets converting batch_kwargs using configured rules.
-
class great_expectations.datasource.DataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None)¶
DataConnectors produce identifying information, called “batch_spec”, that ExecutionEngines can use to get individual batches of data. They add flexibility in how to obtain data, such as with time-based partitioning, downsampling, or other techniques appropriate for the Datasource.
For example, a DataConnector could produce a SQL query that logically represents “rows in the Events table with a timestamp on February 7, 2012,” which a SqlAlchemyDatasource could use to materialize a SqlAlchemyDataset corresponding to that batch of data and ready for validation.
A batch is a sample from a data asset, sliced according to a particular rule. For example, an hourly slice of the Events table, or the “most recent user records.”
A Batch is the primary unit of validation in the Great Expectations DataContext. Batches include metadata that identifies how they were constructed: the same “batch_spec” assembled by the data connector. While not every Datasource will enable re-fetching a specific batch of data, GE can store snapshots of batches or store metadata from an external data version control system.
-
property
name
(self)¶
-
property
datasource_name
(self)¶
-
property
data_context_root_directory
(self)¶
-
get_batch_data_and_metadata
(self, batch_definition: BatchDefinition)¶ Uses batch_definition to build a batch_spec, then uses the execution_engine to retrieve and return batch_data and batch_markers
- Parameters
batch_definition (BatchDefinition) – required batch_definition parameter for retrieval
-
build_batch_spec
(self, batch_definition: BatchDefinition)¶ Builds batch_spec from batch_definition by generating batch_spec params and adding any pass_through params
- Parameters
batch_definition (BatchDefinition) – required batch_definition parameter for retrieval
- Returns
BatchSpec object built from BatchDefinition
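The spec-building contract described above can be sketched in plain Python. The dataclass and helper below are illustrative stand-ins for the library's BatchDefinition and BatchSpec types, not its actual implementation:

```python
# Illustrative sketch only: these names stand in for Great Expectations'
# BatchDefinition/BatchSpec types and are not the library's actual classes.
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class BatchDefinition:
    datasource_name: str
    data_connector_name: str
    data_asset_name: str
    partition_definition: Dict[str, Any] = field(default_factory=dict)


def build_batch_spec(batch_definition: BatchDefinition,
                     pass_through: Dict[str, Any]) -> Dict[str, Any]:
    """Combine identifying fields from the batch definition with any
    connector-level pass-through parameters (e.g. reader options)."""
    spec: Dict[str, Any] = {
        "data_asset_name": batch_definition.data_asset_name,
        **batch_definition.partition_definition,
    }
    spec.update(pass_through)  # pass-through params win on key collisions
    return spec


bd = BatchDefinition("my_datasource", "my_connector", "events",
                     {"year": "2012", "month": "02"})
spec = build_batch_spec(bd, {"reader_options": {"sep": ","}})
print(spec)
```

An ExecutionEngine would then consume such a spec to materialize the actual batch data.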
-
abstract
_refresh_data_references_cache
(self)¶
-
abstract
_get_data_reference_list
(self, data_asset_name: Optional[str] = None)¶ List objects in the underlying data store to create a list of data_references. This method is used by classes that extend this base DataConnector class to refresh the cache
- Parameters
data_asset_name (str) – optional data_asset_name to retrieve more specific results
-
abstract
_get_data_reference_list_from_cache_by_data_asset_name
(self, data_asset_name: str)¶ Fetch data_references corresponding to data_asset_name from the cache.
-
abstract
get_data_reference_list_count
(self)¶
-
abstract
get_unmatched_data_references
(self)¶
-
abstract
get_available_data_asset_names
(self)¶ Return the list of asset names known by this data connector.
- Returns
A list of available names
-
abstract
get_batch_definition_list_from_batch_request
(self, batch_request: BatchRequest)¶
-
abstract
_map_data_reference_to_batch_definition_list
(self, data_reference: Any, data_asset_name: Optional[str] = None)¶
-
abstract
_map_batch_definition_to_data_reference
(self, batch_definition: BatchDefinition)¶
-
abstract
_generate_batch_spec_parameters_from_batch_definition
(self, batch_definition: BatchDefinition)¶
-
self_check
(self, pretty_print=True, max_examples=3)¶ Checks the configuration of the current DataConnector by doing the following:
refresh or create the data_reference cache
print the batch_definition count and example data_references for each data_asset_name
print unmatched data_references, allowing the user to modify the regex or glob configuration if necessary
select a random data_reference and attempt to retrieve and print its first few rows
When used as part of the test_yaml_config() workflow, this lets the user know whether the data_connector is properly configured and whether the associated execution_engine can retrieve data using that configuration.
- Parameters
pretty_print (bool) – should the output be printed?
max_examples (int) – how many data_references should be printed?
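As an illustration of the kind of bookkeeping self_check() reports, the following stdlib-only sketch (not library code; the file names and pattern are made up) matches example data references against a configured regex, groups matches by data asset, and collects the unmatched remainder:

```python
# Sketch of self_check-style reporting: group data references by data asset
# using a configured regex, and surface any references that do not match.
import re
from collections import defaultdict

DATA_REFERENCES = [
    "events_2012-02-07.csv",
    "events_2012-02-08.csv",
    "users_2012-02-07.csv",
    "README.md",  # will not match the pattern below
]
PATTERN = re.compile(r"(?P<asset>[a-z]+)_(?P<date>\d{4}-\d{2}-\d{2})\.csv")

matched = defaultdict(list)
unmatched = []
for ref in DATA_REFERENCES:
    m = PATTERN.fullmatch(ref)
    if m:
        matched[m.group("asset")].append(ref)
    else:
        unmatched.append(ref)

for asset, refs in sorted(matched.items()):
    print(f"{asset}: {len(refs)} batch definition(s), e.g. {refs[:3]}")
print(f"unmatched data references: {unmatched}")
```

A non-empty unmatched list is the cue to adjust the regex or glob configuration.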
-
_self_check_fetch_batch
(self, pretty_print: bool, example_data_reference: Any, data_asset_name: str)¶ Helper function for self_check() to retrieve batch using example_data_reference and data_asset_name, all while printing helpful messages. First 5 rows of batch_data are printed by default.
- Parameters
pretty_print (bool) – print to console?
example_data_reference (Any) – data_reference to retrieve
data_asset_name (str) – data_asset_name to retrieve
-
_validate_batch_request
(self, batch_request: BatchRequest)¶ - Validate batch_request by checking:
if configured datasource_name matches batch_request’s datasource_name
if current data_connector_name matches batch_request’s data_connector_name
- Parameters
batch_request (BatchRequest) – batch_request to validate
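The two checks described above amount to simple name comparisons. This sketch is illustrative only: the real method operates on BatchRequest objects rather than dicts, and the exception type is an assumption:

```python
# Hypothetical stand-in for _validate_batch_request: confirm the request
# targets this datasource and this data connector before proceeding.
def validate_batch_request(batch_request: dict,
                           datasource_name: str,
                           data_connector_name: str) -> None:
    if batch_request["datasource_name"] != datasource_name:
        raise ValueError(
            f"batch_request datasource {batch_request['datasource_name']!r} "
            f"does not match configured datasource {datasource_name!r}")
    if batch_request["data_connector_name"] != data_connector_name:
        raise ValueError(
            f"batch_request data connector "
            f"{batch_request['data_connector_name']!r} does not match "
            f"current data connector {data_connector_name!r}")


request = {"datasource_name": "my_datasource",
           "data_connector_name": "my_connector"}
validate_batch_request(request, "my_datasource", "my_connector")  # passes
```

A mismatched name raises immediately, before any data retrieval is attempted.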
-
class great_expectations.datasource.LegacyDatasource(name, data_context=None, data_asset_type=None, batch_kwargs_generators=None, **kwargs)¶
A Datasource connects to a compute environment and one or more storage environments and produces batches of data that Great Expectations can validate in that compute environment.
Each Datasource provides Batches connected to a specific compute environment, such as a SQL database, a Spark cluster, or a local in-memory Pandas DataFrame.
Datasources use Batch Kwargs to specify instructions for how to access data from relevant sources such as an existing object from a DAG runner, a SQL database, S3 bucket, or local filesystem.
To bridge the gap between those worlds, Datasources interact closely with generators, which are aware of a source of data and can produce identifying information, called “batch_kwargs”, that datasources can use to get individual batches of data. They add flexibility in how to obtain data, such as with time-based partitioning, downsampling, or other techniques appropriate for the datasource.
For example, a batch kwargs generator could produce a SQL query that logically represents “rows in the Events table with a timestamp on February 7, 2012,” which a SqlAlchemyDatasource could use to materialize a SqlAlchemyDataset corresponding to that batch of data and ready for validation.
Opinionated DAG managers such as airflow, dbt, prefect.io, and dagster can also act as datasources and/or batch kwargs generators for a more generic datasource.
When adding custom expectations by subclassing an existing DataAsset type, use the data_asset_type parameter to configure the datasource to load and return DataAssets of the custom type.
Datasource - S3 - How-to Guide
Support for connecting to Amazon Web Services S3 as an external datasource.
Maturity: Production
Details:
API Stability: Medium
Implementation Completeness: Complete
Unit Test Coverage: Complete
Integration Infrastructure/Test Coverage: None
Documentation Completeness: Minimal/Spotty
Bug Risk: Low
Datasource - Filesystem - How-to Guide
Support for using a mounted filesystem as an external datasource.
Maturity: Production
Details:
API Stability: Medium
Implementation Completeness: Complete
Unit Test Coverage: Complete
Integration Infrastructure/Test Coverage: Partial
Documentation Completeness: Partial
Bug Risk: Low (Moderate for Windows users because of path issues)
Datasource - GCS - How-to Guide
Support for Google Cloud Storage as an external datasource.
Maturity: Experimental
Details:
API Stability: Medium (supported via native ‘gs://’ syntax in Pandas and Pyspark; medium because we expect configuration to evolve)
Implementation Completeness: Medium (works via passthrough, not via CLI)
Unit Test Coverage: Minimal
Integration Infrastructure/Test Coverage: Minimal
Documentation Completeness: Minimal
Bug Risk: Moderate
Datasource - Azure Blob Storage - How-to Guide
Support for Microsoft Azure Blob Storage as an external datasource.
Maturity: In Roadmap (Sub-Experimental - “Not Impossible”)
Details:
API Stability: N/A (supported on Databricks Spark via ‘wasb://’ / ‘wasbs://’ urls; requires local download first for Pandas)
Implementation Completeness: Minimal
Unit Test Coverage: N/A
Integration Infrastructure/Test Coverage: N/A
Documentation Completeness: Minimal
Bug Risk: Unknown
-
recognized_batch_parameters
¶
-
classmethod
from_configuration
(cls, **kwargs)¶ Build a new datasource from a configuration dictionary.
- Parameters
**kwargs – configuration key-value pairs
- Returns
the newly-created datasource
- Return type
datasource (Datasource)
-
classmethod
build_configuration
(cls, class_name, module_name='great_expectations.datasource', data_asset_type=None, batch_kwargs_generators=None, **kwargs)¶ Build a full configuration object for a datasource, potentially including batch kwargs generators with defaults.
- Parameters
class_name – The name of the class for which to build the config
module_name – The name of the module in which the datasource class is located
data_asset_type – A ClassConfig dictionary
batch_kwargs_generators – BatchKwargGenerators configuration dictionary
**kwargs – Additional kwargs to be part of the datasource constructor’s initialization
- Returns
A complete datasource configuration.
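The contract described above can be sketched as a plain function. This is a simplified stand-in for illustration, not the library's implementation:

```python
# Hypothetical helper mirroring the build_configuration contract: assemble a
# complete config dict from class/module names, optional data_asset_type and
# batch_kwargs_generators, plus any extra constructor kwargs.
def build_configuration(class_name,
                        module_name="great_expectations.datasource",
                        data_asset_type=None,
                        batch_kwargs_generators=None,
                        **kwargs):
    configuration = {"class_name": class_name,
                     "module_name": module_name,
                     **kwargs}
    if data_asset_type is not None:
        configuration["data_asset_type"] = data_asset_type
    if batch_kwargs_generators is not None:
        configuration["batch_kwargs_generators"] = batch_kwargs_generators
    return configuration


config = build_configuration("PandasDatasource",
                             reader_options={"sep": "\t"})
print(config)
```

Such a dictionary is the kind of input from_configuration() consumes to instantiate a datasource.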
-
property
name
(self)¶ Property for datasource name
-
property
config
(self)¶
-
property
data_context
(self)¶ Property for attached DataContext
-
_build_generators
(self)¶ Build batch kwargs generator objects from the datasource configuration.
- Returns
None
-
add_batch_kwargs_generator
(self, name, class_name, **kwargs)¶ Add a BatchKwargGenerator to the datasource.
- Parameters
name (str) – the name of the new BatchKwargGenerator to add
class_name – class of the BatchKwargGenerator to add
kwargs – additional keyword arguments will be passed directly to the new BatchKwargGenerator’s constructor
- Returns
BatchKwargGenerator (BatchKwargGenerator)
-
_build_batch_kwargs_generator
(self, **kwargs)¶ Build a BatchKwargGenerator using the provided configuration and return the newly-built generator.
-
get_batch_kwargs_generator
(self, name)¶ Get the (named) BatchKwargGenerator from the datasource.
- Parameters
name (str) – name of BatchKwargGenerator (default value is ‘default’)
- Returns
BatchKwargGenerator (BatchKwargGenerator)
-
list_batch_kwargs_generators
(self)¶ List the currently-configured BatchKwargGenerators for this datasource.
- Returns
each dictionary includes “name” and “type” keys
- Return type
List(dict)
-
process_batch_parameters
(self, limit=None, dataset_options=None)¶ Use datasource-specific configuration to translate any batch parameters into batch kwargs at the datasource level.
- Parameters
limit (int) – a parameter all datasources must accept to allow limiting a batch to a smaller number of rows.
dataset_options (dict) – a set of kwargs that will be passed to the constructor of a dataset built using these batch_kwargs
- Returns
Result will include both parameters passed via argument and configured parameters.
- Return type
batch_kwargs
-
abstract
get_batch
(self, batch_kwargs, batch_parameters=None)¶ Get a batch of data from the datasource.
- Parameters
batch_kwargs – the BatchKwargs to use to construct the batch
batch_parameters – optional parameters to store as the reference description of the batch. They should reflect parameters that would provide the passed BatchKwargs.
- Returns
Batch
-
get_available_data_asset_names
(self, batch_kwargs_generator_names=None)¶ Returns a dictionary of data_asset_names that the specified batch kwarg generator can provide. Note that some batch kwargs generators may not be capable of describing specific named data assets, and some (such as filesystem glob batch kwargs generators) require the user to configure data asset names.
- Parameters
batch_kwargs_generator_names – the BatchKwargGenerator for which to get available data asset names.
- Returns
{ generator_name: { names: [ (data_asset_1, data_asset_1_type), (data_asset_2, data_asset_2_type) ... ] } ... }
- Return type
dictionary consisting of sets of generator assets available for the specified generators
-
build_batch_kwargs
(self, batch_kwargs_generator, data_asset_name=None, partition_id=None, **kwargs)¶
-
-
class great_expectations.datasource.BaseDatasource(name: str, execution_engine=None, data_context_root_directory: Optional[str] = None)¶
A Datasource is the glue between an ExecutionEngine and a DataConnector.
-
recognized_batch_parameters
:set¶
-
get_batch_from_batch_definition
(self, batch_definition: BatchDefinition, batch_data: Any = None)¶ Note: this method should not be used when getting a Batch from a BatchRequest, since it does not capture BatchRequest metadata.
-
get_single_batch_from_batch_request
(self, batch_request: BatchRequest)¶
-
get_batch_list_from_batch_request
(self, batch_request: BatchRequest)¶ Processes batch_request and returns the (possibly empty) list of batch objects.
- Parameters
batch_request (BatchRequest) – encapsulation of the request parameters necessary to identify the (possibly multiple) batches
- Returns
a possibly empty list of batch objects; each batch object contains a dataset and associated metadata
-
_build_data_connector_from_config
(self, name: str, config: Dict[str, Any])¶ Build a DataConnector using the provided configuration and return the newly-built DataConnector.
-
get_available_data_asset_names
(self, data_connector_names: Optional[Union[list, str]] = None)¶ Returns a dictionary of data_asset_names that the specified data connector can provide. Note that some data_connectors may not be capable of describing specific named data assets, and some (such as inferred_asset_data_connector) require the user to configure data asset names.
- Parameters
data_connector_names – the DataConnector for which to get available data asset names.
- Returns
{ data_connector_name: { names: [ (data_asset_1, data_asset_1_type), (data_asset_2, data_asset_2_type) ... ] } ... }
- Return type
dictionary consisting of sets of data assets available for the specified data connectors
-
get_available_batch_definitions
(self, batch_request: BatchRequest)¶
-
self_check
(self, pretty_print=True, max_examples=3)¶
-
_validate_batch_request
(self, batch_request: BatchRequest)¶
-
property
name
(self)¶ Property for datasource name
-
property
execution_engine
(self)¶
-
property
data_connectors
(self)¶
-
property
config
(self)¶
-
-
class great_expectations.datasource.Datasource(name: str, execution_engine=None, data_connectors=None, data_context_root_directory: Optional[str] = None)¶
Bases: great_expectations.datasource.new_datasource.BaseDatasource
A Datasource is the glue between an ExecutionEngine and a DataConnector.
-
recognized_batch_parameters
:set¶
-
_init_data_connectors
(self, data_connector_configs: Dict[str, Dict[str, Any]])¶
-
-
class great_expectations.datasource.PandasDatasource(name='pandas', data_context=None, data_asset_type=None, batch_kwargs_generators=None, boto3_options=None, reader_method=None, reader_options=None, limit=None, **kwargs)¶
Bases: great_expectations.datasource.datasource.LegacyDatasource
The PandasDatasource produces PandasDataset objects and supports generators capable of interacting with the local filesystem (the default subdir_reader generator) and with existing in-memory dataframes.
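The subdir_reader idea, mapping files in a base directory to data asset names, can be sketched with the standard library alone (the helper name and directory layout here are hypothetical, not library code):

```python
# Sketch of the subdir_reader concept: each recognized file in a base
# directory maps to a data asset named after the file's stem.
import pathlib
import tempfile

base = pathlib.Path(tempfile.mkdtemp())
for fname in ("events.csv", "users.csv"):
    (base / fname).write_text("id\n1\n")


def available_data_asset_names(base_directory: pathlib.Path) -> list:
    """Return sorted data asset names inferred from CSV file stems."""
    return sorted(p.stem for p in base_directory.iterdir()
                  if p.suffix == ".csv")


assets = available_data_asset_names(base)
print(assets)  # ['events', 'users']
```

The real generator additionally handles nested directories, multiple reader types, and partition identifiers.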
-
recognized_batch_parameters
¶
-
classmethod
build_configuration
(cls, data_asset_type=None, batch_kwargs_generators=None, boto3_options=None, reader_method=None, reader_options=None, limit=None, **kwargs)¶ Build a full configuration object for a datasource, potentially including generators with defaults.
- Parameters
data_asset_type – A ClassConfig dictionary
batch_kwargs_generators – Generator configuration dictionary
boto3_options – Optional dictionary with key-value pairs to pass to boto3 during instantiation.
reader_method – Optional default reader_method for generated batches
reader_options – Optional default reader_options for generated batches
limit – Optional default limit for generated batches
**kwargs – Additional kwargs to be part of the datasource constructor’s initialization
- Returns
A complete datasource configuration.
-
process_batch_parameters
(self, reader_method=None, reader_options=None, limit=None, dataset_options=None)¶ Use datasource-specific configuration to translate any batch parameters into batch kwargs at the datasource level.
- Parameters
limit (int) – a parameter all datasources must accept to allow limiting a batch to a smaller number of rows.
dataset_options (dict) – a set of kwargs that will be passed to the constructor of a dataset built using these batch_kwargs
- Returns
Result will include both parameters passed via argument and configured parameters.
- Return type
batch_kwargs
-
get_batch
(self, batch_kwargs, batch_parameters=None)¶ Get a batch of data from the datasource.
- Parameters
batch_kwargs – the BatchKwargs to use to construct the batch
batch_parameters – optional parameters to store as the reference description of the batch. They should reflect parameters that would provide the passed BatchKwargs.
- Returns
Batch
-
static
guess_reader_method_from_path
(path)¶
-
_infer_default_options
(self, reader_fn: Callable, reader_options: dict)¶ Allows reader options to be customized based on file context before loading to a DataFrame
- Parameters
reader_fn (Callable) – pandas reader function
reader_options – Current options and defaults set to pass to the reader method
- Returns
A copy of the reader options post-inference
- Return type
dict
-
_get_reader_fn
(self, reader_method=None, path=None)¶ Static helper for parsing reader types. If reader_method is not provided, path will be used to guess the correct reader_method.
- Parameters
reader_method (str) – the name of the reader method to use, if available.
path (str) – the path used to guess the reader method
- Returns
ReaderMethod to use for the filepath
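A minimal sketch of guessing a pandas reader method from a file extension; the extension map below is an assumption for illustration, and the real guess_reader_method_from_path may differ in detail:

```python
# Illustrative extension-to-reader mapping; not the library's actual table.
import os

_READER_BY_EXTENSION = {
    ".csv": "read_csv",
    ".tsv": "read_csv",      # typically combined with sep="\t"
    ".parquet": "read_parquet",
    ".xlsx": "read_excel",
    ".json": "read_json",
    ".pkl": "read_pickle",
}


def guess_reader_method_from_path(path: str) -> str:
    """Guess a pandas reader method name from the file extension."""
    _, ext = os.path.splitext(path)
    try:
        return _READER_BY_EXTENSION[ext.lower()]
    except KeyError:
        raise ValueError(f"Unable to determine reader method from path: {path}")


print(guess_reader_method_from_path("data/events_2012.csv"))  # read_csv
```

Passing an explicit reader_method always takes precedence over this guess.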
-
-
class great_expectations.datasource.SimpleSqlalchemyDatasource(name: str, connection_string: str = None, url: str = None, credentials: dict = None, engine=None, introspection: dict = None, tables: dict = None)¶
Bases: great_expectations.datasource.new_datasource.BaseDatasource
A specialized Datasource for SQL backends.
SimpleSqlalchemyDatasource is designed to minimize boilerplate configuration and new concepts.
-
_init_data_connectors
(self, introspection_configs: dict, table_configs: dict)¶
-
-
class great_expectations.datasource.SparkDFDatasource(name='default', data_context=None, data_asset_type=None, batch_kwargs_generators=None, spark_config=None, **kwargs)¶
Bases: great_expectations.datasource.datasource.LegacyDatasource
The SparkDFDatasource produces SparkDFDatasets and supports generators capable of interacting with the local filesystem (the default subdir_reader batch kwargs generator) and databricks notebooks.
- Accepted Batch Kwargs:
PathBatchKwargs (“path” or “s3” keys)
InMemoryBatchKwargs (“dataset” key)
QueryBatchKwargs (“query” key)
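The three accepted batch kwargs types above are distinguished by their keys; the classification helper in this sketch is hypothetical, not a library function:

```python
# Sketch: dispatch on batch kwargs keys to the batch kwargs type named above.
def classify_batch_kwargs(batch_kwargs: dict) -> str:
    if "path" in batch_kwargs or "s3" in batch_kwargs:
        return "PathBatchKwargs"
    if "dataset" in batch_kwargs:
        return "InMemoryBatchKwargs"
    if "query" in batch_kwargs:
        return "QueryBatchKwargs"
    raise ValueError(f"Unrecognized batch kwargs keys: {sorted(batch_kwargs)}")


print(classify_batch_kwargs({"path": "/data/events.parquet"}))
print(classify_batch_kwargs({"query": "SELECT * FROM events"}))
```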
Datasource - HDFS - How-to Guide
Use HDFS as an external datasource in conjunction with Spark.
Maturity: Experimental
Details:
API Stability: Stable
Implementation Completeness: Unknown
Unit Test Coverage: Minimal (none)
Integration Infrastructure/Test Coverage: Minimal (none)
Documentation Completeness: Minimal (none)
Bug Risk: Unknown
-
recognized_batch_parameters
¶
-
classmethod
build_configuration
(cls, data_asset_type=None, batch_kwargs_generators=None, spark_config=None, **kwargs)¶ Build a full configuration object for a datasource, potentially including generators with defaults.
- Parameters
data_asset_type – A ClassConfig dictionary
batch_kwargs_generators – Generator configuration dictionary
spark_config – dictionary of key-value pairs to pass to the spark builder
**kwargs – Additional kwargs to be part of the datasource constructor’s initialization
- Returns
A complete datasource configuration.
-
process_batch_parameters
(self, reader_method=None, reader_options=None, limit=None, dataset_options=None)¶ Use datasource-specific configuration to translate any batch parameters into batch kwargs at the datasource level.
- Parameters
limit (int) – a parameter all datasources must accept to allow limiting a batch to a smaller number of rows.
dataset_options (dict) – a set of kwargs that will be passed to the constructor of a dataset built using these batch_kwargs
- Returns
Result will include both parameters passed via argument and configured parameters.
- Return type
batch_kwargs
-
get_batch
(self, batch_kwargs, batch_parameters=None)¶ class-private implementation of get_data_asset
-
static
guess_reader_method_from_path
(path)¶
-
_get_reader_fn
(self, reader, reader_method=None, path=None)¶ Static helper for providing reader_fn
- Parameters
reader – the base spark reader to use; this should have had reader_options applied already
reader_method – the name of the reader_method to use, if specified
path (str) – the path to use to guess reader_method if it was not specified
- Returns
ReaderMethod to use for the filepath
-
class great_expectations.datasource.SqlAlchemyDatasource(name='default', data_context=None, data_asset_type=None, credentials=None, batch_kwargs_generators=None, **kwargs)¶
Bases: great_expectations.datasource.LegacyDatasource
A SqlAlchemyDatasource will provide data_assets converting batch_kwargs using the following rules:
if the batch_kwargs include a table key, the datasource will provide a dataset object connected to that table
if the batch_kwargs include a query key, the datasource will create a temporary table using that query. The query can be parameterized according to the standard Python Template engine, which uses $parameter, with additional kwargs passed to the get_batch method.
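The $parameter substitution described above follows Python's standard string.Template rules, which this stdlib-only sketch demonstrates (the table and parameter names are made up):

```python
# Demonstrate $parameter substitution with the standard-library Template
# engine, the same mechanism described for parameterized query batch_kwargs.
from string import Template

query = Template("SELECT * FROM events WHERE date = '$run_date' LIMIT $limit")
rendered = query.substitute(run_date="2012-02-07", limit=10)
print(rendered)
```

In Great Expectations, such extra values would be supplied as additional kwargs to get_batch rather than substituted by hand.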
Datasource - PostgreSQL - How-to Guide
Support for using the open source PostgreSQL database as an external datasource and execution engine.
Maturity: Production
Details:
API Stability: High
Implementation Completeness: Complete
Unit Test Coverage: Complete
Integration Infrastructure/Test Coverage: Complete
Documentation Completeness: Medium (does not have a specific how-to, but easy to use overall)
Bug Risk: Low
Expectation Completeness: Moderate
Datasource - BigQuery - How-to Guide
Use Google BigQuery as an execution engine and external datasource to validate data.
Maturity: Beta
Details:
API Stability: Unstable (table generator inability to work with triple-dotted names, temp table usability, init flow calls setup “other”)
Implementation Completeness: Moderate
Unit Test Coverage: Partial (no test coverage for temp table creation)
Integration Infrastructure/Test Coverage: Minimal
Documentation Completeness: Partial (how-to does not cover all cases)
Bug Risk: High (we know of several bugs, including inability to list tables and an incomplete SQLAlchemy URL)
Expectation Completeness: Moderate
Datasource - Amazon Redshift - How-to Guide
Use Amazon Redshift as an execution engine and external datasource to validate data.
Maturity: Beta
Details:
API Stability: Moderate (potential metadata/introspection method special handling for performance)
Implementation Completeness: Complete
Unit Test Coverage: Minimal
Integration Infrastructure/Test Coverage: Minimal (none automated)
Documentation Completeness: Moderate
Bug Risk: Moderate
Expectation Completeness: Moderate
Datasource - Snowflake - How-to Guide
Use Snowflake Computing as an execution engine and external datasource to validate data.
Maturity: Production
Details:
API Stability: High
Implementation Completeness: Complete
Unit Test Coverage: Complete
Integration Infrastructure/Test Coverage: Minimal (manual only)
Documentation Completeness: Complete
Bug Risk: Low
Expectation Completeness: Complete
Datasource - Microsoft SQL Server - How-to Guide
Use Microsoft SQL Server as an execution engine and external datasource to validate data.
Maturity: Experimental
Details:
API Stability: High
Implementation Completeness: Moderate
Unit Test Coverage: Minimal (none)
Integration Infrastructure/Test Coverage: Minimal (none)
Documentation Completeness: Minimal
Bug Risk: High
Expectation Completeness: Low (some required queries do not generate properly, such as those related to nullity)
Datasource - MySQL - How-to Guide
Use MySQL as an execution engine and external datasource to validate data.
Maturity: Experimental
Details:
API Stability: Low (no consideration for temp tables)
Implementation Completeness: Low (no consideration for temp tables)
Unit Test Coverage: Minimal (none)
Integration Infrastructure/Test Coverage: Minimal (none)
Documentation Completeness: Minimal (none)
Bug Risk: Unknown
Expectation Completeness: Unknown
Datasource - MariaDB - How-to Guide
Use MariaDB as an execution engine and external datasource to validate data.
Maturity: Experimental
Details:
API Stability: Low (no consideration for temp tables)
Implementation Completeness: Low (no consideration for temp tables)
Unit Test Coverage: Minimal (none)
Integration Infrastructure/Test Coverage: Minimal (none)
Documentation Completeness: Minimal (none)
Bug Risk: Unknown
Expectation Completeness: Unknown
-
recognized_batch_parameters
¶
-
classmethod
build_configuration
(cls, data_asset_type=None, batch_kwargs_generators=None, **kwargs)¶ Build a full configuration object for a datasource, potentially including generators with defaults.
- Parameters
data_asset_type – A ClassConfig dictionary
batch_kwargs_generators – Generator configuration dictionary
**kwargs – Additional kwargs to be part of the datasource constructor’s initialization
- Returns
A complete datasource configuration.
-
_get_sqlalchemy_connection_options
(self, **kwargs)¶
-
_get_sqlalchemy_key_pair_auth_url
(self, drivername, credentials)¶
-
get_batch
(self, batch_kwargs, batch_parameters=None)¶ Get a batch of data from the datasource.
- Parameters
batch_kwargs – the BatchKwargs to use to construct the batch
batch_parameters – optional parameters to store as the reference description of the batch. They should reflect parameters that would provide the passed BatchKwargs.
- Returns
Batch
-
process_batch_parameters
(self, query_parameters=None, limit=None, dataset_options=None)¶ Use datasource-specific configuration to translate any batch parameters into batch kwargs at the datasource level.
- Parameters
limit (int) – a parameter all datasources must accept to allow limiting a batch to a smaller number of rows.
dataset_options (dict) – a set of kwargs that will be passed to the constructor of a dataset built using these batch_kwargs
- Returns
Result will include both parameters passed via argument and configured parameters.
- Return type
batch_kwargs