Execution Engine
An Execution Engine is a system capable of processing data to compute Metrics.
An Execution Engine provides the computing resources that will be used to actually perform Validation. Great Expectations can take advantage of different Execution Engines, such as Pandas, Spark, or SqlAlchemy, and even translate the same Expectations to validate data using different engines.
Data is always viewed through the lens of an Execution Engine in Great Expectations. When we obtain a Batch of data, that Batch contains metadata that wraps the native Data Object of the Execution Engine: for example, a DataFrame in Pandas or Spark, or a table or query result in SQL.
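As a rough illustration of what "wrapping" means, the sketch below pairs an engine's native data object with a small amount of metadata. The `Batch` class, its fields, and the engine names are hypothetical and only mirror the idea; they are not the Great Expectations classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Batch:
    """Hypothetical wrapper: metadata plus the engine's native data object."""

    batch_id: str
    engine_name: str  # which engine knows how to read `data`
    data: Any  # e.g. a list of rows, a DataFrame, or a SQL query string
    metadata: Dict[str, Any] = field(default_factory=dict)


# The same wrapper shape holds very different native objects.
in_memory = Batch("b1", "python", [{"x": 1}, {"x": 2}], {"source": "list"})
sql_backed = Batch("b2", "sqlite", "SELECT x FROM t", {"table": "t"})

for batch in (in_memory, sql_backed):
    print(batch.batch_id, batch.engine_name, type(batch.data).__name__)
```

Code that receives a `Batch` can read its metadata uniformly and hand `data` to whichever engine `engine_name` points to.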
Relationship to other objects
Execution Engines are components of Data Sources. They accept Batch Requests and deliver Batches. The Execution Engine is an underlying component of the Data Source, and when you interact with the Data Source it will handle the Execution Engine for you.
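A minimal sketch of that division of labor, assuming a toy in-memory engine. `DataSource`, `BatchRequest`, `InMemoryEngine`, and `get_batch` are hypothetical names chosen for illustration; in Great Expectations the Data Source performs this wiring for you.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BatchRequest:
    """Hypothetical request: which asset the caller wants."""
    asset_name: str


@dataclass
class Batch:
    batch_id: str
    data: List[dict]


class InMemoryEngine:
    """Toy engine: keeps loaded batch data keyed by batch id."""

    def __init__(self) -> None:
        self.loaded: Dict[str, List[dict]] = {}

    def load_batch_data(self, batch_id: str, batch_data: List[dict]) -> None:
        self.loaded[batch_id] = batch_data


class DataSource:
    """Owns its engine; callers only ever see BatchRequest -> Batch."""

    def __init__(self, assets: Dict[str, List[dict]]) -> None:
        self._assets = assets
        self._engine = InMemoryEngine()  # created by the Data Source, not the user

    def get_batch(self, request: BatchRequest) -> Batch:
        rows = self._assets[request.asset_name]
        batch_id = f"{request.asset_name}-0"
        self._engine.load_batch_data(batch_id, rows)
        return Batch(batch_id, rows)


source = DataSource({"orders": [{"amount": 10}, {"amount": 25}]})
batch = source.get_batch(BatchRequest("orders"))
print(batch.batch_id, len(batch.data))  # orders-0 2
```

The caller never constructs or calls the engine directly, which matches the point that the Data Source handles it under the hood.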
Use cases
The Execution Engine used to process data and compute Metrics is set by your Data Source configuration. After the Data Source is configured, you don't need to interact with the Execution Engine, because the Data Source uses it automatically.
If you use the interactive workflow for creating Expectations, the Data Source and its Execution Engine provide the data for introspection.
When a Checkpoint validates data, it uses a Data Source (and therefore an Execution Engine) to execute one or more Batch Requests and acquire the data that the Validation is run on.
When you create Custom Expectations and Metrics, you often need Execution Engine-specific logic for that Expectation or Metric. See Custom Expectations for more information.
Standardized data and Expectations
Execution Engines handle the interactions with the Data Source. They also wrap data from the Data Source with metadata that allows Great Expectations to read it regardless of its native format. Additionally, Execution Engines compute the Metrics that Expectations rely on, in a way that suits their associated Data Source. Because of this, the same Expectations can be used to validate data from different Data Sources, even if those Data Sources connect to systems so different in nature that they require different Execution Engines to access their data.
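The sketch below shows why this works under a simplifying assumption: the Expectation is written only against a Metric value (here, a column mean), and each engine computes that Metric in its own native way. `PythonEngine`, `SqliteEngine`, `column_mean`, and `expect_column_mean_between` are hypothetical names, not Great Expectations APIs.

```python
import sqlite3
from statistics import mean


class PythonEngine:
    """Computes metrics over an in-memory list of dicts."""

    def __init__(self, rows):
        self.rows = rows

    def column_mean(self, column):
        return mean(row[column] for row in self.rows)


class SqliteEngine:
    """Computes the same metric by asking the database to do the work."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table

    def column_mean(self, column):
        (value,) = self.conn.execute(
            f"SELECT AVG({column}) FROM {self.table}"
        ).fetchone()
        return value


def expect_column_mean_between(engine, column, low, high):
    # Expectation logic is engine-agnostic: it only sees the metric value.
    observed = engine.column_mean(column)
    return {"success": low <= observed <= high, "observed_value": observed}


rows = [{"price": 4.0}, {"price": 6.0}, {"price": 8.0}]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (price REAL)")
conn.executemany("INSERT INTO items VALUES (?)", [(r["price"],) for r in rows])

for engine in (PythonEngine(rows), SqliteEngine(conn, "items")):
    # each line: {'success': True, 'observed_value': 6.0}
    print(expect_column_mean_between(engine, "price", 5, 7))
```

Adding a third backend would mean writing one more `column_mean`, while `expect_column_mean_between` stays unchanged.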
Deferred Metrics
SqlAlchemyExecutionEngine and SparkDFExecutionEngine provide an additional feature that allows deferred resolution of Metrics, making it possible to bundle the request for several metrics into a single trip to the backend. Additional Execution Engines may also support this feature in the future.
The resolve_metric_bundle() method of these engines computes the values of a bundle of Metrics; it is used internally by resolve_metrics() on Execution Engines that support bundled metrics.
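To make the idea of bundling concrete, here is a hypothetical engine that queues metric requests and resolves them together in a single `SELECT`, so several Metrics cost one round trip to the backend. It uses `sqlite3` as a stand-in backend; the class and method names (`BundlingEngine`, `defer`, `resolve_bundle`) are invented for this sketch and only loosely echo `resolve_metric_bundle()`.

```python
import sqlite3
from typing import Dict, List, Tuple


class BundlingEngine:
    """Hypothetical engine that defers metrics and resolves them in one query."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table
        self.pending: List[Tuple[str, str]] = []  # (metric id, SQL expression)
        self.round_trips = 0

    def defer(self, metric_id: str, sql_expression: str) -> None:
        # Nothing is sent to the backend yet; the request is only recorded.
        self.pending.append((metric_id, sql_expression))

    def resolve_bundle(self) -> Dict[str, object]:
        # One SELECT with one output column per deferred metric.
        columns = ", ".join(expr for _, expr in self.pending)
        row = self.conn.execute(f"SELECT {columns} FROM {self.table}").fetchone()
        self.round_trips += 1
        results = {mid: value for (mid, _), value in zip(self.pending, row)}
        self.pending.clear()
        return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (5,), (9,)])

engine = BundlingEngine(conn, "t")
engine.defer("x.min", "MIN(x)")
engine.defer("x.max", "MAX(x)")
engine.defer("row_count", "COUNT(*)")
print(engine.resolve_bundle(), engine.round_trips)
# {'x.min': 1, 'x.max': 9, 'row_count': 3} 1
```

The sketch only bundles aggregates over one table; a real engine must also group requests that share a compatible domain before combining them.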
Access
You will not need to directly access an Execution Engine. When you interact with a Data Source it will handle the Execution Engine's operation under the hood.
Create
You will not need to directly instantiate an Execution Engine. Instead, they are automatically created as a component in a Data Source.
If you are interested in using and accessing data with an Execution Engine that Great Expectations does not yet support, consider making your work a contribution to the Great Expectations open source GitHub project. This is a considerable undertaking, so you may also wish to reach out to us on Slack as we will be happy to provide guidance and support.
Execution Engine init arguments
name
caching
batch_spec_defaults
batch_data_dict
validator
Execution Engine Properties
loaded_batch_data
active_batch_data_id
Execution Engine Methods
load_batch_data(batch_id, batch_data)
resolve_metrics: computes metric values.
get_compute_domain: gets the compute domain for a particular type of intermediate metric.
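The toy class below is a hedged sketch of how some of the listed pieces might fit together: `load_batch_data` stores a Batch's data and makes it active, the two properties expose that state, and `resolve_metrics` computes requested values against the active data, memoizing them when `caching` is on. It is a simplified stand-in written for this page, not the Great Expectations implementation; passing metric functions as plain callables is an assumption of the sketch, and `get_compute_domain` is omitted.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class ToyExecutionEngine:
    """Hypothetical, simplified engine mirroring the names listed above."""

    def __init__(self, name: str = "toy", caching: bool = True) -> None:
        self.name = name
        self.caching = caching
        self._batches: Dict[str, Any] = {}
        self._active_id: Optional[str] = None
        self._cache: Dict[Tuple[str, str], Any] = {}

    @property
    def loaded_batch_data(self) -> Dict[str, Any]:
        return dict(self._batches)

    @property
    def active_batch_data_id(self) -> Optional[str]:
        return self._active_id

    def load_batch_data(self, batch_id: str, batch_data: Any) -> None:
        # Register the data and make it the batch that metrics run against.
        self._batches[batch_id] = batch_data
        self._active_id = batch_id

    def resolve_metrics(
        self, metrics: Dict[str, Callable[[Any], Any]]
    ) -> Dict[str, Any]:
        if self._active_id is None:
            raise ValueError("no batch data loaded")
        data = self._batches[self._active_id]
        results = {}
        for metric_name, fn in metrics.items():
            key = (self._active_id, metric_name)
            if self.caching and key in self._cache:
                # Reuse a value computed earlier for this batch and metric.
                results[metric_name] = self._cache[key]
                continue
            value = fn(data)
            if self.caching:
                self._cache[key] = value
            results[metric_name] = value
        return results


engine = ToyExecutionEngine()
engine.load_batch_data("batch-1", [3, 1, 2])
print(engine.active_batch_data_id)  # batch-1
print(engine.resolve_metrics({"max": max, "row_count": len}))  # {'max': 3, 'row_count': 3}
```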
Configure
Execution Engines are not configured directly, but determined based on the Data Source you choose.