great_expectations.datasource.sparkdf_datasource

Module Contents

Classes

SparkDFDatasource — The SparkDFDatasource produces SparkDFDatasets and supports generators capable of interacting with the local filesystem and Databricks notebooks.
great_expectations.datasource.sparkdf_datasource.logger

great_expectations.datasource.sparkdf_datasource.SparkSession
-
class great_expectations.datasource.sparkdf_datasource.SparkDFDatasource(name='default', data_context=None, data_asset_type=None, batch_kwargs_generators=None, spark_config=None, **kwargs)

Bases: great_expectations.datasource.datasource.Datasource

The SparkDFDatasource produces SparkDFDatasets and supports generators capable of interacting with the local filesystem (the default subdir_reader batch kwargs generator) and Databricks notebooks.
Accepted Batch Kwargs:

- PathBatchKwargs ("path" or "s3" keys)
- InMemoryBatchKwargs ("dataset" key)
- QueryBatchKwargs ("query" key)

Datasource - HDFS - How-to Guide
Use HDFS as an external datasource in conjunction with Spark.
Maturity: Experimental
Details:
- API Stability: Stable
- Implementation Completeness: Unknown
- Unit Test Coverage: Minimal (none)
- Integration Infrastructure/Test Coverage: Minimal (none)
- Documentation Completeness: Minimal (none)
- Bug Risk: Unknown
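The three accepted batch kwargs types can be illustrated as plain dictionaries. This is a hedged sketch: the paths, bucket name, and query below are hypothetical placeholders, and in real use the "dataset" key would carry a Spark DataFrame.

```python
# Illustrative shapes of the accepted batch kwargs. All concrete values
# (paths, bucket, query, table name) are hypothetical placeholders.

path_batch_kwargs = {
    "path": "/data/events/2020-01-01.csv",  # local filesystem path
    "reader_method": "csv",                 # optional; may be guessed from the path
}

s3_batch_kwargs = {
    "s3": "s3://my-bucket/events/2020-01-01.parquet",
}

in_memory_batch_kwargs = {
    "dataset": None,  # a Spark DataFrame would be supplied here
}

query_batch_kwargs = {
    "query": "SELECT * FROM events WHERE ds = '2020-01-01'",
}
```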
recognized_batch_parameters
classmethod build_configuration(cls, data_asset_type=None, batch_kwargs_generators=None, spark_config=None, **kwargs)

Build a full configuration object for a datasource, potentially including generators with defaults.
Parameters:
- data_asset_type – A ClassConfig dictionary
- batch_kwargs_generators – Generator configuration dictionary
- spark_config – A dictionary of key-value pairs to pass to the Spark session builder
- **kwargs – Additional kwargs to be part of the datasource constructor's initialization

Returns:
A complete datasource configuration.
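To make the shape of the result concrete, here is a minimal sketch of assembling a configuration the way build_configuration might: defaults are merged with caller-supplied overrides. The specific keys and default values below are assumptions for illustration, not the library's exact output.

```python
# Hypothetical sketch of building a datasource configuration dict.
# Key names and defaults are assumptions, not the library's exact output.

def build_configuration_sketch(data_asset_type=None,
                               batch_kwargs_generators=None,
                               spark_config=None,
                               **kwargs):
    configuration = {
        "class_name": "SparkDFDatasource",
        "data_asset_type": data_asset_type or {
            "class_name": "SparkDFDataset",
            "module_name": "great_expectations.dataset",
        },
        "spark_config": spark_config or {},
    }
    if batch_kwargs_generators is not None:
        configuration["batch_kwargs_generators"] = batch_kwargs_generators
    # additional kwargs become part of the configuration as-is
    configuration.update(kwargs)
    return configuration

config = build_configuration_sketch(spark_config={"spark.master": "local[*]"})
```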
process_batch_parameters(self, reader_method=None, reader_options=None, limit=None, dataset_options=None)

Use datasource-specific configuration to translate any batch parameters into batch kwargs at the datasource level.
Parameters:
- limit (int) – a parameter all datasources must accept to allow limiting a batch to a smaller number of rows.
- dataset_options (dict) – a set of kwargs that will be passed to the constructor of a dataset built using these batch_kwargs

Returns:
The result will include both parameters passed via argument and configured parameters.

Return type:
batch_kwargs
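The documented contract — combining parameters passed via argument with configured parameters — can be sketched as a dictionary merge. This is an assumption about the merge behavior (explicit arguments layered over configured defaults), not the library's exact implementation.

```python
# Hedged sketch of folding batch parameters into batch kwargs: start from
# configured defaults, then layer explicitly passed arguments on top.

def process_batch_parameters_sketch(configured_defaults,
                                    reader_method=None,
                                    reader_options=None,
                                    limit=None,
                                    dataset_options=None):
    batch_kwargs = dict(configured_defaults)  # configured parameters first
    if reader_method is not None:
        batch_kwargs["reader_method"] = reader_method
    if reader_options is not None:
        # merge rather than replace, so configured reader options survive
        merged = dict(batch_kwargs.get("reader_options", {}))
        merged.update(reader_options)
        batch_kwargs["reader_options"] = merged
    if limit is not None:
        batch_kwargs["limit"] = limit
    if dataset_options is not None:
        batch_kwargs["dataset_options"] = dataset_options
    return batch_kwargs

kwargs = process_batch_parameters_sketch(
    {"reader_options": {"header": True}},  # hypothetical configured default
    reader_options={"inferSchema": True},
    limit=1000,
)
```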
get_batch(self, batch_kwargs, batch_parameters=None)

Class-private implementation of get_data_asset.
static guess_reader_method_from_path(path)
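A plausible sketch of what guessing a reader method from a path involves: map the file extension to a Spark reader method, looking past a compression suffix. The extension mapping and the error behavior here are assumptions; the real helper may cover more formats or raise a library-specific error.

```python
import os

# Hypothetical extension-to-reader-method mapping; the library's actual
# table may differ or cover additional formats.
_EXTENSION_TO_READER = {
    ".csv": "csv",
    ".tsv": "csv",
    ".parquet": "parquet",
    ".json": "json",
}

def guess_reader_method_from_path_sketch(path):
    lowered = path.lower()
    _, ext = os.path.splitext(lowered)
    if ext == ".gz":
        # strip the compression suffix and inspect the inner extension
        _, ext = os.path.splitext(lowered[: -len(".gz")])
    try:
        return _EXTENSION_TO_READER[ext]
    except KeyError:
        raise ValueError(f"Unable to determine reader method from path: {path}")
```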
_get_reader_fn(self, reader, reader_method=None, path=None)

Static helper for providing a reader_fn.

Parameters:
- reader – the base Spark reader to use; this should have had reader_options applied already
- reader_method – the name of the reader_method to use, if specified
- path (str) – the path to use to guess the reader_method if it was not specified

Returns:
The ReaderMethod to use for the filepath.
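The lookup described above can be sketched as: use the explicit reader_method when given, otherwise guess one from the path, then fetch the matching attribute off the reader. The DummyReader below is a stand-in for a configured pyspark DataFrameReader (which exposes csv, parquet, and json methods); the error handling is an assumption for illustration.

```python
# Hedged sketch of resolving a reader function from a reader_method or path.

def get_reader_fn_sketch(reader, guess_from_path, reader_method=None, path=None):
    if reader_method is None:
        if path is None:
            raise ValueError("Must provide either reader_method or path")
        reader_method = guess_from_path(path)  # fall back to guessing
    try:
        return getattr(reader, reader_method)
    except AttributeError:
        raise ValueError(f"Unknown reader method: {reader_method}")

class DummyReader:
    """Stand-in for a pyspark DataFrameReader with reader_options applied."""
    def csv(self, path):
        return ("csv", path)
    def parquet(self, path):
        return ("parquet", path)

reader_fn = get_reader_fn_sketch(
    DummyReader(),
    guess_from_path=lambda p: "csv",  # hypothetical guesser
    path="/data/x.csv",
)
```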