great_expectations.datasource.batch_kwargs_generator.subdir_reader_batch_kwargs_generator
Module Contents
Classes
SubdirReaderBatchKwargsGenerator | The SubdirReaderBatchKwargsGenerator inspects a filesystem and produces path-based batch_kwargs.
- great_expectations.datasource.batch_kwargs_generator.subdir_reader_batch_kwargs_generator.logger
- great_expectations.datasource.batch_kwargs_generator.subdir_reader_batch_kwargs_generator.KNOWN_EXTENSIONS = ['.csv', '.tsv', '.parquet', '.xls', '.xlsx', '.json', '.csv.gz', '.tsv.gz', '.feather', '.pkl']
- class great_expectations.datasource.batch_kwargs_generator.subdir_reader_batch_kwargs_generator.SubdirReaderBatchKwargsGenerator(name='default', datasource=None, base_directory='/data', reader_options=None, known_extensions=None, reader_method=None)
Bases: great_expectations.datasource.batch_kwargs_generator.batch_kwargs_generator.BatchKwargsGenerator
The SubdirReaderBatchKwargsGenerator inspects a filesystem and produces path-based batch_kwargs.
- SubdirReaderBatchKwargsGenerator recognizes data assets using two criteria:
for files directly in ‘base_directory’ with recognized extensions (.csv, .tsv, .parquet, .xls, .xlsx, .json, .csv.gz, .tsv.gz, .feather, .pkl), it uses the name of the file without the extension
for other files or directories in ‘base_directory’, it uses the file or directory name
SubdirReaderBatchKwargsGenerator treats all files inside a subdirectory of base_directory as batches of a single data asset.
SubdirReaderBatchKwargsGenerator can also include configured reader_options which will be added to batch_kwargs generated by this generator.
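The two naming criteria above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper names `asset_name` and `list_asset_names` are hypothetical, and the sketch does not reproduce any extra filtering the generator may apply (for example to hidden files or empty directories).

```python
import os

# Copied from the module-level KNOWN_EXTENSIONS documented on this page.
KNOWN_EXTENSIONS = ['.csv', '.tsv', '.parquet', '.xls', '.xlsx', '.json',
                    '.csv.gz', '.tsv.gz', '.feather', '.pkl']


def asset_name(entry, known_extensions=KNOWN_EXTENSIONS):
    """Hypothetical helper: derive a data asset name from a directory entry.

    Criterion 1: an entry ending in a recognized extension is named
    after the file without that extension.
    Criterion 2: any other entry keeps its full name.
    """
    for ext in known_extensions:
        if entry.endswith(ext):
            return entry[:-len(ext)]
    return entry


def list_asset_names(base_directory, known_extensions=KNOWN_EXTENSIONS):
    """Hypothetical helper: sorted, de-duplicated asset names in base_directory."""
    return sorted({asset_name(entry, known_extensions)
                   for entry in os.listdir(base_directory)})
```

For a base directory containing a file `users.csv` and a subdirectory `events`, `list_asset_names` would return `['events', 'users']`.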
- _default_reader_options
- recognized_batch_parameters
- property reader_options(self)
- property known_extensions(self)
- property reader_method(self)
- property base_directory(self)
- get_available_data_asset_names(self)
Return the list of asset names known by this batch kwargs generator.
- Returns
A list of available names
- get_available_partition_ids(self, generator_asset=None, data_asset_name=None)
Applies the current _partitioner to the batches available on data_asset_name and returns a list of valid partition_id strings that can be used to identify batches of data.
- Parameters
data_asset_name – the data asset whose partitions should be returned.
- Returns
A list of partition_id strings
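One way to picture how partition ids could relate to the directory layout is sketched below. It rests on assumptions not stated on this page: that an asset backed by a single file in the base directory has one partition named after the asset, and that an asset backed by a subdirectory has one partition per file inside it, named after the file without its recognized extension. The function names are hypothetical and the code does not call the library.

```python
import os

KNOWN_EXTENSIONS = ['.csv', '.tsv', '.parquet', '.xls', '.xlsx', '.json',
                    '.csv.gz', '.tsv.gz', '.feather', '.pkl']


def strip_known_extension(name, known_extensions=KNOWN_EXTENSIONS):
    """Hypothetical helper: drop a recognized extension, if present."""
    for ext in known_extensions:
        if name.endswith(ext):
            return name[:-len(ext)]
    return name


def partition_ids(base_directory, data_asset_name,
                  known_extensions=KNOWN_EXTENSIONS):
    """Hypothetical sketch: list partition ids for one data asset."""
    # Assumed case 1: the asset is a single file directly in base_directory,
    # either under its bare name or under name + recognized extension.
    candidates = [data_asset_name] + [data_asset_name + ext
                                      for ext in known_extensions]
    if any(os.path.isfile(os.path.join(base_directory, c))
           for c in candidates):
        return [data_asset_name]

    # Assumed case 2: the asset is a subdirectory; each file inside it
    # is one partition, identified by its name without the extension.
    subdir = os.path.join(base_directory, data_asset_name)
    if not os.path.isdir(subdir):
        return []
    return sorted(strip_known_extension(f, known_extensions)
                  for f in os.listdir(subdir)
                  if os.path.isfile(os.path.join(subdir, f)))
```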
- _build_batch_kwargs(self, batch_parameters)
- Parameters
batch_parameters –
- Returns
batch_kwargs
- _get_valid_file_options(self, base_directory=None)
- _get_iterator(self, data_asset_name, reader_options=None, limit=None)
- _build_batch_kwargs_path_iter(self, path_list, reader_options=None, limit=None)
- _build_batch_kwargs_from_path(self, path, reader_method=None, reader_options=None, limit=None)
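The private helpers above are undocumented, so the following is only a rough sketch of what "path-based batch_kwargs" with configured reader_options might look like. The dictionary keys (`path`, `reader_method`, `reader_options`, `limit`) and both function names are assumptions made for illustration; the real generator may build a dedicated batch kwargs object and treat `limit` differently.

```python
def build_batch_kwargs_from_path(path, reader_method=None,
                                 reader_options=None, limit=None):
    """Hypothetical sketch: describe one batch by its file path."""
    batch_kwargs = {"path": path}
    if reader_method is not None:
        batch_kwargs["reader_method"] = reader_method
    if reader_options:
        # Copy so the caller's configured options are never mutated.
        batch_kwargs["reader_options"] = dict(reader_options)
    if limit is not None:
        batch_kwargs["limit"] = limit  # assumed key name
    return batch_kwargs


def build_batch_kwargs_path_iter(path_list, reader_options=None, limit=None):
    """Hypothetical sketch: lazily yield one batch_kwargs dict per path."""
    for path in path_list:
        yield build_batch_kwargs_from_path(
            path, reader_options=reader_options, limit=limit)
```

Yielding lazily mirrors the iterator-style names in the signatures above: a caller can pull one batch at a time instead of materializing every path up front.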