great_expectations.datasource.batch_kwargs_generator.s3_subdir_reader_batch_kwargs_generator

Module Contents

Classes

S3SubdirReaderBatchKwargsGenerator(name='default', datasource=None, bucket=None, boto3_options=None, base_directory='/data', reader_options=None, known_extensions=None, reader_method=None)

The SubdirReaderBatchKwargsGenerator inspects a filesystem and produces path-based batch_kwargs.

great_expectations.datasource.batch_kwargs_generator.s3_subdir_reader_batch_kwargs_generator.s3fs
great_expectations.datasource.batch_kwargs_generator.s3_subdir_reader_batch_kwargs_generator.logger
great_expectations.datasource.batch_kwargs_generator.s3_subdir_reader_batch_kwargs_generator.KNOWN_EXTENSIONS = ['.csv', '.tsv', '.parquet', '.pqt', '.parq', '.xls', '.xlsx', '.json', '.csv.gz', '.tsv.gz', '.feather', '.pkl']
class great_expectations.datasource.batch_kwargs_generator.s3_subdir_reader_batch_kwargs_generator.S3SubdirReaderBatchKwargsGenerator(name='default', datasource=None, bucket=None, boto3_options=None, base_directory='/data', reader_options=None, known_extensions=None, reader_method=None)

Bases: great_expectations.datasource.batch_kwargs_generator.batch_kwargs_generator.BatchKwargsGenerator

The SubdirReaderBatchKwargsGenerator inspects a filesystem and produces path-based batch_kwargs.

SubdirReaderBatchKwargsGenerator recognizes data assets using two criteria:
  • for files directly in ‘base_directory’ with recognized extensions (.csv, .tsv, .parquet, .xls, .xlsx, .json, .csv.gz, .tsv.gz, .feather, .pkl), it uses the name of the file without the extension

  • for other files or directories in ‘base_directory’, it uses the file or directory name
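The two naming criteria above can be illustrated with a short sketch. This is not the library's implementation: the helper name asset_name_for and its is_directory flag are hypothetical, and the extension list is copied from the module-level KNOWN_EXTENSIONS shown earlier.

```python
from typing import Sequence

# Copied from the module-level KNOWN_EXTENSIONS listed in this reference.
KNOWN_EXTENSIONS = [
    ".csv", ".tsv", ".parquet", ".pqt", ".parq", ".xls", ".xlsx",
    ".json", ".csv.gz", ".tsv.gz", ".feather", ".pkl",
]


def asset_name_for(
    entry_name: str,
    is_directory: bool = False,
    known_extensions: Sequence[str] = KNOWN_EXTENSIONS,
) -> str:
    """Hypothetical helper: derive a data asset name from an entry in base_directory."""
    if not is_directory:
        # Check longer extensions first so ".csv.gz" is stripped as a whole.
        for ext in sorted(known_extensions, key=len, reverse=True):
            if entry_name.endswith(ext) and len(entry_name) > len(ext):
                return entry_name[: -len(ext)]
    # Directories and files with unrecognized extensions keep their full name.
    return entry_name
```

Under these assumptions, a file named events.csv.gz maps to the asset name events, while a directory named logs or a file named notes.txt keeps its name unchanged.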

SubdirReaderBatchKwargsGenerator sees all files inside a directory of base_directory as batches of one datasource.

SubdirReaderBatchKwargsGenerator can also include configured reader_options which will be added to batch_kwargs generated by this generator.
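As a rough illustration of how configured reader_options might end up in the generated batch_kwargs, the sketch below merges generator-level options with per-call options. The function name and the dictionary keys "path" and "reader_options" are assumptions made for this example, not the exact structure the library emits.

```python
from typing import Any, Dict, Optional


def build_batch_kwargs_sketch(
    path: str,
    configured_reader_options: Dict[str, Any],
    reader_options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: combine configured and per-call reader options."""
    # Start from the options configured on the generator (assumed behavior).
    merged = dict(configured_reader_options)
    # Per-call options override configured ones in this sketch.
    merged.update(reader_options or {})
    # Key names are illustrative assumptions.
    return {"path": path, "reader_options": merged}


batch_kwargs = build_batch_kwargs_sketch(
    "bucket/data/events.csv",
    configured_reader_options={"sep": ","},
    reader_options={"header": 0},
)
```

The design choice shown here (per-call options win over configured ones) is one plausible convention; check the generator's source before relying on it.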

_default_reader_options :Dict
recognized_batch_parameters
property reader_options(self)
property known_extensions(self)
property reader_method(self)
property base_directory(self)
get_available_data_asset_names(self)

Return the list of asset names known by this batch kwargs generator.

Returns

A list of available names

get_available_partition_ids(self, generator_asset=None, data_asset_name=None)

Applies the current _partitioner to the batches available on data_asset_name and returns a list of valid partition_id strings that can be used to identify batches of data.

Parameters

data_asset_name – the data asset whose partitions should be returned.

Returns

A list of partition_id strings

_build_batch_kwargs(self, batch_parameters)
Parameters

batch_parameters

Returns

batch_kwargs

_get_valid_file_options(self, base_directory=None)
_get_iterator(self, data_asset_name, reader_options=None, limit=None)
_build_batch_kwargs_path_iter(self, path_list, reader_options=None, limit=None)
_build_batch_kwargs_from_path(self, path, reader_method=None, reader_options=None, limit=None)
_window_to_s3_path(self, path: str)

Handles Windows-style “\” path separators: “s3://bucket\prefix” => “s3://bucket/prefix”
>>> path = os.path.join("s3://bucket", "prefix")
>>> window_to_s3_path(path)
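A minimal sketch of the conversion described above, assuming it is enough to replace backslashes with forward slashes. The standalone function window_to_s3_path here is a hypothetical stand-in and may differ from the method's actual implementation. ntpath is used so that the Windows-style join can be reproduced on any platform.

```python
import ntpath


def window_to_s3_path(path: str) -> str:
    """Hypothetical stand-in: normalize Windows separators in an S3 path."""
    # S3 keys use "/" as the delimiter, so any "\" introduced by a
    # Windows-style path join is replaced.
    return path.replace("\\", "/")


# ntpath.join mimics os.path.join on Windows, regardless of the host OS.
windows_style = ntpath.join("s3://bucket", "prefix")
s3_path = window_to_s3_path(windows_style)
```

Paths that already use forward slashes pass through unchanged.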