great_expectations.core.util

Module Contents

Classes

AzureUrl(url: str)

Parses an Azure Blob Storage URL into its separate components.

GCSUrl(url: str)

Parses a Google Cloud Storage URL into its separate components.

S3Url(url)

Parses an S3 URL into its bucket and key components.

DBFSPath()

Methods for converting Databricks Filesystem (DBFS) paths.

Functions

nested_update(d: Union[Iterable, dict], u: Union[Iterable, dict], dedup: bool = False, concat_lists: bool = True)

Update d with items from u, recursively, joining elements where possible. By default, list values are concatenated without de-duplication.

in_jupyter_notebook()

in_databricks()

Tests whether we are in a Databricks environment.

convert_to_json_serializable(data)

Helper function to convert an object to one that is json serializable

ensure_json_serializable(data)

Helper function to convert an object to one that is json serializable

requires_lossy_conversion(d)

substitute_all_strftime_format_strings(data: Union[dict, list, str, Any], datetime_obj: Optional[datetime.datetime] = None)

This utility function iterates over the input data and, for every string it contains, replaces any strftime format elements using either the provided datetime_obj or the current datetime.

get_datetime_string_from_strftime_format(format_str: str, datetime_obj: Optional[datetime.datetime] = None)

This utility function takes a string with strftime format elements and substitutes those elements using either the provided datetime_obj or the current datetime.

parse_string_to_datetime(datetime_string: str, datetime_format_string: Optional[str] = None)

datetime_to_int(dt: datetime.date)

sniff_s3_compression(s3_url: S3Url)

Attempts to get read_csv compression from s3_url

get_or_create_spark_application(spark_config: Optional[Dict[str, str]] = None, force_reuse_spark_context: bool = False)

get_or_create_spark_session(spark_config: Optional[Dict[str, str]] = None)

spark_restart_required(current_spark_config: List[tuple], desired_spark_config: dict)

get_sql_dialect_floating_point_infinity_value(schema: str, negative: bool = False)

Attributes

great_expectations.core.util.logger
great_expectations.core.util.sqlalchemy
great_expectations.core.util.SCHEMAS
great_expectations.core.util.pyspark
great_expectations.core.util._SUFFIX_TO_PD_KWARG
great_expectations.core.util.nested_update(d: Union[Iterable, dict], u: Union[Iterable, dict], dedup: bool = False, concat_lists: bool = True)

Update d with items from u, recursively, joining elements where possible. By default, list values are concatenated without de-duplication. If concat_lists is set to False, lists in u (the new dict) will replace those in d (the base dict).
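
A short sketch of the merge behavior (the dictionaries are illustrative, not from the library):

>>> from great_expectations.core.util import nested_update
>>> base = {"reader_options": {"sep": ","}, "batch_identifiers": ["a"]}
>>> new = {"reader_options": {"header": 0}, "batch_identifiers": ["b"]}
>>> nested_update(base, new)
{'reader_options': {'sep': ',', 'header': 0}, 'batch_identifiers': ['a', 'b']}
>>> nested_update({"ids": ["a"]}, {"ids": ["b"]}, concat_lists=False)
{'ids': ['b']}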

great_expectations.core.util.in_jupyter_notebook()
great_expectations.core.util.in_databricks() → bool

Tests whether we are in a Databricks environment.

Returns

bool

great_expectations.core.util.convert_to_json_serializable(data)

Helper function to convert an object to one that is json serializable.

Parameters

data – an object to attempt to convert to a corresponding json-serializable object

Returns

(dict) The converted object

Warning

The input data may also be converted in place.
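
A minimal sketch of typical conversions (the numpy and datetime inputs are illustrative; exact handling may vary by type and version):

>>> import datetime
>>> import numpy as np
>>> from great_expectations.core.util import convert_to_json_serializable
>>> convert_to_json_serializable({"mean": np.float64(3.5), "counts": np.array([1, 2, 3])})
{'mean': 3.5, 'counts': [1, 2, 3]}
>>> convert_to_json_serializable(datetime.datetime(2021, 1, 31))
'2021-01-31T00:00:00'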

great_expectations.core.util.ensure_json_serializable(data)

Helper function to convert an object to one that is json serializable.

Parameters

data – an object to attempt to convert to a corresponding json-serializable object

Returns

(dict) The converted object

Warning

The input data may also be converted in place.

great_expectations.core.util.requires_lossy_conversion(d)
great_expectations.core.util.substitute_all_strftime_format_strings(data: Union[dict, list, str, Any], datetime_obj: Optional[datetime.datetime] = None) → Union[str, Any]

This utility function iterates over the input data and, for every string it contains, replaces any strftime format elements using either the provided datetime_obj or the current datetime.
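
For example, with a hypothetical nested payload:

>>> import datetime
>>> from great_expectations.core.util import substitute_all_strftime_format_strings
>>> run_time = datetime.datetime(2021, 1, 31, 12, 0, 0)
>>> substitute_all_strftime_format_strings({"path": "reports/%Y/%m", "names": ["run-%d"]}, datetime_obj=run_time)
{'path': 'reports/2021/01', 'names': ['run-31']}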

great_expectations.core.util.get_datetime_string_from_strftime_format(format_str: str, datetime_obj: Optional[datetime.datetime] = None) → str

This utility function takes a string with strftime format elements and substitutes those elements using either the provided datetime_obj or the current datetime.
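
For example:

>>> import datetime
>>> from great_expectations.core.util import get_datetime_string_from_strftime_format
>>> get_datetime_string_from_strftime_format("%Y%m%dT%H%M%S", datetime_obj=datetime.datetime(2021, 1, 31, 12, 30, 0))
'20210131T123000'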

great_expectations.core.util.parse_string_to_datetime(datetime_string: str, datetime_format_string: Optional[str] = None) → datetime.date
great_expectations.core.util.datetime_to_int(dt: datetime.date) → int
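
A plausible round trip through these two helpers; the integer encoding noted in the comment is an assumption, not a documented contract:

from great_expectations.core.util import datetime_to_int, parse_string_to_datetime

dt = parse_string_to_datetime("2021-01-31", datetime_format_string="%Y-%m-%d")
ordinal = datetime_to_int(dt)  # assumption: a sortable integer form of the timestamp, e.g. 20210131000000
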
class great_expectations.core.util.AzureUrl(url: str)

Parses an Azure Blob Storage URL into its separate components. Formats:

WASBS (for Spark): “wasbs://<CONTAINER>@<ACCOUNT_NAME>.blob.core.windows.net/<BLOB>”
HTTP(S) (for Pandas): “<ACCOUNT_NAME>.blob.core.windows.net/<CONTAINER>/<BLOB>”

Reference: WASBS – Windows Azure Storage Blob (https://datacadamia.com/azure/wasb).

AZURE_BLOB_STORAGE_PROTOCOL_DETECTION_REGEX_PATTERN :str = ^[^@]+@.+\.blob\.core\.windows\.net\/.+$
AZURE_BLOB_STORAGE_HTTPS_URL_REGEX_PATTERN :str = ^(https?:\/\/)?(.+?)\.blob\.core\.windows\.net/([^/]+)/(.+)$
AZURE_BLOB_STORAGE_HTTPS_URL_TEMPLATE :str = {account_name}.blob.core.windows.net/{container}/{path}
AZURE_BLOB_STORAGE_WASBS_URL_REGEX_PATTERN :str = ^(wasbs?:\/\/)?([^/]+)@(.+?)\.blob\.core\.windows\.net/(.+)$
AZURE_BLOB_STORAGE_WASBS_URL_TEMPLATE :str = wasbs://{container}@{account_name}.blob.core.windows.net/{path}
property protocol(self)
property account_name(self)
property account_url(self)
property container(self)
property blob(self)
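
Usage, following the WASBS regex above (the URL itself is illustrative):

>>> from great_expectations.core.util import AzureUrl
>>> url = AzureUrl("wasbs://my-container@myaccount.blob.core.windows.net/data/yellow_tripdata.csv")
>>> url.container
'my-container'
>>> url.account_name
'myaccount'
>>> url.blob
'data/yellow_tripdata.csv'
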
class great_expectations.core.util.GCSUrl(url: str)

Parses a Google Cloud Storage URL into its separate components. Format: gs://<BUCKET_OR_NAME>/<BLOB>

URL_REGEX_PATTERN :str = ^gs://([^/]+)/(.+)$
OBJECT_URL_TEMPLATE :str = gs://{bucket_or_name}/{path}
property bucket(self)
property blob(self)
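
Usage, following URL_REGEX_PATTERN above (the URL itself is illustrative):

>>> from great_expectations.core.util import GCSUrl
>>> url = GCSUrl("gs://my-bucket/data/taxi.parquet")
>>> url.bucket
'my-bucket'
>>> url.blob
'data/taxi.parquet'
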
class great_expectations.core.util.S3Url(url)
OBJECT_URL_TEMPLATE :str = s3a://{bucket}/{path}

>>> s = S3Url("s3://bucket/hello/world")
>>> s.bucket
'bucket'
>>> s.key
'hello/world'
>>> s.url
's3://bucket/hello/world'

>>> s = S3Url("s3://bucket/hello/world?qwe1=3#ddd")
>>> s.bucket
'bucket'
>>> s.key
'hello/world?qwe1=3#ddd'
>>> s.url
's3://bucket/hello/world?qwe1=3#ddd'
>>> s = S3Url("s3://bucket/hello/world#foo?bar=2")
>>> s.key
'hello/world#foo?bar=2'
>>> s.url
's3://bucket/hello/world#foo?bar=2'
property bucket(self)
property key(self)
property suffix(self)

Attempts to get a file suffix from the S3 key. If it can’t find one, returns None.

property url(self)
class great_expectations.core.util.DBFSPath

Methods for converting Databricks Filesystem (DBFS) paths.

static convert_to_protocol_version(path: str)
static convert_to_file_semantics_version(path: str)
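
A hedged sketch of the two path flavors: Databricks exposes DBFS both as a dbfs:/ protocol path and as a /dbfs/ local mount, so the expected return values in the comments are assumptions based on that convention.

from great_expectations.core.util import DBFSPath

DBFSPath.convert_to_protocol_version("/dbfs/data/taxi.csv")        # expected "dbfs:/data/taxi.csv" (assumption)
DBFSPath.convert_to_file_semantics_version("dbfs:/data/taxi.csv")  # expected "/dbfs/data/taxi.csv" (assumption)
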
great_expectations.core.util.sniff_s3_compression(s3_url: S3Url) → str

Attempts to get read_csv compression from s3_url
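
A sketch of the intended flow; the suffix-to-compression mapping (e.g. a .gz key yielding "gzip", presumably via _SUFFIX_TO_PD_KWARG above) is an assumption:

from great_expectations.core.util import S3Url, sniff_s3_compression

s3_url = S3Url("s3://my-bucket/data/yellow_tripdata.csv.gz")
compression = sniff_s3_compression(s3_url)  # likely "gzip", inferred from the key's suffix (assumption)
# The result can be forwarded to pandas, e.g. pd.read_csv(path, compression=compression).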

great_expectations.core.util.get_or_create_spark_application(spark_config: Optional[Dict[str, str]] = None, force_reuse_spark_context: bool = False)
great_expectations.core.util.get_or_create_spark_session(spark_config: Optional[Dict[str, str]] = None)
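
A minimal sketch, assuming pyspark is installed; the config keys are illustrative and the reuse behavior is inferred from the function’s name:

from great_expectations.core.util import get_or_create_spark_session

spark = get_or_create_spark_session(
    spark_config={
        "spark.app.name": "great_expectations",  # illustrative option
        "spark.sql.shuffle.partitions": "8",     # illustrative option
    }
)
# Reuses an active SparkSession when possible; otherwise builds one from the supplied options.
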
great_expectations.core.util.spark_restart_required(current_spark_config: List[tuple], desired_spark_config: dict) → bool
great_expectations.core.util.get_sql_dialect_floating_point_infinity_value(schema: str, negative: bool = False) → float
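
Usage sketch; the "mysql" schema key and the idea that the return value is a large finite stand-in drawn from the SCHEMAS lookup above are assumptions:

from great_expectations.core.util import get_sql_dialect_floating_point_infinity_value

pos = get_sql_dialect_floating_point_infinity_value(schema="mysql")                 # proxy for +Infinity (assumption)
neg = get_sql_dialect_floating_point_infinity_value(schema="mysql", negative=True)  # proxy for -Infinity (assumption)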