Connect to SQL database Data Assets
Use the information provided here to connect to Data Assets stored in SQL databases. Great Expectations (GX) uses SQLAlchemy to connect to SQL Data Assets, and most of the SQL dialects supported by SQLAlchemy are also supported by GX. For more information about the SQL dialects supported by SQLAlchemy, see Dialects.
- Snowflake
- PostgreSQL
- SQLite
- Databricks SQL
- BigQuery SQL
- Generic SQL
Snowflake
Connect GX to a Snowflake database to access Data Assets.
Prerequisites
- An installation of GX set up to work with SQL
- Source data stored in a Snowflake database
Import GX and instantiate a Data Context
Run the following Python code to import GX and instantiate a Data Context:
import great_expectations as gx
context = gx.get_context()
Determine your connection string
The following code examples use a Snowflake connection string. A Snowflake connection string connects GX to the Snowflake database.
The following code is an example of a Snowflake connection string format:
my_connection_string = "snowflake://<USER_NAME>:<PASSWORD>@<ACCOUNT_NAME_OR_LOCATOR>/<DATABASE_NAME>/<SCHEMA_NAME>?warehouse=<WAREHOUSE_NAME>&role=<ROLE_NAME>"
Snowflake accepts both account names and account locators as valid account identifiers when constructing a connection string.
Account names uniquely identify an account within your organization and are the preferred method of account identification.
Account locators act in the same manner but are auto-generated by Snowflake based on the cloud platform and region used.
For more information on both methods, see Account Identifiers.
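For illustration, the same connection string might look like the following with each identifier style. All values are placeholders, and the exact host form depends on your organization, region, and cloud platform; treat this as a sketch rather than the definitive format.
# Account name style (preferred): organization name and account name, joined by a hyphen.
my_connection_string = "snowflake://<USER_NAME>:<PASSWORD>@<ORG_NAME>-<ACCOUNT_NAME>/<DATABASE_NAME>/<SCHEMA_NAME>?warehouse=<WAREHOUSE_NAME>&role=<ROLE_NAME>"
# Account locator style: the auto-generated locator, optionally qualified with region and cloud.
my_connection_string = "snowflake://<USER_NAME>:<PASSWORD>@<ACCOUNT_LOCATOR>.<REGION>.<CLOUD>/<DATABASE_NAME>/<SCHEMA_NAME>?warehouse=<WAREHOUSE_NAME>&role=<ROLE_NAME>"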
Create a Snowflake Data Source
- Run the following Python code to set the name and connection_string variables:
  datasource_name = "my_snowflake_datasource"
  my_connection_string = "snowflake://<USER_NAME>:<PASSWORD>@<ACCOUNT_NAME_OR_LOCATOR>/<DATABASE_NAME>/<SCHEMA_NAME>?warehouse=<WAREHOUSE_NAME>&role=<ROLE_NAME>"
- Run the following Python code to create a Snowflake Data Source:
  datasource = context.sources.add_snowflake(
      name=datasource_name,
      connection_string=my_connection_string,  # Or alternatively, individual connection args
  )
connection_string
Although a connection string is the standard way to connect to a database, the Snowflake Data Source also accepts individual connection arguments as an alternative. The following arguments are supported:
- account
- user
- password
- database
- schema
- warehouse
- role
- numpy
Passing these values as keyword arguments to add_snowflake is functionally equivalent to passing in a connection_string.
For more information, check out Snowflake's official documentation on the Snowflake SQLAlchemy toolkit.
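Based on the argument list above, a sketch of the keyword-argument form looks like the following; all values are placeholders.
# Functionally equivalent to passing a connection_string; values are placeholders.
datasource = context.sources.add_snowflake(
    name="my_snowflake_datasource",
    account="<ACCOUNT_NAME_OR_LOCATOR>",
    user="<USER_NAME>",
    password="<PASSWORD>",
    database="<DATABASE_NAME>",
    schema="<SCHEMA_NAME>",
    warehouse="<WAREHOUSE_NAME>",
    role="<ROLE_NAME>",
)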
Connect to the data in a table (Optional)
- Run the following Python code to set the asset_name and asset_table_name variables:
  asset_name = "my_asset"
  asset_table_name = "my_table_name"
- Run the following Python code to create the Data Asset:
  table_asset = datasource.add_table_asset(name=asset_name, table_name=asset_table_name)
Connect to the data in a query (Optional)
- Run the following Python code to define a Query Data Asset:
  asset_name = "my_query_asset"
  query = "SELECT * from yellow_tripdata_sample_2019_01"
- Run the following Python code to create the Data Asset:
  query_asset = datasource.add_query_asset(name=asset_name, query=query)
Add additional tables or queries (Optional)
Repeat the previous steps to add additional Data Assets.
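For example, a second Table Asset and a second Query Asset can be added to the same Data Source; the asset and table names below are illustrative.
# Illustrative names; each Data Asset in a Data Source needs a unique name.
another_table_asset = datasource.add_table_asset(
    name="my_second_asset", table_name="my_other_table"
)
another_query_asset = datasource.add_query_asset(
    name="my_second_query_asset", query="SELECT * from my_other_table"
)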
PostgreSQL
Connect GX to a PostgreSQL database to access Data Assets.
Prerequisites
- An installation of GX set up to work with PostgreSQL
- Source data stored in a PostgreSQL database
Import GX and instantiate a Data Context
Run the following Python code to import GX and instantiate a Data Context:
import great_expectations as gx
context = gx.get_context()
Determine your connection string
The following code examples use a PostgreSQL connection string. A PostgreSQL connection string connects GX to the PostgreSQL database.
The following code is an example of a PostgreSQL connection string format:
my_connection_string = (
"postgresql+psycopg2://<username>:<password>@<host>:<port>/<database>"
)
We recommend that database credentials be stored in the config_variables.yml file, which is located in the uncommitted/ folder by default and is not part of source control. The following lines add database credentials under the key db_creds:
db_creds:
  drivername: postgresql
  host: '<your_host_name>'
  port: '<your_port>'
  username: '<your_username>'
  password: '<your_password>'
  database: '<your_database_name>'
For additional options on configuring the config_variables.yml file or additional environment variables, see our guide on how to configure credentials.
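One way to use a stored secret with the connection string shown below is ${...} substitution, which GX resolves from environment variables or from config_variables.yml. This is a sketch; MY_POSTGRES_PASSWORD is an illustrative key name, not one required by GX.
# MY_POSTGRES_PASSWORD is assumed to exist as an environment variable or as a key
# in config_variables.yml; GX substitutes the value when it connects.
my_connection_string = (
    "postgresql+psycopg2://<username>:${MY_POSTGRES_PASSWORD}@<host>:<port>/<database>"
)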
Create a PostgreSQL Data Source
- Run the following Python code to set the name and connection_string variables:
  datasource_name = "my_datasource"
  my_connection_string = (
      "postgresql+psycopg2://<username>:<password>@<host>:<port>/<database>"
  )
- Run the following Python code to create a PostgreSQL Data Source:
  datasource = context.sources.add_postgres(
      name=datasource_name, connection_string=my_connection_string
  )
Connect to a specific set of data with a Data Asset
To connect the Data Source to a specific set of data in the database, you define a Data Asset in the Data Source. A Data Source can contain multiple Data Assets. Each Data Asset acts as the interface between GX and the specific set of data it is configured for.
With SQL databases, you can use Table or Query Data Assets. The Table Data Asset connects GX to the data contained in a single table in the source database. The Query Data Asset connects GX to the data returned by a SQL query.
There is no maximum number of Data Assets you can define for a Data Source, but you must create at least one Data Asset before GX can retrieve data from your Data Source.
Connect a Data Asset to the data in a table (Optional)
- Run the following Python code to identify the table to connect to with a Table Data Asset:
  asset_name = "my_table_asset"
  asset_table_name = "postgres_taxi_data"
- Run the following Python code to create the Data Asset:
  table_asset = datasource.add_table_asset(name=asset_name, table_name=asset_table_name)
Connect a Data Asset to the data returned by a query (Optional)
- Run the following Python code to define a Query Data Asset:
  asset_name = "my_query_asset"
  asset_query = "SELECT * from postgres_taxi_data"
- Run the following Python code to create the Data Asset:
  query_asset = datasource.add_query_asset(name=asset_name, query=asset_query)
Connect to additional tables or queries (Optional)
Repeat the previous steps to add additional Data Assets.
SQLite
Connect GX to a SQLite database to access Data Assets.
Prerequisites
- An installation of GX set up to work with SQLite
- Source data stored in a SQLite database
Import GX and instantiate a Data Context
Run the following Python code to import GX and instantiate a Data Context:
import great_expectations as gx
context = gx.get_context()
Determine your connection string
The following code examples use a SQLite connection string. A SQLite connection string connects GX to the SQLite database.
The following code is an example of a SQLite connection string format:
my_connection_string = "sqlite:///<path_to_db_file>"
Create a SQLite Data Source
- Run the following Python code to set the name and connection_string variables:
  datasource_name = "my_datasource"
  my_connection_string = "sqlite:///<path_to_db_file>"
- Run the following Python code to create a SQLite Data Source:
  datasource = context.sources.add_sqlite(
      name=datasource_name, connection_string=my_connection_string
  )
Using add_sql(...) instead of add_sqlite(...)
The SQL Data Source created with add_sql can connect to data in a SQLite database. However, add_sqlite(...) is the preferred method. SQLite stores datetime values as strings; because of this, a general SQL Data Source sees datetime columns as string columns. A SQLite Data Source has additional handling in place for these fields, and it also has additional error reporting for SQLite-specific issues.
If you are working with a SQLite Data Source, use add_sqlite(...) to create your Data Source.
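For comparison, the generic approach described above would look like the following sketch; the Data Source name is illustrative, and add_sqlite remains the preferred method.
# Connects to the same database, but datetime columns are treated as strings.
datasource = context.sources.add_sql(
    name="my_generic_sqlite_datasource",
    connection_string="sqlite:///<path_to_db_file>",
)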
Connect to the data in a table (Optional)
- Run the following Python code to set the asset_name and asset_table_name variables:
  asset_name = "my_asset"
  asset_table_name = "my_table_name"
- Run the following Python code to create the Data Asset:
  table_asset = datasource.add_table_asset(name=asset_name, table_name=asset_table_name)
Connect to the data in a query (Optional)
- Run the following Python code to define a Query Data Asset:
  asset_name = "my_query_asset"
  query = "SELECT * from yellow_tripdata_sample_2019_01"
- Run the following Python code to create the Data Asset:
  query_asset = datasource.add_query_asset(name=asset_name, query=query)
Add additional tables or queries (Optional)
Repeat the previous steps to add additional Data Assets.
Databricks SQL
Connect GX to Databricks to access Data Assets.
Prerequisites
- An installation of GX set up to work with SQL
- Source data stored in a Databricks cluster
Import GX and instantiate a Data Context
Run the following Python code to import GX and instantiate a Data Context:
import great_expectations as gx
context = gx.get_context()
Determine your connection string
The following code examples use a Databricks SQL connection string. A connection string connects GX to Databricks.
The following code is an example of a Databricks SQL connection string format:
my_connection_string = f"databricks://token:{token}@{host}:{port}?http_path={http_path}&catalog={catalog}&schema={schema}"
Create a Databricks SQL Data Source
- Run the following Python code to set the name and connection_string variables:
  datasource_name = "my_databricks_sql_datasource"
  my_connection_string = f"databricks://token:{token}@{host}:{port}?http_path={http_path}&catalog={catalog}&schema={schema}"
- Run the following Python code to create a Databricks SQL Data Source:
  datasource = context.sources.add_databricks_sql(
      name=datasource_name,
      connection_string=my_connection_string,
  )
Connect to the data in a table (Optional)
- Run the following Python code to set the asset_name and asset_table_name variables:
  asset_name = "my_asset"
  asset_table_name = "my_table_name"
- Run the following Python code to create the Data Asset:
  table_asset = datasource.add_table_asset(name=asset_name, table_name=asset_table_name)
Connect to the data in a query (Optional)
- Run the following Python code to define a Query Data Asset:
  asset_name = "my_query_asset"
  query = "SELECT * from yellow_tripdata_sample_2019_01"
- Run the following Python code to create the Data Asset:
  query_asset = datasource.add_query_asset(name=asset_name, query=query)
Add additional tables or queries (Optional)
Repeat the previous steps to add additional Data Assets.
BigQuery SQL
Connect GX to a BigQuery SQL database to access Data Assets.
Prerequisites
- An installation of GX set up to work with SQL.
- Read/write access to a BigQuery database.
Import GX and instantiate a Data Context
Run the following Python code to import GX and instantiate a Data Context:
import great_expectations as gx
context = gx.get_context()
Create a BigQuery Data Source
Tables that are created by BigQuery queries are automatically set to expire after one day.
Run the following code to create a Data Source that connects to data in BigQuery:
datasource = context.sources.add_or_update_sql(
name="my_bigquery_datasource",
connection_string="bigquery://<gcp_project_name>/<bigquery_dataset>",
)
In this example, you created a Data Source named my_bigquery_datasource by using the add_or_update_sql method and passing in a connection string.
Create Assets
You can add a BigQuery Asset to your Data Source as a table asset or a query asset.
In the following example, a table Asset named my_table_asset is built by naming the table in your BigQuery database:
table_asset = datasource.add_table_asset(name="my_table_asset", table_name="taxi_data")
In the following example, a query Asset named my_query_asset is built by submitting a query to the taxi_data table:
query_asset = datasource.add_query_asset(
name="my_query_asset", query="SELECT * from taxi_data"
)
Generic SQL
Connect GX to a SQL database to access Data Assets.
Prerequisites
- An installation of GX set up to work with SQL
- Source data stored in a SQL database
Import GX and instantiate a Data Context
Run the following Python code to import GX and instantiate a Data Context:
import great_expectations as gx
context = gx.get_context()
Determine your connection string
GX supports numerous SQL Data Sources. However, most SQL dialects have their own specifications for defining a connection string. See the dialect documentation to determine the connection string for your SQL database.
The following are some of the connection strings that are available for different SQL dialects:
- AWS Athena: awsathena+rest://@athena.<REGION>.amazonaws.com/<DATABASE>?s3_staging_dir=<S3_PATH>
- BigQuery: bigquery://<GCP_PROJECT>/<BIGQUERY_DATASET>?credentials_path=/path/to/your/credentials.json
- MSSQL: mssql+pyodbc://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>?driver=<DRIVER>&charset=utf&autocommit=true
- MySQL: mysql+pymysql://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>
- PostgreSQL: postgresql+psycopg2://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>
- Redshift: postgresql+psycopg2://<USER_NAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>?sslmode=<SSLMODE>
- Snowflake: snowflake://<USER_NAME>:<PASSWORD>@<ACCOUNT_NAME>/<DATABASE_NAME>/<SCHEMA_NAME>?warehouse=<WAREHOUSE_NAME>&role=<ROLE_NAME>&application=great_expectations_oss
- SQLite: sqlite:///<PATH_TO_DB_FILE>
- Trino: trino://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<CATALOG>/<SCHEMA>
Choose the connection string for your preferred SQL dialect and store it, including plain text credentials, in the connection_string variable. The following code is an example of the PostgreSQL connection string format:
connection_string = "postgresql+psycopg2://username:my_password@localhost/test"
You can use environment variables or a key in config_variables.yml to store connection string passwords. After you define your password, you reference it in your connection string similar to this example:
connection_string = (
    "postgresql+psycopg2://<username>:${MY_PASSWORD}@<host>:<port>/<database>"
)
In the previous example, MY_PASSWORD is the name of the environment variable, or the key to the value in config_variables.yml, that corresponds to your password.
If you include a password as plain text in your connection string when you define your Data Source, GX automatically removes it, adds it to config_variables.yml, and substitutes a variable for it in the saved Data Source configuration.
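For instance, the password referenced by ${MY_PASSWORD} can be supplied in either of the two ways described above; the value shown here is a placeholder.
# Option 1 (environment variable): set it before starting Python, e.g.
#   export MY_PASSWORD=<your_password>
# Option 2 (config_variables.yml): add an entry such as
#   MY_PASSWORD: <your_password>
# Either way, GX substitutes ${MY_PASSWORD} in the connection string at runtime.
connection_string = (
    "postgresql+psycopg2://<username>:${MY_PASSWORD}@<host>:<port>/<database>"
)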
Create a SQL Data Source
Run the following Python code to create a SQL Data Source:
datasource = context.sources.add_sql(
name="my_datasource", connection_string=connection_string
)
Connect to the data in a table (Optional)
- Run the following Python code to set the asset_name and asset_table_name variables:
  asset_name = "my_asset"
  asset_table_name = "my_table_name"
- Run the following Python code to create the Data Asset:
  table_asset = datasource.add_table_asset(name=asset_name, table_name=asset_table_name)
Connect to the data in a query (Optional)
- Run the following Python code to define a Query Data Asset:
  asset_name = "my_query_asset"
  query = "SELECT * from yellow_tripdata_sample_2019_01"
- Run the following Python code to create the Data Asset:
  query_asset = datasource.add_query_asset(name=asset_name, query=query)
Add additional tables or queries (Optional)
Repeat the previous steps to add additional Data Assets.