How to configure a BigQuery Datasource
This guide will help you add a BigQuery project (or a single dataset) as a Datasource, so that you can validate tables and queries within that project. When you use a BigQuery Datasource, validation runs in BigQuery itself; your data is not downloaded.
Prerequisites: This how-to guide assumes you have already:
Set up a working deployment of Great Expectations
Followed the Google Cloud library guide for authentication
Installed the pybigquery package for the BigQuery SQLAlchemy dialect (pip install pybigquery)
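Before starting the CLI, you may want to confirm that the prerequisite Python packages are importable. The following is a minimal sketch, assuming the import names `great_expectations` and `pybigquery`; the helper `missing_packages` is hypothetical and not part of either library.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the prerequisites listed above.
missing = missing_packages(["great_expectations", "pybigquery"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

This only checks that the modules can be located; it does not verify Google Cloud authentication.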
Run the following CLI command to begin the interactive Datasource creation process:
great_expectations datasource new
When prompted, choose “Big Query” from the list of database engines.
Identify the connection string you would like Great Expectations to use to connect to BigQuery, using the examples below and the PyBigQuery documentation.
If you want Great Expectations to connect to your BigQuery project (without specifying a particular dataset), the URL should be:
If you want Great Expectations to connect to a particular dataset inside your BigQuery project, the URL should be:
If you want Great Expectations to connect to one of Google’s public datasets, the URL should be:
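As an illustration of the project-only and project-plus-dataset cases, here is a minimal sketch that builds URLs in the `bigquery://` form used by the PyBigQuery dialect. The helper `bigquery_url` and the names `my-project` and `my_dataset` are hypothetical placeholders; substitute your own project and dataset IDs. For public datasets, consult the PyBigQuery documentation for the exact form.

```python
from typing import Optional


def bigquery_url(project: str, dataset: Optional[str] = None) -> str:
    """Build a PyBigQuery-style SQLAlchemy URL from caller-supplied IDs."""
    if not project:
        raise ValueError("a BigQuery project ID is required")
    if dataset is None:
        # Project only: bigquery://<project>
        return f"bigquery://{project}"
    # Project and dataset: bigquery://<project>/<dataset>
    return f"bigquery://{project}/{dataset}"


# Hypothetical names, for illustration only.
project_url = bigquery_url("my-project")
dataset_url = bigquery_url("my-project", "my_dataset")
print(project_url)
print(dataset_url)
```

Either string can then be pasted at the CLI prompt for the connection string.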
Enter the connection string when prompted (and press Enter when asked “Would you like to proceed? [Y/n]:”).
Should you need to modify your connection string, you can manually edit the
If you prefer, you can store the SQLAlchemy URL in an environment variable instead of the file; search the documentation for “Managing Environment and Secrets”.
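If you go the environment-variable route, a quick sanity check before launching Great Expectations can catch an unset or malformed value. This is a minimal sketch, not Great Expectations' own substitution mechanism; the variable name `BIGQUERY_SQLALCHEMY_URL` and the helper `load_bigquery_url` are assumptions made for illustration.

```python
import os

# Hypothetical variable name; use whatever your deployment defines.
ENV_VAR = "BIGQUERY_SQLALCHEMY_URL"


def load_bigquery_url(env=os.environ, name=ENV_VAR):
    """Return the SQLAlchemy URL stored under `name`, or raise if unusable."""
    url = env.get(name)
    if not url:
        raise RuntimeError(f"environment variable {name} is not set")
    if not url.startswith("bigquery://"):
        raise ValueError(f"{name} does not look like a BigQuery URL")
    return url


# Example with an explicit mapping instead of the real environment.
example_url = load_bigquery_url({ENV_VAR: "bigquery://my-project"})
print(example_url)
```

Passing a plain dictionary as `env` keeps the check easy to exercise without touching the real environment.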