Connect to GX Cloud with Python
Learn how to use GX Cloud from a Python script or interpreter, such as a Jupyter Notebook. You'll install Great Expectations, configure your GX Cloud environment variables, connect to sample data, build your first Expectation, validate data, and review the validation results through Python code.
Prerequisites
- You have internet access and download permissions.
- You have a GX Cloud account.
Prepare your environment
- Download and install Python. See Active Python Releases.
- Download and install pip. See the pip documentation.
Install GX
- Run the following command in an empty base directory inside a Python virtual environment:

```shell
pip install great_expectations
```

It can take several minutes for the installation to complete.
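To confirm the installation from Python, the sketch below reads the installed package version using only the standard library's `importlib.metadata` (Python 3.8+). It returns `None` when the distribution isn't installed in the active environment. The helper name `installed_gx_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_gx_version(dist_name: str = "great_expectations") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The package isn't installed in the active (virtual) environment.
        return None


print(installed_gx_version())
```

If this prints `None`, the virtual environment you're running Python from is probably not the one you installed GX into.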
Get your user access token and organization ID
You'll need your user access token and organization ID to set your environment variables. Don't commit your access tokens to your version control software.
- In GX Cloud, click Settings > Tokens.
- In the User access tokens pane, click Create user access token.
- In the Token name field, enter a name for the token that will help you quickly identify it.
- Click Create.
- Copy and then paste the user access token into a temporary file. The token can't be retrieved after you close the dialog.
- Click Close.
- Copy the value in the Organization ID field into the temporary file with your user access token and then save the file.
GX recommends deleting the temporary file after you set the environment variables.
Set the GX Cloud Organization ID and user access token as environment variables
Storing your GX Cloud access credentials in environment variables keeps them out of your code and notebooks.
- Save your GX_CLOUD_ACCESS_TOKEN and GX_CLOUD_ORGANIZATION_ID as environment variables by running `export ENV_VAR_NAME=env_var_value` in the terminal or by adding the command to your `~/.bashrc` or `~/.zshrc` file. For example:

```shell
export GX_CLOUD_ACCESS_TOKEN=<user_access_token>
export GX_CLOUD_ORGANIZATION_ID=<organization_id>
```

Note: After you save your GX_CLOUD_ACCESS_TOKEN and GX_CLOUD_ORGANIZATION_ID, you can use Python scripts to access GX Cloud and complete other tasks. See the GX Core guides.
- Optional. If you created a temporary file to record your user access token and Organization ID, delete it.
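Variables exported in a terminal are only visible to programs started from that shell afterward, so a Jupyter server launched earlier won't see them. The sketch below checks, before you create a Data Context, that both variables from the step above are present and non-empty. It prints only variable names, never the token value. The helper `missing_gx_cloud_vars` is hypothetical.

```python
import os
from typing import List, Mapping, Optional

# Variable names from the step above.
REQUIRED_VARS = ("GX_CLOUD_ACCESS_TOKEN", "GX_CLOUD_ORGANIZATION_ID")


def missing_gx_cloud_vars(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required GX Cloud variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_gx_cloud_vars()
if missing:
    # Report only the names so the access token never ends up in logs or notebook output.
    print("Missing environment variables:", ", ".join(missing))
else:
    print("GX Cloud environment variables are set.")
```

Passing a mapping instead of relying on `os.environ` makes the check easy to exercise without touching your real environment.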
Create a Data Context
- Run the following Python code to create a Data Context object:

```python
import great_expectations as gx

context = gx.get_context(mode="cloud")
```

The Data Context detects the environment variables you set previously and connects to your GX Cloud account. To verify that you have a GX Cloud Data Context, run the following code. The output should be CloudDataContext.

```python
print(type(context).__name__)
```
Connect to a Data Asset
- Run the following Python code to connect to existing `.csv` data stored in the `gx_tutorials` repository of the great-expectations GitHub organization and create a Batch object:

```python
batch = context.data_sources.pandas_default.read_csv(
    "https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
)
```

The code example uses the default Data Source for pandas to read the `.csv` data from the file at the specified URL.

Alternatively, if you have already configured your data in GX Cloud, you can use it instead. To see your available Data Sources, run:

```python
print(context.list_datasources())
```
Using the printed information, find the name of one of your existing Data Sources, one of its Data Assets, and a Batch Definition on that Data Asset. Then retrieve a Batch of data by updating the values of `data_source_name`, `data_asset_name`, and `batch_definition_name` in the following code and running it:

```python
data_source_name = "my_data_source"
data_asset_name = "my_data_asset"
batch_definition_name = "my_batch_definition"

# Look up the Data Source, its Data Asset, and its Batch Definition, then load a Batch.
batch = (
    context.data_sources.get(data_source_name)
    .get_asset(data_asset_name)
    .get_batch_definition(batch_definition_name)
    .get_batch()
)
```
Create Expectations
- Run the following Python code to create an Expectation Suite and add two Expectations to it:

```python
import great_expectations as gx
import great_expectations.expectations as gxe

# Create the Expectation Suite and save it to your GX Cloud Data Context.
suite = context.suites.add(gx.ExpectationSuite(name="my_suite"))

# Add two Expectations to the suite.
suite.add_expectation(gxe.ExpectColumnValuesToNotBeNull(column="pickup_datetime"))
suite.add_expectation(
    gxe.ExpectColumnValuesToBeBetween(
        column="passenger_count", min_value=1, max_value=6
    )
)
```

The first Expectation encodes domain knowledge: `pickup_datetime` shouldn't be null. The second uses explicit keyword arguments to require that `passenger_count` values fall between 1 and 6.
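The two Expectations above can be read as simple row-level checks. The sketch below is a plain-Python analogue for illustration only, not how GX evaluates Expectations: it counts rows where `pickup_datetime` is missing and rows where `passenger_count` falls outside the inclusive range 1 to 6. It assumes missing values are represented as `None` and skips them in the range check, leaving them to the null check. The function names and sample rows are hypothetical.

```python
from typing import Any, Dict, List, Optional


def count_nulls(rows: List[Dict[str, Any]], column: str) -> int:
    """Count rows whose value for `column` is missing (None or absent)."""
    return sum(1 for row in rows if row.get(column) is None)


def count_out_of_range(
    rows: List[Dict[str, Any]],
    column: str,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> int:
    """Count non-null values outside the inclusive [min_value, max_value] range."""
    bad = 0
    for row in rows:
        value = row.get(column)
        if value is None:
            continue  # Missing values are handled by the null check, not the range check.
        if min_value is not None and value < min_value:
            bad += 1
        elif max_value is not None and value > max_value:
            bad += 1
    return bad


# Hypothetical rows shaped like the taxi trip sample data.
sample = [
    {"pickup_datetime": "2019-01-01 00:10:00", "passenger_count": 1},
    {"pickup_datetime": None, "passenger_count": 2},
    {"pickup_datetime": "2019-01-01 00:20:00", "passenger_count": 7},
    {"pickup_datetime": "2019-01-01 00:30:00", "passenger_count": None},
]

print(count_nulls(sample, "pickup_datetime"))               # 1
print(count_out_of_range(sample, "passenger_count", 1, 6))  # 1
```

In this sample, the second row would fail the null check and the third row would fail the range check.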
Validate data
- Run the following Python code to validate the Batch against the Expectation Suite and return the Validation Results. You don't need to create a Checkpoint to validate data interactively:

```python
results = batch.validate(suite)
```

- Run the following Python code to print a summary of the Validation Results:

```python
print(results.describe())
```
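To act on the outcome in code instead of reading printed output, you can inspect the result object directly. The sketch below assumes the object returned by `batch.validate(suite)` exposes a boolean `success` and a `results` list whose items each have a `success` flag and an `expectation_config` with a `type` attribute. Check these attribute names against the GX Core API reference for your installed version. The helper `failed_expectation_types` is hypothetical, and the stand-in objects only let it run without GX Cloud.

```python
from types import SimpleNamespace
from typing import Any, List


def failed_expectation_types(results: Any) -> List[str]:
    """Return the Expectation type of every individual result that did not pass."""
    return [r.expectation_config.type for r in results.results if not r.success]


# Intended use with real GX objects (attribute names are assumptions):
#   results = batch.validate(suite)
#   if not results.success:
#       print("Failed:", failed_expectation_types(results))

# Stand-in objects that mimic the assumed shape of the Validation Results.
fake = SimpleNamespace(
    success=False,
    results=[
        SimpleNamespace(
            success=True,
            expectation_config=SimpleNamespace(type="expect_column_values_to_not_be_null"),
        ),
        SimpleNamespace(
            success=False,
            expectation_config=SimpleNamespace(type="expect_column_values_to_be_between"),
        ),
    ],
)
print(failed_expectation_types(fake))
```

Because the helper relies only on attribute access, it works with any object of that shape, which keeps it testable without a GX Cloud connection.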