Validate your data using a Checkpoint¶
Validation is the core operation of Great Expectations: “Validate data X against Expectation Y.”
In normal usage, the best way to validate data is with a Checkpoint. Checkpoints bundle Batches of data with corresponding Expectation Suites for validation.
Let’s set up our first Checkpoint! Go back to your terminal and shut down the Jupyter notebook, if you haven’t yet. Then run the following command:
great_expectations checkpoint new staging.chk taxi.demo
From there, you will be prompted by the CLI to configure the Checkpoint:
Heads up! This feature is Experimental. It may change. Please give us your feedback! Which table would you like to use? (Choose one) 1. yellow_tripdata_sample_2019_01 (table) 2. yellow_tripdata_staging (table) Do not see the table in the list above? Just type the SQL query 2 A checkpoint named `staging.chk` was added to your project! ...
What just happened?
staging.chkis the name of your new Checkpoint.
The Checkpoint uses
taxi.demoas its primary Expectation Suite.
You configured the Checkpoint to validate the
That way, we can simply run the Checkpoint each time we have new data loaded to
yellow_tripdata_staging and validate that the data meets our Expectations!
How to validate data by running Checkpoints¶
The final step in this tutorial is to use our Expectation Suite to alert us of the 0 values in the
passenger_count column in the staging data! Run the Checkpoint we just created to trigger validation of the staging data:
great_expectations checkpoint run staging.chk
This will output the following:
Heads up! This feature is Experimental. It may change. Please give us your feedback! Validation failed!
What just happened?
We ran the Checkpoint and it successfully failed! Wait - what? Yes, that’s correct, and that’s we wanted. We know that in this example, the staging data has data quality issues, which means we expect the validation to fail. Let’s open up Data Docs again to see the details.
If you navigate to the Data Docs Home page and refresh, you will now see a failed validation run at the top of the page:
If you click through to the validation results page, you will see that the validation of the staging data failed because the set of Observed Values in the
passenger_count column contained the value 0.0! This violates our Expectation, which makes the validation fail.
And this is it!
We have successfully created an Expectation Suite based on historical data, and used it to detect an issue with our new data. Congratulations! You have now completed the “Getting started with Great Expectations” tutorial.
Wrap-up and next steps¶
In this tutorial, we have covered the following basic capabilities of Great Expectations:
Setting up a Data Context
Connecting a Data Source
Creating an Expectation Suite using a automated profiling
Exploring validation results in Data Docs
Validating a new batch of data with a Checkpoint
As a final, optional step, you can check out the next section on how to customize your deployment in order to configure options such as where to store Expectations, validation results, and Data Docs.
And if you want to stop here, feel free to join our Slack community to say hi to fellow Great Expectations users in the #beginners channel!