Connect to data
Once you have a DataContext, you’ll want to connect to data. In Great Expectations, Datasources simplify connections by managing configuration and providing a consistent, cross-platform API for referencing data.
Let’s configure your first Datasource: a connection to the data directory we’ve provided in the repo. This could also be a database connection, but for now we’re just using a simple file store:
    Would you like to configure a Datasource? [Y/n]: <press enter>

    What data would you like Great Expectations to connect to?
        1. Files on a filesystem (for processing with Pandas or Spark)
        2. Relational database (SQL)
    : 1

    What are you processing your files with?
        1. Pandas
        2. PySpark
    : 1

    Enter the path (relative or absolute) of the root directory where the data files are stored.
    : data

    Give your new Datasource a short name.
     [data__dir]: <press enter>
    ...
    <some more output here>
    ...

    Would you like to proceed? [Y/n]:
    A new datasource 'data__dir' was added to your project.

    Would you like to profile new Expectations for a single data asset within your new Datasource? [Y/n]: n
That’s it! You just configured your first Datasource!
Make sure to choose n at this prompt to exit the init flow for now. Normally, the init flow takes you through another step to create sample Expectations, but we want to jump straight to creating an Expectation Suite using the scaffold method next.
Before continuing, let’s stop and unpack what just happened.
When you completed those last few steps in great_expectations init, you told Great Expectations that:
You want to create a new Datasource called data__dir.
You want to use Pandas to read the data from CSV.
Based on that information, the CLI added the following entry into your great_expectations.yml file, under the datasources key:

    data__dir:
      data_asset_type:
        class_name: PandasDataset
        module_name: great_expectations.dataset
      batch_kwargs_generators:
        subdir_reader:
          class_name: SubdirReaderBatchKwargsGenerator
          base_directory: ../data
      class_name: PandasDatasource
      module_name: great_expectations.datasource
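To get a feel for what the subdir_reader entry is for, here is a minimal sketch of the idea behind a subdirectory reader: it looks in base_directory and treats each data file it finds as a named data asset. This is a simplified illustration using only the standard library, not Great Expectations’ actual implementation. The function name list_file_assets and the choice to match only .csv files are assumptions made for the example.

```python
from pathlib import Path


def list_file_assets(base_directory, suffix=".csv"):
    """Return a dict mapping an asset name to the file that backs it.

    Hypothetical helper: the asset name is simply the file name without
    its extension, and only files directly inside base_directory are listed.
    """
    base = Path(base_directory)
    return {path.stem: path for path in sorted(base.glob(f"*{suffix}"))}


if __name__ == "__main__":
    # "data" is the directory entered at the CLI prompt above.
    for name, path in list_file_assets("data").items():
        print(f"{name}: {path}")
```

With Pandas as the processing engine, each of these paths is the kind of file that would then be read into a DataFrame before Expectations are evaluated against it.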
This datasource does not require any credentials. However, if you were to connect to a database that requires connection credentials, those would be stored in config_variables.yml.
In the future, you can modify or delete your configuration by editing your great_expectations.yml and config_variables.yml files directly.
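To illustrate why credentials live in a separate file, the sketch below shows the general pattern: the main configuration holds a placeholder, and the secret value is filled in from a separate set of variables when the configuration is read. It assumes the ${name} placeholder style and uses Python’s string.Template to mimic the substitution. The variable names, the connection URL, and the resolve helper are all hypothetical, and this is not Great Expectations’ own code.

```python
from string import Template

# Hypothetical values, standing in for the parsed contents of config_variables.yml.
config_variables = {
    "db_user": "analyst",
    "db_password": "not-a-real-password",
}

# Hypothetical setting as it might appear in the main config, with placeholders
# instead of secrets.
url_template = "postgresql://${db_user}:${db_password}@localhost:5432/warehouse"


def resolve(value, variables):
    """Replace ${name} placeholders in value with entries from variables."""
    return Template(value).substitute(variables)


resolved = resolve(url_template, config_variables)
```

Keeping the secrets in their own file means the main configuration can be shared or committed without exposing them.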
For now, let’s move on to creating your first Expectations.