Most data science and data engineering teams end up building some form of pipeline testing, eventually. Unfortunately, many teams don’t get around to it until late in the game, long after early lessons from data exploration and model development have been forgotten.
In the meantime, data pipelines often become deep stacks of unverified assumptions. Mysterious (and sometimes embarrassing) bugs crop up more and more frequently. Resolving them requires painstaking exploration of upstream data, often leading to frustrating negotiations about data specs across teams.
It’s not unusual to see data teams grind to a halt for weeks (or even months!) to pay down accumulated pipeline debt. This work is never fun—after all, it’s just data cleaning: no new products shipped; no new insights kindled. Even worse, it’s re-cleaning old data that you thought you’d already dealt with. In our experience, servicing pipeline debt is one of the biggest productivity and morale killers on data teams.
We strongly believe that most of this pain is avoidable. We built Great Expectations to make it very, very simple to
- set up your testing framework early,
- capture those early learnings while they’re still fresh, and
- systematically validate new data against them.
It’s the best tool we know of for managing the complexity that inevitably grows within data pipelines. We hope it helps you as much as it’s helped us.
Good night and good luck!