Accomplish Your Goal in Small, Testable Steps
When you need to transform data, break the transformation into a sequence of small, simple, and independently tested sub-transforms connected into a data pipeline.
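Here is a minimal Python sketch of the idea. It assumes records arrive as a list of dictionaries. The step names (`strip_whitespace`, `drop_empty_names`, `normalize_email`) and the `pipeline` helper are hypothetical and exist only for illustration. Each sub-transform takes a batch and returns a new one, so any step can be tested alone and then connected to the others in order:

```python
from functools import reduce
from typing import Callable

Rows = list[dict[str, str]]  # assumed record shape, for illustration only

# Hypothetical sub-transforms: each one does a single small job.
def strip_whitespace(rows: Rows) -> Rows:
    """Trim surrounding whitespace from every field value."""
    return [{k: v.strip() for k, v in row.items()} for row in rows]

def drop_empty_names(rows: Rows) -> Rows:
    """Discard records whose name is empty."""
    return [row for row in rows if row["name"]]

def normalize_email(rows: Rows) -> Rows:
    """Lower-case the email field."""
    return [{**row, "email": row["email"].lower()} for row in rows]

def pipeline(*steps: Callable[[Rows], Rows]) -> Callable[[Rows], Rows]:
    """Connect steps so each one's output becomes the next one's input."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)

clean_contacts = pipeline(strip_whitespace, drop_empty_names, normalize_email)

# Each step can be tested on its own, without running the whole pipeline.
assert drop_empty_names([{"name": ""}, {"name": "Bo"}]) == [{"name": "Bo"}]

raw = [
    {"name": " Ada ", "email": "ADA@Example.com "},
    {"name": "   ", "email": "nobody@example.com"},
]
print(clean_contacts(raw))  # [{'name': 'Ada', 'email': 'ada@example.com'}]
```

Every step is a plain function with the same input and output shape. Handling a new nuance therefore means writing and testing one more small function and adding it to the `pipeline(...)` call.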
The Value Proposition
Data transformation projects start out easy, and then they get messy. As you work, you discover nuance after nuance in the data. Extending the program to handle the first few is easy. But soon the changes start interacting with each other, especially in error cases. Testing becomes hard, because checking each variation means running the whole transformation, often across many combinations of inputs.
The data pipeline design solves this problem. A data pipeline offers the following advantages:
- Addresses each nuance as its own independent sub-problem.
- Tests each part separately.
- Proves the whole thing works together without having to test it all together.
- Handles errors consistently and apart from the success path, so you can address them on their own.
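The error-handling advantage can be sketched as a small runner that keeps failures off the success path. This sketch assumes that a step signals bad input by raising `ValueError`. That is a choice made here for brevity, and a result type returned from each step would work equally well. The names `run_pipeline`, `Failure`, `parse_age`, and `check_adult` are all hypothetical:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Failure:
    """Records which step rejected a record and why."""
    step: str
    record: Any
    reason: str

def run_pipeline(records: list, steps: list[Callable]) -> tuple[list, list[Failure]]:
    """Apply every step to each record. If a step raises ValueError, the
    original record is diverted to the failure list and its remaining steps
    are skipped, so errors are handled in one place."""
    succeeded, failed = [], []
    for record in records:
        value = record
        for step in steps:
            try:
                value = step(value)
            except ValueError as exc:
                failed.append(Failure(step.__name__, record, str(exc)))
                break
        else:  # runs only when no step broke out of the loop
            succeeded.append(value)
    return succeeded, failed

# Hypothetical steps: each handles only its own happy path and raises on bad input.
def parse_age(row: dict) -> dict:
    return {**row, "age": int(row["age"])}  # int() raises ValueError on non-numbers

def check_adult(row: dict) -> dict:
    if row["age"] < 18:
        raise ValueError(f"age {row['age']} is under 18")
    return row

ok, bad = run_pipeline([{"age": "42"}, {"age": "abc"}, {"age": "7"}],
                       [parse_age, check_adult])
print(ok)                      # [{'age': 42}]
print([f.step for f in bad])   # ['parse_age', 'check_adult']
```

The individual steps contain no error-routing code, and every failure carries the name of the step that produced it. You can therefore test each step's happy path and its rejection rule separately.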