Extending a System

Create the core of a data transformation application.

Any data transformation application consists of two parts:

  • The boundary, which talks with data sources and destinations.
  • The core, which transforms the data from its representation in the sources into the form each destination expects.

The boundary is much harder to test because it has to interact with other systems.

From another perspective, we can think of the job of a data transformation application as having two parts:

  • Load and store data.
  • Understand, parse, and clean data.

Parsing, understanding, and cleaning data is much harder to test, simply because the inputs vary so widely and real-world data is messy.
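Once a cleaning rule is isolated as a pure function, each messy input variant becomes one row in a test table. The sketch below is a minimal illustration of that idea. The function name `clean_amount` and the formats it handles (thousands separators, a `$` sign, accounting-style parentheses for negatives, `N/A` for missing values) are hypothetical examples and are not part of this module.

```python
from typing import Optional


def clean_amount(raw: str) -> Optional[float]:
    """Hypothetical cleaning step: turn a messy currency string into a float.

    Returns None when the value cannot be interpreted.
    """
    text = raw.strip().replace(",", "").replace("$", "")
    if text.lower() in ("", "n/a", "-"):
        return None
    # Treat accounting-style parentheses, e.g. "(15)", as a negative amount.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    try:
        value = float(text)
    except ValueError:
        return None
    return -value if negative else value


# One row per messy input variant; covering a new variant costs one line.
CASES = [
    ("42", 42.0),
    (" $1,200.50 ", 1200.5),
    ("(15)", -15.0),
    ("N/A", None),
    ("", None),
    ("abc", None),
]

for raw, expected in CASES:
    actual = clean_amount(raw)
    assert actual == expected, f"{raw!r}: expected {expected}, got {actual}"
print("all cases pass")
```

The variety of inputs does not go away, but because no file or network access is involved, adding a new case is cheap.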

The key to keeping a data transformation application simple is to keep the hard parts separate. In other words, the boundary code should only load and store opaque data, and all parsing, understanding, and cleaning of data should happen only in the core.
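A minimal sketch of that split, assuming CSV input with an `email` column and a plain-text output of one address per line (both formats are hypothetical). The boundary functions `load`, `store`, and `run` only move bytes in and out; `parse_rows` and `transform` in the core do all of the interpretation.

```python
import csv
import io
from pathlib import Path
from typing import Dict, List

# --- Core: pure functions with no I/O, so they can be tested with in-memory values. ---


def parse_rows(raw: bytes) -> List[Dict[str, str]]:
    """Interpret opaque bytes as UTF-8 CSV with a header row."""
    text = raw.decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def transform(raw: bytes) -> bytes:
    """Keep rows that have an email, normalize it, and emit one per line."""
    rows = parse_rows(raw)
    emails = [
        (row.get("email") or "").strip().lower()
        for row in rows
        if (row.get("email") or "").strip()
    ]
    return "".join(email + "\n" for email in emails).encode("utf-8")


# --- Boundary: only loads and stores opaque bytes; no parsing happens here. ---


def load(path: Path) -> bytes:
    return path.read_bytes()


def store(path: Path, data: bytes) -> None:
    path.write_bytes(data)


def run(source: Path, destination: Path) -> None:
    store(destination, transform(load(source)))
```

Because `transform` takes and returns bytes, every messy-input case can be exercised without touching the file system; only `run` needs real files.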

This module focuses on the core; the next module will focus on the boundary.

Implementing all of the parsing, understanding, and cleaning logic in the core requires two techniques: stick-figure testing and data pipeline design.
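The module does not spell out the data pipeline design at this point. The sketch below assumes it means a chain of small transform steps, each taking and returning the data, that are applied in order. The step functions and the `pipeline` helper are hypothetical illustrations, not the module's own code.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, str]
Step = Callable[[List[Record]], List[Record]]


def strip_whitespace(records: List[Record]) -> List[Record]:
    """Trim surrounding whitespace from every field."""
    return [{key: value.strip() for key, value in r.items()} for r in records]


def drop_missing_name(records: List[Record]) -> List[Record]:
    """Discard records whose name is empty or absent."""
    return [r for r in records if r.get("name")]


def title_case_name(records: List[Record]) -> List[Record]:
    """Normalize capitalization of the name field."""
    return [{**r, "name": r["name"].title()} for r in records]


def pipeline(*steps: Step) -> Step:
    """Compose steps left to right into one transform."""
    def run(records: List[Record]) -> List[Record]:
        return reduce(lambda data, step: step(data), steps, records)
    return run


clean = pipeline(strip_whitespace, drop_missing_name, title_case_name)

raw = [{"name": "  ada lovelace "}, {"name": "   "}, {"name": "grace hopper"}]
print(clean(raw))  # [{'name': 'Ada Lovelace'}, {'name': 'Grace Hopper'}]
```

Each step can be tested on its own, and a new cleaning rule is one more function added to the list passed to `pipeline`.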


1 — Introduce and Test New Functionality

Identify the problem before assuming solutions.

2 — Move Functionality to Product

Create the design now that the code is written.

3 — Merge and Re-use Product Code across Tests

Use stick-figure testing to modify existing code.

4 — Accomplish Your Goal in Small, Testable Steps

Use a data pipeline to make transforms easy.