Fixing Databases with Lego
Problem and Solution Overview
This recipe just states how to execute a solution. Read our Legacy Newsletter Edition: Changing Database Schema (25 June) to understand the specific problem we are solving and the solution approach.
This recipe helps you enable database refactoring for a project. You execute it once per project. You will need to set up 5 things:
- Get the Refactoring Databases book.
- Define the migration script structure.
- Create your tool to apply migrations.
- Add the tool to your dev and deploy processes.
- Start creating refactorings as migrations!
Get the Book
You want to read the book Refactoring Databases, by Ambler and Sadalage. It describes the general technique and provides recipes for common refactorings.
Migration Script Structure
Each migration script will consist of 5 parts. Each part scripts one specific activity. It is easiest to write each part in the default language of your database. For example, in SQL Server I write T-SQL, while in Caché I would write Caché ObjectScript.
The 5 parts are:

- `begin`: makes the first schema change of the refactoring (new structures, transition triggers, and the like).
- `undo-begin`: reverses `begin`.
- `sync-data`: migrates one chunk of data to the new structure; its companion check, `data-sync-is-done`, reports whether all the data has been synchronized.
- `finish`: makes the final schema change (such as dropping the old structures).
- `undo-finish`: reverses `finish`.
I generally define my script format as a single file with 5 sections, divided by structured comments. This makes it easy to edit and test the file with my usual database development tools. For example, my default SQL Server script structure is:
```
-- migration
-- version: 26
-- section: begin
-- section: undo-begin
-- section: sync-data
-- section: data-sync-is-done
-- section: finish
-- section: undo-finish
```
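A migration tool has to split such a file back into its sections before it can execute them. Here is a minimal sketch in Python, assuming the `-- section:` comment format shown above (the function name and the `header` key are illustrative, not prescribed by the recipe):

```python
def split_migration(text):
    """Split a migration script into named sections.

    Everything before the first '-- section:' marker (e.g. '-- migration',
    '-- version: 26') is collected under the key 'header'.
    """
    sections = {"header": []}
    current = "header"
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("-- section:"):
            # Start a new section named by whatever follows the marker.
            current = stripped[len("-- section:"):].strip()
            sections[current] = []
        else:
            sections[current].append(line)
    return {name: "\n".join(lines) for name, lines in sections.items()}
```

Feeding it the structure above yields one entry per section (`begin`, `undo-begin`, and so on), each holding the script text to run for that part.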
I name the file `00026-purpose-of-migration.tsql`. That way I can double-check that the version in the filename matches the version inside the file and prevent one category of errors. The rest of the filename serves the same purpose as a commit message: I can look at the folder to see a log of the changes made and why.
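That double-check is easy to automate. A sketch, assuming the filename and header conventions above (the example filename in the test is hypothetical):

```python
import re

def check_version_consistency(filename, header_text):
    """Cross-check the zero-padded version prefix in the filename against
    the '-- version: N' line inside the file; raise on any mismatch."""
    name_match = re.match(r"(\d{5})-", filename)
    body_match = re.search(r"--\s*version:\s*(\d+)", header_text)
    if not name_match or not body_match:
        raise ValueError("missing version in filename or file header")
    file_version = int(name_match.group(1))
    body_version = int(body_match.group(1))
    if file_version != body_version:
        raise ValueError(f"version mismatch: filename says {file_version}, "
                         f"file says {body_version}")
    return file_version
```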
Create Your Migration Tool
The migration tool is usually a small script. I write it in either C# or Python, depending on the company. Either way, I implement these requirements in order:
- Console program.
- Parse arguments:
  - path: the folder to find scripts in. Defaults to cwd.
  - action: one of `migrate-top`, `sync-data`, `migrate-bottom`, `blocking-all`, or `all`. Defaults to `all`.
  - to: one of version|max|prior; specifies the destination version for the current action, stored as an int. `max` means the highest version found in the scripts folder; `prior` means `max-1`. Defaults to `max`.
  - connection info: defines the database to connect to, using whatever is right for the specific technologies involved. Required.
- Can connect to the database.
- Can parse a migration script file. Errors out on any parse error, including a version mismatch between filename and contents.
- Can ask the DB for its current version range (typically stored in a version_info table that contains `highest_supported_version` and `highest_data_sync_version` fields). Returns “unknown” if none are present (missing table or row).
- Includes a built-in migration definition to move between “unknown” and version 0:
  - `begin` creates the version_info table and inserts a single row with the value -1 for all versions.
  - `undo-begin` drops the version_info table.
  - All other steps do nothing.
- Given a current version and a target version, can determine which migrations it will need.
- Errors out if any required migration is not in the folder.
- Implement `migrate-top`: Iterate the migrations between the current `highest_supported_version` and the destination version, in order (up or down). For each, execute either the `begin` section (if iterating up) or the `undo-begin` section (if iterating down). Then set `highest_supported_version` to that migration’s version.
- Implement `migrate-bottom`. Same as `migrate-top`, except it uses the `finish` and `undo-finish` sections.
- Implement `sync-data`. Creates a background process on the DB server that will:
  - Check the current `highest_data_sync_version`. Stop and unschedule itself if that matches the target version.
  - Load the migration for the next version after `highest_data_sync_version`.
  - Run the `sync-data` section to update one chunk of work (that section should be re-runnable and work on a chunk at a time).
  - Run the `data-sync-is-done` section to see if all work is done. If so, update `highest_data_sync_version`.
  - Schedule itself for another execution (or otherwise loop while giving time up to regular DB processing).
- Implement `all`. When migrating up versions it executes `migrate-top`, then `sync-data` and waits for data synchronization to complete, then `migrate-bottom`. It does the reverse to migrate down versions.
- Optionally, implement `blocking-all`. This does data sync in the foreground (without giving up time slices to other DB work), as it will only be used in dev & test scenarios.
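The iteration logic shared by `migrate-top` and `migrate-bottom` can be isolated as a pure planning step. A sketch with illustrative names; the real tool would then execute each planned section against the database and update the matching version field as it goes:

```python
def plan_migrations(current, target, up_section, down_section):
    """Return the (version, section) pairs to execute, in order, to move
    one tracked version from `current` to `target`.

    migrate-top would pass 'begin'/'undo-begin'; migrate-bottom would
    pass 'finish'/'undo-finish'.
    """
    if target > current:
        # Upward: run the forward section of each newer migration, ascending.
        return [(v, up_section) for v in range(current + 1, target + 1)]
    # Downward: run the undo section of each applied migration, descending.
    return [(v, down_section) for v in range(current, target, -1)]
```

Keeping planning separate from execution also makes the “errors out if any required migration is not in the folder” check trivial: look up every planned version before running anything.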
Add the Tool to Your Dev and Deploy Processes
When a commit involves a schema change, you will generally do the following:
1. Create a new migration script & set the version.
2. Implement enough of `begin` to do the first part of the first phase of your refactoring.
3. Run `migrate --action=migrate-top` to apply it.
4. Inspect the schema and make sure you got the change you intended. Or use approval tests and a schema dump to verify the diff.
5. Implement enough of `undo-begin` to reverse the change you just made.
6. Run `migrate --action=migrate-top --to=prior` to get back to your initial state.
7. Verify you are back to your initial state.
8. Go back to step 2 to implement the next chunk. Repeat until you are done with `begin` and `undo-begin`. Make sure you remember triggers and other intermediate changes!
9. Run `migrate --action=migrate-top` to get to the state where data sync will happen.
10. Implement one chunk of `sync-data` and the same chunk of `data-sync-is-done`.
11. Run `migrate --action=sync-data` and verify the data, including that it updated everything.
12. Go back to step 10 if you have more data to update.
13. Run `migrate --action=migrate-top --to=prior` to undo everything, then `migrate --action=migrate-top` and `migrate --action=sync-data` to verify that it all works when run at once.
14. Build out `finish` and `undo-finish` in the same way as you built `begin` and `undo-begin`. You will use `migrate --action=migrate-bottom` to test.
15. Update or add one data value in each changed part of the new schema.
16. Run `migrate --action=all --to=prior` to get back to the original state and verify that it is correct. In particular, make sure that your data change survived.
17. Run `migrate --action=all` to get back to the next version.
18. Check it all in.
With practice, you will be able to execute this series in less than 15 minutes for just about any single refactoring. Break up your changes if you are struggling to hit this timebox.
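The up/verify/down/verify rhythm is also worth practicing against an in-memory stand-in before touching a real database. In this illustrative sketch the schema is a plain dict and the `begin`/`undo-begin` sections are functions (the table and column names are hypothetical):

```python
# Stand-in for a hypothetical "add a preferred_name column" first phase.
def begin(schema):
    schema["person.preferred_name"] = "varchar(100)"  # first chunk of the change

def undo_begin(schema):
    del schema["person.preferred_name"]               # exact reverse of begin

schema = {"person.name": "varchar(100)"}
initial = dict(schema)

begin(schema)                                  # like: migrate --action=migrate-top
assert "person.preferred_name" in schema       # inspect the schema

undo_begin(schema)                             # like: migrate ... --to=prior
assert schema == initial                       # verify the initial state is back
```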
Your automated deploy should do the following:
- Run `migrate --action=migrate-top --to=<min version needed by app>`.
- Deploy the code.
- Run `migrate --action=sync-data --to=<same version>`.
- Register a background process that waits until `highest_data_sync_version` catches up to `highest_supported_version`, and then runs `migrate --action=migrate-bottom --to=<same version>`.
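That background step can be sketched as a simple polling loop. The callables are placeholders: the getters would query version_info and the runner would invoke the migration tool (the names and the 60-second interval are assumptions):

```python
import time

def finish_when_synced(get_sync_version, get_supported_version,
                       run_migrate_bottom, poll_seconds=60, sleep=time.sleep):
    """Wait until highest_data_sync_version catches up with
    highest_supported_version, then run migrate-bottom to that version."""
    while get_sync_version() < get_supported_version():
        sleep(poll_seconds)
    run_migrate_bottom(get_supported_version())
```

Injecting `sleep` keeps the loop testable without real waiting.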
If you run multiple applications from the same database, then you can enable independent deployment by doing:
- Create an application_version_info table, containing fields for the application name and its `required_version`.
- Only do migrate-top if the app’s required version is higher than the current `highest_supported_version`.
- As part of deploying the code, update `required_version` for the application you are deploying.
- When data sync finishes, migrate-bottom to the min `required_version` in the application_version_info table.
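The two targets then fall straight out of that table. An illustrative computation, with the rows modeled as a dict:

```python
def migration_targets(required_versions):
    """Given each application's required_version, the schema top must reach
    the most demanding app, while the bottom may only rise to the least
    demanding one (old structure must survive for apps still using it)."""
    top = max(required_versions.values())     # migrate-top target
    bottom = min(required_versions.values())  # migrate-bottom target
    return top, bottom
```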
Start Creating Refactorings as Migrations!
The best way to learn refactorings is one at a time, as you need them. So disable your old way of updating the database. Grab a pair or a mob when you need to learn a new refactoring, and dive in!