From Hard to Easy: Changing Database Schema

Fixing Databases with Lego

Problem and Solution Overview

This recipe just states how to execute a solution. Read our Legacy Newsletter blog post: Changing Database Schema to understand the specific problem we are solving and the solution approach.

This recipe helps you enable database refactoring for a project. You execute it once per project. You will need to set up 5 things:

  1. Get the Refactoring Databases book.
  2. Define the migration script structure.
  3. Create your tool to apply migrations.
  4. Add the tool to your dev and deploy processes.
  5. Start creating refactorings as migrations!

Get the Book

You want to read the book Refactoring Databases, by Ambler and Sadalage. It describes the general technique and provides recipes for common refactorings.

Migration Script Structure

Each migration script will consist of 5 parts. Each part scripts one specific activity. It is easiest to write each part in the default language of your database. For example, in SQL Server I write T-SQL while in Cachet I would write in Cache ObjectScript.

The 5 parts are:

  • begin
  • undo-begin
  • sync-data
  • data-sync-is-done
  • finish
  • undo-finish

I generally define my script format as a single file with 5 sections, divided by structured comments. This makes it easy to edit and test the file with my usual database development tools. For example, my default SQL Server script structure is:

-- migration
-- version: 26
-- section: begin

-- section: undo-begin

-- section: sync-data

-- section: data-sync-is-done

-- section: finish

-- section: undo-finish

I name the file 00026-purpose-of-migration.tsql. That way I can double-check that the version in the filename matches the version inside the file and prevent one category of errors. The rest of the filename serves the same purpose as a commit message: I can look at the folder to see a log of the changes made and why.

Migration Tool

This is usually a small script. I usually write it in either C# or Python, depending on the company. Either way, I implement these requirements in order:

  1. Console program.
  2. Parse arguments:
    • –migrations=path: specifies the folder to find scripts in. Defaults to cwd.
    • –action=action: one of migrate-top, sync-data, migrate-bottom, or all. Defaults to migrate-top.
    • –to=version|max|prior: specifies the destination version for the current action; stored as an int. max means the highest version found in the scripts folder. prior means max-1. Defaults to max.
    • –database=connection info: Defines the database to connect to, using whatever is right for the specific technologies involved. Required.
  3. Can connect to the database.
  4. Can parse a migration script file. Errors out on any parse error, including a version mismatch between filename and contents.
  5. Can ask the DB for current version range (typically stored in a version_info table that contains lowest_supported_version, highest_supported_version, and highest-data_sync_version fields). Returns “unknown” if none are present (missing table or row).
  6. Includes built-in migration definition to move between “unknown” and version 0.
    • begin creates the version_info table and inserts a single row with the value -1 for all versions.
    • undo-begin drops the version_info table.
    • All other steps do nothing.
  7. Given a current version and a target version, can determine which migrations it will need.
  8. Errors out if any required migration is not in the folder.
  9. Implements migrate-top:
    1. Iterate the migrations between the current highest_supported_version and the destination version, in order (up or down).
    2. For each, execute either the begin section (if iterating up) or the undo-begin section (if iterating down). Then set highest_supported_version to that migration’s version.
  10. Implements migrate-bottom. Same as migrate-top, except it uses lowest_supported_version, finish, and undo-finish.
  11. Implements sync-data. Creates a background process on the DB server that will:
    1. Load highest_data_sync_version. Stop and unschedule itself if that matches the target version.
    2. Load the migration with version highest_data_sync_version + 1.
    3. Run the sync-data section to update one chunk of work (that section should be re-runable and work on a chunk at a time).
    4. Run the data-sync-is-done section to see if all work is done. If so, update highest_data_sync_version.
    5. Schedule itself for another execution (or otherwise loop while giving time up to regular DB processing).
  12. Implements all. When migrating up versions it executes migrate-top, then sync-data and waits for data synchronization to complete, then migrate-bottom. It does the reverse to migrate down versions.
  13. Optionally, optimize all to make blocking-all. This does data sync in the foreground (don’t give up time slices to other DB work), as it will only be used in dev & test scenarios.

Dev Process

When a commit involves a schema change, you will generally do the following:

  1. Create a new migration script & set the version.
  2. Define begin to do the first part of the first phase of your refactoring.
  3. Run migrate --action=migrate-top to apply it.
  4. Inspect the schema and make sure you got the change you intended. Or use approval tests and a schema dump to verify the diff.
  5. Implement enough of undo-begin to reverse the change you just made.
  6. Run migrate --action=migrate-top --to=prior to get back to your initial state.
  7. Verify you are back to your initial state.
  8. Go back to step 2 to implement the next chunk. Repeat until you are done with begin and undo-begin. Make sure you remember triggers and other intermediate changes!
  9. Run migrate --action=migrate-top to get get to the state where data sync will happen.
  10. Implement one chunk of sync-data and the same chunk of data-sync-is-done.
  11. Run migrate --action=sync-data and verify the data, including that it updated everything.
  12. Go back to step 10 if you have more data to update.
  13. Run migrate --action=migrate-top --to=prior to undo everything, then migrate --action=migrate-top and migrate action=sync-data to verify that it all works when run at once.
  14. Build out finish and undo-finish in the same way as you built begin and undo-begin. You will use migrate --action=migrate-bottom to test.
  15. Update or add one data value in each changed part of the new schema.
  16. Run migrate --action=all --to=prior to get back to original state and verify that it is correct. In particular, make sure that your data change survived.
  17. Run migrate --action=all to get back to the next version.
  18. Check it all in.

With practice, you will be able to execute this series in less than 15 minutes for just about any single refactoring. Break up your changes if you are struggling to hit this timebox.

Deploy Process

Your automated deploy should do the following:

  1. migrate --action=migrate-top --to=<min version needed by app>
  2. Deploy the code.
  3. migrate --action=sync-data --to=<same version>
  4. Register a background process that waits until highest_data_sync_version equals highest_supported_version, and then runs migrate --action=migrate-bottom --to=<same version>

If you run multiple applications from the same database, then you can enable independent deployment by doing:

  1. Create an application_version_info table, containing fields application_name and required_version.
  2. Only do migrate-top if the app’s requierd version is higher than highest_supported_version.
  3. As part of deploying the code, update required_version for the application you are deploying.
  4. When data sync finishes, migrate bottom to the min required_version in the application_version_info table.

Start Creating Refactorings as Migrations!

The best way to learn refactorings is one at a time, as you need them. So disable your old way of updating the database. Grab a pair or a mob when you need to learn a new refactoring, and dive in!