Discover and Automate Manual Processes

Automating Yourself with Checklists

Problem and Solution Overview

This recipe just states how to execute a solution. Read our Legacy Newsletter blog post: DevOps #1 – Automating Manual Processes to understand the specific problem we are solving and the solution approach.

This recipe helps you automate one manual process of any size. You can also execute it recursively on sub-processes.

  1. Observe and Record Checklist
    1. The next time someone performs the process
    2. The second time
    3. When things go horribly wrong
  2. Automate Running the Checklist
    1. Pick your scripting language
    2. The third process execution
    3. The fourth process execution
  3. Identify and Automate Working Sub-steps
    1. To automate one action
    2. To automate one condition
    3. To automate one verification
  4. Automate Everything
    1. If remaining steps are too hard
    2. If remaining steps aren’t annoying enough to be worth automation
    3. Clean up

Observe and Record Checklist

Our goal is to spend as little as possible to make our process repeatable. Once it is repeatable we can then iterate our discovery until it is correct. Then we can iteratively automate parts.

So first we just need to get something written down, repeatable, and in source control. We don’t attempt anything more than this. This happens in two steps.

The next time someone performs the process

  1. Pick an Actor. Choose whoever is most familiar with the process.
  2. Pick an observer. Choose someone unfamiliar with the process.
  3. Observer, ask the Actor what steps he expects to perform.
  4. Write that as a numbered list in a text file.
  5. Create a new branch to work on. Choose some convention such as processes/whatever-we-are-automating/main.
  6. Pick the right folder for the file and check it in.
  7. Actor, start doing the task. Observer watch with file open.
  8. Observer update checklist to include steps or details the Actor didn’t think of at first. Pause Actor if you need time to type.
  9. Observer remind Actor of any steps that he put in the checklist and forgets to do. Do one of the following:
    1. If Actor just forgot the step, then take the action.
    2. If Actor realizes the step is not needed, remove it from the checklist.
    3. If Actor realizes the step is needed sometimes, but not this time, add a condition to the checklist stating when to skip the step.
  10. Check in the updated checklist.

The second time

  1. Pick an Actor. Choose someone unfamiliar with the process. This could be the Observer from the last iteration or someone new.
  2. Pick a Guide. Choose someone familiar with the process, possibly the prior Actor.
  3. Review the checklist together. Discuss and make any necessary changes for the new situation.
    1. Add conditions if there are steps you plan to skip this time.
    2. Add new steps that you didn’t need last time, with a condition stating why you didn’t need it last time.
  4. Check in the updated checklist.
  5. Perform the process. But this time, the Guide walks through the checklist and tells the Actor what to do. The Actor does exactly what he is told.
  6. As you perform each step, fix errors that you find in the checklist.
    1. Sometimes the Actor will misinterpret the action to take or will not know what to do. The step is incomplete; add the missing information.
    2. Sometimes the Guide will see something that you need to do which isn’t in the checklist. Add it. Remember, your goal is automation, so you will need to get every detail into the checklist. Leave nothing up to interpretation or common sense.
    3. Make sure to add any new conditions you encounter.
  7. Check in the checklist when you have finished.
  8. Now have the Actor ask the Guide how to verify that the process had the desired result.
  9. Record the Guide’s answer into a new section of the checklist. You should now have both a process and a test of the final state.
  10. Check in the checklist.

When things go horribly wrong

We are still in discovery, so it is likely that things will go horribly wrong. Often we will discover that we took a wrong turn several commits of the checklist ago. That’s OK. Just keep this branch alive as a possible direction, go back to where you made the wrong turn, and try another approach.

Revert back to whatever step you were in on the process at that time. For example, if your checklist was the one that you’d saved halfway through your first use of the process, then resume from there as if this were the first time through the process.

Using git, I perform the following branch operations to revert and explore a new direction.

git branch "processes/whatever-we-are-automating/probably-wrong/direction-we-attempted"
git reset --hard <sha for the last good commit>
git push -f

Sometimes I will later discover that my first approach was the right one. I can then get back to it by resetting processes/whatever-we-are-automating/main to point to processes/whatever-we-are-automating/probably-wrong/direction-we-attempted.

Note that this mechanism will rewrite history in a way that would cause problems if others had pulled and then performed their own commits on this process’s main. However, your team is likely only running this process once at a time. If not, then only one set of people should make changes at a single time. Therefore, everyone can always pull before they perform a process and rewriting history will not cause problems.

Automate Running the Checklist

In this step we automate the Guide role. The Actor is still a human, and we bring back an Observer to help fix errors in the checklist and its validation. This also sets us up to start incrementally automating things.

Pick your scripting language

You will want a real scripting language; shell will not be enough. You will also want a language that can easily perform file system operations and launch other programs. I recommend the following:

  1. If you are a JS-only company, then JavaScript (node).
  2. Otherwise Python.
  3. You could also consider Ruby or Perl for special circumstances, but usually Python or JS will have fewer difficulties over the long haul, simply because they invite less magic and so make refactoring easier.
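
As a rough illustration of the two capabilities named above (file system operations and launching other programs), here is a minimal Python sketch. The file name and the child command are made up for the example; the child process is just the current Python interpreter standing in for whatever tool your process really calls.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def write_and_run(workdir: Path) -> str:
    """Write a small file, then launch another program that reads it back."""
    note = workdir / "note.txt"  # hypothetical file name
    note.write_text("checklist step done\n")
    # Launch a child process. check=True raises if the child exits non-zero.
    result = subprocess.run(
        [sys.executable, "-c",
         "import sys; print(open(sys.argv[1]).read().strip())",
         str(note)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

with tempfile.TemporaryDirectory() as tmp:
    output = write_and_run(Path(tmp))
print(output)
```

If both halves of this feel awkward in the language you are considering, that is a sign to pick a different one.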

The third process execution

  1. Pick an Actor and an Observer. If neither has much experience with the process, make sure that someone experienced is available to be on call.
  2. Convert your checklist into a trivial script that Guides the checklist.
    1. Fill in the overall script skeleton like in the example below.
    2. Wrap each action step in a tell() function.
    3. Wrap each condition step in an ask() function, wrapping the stuff to possibly skip as a lambda (or a named function if there’s a lot there).
    4. Define tell() and ask() as defined below (probably in a library, as you will use the same functions for every process you automate).
  3. Check it in.
  4. Start the script. Actor do as told unless the script asks for something stupid. If so, pause the script, fix it, re-start and skip ahead to where you were.
  5. Check in the final script.
  6. Consider how you would verify that you did everything correctly. Extend the verification section with any new verifications you thought of. Fix any errors.
  7. Check it in.
  8. Convert the verification into a second guided method in the same script, by wrapping each step in tell() and verify() functions, also as defined below. Finish the verification with a call to print_test_results().
  9. Check it in.
  10. Execute the verification function.
    1. If the verification misses something, add a new verification step and restart verification (skipping steps as needed).
    2. If the verification finds a problem, determine whether the fault is in the verification or the process.
    3. Fix the verification or the process, then restart that portion and skip steps as needed.
  11. Check it in.

The library functions you will need

import datetime

class Clock(object):
    def __init__(self):
        self.reset()
    def reset(self):
        self.accumulator = datetime.timedelta(0)
        self.started = None
    def start(self):
        if not self.started:
            self.started = datetime.datetime.utcnow()
    def stop(self):
        if self.started:
            self.accumulator += (
                datetime.datetime.utcnow() - self.started
            )
            self.started = None
    @property
    def elapsed(self):
        if self.started:
            return self.accumulator + (
                datetime.datetime.utcnow() - self.started
            )
        return self.accumulator
    def __repr__(self):
        return "<Clock {} ({})>".format(
            self.elapsed,
            'started' if self.started else 'stopped'
        )

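def _query(prefix, question, first_answer, second_answer):
   # Print any context lines, then ask a two-choice question.
   # Returns True when the reply starts with first_answer (case-insensitive).
   for item in prefix:
      print(item)
   reply = input(f"{question} ({first_answer}/{second_answer}) ")
   return reply.lower().startswith(first_answer.lower())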

class Process:
   def __init__(self):
      self._automated = Clock()
      self._automated.start()
      self._manual = Clock()
      self._test_result = []
      self._automated_steps = 0
      self._manual_steps = 0

   class ManualSection:
      # Context manager: time spent inside counts as manual, not automated.
      def __init__(self, go):
         self.go = go
      def __enter__(self):
         self.go._automated.stop()
         self.go._manual.start()
      def __exit__(self, exc_type, exc_value, traceback):
         self.go._manual.stop()
         self.go._automated.start()

   @classmethod
   def run(cls, perform, verify):
      # Create the tracker, ask which guided method to run, then report stats.
      go = cls()
      with cls.ManualSection(go):
         do_perform = _query([], "Do you wish to perform the process or verify it?", "P", "V")
      if do_perform:
         perform(go)
      else:
         verify(go)
      go._print_stats()

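   def do(self, operation):
      # Run an automated action and count it.
      self._automated_steps += 1
      operation()

   def tell(self, message):
      # Ask the human to perform an action, then wait until it is done.
      self._manual_steps += 1
      with self.ManualSection(self):
         print(message)
         input("press enter when done")

   def ask(self, condition, operation):
      # Ask the human whether the condition holds; run operation only on yes.
      self._manual_steps += 1
      with self.ManualSection(self):
         should_do_it = _query([condition], "Should I perform this step?", "Y", "N")
      if should_do_it:
         operation()

   def ask_yes_no(self, condition):
      # Ask the human a yes/no question and return the answer as a bool.
      self._manual_steps += 1
      with self.ManualSection(self):
         return _query([condition], "Should I perform this step?", "Y", "N")

   # Named if_ because `if` is a reserved word in Python.
   def if_(self, condition, operation):
      self._automated_steps += 1
      # Accept either an answer already in hand or a callable that computes one.
      result = condition() if callable(condition) else condition
      if result:
         operation()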

   def verify(self, condition):
      initial = self._manual_steps
      if not condition():
         self._test_result.append(f"Failed expectation: {condition}")
      # Count the check as automated only if it asked the human nothing.
      if initial == self._manual_steps:
         self._automated_steps += 1

   def that(self, condition):
      # Build a manual check: ask the human to confirm the condition.
      def impl():
         self._manual_steps += 1
         with self.ManualSection(self):
            return _query(
               [f"Please verify whether {condition}."],
               "Is this right?",
               "Y", "N")
      return impl

   def print_test_results(self):
      if self._test_result:
         print("Verification failed. Please fix the process and try again.")
         for failure in self._test_result:
            print(failure)

   def _print_stats(self):
      self._automated.stop()
      total_time = self._automated.elapsed + self._manual.elapsed
      print(f"Process complete in {total_time}.")
      print(f"   Automated: {self._automated_steps} steps in {self._automated.elapsed}.")
      print(f"   Manual: {self._manual_steps} steps in {self._manual.elapsed}.")

Example script you might write

from utils import Process

def perform(go):
   go.tell("Please do such and such.")
   go.tell("Please do something else for me.")
   go.ask("Are the lights on?", lambda: go.tell("Please turn off the circuit breaker."))
   go.tell("Please do one more thing.")

def verify(go):
   go.tell("Please do something.")
   go.verify(go.that("the thing is colored blue"))
   go.print_test_results()

if(__name__ == "__main__"):
   Process.run(perform, verify)

The fourth process execution

Do this exactly like the third time. At this point there should be some process steps that are complete and work every time. Keep iterating until you have at least one known-correct step.

From here on you might merge from processes/whatever-we-are-automating/main/ to main/. Do that any time you have a known-good (though incompletely automated) process.

Identify and Automate Working Sub-steps

Now we will automate the known-good sub-steps. These might be actions, conditions, or verifications.

Automate 1-3 sub-steps each time you perform the process. Set an appropriate timebox and automate what you can. Only automate known-correct sub-steps.

To automate one action

  1. Extract the statement that calls tell() to a new method. Name it based on the command you make to the Actor.
  2. Check in.
  3. Call the new method from within a do() call so you can track step executions.
  4. Replace the tell() with print().
  5. Follow that with code to perform the action.
    1. If you can’t see how to automate the whole step, automate what you can and add more narrow tell() calls to do the parts you can’t yet automate.
  6. Finish the method with ask("did I do it right?", sys.exit).
  7. Check in.
  8. Run it. Debug as needed. Check in.
    1. Make repeated runs easier by temporarily modifying the call to run(). Replace the perform function with a call to your step only.
  9. Remove the final ask("did I do it right?", sys.exit).
  10. Check in.
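
Here is a rough sketch of where one action might end up after these steps. Everything specific is made up for illustration: the action (copying a config file), the names copy_config_to_release and StubGo, and the file paths. StubGo is only a stand-in for the Process object so the sketch runs on its own.

```python
import shutil
import tempfile
from pathlib import Path

class StubGo:
    """Hypothetical stand-in for Process; only do() is needed here."""
    def __init__(self):
        self.automated_steps = 0
    def do(self, operation):
        self.automated_steps += 1
        operation()

# Before: go.tell("Please copy the config file into the release folder.")
def copy_config_to_release(source: Path, release_dir: Path) -> None:
    # The tell() became a print() so the run log still reads like the checklist.
    print("Copying the config file into the release folder.")
    release_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(source, release_dir / source.name)
    # While debugging you would end with the temporary ask(..., sys.exit) call.

def perform(go, source: Path, release_dir: Path) -> None:
    # Calling through do() keeps the automated step count accurate.
    go.do(lambda: copy_config_to_release(source, release_dir))

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "app.cfg"
    src.write_text("mode=release\n")
    go = StubGo()
    perform(go, src, Path(tmp) / "release")
    copied = (Path(tmp) / "release" / "app.cfg").read_text()
```

The method name comes straight from the command you used to give the Actor, which keeps the script readable as a checklist.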

To automate one condition

  1. Replace the call to go.ask("question string", operation) with go.if_(go.ask_yes_no("question string"), operation). The method is spelled if_ because if is a reserved word in Python.
  2. Extract the expression that calls ask_yes_no() to its own method.
  3. Check in.
  4. Convert the call to ask_yes_no() to a print().
  5. Follow that with code to check the condition.
    1. If you can’t see how to automate the whole step, automate what you can and add more narrow ask() calls to do the parts you can’t yet automate.
  6. Check in.
  7. Run it. Debug as needed. Check in.
    1. Make repeated runs easier by temporarily modifying the call to run(). Replace the perform function with a call to your step only.
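
As a sketch of the end state for one condition, assume the question was whether a stale lock file exists. The lock file, the name stale_lock_exists, and StubGo are all invented for this example. The conditional method is called if_ here because if is a reserved word in Python, and the stub mirrors the library by calling the condition it is given.

```python
import tempfile
from pathlib import Path

class StubGo:
    """Hypothetical stand-in for Process with just what this sketch needs."""
    def __init__(self):
        self.automated_steps = 0
        self.log = []
    def if_(self, condition, operation):
        self.automated_steps += 1
        if condition():
            operation()
    def tell(self, message):
        self.log.append(message)  # record instead of prompting a human

# Before: go.ask("Is there a stale lock file?", ...)
def stale_lock_exists(lock_file: Path):
    def check():
        # The ask_yes_no() became a print() followed by real code.
        print("Checking whether a stale lock file exists.")
        return lock_file.exists()
    return check

def perform(go, lock_file: Path) -> None:
    go.if_(stale_lock_exists(lock_file),
           lambda: go.tell("Please delete the stale lock file."))

with tempfile.TemporaryDirectory() as tmp:
    lock = Path(tmp) / "deploy.lock"
    go = StubGo()
    perform(go, lock)      # no lock file: the step is skipped
    lock.write_text("")
    perform(go, lock)      # lock file present: the step runs
    told = list(go.log)
```

Note that the guarded action is still a tell(). Conditions and actions automate independently, so you can do either one first.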

To automate one verification

  1. Extract the expression that calls that() into a new method.
  2. Have your new method define a no-op method def impl(): pass.
  3. Check in.
  4. Insert a new print() statement as the first line of impl, which prints “Please verify whether {condition}.” for whatever condition was in the that statement.
  5. Follow that with code to perform the verification.
    1. If you can’t see how to automate the whole step, automate what you can and add more narrow verify(that()) calls to do the parts you can’t yet automate.
  6. Check in.
  7. Run it. Debug as needed. Check in.
    1. Make repeated runs easier by temporarily modifying the call to run(). Replace the verify function with a call to your verification step only.
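
A sketch of one automated verification, under the same kind of assumptions: the condition (the release folder contains app.cfg), the name release_contains_config, and StubGo are hypothetical, and the stub's verify() only mimics the failure bookkeeping of the library version.

```python
import tempfile
from pathlib import Path

class StubGo:
    """Hypothetical stand-in for Process; records failed expectations."""
    def __init__(self):
        self.failures = []
    def verify(self, condition):
        if not condition():
            self.failures.append(f"Failed expectation: {condition}")

# Before: go.verify(go.that("the release folder contains app.cfg"))
def release_contains_config(release_dir: Path):
    def impl():
        # First line prints what used to be asked of the human.
        print("Please verify whether the release folder contains app.cfg.")
        return (release_dir / "app.cfg").is_file()
    return impl

def verify(go, release_dir: Path) -> None:
    go.verify(release_contains_config(release_dir))

with tempfile.TemporaryDirectory() as tmp:
    release = Path(tmp)
    go = StubGo()
    verify(go, release)                       # file missing: one failure
    failures_before = len(go.failures)
    (release / "app.cfg").write_text("mode=release\n")
    verify(go, release)                       # file present: no new failure
    failures_after = len(go.failures)
```

Because impl() no longer asks the human anything, the library's verify() will count it as an automated step.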

Automate Everything

You will automate more steps with every iteration of the process. You will naturally start with steps that are easy to automate or particularly annoying to do manually. Eventually you will run out of these steps. We will handle the rest in one of two ways.

If remaining steps are too hard

  1. Demonstrate the value so far to your product owner and manager.
  2. Define what infrastructural work would make part of the remaining process easy enough to automate. Present several options if possible.
  3. Ask them to either prioritize and fund you doing that work as a story, or find another group in the organization to do it (often an ops or tools team).
  4. Create a story to track the infrastructural work so you know when it completes.
  5. As a new last step, track and report costs.
    1. Add a manual step to record the process execution and the two times spent (automated and manual) into a spreadsheet.
    2. Add a manual step to report the manual time spent to your manager and product owner – or to anyone they designate.
  6. Periodically review the time spent on known-but-not-automated manual processes. Use your spreadsheet to find common patterns with other teams.
    1. Assess whether the time spent justifies organization-wide infrastructure investment. Some will and some won’t.
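
The recipe keeps the recording step manual. If you would rather have the script append the numbers itself, one possible sketch is below. It assumes a CSV file that your spreadsheet tool can open; the file name, column names, and the record_run helper are all made up for illustration.

```python
import csv
import datetime
import tempfile
from pathlib import Path

HEADER = ["date", "process", "automated_seconds", "manual_seconds"]

def record_run(log_path, process_name, automated, manual):
    """Append one row per process execution; write the header on first use."""
    is_new = not log_path.exists()
    with log_path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(HEADER)
        writer.writerow([
            datetime.date.today().isoformat(),
            process_name,
            automated.total_seconds(),
            manual.total_seconds(),
        ])

with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "process-times.csv"
    record_run(log, "deploy",
               datetime.timedelta(seconds=90),
               datetime.timedelta(minutes=25))
    with log.open(newline="") as handle:
        rows = list(csv.reader(handle))
```

The two timedelta arguments correspond to the automated and manual clocks that the library already tracks.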

If remaining steps aren’t annoying enough to be worth automation

First make sure they aren’t worth it.

  1. Measure how long it takes to perform the remaining manual process and estimate how often you perform it.
  2. Estimate how long it would take to automate all remaining steps.
  3. Compute how long it would take to break even on the automation cost.
  4. If this time is short enough, present the figures to your product owner and schedule the effort accordingly.
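
The break-even arithmetic in steps 1 to 3 can be sketched as follows. The function names and the sample figures are invented, and the sketch assumes that an automated run costs roughly no manual time, which may not hold for your process.

```python
import math

def break_even_runs(automation_hours: float, manual_minutes_per_run: float) -> int:
    """Number of runs after which the automation effort has paid for itself."""
    return math.ceil(automation_hours * 60 / manual_minutes_per_run)

def break_even_weeks(automation_hours: float,
                     manual_minutes_per_run: float,
                     runs_per_week: float) -> float:
    """Calendar time to break even at the estimated run frequency."""
    minutes_saved_per_week = manual_minutes_per_run * runs_per_week
    return automation_hours * 60 / minutes_saved_per_week

# Made-up figures: 16 hours to automate, 30 manual minutes per run, 2 runs a week.
runs = break_even_runs(16, 30)
weeks = break_even_weeks(16, 30, 2)
print(f"Break even after {runs} runs, about {weeks:.0f} weeks.")
```

With these figures the automation pays for itself after 32 runs, or 16 weeks. Whether that is "short enough" is a call for your product owner.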

If they still aren’t worth automating, then simply clean up everything as if you had completed the automation. As you automate other processes, you may find yourself wishing to execute this process as part of a larger orchestration. That can change the value of fully automating the process. However, until that happens, keep your partially-automated and fully consistent process. That may be the optimal result for your circumstance.

Clean up

Clean up anything which really didn’t work. Make your history easy to navigate.

  1. Delete any branches that turned out to be wrong directions and didn’t include useful learning / insights.
  2. Gather the branches from which you learned what not to do and move them somewhere appropriate. For example, you might have a branch prefix processes/bad-ideas-that-seemed-good/.
  3. Merge the winning branch into main if there is anything left unmerged.
  4. Schedule a final demo of the completely automated process.