Scheduled Job Anti-Patterns - Workflow Orchestration with Cron

Tags: programming devops

Scheduled jobs tend to suck a bit. They’re usually written after they’re needed and dropped into place with little testing and no plans for fixing them when things go pear-shaped.

This is the third in the series. Here’s the full list that we’ll cover:

Workflow Orchestration with Cron

I’m going to show you a small file built primarily for generating support tickets. There’s a crontab file on appserver01 that looks like this:

05 0 * * 1-5 /home/vrichards/salesdb/make_the_ach_file.pl 31 > /dev/null 2>&1
25 0 * * 1-5 /home/vrichards/salesdb/edit_the_ach_file.pl 31 > /dev/null 2>&1
27 0 * * 1-5 /home/vrichards/salesdb/copy_the_ach_file.pl 31 > /dev/null 2>&1

Sad yet? No? What if I told you there were 30 other sets of such jobs? Still not sad? What about when you log in to appserver02 and find:

30 0 * * 1-5 /home/vrichards/salesdb/send_the_ach_file.pl 31 > /dev/null 2>&1

Gah!!! On the surface these crontab files are built to generate a file and transmit it to a bank, but each of the following made me die a little bit on the inside:

  • Scripts running out of a developer’s home directory
  • Magic number parameters (“Tim!” you yell, “the command has a help function and it’s up to date. That parameter is obvious and self-documenting!” You’re a dirty liar, and I hate you.)
  • Redirecting all output to /dev/null
  • Running one process across multiple hosts without justification when all steps could run on one host

Gross. All of it. We aren’t even here to talk about any of that, but get in touch if you’d like to know why each one is a terrible idea.

We’re here to talk about trying to orchestrate a complex workflow with individually scheduled cron jobs. Note the word “trying”: you can’t really orchestrate a complex workflow that way in any realistic sense, so what we’re actually discussing is trying (and failing) to do it.

The road to Hell is paved with good intentions

Our completely fictionalized developer, Victor Richards, means well. He really wants to be helpful and write things in a way that makes problem solving easy. Much like breaking up a monster function that spans pages of editor space into multiple easily composed functions, he’s broken up the creation and transmission of ACH files into discrete steps. That’s legitimately thoughtful. If the bank’s FTP server was down, the ops team can retransmit the file without regenerating it from scratch.

He’s built all the individual components required to make a fairly robust system that will churn out bank files each business day, but gluing them together with cron is a recipe for pain and an active support queue.

Turning up the heat

Victor’s good intentions turn Hellish toward the end of the month. Website traffic and orders go through the roof. It’s not unusual for order volume to double at month end, and quarter end and year end get even crazier.

That first job in the chain, make_the_ach_file.pl, gets a full 20 minutes to run in the cron schedule. It’s a pretty beastly script that pulls all of yesterday’s orders out of the database to generate a bank file, and its queries aren’t particularly well optimized. The 20-minute window usually leaves a 5-minute cushion before the next job, edit_the_ach_file.pl, starts to run, at least until the calendar creeps up to the end of a month. The larger month-end order volume means that file generation either barely finishes in time or runs late, and the multi-car pile-up begins: cron starts edit_the_ach_file.pl at 00:25 whether or not there’s a finished file to edit, copy_the_ach_file.pl follows two minutes later, and send_the_ach_file.pl fires on appserver02 at 00:30 regardless.
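
To make the timing arithmetic concrete, here’s a small bash sketch. It takes the start times from the two crontabs above (00:05 for make_the_ach_file.pl, then 00:25, 00:27, and 00:30 for the downstream jobs) and a runtime for make_the_ach_file.pl, and lists which downstream jobs cron would start before that first job finishes. The function name late_starters is made up for illustration, and the runtimes you feed it are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: which downstream jobs start before make_the_ach_file.pl finishes?
# Start times are minutes past midnight, copied from the crontabs above.
# late_starters is a hypothetical helper; runtimes passed in are made up.
late_starters() {
  local runtime=$1                  # make_the_ach_file.pl runtime, in minutes
  local finish=$(( 5 + runtime ))   # the job starts at 00:05
  local entry name start
  for entry in edit_the_ach_file.pl:25 copy_the_ach_file.pl:27 send_the_ach_file.pl:30; do
    name=${entry%%:*}               # text before the colon: the job name
    start=${entry##*:}              # text after the colon: its start minute
    if (( start < finish )); then
      echo "$name"                  # cron starts this job while make is still running
    fi
  done
}
```

A 15-minute run (the usual case, with the 5-minute cushion) prints nothing. A hypothetical 24-minute run prints edit_the_ach_file.pl and copy_the_ach_file.pl, and a 30-minute run drags send_the_ach_file.pl on appserver02 into the mess as well. Cron has no idea any of this is happening.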


Traffic Control

We have three jobs that are tightly related, with each job completely dependent on the one that runs before it. What we need is some way to ensure that the jobs run serially, rather than achieving accidental parallelism when one of them runs longer than expected. The fix is so simple that you’d be forgiven for overlooking it. It takes just one script:

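#!/usr/bin/env bash
# ach_file_suite.sh: run the ACH steps in order for one customer.
# set -e stops at the first failing step, so a later step never runs after an earlier one fails.
set -euo pipefail
customer_id="${1:?usage: ach_file_suite.sh CUSTOMER_ID}"
/home/vrichards/salesdb/make_the_ach_file.pl "$customer_id"
/home/vrichards/salesdb/edit_the_ach_file.pl "$customer_id"
/home/vrichards/salesdb/copy_the_ach_file.pl "$customer_id"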

Then our crontab file on appserver01 looks like:

05 0 * * 1-5 /home/vrichards/salesdb/ach_file_suite.sh 31 > /dev/null 2>&1

As a bonus, we still have the individual scripts available for use during troubleshooting exercises should they be necessary.

This is still the poor man’s form of process orchestration, but we’ve eliminated an entire class of support issues by ensuring that all the individual jobs run in the correct sequence, regardless of how long any of those individual jobs takes to complete.
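
If you want to watch the sequencing guarantee in action without a database or a bank, here’s a sketch in which the three Perl scripts are replaced by hypothetical stub functions (make_the_ach_file, edit_the_ach_file, copy_the_ach_file) and the wrapper is a function, run_suite, rather than a separate file. The stub for the first step deliberately sleeps to play the part of the slow month-end run.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the three Perl scripts, so the ordering can be
# observed without a database, an ACH file, or a bank.
make_the_ach_file() { sleep 1; echo "make $1"; }   # the slow step
edit_the_ach_file() { echo "edit $1"; }
copy_the_ach_file() { echo "copy $1"; }

# Each step starts only after the previous one has returned, however long it
# took; && also stops the chain at the first step that exits non-zero.
run_suite() {
  local customer_id=$1
  make_the_ach_file "$customer_id" &&
    edit_the_ach_file "$customer_id" &&
    copy_the_ach_file "$customer_id"
}
```

Running run_suite 31 prints make 31, edit 31, and copy 31 in that order, even though the first step is the slow one. If make_the_ach_file is redefined to return 1, nothing else runs and run_suite itself returns non-zero. Explicit && chaining is used here instead of relying on set -e because bash ignores set -e for commands whose exit status is being tested, such as a function called in an if condition or on the left of &&, so a set -e wrapper called that way would keep going after a failure.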

But wait! There’s more!

This is the third in the series. Subscribe or come back tomorrow to see more.