Scheduled Job Anti-Patterns - Nothing Will Break, so Why Worry?

Tags: programming devops

Scheduled jobs tend to suck a bit. They’re usually written after they’re needed and dropped into place with little testing and no plans for fixing them when things go pear-shaped.

This is the second installment in the series. The anti-pattern on deck this time:

Nothing Will Break, so Why Worry?

If you’ve spent any time at all running unix systems, you’ve seen the following pattern:

26 * * * * /usr/local/bin/fixStaleJobStatus.pl > /dev/null 2>&1

Regardless of that script's exit code or output, you'll never hear a peep from it. Everything it prints is dumped straight into the bit bucket.
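
One aside for anyone typing that redirection by hand: the order matters. Point stdout at /dev/null first and stderr follows it there; reverse the order and stderr stays attached to wherever stdout was originally pointing, which under cron is the mailer. A quick check at an interactive shell:

    $ { echo out; echo err >&2; } > /dev/null 2>&1   # both streams discarded
    $ { echo out; echo err >&2; } 2>&1 > /dev/null   # stderr still reaches the terminal
    err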

How do jobs end up configured like this? They’re usually written, tested, and watched closely for some period of time: a week, a month, or more. Once the author is convinced that the job is invincible, she gets tired of seeing the output in her inbox each day, so she adds the redirection to make things easy for herself.

Only marginally better are the admins who set up an elaborate maze of mail rules to filter any and all output from scheduled jobs into some rarely read folder. Sure, the output is there, but if it's ignored completely it may as well not be. Yes, it's nice to be able to confirm the user error reports that start to trickle in, sometimes weeks after the problem began, but it's not exactly a professional way to handle things.

I suppose one step beyond that is the admin who rolls into the office at 10:45, opens Outlook, hits CTRL+A then Delete, and then gets up to retrieve his morning cup of joe. Cron output problems AND user problems solved all in one quick key chord. Shameful, but that dude is so far gone that I'm sure he's not here among us. Right?

So how do we dodge this?

  1. Find a cron wrapper script that only notifies the team on errors. Cronic makes this super easy to pull off, and it's a plain shell script, so it should run just about anywhere that matters without compiling anything (see the crontab example after this list).
  2. Edit jobs to write their results to a centralized location for intelligent monitoring and notifications (a minimal wrapper sketch follows below). This has its merits, but you're going to need some additional infrastructure and development time. A big enough shop will justify it; email can only scale so far.
  3. Wait for the users to call. They know when the important stuff isn’t happening. If a job fails in the server room and no one complains, does it really need to be fixed?
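
For option 1, the crontab change is small. A sketch, assuming cronic is installed somewhere on cron's PATH and that the MAILTO address is your team's alias (both illustrative):

    MAILTO=oncall@example.com
    26 * * * * cronic /usr/local/bin/fixStaleJobStatus.pl

Cronic stays quiet when the job exits zero and writes nothing to stderr; otherwise it prints a report with the job's output, which cron mails as usual. The /dev/null redirect goes away entirely.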
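
For option 2, the smallest useful version is a wrapper that records every run somewhere a monitoring system can see it. Here's a minimal sketch that logs to syslog with logger; the script name, tag, and failure handling are illustrative, not a prescription:

    #!/bin/sh
    # report-run.sh -- run a job and record the result centrally (illustrative sketch)
    # usage: report-run.sh /usr/local/bin/fixStaleJobStatus.pl [args...]
    JOB=$1; shift
    START=$(date +%s)
    OUTPUT=$("$JOB" "$@" 2>&1)
    STATUS=$?
    # one line per run in syslog; forward it to wherever your central logging lives
    logger -t cronjob "job=$JOB status=$STATUS runtime=$(( $(date +%s) - START ))s"
    # on failure, echo the captured output so cron still mails the details
    [ "$STATUS" -ne 0 ] && printf '%s\n' "$OUTPUT"
    exit "$STATUS"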

The astute among you will notice that we're reusing the same list from our last installment. Two serious suggestions and one that is, unfortunately, in use in far too many places. These two anti-patterns really are identical twins that were separated at birth. Some might even combine the two into one, but let's consider them both separately lest we over-correct from one straight into the other.

But wait! There’s more!

Subscribe or come back tomorrow for the next installment.