The shell scripting language is a difficult, idiosyncratic and occasionally obscure language, but if we use it, there is no reason to drop the methodologies and wisdom software developers have learnt over half a century of software development. We therefore cannot afford to overlook automated testing for our shell scripts and shell functions, so that we can confidently use them, update them, and reason about them. Let us examine together how we can start automated testing of shell scripts with just a small effort.

A Testing Appetizer

Let us remember together the practical purpose of testing software. Tests give practitioners confidence that our software functions properly, that it modifies the state of a system in a transactional way, that errors are properly handled, that old bugs do not resurface, that refactorings do not alter the semantics of modified components, and so on. Taking a step back, tests help us compare the actual behaviour of a piece of software with its expected behaviour. When programmers need to understand the unexpected behaviour of a piece of software, they need to reconcile two incompatible assertions:

  • First, they believe that this piece of software functions correctly.
  • Second, they see evidence that this piece of software does not function correctly.

To reconcile these two assertions, there is nothing more useful, systematic and consistently rewarding than taking a scientific approach. This means revisiting the system of beliefs supporting our faulty belief, methodically comparing these beliefs with the evidence speaking for them until we understand where beliefs and evidence diverge, so that we can extend our test suite, fix our software and reconcile beliefs and evidence. In this scenario, the test suite of an application plays a role similar to the experimental protocol used by physicists and chemists. The test suite produces a reproducible set of assertions about the behaviour of a piece of software, which are the anchor points where our beliefs and the evidence are stitched together.

Unfortunately, it is usually impossible to automatically prove the correctness of a piece of software. Even the simplest question we can ask about a piece of software, “Does it eventually stop?”, cannot be decided by an algorithm, a theorem we attribute to Alan Turing. There are two ways to accommodate this tragedy: either we drastically reduce the expressivity of the software we consider, or we accept being only weakly assertive when testing our software. Both directions have been extensively explored, but only the second seems practical when writing software with weak and evolving requirements. (Of course the weakness here is relative to the amount of specification needed to describe completely a microprocessor or a compiler for a language such as C.)

The Dependencies of our Testing Suite

Now that we want to write tests for our software written in the shell scripting language comes the question of the tools we agree to use for this task. Generally speaking, advanced languages such as OCaml or Common Lisp could be called to the rescue here, so that we can build a test laboratory full of useful features. In this article, however, we focus on techniques and methodologies usable with the shell language itself, so that the test suite and the tools surrounding it are also written in the shell language. The reason for this choice is that a key aspect of the shell language is its wide availability. The programs we write using the shell language are often expected to be ready to use just after they have been copied to a UNIX system, and we show here how to write tests using nothing more than that.

The Main Testing Loop

A test suite is essentially a list of functions testing the software. These functions should return 0 (success) when the test is successful, 1 (failure) when the test is failing, or something else if the test fails spectacularly. There are of course many other desirable features to support automated testing of shell scripts, such as listing the failed tests or presenting useful technical details about failures. The minuscule run_tests function below is however enough to get started with automated testing of shell scripts:

# run_tests [ TEST-1 [ TEST-2 [ … ]]]
#  Execute provided tests, displaying outcomes and returning overall success.
#
# Possible Improvements:
#  - Display a list of failed tests.
#  - Display useful details about failed tests, such as exit code,
#    standard output, standard error.
#  - Run tests in a shuffled order.
#  - Run tests in parallel.

run_tests()
{
  local overall_success current_test

  overall_success=0

  for current_test in "$@"; do
    if ( "${current_test}" ); then
      : do nothing
    else
      overall_success=1
    fi
  done

  return "${overall_success}"
}
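
To give a flavour of how run_tests is used, here is a tiny, hypothetical example; the two test functions below are placeholders invented for this illustration and are not part of the test suite developed in this article:

always_succeeds()
{
  true
}

always_fails()
{
  false
}

run_tests always_succeeds always_fails
printf 'Overall outcome: %s\n' "$?"    # prints 1, since one test failed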

An example function to test

Our example code is a function that pretends to download an archive file from AWS S3 using the AWS CLI.

# download_backup_from_s3 BACKUP
#  Download backup from S3 and save it to /opt/myproject/backup/

download_backup_from_s3()
(
  local s3pathname backupdir

  s3pathname="$(configuration_s3_backups)"
  backupdir="$(configuration_backupdir)"

  if ! test -d "${backupdir}"; then
    croak 'Cannot find backup directory %s\n' "${backupdir}"
  fi

  if ! test -w "${backupdir}"; then
    croak 'Cannot write to backup directory %s\n' "${backupdir}"
  fi

  aws s3 cp "${s3pathname}/$1" "${backupdir}/$1"
)

Configuration is defined in a functional style, because this makes it easy for consumer code to override.

configuration_s3_backups()
{
  printf 's3://myproject-backups-live'
}

configuration_backupdir()
{
  printf '/opt/myproject/backup/'
}

Error handling is simplistic and relies on the croak function below, whose name is inspired by ancient Perl chants:

# croak PRINTF-LIKE-ARGV
#  Write PRINTF-LIKE-ARGV on stderr as printf and exit with error 1
#
# The output is decorated with a 'Failure: ' and terminated with a
# newline.

croak()
{
  {
    printf 'Failure: '
    printf "$@"
    printf '\n'
  } 1>&2 
  exit 1
}

Observable behaviour of functions

Shell functions essentially work like UNIX processes, so when we compare the expected outcome to the actual outcome we want to look at the standard output, standard error and exit status, as well as modified parts of the file-system.

Examining the exit status code

The exit status code of the last executed command is assigned to the special variable $?. A value of 0 indicates success and any other value means an error. Note the difference with languages inspired by C, where 0 is understood as a false condition while other values mean a true condition. When working with the $? special variable, it is important to remember that the very command used to test the value of $? will itself overwrite it. If this value needs to be examined several times, it is therefore important to store it in a safe place, such as a variable or a file.
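
As a minimal sketch of this idiom, we capture the status immediately after the command of interest and can then reuse it freely:

download_backup_from_s3 'latest.gz'
status=$?    # capture the exit status before any other command overwrites it

if test "${status}" -gt 0; then
  printf 'Download failed with exit status %s\n' "${status}" 1>&2
fi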

Examining the standard output and the standard error

A convention shared by many UNIX programs is to write their results on the standard output and their diagnostic messages on the standard error. Automated testing of shell functions therefore requires us to examine what a process writes on these two file descriptors.

It is easy to redirect these through grep, sed or awk to do so. A more robust approach would however provide higher-level functions that save what the process writes on these file descriptors to temporary files that can be examined later.
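
The capture function below is only a sketch of such a higher-level helper; its name and the layout of the temporary files are assumptions made for this illustration:

# capture NAME COMMAND [ARGUMENT…]
#  Run COMMAND with its arguments, saving its standard output and
#  standard error to temporary files named after NAME, and recording
#  its exit status.

capture()
{
  local name
  name="$1"
  shift

  "$@" \
    1>"${TMPDIR:-/tmp}/${name}.stdout" \
    2>"${TMPDIR:-/tmp}/${name}.stderr"
  printf '%s' "$?" > "${TMPDIR:-/tmp}/${name}.status"
}

A test can then run capture download_backup download_backup_from_s3 'latest.gz' and examine the three files at leisure.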

Examining files created or removed

Creating and deleting files is also a common activity for UNIX processes and shell functions, and the commands test and stat can be used to examine the state of the filesystem. When it comes to the contents of a file, many external programs can be used.
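
As an illustrative sketch, a test could assert the presence and basic properties of a downloaded file with nothing more than these tools; the pathname below assumes the configuration functions introduced earlier:

backupdir="$(configuration_backupdir)"

test -e "${backupdir}/latest.gz"    # the downloaded file exists
test -s "${backupdir}/latest.gz"    # the downloaded file is not empty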

Functions or processes reading and writing files at hardwired pathnames have their own challenges related to permissions and concurrency. Functions and processes reading and writing files at configurable pathnames are much easier to test.

Writing Tests

Creating a laboratory environment to run tests in safe and reproducible conditions is an important part of testing. Fortunately it is very easy to do so for shell scripts.

The configuration of the test laboratory creates a temporary directory for backupdir. This directory is automatically reclaimed when the test function terminates.

configure_laboratory()
{
  __configuration_backupdir=$(mktemp -d)
  trap "rm -rf ${__configuration_backupdir:?}" EXIT TERM KILL

  configuration_backupdir()
  {
    printf '%s' "${__configuration_backupdir}"
  }
}

A first laboratory condition we want to simulate is the lack of AWS credentials. To do so, we define an aws function which shadows the aws CLI and mimics its behaviour when credentials are not available:

assume_aws_credentials_are_not_configured()
{
  aws()
  (
    1>&2 cat <<EOF
Unable to locate credentials. You can configure credentials by running "aws configure".
EOF
    exit 255
  )
}

A second laboratory condition we want to simulate is the ability to download the backup file. We again shadow the aws CLI, this time with a function that creates an empty file, a good enough approximation of our backup.

assume_backup_can_be_downloaded()
{
  aws()
  (
    case "$1__$2__$4" in
      s3__cp__*.gz)
        touch "$4"
        ;;
      *)
        exit 1
        ;;
    esac
  )
}

With this laboratory material available, we can write tests that ensure that the function fails when credentials are not configured, or that the function succeeds when the file can be downloaded.

ensure_that_download_backup_fails_when_credentials_are_not_configured()
(
  configure_laboratory
  assume_aws_credentials_are_not_configured
  
  download_backup_from_s3 "latest.gz"

  test $? -gt 0
)

ensure_that_download_backup_succeeds_when_file_can_be_downloaded()
(
  configure_laboratory
  assume_backup_can_be_downloaded
  
  download_backup_from_s3 "latest.gz"

  test $? -eq 0
)

testsuite_download_backup()
{
  run_tests \
    ensure_that_download_backup_fails_when_credentials_are_not_configured \
    ensure_that_download_backup_succeeds_when_file_can_be_downloaded
}

Now we can run our test suite

testsuite_download_backup

and examine the exit status code to validate our function. It takes a little bit of engineering to separate code from tests and to make the laboratory functions reusable, but testing shell functions is really simple at its core and can be started with very small effort.
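
For instance, a minimal driver, sketched under the assumption that the functions above have been sourced, turns the exit status of the suite into a final verdict:

if testsuite_download_backup; then
  printf 'All tests passed\n'
else
  printf 'Some tests failed\n' 1>&2
  exit 1
fi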

Exercises

  1. What do parentheses instead of braces mean around the body of a function? For which resources does this help to control the scope?

  2. Write record and examine functions that make it easy to examine the output of a shell function as well as its exit code. These functions could be used as
    record 'Download Backup' download_backup_from_s3 "latest.gz"
    examine 'Download Backup' stdout | grep 'Ok'
    examine 'Download Backup' stderr | wc
    
  3. Extend the examine function from exercise 2 to support common queries, such as an empty file, a file containing a string, or a positive exit code.

  4. A drawback of the configure_laboratory function is that it installs a trap and requires collaboration from the tested code not to overwrite this trap. Find a better alternative. (Think about the WITH-* idiom from Common Lisp.)

  5. Write a test ensuring that download_backup_from_s3 retries once to download on network failure.

  6. Describe how to use chroot or docker to test software creating files at unconfigurable locations without suffering from limitations related to permissions and concurrency.

  7. Rewrite the test assume_backup_can_be_downloaded so that it does not interact with the file system. (Shadow the test command and the aws command.)

  8. Improve the run_tests function so that it displays a list of failed tests.

  9. Improve the run_tests function so that it displays details about failed tests, such as standard output, standard error, the trace of commands (as with set -x).