Starting Bash and Shell Scripting by learning Errors

Benjamin
Mar 26, 2022

We all know them: recurring tasks, and commands we type into our shell over and over again. So bit by bit, we start turning those commands into shell scripts. It doesn't matter whether it's automating an application for Docker, writing installation scripts, or handling simple maintenance tasks. Piece by piece, we extend everything with scripts and keep attempting more complex things.

Everything is working fine.

And then, one day, the shell script does something totally wrong.

That’s when you realize your mistake: Bash, and shell scripting languages in general, are buggy by default. Unless you are very careful from day one (and I do mean very, very careful), any shell script above a certain complexity is almost guaranteed to be buggy. Retrofitting error checking afterward is difficult and costly, which makes it hard to justify investing hours in shell scripting.

Problem #1: Errors don’t stop execution

Consider a script in which one command fails partway through and more commands follow it. What do you think will happen when we run it?
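For concreteness, here is a hypothetical script of that kind. The file names are made up for illustration. To keep the demo self-contained, the snippet runs the script in a child bash inside a fresh temporary directory and captures its output and exit status:

```bash
#!/usr/bin/env bash
# Hypothetical script with no error-handling options, run in a child bash.
status=0
output=$(cd "$(mktemp -d)" && bash -c '
  touch newfile
  cp newfil newfile2   # deliberate typo: "newfil" does not exist, so cp fails
  echo "Success"
' 2>&1) || status=$?

# The cp error shows up in the output, yet "Success" is still printed.
printf '%s\n' "$output"
echo "exit status: $status"   # 0: only the last command (echo) determines it
```

Nothing stops the script after the failed cp, so anything later that depends on newfile2 would run against a missing file.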

Solve this by adding set -e to the top of the shell script, which makes bash exit as soon as a command fails.
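A minimal sketch of the fix, reusing a hypothetical failing script. It runs in a child bash so the snippet can capture the output and exit status; the only change to that script is the set -e line:

```bash
#!/usr/bin/env bash
# Same hypothetical script, now starting with set -e.
status=0
output=$(cd "$(mktemp -d)" && bash -c '
  set -e
  touch newfile
  cp newfil newfile2   # still a typo, but now the script stops right here
  echo "Success"
' 2>&1) || status=$?

printf '%s\n' "$output"       # only the cp error, no "Success"
echo "exit status: $status"   # non-zero: the failed cp aborted the script
```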

Okay great, we fixed it! And then…

Problem #2: Unknown variables cause no errors

The script can’t find ls because we made a typo, writing $PTH instead of $PATH, and bash didn’t complain about the unknown variable.
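A sketch of that failure, assuming a script that prepends a hypothetical venv/bin directory to PATH. It runs in a child bash inside an empty temporary directory. Because $PTH expands to an empty string, PATH ends with an empty entry, which bash treats as the current directory, and the empty directory guarantees no ls is found there:

```bash
#!/usr/bin/env bash
# Hypothetical script with set -e, run in a child bash inside an empty temp directory.
status=0
output=$(cd "$(mktemp -d)" && bash -c '
  set -e
  export PATH="venv/bin:$PTH"   # typo: $PTH is unset and silently expands to ""
  ls
' 2>&1) || status=$?

printf '%s\n' "$output"       # bash reports "ls: command not found"
echo "exit status: $status"   # 127, the status bash uses for "command not found"
```

The real bug is on the export line, but the error only surfaces one line later, far from its cause.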

Let's fix it by adding set -u to the top, which makes referencing an unset variable an error…
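A minimal sketch with set -u added to the same hypothetical script. It again runs in a child bash in an empty temporary directory. The error now points at the actual typo instead of at ls:

```bash
#!/usr/bin/env bash
# Same hypothetical script, now with set -eu.
status=0
output=$(cd "$(mktemp -d)" && bash -c '
  set -eu
  export PATH="venv/bin:$PTH"   # with -u, expanding the unset $PTH aborts the script
  ls
' 2>&1) || status=$?

printf '%s\n' "$output"       # bash reports "PTH: unbound variable"
echo "exit status: $status"   # non-zero; ls never runs
```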

Problem #3: Pipes don’t catch errors

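Okay, so now we’re safe just by adding a few option flags? Maybe, but we haven’t covered every case yet. By default, a pipeline’s exit status is that of its last command, so a command that fails earlier in the pipe goes unnoticed.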
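A sketch of the pipe problem, assuming a hypothetical command name that does not exist on the system. The script runs in a child bash so the snippet can capture its output and exit status:

```bash
#!/usr/bin/env bash
# Hypothetical script with set -eu, run in a child bash.
status=0
output=$(bash -c '
  set -eu
  nonexistentprogram | echo   # the first command in the pipeline fails
  echo "Success!"
' 2>&1) || status=$?

printf '%s\n' "$output"       # "command not found", followed by "Success!"
echo "exit status: $status"   # 0: the pipeline reports only echo's status
```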

But no problem: the solution is set -o pipefail, which makes a pipeline fail if any command in it fails.
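A minimal sketch of the same hypothetical pipeline with the full set -euo pipefail header. It runs in a child bash so the snippet can capture the result:

```bash
#!/usr/bin/env bash
# Same hypothetical script, now with pipefail enabled.
status=0
output=$(bash -c '
  set -euo pipefail
  nonexistentprogram | echo   # pipefail: the pipeline now fails because its first command failed
  echo "Success!"
' 2>&1) || status=$?

printf '%s\n' "$output"       # the "command not found" error, but no "Success!"
echo "exit status: $status"   # non-zero: set -e stopped the script at the pipeline
```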

Now, we’ve implemented most of the unofficial bash strict mode. But that’s still not enough.

Problem #4: Subshells are weird

If we use the $() syntax (command substitution), we start a subshell. And when we run our code, all of a sudden errors no longer stop the script. What’s going on?

Errors in subshells aren’t treated as errors when the subshell is part of a command’s arguments. The exit status bash checks is that of the outer command, such as export or echo, so the subshell’s error just gets thrown away.
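A sketch of both variants, assuming a hypothetical missing command. Even under set -euo pipefail, the script below runs to the end, because export and echo themselves succeed. It runs in a child bash so the snippet can capture the result:

```bash
#!/usr/bin/env bash
# Hypothetical script using full strict mode, run in a child bash.
status=0
output=$(bash -c '
  set -euo pipefail
  export VAR=$(echo hello | nonexistentprogram)   # export succeeds, hiding the failure
  echo "$(nonexistentprogram)"                    # echo succeeds, hiding this one too
  echo "Success!"
' 2>&1) || status=$?

printf '%s\n' "$output"       # two "command not found" errors, then "Success!"
echo "exit status: $status"   # 0
```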

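The one exception is assigning a variable directly. A plain assignment takes its exit status from the command substitution, so we need to split the assignment from the export: write VAR=$(...) on one line and export VAR on the next.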
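A minimal sketch of that pattern, again with a hypothetical missing command and run in a child bash so the snippet can capture the result. Because the assignment stands alone, its failure now trips set -e:

```bash
#!/usr/bin/env bash
# Hypothetical script: assign first, export in a separate step.
status=0
output=$(bash -c '
  set -euo pipefail
  VAR=$(echo hello | nonexistentprogram)   # plain assignment: status comes from the substitution
  export VAR                               # only reached if the assignment succeeded
  echo "Success!"
' 2>&1) || status=$?

printf '%s\n' "$output"       # the "command not found" error, no "Success!"
echo "exit status: $status"   # non-zero
```

The same idea applies to arguments like echo "$(...)": store the result in a variable first, then use the variable.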

And please don’t assume that these few tricks have eliminated every error.

Some bad reasons to use shell scripts
What are some reasons to use shell scripts anyway?

Bad reason #1: It’s always there!
Just about every Unix computing environment has a simple shell. So when you’re writing packaging or startup scripts, it’s tempting to use a tool that you know is always there.

When you package a Python application, you can be almost certain that the development environment, CI, and runtime environment all have Python installed. So why not use a programming language that fails loudly on errors by default?

More generally, almost every programming language with a reasonably large user base has some sort of library for shell-style scripting idioms. Rust, for example, has xshell, among others. So in most cases, you can use the programming language of your choice instead of a shell script.

Bad reason #2: Just write correct code!

In theory, you can write correct shell scripts, even quite complex ones, if you know what you are doing, concentrate, and don’t forget any of the boilerplate. You can even write unit tests.
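As a rough illustration, a unit test doesn’t need a framework. Dedicated tools such as bats-core exist, but this sketch assumes none. The slugify function and assert_eq helper are hypothetical. Note that each result is assigned to a variable before being checked, so a failure inside slugify is not masked by being passed as an argument:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical function under test: lowercase a name and join words with dashes.
slugify() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '-'
}

# Minimal hand-rolled assertion helper; returning 1 stops the script under set -e.
assert_eq() {
  if [[ "$1" != "$2" ]]; then
    echo "FAIL: expected '$2', got '$1'" >&2
    return 1
  fi
  echo "ok: $2"
}

result=$(slugify "Hello World")
assert_eq "$result" "hello-world"

result=$(slugify "Shell  Scripts")   # two spaces are squeezed into one dash
assert_eq "$result" "shell-scripts"
```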

In practice:

You probably won’t work alone; it’s unlikely that everyone on your team has the expertise.
Everyone gets tired, distracted, and makes mistakes in other ways.
In almost every complex shell script I’ve seen, the call to set -euo pipefail was missing, and adding it afterward is quite difficult (usually impossible).
I don’t know if I’ve ever seen an automated test for a shell script. I’m sure they exist, but they’re pretty rare.

Bad reason #3: Shellcheck will find all these errors!
If you write shell programs, shellcheck is a very useful way to find bugs. Unfortunately, that alone is not enough.

How does shellcheck do? It will catch some of the problems… but not all:

  1. If you run shellcheck, it will point out the issue with combining export and a command substitution on one line (warning SC2155).
  2. If you run shellcheck -o all, which enables all optional checks, it will also point out the problem with echo "$(nonexistentprogram ...)". That is, assuming you are using v0.8, which was released in November 2021. Older versions didn’t have this check, so a Linux distribution release that predates it will ship a shellcheck that doesn’t catch that problem.
  3. It doesn’t suggest set -euo pipefail.

If you’re relying on shellcheck, I strongly recommend upgrading and making sure you run it with -o all.
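A minimal sketch of running the full check set, assuming shellcheck is on your PATH (the snippet just reports if it isn’t). The sample script being checked is hypothetical and contains the export and echo patterns from the list above:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write a hypothetical script with both problematic patterns to a temp file.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
set -euo pipefail
export VAR=$(echo hello | nonexistentprogram)
echo "$(nonexistentprogram)"
EOF

if command -v shellcheck >/dev/null 2>&1; then
  shellcheck --version   # confirm you have v0.8 or newer
  # -o all enables the optional checks; shellcheck exits non-zero when it
  # finds issues, so "|| true" keeps set -e from aborting this demo.
  shellcheck -o all "$script" || true
else
  echo "shellcheck is not installed" >&2
fi

rm -f "$script"
```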

Stop creating shell scripts
Shell scripts are fine under certain circumstances:

  1. For one-off scripts that you supervise by hand, you can get away with laxer practices. In some cases, you really have no assurance that another programming language is available, and you need the shell to get things going.
  2. For sufficiently simple cases, where you just run a few commands in sequence with no subshells, conditional logic, or loops, set -euo pipefail is enough (and make sure you use shellcheck -o all).
  3. If you end up doing anything beyond that, you’re much better off using a less error-prone programming language. And since most software tends to grow over time, your wisest option is to start with something less broken.
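For the “sufficiently simple” case, a script might look like the sketch below. The maintenance steps and the /tmp/strict-mode-demo path are hypothetical placeholders. The point is the strict-mode header followed by plain commands run in sequence:

```bash
#!/usr/bin/env bash
# Strict-mode header: stop on errors (-e), on unset variables (-u),
# and on a failure anywhere in a pipeline (-o pipefail).
set -euo pipefail

# Hypothetical maintenance steps: plain commands run in sequence,
# with no command substitution, conditionals, or loops.
mkdir -p /tmp/strict-mode-demo
printf 'hello\n' > /tmp/strict-mode-demo/greeting.txt
cp /tmp/strict-mode-demo/greeting.txt /tmp/strict-mode-demo/copy.txt
cat /tmp/strict-mode-demo/copy.txt
```

Once a script like this needs branching or loops, that is the signal from the list above to reach for another language.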
