Increasingly we run our Perl programs inside docker containers, because there are advantages in terms of isolation and deployment. Containers provide applications with an idealized view of the OS - they see their own filesystem, their own networking stack, and their own set of processes.

Running the application as PID 1 inside the container comes with well-documented challenges around child process zombie reaping, but we knew about that and understood it. However, it turns out there is a separate issue with signal handling, which we had not fully appreciated.

The problem: shutting down gracefully

Recently we moved one of our Perl daemon processes to run inside docker - this is a system which has a few dozen worker instances running, consuming jobs from a queue.

The problem was, it was taking a long time to deploy all of these instances - each one would take 10 seconds to shut down. On closer inspection, ‘docker stop’ was waiting for them to terminate gracefully, then after 10 seconds giving up and sending a kill signal.

We reproduced this with a one-liner:

$ docker run debian perl -E 'sleep 300'
^C
[refuses to die]

(Of course, Ctrl+C sends SIGINT rather than SIGTERM, but sending SIGTERM manually had the same effect. ‘docker stop’ could shut it down, but only after the timeout and sending a SIGKILL.)

This confused us.

What’s going on: PID 1 signal handling

Adding a signal handler shows that the signal is actually received by the script:

$ docker run debian perl -E '$|=1; $SIG{TERM}=sub{say "Received SIGTERM"}; sleep 300 while 1'
[Send SIGTERM from other terminal]
Received SIGTERM

So although under normal circumstances an unhandled SIGTERM would mean the program shuts down, when running as PID 1 this is not true.

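It turns out this behaviour comes from the kernel, which treats PID 1 of a PID namespace as that namespace’s init process; this was the other half of the justification for Yelp’s dumb-init system, not just the zombies.

In other words, while normally the kernel would apply the default action if our process received a TERM or INT signal that it wasn’t handling, for PID 1 it does not: the signal is simply discarded unless the process has installed a handler for it. The exceptions are SIGKILL and SIGSTOP sent from outside the container, which is why ‘docker stop’ still succeeds once its timeout expires.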
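One way to act on this is to install handlers in the worker itself. The following is a minimal sketch rather than our production code: run_worker and its $next_job callback are hypothetical names standing in for the real queue consumer, and the demonstration at the end sends SIGTERM to its own process in place of ‘docker stop’.

```perl
use strict;
use warnings;

# Run jobs until TERM or INT arrives, then return the number of jobs completed.
# $next_job is a hypothetical callback standing in for "consume one job".
sub run_worker {
    my ($next_job) = @_;

    # Flag set by the handlers and checked between jobs.
    my $shutting_down = 0;

    # Installing handlers is what lets a PID 1 process see these signals at all.
    local $SIG{TERM} = sub { $shutting_down = 1 };
    local $SIG{INT}  = sub { $shutting_down = 1 };

    my $completed = 0;
    until ($shutting_down) {
        $next_job->();
        $completed++;
    }
    return $completed;
}

# Demonstration only: the third job sends SIGTERM to this process, standing in
# for 'docker stop'. The job in progress finishes before the loop exits.
my $jobs_seen = 0;
my $completed = run_worker(sub {
    $jobs_seen++;
    kill 'TERM', $$ if $jobs_seen == 3;
});
print "Completed $completed jobs, shutting down\n";
```

The handlers only set a flag, so a job that is already running is never interrupted halfway through; the loop notices the request the next time it checks.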

Why this confused us: Golang is special

Why did it take us so long to notice this behaviour? We’ve been using docker for ages.

However, mostly we’ve been using it with statically-compiled Golang binaries. Golang handles TERM and INT signals itself, without relying on the kernel’s default behaviour. So those applications always shut down promptly when asked.

Other possible solutions

Other than adding signal handlers to all of our applications, we could instead use an init daemon such as dumb-init, or, since Docker 1.13, pass the ‘--init’ flag to ‘docker run’ to have it do something similar.
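To make concrete what such an init process does for the application, here is a rough Perl sketch of the idea: start the real command as a child, forward TERM and INT to it, and reap whatever exits. It is an illustration under simplifying assumptions, not how dumb-init or Docker’s init are implemented, and it leaves out much of what a real init handles. The name run_as_init is hypothetical.

```perl
use strict;
use warnings;
use POSIX ();

# Run @cmd as a child process, forward TERM and INT to it, reap any other
# children that exit, and return an exit code derived from the child's status.
sub run_as_init {
    my @cmd = @_;
    my $child;

    # Handlers are installed before forking so no signal is lost in between.
    # Perl passes the signal name ("TERM" or "INT") as the first argument.
    my $forward = sub { kill $_[0], $child if $child };
    local $SIG{TERM} = $forward;
    local $SIG{INT}  = $forward;

    $child = fork();
    die "fork failed: $!" unless defined $child;
    if ($child == 0) {
        # exec resets caught signals to their default action, and the child
        # is not PID 1, so the kernel's default behaviour applies to it.
        exec { $cmd[0] } @cmd or do {
            warn "exec failed: $!\n";
            POSIX::_exit(127);
        };
    }

    # waitpid(-1, 0) reaps any child, including orphans re-parented to us.
    while (1) {
        my $reaped = waitpid(-1, 0);
        my $status = $?;
        if ($reaped == $child) {
            # Report death-by-signal as 128 + signal number, as shells do.
            return ($status & 127) ? 128 + ($status & 127) : $status >> 8;
        }
        next if $reaped == -1 && $!{EINTR};
        die "lost track of child $child\n" if $reaped == -1;
    }
}

# When given a command on the command line, run it and exit with its code.
exit run_as_init(@ARGV) if @ARGV;
```

Because the wrapper installs handlers, the kernel delivers TERM and INT to it even as PID 1, and because the application is now an ordinary child process, an unhandled SIGTERM terminates it in the usual way.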

Another alternative would be to use a system other than docker - rkt runs container processes as PID 2 rather than PID 1, which is looking increasingly sensible given the special handling by the kernel.

For more on this problem and PID namespaces generally, try this Hacker Noon article.