Recently we discovered a crash in one of the modules we use. It was a simple little bug to fix and we thought it would be interesting to demonstrate how you can find more information about a crash in an XS module and potentially fix it. Just because it’s not all perl doesn’t mean it’s completely opaque.

Here at CV-Library we have really technical testers. Our testers take our code to pieces and try all sorts of evil inputs to see how they can break our code. I’m not sure that being able to torture code is an actual job requirement, but it certainly helps. While trying to do all sorts of strange things to a feature we’d written we happened to discover a bug in our cookie processing. The HTTP::XSCookies module was segfaulting with the cookies we were asking it to generate.

We first tracked down where in perl the code was crashing. This turned out to be when the bake_cookie function was called in the HTTP::XSCookies module. We grabbed the value that was being passed to it and created a test case to demonstrate the problem. Outside the test environment it still exhibited the same behaviour with the crash.

is bake_cookie( 'test', "!\"\x{a3}\$%^*(*^%\$\x{a3}\":1" ),
	'test=%21%22%a3%24%25%5e%2a%28%2a%5e%25%24%a3%22%3a1';

Initially we didn’t bother filling in the expected output on the test with anything valid since it crashed before it got to the comparison. With this input we initially observed a segfault (it should be noted that if you try this yourself you might not segfault*). Initially we used Devel::cst to get a simple stack trace which suggested the problem was within the URIencoding. Much like other Devel namespace modules you can just run perl with the -d flag, so perl -d:cst t/test-that-crashes.t.

The one thing missing from the output of that module was line numbers though. To get them we used gdb. Initially we just ran the test and let it crash. On Ubuntu this produces a file in /var/crash which contains lots of crash information. However, this needs to be converted to a traditional coredump which gdb can understand. We used apport-unpack to extract that file. This extracts a lot of files with information about the problem which can be sent to Ubuntu when you report a problem. One of those files is one named CoreDump containing what we want.

$ apport-unpack /var/crash/perl.crash crashinfo
$ cd crashinfo

With that we could run gdb to interpret the core dump. We can then ask it for a ‘back trace’ which gives us the full call stack in C at the time of the crash.

$ gdb perl CoreDump
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
...
Reading symbols from perl...(no debugging symbols found)...done.
[New LWP 2897]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `perl -d:cst test-cookie.pl'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f72280ef269 in raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/pt-raise.c:35
35	../sysdeps/unix/sysv/linux/pt-raise.c: No such file or directory.
(gdb) bt
#0  0x00007f72280ef269 in raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/pt-raise.c:35
#1  0x00007f7227601d28 in handler (signo=11, info=<optimised out>, context=<optimised out>) at lib/Devel/cst.xs:169
#2  <signal handler called>
#3  memcpy (__len=3, __src=0x42100e42180e4120, __dest=0x1121ccd) at /usr/include/x86_64-linux-gnu/bits/string3.h:53
#4  url_encode (src=src@entry=0x7fffd17453e0, length=81, tgt=tgt@entry=0x7fffd17454d0) at uri.c:72
#5  0x00007f72271e63c1 in cookie_put_value (cookie=cookie@entry=0x7fffd17454d0, name=name@entry=0x11075d0 "azUser", nlen=<optimised out>, value=<optimised out>, vlen=<optimised out>, boolean=boolean@entry=0, encode=1) at cookie.c:43
#6  0x00007f72271e64ae in cookie_put_string (cookie=cookie@entry=0x7fffd17454d0, name=name@entry=0x11075d0 "azUser", nlen=<optimised out>, value=<optimised out>, vlen=<optimised out>, encode=encode@entry=1) at cookie.c:57
#7  0x00007f72271e5c2d in build_cookie (cookie=0x7fffd17454d0, pvalue=<optimised out>, pname=<optimised out>, my_perl=0x10ea010) at baker.xs:86
#8  XS_HTTP__XSCookies_bake_cookie (my_perl=0x10ea010, cv=<optimised out>) at baker.xs:218
#9  0x00000000004bd3ca in Perl_pp_entersub ()
#10 0x00000000004b6276 in Perl_runops_standard ()
#11 0x0000000000443c59 in perl_run ()
#12 0x000000000041cbbb in main ()

Working on the assumption that the bug is in the local code rather than the C standard library we can see where the bug is likely to be. Uri.c line 72 seems to be where the crash is occurring.

That gives us a good idea of where it’s crashing, and in truth we probably could figure out the problem from just looking at that and the source code, but it’s a lot easier to run it live. Running it live in gdb gives us better access to explore the state of the program when the problem occurs.

When we run the program it segfaults and pauses the program in the debugger at just the right moment to explore what is going on.

Note that in order to run the test we first simply load perl with gdb and then we specify the command line arguments later.

$ gdb perl
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
...
Reading symbols from perl...(no debugging symbols found)...done.
(gdb) set args t/20_cookie_baker_bake.t
(gdb) r
Starting program: /usr/bin/perl t/20_cookie_baker_bake.t
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Running val...
ok 1 - t100
Running HASH(0x817ed8)...
ok 2 - t101
...
ok 19 - t209
Running t210...
ok 20 - t210
# Testing url encoding

Program received signal SIGSEGV, Segmentation fault.
url_encode (src=src@entry=0x7fffffffda20, length=16, tgt=tgt@entry=0x7fffffffdb10) at uri.c:66
66	        memcpy(tgt->data + t, v, 3);

At this point it drops us into the program at the place the program has crashed. All the variables can be examined, we could even disassemble the program and look at the assembly under the hood if necessary.

We then did a little listing (l) to get our bearings, and then lots of prints (p) to examine the contents of the variables. We had the main source code up in another window to keep our bearings. It’s possible to view it in gdb, but it’s easier to just have it to one side.

(gdb) l
61	            tgt->data[t++] = src->data[s++];
62	            continue;
63	        }
64
65	        /* copy encoded character from our table */
66	        memcpy(tgt->data + t, v, 3);
67
68	        /* we used up 3 characters (%XY) in target
69	         * and 1 character from source */
70	        t += 3;
(gdb) p tgt->data
$1 = 0xd08c60 "test=%21%22"
(gdb) p t
$2 = 11
(gdb) p v
$3 = 0x42100e42180e4120 <error: Cannot access memory at address 0x42100e42180e4120>
(gdb) p src->data[s]
$5 = -93 '\243'
(gdb) p s
$6 = 2
(gdb) p/x src->data[s]
$7 = 0xa3

At this point looking at the code it started to look like the problem was with src->data[s] (the character being encoded) so we started to look at it in different ways with different casts. Once we looked at it with the same cast the code was using (int) it became clear that it was looking before the start of the array for the pointer to the string it wanted.

(gdb) p (unsigned int) src->data[s]
$8 = 4294967203
(gdb) p (int) src->data[s]
$9 = -93
(gdb) p (char) src->data[s]
$10 = -93 '\243'
(gdb) p (unsigned short) src->data[s]
$11 = 65443
(gdb) p (unsigned char) src->data[s]
$12 = 163 '\243'
(gdb) p/x (unsigned char) src->data[s]
$13 = 0xa3
(gdb) q

Having figured proven the correct cast would be (unsigned char) we tried that in the code, and sure enough that fixed the issue.

One pull request later and another fix by the author and a new version of the module was winging it’s way onto CPAN.

In this case while we loaded up the module in the live debugger we didn’t actually need to execute any instructions or jump about the code, but that’s generally possible when you have a more complex problem.

It should be noted that a perlbrew based environment makes all of this sort of problem solving a lot simpler. Assuming you’ve used that to build the perl and then a tool like cpanminus to install the modules then you will have all the source code and build artefacts necessary to make the C level debugging just work (well probably). That’s not to say that environments setup other ways won’t work; perlbrew just makes it highly likely that it will.

Hopefully that gives you a brief taste of using gdb to examine crashes and a few of the commands you might use. You can type “help cmd” to get more info about a command (and help more generally). A quick reference like this might also be handy - http://www.cs.umd.edu/class/spring2014/cmsc414-0201/downloads/gdb-refcard.pdf.

(*) When trying to reproduce the error, in order to document it for this blog post I was unable to reproduce the crash, only the bug. The out of bounds memory layout on the day in question must have been different and so it crashed more reliably. When it didn’t crash it must have been full of 0’s which the code could deal with without crashing. When it had values it thought they were valid pointers and tried to dereference them. Of course, it could have been worse if they turned out to be valid pointers to memory that wasn’t supposed to be exposed.