Systemtap for fun and profit

Kjartan Maraas pointed me at this Fedora bug yesterday; it reports that /proc/$pid/maps has been broken in Rawhide for a month. The patch that made linux/fs/proc/task_mmu.c (which is where map requests are handled) diverge from mainline is this one. I can’t read more about its motivation, since the Bugzilla ID is security-blocked.

So, where to start? mm_for_maps() is a new function that does a bunch of checks on the relationship between task and current before deciding whether to allow the request. I threw some printk()s in to find out which were failing, and found that we take the !__ptrace_may_attach(task) path to the out label in the code below:

if (task->mm != mm)
    goto out;
if (task->mm != current->mm && !__ptrace_may_attach(task))
    goto out;

From there, it got hazy. __ptrace_may_attach() returns whatever security_ptrace() does, which takes us into the land of pluggable LSMs (Linux Security Modules): any LSM that registered a struct security_operations with a pointer to a ptrace function gets a chance to return an error, which is passed back through security_ptrace() and stops our request from completing.
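To make the hook mechanism a little less hazy, here is a minimal userspace sketch in C of the pattern described above. Every name in it (task_model, security_ops_model, register_security_model, security_ptrace_model, same_uid_ops) is made up for illustration and is not the kernel's; the real struct security_operations carries many more hooks, and the policy shown is an assumption, not what any actual LSM does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a task: all this sketch needs is an owner. */
struct task_model {
    int uid;
};

/* Hypothetical stand-in for a table of security hooks. */
struct security_ops_model {
    const char *name;
    /* Returns 0 to allow the attach, or a negative errno to refuse it. */
    int (*ptrace)(const struct task_model *parent,
                  const struct task_model *child);
};

/* Filled in at runtime, which is why you can't tell who answers by reading one file. */
static const struct security_ops_model *registered_ops;

void register_security_model(const struct security_ops_model *ops)
{
    assert(ops != NULL && ops->ptrace != NULL);
    registered_ops = ops;
}

/* The caller only ever sees the verdict, never which module produced it. */
int security_ptrace_model(const struct task_model *parent,
                          const struct task_model *child)
{
    if (registered_ops == NULL)
        return 0; /* nothing registered, so nothing objects */
    return registered_ops->ptrace(parent, child);
}

/* Example policy (an assumption for this sketch): uid 0 or matching uids may attach. */
static int same_uid_ptrace(const struct task_model *parent,
                           const struct task_model *child)
{
    if (parent->uid == 0 || parent->uid == child->uid)
        return 0;
    return -EPERM;
}

const struct security_ops_model same_uid_ops = { "same_uid", same_uid_ptrace };
```

Calling register_security_model(&same_uid_ops) and then security_ptrace_model() with two tasks owned by different non-root uids yields -EPERM. Note the convention: zero means "go ahead", which matters later on.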

But how do I tell which LSM is complaining, or even which LSMs are loaded? After all, they’re registered at runtime. Enter systemtap, as wisely suggested to me by Bill Nottingham (to whom I now surely owe a beer). Systemtap is similar to Solaris’s DTrace; it lets you instrument and trace kernel functions and system calls on a running kernel. It was installed on my Rawhide machine by default, which is always a nice touch.

So, how to see which ptrace functions were registered? Enter my first systemtap probe:

unity:cjb~ % cat list-ptrace.stp
probe kernel.function("*ptrace*") {
    printf("%s\n", probefunc())
}
unity:cjb~ % sudo stap list-ptrace.stp

At this point, our stp script is converted into C code and compiled into a kernel module, before being loaded into the running kernel. After running a cat /proc/<pid>/maps in another terminal, we see:


.. which suggests that cap_ptrace was called by __ptrace_may_attach, and that this is where our request might be getting turned down. To be sure that we got to cap_ptrace via __ptrace_may_attach, we can ask for a backtrace:

unity:cjb~ % cat cap-backtrace.stp
probe kernel.function("cap_ptrace") {
    printf("%s -> %s\n", probefunc(), print_backtrace())
}
unity:cjb~ % sudo stap cap-backtrace.stp
cap_ptrace ->
trace for 6345 (cat)
 0xc04c2286 : cap_ptrace+0x7/0x49 []
 0xc042a600 : __ptrace_may_attach+0xac/0xae []
 0xc049350c : mm_for_maps+0x83/0xd8 []
 0xc0492892 : m_start+0x28/0x11d []
 0xc04800d9 : seq_read+0xdb/0x268 []
 0xc0446288 : audit_syscall_entry+0x104/0x12b []
 0xc047fffe : seq_read+0x0/0x268 []
 0xc04648e2 : vfs_read+0x9f/0x13e []
 0xc0464d2e : sys_read+0x3c/0x63 []
 0xc0403d07 : syscall_call+0x7/0xb []

We’re pretty sure that cap_ptrace was responsible. Hunting through its source, we see that it has a path to return -EPERM, which would do it. So, we recompile the kernel in order to have cap_ptrace tell us what return value it’s going to use, right? Well, no. Straight back to systemtap:

unity:cjb~ % cat return-codes.stp
probe kernel.function("*ptrace*").return {
    printf("%s -> ", probefunc())
    printf("%s\n", returnstr(1))
}

The .return after the function pattern tells systemtap to trigger when the function is returning, and returnstr(1) asks for the return value as a decimal. There’s also print_regs(), if you prefer to see what’s in EAX directly. Over to the other terminal to cat a maps file again, and:

unity:cjb~ % sudo stap return-codes.stp
cap_ptrace -> 0
__ptrace_may_attach -> 0

That’s odd. cap_ptrace is returning 0, which we can see in its code is meant to mean success, and __ptrace_may_attach is receiving it back unharmed. Cue an “ah-hah!” moment as we realise that the conditional:

if (task->mm != current->mm && !__ptrace_may_attach(task))
    goto out;

.. has the wrong polarity: each of the functions that __ptrace_may_attach backs onto returns zero for “success” (permission to attach), but the logic above says “if we’re not trying to get the map of the current process, and __ptrace_may_attach returns zero, we should fail”. The exclamation mark needs to disappear.
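The polarity problem is easy to reproduce outside the kernel. Here is a small sketch in plain C, assuming a made-up proc_model struct and a may_attach_model() stand-in that follows the same zero-means-allowed convention; none of these names exist in the kernel, and the "fixed" variant only illustrates dropping the exclamation mark, not the patch that was actually committed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task. */
struct proc_model {
    int mm_id;         /* which address space the task uses */
    int attach_result; /* what the attach check returns for it: 0 = allowed */
};

/* Stand-in for __ptrace_may_attach(): 0 means allowed, negative errno means refused. */
int may_attach_model(const struct proc_model *task)
{
    assert(task != NULL);
    return task->attach_result;
}

/* The check as described in the post: it bails out exactly when the attach check succeeds. */
bool maps_allowed_buggy(const struct proc_model *current,
                        const struct proc_model *task)
{
    if (task->mm_id != current->mm_id && !may_attach_model(task))
        return false; /* the "goto out" path */
    return true;
}

/* Same check without the exclamation mark: bail out only on a non-zero (error) result. */
bool maps_allowed_fixed(const struct proc_model *current,
                        const struct proc_model *task)
{
    if (task->mm_id != current->mm_id && may_attach_model(task))
        return false; /* the "goto out" path */
    return true;
}
```

With a current task in address space 1 and a target in address space 2 whose attach check returns 0, maps_allowed_buggy() refuses and maps_allowed_fixed() allows; flip the target's result to -EPERM and the two swap verdicts. Reading your own maps works in both, which is presumably why the breakage only showed up when looking at another process.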

And so we’re done. My uses of systemtap weren’t nearly as complex as those in the tutorial, but I’m happy that I saved myself the kernel compiles. I’d somehow managed to miss any hype around systemtap; if you’re another systemtap user, please consider blogging your code!


  1. I’m actually disappointed that so early in the SystemTap programming language’s life we already have inconsistencies in naming like “print_backtrace” and “returnstr” (the use of underscores).

    Reminds me of PHP instead of Python.

  2. I’m not a professional hacker and am unfamiliar with the Linux code, but from the look of “something_may_verb” you’d expect it to return true if you may do so, right?

    Names like cap_ptrace or security_ptrace don’t really tell me what they return, so it’s okay to look up their contract before interpreting their return value, but why didn’t you fix __ptrace_may_attach by changing its name or by actually returning true in this case?

  3. Hi, Bram. It’s a good point, and the answer is that `__ptrace_may_attach` is used on more occasions than this one, with the semantics of zero=success. I think there should be a change, but I’m not sure where it should be; perhaps the LSM modules should be changed instead.

  4. Scott: agreed, we need to clean up such inconsistencies in the subroutine library. It has been written somewhat haphazardly, and we’ll clean up these style bugs. In fact, since just about all these routines are included in systemtap in source form (in a “tapset” directory), a more aesthetically conscious user can suggest/test/use and even submit improved names for them.

    It may provide additional consolation to know that many of Chris’s scripts could have been written more compactly. For example:

    unity:cjb~ % cat cap-backtrace.stp
    probe kernel.function("cap_ptrace") {
        printf("%s", probefunc())
        print_backtrace() # more correct
    }

    unity:cjb~ % cat return-codes.stp
    probe kernel.function("*ptrace*").return {
        printf("%s -> %d", probefunc(), retval())
    }

