Kjartan Maraas pointed me at this Fedora bug yesterday; it reports that /proc/$pid/maps has been broken in Rawhide for a month. The patch that made linux/fs/proc/task_mmu.c (which is where map requests are handled) diverge from mainline is this one. I can’t read more about its motivation since the Bugzilla ID is security blocked.
So, where to start? mm_for_maps() is a new function that does a bunch of checks on the relationship between task and current before deciding whether to allow the request; I threw some printk()s in to find out which were failing, and found that we take the !__ptrace_may_attach(task) path to the out label in the code below:
if (task->mm != mm)
    goto out;
if (task->mm != current->mm && !__ptrace_may_attach(task))
    goto out;
From there, it got hazy. __ptrace_may_attach() returns whatever security_ptrace() does. This takes us into pluggable LSM land: any LSM that registered a struct security_operations with a pointer to a ptrace function will have a shot at returning an error, which security_ptrace() then passes back to stop our request from completing.
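To make that dispatch a little more concrete, here is a minimal user-space sketch of the idea, not the kernel's actual code. Every name in it (sketch_security_ops, sketch_register_security, sketch_security_ptrace and the two toy modules) is hypothetical, and treating an empty slot as "allow" is an assumption of the sketch rather than a claim about the kernel.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct security_operations.
 * Only the one hook discussed in the text is modelled. */
struct sketch_security_ops {
    const char *name;
    int (*ptrace)(int parent_pid, int child_pid); /* 0 = allow, -errno = deny */
};

/* Two made-up modules: one always allows, one always refuses. */
static int allow_ptrace(int parent_pid, int child_pid)
{
    (void)parent_pid;
    (void)child_pid;
    return 0;
}

static int deny_ptrace(int parent_pid, int child_pid)
{
    (void)parent_pid;
    (void)child_pid;
    return -EPERM;
}

static const struct sketch_security_ops permissive_ops = { "permissive", allow_ptrace };
static const struct sketch_security_ops strict_ops = { "strict", deny_ptrace };

/* The module registered at runtime; NULL means nothing is registered. */
static const struct sketch_security_ops *active_ops = NULL;

static void sketch_register_security(const struct sketch_security_ops *ops)
{
    active_ops = ops;
}

/* Analogue of security_ptrace(): forward to whichever hook is registered
 * and hand its verdict back unchanged. An empty slot means "allow" here,
 * which is an assumption of this sketch. */
static int sketch_security_ptrace(int parent_pid, int child_pid)
{
    if (active_ops == NULL || active_ops->ptrace == NULL)
        return 0;
    return active_ops->ptrace(parent_pid, child_pid);
}
```

The point of the sketch is that the caller cannot tell from the call site which module will answer: the verdict depends entirely on what was registered at runtime, which is why reading the source alone didn't settle the question.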
But how do I tell which LSM is complaining, or even which LSMs are loaded? After all, they’re registered at runtime. Enter systemtap, as wisely suggested to me by Bill Nottingham (to whom I now surely owe a beer). Systemtap is similar to Solaris’s DTrace; it lets you instrument and trace kernel functions and system calls on a running kernel. It was installed on my Rawhide machine by default, which is always a nice touch.
So, how to see which ptrace functions were registered? Enter my first systemtap probe:
unity:cjb~ % cat list-ptrace.stp
probe kernel.function("*ptrace*") {
    printf("%s\n", probefunc())
}
unity:cjb~ % sudo stap list-ptrace.stp
At this point, our stp script is converted into C code and compiled into a kernel module, before being loaded into the running kernel. After running a cat /proc/<pid>/maps in another terminal, we see:
__ptrace_may_attach
cap_ptrace
.. which suggests that cap_ptrace was called by __ptrace_may_attach, and that’s where our __ptrace_may_attach call might be getting turned down. To be sure that we got to cap_ptrace via __ptrace_may_attach, we can ask for a backtrace:
unity:cjb~ % cat cap-backtrace.stp
probe kernel.function("cap_ptrace") {
    printf("%s -> %s\n", probefunc(), print_backtrace())
}
unity:cjb~ % sudo stap cap-backtrace.stp
cap_ptrace ->
trace for 6345 (cat)
0xc04c2286 : cap_ptrace+0x7/0x49 []
0xc042a600 : __ptrace_may_attach+0xac/0xae []
0xc049350c : mm_for_maps+0x83/0xd8 []
0xc0492892 : m_start+0x28/0x11d []
0xc04800d9 : seq_read+0xdb/0x268 []
0xc0446288 : audit_syscall_entry+0x104/0x12b []
0xc047fffe : seq_read+0x0/0x268 []
0xc04648e2 : vfs_read+0x9f/0x13e []
0xc0464d2e : sys_read+0x3c/0x63 []
0xc0403d07 : syscall_call+0x7/0xb []
We’re pretty sure that cap_ptrace was responsible. Hunting through its source, we see that it has a path to return -EPERM, which would do it. So, we recompile the kernel in order to have cap_ptrace tell us what return value it’s going to use, right? Well, no. Straight back to systemtap:
unity:cjb~ % cat return-codes.stp
probe kernel.function("*ptrace*").return {
    printf("%s -> ", probefunc())
    log(returnstr(1));
}
The .return after the function pattern tells systemtap to trigger when the function is returning, and returnstr(1) asks for the return value as a decimal. There’s also print_regs(), if you prefer to see what’s in EAX directly. Over to the other terminal to cat a maps file again, and:
unity:cjb~ % sudo stap return-codes.stp
cap_ptrace -> 0
__ptrace_may_attach -> 0
That’s odd. cap_ptrace is returning 0, which we can see in its code is meant to mean success, and __ptrace_may_attach is receiving it back unharmed. Cue an “ah-hah!” moment as we realise that the conditional:
if (task->mm != current->mm && !__ptrace_may_attach(task))
    goto out;
.. has the wrong polarity; each of the functions that __ptrace_may_attach backs onto returns zero for “success” (permission to attach), but the logic above says “if we’re not trying to get the map of the current process, and __ptrace_may_attach returns zero, we should fail”. The exclamation mark needs to disappear.
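The inversion is easy to reproduce outside the kernel. Below is a small user-space sketch of the check with and without the exclamation mark. The helper names (sketch_may_attach, maps_allowed_buggy, maps_allowed_fixed) are made up for illustration, and the only thing carried over from the kernel code is the convention that zero means "permitted".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for __ptrace_may_attach():
 * 0 means the attach is permitted, -EPERM means it is refused. */
static int sketch_may_attach(bool permitted)
{
    return permitted ? 0 : -EPERM;
}

/* The check as written in the patch. "!" turns the success value 0 into
 * true, so a *permitted* request is the one that takes the "goto out" path.
 * Returns true when reading the maps file would be allowed. */
static bool maps_allowed_buggy(bool same_mm, bool permitted)
{
    if (!same_mm && !sketch_may_attach(permitted))
        return false; /* corresponds to "goto out" */
    return true;
}

/* The same check with the exclamation mark removed: a non-zero
 * (error) return is now what sends us to "out". */
static bool maps_allowed_fixed(bool same_mm, bool permitted)
{
    if (!same_mm && sketch_may_attach(permitted))
        return false; /* corresponds to "goto out" */
    return true;
}
```

With same_mm false (we are looking at another process), the buggy version refuses exactly the requests that were permitted and lets through the ones that were refused; the fixed version does the opposite. When same_mm is true, both versions skip the attach check entirely.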
And so we’re done. My uses of systemtap weren’t nearly as complex as those in the tutorial, but I’m happy that I saved myself the kernel compiles. I’d somehow managed to miss any hype around systemtap; if you’re another systemtap user, please consider blogging your code!