Filesystem notifications revisited

After writing my previous post about notifications (which is required reading for the rest of this one, I’m afraid), I was able to talk to Robert Love about the Yi Yang patch, and why he thinks it didn’t get a good response. He gave a few reasons:

  • It’s lossy; you lose events from boot time, from before the userspace daemon starts, and if the daemon crashes.
  • It requires root to listen on netlink, as opposed to the inotify_add_watch(2) syscall interface of inotify.
  • For this purpose, poll(2) is good and netlink is bad.

All of which are reasonable complaints. If that’s the wrong solution, though, what would the right one look like? Thankfully, Robert has ideas on that too (and I hope I explain them correctly):


  • To avoid the causes of lossiness above, the log should be an on-disk log maintained by the filesystem. Having it done by each filesystem separately isn’t necessarily awful for maintainability; the ext3 journalling layer is supposedly generic. There should be sequence points, so the (single) userspace daemon reading the log knows that it’s up to date as of sequence n, and can tell the kernel to clean the parts of the log up to n.
  • The on-disk log would be fixed in size and circular (so still lossy so far), but Robert has an idea (which Tridge and Rusty Russell are apparently also partly responsible for) to make sure the lossiness doesn’t hurt so bad. Here it is.
  • You do event “compression”; the log is stored in a tree of path names and events. If you have change events for a couple of hundred files in /home/foo/{bar,baz,etc}, you mark all of /home/foo as dirty and throw away the events inside it. Userspace has to go off and stat(2)-dance inside /home/foo to find which files have changed, but at least you’ve traded precision for accuracy and come out with a log that enables every change to be noticed. You’d keep reparenting as you run out of room in the log, so if you exhausted log space recording changes in /home/foo and /home/bar, all of /home gets marked dirty.
  • This is still root-only so far, but you can build security on it.
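
As a rough illustration of the event “compression” idea, here is a minimal Python sketch of a bounded set of dirty paths that reparents entries when it runs out of room. Everything in it is an assumption made for illustration: the DirtyLog name, the capacity parameter, the in-memory set standing in for the on-disk tree, and the policy of collapsing the deepest entry first. None of this comes from Robert's design or from any kernel code.

```python
import posixpath


class DirtyLog:
    """Bounded set of dirty paths that coalesces into parent directories when full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.dirty = set()

    def _covered(self, path):
        # A path needs no new entry if it, or any ancestor, is already dirty.
        while True:
            if path in self.dirty:
                return True
            parent = posixpath.dirname(path)
            if parent == path:  # reached "/"
                return False
            path = parent

    def record(self, path):
        """Note a change to an absolute path, coalescing if the log overflows."""
        if self._covered(path):
            return
        self.dirty.add(path)
        while len(self.dirty) > self.capacity:
            self._reparent()

    def _reparent(self):
        # Hypothetical policy: replace the deepest entry with its parent and
        # drop every entry that the parent now covers.
        deepest = max(self.dirty, key=lambda p: (p.count("/"), p))
        parent = posixpath.dirname(deepest)
        prefix = parent.rstrip("/") + "/"
        self.dirty = {p for p in self.dirty if p != parent and not p.startswith(prefix)}
        self.dirty.add(parent)
```

With a capacity of 2, recording changes under /home/foo/bar, /home/foo/baz and /home/foo/etc leaves a single dirty entry for /home/foo, and userspace would then have to scan that directory itself to find out what actually changed.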


So! This is a lot more work than Yang’s elegant netlink patch, but I’ve decided that ridding the world of updatedb is a worthy goal, and so I’ll be starting work on Robert’s design next week. I’ve booked days off work to go to LinuxWorld Boston, and an interested friend is visiting from the UK as well. A further idea is to get the userspace daemon to export inotify-compatible events, so that programs like Beagle can use this new mechanism without requiring a rewrite. I’m assured that apps like F-spot and Leaftag are waiting for this kind of event notification too.

(Note: Firefox hung as I was writing this post, taking the unfinished blog entry with it; strace showed it stuck on a futex. I blame the Flash on the LinuxWorld site. But! Getting a core file with gdb’s gcore and running strings on it gave me a perfectly-formatted blog post back. Yay!)

Swap files vs. swap partitions

It took far too long to find this half-remembered linux-kernel thread on Google amidst all the results (mostly from distro installer guides) claiming that swap partitions are preferable to swap files. Here’s hoping the link below will help future searchers.

Swap files and swap partitions have the same performance:

In 2.6 [swap files and swap partitions] have the same reliability and they will have the same performance unless the swapfile is badly fragmented.

– Andrew Morton, on linux-kernel.

(There is a performance difference under Linux 2.4, though, as explained at the link above.)

Update: Tim comments that swsusp (the kernel software suspend-to-disk support) only works with swap partitions, so there is still one good reason to use a swap partition. Suspend2 doesn’t have this limitation.

Lightweight filesystem notifications

sweet rattle of disk:

a new locatedb comes;

I should go to sleep.


— me.

I’ve been thinking, over the last week or two, about the right way to handle an incremental updatedb — and in turn, the right way to handle generic filesystem notifications from the kernel. Fortunately, someone else has been thinking about it too and has actually been writing code, but we’ll get to that in a moment.

These thoughts started off as a linux-kernel thread, with Jon Masters wondering if inotify can be used for this. Alas no, for what appear to be many reasons:

  • inotify has no support for recursive watches; you’d have to put a watch on every directory on the system.
  • Even this wouldn’t work, because there’s a hard limit on the number of watches on a system (8192 per “device”, although inotify no longer uses a device interface and uses syscalls instead), and this limit is an order of magnitude smaller than the number of directories on my /.
  • There’s a race condition which would kill performance. While you can guarantee that the kernel will deliver each event for a directory you’re watching, you can’t guarantee that you’ll register a watch on a newly-created directory before something happens to the files inside it. So after registering each watch you have to do a stat(2) dance over the directory to make sure you haven’t missed any modifications.
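
To make the stat(2) dance in the last point concrete, here is a small Python sketch that snapshots directory mtimes and compares two snapshots. The function names snapshot and changed_dirs are made up for this example. It only notices entries being created, removed or renamed (which is what a locate database cares about), not edits to file contents, and it can miss changes that land within the filesystem's timestamp granularity.

```python
import os


def snapshot(root):
    """Map every directory under root to its mtime, using lstat(2)."""
    mtimes = {}
    for dirpath, _dirnames, _filenames in os.walk(root):
        mtimes[dirpath] = os.lstat(dirpath).st_mtime_ns
    return mtimes


def changed_dirs(before, after):
    """Directories that appeared, vanished, or whose mtime differs between snapshots."""
    return sorted(
        d for d in before.keys() | after.keys()
        if before.get(d) != after.get(d)
    )
```

Doing this walk for every newly created directory is exactly the cost that makes inotify unattractive for a whole-filesystem index.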

So that isn’t going to work. As Jon points out, there are more uses for this than stopping your Linux box acting as a bedtime alarm clock every day; anti-virus people want it (and already use LSMs for the purpose), and smart indexing/backup tools could use it. OS X and Vista both have this kind of indexing service.

What’s really needed is a lightweight layer that sends notifications of filesystem events to userspace via netlink, such that userspace can do what it wants with them. Luckily for us, that’s exactly the patch that appeared on linux-kernel yesterday, courtesy of Yi Yang. This is a small and non-invasive patch (I think the relevant code-review phrase here is: “This is elegant, but correct.”) that does just what we need and no more. It hasn’t had a great reception so far, but I’d love to see it in mainline.

And now, I really should go to sleep. G’night!

Promotion

I feel compelled to point you all at my wife’s science-y blog, ’cause it’s far more interesting than mine. Without further ado: Tipping the Spherical Cow.

gnome-terminal

Following on from my last productivity post, here’s an area in which I’m not excited about the time-saving new features I’m using, but would love to be: gnome-terminal. I see that it just branched for 2.14, leaving us free to fantasize about new features for HEAD. Here’s my top three:

Local search through scrollback:

screen(1) lets you search through scrollback. I do so a lot. But I wish it was better:

  • There’s no incremental search, and matches aren’t highlighted as you type (as in an emacs search-forward or vim hlsearch), which means that you don’t know whether the next match is going to be on the line above or the page above until you hit return and look around to see where the cursor went.
  • The latency is on the wrong end; when working over ssh tunnels, or on a low-bandwidth connection, it’s painful to operate screen’s search. I end up making typos, getting confused, accidentally falling out of copy mode, and landing back at the tail of the terminal. You’ve been there, you know what I’m talking about.
  • I don’t run the majority of my shells inside screen, and there’s no way to stick the contents of a terminal inside screen once you realise you need to search through it.

Personally, I think this is a pretty compelling argument for local, incremental search through scrollback. Anyone else?

Horizontally-stretchable views:

I work in terminals with a lot of text (e-mail) and a lot of program (make) output, and want to minimise paging up and down — if you do too, you probably try and resize your terminal such that it’s still around 80 cols, but is the height of most of your screen. That’s fine, but vertical screen space is expensive; you can only do that for a small number of terminals.

This got me thinking about extending the terminal horizontally, such that text scrolls off the right segment of a terminal and immediately onto the left segment, where each segment is 80×24 or whatever. For more details, see the feature request I filed in GNOME Bugzilla asking what the gnome-terminal maintainers think about this.
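
Here is a rough Python sketch of how I imagine the segments being filled, purely to illustrate the idea; it is not based on gnome-terminal or VTE code, and the layout_segments name and its parameters are invented for the example. The rightmost segment behaves like a normal terminal, and each segment to its left shows the lines that most recently scrolled off the one to its right.

```python
def layout_segments(lines, segments, rows):
    """Split the tail of the scrollback into side-by-side segments.

    Returns a list of segments ordered left to right; the newest lines end
    up in the rightmost segment and older lines fill the segments to its left.
    """
    if segments < 1 or rows < 1:
        raise ValueError("segments and rows must be at least 1")
    result = []
    end = len(lines)
    for _ in range(segments):
        start = max(0, end - rows)
        result.append(lines[start:end])  # may be short or empty
        end = start
    result.reverse()  # built right to left, returned left to right
    return result
```

With 60 lines of output and two 24-row segments, the right segment holds lines 37 to 60 and the left one holds lines 13 to 36, so 48 lines are visible without paging.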

Display-independent terminals:

I don’t care about this feature as much, but it’s a logical step forward from the separation of model and view that the last feature requires. Your gnome-terminals could exist outside of the X server they were started from and be attached/detached from X servers at will, as you move around; this is already possible with some X applications. The use cases are keeping terminal state between work and home, or even more importantly for me: not having to worry about losing terminals when your X server restarts.

Does anyone think any of these have potential?

Also, hello to Planet GNOME! You might be interested in my previous posts on zsh/ssh/emacs and Linux on the Treo 650. If anyone’s willing to come up with a Hackergotchi for me, there’s a photo here. Thanks!

Productivity

The main purpose of this post is to show off a link I found documenting some of the under-used features of emacs — Effective Emacs. (Thanks to Edward O’Connor’s blog for the link.)

I’ve been on an optimisation binge recently, making sure that I’m getting the best out of my editor and shell. I decided to document some of the features I’m using:

zsh:

  • Hostname completion based on the contents of your ~/.ssh/known_hosts file. This requires you to turn off HashKnownHosts (see below), and add the following to your ~/.zshrc:
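       # Split known_hosts into lines, drop lines starting with a digit (bare IP
       # entries), then keep only the first hostname on each remaining line.
       hosts=(${${${${(f)"$(<$HOME/.ssh/known_hosts)"}:#[0-9]*}%%\ *}%%,*})
       zstyle ':completion:*:hosts' hosts $hosts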
  • Remote filename completion over ssh, which works wonderfully with public key auth, remote host completion, and the ssh ControlMaster tip below. This is enabled by default. In the example below, parts of the command were filled in by tab presses rather than typed directly:

    % scp foo.html printf.net:public_html/index.html

  • Colour matches in grep results (in green):

    export GREP_COLOR='01;32'
    alias grep='grep --color'
    
  • pushd: Few people seem to use directory stacks in their shell. After enabling auto_pushd as below, you can quickly popd back to the last directory you were in (and popd again for the directory before that, etc), or use dirs to see the stack of past directories that you can cd to using cd ~n, where n is the number given for that directory. To enable:

    setopt auto_pushd
    
  • The <() construct lets you avoid having to use temporary files as arguments to commands, like so:

    diff -y <(wc -l 1/*) <(wc -l 2/*)
    
  • zsh's cd has a useful two-argument form, cd old new, which replaces old in the path of the current directory with new and changes to the resulting directory:

    ~/dir % ls
    foo1  foo2  foo3  foo4  foo5  foo6
    ~/dir % cd foo5
    ~/dir/foo5 % cd 5 6
    ~/dir/foo6 %
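
Going back to the hostname completion item at the top of this list: the nested parameter expansion is hard to read, so here is a Python sketch of what it is meant to compute. The completion_hosts name is made up, and this is only a reading aid, not something zsh uses. It also shows why hashed entries are useless here: a hashed line would pass through as an opaque string rather than a hostname.

```python
def completion_hosts(known_hosts_text):
    """Extract completable hostnames from the text of a known_hosts file."""
    hosts = []
    for line in known_hosts_text.splitlines():
        # Skip blank lines and entries that start with a digit (bare IPs).
        if not line or line[0] in "0123456789":
            continue
        first_field = line.split(" ", 1)[0]      # strip the key type and key
        hosts.append(first_field.split(",", 1)[0])  # strip ",ip" aliases
    return hosts
```

Feeding it the contents of ~/.ssh/known_hosts should give roughly the same list that ends up in the hosts array.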
    

ssh:

  • By default, modern ssh hashes the known_hosts file so that someone who breaks into your account doesn't get a list of where they might be able to go next. This is sensible, but breaks the hostname completion above, so I turn it off in ~/.ssh/config:

    HashKnownHosts no
    
  • New (4.0+) versions of OpenSSH have support for multiplexing several shells over a single ssh connection; this means that the second time you type ssh host, the first (already established) connection is used and told to spawn a new shell, making your new shell appear immediately instead of in a few seconds. This cuts login time for a new shell from 1.891s to 0.267s on my work machine. It also speeds up anything that uses a single ssh session per file such as bash/zsh remote filename completion (see above), or rsync/darcs/svn/etc over ssh. To enable, in ~/.ssh/config:

    ControlMaster auto
    ControlPath /tmp/%r@%h:%p
    

I have a few annoyances with ControlMaster — let me know if you know of a clean way to have the first connection for each host be created as a background process without a tty so that it can't easily be killed by accident.

emacs:

  • I use tramp to edit files remotely — this has the same host and filename completion as zsh, and is insanely useful for me; some of the machines I use at work are an ssh tunnel away (meaning high latency) and don't have emacs or vim installed (only vi, meaning no syntax highlighting). Setting up tramp is easy:

    ;; In ~/.emacs:
    (require 'tramp)
    ;; Then open a remote file with:
    ;;   C-x C-f /somehost:some/dir/and/file RET
    
  • I use gnus (see also my.gnus.org) for mail, news and RSS reading, and think it's the best mailer ever.

  • I use ERC as an IRC client.

My dotfiles are available online.

Treo 650 Linux

I’ve just taken advantage of the massively impressive work performed by Shadowmite and others and installed Linux on Mad’s Treo 650. Here’s a photo:

Plenty more on my Flickr page.

I used Luke’s instructions, but couldn’t get the supplied watch-and-upload script working; it wasn’t seeing the Treo’s responses, so I ended up (after verifying that the connection was okay with minicom) just pasting in:

    cat ./codeupload.load > /dev/ttyUSB0
    ./sendimage ./zImage 0xa0800000 > /dev/ttyUSB0
    ./sendimage ./initrd.gz 0xa1500000 > /dev/ttyUSB0
    ./sendimage ./linuxupload.bin 0xa1d00000 > /dev/ttyUSB0
    cat ./go.bin > /dev/ttyUSB0

... and hitting space on the Treo to boot Linux.

Everything but the GSM radio is working now; it’s hoped that that’ll be on the way soon, as it presents itself as a simple AT command-set modem over a UART.