Lightweight filesystem notifications

sweet rattle of disk:

a new locatedb comes;

I should go to sleep.


— me.

I’ve been thinking, over the last week or two, about the right way to handle an incremental updatedb — and in turn, the right way to handle generic filesystem notifications from the kernel. Fortunately, someone else has been thinking about it too and has actually been writing code, but we’ll get to that in a moment.

These thoughts started off as a linux-kernel thread, with Jon Masters wondering if inotify can be used for this. Alas no, for what appear to be many reasons:

  • inotify has no support for recursive watches; you’d have to put a watch on every directory on the system.
  • Even this wouldn’t work, because there’s a hard limit on the number of watches on a system (8192 per “device”, although inotify no longer uses a device interface and uses syscalls instead), and this limit is an order of magnitude smaller than the number of directories on my /.
  • There’s a race condition whose workaround would kill performance: you have to do a stat(2) dance over each directory to make sure you see modifications. The kernel guarantees delivery of every event for a directory you’re watching, but you can’t guarantee that you’ll register a watch on a newly created directory before something happens to the files inside it, so you need to scan the directory after registering the watch on it.
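To get a feel for the second point, here is a rough sketch that counts the directories under a tree and compares that with the watch limit. The function names are made up for illustration, the 8192 fallback is simply the figure quoted above, and reading the limit from /proc/sys/fs/inotify/max_user_watches assumes a Linux kernel that exposes that file.

```python
import os

# Figure quoted in the post; used only when the live value cannot be read.
DEFAULT_WATCH_LIMIT = 8192
# Assumption: a Linux kernel with inotify exposes its watch limit here.
LIMIT_PATH = "/proc/sys/fs/inotify/max_user_watches"


def read_watch_limit(path=LIMIT_PATH, fallback=DEFAULT_WATCH_LIMIT):
    """Return the configured watch limit, or the fallback if it is unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return fallback


def count_directories(root):
    """Count root plus every directory beneath it, without following symlinks."""
    total = 0
    # os.walk yields one tuple per directory it visits; unreadable
    # directories are silently skipped by default.
    for _dirpath, _dirnames, _filenames in os.walk(root, followlinks=False):
        total += 1
    return total


def watches_needed(root, limit):
    """Return (directories, fits): fits is True if one watch per directory fits."""
    dirs = count_directories(root)
    return dirs, dirs <= limit
```

Something like `watches_needed("/usr", read_watch_limit())` gives a quick idea of how far a single subtree already eats into the budget. Note that this only addresses the limit; it does nothing about the race described in the last bullet.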

So that isn’t going to work. As Jon points out, there are more uses for this than stopping your Linux box acting as a bedtime alarm clock every day; anti-virus people want it (and already use LSMs for the purpose), and smart indexing/backup tools could use it. OS X and Vista both have this kind of indexing service.

What’s really needed is a lightweight layer that sends notifications of filesystem events to userspace via netlink, such that userspace can do what it wants with them. Luckily for us, that’s exactly the patch that appeared on linux-kernel yesterday, courtesy of Yi Yang. This is a small and non-invasive patch (I think the relevant code-review phrase here is: “This is elegant, but correct.”) that does just what we need and no more. It hasn’t had a great reception so far, but I’d love to see it in mainline.

And now, I really should go to sleep. G’night!

Comments

  1. Hi Chris,

    Here’s another idea for generic filesystem notifications: log them in a circular log.

    Why, you may ask? Well, if you use userspace indexing tools, they will be started quite some time after mounting a volume and are stopped well before unmounting. Lots of files might be modified in the time in between. This information is lost to the userspace indexing tool because it is not running. A circular log will help.

    This log should store compact versions of all notifications, each with a unique id. The id is the offset of the message in the log + (large number)*(the version number of the log), where the large number is at least the size of the log so that ids never collide. The version number of the log is the number of times the writer has wrapped around and started writing at the front of the log again.

    Now if a userspace indexing application starts, it can read the messages from the log starting at the position where it left off, provided that position has not yet been overwritten. The log should be large enough that this is usually the case.

    The entries in the log could be simply the inode number and a modification code.

    I think such a log would be easy to implement, even in userspace inside of the notification daemon.
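The id scheme in this comment can be sketched in a few lines. This is only an in-memory illustration under some assumptions: entries are fixed-size, so the "offset" is a slot index and the "large number" is simply the capacity; the version number counts wrap-arounds starting from zero; and the class and method names are invented for the example.

```python
class CircularEventLog:
    """Fixed-capacity log of (event_id, inode, code) entries."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.slots = [None] * capacity
        self.next_slot = 0   # offset where the next entry will be written
        self.generation = 0  # times the writer has wrapped to the front

    def next_id(self):
        """Id the next appended entry will receive."""
        return self.generation * self.capacity + self.next_slot

    def oldest_id(self):
        """Smallest id that has not been overwritten yet."""
        return max(0, self.next_id() - self.capacity)

    def append(self, inode, code):
        """Store one notification and return its unique id."""
        event_id = self.next_id()
        self.slots[self.next_slot] = (event_id, inode, code)
        self.next_slot += 1
        if self.next_slot == self.capacity:
            # Wrap around: start writing at the front again.
            self.next_slot = 0
            self.generation += 1
        return event_id

    def read_from(self, start_id):
        """Return (events, missed) for a reader resuming at start_id.

        missed is True when entries at or after start_id were already
        overwritten, in which case the reader needs a full rescan.
        """
        oldest = self.oldest_id()
        missed = start_id < oldest
        first = max(start_id, oldest)
        events = [self.slots[i % self.capacity]
                  for i in range(first, self.next_id())]
        return events, missed
```

An indexer would remember the last id it processed and call `read_from(last_id + 1)` when it next starts. If `missed` comes back true, the log was too small for the gap and the indexer has to fall back to a full scan.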


    Comment by Jos van den Oever.