Rawhide!

Keeping track of details for a few bugs I’m seeing in current Rawhide on the new laptop:

  • S3 sleep is broken, possibly libata-wide since Jeremy is hitting the same symptoms as me (no disk access on resume) and he’s on AHCI while I’m on sd_mod and ata_piix. The symptoms are syslogd complaining that it can’t write out the journal, and lots of repeated:
sd 0:0:0:0: SCSI error: return code = 0x40000
end_request: I/O error, dev sda, sector 18800717
  • pccardctl eject is giving me an oops, which is new to Rawhide; looks like this is upstream, since it’s the same oops as in this lkml post.
  • The lvm(8) in my kernel-2141 initrd fails to find the root volume group, so the boot fails with:
Volume group for uuid not found: (uuid)
0 logical volume(s) in volume group "group0" now active
mount: could not find filesystem /dev/root

I reverted bin/lvm in the initrd to the version found in 2136, which has the same version number but a different md5sum — that booted, so this looks like a new bug in lvm.
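
For anyone wanting to repeat the comparison, here is a minimal sketch of checking whether two binaries with the same version number really are the same file. The function names are my own, and any paths you pass in (for example the bin/lvm from two unpacked initrds) are up to you; nothing here is specific to lvm.

```python
import hashlib
import sys


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(path_a, path_b):
    """True if both files have identical MD5 digests."""
    return md5_of(path_a) == md5_of(path_b)


if __name__ == "__main__":
    # Usage (script name is hypothetical): python md5cmp.py old/bin/lvm new/bin/lvm
    if len(sys.argv) == 3:
        print("identical" if same_binary(sys.argv[1], sys.argv[2]) else "different")
```

A mismatch only tells you the files differ, not why; it is a cheap first check before digging into package changelogs.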

Linux on the Alienware m5500

I’ve bought myself an Alienware m5500, which is a laptop based on the Uniwill 259EN3. I wanted a laptop with a high resolution screen (the m5500 does 1920×1200 on a 15″ LCD) and Intel graphics. (I don’t want to support either ATI or nVidia, and IBM and Dell make you choose an ATI/nVidia card if you go above their standard resolution screens). My machine has:

  • Intel Pentium M 730 1.6GHz 2MB L2 Cache 533MHz FSB
  • 512M RAM, 40G disk
  • Alienware m5500 15.4″ WideUXGA 1920×1200 LCD
  • Intel GMA900 (915GM/i810) Extreme Graphics
  • Intel PRO/Wireless 2200 b/g Wireless Card

I have Fedora Core 5 running on it now, but I’m sad to say that it wasn’t without pain:

I booted from the FC5 install CD. The kernel booted, switched over to Anaconda, and I got a black screen and hung system. Tried again, this time using these instructions to start a VNC installer. That worked, and I soon found out that Fedora now has no NTFS support at all, which meant that the single NTFS partition taking up the entire disk would have to be deleted. (Which sounded fine at the time but is annoying me now — see the wifi section below.) Ubuntu’s installer can do NTFS resizing, but Fedora thinks that the kernel NTFS support violates patents and is not legally redistributable.

After the install, I got the same X crash on my first boot. Booting into runlevel 3 got me to a shell. I determined that the X failure was this bug, which requires applying this patch to Xorg to get the card working. Now that Xorg is modular, it’s (thankfully) sufficient to just download the latest xorg-x11-server.src.rpm, make the change, and rpmbuild -ba to have the specfile take care of building you a new RPM.

That got X working at 800×600; to get to 1920×1200 you need to use 915resolution and the following modeline:

ModeLine "1920x1200" 230 1920 1936 2096 2528 1200 1201 1204 1250
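
As a sanity check on a modeline like this, the refresh rates it implies follow from simple arithmetic: the pixel clock divided by the horizontal total gives the horizontal sync rate, and dividing again by the vertical total gives the vertical refresh. A small sketch (the function names are mine, and it ignores optional flags such as +hsync):

```python
def parse_modeline(line):
    """Split an X11 ModeLine into its name, pixel clock (MHz) and timings."""
    parts = line.split()
    if parts[0].lower() != "modeline":
        raise ValueError("not a ModeLine")
    name = parts[1].strip('"')
    clock_mhz = float(parts[2])
    # Order: hdisp hsyncstart hsyncend htotal vdisp vsyncstart vsyncend vtotal
    timings = [int(p) for p in parts[3:11]]
    return name, clock_mhz, timings


def refresh_rates(line):
    """Return (horizontal sync in kHz, vertical refresh in Hz) for a ModeLine."""
    _, clock_mhz, t = parse_modeline(line)
    htotal, vtotal = t[3], t[7]
    hsync_khz = clock_mhz * 1000.0 / htotal
    vrefresh_hz = clock_mhz * 1e6 / (htotal * vtotal)
    return hsync_khz, vrefresh_hz


if __name__ == "__main__":
    line = 'ModeLine "1920x1200" 230 1920 1936 2096 2528 1200 1201 1204 1250'
    print("%.2f kHz, %.2f Hz" % refresh_rates(line))
```

For the modeline above this works out to roughly 91 kHz horizontal and a little under 73 Hz vertical.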

Now, on to wireless. I downloaded the ieee80211 stack, ipw2200 driver and ipw2200 firmware, and:

unity:cjb~ % sudo modprobe ipw2200
ipw2200: Detected Intel PRO/Wireless 2200BG Network Connection
ipw2200: Radio Frequency Kill Switch is On:
Kill switch must be turned off for wireless networking to work.
ipw2200: Detected geography ZZM (11 802.11bg channels, 0 802.11a channels)

There is a wifi button on the case, next to the power switch — which implies that it’s a hardware button, rather than a software button like the function keys on the keyboard — but pressing it turns on the “sleep” LED rather than the “wifi” LED, and ipw2200 continues to make the same complaint. I see that drivers exist to control the kill switch in software, but none I’ve seen work on this machine. I suspect that booting into Windows (which I can’t do now since I wasn’t able to keep/resize the partition) would get the wifi working in Linux either temporarily or permanently. I took a backup of the disk (dd if=/dev/sda of=nfs-mounted-backup) and booted the XP recovery CD that came with the machine, but it either hung or needed many minutes at “Setup is inspecting your computer’s hardware configuration”, and my patience for Windows is pretty limited. I’ll try again if I don’t get any other suggestions.

I posted some cries for help about this and dug out my very old, trusty Orinoco wireless PC Card and inserted it. And the machine hung immediately, and did so repeatably. I dug around until I found this linux-kernel thread which contains a safer /etc/pcmcia/config.opts, and that’s working fine. Update: Hmph. It works anytime after init has loaded (udev, perhaps?), but the machine hangs at the cs: memory probe if I try to boot with the card already inserted. Any ideas?

I set about installing AIGLX, and the performance is good, especially for an Intel chipset. The photo at the bottom shows AIGLX and true transparency in gnome-terminal, which is lovely to have (at last), especially on a laptop when you’re likely to want to be copying from a web page into a terminal, etc.

So, that’s about it for day one. To summarise:

ACPI

Well, sleep (suspend-to-ram; S3) seemed to be working out of the box, but it isn’t now. It looks like the IDE driver isn’t coming back; if I can keep an ssh session connected, I see the following repeated in dmesg:

sd 0:0:0:0: SCSI error: return code = 0x40000
end_request: I/O error, dev sda, sector 18800717

I’ll look into having the sd module removed and restarted — anyone know how I should do that on FC5/Rawhide? I also had to change resume_video() in /etc/pm/functions-intel to get video back:

#/usr/sbin/vbetool post
#/usr/sbin/vbetool vbestate restore < /var/run/vbestate
/usr/sbin/vbetool dpms on

Suspend to disk works, and doesn't even take that long, but kicks me back to the gdm prompt (!) from a logged in session. Update: Working now, after putting 915resolution in the resume path as well. Thanks, Jens!

The ACPI function keys don't do anything in software; I'm not sure which driver they need.

CPU scaling

Working, switches between 750MHz/1.1GHz/1.6GHz automatically.

Graphics

Working after patching Xorg, acceleration works, AIGLX works.

Synaptics touchpad

Working. I disabled tap-to-click and edge scrolling (both vertical and horizontal):

Option      "VertScrollDelta" "0"
Option      "HorizScrollDelta" "0"
Option      "MaxTapMove" "0"
Option      "MaxTapTime" "0"

USB, Sound (snd-intel-hda), SD card reader, DVD burning, Firewire.

Working.

VGA out

Works, after adding the following:

Option "MonitorLayout" "CRT,LFP"
Option "Clone" "true"
Option "SWCursor" "true"

I'll keep this post updated as I learn more about the machine. Here's the photo:

Filesystem notifications revisited

After writing my previous post about notifications (which is required reading for the rest of this one, I’m afraid), I was able to talk to Robert Love about the Yi Yang patch, and why he thinks it didn’t get a good response. He gave a few reasons:

  • It’s lossy; you lose events from boot time, from before the userspace daemon starts, and if the daemon crashes.
  • It requires root to listen on netlink, whereas inotify’s inotify_add_watch(2) syscall interface can be used by ordinary users.
  • For this purpose poll(2) is good, netlink is bad.

All of which are reasonable complaints. If that’s the wrong solution, though, what would the right one look like? Thankfully, Robert has ideas on that too (and I hope I explain them correctly):


To avoid the causes of lossiness above, the log should be an on-disk log maintained by the filesystem. Having it done by each filesystem separately isn’t necessarily awful for maintainability; the ext3 journalling layer is supposedly generic. There should be sequence points, so the (single) userspace daemon reading the log knows that it’s up to date as of sequence n, and can tell the kernel to clean the parts of the log up to n.
The on-disk log would be fixed in size and circular — so still lossy so far — but Robert has an idea (which Tridge and Rusty Russell are apparently also partly responsible for) to make sure the lossiness doesn’t hurt so bad. Here it is.

You do event “compression”; the log is stored in a tree of path names and events. If you have change events for a couple of hundred files in /home/foo/{bar,baz,etc}, you mark all of /home/foo as dirty and throw away the events inside it. Userspace has to go off and stat(2)-dance inside /home/foo to find which files have changed, but at least you’ve traded precision for accuracy and come out with a log that enables every change to be noticed. You’d keep reparenting as you run out of room in the log, so if you exhausted log space recording changes in /home/foo and /home/bar, all of /home gets marked dirty.
This is still root-only so far, but you can build security on it.
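
To make the event "compression" idea concrete, here is a toy in-memory sketch. It is only my reading of the design described above, not Robert's code and nothing like a real on-disk log: the class name, the capacity parameter and the tie-breaking rule for which directory to collapse first are all my own assumptions. It expects ordinary absolute paths.

```python
import posixpath


def _depth(path):
    """Number of components in an absolute path ("/" has depth 0)."""
    return len([c for c in path.split("/") if c])


def _is_under(path, ancestor):
    """True if path equals ancestor or lies somewhere beneath it."""
    prefix = ancestor.rstrip("/") + "/"
    return path == ancestor or path.startswith(prefix)


class DirtyLog:
    """Fixed-capacity set of dirty paths: it loses precision, never changes."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries = set()

    def record(self, path):
        """Note a change at an absolute path, compressing if the log is full."""
        path = posixpath.normpath(path)
        if any(_is_under(path, e) for e in self.entries):
            return  # an ancestor is already marked dirty; nothing new to store
        # Drop anything the new entry now covers, then add it.
        self.entries = {e for e in self.entries if not _is_under(e, path)}
        self.entries.add(path)
        while len(self.entries) > self.capacity:
            self._reparent()

    def _reparent(self):
        """Replace the entries inside one directory with the directory itself."""
        children = {}
        for e in self.entries:
            children.setdefault(posixpath.dirname(e), []).append(e)
        # Assumed policy: collapse the directory holding the most entries,
        # preferring deeper directories (and then name order) on a tie.
        parent = max(children, key=lambda p: (len(children[p]), _depth(p), p))
        self.entries = {e for e in self.entries if not _is_under(e, parent)}
        self.entries.add(parent)
```

With a capacity of 3, recording four files in /home/foo leaves a single /home/foo entry; userspace then has to scan that directory itself, which is the stat(2) dance mentioned above.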


So! This is a lot more work than Yang’s elegant netlink patch, but I’ve decided that ridding the world of updatedb is a worthy goal, and so I’ll be starting work on Robert’s design next week. I’ve booked days off work to go to LinuxWorld Boston, and an interested friend is visiting from the UK as well. A further idea is to get the userspace daemon to export inotify-compatible events, so that programs like Beagle can use this new mechanism without requiring a rewrite. I’m assured that apps like F-spot and Leaftag are waiting for this kind of event notification too.

(Note: Firefox hung as I was writing this post, taking the unfinished blog entry with it, with strace hanging on a futex. I blame the flash on the LinuxWorld site. But! Getting a core file with gdb’s gcore and running strings on it gave me a perfectly-formatted blog post back. Yay!)
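
If you ever need the same rescue and don't have strings(1) to hand, the core of what it does is easy to approximate. This is a rough sketch only: the function name is mine, it handles plain ASCII rather than every encoding the real tool supports, and it reads the whole input into memory, which may be unwise for a very large core file.

```python
import re


def printable_strings(data, min_len=4):
    """Yield runs of at least min_len printable ASCII bytes (or tabs) in data."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


if __name__ == "__main__":
    import sys
    # Usage (script name is hypothetical): python strings_sketch.py corefile
    if len(sys.argv) == 2:
        with open(sys.argv[1], "rb") as f:
            for s in printable_strings(f.read()):
                print(s)
```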

Swap files vs. swap partitions

It took far too long to find this half-remembered linux-kernel thread on Google amidst all the results (mostly from distro installer guides) claiming that swap partitions are preferable to swap files. Here’s hoping the link below will help future searchers.

Swap files and swap partitions have the same performance:

In 2.6 [swap files and swap partitions] have the same reliability and they will have the same performance unless the swapfile is badly fragmented.

— Andrew Morton, on linux-kernel.

(There is a performance difference under Linux 2.4, though, as explained at the link above.)

Update: Tim comments that swsusp (the kernel software suspend-to-disk support) only works with swap partitions, so there is still one good reason to use a swap partition. Suspend2 doesn’t have this limitation.

Lightweight filesystem notifications

sweet rattle of disk:

a new locatedb comes;

I should go to sleep.


— me.

I’ve been thinking, over the last week or two, about the right way to handle an incremental updatedb — and in turn, the right way to handle generic filesystem notifications from the kernel. Fortunately, someone else has been thinking about it too and has actually been writing code, but we’ll get to that in a moment.

These thoughts started off as a linux-kernel thread, with Jon Masters wondering if inotify can be used for this. Alas no, for what appear to be many reasons:

  • inotify has no support for recursive watches; you’d have to put a watch on every directory on the system.
  • Even this wouldn’t work, because there’s a hard limit on the number of watches on a system (8192 per “device”, although inotify no longer uses a device interface and is driven by syscalls instead), and this limit is an order of magnitude smaller than the number of directories on my /.
  • There’s a race condition which would kill performance, meaning that you have to do a stat(2) dance over each directory to make sure you see modifications — while you can guarantee that the kernel will deliver you each event for a directory you’re interested in, you can’t guarantee being able to register interest in a newly-created directory before something happens to files inside it, leaving you needing to scan inside the directory after registering the watch on it.
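
To get a feel for the second point on your own machine, you can count how many watches a one-per-directory scheme would need. A minimal sketch; the function names are mine, and the default limit of 8192 is simply the figure quoted above rather than something read from the running kernel.

```python
import os


def count_directories(root):
    """Count root plus every readable directory beneath it.

    Symlinks are not followed, and directories that cannot be listed are
    skipped by os.walk, so the result is a lower bound.
    """
    total = 0
    for _dirpath, _dirnames, _filenames in os.walk(root):
        total += 1
    return total


def watches_fit(root, limit=8192):
    """True if one watch per directory under root stays within limit."""
    return count_directories(root) <= limit


if __name__ == "__main__":
    import sys
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    n = count_directories(root)
    print("%d directories under %s; fits in 8192 watches: %s"
          % (n, root, watches_fit(root)))
```

Even when the count does fit, the race described in the last bullet remains: a directory created after the walk still needs a watch and a rescan.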

So that isn’t going to work. As Jon points out, there are more uses for this than stopping your Linux box acting as a bedtime alarm clock every day; anti-virus people want it (and already use LSMs for the purpose), and smart indexing/backup tools could use it. OS X and Vista both have this kind of indexing service.

What’s really needed is a lightweight layer that sends notifications of filesystem events to userspace via netlink, such that userspace can do what it wants with them. Luckily for us, that’s exactly the patch that appeared on linux-kernel yesterday, courtesy of Yi Yang. This is a small and non-invasive patch (I think the relevant code-review phrase here is: “This is elegant, but correct.”) that does just what we need and no more. It hasn’t had a great reception so far, but I’d love to see it in mainline.

And now, I really should go to sleep. G’night!

Promotion

I feel compelled to point you all at my wife’s science-y blog, ’cause it’s far more interesting than mine. Without further ado: Tipping the Spherical Cow.

gnome-terminal

Following on from my last productivity post, here’s an area in which I’m not excited about the time-saving new features I’m using, but would love to be: gnome-terminal. I see that it just branched for 2.14, leaving us free to fantasize about new features for HEAD. Here’s my top three:

Local search through scrollback:

screen(1) lets you search through scrollback. I do so a lot. But I wish it was better:

  • There’s no incremental search, and matches aren’t highlighted as you type (as in an emacs search-forward or vim hlsearch), which means that you don’t know whether the next match is going to be on the line above or the page above until you hit return and look around to see where the cursor went.
  • The latency is on the wrong end; when working over ssh tunnels, or on a low-bandwidth connection, it’s painful to try and operate screen’s search, leaving me making typos and confused and accidentally falling out of copy mode and ending up at the tail of the terminal again. You’ve been there, you know what I’m talking about.
  • I don’t run the majority of my shells inside screen, and there’s no way to stick the contents of a terminal inside screen once you realise you need to search through it.

Personally, I think this is a pretty compelling argument for local, incremental search through scrollback. Anyone else?

Horizontally-stretchable views:

I work in terminals with a lot of text (e-mail) and a lot of program (make) output, and want to minimise paging up and down — if you do too, you probably try and resize your terminal such that it’s still around 80 cols, but is the height of most of your screen. That’s fine, but vertical screen space is expensive; you can only do that for a small number of terminals.

This got me thinking about extending the terminal horizontally, such that text scrolls off the right segment of a terminal and immediately onto the left segment, where each segment is 80×24 or whatever. For more details, see the feature request I filed in GNOME Bugzilla asking what the gnome-terminal maintainers think about this.

Display-independent terminals:

I don’t care about this feature as much, but it’s a logical step forward from the separation of model and view that the last feature requires. Your gnome-terminals could exist outside of the X server they were started from and be attached/detached from X servers at will, as you move around; this is already possible with some X applications. The use cases are keeping terminal state between work and home, or even more importantly for me: not having to worry about losing terminals when your X server restarts.

Does anyone think any of these have potential?

Also, hello to Planet GNOME! You might be interested in my previous posts on zsh/ssh/emacs and Linux on the Treo 650. If anyone’s willing to come up with a Hackergotchi for me, there’s a photo here. Thanks!

Productivity

The main purpose of this post is to show off a link I found documenting some of the under-used features of emacs — Effective Emacs. (Thanks to Edward O’Connor’s blog for the link.)

I’ve been on an optimisation binge recently, making sure that I’m getting the best out of my editor and shell. I decided to document some of the features I’m using:

zsh:

  • Hostname completion based on the contents of your ~/.ssh/known_hosts file. This requires you to turn off HashKnownHosts (see below), and add the following to your ~/.zshrc:
       hosts=(${${${${(f)"$(<$HOME/.ssh/known_hosts)"}:#[0-9]*}%%\ *}%%,*})
       zstyle ':completion:*:hosts' hosts $hosts
  • Remote filename completion over ssh, which works wonderfully with public key auth, remote host completion, and the ssh ControlMaster tip below. This is enabled by default; an example use is below, with the bold characters written by tab presses rather than by my keyboard directly:

    % scp foo.html printf.net:public_html/index.html

  • Colour matches in grep results (in green):

    export GREP_COLOR='01;32'
    alias grep='grep --color'
    
  • pushd: Few people seem to use directory stacks in their shell. After enabling auto_pushd as below, you can quickly popd back to the last directory you were in (and popd again for the directory before that, etc), or use dirs to see the stack of past directories that you can cd to using cd ~n, where n is the number given for that directory. To enable:

    setopt auto_pushd
    
  • The <() construct lets you avoid having to use temporary files as arguments to commands, like so:

    diff -y <(wc -l 1/*) <(wc -l 2/*)
    
  • zsh's cd has a useful two-argument form, cd old new, which replaces old with new in the path of the current directory and changes to the result:

    ~/dir % ls
    foo1  foo2  foo3  foo4  foo5  foo6
    ~/dir % cd foo5
    ~/dir/foo5 % cd 5 6
    ~/dir/foo6 %
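
In case the substitution in that last tip isn't obvious from the transcript, here is a tiny model of it. This is a sketch of the behaviour as I understand it, not zsh's implementation: the function name and the example home directory are made up, and I'm assuming that only the first occurrence of the old string is replaced.

```python
def cd_replace(cwd, old, new):
    """Return the directory that a zsh-style `cd old new` would try to enter."""
    if old not in cwd:
        raise ValueError("string not in pwd: %s" % old)
    # Assumption: only the first occurrence of old is substituted.
    return cwd.replace(old, new, 1)


if __name__ == "__main__":
    # /home/user is a hypothetical expansion of ~ in the transcript above.
    print(cd_replace("/home/user/dir/foo5", "5", "6"))
```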
    

ssh:

  • Modern ssh can hash the known_hosts file (some distributions turn this on by default) so that someone who breaks into your account doesn't get a list of where they might be able to go next. This is sensible, but breaks the hostname completion above, so I turn it off in ~/.ssh/config:

    HashKnownHosts no
    
  • New (4.0+) versions of OpenSSH have support for multiplexing several shells over a single ssh connection; this means that the second time you type ssh host, the first (already established) connection is used and told to spawn a new shell, making your new shell appear immediately instead of in a few seconds. This cuts login time for a new shell from 1.891s to 0.267s on my work machine. It also speeds up anything that uses a single ssh session per file such as bash/zsh remote filename completion (see above), or rsync/darcs/svn/etc over ssh. To enable, in ~/.ssh/config:

    ControlMaster auto
    ControlPath /tmp/%r@%h:%p
    

I have a few annoyances with ControlMaster — let me know if you know of a clean way to have the first connection for each host be created as a background process without a tty so that it can't easily be killed by accident.

emacs:

  • I use tramp to edit files remotely — this has the same host and filename completion as zsh, and is insanely useful for me; some of the machines I use at work are an ssh tunnel away (meaning high latency) and don't have emacs or vim installed (only vi, meaning no syntax highlighting). Setting up tramp is easy:

    (require 'tramp)
    C-x C-f /somehost:some/dir/and/file RET
    
  • I use gnus (see also my.gnus.org) for mail, news and RSS reading, and think it's the best mailer ever.

  • I use ERC as an IRC client.

My dotfiles are available online.

Treo 650 Linux

I’ve just taken advantage of the massively impressive work performed by Shadowmite and others and installed Linux on Mad’s Treo 650. Here’s a photo:

Plenty more on my Flickr page.

I used Luke’s instructions, but couldn’t get the supplied watch-and-upload script working; it wasn’t seeing the Treo’s responses, so I ended up (after verifying that the connection was okay with minicom) just pasting in:

    cat ./codeupload.load > /dev/ttyUSB0
    ./sendimage ./zImage 0xa0800000 > /dev/ttyUSB0
    ./sendimage ./initrd.gz 0xa1500000 > /dev/ttyUSB0
    ./sendimage ./linuxupload.bin 0xa1d00000 > /dev/ttyUSB0
    cat ./go.bin > /dev/ttyUSB0

.. and hitting space on the Treo to boot Linux.

Everything but the GSM radio is working now; it’s hoped that that’ll be on the way soon, as it presents itself as a simple AT command-set modem over a UART.