Fun with graphics drivers —
It all started, as most things in the universe have, with a slight popping noise and a bad smell.
One of my coworkers sent me a message last weekend, while I was at home, pointing out that my desktop machine at work smelled like it was burning. After discussing the merits of turning off computers emitting burning smells before chatting with their owners about it, we had a look inside and found this:
The leftmost set of capacitors is fine, the set to the left of the fan is not. It’s actually a very well-controlled demolition; an electrolytic capacitor has boiled and blown out the top of its cap, which is perforated to enable exactly this failure method, and then once one cap is gone the rest are obliged to follow suit ‘cause the current on them increases in response. So, anyway, this left me in the market for a new graphics card.
The card in the picture is an nVidia 7300, which I bought two years ago because it was the only cheap card that could handle the dual-link DVI required for my 30” display with free drivers. Two months ago, though, Dave Airlie committed support for dual-link TMDS on ATI Avivo cards. These cards had historically been the worst of the worst for drivers—the free drivers were left nowhere without any specifications, and even the ATI binary driver for Linux took a long time to add support. Thanks largely to AMD’s latest NDA-less specification drops, though, we’re well on the way to a free accelerated driver for these cards, so I bought a X1600+ from buy.com.
When I booted X on it with the radeon driver, though, it looked craptastic. Since I like hacking on drivers, and since I didn’t have much of a choice, I thought I’d try to figure out what the problem was. The rest of this post is about what I tried during the subsequent bonding exercise between me and the registers on my new graphics card.
First, I asked around on IRC, and learned that it isn’t a known problem and that obtaining a register dump might be a good idea. You can get the “avivotool” that does this:
git clone git://people.freedesktop.org/~airlied/radeontool
git checkout -b avivo origin/avivo
I made the register dumps, but nothing was standing out. Then I lucked out: I tried the fglrx (proprietary) driver to see whether the problem happened there—it didn’t, and furthermore it didn’t happen on the radeon driver after fglrx had run. This suggests that fglrx is setting useful registers that radeon doesn’t even know enough about to reset when it starts.
The next step is to diff register dumps with broken-radeon and post-fglrx working-radeon. This got >100 differently-set registers. Dave Airlie gave me a basic explanation of the register layout: under 0×1000 and over 0×6000 are setup, and the rest are mostly the 2D acceleration space.
Manually setting the <0x1000 space to the working values didn’t come up with anything, though, and trying every register by hand was getting tedious. I wrote a perl script that takes two register dumps, assumes one is “good” and one “bad”, and offers to set each differently-set register to the “good” state, pausing for a second for you to inspect the screen output for a fix inbetween each.
And, eventually, it got it right—0×6590 (which radeon_reg.h knows to be
AVIVO_D1SCL_SCALER_ENABLE ) and 0×6594 are both set when the card boots, but unset by the fglrx driver, and unsetting them while running radeon results in a fixed screen image. We can do this simply in the driver:
At this point, Alex Deucher (who works at AMD/ATI on open-source driver development—awesome!) stepped in and explained that the scaler code doesn’t work yet, and the driver should really turn it off when it isn’t being asked for. So now we do.
Here’s a photo of the working result. Planet Emacsers will be pleased to see a ridiculous amount of emacs in it.