Fun with graphics drivers

It all started, as most things in the universe have, with a slight popping noise and a bad smell.

One of my coworkers sent me a message last weekend, while I was at home, pointing out that my desktop machine at work smelled like it was burning. After discussing the merits of turning off computers emitting burning smells before chatting with their owners about it, we had a look inside and found this:

The leftmost set of capacitors is fine, the set to the left of the fan is not. It’s actually a very well-controlled demolition; an electrolytic capacitor has boiled and blown out the top of its cap, which is perforated to enable exactly this failure method, and then once one cap is gone the rest are obliged to follow suit ‘cause the current on them increases in response. So, anyway, this left me in the market for a new graphics card.

The card in the picture is an nVidia 7300, which I bought two years ago because it was the only cheap card that could handle the dual-link DVI required for my 30” display with free drivers. Two months ago, though, Dave Airlie committed support for dual-link TMDS on ATI Avivo cards. These cards had historically been the worst of the worst for drivers—the free drivers were left nowhere without any specifications, and even the ATI binary driver for Linux took a long time to add support. Thanks largely to AMD’s latest NDA-less specification drops, though, we’re well on the way to a free accelerated driver for these cards, so I bought a X1600+ from buy.com.

When I booted X on it with the radeon driver, though, it looked craptastic. Since I like hacking on drivers, and since I didn’t have much of a choice, I thought I’d try to figure out what the problem was. The rest of this post is about what I tried during the subsequent bonding exercise between me and the registers on my new graphics card.

First, I asked around on IRC, and learned that it isn’t a known problem and that obtaining a register dump might be a good idea. You can get the “avivotool” that does this:

git clone git://people.freedesktop.org/~airlied/radeontool
git checkout -b avivo origin/avivo

I made the register dumps, but nothing was standing out. Then I lucked out: I tried the fglrx (proprietary) driver to see whether the problem happened there—it didn’t, and furthermore it didn’t happen on the radeon driver after fglrx had run. This suggests that fglrx is setting useful registers that radeon doesn’t even know enough about to reset when it starts.

The next step is to diff register dumps with broken-radeon and post-fglrx working-radeon. This got >100 differently-set registers. Dave Airlie gave me a basic explanation of the register layout: under 0×1000 and over 0×6000 are setup, and the rest are mostly the 2D acceleration space.

Manually setting the <0×1000 space to the working values didn’t come up with anything, though, and trying every register by hand was getting tedious. I wrote a perl script that takes two register dumps, assumes one is “good” and one “bad”, and offers to set each differently-set register to the “good” state, pausing for a second for you to inspect the screen output for a fix inbetween each.

And, eventually, it got it right—0×6590 (which radeon_reg.h knows to be AVIVO_D1SCL_SCALER_ENABLE ) and 0×6594 are both set when the card boots, but unset by the fglrx driver, and unsetting them while running radeon results in a fixed screen image. We can do this simply in the driver:

OUTREG(0x6590, 0);
OUTREG(0x6594, 0);

At this point, Alex Deucher (who works at AMD/ATI on open-source driver development—awesome!) stepped in and explained that the scaler code doesn’t work yet, and the driver should really turn it off when it isn’t being asked for. So now we do.

Here’s a photo of the working result. Planet Emacsers will be pleased to see a ridiculous amount of emacs in it.

Comments

  1. Great!
    I hope the best for AMD/ATI open source drivers.

    Anyway, it’s funny; where I live, here in Argentina, that Nvidia video card would never go to the trash without changing the boiled capacitors and trying to take it back to life. :-)

    Regards,
    Marcelo

    Reply
  2. Hi Marcelo!

    > that Nvidia video card would never go to the trash without changing the boiled capacitors and trying to take it back to life. :-)

    You’re not the first person to say that to me. :) The fan died too, I think, but it should still be salvagable. I needed something to let me continue working again quickly, though.

    Reply
  3. Better check if you dont have some cheap(and sh****) power supply like Codegen, L&C(known also as Tracer/LC/Megabajt), rubikon, etc, because 90% of the hardware failures are caused by them.

    Reply
  4. > I’d heard EVGA was actually a pretty good brand for nVidia cards, and had good returns policy – kind of surprising.

    I don’t have any complaints about EVGA. I haven’t tried to return the card, since it’s presumably out of warranty, and it looks like the problem is that the fan got clogged (after more than a year running in the same machine) and failed.

    Reply
  5. I had the exact same model of evga video card die on me in a very similar way as yours. The fan died and then the capacitors popped. Strange how that is.

    Reply
  6. Hi Justin,

    Yeah, the card I replaced the eVGA one with has died in a similar way since this blog post; I’m left thinking that graphics cards just aren’t intended to last more than a year.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>