ken at linuxfromscratch.org
Fri Jun 8 15:56:30 PDT 2007
On Fri, Jun 08, 2007 at 11:23:19PM +0200, Tijnema wrote:
> It can build GCC fine, I've built a complete GCC package (including
> ada, fortran,...) lately, I just didn't run GCC testsuite completely,
> but compiling and installing works fine, and gcc works fine now :)
Oh, I misunderstood. Testing should be easier than compiling: a
lot of the time the box sits fairly idle during tests.
> I can swap psu of the thing, but that's it, I don't have replacement
> parts for such old PC... I mean it's socket A with SD Ram...
> Oh yeah, I can replace GFX card and LAN card etc, but would that
> matter? I don't think so...
I've known memory to fail, particularly PC100/PC133 SDRAM, but you
said you had run memtest. Unrelatedly, I was idly reading a
magazine this afternoon which showed how to apply thermal paste, and
showed an Athlon XP in the picture (the answer, as you probably
know, is "use as little as possible").
I suppose you could try lm_sensors. Build _all_ the possible
sensors as modules, let the package try them out and hope it doesn't
lock up the box modprobing isa etc, then see if the results (if any)
make any sense - often, you need a degree in the black arts to get
the configuration correct, but I suppose you could keep an eye on
indicated voltages and temperatures while running a test.
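
As a rough sketch of "keeping an eye on" the readings while a test
runs: the script below assumes a kernel that exposes the sensor
drivers under /sys/class/hwmon with temp*_input files in millidegrees
C and in*_input files in millivolts. That layout is an assumption
(some kernels put the files elsewhere), and the function names and the
log file name are only illustrative. Each sample is fsync'd so the
last line has a chance of surviving a lockup.

```python
import glob
import os
import time


def read_hwmon(base="/sys/class/hwmon"):
    """Return {"chip/file": value} for temp*/in* inputs found under base.

    Assumes the usual hwmon units: millidegrees C and millivolts,
    so every raw value is divided by 1000.
    """
    readings = {}
    for path in sorted(glob.glob(os.path.join(base, "hwmon*", "*_input"))):
        name = os.path.basename(path)
        if not name.startswith(("temp", "in")):
            continue  # skip fans etc.
        try:
            with open(path) as f:
                raw = int(f.read().strip())
        except (OSError, ValueError):
            continue  # sensor not readable or not numeric
        chip = os.path.basename(os.path.dirname(path))
        readings[chip + "/" + name] = raw / 1000.0
    return readings


def log_sensors(logfile, interval=5.0, count=None, base="/sys/class/hwmon"):
    """Append one timestamped line per sample; count=None means forever."""
    taken = 0
    with open(logfile, "a") as out:
        while count is None or taken < count:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            out.write("%s %s\n" % (stamp, read_hwmon(base)))
            out.flush()
            os.fsync(out.fileno())  # push it to disk before a possible lockup
            taken += 1
            time.sleep(interval)
```

Something like log_sensors("sensors.log") left running in another
terminal during the build would do; whether the numbers mean anything
still depends on getting the sensors configuration right.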
> And how sure are you that it is a hardware problem, and not a software
> problem? Maybe a specific number of simultaneous threads?
I'm not especially sure, but your hardware is at least common and
in wide use, so I'm fairly confident it doesn't have too many weird
bugs. I don't think the number of threads will make much difference
- a non-SMP machine can schedule exactly one process at a time (and
you'll have things like X, xterm, kswapd, kjournald kicking in even
when it's nominally idle, so it is always switching processes),
although something like CONFIG_HZ_1000 would probably increase the
overhead and hence the load.
However, you said you had to underclock it to get it to boot. That
certainly sounds like a hardware problem.
If you can be bothered to try the things that usually lock it up,
you could have 'top' running at the same time to see what sort of
load average and swap usage you have.
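
If the box locks up hard, whatever 'top' was showing is lost with it,
so a possible alternative is to log the same figures to disk. This is
only a sketch: it assumes a Linux /proc with the usual /proc/loadavg
and /proc/meminfo formats, and the function names and log file name
are made up for illustration.

```python
import os
import time


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_swap_used_kb(meminfo_text):
    """Return SwapTotal - SwapFree (in kB) from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("SwapTotal", "SwapFree"):
            fields[key] = int(rest.split()[0])
    return fields["SwapTotal"] - fields["SwapFree"]


def log_load(logfile, interval=5.0, count=None):
    """Append load average and swap use per sample; count=None means forever."""
    taken = 0
    with open(logfile, "a") as out:
        while count is None or taken < count:
            with open("/proc/loadavg") as f:
                load = parse_loadavg(f.read())
            with open("/proc/meminfo") as f:
                swap = parse_swap_used_kb(f.read())
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            out.write("%s load=%s swap_used_kb=%d\n" % (stamp, load, swap))
            out.flush()
            os.fsync(out.fileno())  # so the last sample survives a lockup
            taken += 1
            time.sleep(interval)
```

After a lockup, the tail of the file written by log_load("load.log")
shows roughly what the load and swap looked like just before it went.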
My personal experience is that shutdowns or reboots are caused by
(i) insane hardware (which doesn't apply to your case)
(ii) hardware failing or just wearing out
(iii) inadequate cooling
(iv) cosmic rays, or suchlike - I went through a period at the
start of the year when one of my x86_64 boxes would sporadically
reboot. Each time it was in X, so nothing made it to the logs, and
each time it happened to be running a 32-bit kernel. After taking
it to lkml, the problem eventually disappeared. In the meantime, a
separate thread on lkml strayed into the area of "there has been a
lot of cosmic ray (or alpha particle, or something - I no longer
remember) activity which can cause 'random' errors".
the first time as tragedy, the second time as farce