Mike.McCarty at sbcglobal.net
Mon Dec 14 14:56:51 PST 2009
Mykal Funk wrote:
> Rod Waldren wrote:
>> I've had a bad battery manifest problems in many ways, not just losing
>> time while powered down. Most recently I was having random problems and
>> odd instability with a system. It was rock solid after replacing the
>> battery. It's wasn't as bad as old Macs which completely lost their
>> minds if the battery was bad, but along similar lines. If it's bad
>> replace it or temporarily swap in a good one to see it it helps.
This indicates either a hardware problem, incompetent design, or
> I replaced the battery and the behavior didn't change. The time loss
That's what I would expect.
> occurs only under high load. When uptime reports a high load average,
> the system loses time like crazy. When it is just sitting doing nothing
> it keeps perfect time. I don't know what to make of it. Perhaps someone
> else can.
This sounds like dropped interrupts. I've co-designed, written,
and helped to write or support a few embedded RTOSes, and on one
of them we eventually had to include a "missed clock interrupt"
counter. If the RTI handler discovered it had been re-entered while
still processing, it incremented a counter and immediately
returned. This occurred on a small machine with little processing
power under the hood. We had app designers who insisted that some
of their work needed to be done at interrupt level, and over the
objections of the kernel crew (including me) they got their way.
Anyway, if you have people who think that all CPUs come with
an infinite supply of cycles per second, sometimes you will find
that you run out. The kernel crew's eventual way out was simply
to note the fact, and on the next interrupt where we hadn't
been re-entered (due to lots of apps hooking the interrupt chain)
we then processed the "missed ticks".
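
For anyone who wants to see the idea in code, here is a rough sketch
of that kind of missed-tick counter. It is not the code from that
RTOS; every name in it (rti_handler, hooked_work, missed_ticks and so
on) is made up for illustration, and it is a single-threaded
simulation in which "re-entry" is faked by calling the handler from
inside itself. A real handler would also need the flag and counters
to be updated safely with respect to interrupts.

```c
#include <assert.h>

/* Hypothetical names throughout; a simulation, not a real ISR. */
static int in_handler = 0;              /* nonzero while a tick is being processed */
static unsigned long missed_ticks = 0;  /* ticks noted while the handler was busy */
static unsigned long system_ticks = 0;  /* the kernel's idea of elapsed ticks */

void rti_handler(int nested_arrivals);

/* Stand-in for slow work hooked onto the interrupt chain: while it runs,
   nested_arrivals further clock interrupts arrive and re-enter the handler. */
static void hooked_work(int nested_arrivals)
{
    for (int i = 0; i < nested_arrivals; i++)
        rti_handler(0);
}

void rti_handler(int nested_arrivals)
{
    if (in_handler) {           /* re-entered: note the tick and return at once */
        missed_ticks++;
        return;
    }
    in_handler = 1;

    /* Not re-entered: count this tick plus any that were noted earlier. */
    system_ticks += 1 + missed_ticks;
    missed_ticks = 0;

    hooked_work(nested_arrivals);

    assert(in_handler == 1);    /* nobody else may clear the busy flag */
    in_handler = 0;
}

unsigned long ticks_counted(void) { return system_ticks; }
unsigned long ticks_pending(void) { return missed_ticks; }
```

If the handler is called once normally, once with two nested arrivals,
and once more normally, five interrupts have fired in total. The two
nested ones are only noted at first, and the final call folds them
back in, so no time is lost as long as a clean call eventually happens.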
It sounds like something in the kernel is holding interrupts
off too long, or is processing too long, so that there are
missing clock interrupts.
You might try reducing the real time clock interrupt rate, or trying
to select a more "real time" style scheduling strategy. I'm not
any sort of expert on the Linux kernel to know how to go about that,
but some others here may be able to assist. ISTR there is a way
to tune the maximum interrupt rate the kernel will program the
RTI for. For the PC, it's IRQ 8. ISTR there is a way to tell
the kernel to use a high-res timer, that is about 1000 interrupts
per second. If that's true, then there should also be a way to
cut it back.
You might investigate CONFIG_HZ and CONFIG_HIGH_RES_TIMERS.
Certainly, you'd want to turn the latter off, I think.
I searched for "linux clock interrupt rate" (no quotes
in the search, of course) and turned up some stuff which
looks somewhat confirmatory of my hypothetical cause.
This looks like it might have related information
may be the reason for the problem, 1000 interrupts per second
may be more than your hardware can support when the system
becomes loaded, and spends more time with interrupts disabled,
or at least in interrupt processing. Now, they are looking
at it from the other perspective, that is, increasing the
HW clock rate on the host machine relative to the guest.
"The 2.6 Linux kernel in SLES9 changes the amount of
interrupts it uses for clock ticks as compared to the 2.4
kernel in SLES8 from 100\second to 1000\second. A dual-
processor Linux 2.6 kernel can fire up to 3000\sec. This
is usually not an issue when running on a bare metal server. "
In your case, it may well be. So, figuring out how to make
your kernel revert to the pre-2.6 rate of 100 interrupts
per second may fix your problem. Anyone here who
knows how to do this is requested to chime in.
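
To put rough numbers on why the tick rate matters, here is a small
back-of-the-envelope sketch. It rests on a simplifying assumption of
mine, not on anything measured on your machine: while interrupts are
held off, the hardware remembers only one pending clock interrupt, and
any further ticks that come due in the same stretch are lost. The
function names are hypothetical.

```c
/* Simplified model (an assumption, not measured behaviour): during a
   stretch of blackout_us microseconds with interrupts held off, one
   pending tick is latched and delivered late; the rest are lost. */
unsigned long ticks_lost(unsigned long hz, unsigned long blackout_us)
{
    if (hz == 0)
        return 0;                               /* no tick rate, nothing to lose */

    unsigned long period_us = 1000000UL / hz;   /* time between ticks */
    if (period_us == 0)
        return 0;                               /* rate too high for this model */

    unsigned long due = blackout_us / period_us; /* ticks that come due */
    return due > 1 ? due - 1 : 0;               /* all but the latched one vanish */
}

/* Wall-clock time the system falls behind for one such stretch. */
unsigned long time_lost_us(unsigned long hz, unsigned long blackout_us)
{
    if (hz == 0)
        return 0;
    return ticks_lost(hz, blackout_us) * (1000000UL / hz);
}
```

Under that model, a 3.5 ms stretch with interrupts off costs two ticks
(2 ms of clock time) at 1000 ticks per second, but nothing at all at
100 ticks per second, because the stretch is shorter than one 10 ms
tick period. That is the reasoning behind trying the lower rate.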
Oppose globalization and One World Governments like the UN.
This message made from 100% recycled bits.
You have found the bank of Larn.
I speak only for myself, and I am unanimous in that!