qrux.qed at gmail.com
Wed Feb 15 18:47:37 PST 2012
On Feb 15, 2012, at 5:00 PM, Bruce Dubbs wrote:
> Qrux wrote:
>> Is there a reason ntpd is run with -x?
>> The big slew is nice, but is there a reason it's preferred over the kernel discipline?
> When you are booting, there is probably nothing else really depending on
Whether the things that run at boot time are sensitive to timestamps is irrelevant, because if nothing cares, then it doesn't matter whether you step or slew.
It also wasn't the question I was asking. I run ntpd in daemon mode, because I want it to keep correcting my time after boot, and that's where the slewing/stepping behavior is relevant. From the man page for ntpd, about -x:
"Slew up to 600 seconds.
"Normally, the time is slewed if the offset is less than the step threshold, which is 128 ms by default, and stepped if above the threshold...Since the slew rate of typical Unix kernels is limited to 0.5 ms/s, each second of adjustment requires an amortization interval of 2000s."
If the kernel slew rate is limited to 0.5 ms/s, then your clock had better not drift by more than ~43 seconds/day (0.5 ms/s x 86400 s/day), because no amount of slewing will correct anything beyond that. So, to me, this is kind of silly. Turning the slew limit up to 600 s is kinda meaningless, unless you can also adjust the slew rate (and I don't see any mention of a kconfig parameter to change that). I would bet that a 43 s/d drift is rare on "reasonably current hardware", and that if you're seeing it, you're doing something silly like chaining UPSs or keeping your PCs in a bad thermal environment (clock oscillators are very sensitive to temperature).
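For anyone who wants to check the ~43 s/day figure, here is a minimal sketch of the arithmetic. It assumes the 0.5 ms/s slew limit quoted from the man page above; the function names are mine, purely for illustration, and not part of ntpd.

```python
# Maximum clock drift that slewing alone can absorb, assuming the
# 0.5 ms/s kernel slew limit quoted from the ntpd man page.
MAX_SLEW_RATE = 0.0005          # seconds of correction per second of real time
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def max_correctable_drift_per_day(slew_rate=MAX_SLEW_RATE):
    """Seconds of drift per day that a constant maximum slew can cancel."""
    return slew_rate * SECONDS_PER_DAY


def slew_keeps_up(drift_per_day, slew_rate=MAX_SLEW_RATE):
    """True if a clock drifting this fast can still be held by slewing."""
    return abs(drift_per_day) <= max_correctable_drift_per_day(slew_rate)


if __name__ == "__main__":
    print(f"slew budget: {max_correctable_drift_per_day():.1f} s/day")
    for drift in (5.0, 60.0):  # hypothetical drift rates, in s/day
        print(f"{drift:5.1f} s/day drift, slew keeps up: {slew_keeps_up(drift)}")
```

A clock that drifts faster than that budget falls further behind no matter how high the -x threshold is set.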
* Most people probably don't drift by more than 43 s/d.
* If they did, -x isn't their solution; it just hides a bigger issue.
* Kernel discipline (which -x disables) handles leap-seconds better.
* So, I propose turning -x off.
In addition, the BLFS ntpd is also run with -g. Long story short, it's better to step "while you can" (i.e., before anything time-sensitive starts, like your application stack with database servers or network authentication servers like LDAP or Kerberos). In fact, the kernel does it anyway, when it loads the "reference time" from the CMOS RTC.
The last leap-second was in 2008. The next leap-second was originally scheduled for June 2012. I heard back in Jan that it might be postponed. Either way, I think '-x' should not be the default.
* * * Additional Info * * *
Getting back to -x...I guess slewing is fine if you really need a slew of up to 600 seconds, and you have the kernel support to do it. But, why choose that as the default over kernel discipline? The situation where -x would benefit you would be the most rare of situations, where you either have a one-time error and could afford a 14d slew (600 s at 2000 s per second of offset), or you see this kind of drift often enough and could adjust the kernel slew rate to deal with it. In fact, if your system needs -x, you probably don't care about good time anyway--or shouldn't be depending on it. If you need to run ntpd with -x, you probably have bigger fish to fry, first.
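The "14d slew" number comes from the same 0.5 ms/s limit. A rough sketch, again assuming that rate from the man page and a constant maximum-rate slew (the helper names are hypothetical):

```python
# How long a pure slew takes to absorb a given offset, assuming the
# kernel slews at a constant 0.5 ms/s (the limit quoted in the man page).
MAX_SLEW_RATE = 0.0005  # seconds of correction per second of real time
SECONDS_PER_DAY = 86400


def amortization_seconds(offset, slew_rate=MAX_SLEW_RATE):
    """Real time (s) needed to slew away `offset` seconds of error."""
    return abs(offset) / slew_rate


def amortization_days(offset, slew_rate=MAX_SLEW_RATE):
    """Same as amortization_seconds(), expressed in days."""
    return amortization_seconds(offset, slew_rate) / SECONDS_PER_DAY


if __name__ == "__main__":
    # 128 ms is the default step threshold; 600 s is the -x threshold.
    for offset in (0.128, 1.0, 600.0):
        print(f"{offset:7.3f} s offset: {amortization_seconds(offset):10.0f} s "
              f"({amortization_days(offset):.2f} days)")
```

So one second of offset costs about 2000 s, and the full 600 s allowed by -x costs just under 14 days.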
Time discipline is about who gets to discipline the clock, and how. NTP can do it through adjtime()--with microsecond resolution--or through adjtimex() which allows much higher precision (at the cost of portability, since, AFAIK, that system call is only available on Linux & FreeBSD). The latter (adjtimex) requires kernel support. In addition to precision (though, at the cost of slightly lower accuracy due to less algorithmic sophistication), there is another benefit to kernel discipline...
There is a time when kernel discipline is better than "always slewing by the default slew rate limit": during leap seconds. During a leap-second, using a non-kernel discipline and a slow monotonic slew, time will go forward a little bit faster, but the correction accumulates very slowly. Which means, using -x, the full extent of that leap second won't be registered until 2000 seconds later. Practically speaking, it comes down to: when the next leap second hits, do you want to be off by over half-a-second for a period of 1000 s, or would you rather have each timestamp be off by a few dozen microseconds (arguably the difference between the higher-accuracy NTP discipline and the kernel's own slightly-less-than-fanatically-accurate discipline)? Using the kernel discipline, which can overcome the default slew rate, the leap second will be registered very nearly immediately.
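To put numbers on the -x case, here is a simplified model of the leftover error after a leap second. It assumes a linear slew at the 0.5 ms/s limit and nothing else going on; real ntpd behavior is more involved, and the function names are only illustrative.

```python
# Simplified model: error remaining t seconds after a 1 s leap-second
# offset appears, if it is corrected only by slewing at 0.5 ms/s.
MAX_SLEW_RATE = 0.0005  # seconds of correction per second of real time


def residual_error(t, offset=1.0, slew_rate=MAX_SLEW_RATE):
    """Error (s) still uncorrected t seconds after the slew begins."""
    return max(0.0, offset - slew_rate * t)


def time_until_error_below(limit, offset=1.0, slew_rate=MAX_SLEW_RATE):
    """Seconds of slewing needed before the error drops to `limit`."""
    return max(0.0, (offset - limit) / slew_rate)


if __name__ == "__main__":
    for t in (0, 500, 1000, 1500, 2000):
        print(f"t = {t:4d} s: still off by {residual_error(t):.3f} s")
    print(f"error exceeds 0.5 s for the first "
          f"{time_until_error_below(0.5):.0f} s")
```

In this model the clock is off by more than half a second for the first 1000 s and only fully catches up at 2000 s.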
I would think leap-second-correctness > possibly-absurdly-high-accuracy-that-may-not-matter. You could reframe my original question as: "Why is the BLFS default choice to opt for a possibly-more-accurate time in place of a more-correct-time?" The NTP slewing is maybe more sophisticated than the kernel's. But, it won't handle leap-seconds as well.