[blfs-dev] NTP

Qrux qrux.qed at gmail.com
Thu Feb 16 14:13:40 PST 2012

On Feb 16, 2012, at 4:38 AM, Matthew Burgess wrote:

> On Thu, 16 Feb 2012 11:16:12 +0000, Andrew Benton <b3nton at gmail.com> wrote:
>> On Wed, 15 Feb 2012 18:47:37 -0800
>> Qrux <qrux.qed at gmail.com> wrote:
>>> 	* So, I propose turning -x off.
>> I agree, I run ntpd -g
>> However, I also think the ntpd bootscript will work fine for most
>> people and for those (like me) who think it should be done differently
>> it's trivial to edit the bootscript; your distro, your rules and all
>> that ;)
> It probably doesn't affect many LFSers, but Oracle's RAC installation/
> configuration wizard explicitly checks for '-x' in the ntpd options.
> It does this because you really don't want your database server's time
> from jumping backwards, and '-x' (or 'tinker step 0' in /etc/ntp.conf)
> is the only way to guarantee that won't happen.

Interesting!  Sounds like Oracle...

As for the issue--I still stand by my original position that defaults should be sensible, and obey "least surprise".  Running NTP by default with -x is surprising.  I'll leave the 'why' below the fold.


* * *

Technical details follow, for those who are jumping up and down saying:

	"My app cares about time!  So, running NTP with -x protects me!"

In case anyone has forgotten, NTP gives slewing by default.  The question is not whether monotonically increasing time is good.  You get that with OR WITHOUT -x.  The issue is, -x doesn't guarantee anything.  Man page:

	"-x, --slew: Slew up to 600 seconds.  Normally, the time is slewed...and stepped if above the threshold.  This option sets the threshold to 600 s, which is well within the accuracy window to set the clock manually."

It simply raises the threshold to 600s, from 128ms.  And, in cases where your clock is drifting by more than 10 minutes in the polling interval (and you're saying your app cares about time?) then it wants YOU TO MANUALLY ADJUST THE TIME, before running NTP again.  I want to see you do that by hand, and keep things monotonically increasing, especially if you drifted forward.  I know...you'll shut down your production machine until those 600 s have elapsed, right?  And, in that same situation where you've drifted beyond 600s, if you combine -x with -g, you simply get a big step that doesn't shut down ntpd--but, the point is, you get a STEP.  Lacking the -g, ntpd simply stops itself.
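
To make those cases concrete, here is a rough Python sketch of the decision as I've described it.  This is a toy model of the behavior in this post, not ntpd's actual code: the function name ntpd_action and the opt_x/opt_g parameters are made up for illustration, and it only covers the cases discussed here.

```python
# Toy model of the slew/step/exit decision as described in this post.
# NOT ntpd's real logic; all names here are hypothetical.

DEFAULT_STEP_THRESHOLD_S = 0.128   # 128 ms, the default threshold per the text
SLEW_OPTION_THRESHOLD_S = 600.0    # what -x raises the threshold to


def ntpd_action(offset_s, opt_x=False, opt_g=False):
    """Return 'slew', 'step', or 'exit' for a measured offset in seconds."""
    magnitude = abs(offset_s)
    if not opt_x:
        # Default: offsets up to the threshold are slewed, larger ones stepped.
        return "slew" if magnitude <= DEFAULT_STEP_THRESHOLD_S else "step"
    if magnitude <= SLEW_OPTION_THRESHOLD_S:
        # With -x, everything up to 600 s is slewed.
        return "slew"
    # Past 600 s, -x no longer helps: step if -g is given, otherwise quit.
    return "step" if opt_g else "exit"


if __name__ == "__main__":
    for offset, x, g in [(0.05, False, False), (0.5, False, False),
                         (0.5, True, False), (700.0, True, True),
                         (700.0, True, False)]:
        print(offset, x, g, ntpd_action(offset, opt_x=x, opt_g=g))
```

Note the last two cases: past 600 s you end up with either a step or a dead ntpd, which is the whole point.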

I, too, care about time in my apps.  So, I've looked into it.  And, in the little I know, -x protects nothing.  People spend all kinds of time worrying about various other minutiae (MTBF of hard drives, vibration in their systems causing bad feedback on platters, dual-redundant power supplies, etc, etc, etc) and they want absolutely order-dependent mission-critical applications to depend on the same technology that powers their Timex from 1982?  No.  Real apps that *really* care about time go out of their way to make sure their time hardware is as good as anything else.  They get crystal clocks enclosed inside a temperature-controlled, vibration-dampened enclosure with electronic conditioning. And, if they're careful, they use the CO as a *counter*, not as a *clock*.  Monotonicity is about counting ticks on a counter, not getting time from a clock.

So, -x is not a guarantee.  It's a stop-gap, for when your clock (or the environment around your clock) is failing miserably.  If you're in a situation where you're drifting for more than 600 seconds in a single polling interval, NTP is going to step you anyway, forward or back.  Or, it will simply quit.  And let you do it.  At which point...What happens in your situation?  You shut down your high-volume production machine because you lost access to your timesource?

Plus, this is completely missing the point.  It's not about whether or not slewing is good.  It's about choosing between:

	* (A) slew beyond 128ms drift

	* (B) using a kernel discipline

The issue is, if you care about timekeeping (Oracle default installs don't give a flying crap), you don't let your clock drift more than (and I'm averaging here), 43 seconds/day.  Why 43?  NTP already keeps monotonically increasing time by slewing single deltas less than 128ms--and that all happens without -x.  43 seconds is simply the aggregate of the total number of 128ms drifts that NTP can correct BY DEFAULT (i.e., without -x) in a given day.  The arithmetic--if you accept the fact that the "typical Unix slew rate is limited to 0.5 ms/s", a 128ms drift will take 256 seconds to amortize.  So, if you lose less than 128ms every 256 seconds, that's fine, because THE DEFAULT SLEWING WILL TAKE CARE OF YOU.  And, 128ms every 256 seconds totals to 43 seconds per day (86400 s x 0.5 ms/s = 43.2 s).  And, up to that amount of drift, the default slew will take care of it.
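
For anyone who wants to check that arithmetic, here is a quick back-of-the-envelope sketch in Python.  The constant and function names are mine, and the 0.5 ms/s figure is just the "typical" slew rate quoted above, not something read out of ntpd.  I use integer microseconds to keep the numbers exact.

```python
# Back-of-the-envelope check of the slew arithmetic.
# Assumption: a fixed maximum slew rate of 0.5 ms/s (500 us per second).

SLEW_RATE_US_PER_S = 500        # 0.5 ms/s, the "typical Unix slew rate"
STEP_THRESHOLD_US = 128_000     # 128 ms, the default step threshold
SECONDS_PER_DAY = 86_400


def seconds_to_slew(offset_us, rate_us_per_s=SLEW_RATE_US_PER_S):
    """Seconds needed to slew away an offset at the maximum slew rate."""
    return abs(offset_us) / rate_us_per_s


def daily_slew_budget_s(rate_us_per_s=SLEW_RATE_US_PER_S):
    """Total drift (in seconds) that slewing alone can absorb in one day."""
    return SECONDS_PER_DAY * rate_us_per_s / 1_000_000


if __name__ == "__main__":
    print(seconds_to_slew(STEP_THRESHOLD_US))   # 256.0
    print(daily_slew_budget_s())                # 43.2
```

So the budget is about 43 seconds of accumulated drift per day, all of it absorbed without -x.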

There is an exception, which is where you get single drifts in a polling interval past 128ms.  The default maximum polling interval is 1024 s.  Which means your clock would have to have a stability worse than 1 part in 8000 (128 ms / 1024 s, i.e. 125 PPM).  Crystal clocks themselves have accuracies specified in PPMs, and the error is caused mostly by temperature and electronic variances (ambient temperature and power supply).  If you want to see clock skew, chain 2 UPSes, and run your PC off that.
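
The same kind of quick check works for the "1 part in 8000" number.  Again, this is only a sketch with names I made up; it assumes the 128 ms threshold and the 1024 s maximum polling interval mentioned above.

```python
# How bad must a clock be to drift past the step threshold in one poll?
# Assumptions: 128 ms step threshold, 1024 s maximum polling interval.

STEP_THRESHOLD_US = 128_000     # 128 ms expressed in microseconds
MAX_POLL_INTERVAL_S = 1024      # default maximum polling interval


def drift_ppm_to_exceed(threshold_us, poll_interval_s):
    """Drift rate in PPM (us per second) that reaches the threshold in one poll."""
    return threshold_us / poll_interval_s


if __name__ == "__main__":
    ppm = drift_ppm_to_exceed(STEP_THRESHOLD_US, MAX_POLL_INTERVAL_S)
    print(ppm)                  # 125.0
    print(1_000_000 / ppm)      # 8000.0, i.e. 1 part in 8000
```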

	So, again: -x is not a guarantee...

	* ...and, it's trading off kernel discipline...

	* ...in a situation that probably never needs -x.

The argument that "slew is good, we want it always" is...completely backwards.  If you care about high-precision slewing, you'd want it in the kernel and you would look into things like the nanokernel patch, etc.  Which means, you definitely want kernel discipline.  And if you *need* -x, what you actually need is a better motherboard and better environmental controls, since temperature and power have direct effects on the clock's error.

So, getting back to your RAC system...Sure, it can check for it.  But let's hope your database app doesn't stop operating when you can't find a timesource.  True high-volume systems that require absolutely monotonic time don't mess around with NTP as a dependency for their database--they use NTP only to condition the system's wallclock.  They can use the POSIX methods clock_gettime(2) with CLOCK_MONOTONIC* clocksources.  That's what stuff like the POSIX 1003.1b real-time specs are for.  They might pin themselves to a CPU and reach down and access the hardware clocks (TSC, HPET) to get a monotonic timestamp which they know will be increasing.  Or, they simply create one actual monotonic timesource, and access its time.  Sure, you might lose accuracy w.r.t walltime if that's a bottleneck, or you might lose some performance.  But, when order matters...it matters.
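
As a small illustration of the clock_gettime(2) route, here is a sketch in Python, which wraps the same POSIX call in its standard time module.  The helper names (monotonic_ns, wallclock_ns, elapsed_ns) are mine, and this is only a sketch of the idea, not how any particular database does it.

```python
import time


def monotonic_ns():
    """Read a monotonic clock in nanoseconds; never stepped backwards by ntpd."""
    if hasattr(time, "clock_gettime_ns") and hasattr(time, "CLOCK_MONOTONIC"):
        # Thin wrapper around clock_gettime(2) with CLOCK_MONOTONIC.
        return time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    # Portable fallback on platforms without clock_gettime.
    return time.monotonic_ns()


def wallclock_ns():
    """Read wallclock time in nanoseconds; this one CAN be stepped."""
    return time.time_ns()


def elapsed_ns(work):
    """Time a callable using only the monotonic clock."""
    start = monotonic_ns()
    work()
    return monotonic_ns() - start


if __name__ == "__main__":
    first = monotonic_ns()
    second = monotonic_ns()
    print("monotonic never goes backwards:", second >= first)
    print("wallclock (for humans):", wallclock_ns())
```

The application orders things with the monotonic reading and keeps the wallclock reading only for display, so whatever NTP does to the wallclock can't reorder anything.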

Monotonic clocks are wonderful.  And, if you cared about monotonicity, you might look into one of the monotonic timezones.

But, frankly, it's not the 90% use-case of NTP, which is to keep as-good-as-possible wallclock time.  NTP keeps UTC time--which supports stuff like leap seconds.  Even UTC doesn't give a rat's ass about monotonic time.  And, that has nothing to do with highly order-dependent application stacks.  Think about a high-availability database system, which falls over to another physical machine when the first stops working.  Oracle (IDK about their RAC sub-product) certainly supports physical clustering.  How much would you trust your, let's say mutual fund company, to a system that can migrate not just to another core, or CPU, but to another physical machine?  You want -x slewed timestamps to protect the ordering of events?  That's fairly...trusting.

Back when I was using Oracle, they made plenty of demands about wanting the tablespaces to be on raw devices.  Sure, that was about disks--but, in the context of absolutely order-dependent time application, I'm sure that wouldn't stop them from making demands--i.e., setting constraints--on CPUs and virtualization (e.g., needing to pin processes to CPUs, having access to the RDTSC family of instructions, etc)...if the consultants you hire knew anything about time.  In actual order-dependent systems, they don't care about wall-time.  Or, really, even, slewing.  Just causality.  Being able to pin ticks to timestamps is secondary, because most of where that matters is in human interpretation.  In those situations, order matters first, and the exact mapping to wallclock is secondary (which means, in most of those situations, millisecond-level error won't matter--when was the last time you got millisecond timestamps on your investment statements?).
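
To show what "order first, wallclock second" can look like, here is a minimal single-process sketch in Python.  Everything in it (the Event and EventLog names, the wall_clock parameter) is hypothetical and for illustration only; a real clustered system needs a lot more than this.

```python
import itertools
import threading
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int        # authoritative ordering: a plain counter
    wall_ns: int    # wallclock annotation, for human interpretation only
    payload: str


class EventLog:
    """Orders events by a counter; the wallclock is just a label."""

    def __init__(self, wall_clock=time.time_ns):
        self._lock = threading.Lock()
        self._next_seq = itertools.count(1)
        self._wall_clock = wall_clock
        self._events = []

    def record(self, payload):
        # The lock makes "take a sequence number and append" one atomic step.
        with self._lock:
            event = Event(next(self._next_seq), self._wall_clock(), payload)
            self._events.append(event)
            return event

    def in_order(self):
        with self._lock:
            return sorted(self._events, key=lambda e: e.seq)


if __name__ == "__main__":
    # Simulate a wallclock that gets stepped BACKWARDS between two events.
    stepped = iter([2_000_000_000, 1_000_000_000])
    log = EventLog(wall_clock=lambda: next(stepped))
    log.record("debit")
    log.record("credit")
    print([e.payload for e in log.in_order()])   # ['debit', 'credit']
```

Even with the wallclock jumping backwards between the two events, the order is preserved, because the order never depended on the wallclock in the first place.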

To go even further, it has been the source of some legal disputes (about exactly when certain transactions have occurred--I think the big cases are in Europe).  But, it's safe to say that the legislature hasn't caught up.  If you're concerned about the situation where you had to argue that your system uses a monotonic clock for transactions, but NTP for wallclock, and they may have disagreed by several microseconds, you have issues that transcend "ntpd -x".

