 from Alice's Adventures in Wonderland, Lewis Carroll
from Alice's Adventures in Wonderland, Lewis Carroll
Listen carefully to what I say; it is very complicated.
In order to provide robust backup sources, primary (stratum-1) servers are usually operated in a diversity configuration, in which the server operates with a number of remote servers in addition to one or more radio or modem clocks. In these configurations the suite of algorithms used in NTP to refine the data from each peer separately and to select and combine the data from a number of servers and clocks. As the result of these algorithms, a set of survivors are identified which can presumably provide the most reliable and accurate time. Ordinarily, the individual clock offsets of the survivors are combined on a weighted average basis to produce an offset used to control the system clock.
However, because of small but significant systematic time offsets between the survivors, it is in general not possible to achieve the lowest jitter and highest stability in these configurations. This happens because the selection algorithm tends to clockhop between survivors of substantially the same quality, but showing small systematic offsets between them. In addition, there are a number of configurations involving pulse-per-second (PPS) signals, modem backup services and other special cases, so that a set of mitigation rules becomes necessary to select a single peer from among the survivors. These rules are based on a set of special characteristics of the various remote servers and reference clock drivers specified in the configuration file.
The prefer scheme works on the set of peers that have survived the sanity checks and intersection algorithms of the clock selection procedures. Ordinarily, the members of this set can be considered truechimers and any one of them could in principle provide correct time; however, due to various error contributions, not all can provide the most accurate and stable time. The job of the clustering algorithm, which is invoked at this point, is to select the best subset of the survivors providing the least variance in the combined ensemble average, compared to the variance in each member of the subset separately. The detailed operation of the clustering algorithm, which is given in the RFC-1305, is beyond the scope of discussion here. It operates in rounds, where a survivor, presumably the worst of the lot, is discarded in each round until one of several termination conditions is met. An example terminating condition is when the number of survivors is about to be reduced below three.
In the prefer scheme the clustering algorithm is modified so that the prefer peer is never discarded; on the contrary, its potential removal becomes a termination condition. If the original algorithm were about to toss out the prefer peer, the algorithm terminates immediately. The prefer peer can still be discarded by the sanity checks and intersection algorithms, of course, but it will always survive the clustering algorithm. If it does not survive or for some reason it fails to provide updates, it will eventually become unreachable and the clock selection will remitigate to select the next best source.
Along with this behavior, the clock selection procedures are modified so that the combining algorithm is not used when a prefer peer is present. Instead, the offset of the prefer peer is used exclusively as the synchronization source. In the usual case involving a radio clock and a flock of remote stratum-1 peers, and with the radio clock designated a prefer peer, the result is that the high quality radio time disciplines the server clock as long as the radio itself remains operational and with valid time, as determined from the remote peers, sanity checks and intersection algorithm.
Reference clock drivers operate in the manner described in the Reference Clock Drivers page and its dependencies. The drivers are ordinarily operated at stratum zero, so that as the result of ordinary NTP operations, the server itself operates at stratum one, as required by the NTP specification. In some cases described below, the driver is intentionally operated at an elevated stratum, so that it will be selected only if no other survivor is present with a lower stratum. In the case of the PPS peer or kernel time discipline, these sources appear active only if the prefer peer has survived the intersection and clustering algorithms, as described below, and its clock offset relative to the current local clock is less than a specified value, currently 128 ms.
The modem clock drivers are a special case. Ordinarily, the update interval between modem calls to synchronize the system clock is many times longer than the interval between polls of either a remote server or local radio clock. In order to provide the best stability, the operation of the clock discipline algorithm changes gradually from a phase-lock mode at the shorter update intervals to a frequency-lock mode at the longer update intervals. If remote servers or local radio clocks together with a modem peer operate in the same client, the following things can happen.
First the clock selection algorithm can select one or more remote servers or radio clocks and the clock discipline algorithm will optimize for the shorter update intervals. Then, the selection algorithm can select the modem peer, which requires a much different optimization. The intent in the design is to allow the modem peer to control the system clock either when no other source is available or, if the modem peer happens to be marked as prefer, then it always controls the clock, as long as it passes the sanity checks and intersection algorithm. There still is room for suboptimal operation in this scheme, since a noise spike can still cause a clockhop either way. Nevertheless, the optimization function is slow to adapt, so that a clockhop or two does not cause much harm.
The local clock driver is another special case. Normally, this driver is eligible for selection only if no other source is available. When selected, vernier adjustments introduced via the configuration file or remotely using the ntpdc program can be used to trim the local clock frequency and time. However, if the local clock driver is designated the prefer peer, this driver is always selected and all other sources are ignored. This behavior is intended for use when the kernel time is controlled by some means external to NTP, such as the NIST lockclock algorithm or another time synchronization protocol such as DTSS. In this case the only way to disable the local clock driver is to mark it unsynchronized using the leap indicator bits. In the case of modified kernels with the ntp_adjtime() system call, this can be done automatically if the external synchronization protocol uses it to discipline the kernel time.
The clustering algorithm repeatedly discards outlyers in order to reduce the residual jitter in the survivor population. As required by the NTP specification, these operations continue until either a specified minimum number of survivors remain or the minimum select dispersion of the population is greater than the maximum peer dispersion of any member. The mitigation rules require an additional terminating condition which stops these operations at the point where the prefer peer is about to be discarded.
The mitigation rules establish the choice of system peer, which determine the stratum, reference identifier and several other system variables which are visible to clients of the local server. In addition, they establish which source or combination of sources control the local clock.
First, it should be pointed out that the PPS signal is inherently ambiguous, in that it provides a precise seconds epoch, but does not provide a way to number the seconds. In principle and most commonly, another source of synchronization, either the timecode from an associated radio clock, or even one or more remote NTP servers, is available to perform that function. In all cases, a specific, configured peer or server must be designated as associated with the PPS signal. This is done using the prefer keyword as described previously. The PPS signal can be associated in this way with any peer, but is most commonly used with the radio clock generating the PPS signal.
The PPS signal can be used in two ways to discipline the local clock, one using a special PPS driver described in the PPS Clock Discipline page, the other using PPS signal support in the kernel, as described in the A Kernel Model for Precision Timekeeping page. In either case, the signal must be present and within nominal jitter and wander error tolerances. In addition, the associated prefer peer must have survived the sanity checks and intersection algorithms and the dispersion settled below 1 s. This insures that the radio clock hardware is operating correctly and that, presumably, the PPS signal is operating correctly as well. Second, the absolute offset of the local clock from that peer must be less than 128 ms, or well within the 0.5-s unambiguous range of the PPS signal itself. In the case of the PPS driver, the time offsets generated from the PPS signal are propagated via the clock filter to the clock selection procedures just like any other peer. Should these pass the sanity checks and intersection algorithms, they will show up along with the offsets of the prefer peer itself. Note that, unlike the prefer peer, the PPS peer samples are not protected from discard by the clustering algorithm. These complicated procedures insure that the PPS offsets developed in this way are the most accurate, reliable available for synchronization.
The PPS peer remains active as long as it survives the intersection algorithm and the prefer peer is reachable; however, like any other clock driver, it runs a reachability algorithm on the PPS signal itself. If for some reason the signal fails or displays gross errors, the PPS peer will either become unreachable or stray out of the survivor population. In this case the clock selection remitigates as described above.
When kernel support for the PPS signal is available, the PPS signal is interfaced to the kernel serial driver code via a modem control lead. As the PPS signal is derived from external equipment, cables, etc., which sometimes fail, a good deal of error checking is done in the kernel to detect signal failure and excessive noise. The way in which the mitigation rules affect the kernel discipline is as follows.
PPS support requires the PPS driver (type 22) and PPSAPI interface described in the Pulse-per-second (PPS) Signal Interfacing> page. In order to operate, the prefer peer must be designated and the kernel support enabled by the enable pps command in the configuration file and the signal must be present and within nominal jitter and wander error tolerances. In the NTP daemon, the PPS discipline is active only when the prefer peer is among the survivors of the clustering algorithm, and its absolute offset is within 128 ms, as determined by the PPS driver. Under these conditions the kernel disregards updates produced by the NTP daemon and uses its internal PPS source instead. The kernel maintains a watchdog timer for the PPS signal; if the signal has not been heard or is out of tolerance for more than some interval, currently two minutes, the kernel discipline is declared inoperable and operation continues as if it were not present.
 David L. Mills <mills@udel.edu>
 David L. Mills <mills@udel.edu>