Clock Synchronization Is a Nightmare

Source: https://arpitbhayani.me/blogs/clock-sync-nightmare
Date: 2025-12-23

Time seems simple. But we engineers lose sleep over something as basic as keeping clocks in sync. Here’s why…


The answer lies in this one simple statement: there is no global clock. When you have thousands of machines spread across data centers, continents, and time zones, each operating independently, the simple question of “what time is it?” becomes surprisingly complex.

Clock synchronization sits at the core of some of the most challenging problems in distributed systems, affecting everything from database consistency to debugging to financial transactions.

Let’s dig deeper…

The Illusion of Accurate Time

Every computer has an internal clock, typically driven by a quartz crystal oscillator. These oscillators work by vibrating at a specific frequency when voltage is applied. The standard frequency for most computer clocks is 32768 Hz, chosen because it is a power of two and makes counting down to one second straightforward.

The catch: quartz crystals are not perfect. Their oscillation frequency varies based on many factors; here are a few…

Temperature is the biggest culprit. Standard quartz crystals exhibit frequency drift in the tens of parts per million when temperature changes. A temperature deviation of ~10 degrees Celsius can cause drift equivalent to about 110 seconds per year. The crystal vibrates faster or slower depending on ambient temperature, and data center environments are not perfectly controlled.

Another culprit is manufacturing variation. No two crystals are identical. Even crystals from the same production batch will have slightly different characteristics. Aging compounds this problem as crystals change properties over time.

The result is that two computers started at exactly the same time, never communicating with each other, will inevitably drift apart. After just one day, they might differ by hundreds of milliseconds. After a month, they could be seconds apart.
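To get a feel for these numbers, here is a small back-of-the-envelope sketch. The drift rates used (3.5 ppm, and two clocks each off by 2 ppm in opposite directions) are illustrative assumptions, not measurements of any particular crystal.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def drift_seconds(ppm: float, elapsed_seconds: float) -> float:
    """Accumulated error of a clock that runs `ppm` parts per million off."""
    return ppm * 1e-6 * elapsed_seconds


# A clock that is off by ~3.5 ppm accumulates roughly 110 seconds in a year.
per_year = drift_seconds(3.5, SECONDS_PER_YEAR)

# Two clocks drifting in opposite directions (assumed +2 ppm and -2 ppm)
# diverge from each other at the sum of their rates.
per_day_between_two = drift_seconds(2 + 2, SECONDS_PER_DAY)
per_month_between_two = drift_seconds(2 + 2, 30 * SECONDS_PER_DAY)

print(f"{per_year:.1f} s/year, {per_day_between_two * 1000:.0f} ms/day, "
      f"{per_month_between_two:.1f} s/month")
```

Even single-digit ppm errors are enough to produce the hundreds of milliseconds per day mentioned above.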

Why Clock Skew Breaks Things

Clock skew is the difference in time between two clocks at any given instant. Clock drift is the rate at which clocks diverge over time. Both cause serious problems in distributed systems.

Consider a simple example with a distributed make system. You edit a source file on your client machine, which has a clock slightly behind the server where the compiled object file lives. When make runs, it compares timestamps. If the server clock is ahead, the object file appears newer than the source file you just edited, and make does not recompile. Your changes silently disappear from the build.
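The failure can be reproduced with a toy model of the timestamp comparison. This is only a sketch: `needs_rebuild` is a hypothetical helper that mimics the "source newer than object" rule, and the 5 second skew is an assumed value.

```python
def needs_rebuild(source_mtime: float, object_mtime: float) -> bool:
    """make-style rule: rebuild only if the source is newer than the object."""
    return source_mtime > object_mtime


true_time = 1_000.0   # real time at which the object file was compiled
server_skew = 5.0     # assumed: the server clock runs 5 s ahead of the client

# The object file is stamped by the (fast) server clock.
object_mtime = true_time + server_skew

# The source is edited 2 s later in real time, stamped by the client clock.
source_mtime = true_time + 2.0

# The edit is newer in reality, but the timestamps say otherwise.
stale_build = not needs_rebuild(source_mtime, object_mtime)
correct_without_skew = needs_rebuild(source_mtime, true_time)
```

Here `stale_build` is true: the edit is skipped, even though the same comparison with synchronized clocks would trigger a rebuild.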

Database systems face even more critical timestamp issues. When two transactions happen at nearly the same time on different nodes, the database must determine which happened first. If clocks are out of sync, the database might order them incorrectly, violating consistency guarantees.

Imagine a banking system where a customer deposits money at one branch (Node A) and immediately withdraws at another branch (Node B). If Node B’s clock is behind Node A’s, the withdrawal transaction might get a timestamp earlier than the deposit. A snapshot read at the wrong time could show the withdrawal but not the deposit, making it appear the customer withdrew money they did not have.

Logging and debugging become nearly impossible when clocks disagree. Distributed tracing relies on timestamps to reconstruct the sequence of events across services. When clocks are skewed, the resulting traces show impossible sequences where effects appear before causes.

Physical Clock Synchronization

The simplest approach to clock synchronization is to periodically query a trusted time server and adjust local clocks accordingly. Let’s look at different algorithms and approaches based on this…

Cristian’s Algorithm

Cristian’s algorithm, proposed in 1989, works with a centralized time server assumed to have accurate time. A client requests the time, the server responds with its current time, and the client adjusts.

The challenge is network delay. By the time the response arrives, the server time is stale. Cristian’s algorithm estimates the one-way delay as half the round-trip time and adds that to the server’s reported time.
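A minimal sketch of the client side follows. `request_server_time` is a hypothetical callable standing in for the network request, and the clock readings in the usage example are made up so the arithmetic is easy to follow.

```python
import time
from typing import Callable


def cristian_estimate(
    request_server_time: Callable[[], float],
    local_clock: Callable[[], float] = time.monotonic,
) -> tuple[float, float]:
    """Return (estimated current server time, measured round-trip time)."""
    t0 = local_clock()                   # just before sending the request
    server_time = request_server_time()  # hypothetical network call
    t1 = local_clock()                   # just after receiving the response
    rtt = t1 - t0
    # Assume the response spent half of the round trip on the wire.
    return server_time + rtt / 2, rtt


# Usage with fake readings: the request takes 0.2 s on the local clock
# and the server reports 500.0.
ticks = iter([100.0, 100.2])
estimate, rtt = cristian_estimate(lambda: 500.0, lambda: next(ticks))
```

With these readings the client would set its clock to about 500.1. The error of the estimate is bounded by half the round trip, which is why a client may prefer the sample with the smallest round-trip time.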

This works reasonably well when network delays are symmetric, meaning request and response take the same time. In practice, delays are often asymmetric due to different routing paths, varying network congestion, and processing delays.

Berkeley Algorithm

The Berkeley algorithm takes a different approach, assuming no single machine has accurate time. Instead, it relies on agreement among multiple machines.

A designated time daemon periodically polls all machines for their clock values. It discards outliers, computes the average of the remaining values, and tells each machine how much to adjust. Rather than sending absolute times, which would go stale in transit, it sends relative adjustments.
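The daemon’s computation can be sketched as below. This is a simplification under stated assumptions: it ignores the round-trip compensation a real daemon applies to each poll, and the outlier rule (distance from the median, controlled by a hypothetical `max_deviation` parameter) is just one reasonable choice.

```python
from statistics import median


def berkeley_adjustments(clocks: dict[str, float], max_deviation: float) -> dict[str, float]:
    """Return the relative adjustment each machine should apply.

    Readings further than `max_deviation` from the median are treated as
    outliers: they are left out of the average but still get an adjustment.
    """
    mid = median(clocks.values())
    trusted = [t for t in clocks.values() if abs(t - mid) <= max_deviation]
    target = sum(trusted) / len(trusted)
    return {name: target - t for name, t in clocks.items()}


# Clock readings in seconds; machine "c" is wildly off.
readings = {"daemon": 180.0, "a": 185.0, "b": 175.0, "c": 400.0}
adjustments = berkeley_adjustments(readings, max_deviation=30.0)
# adjustments == {"daemon": 0.0, "a": -5.0, "b": 5.0, "c": -220.0}
```

Each machine receives only a delta, so the message stays valid no matter how long it takes to arrive.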

A critical detail: computers should never jump their clocks backward. Doing so violates the assumption of monotonic time that many algorithms depend on. Instead of rewinding a clock that is ahead, the Berkeley algorithm slows it down gradually until the correct time catches up with it.

Network Time Protocol

NTP uses a hierarchical system of time servers organized into strata.

Stratum 0 devices are high precision time sources like atomic clocks and GPS receivers. Stratum 1 servers connect directly to stratum 0 sources. Each lower stratum synchronizes with the level above, with stratum numbers increasing up to 15.

NTP can typically maintain time within tens of milliseconds over the public internet and can achieve sub-millisecond accuracy on local area networks. However, several factors limit its precision.

Network asymmetry is particularly problematic. If the path from client to server differs from server to client, the assumption that one way delay equals half the round-trip breaks down. Satellite links where uplink and downlink have different latencies are a classic example.

Operating system overhead adds uncertainty. When an NTP packet arrives, it passes through the network stack, gets timestamped by the kernel, and eventually reaches the NTP daemon. Each step introduces variable delays measured in microseconds.
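NTP’s estimate comes from four timestamps taken around one request and response. The sketch below uses the standard offset and delay formulas; the `simulate` helper and all the delay values are assumptions chosen to show how asymmetry leaks into the result.

```python
def ntp_offset_and_delay(t0: float, t1: float, t2: float, t3: float) -> tuple[float, float]:
    """t0: client send, t1: server receive, t2: server send, t3: client receive.

    t0 and t3 are read from the client clock, t1 and t2 from the server clock.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2   # estimated server minus client
    delay = (t3 - t0) - (t2 - t1)          # round trip excluding server processing
    return offset, delay


def simulate(true_offset: float, outbound: float, inbound: float,
             processing: float = 0.001) -> tuple[float, float]:
    """Build the four timestamps for a hypothetical exchange."""
    t0 = 0.0
    t1 = t0 + outbound + true_offset       # server clock is ahead by true_offset
    t2 = t1 + processing
    t3 = t0 + outbound + processing + inbound
    return ntp_offset_and_delay(t0, t1, t2, t3)


# Server is really 100 ms ahead of the client in both cases.
sym_offset, sym_delay = simulate(0.100, outbound=0.010, inbound=0.010)
asym_offset, asym_delay = simulate(0.100, outbound=0.030, inbound=0.010)
```

With symmetric paths the estimate is exactly 100 ms. With a 30 ms outbound and 10 ms return path it comes out as 110 ms: the error is half the difference between the two one-way delays, and nothing in the four timestamps reveals it.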

Milliseconds Are Not Enough

For many applications, NTP accuracy is sufficient. Web servers, file systems, and most business applications tolerate clocks being tens of milliseconds apart. But some domains demand much tighter synchronization.

Financial trading systems measure latency in microseconds. A trade timestamped incorrectly by even a few milliseconds can have significant legal and financial implications. High-frequency trading strategies depend on knowing the precise order of events.

Telecommunications systems require synchronization for TDM (Time Division Multiplexing) where different users share a channel by taking turns. If timing drifts, transmissions from different users collide.

Scientific experiments, particularly in physics, need nanosecond precision to correlate measurements across instruments.

Precision Time Protocol

PTP, defined by IEEE 1588, achieves sub-microsecond accuracy by using hardware timestamping. Instead of the operating system recording when a packet arrived, specialized network interface cards timestamp packets as they cross the wire, eliminating software delays.

PTP requires support throughout the network path. Switches must be PTP aware, acting as boundary clocks that maintain synchronization hop by hop, or as transparent clocks that correct timestamps for the time a packet spends inside the switch. This makes PTP expensive to deploy but essential for applications requiring nanosecond precision.

Meta announced in 2022 that they were migrating from NTP to PTP across their data centers. The investment in PTP infrastructure paid off in reduced errors and better debugging capability.

Logical Clocks and Causality

Lamport introduced the concept of logical clocks based on a simple observation: if two events are causally related, we should be able to order them. If event A is the sending of a message and event B is the receipt of that message, A happened before B. If both events happen on the same process, the earlier one happened before the later one. The relation is also transitive: if A happened before B and B happened before C, then A happened before C.

Events that are not connected by any chain of causality are concurrent. They could have happened in either order, and from the system’s perspective, there is no meaningful way to distinguish.

Lamport Timestamps

Lamport timestamps implement this intuition with a simple algorithm. Each process maintains a counter. Before any event, increment the counter. When sending a message, include the counter value. When receiving a message, set your counter to the maximum of your current value and the received value, then increment.
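The rules translate almost directly into code. This is a minimal sketch; the class and method names are my own, not from any library.

```python
class LamportClock:
    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event: increment the counter and return it."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned value travels with the message."""
        return self.tick()

    def receive(self, received: int) -> int:
        """Move past the sender's counter, then count the receive event."""
        self.time = max(self.time, received) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()                    # p: 1
p.tick()                    # p: 2
msg = p.send()              # p: 3, message carries 3
q.tick()                    # q: 1
received_at = q.receive(msg)  # q: max(1, 3) + 1 = 4
```

The receive lands at 4, after the send at 3, so the causal order is preserved even though `q` had only counted one event of its own.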

If event A has a lower Lamport timestamp than event B, we know only that B did not happen before A: either A happened before B, or they are concurrent. The other direction is guaranteed: if A happened before B, then A has a lower timestamp than B.

The limitation is that Lamport timestamps cannot tell you if two events are concurrent. Events with timestamps 5 and 7 might be causally related or might have happened independently on different processes with no communication between them.

Vector Clocks

Vector clocks extend Lamport timestamps to capture full causality information. Instead of a single counter, each process maintains a vector with an entry for every process in the system.

Comparing two vector clocks tells you precisely whether one event happened before another or whether they are concurrent. If every entry in VC1 is less than or equal to the corresponding entry in VC2, and at least one is strictly less, then the event stamped VC1 happened before the event stamped VC2. If neither vector dominates the other in this way, the two events are concurrent.
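Here is a small sketch of both the update rules and the comparison. It assumes a fixed number of processes identified by index; the names are illustrative.

```python
class VectorClock:
    def __init__(self, process_id: int, num_processes: int) -> None:
        self.pid = process_id
        self.clock = [0] * num_processes

    def tick(self) -> list[int]:
        """Local event: bump only this process's own entry."""
        self.clock[self.pid] += 1
        return list(self.clock)

    def send(self) -> list[int]:
        """Sending is an event; a copy of the vector travels with the message."""
        return self.tick()

    def receive(self, other: list[int]) -> list[int]:
        """Take the entry-wise maximum, then count the receive event."""
        self.clock = [max(a, b) for a, b in zip(self.clock, other)]
        return self.tick()


def happened_before(vc1: list[int], vc2: list[int]) -> bool:
    pairs = list(zip(vc1, vc2))
    return all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)


def concurrent(vc1: list[int], vc2: list[int]) -> bool:
    return vc1 != vc2 and not happened_before(vc1, vc2) and not happened_before(vc2, vc1)


p0, p1, p2 = (VectorClock(i, 3) for i in range(3))
a = p0.send()       # [1, 0, 0]
b = p1.receive(a)   # [1, 1, 0]
c = p2.tick()       # [0, 0, 1], no communication with p0 or p1
```

`happened_before(a, b)` is true because the message links them, while `concurrent(b, c)` is true because neither vector dominates the other.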

The downside of vector clocks is space overhead. With N processes, each timestamp requires O(N) space. For large distributed systems with thousands of nodes, this becomes impractical. Various optimizations exist, including compressed vector clocks and interval tree clocks, but the fundamental scaling challenge remains.

Google Spanner and TrueTime

Google faced the clock synchronization problem at an unprecedented scale with Spanner, its globally distributed database. They needed strong consistency guarantees across data centers spanning continents, which requires knowing the order of transactions.

Here’s a video of me explaining this.

Their solution was TrueTime, a globally distributed clock infrastructure that provides time with bounded uncertainty.

TrueTime uses two types of time sources in every data center. GPS receivers get time directly from satellites. Atomic clocks provide backup and cross-validation. Using both provides resilience since they have independent failure modes. GPS can fail due to antenna problems or signal interference. Atomic clocks can drift, but are not affected by the same issues.

The key innovation is that TrueTime does not return a single timestamp. It returns an interval guaranteed to contain the true time.

When a Spanner transaction commits, it gets a timestamp. Before reporting the commit to the client, Spanner waits until TT.after(commit_timestamp) returns true. This “commit wait” ensures that any subsequent transaction starting after the wait will see a later timestamp.
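The interval API and the commit wait can be illustrated with a toy model. This is not Spanner’s implementation: `TrueTimeSketch` simply wraps a local clock with a fixed, assumed uncertainty `epsilon`, and `FakeClock` exists only to make the example deterministic.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class TrueTimeSketch:
    """Toy stand-in for TrueTime: a clock plus a fixed uncertainty bound."""

    def __init__(self, epsilon: float, clock=time.time) -> None:
        self.epsilon = epsilon
        self.clock = clock

    def now(self) -> TTInterval:
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        """True once t has definitely passed."""
        return self.now().earliest > t

    def before(self, t: float) -> bool:
        """True if t has definitely not arrived yet."""
        return self.now().latest < t


def commit(tt: TrueTimeSketch, sleep=time.sleep) -> float:
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    commit_ts = tt.now().latest
    while not tt.after(commit_ts):
        sleep(tt.epsilon / 10)
    return commit_ts


class FakeClock:
    """Deterministic clock for the example; sleeping just advances it."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


fake = FakeClock()
tt = TrueTimeSketch(epsilon=0.004, clock=fake)
ts = commit(tt, sleep=fake.sleep)
waited = fake.t
```

With a 4 ms uncertainty the loop waits for roughly 8 ms, about twice `epsilon`. Shrinking the uncertainty directly shrinks the wait, which is why the quality of the time sources matters so much.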

The waiting period depends on the current clock uncertainty and is around 7 milliseconds in the worst case. This seems like a significant latency cost, but it is actually overlapped with other commit processing, and the benefit of strong consistency is worth the cost for Spanner’s use cases.

Hybrid Logical Clocks

Not everyone has Google’s resources to deploy atomic clocks in every data center. CockroachDB, inspired by Spanner but designed to run on commodity hardware, uses hybrid logical clocks (HLC).

Again, I have a podcast with their Chief Architect, Ben Darnell. We discussed this at length; give it a watch.

HLC combines physical time with a logical component. The physical part stays close to wall clock time. The logical part handles cases where multiple events happen within the same physical clock tick or when clock skew causes issues.

HLC maintains the property that timestamps are closely related to physical time, making debugging easier. You can look at an HLC timestamp and know approximately when an event occurred. The logical component ensures that causally related events are always ordered correctly, even if physical clocks skew.
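A compact sketch of the HLC update rules, following the published algorithm, is shown below. It is not CockroachDB’s code; timestamps are `(l, c)` tuples, and the physical clock is injected so the example can use fixed readings.

```python
from typing import Callable


class HybridLogicalClock:
    def __init__(self, physical_clock: Callable[[], int]) -> None:
        self.physical_clock = physical_clock
        self.l = 0  # highest physical time seen so far
        self.c = 0  # logical counter that breaks ties within the same l

    def now(self) -> tuple[int, int]:
        """Timestamp a local or send event."""
        pt = self.physical_clock()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1
        return self.l, self.c

    def update(self, remote: tuple[int, int]) -> tuple[int, int]:
        """Timestamp a receive event, merging the sender's timestamp."""
        pt = self.physical_clock()
        rl, rc = remote
        new_l = max(self.l, rl, pt)
        if new_l == self.l == rl:
            self.c = max(self.c, rc) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == rl:
            self.c = rc + 1
        else:
            self.c = 0
        self.l = new_l
        return self.l, self.c


# Node B's physical clock (90) is behind node A's (100).
node_a = HybridLogicalClock(lambda: 100)
node_b = HybridLogicalClock(lambda: 90)

sent = node_a.now()              # (100, 0)
received = node_b.update(sent)   # (100, 1), ordered after the send
later = node_b.now()             # (100, 2), still moving forward
```

Even though B’s wall clock is 10 units behind, its timestamps never fall below what it has already seen, and tuple comparison orders the receive after the send.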

CockroachDB uses HLC with a configurable maximum clock offset, which defaults to 500 milliseconds. If a node detects that its clock is too far off from the others, it shuts itself down rather than risk consistency violations.

YugabyteDB similarly uses HLC and recently integrated with AWS Time Sync Service for tighter synchronization. Their benchmarks showed up to 3x reduction in transaction latency when using precision time sources because smaller uncertainty bounds mean less waiting during transactions.

How Do You Pick One Over the Other

Choosing synchronization granularity

What level of synchronization do you actually need? Many systems do fine with NTP providing millisecond accuracy. If your transactions take tens of milliseconds anyway, sub-millisecond clock skew is irrelevant.

If you need stronger ordering guarantees, consider whether logical clocks might suffice. Lamport timestamps are trivial to implement and add almost no overhead: a single integer per message.

For databases requiring external consistency or systems where physical time matters (audit logs, financial records), invest in better synchronization infrastructure. PTP hardware, dedicated time servers, and careful network design can get you to microseconds.

Handling clock anomalies

Clocks can jump backward. NTP might suddenly correct a clock that drifted significantly. Virtual machine migrations can cause clock discontinuities. Operating system bugs can cause time to jump.

Robust systems detect these anomalies. Monitor the rate of clock change and alert on sudden jumps. Never use raw system time for critical ordering; use a clock abstraction that can handle anomalies.
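One possible shape for such an abstraction is sketched below. `GuardedClock` is a hypothetical wrapper: it compares how far the wall clock moved against a monotonic clock to spot jumps, and it never returns a value lower than one it has already handed out. The `max_jump` threshold is an assumed tuning knob.

```python
import time


class GuardedClock:
    """Wall clock wrapper that never goes backward and counts anomalies."""

    def __init__(self, wall=time.time, mono=time.monotonic, max_jump: float = 1.0) -> None:
        self.wall, self.mono, self.max_jump = wall, mono, max_jump
        self.last_wall = wall()
        self.last_mono = mono()
        self.last_returned = self.last_wall
        self.anomalies = 0

    def now(self) -> float:
        w, m = self.wall(), self.mono()
        # The wall clock should advance by about as much as the monotonic one.
        if abs((w - self.last_wall) - (m - self.last_mono)) > self.max_jump:
            self.anomalies += 1  # in a real system: emit a metric or alert
        self.last_wall, self.last_mono = w, m
        # Hold the previous value instead of stepping backward.
        self.last_returned = max(self.last_returned, w)
        return self.last_returned


# Fake readings: the wall clock is stepped back by 11 s between calls.
wall_readings = iter([1000.0, 1001.0, 990.0, 991.0])
mono_readings = iter([0.0, 1.0, 2.0, 3.0])
clock = GuardedClock(lambda: next(wall_readings), lambda: next(mono_readings))
readings = [clock.now(), clock.now(), clock.now()]
```

The three readings come out as 1001.0 each and `clock.anomalies` is 1: the backward step is detected once and hidden from callers. Holding the clock still is a simplification; a production version might slew forward slowly instead.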

Dealing with Leap Seconds

Leap seconds are inserted occasionally to keep UTC aligned with Earth’s rotation. A minute with a leap second has 61 seconds. Many systems handle this poorly.

Google and AWS use “leap smearing” where the leap second is spread over a longer period, making each second slightly longer or shorter. This avoids the discontinuity but means your clock is not precisely UTC during the smear period.
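The arithmetic behind a smear is simple. The sketch below assumes a linear smear over a 24-hour window; the window length and the function name are illustrative choices, not a description of any provider’s exact configuration.

```python
SMEAR_SECONDS = 86_400.0  # assumed: leap second spread linearly over 24 hours


def smeared_offset(seconds_into_window: float, leap: float = 1.0) -> float:
    """Portion of the leap second already absorbed at this point in the window."""
    progress = min(max(seconds_into_window / SMEAR_SECONDS, 0.0), 1.0)
    return leap * progress


# How much longer each smeared second is, in parts per million.
stretch_ppm = 1.0 / SMEAR_SECONDS * 1e6

halfway = smeared_offset(43_200.0)   # 0.5 s absorbed at the midpoint
done = smeared_offset(100_000.0)     # 1.0 s once the window has passed
```

Each second is stretched by about 11.6 ppm, which is within the range of ordinary crystal drift, so software never sees a 61-second minute. The cost is that a smeared clock is up to half a second away from true UTC in the middle of the window.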

The good news is that the General Conference on Weights and Measures has decided to stop adding leap seconds by 2035, eliminating this problem for future systems.

The fundamental tradeoff

Clock synchronization in distributed systems always involves a tradeoff between accuracy, latency, and complexity.

Tighter synchronization requires either better hardware (atomic clocks, GPS, PTP-capable switches) or more communication overhead (frequent sync messages, consensus protocols). Both have costs.

Looser synchronization is cheaper but means your timestamps are less reliable. You might need to build larger safety margins into your algorithms or accept weaker ordering guarantees.

Logical clocks sidestep physical time entirely but require passing clock information with every message and cannot tell you when something actually happened.

There is no universal right answer. The appropriate solution depends on your requirements, budget, and tolerance for complexity.

Footnote

Clock synchronization is an interesting and difficult problem, and the solutions range from simple NTP to Google’s atomic clock infrastructure for global databases.

It is important to acknowledge that perfect synchronization is impossible, and hence we should choose appropriate bounds and trade-offs while picking one over the other.