InkdownInkdown
Start writing

Arpit Bhayani Blogs

336 files·168 subfolders

Shared Workspace

Arpit Bhayani Blogs
001 Ai Topological Sort

150-fractional-cascading

Shared from "Arpit Bhayani Blogs" on Inkdown

Fractional Cascading - Speeding Up Multiple Binary Searches

Source: https://arpitbhayani.me/blogs/fractional-cascading Date: 2020-05-10

Explore Binary Search variations k-searches, unified search & fractional cascading. Optimize search time in k sorted lists!


Binary Search is an algorithm that finds the position of a target value in a sorted list. The algorithm exploits the fact that the list is sorted, and is devised such that is does not have to even look at all the n elements, to decide if a value is present or not. In the worst case, the algorithm checks the log(n) number of elements to make the decision.

001-ai-topological-sort.md
tldr.md
002 Temporal Primer
002-temporal-primer.md
tldr.md
003 Rag Production
003-rag-production.md
tldr.md
004 Structure Of Llm Chat
004-structure-of-llm-chat.md
tldr.md
005 How Llms Work
005-how-llms-work.md
tldr.md
006 Monolith Is Distributed System
006-monolith-is-distributed-system.md
tldr.md
007 Defensive Databases
007-defensive-databases.md
tldr.md
008 Bm25
008-bm25.md
tldr.md
009 Join Algorithms
009-join-algorithms.md
tldr.md
010 Venting At Work
010-venting-at-work.md
tldr.md
011 Half Life
011-half-life.md
tldr.md
012 Multi Paxos
012-multi-paxos.md
tldr.md
013 Mysql Replication Internals
013-mysql-replication-internals.md
tldr.md
014 Bloom Filters
014-bloom-filters.md
tldr.md
015 Clock Sync Nightmare
015-clock-sync-nightmare.md
tldr.md
016 Kafka Partitions
016-kafka-partitions.md
tldr.md
017 Product Quantization
017-product-quantization.md
tldr.md
018 Qkv Matrices
018-qkv-matrices.md
tldr.md
019 Deleted Production
019-deleted-production.md
tldr.md
020 How Llm Inference Works
020-how-llm-inference-works.md
tldr.md
021 Blocking Queues
021-blocking-queues.md
tldr.md
022 Heartbeats In Distributed Systems
022-heartbeats-in-distributed-systems.md
tldr.md
023 Cassandra Writes
023-cassandra-writes.md
tldr.md
024 Redis Replication
024-redis-replication.md
tldr.md
025 Arrogant People At Work
025-arrogant-people-at-work.md
tldr.md
026 Cdn Content Replication
026-cdn-content-replication.md
tldr.md
027 Cant Fix Everything Day One
027-cant-fix-everything-day-one.md
tldr.md
028 Emotions At Work
028-emotions-at-work.md
tldr.md
029 Grpc Http2
029-grpc-http2.md
tldr.md
030 Meetings With No Agenda Are A Waste Of Time
030-meetings-with-no-agenda-are-a-waste-of-time.md
tldr.md
031 Growth Is Not About Doing Everything
031-growth-is-not-about-doing-everything.md
tldr.md
032 Career Longevity Vs Job Hopping
032-career-longevity-vs-job-hopping.md
tldr.md
033 Stay Relevant At Higher Salary Levels
033-stay-relevant-at-higher-salary-levels.md
tldr.md
034 Why Consensus
034-why-consensus.md
tldr.md
035 Database Deadlocks
035-database-deadlocks.md
tldr.md
036 Cpu Cache Locality
036-cpu-cache-locality.md
tldr.md
037 Eventual Consistency
037-eventual-consistency.md
tldr.md
038 Dns Udp Tcp
038-dns-udp-tcp.md
tldr.md
039 Masters
039-masters.md
tldr.md
040 Empathy Makes Great Engineers Unstoppable
040-empathy-makes-great-engineers-unstoppable.md
tldr.md
041 Good Mentors Build People
041-good-mentors-build-people.md
tldr.md
042 Always Have Back Burner Projects
042-always-have-back-burner-projects.md
tldr.md
043 Before You Push Back Know What Youre Standing On
043-before-you-push-back-know-what-youre-standing-on.md
tldr.md
044 Be The One They Can Count On
044-be-the-one-they-can-count-on.md
tldr.md
045 How Much People Bet On You
045-how-much-people-bet-on-you.md
tldr.md
046 How To Get Leadership To Say Yes To Your Project
046-how-to-get-leadership-to-say-yes-to-your-project.md
tldr.md
047 Dont Let Your Best Ideas Die In Silence
047-dont-let-your-best-ideas-die-in-silence.md
tldr.md
048 Be Someone Others Want To Work With
048-be-someone-others-want-to-work-with.md
tldr.md
049 Dont Fall For Xy Problem Ask Right Questions
049-dont-fall-for-xy-problem-ask-right-questions.md
tldr.md
050 Biggest Lie Startups Tell Engineers
050-biggest-lie-startups-tell-engineers.md
tldr.md
051 Promotions Are Proactive Not Reactive
051-promotions-are-proactive-not-reactive.md
tldr.md
052 Not Enough To Be Right Learn To Be Heard
052-not-enough-to-be-right-learn-to-be-heard.md
tldr.md
053 No One Ships Alone
053-no-one-ships-alone.md
tldr.md
054 Not Every Mistake Needs A Correction
054-not-every-mistake-needs-a-correction.md
tldr.md
055 Build Influence At Work
055-build-influence-at-work.md
tldr.md
056 Your Soft Skills Arent Soft At All
056-your-soft-skills-arent-soft-at-all.md
tldr.md
057 Experience Before Forming Opinion
057-experience-before-forming-opinion.md
tldr.md
058 Curiosity And High Bias For Action
058-curiosity-and-high-bias-for-action.md
tldr.md
059 Worklog
059-worklog.md
tldr.md
060 Mistakes And Growth
060-mistakes-and-growth.md
tldr.md
061 Own It Instead Of Sweeping It Aside
061-own-it-instead-of-sweeping-it-aside.md
tldr.md
062 Dont Wait Step Up
062-dont-wait-step-up.md
tldr.md
063 Temporary Fix Is Permanent
063-temporary-fix-is-permanent.md
tldr.md
064 Interview Bias And What Sets You Apart
064-interview-bias-and-what-sets-you-apart.md
tldr.md
065 Saying This Isnt My Problem Is A Problem
065-saying-this-isnt-my-problem-is-a-problem.md
tldr.md
066 Okr
066-okr.md
tldr.md
067 Miscommunication
067-miscommunication.md
tldr.md
068 When In Doubt Code It Out
068-when-in-doubt-code-it-out.md
tldr.md
069 Follow Up Without Annoying People
069-follow-up-without-annoying-people.md
tldr.md
070 Lead Projects That Land
070-lead-projects-that-land.md
tldr.md
071 Abstract Thinking Skill Next Decade
071-abstract-thinking-skill-next-decade.md
tldr.md
072 We Engineers Suck At Task Estimation
072-we-engineers-suck-at-task-estimation.md
tldr.md
073 Shiny Object Syndrome In Tech
073-shiny-object-syndrome-in-tech.md
tldr.md
074 3p
074-3p.md
tldr.md
075 Leverage The Equilibrium
075-leverage-the-equilibrium.md
tldr.md
076 On Demand Container Loading In Aws Lambda
076-on-demand-container-loading-in-aws-lambda.md
tldr.md
077 Sql Has Problems We Can Fix Them Pipe Syntax In Sql
077-sql-has-problems-we-can-fix-them-pipe-syntax-in-sql.md
tldr.md
078 Nanolog A Nanosecond Scale Logging System
078-nanolog-a-nanosecond-scale-logging-system.md
tldr.md
079 Best Resource Is Mythical
079-best-resource-is-mythical.md
tldr.md
080 Wtf The Who To Follow Service At Twitter
080-wtf-the-who-to-follow-service-at-twitter.md
tldr.md
081 Know A Lot
081-know-a-lot.md
tldr.md
082 Out Of Syllabus
082-out-of-syllabus.md
tldr.md
083 Negotiate The Offer
083-negotiate-the-offer.md
tldr.md
084 Never Bad Mouth Your Ex Exployer
084-never-bad-mouth-your-ex-exployer.md
tldr.md
085 Culture Fit
085-culture-fit.md
tldr.md
086 Quantification In Resume
086-quantification-in-resume.md
tldr.md
087 Hiring Is Unfair
087-hiring-is-unfair.md
tldr.md
088 Questions For Interviewers
088-questions-for-interviewers.md
tldr.md
089 Collaboration Communication
089-collaboration-communication.md
tldr.md
090 Out Of Vicious Interview Cycle
090-out-of-vicious-interview-cycle.md
tldr.md
091 Pitch Projects Not Ideas
091-pitch-projects-not-ideas.md
tldr.md
092 Read Design Docs
092-read-design-docs.md
tldr.md
093 Read Rca Docs
093-read-rca-docs.md
tldr.md
094 Start Generalist
094-start-generalist.md
tldr.md
095 Do Not Rely On Summaries
095-do-not-rely-on-summaries.md
tldr.md
096 Structure Your Design Interviews
096-structure-your-design-interviews.md
tldr.md
097 Title Inflation
097-title-inflation.md
tldr.md
098 Find Your Own Project
098-find-your-own-project.md
tldr.md
099 Six Pointers To Crack Coding And Design Interviews
099-six-pointers-to-crack-coding-and-design-interviews.md
tldr.md
100 Keep Yourself Unblocked
100-keep-yourself-unblocked.md
tldr.md
101 Genetic Knapsack
101-genetic-knapsack.md
tldr.md
102 Pseudorandom Number Generation Lfsr
102-pseudorandom-number-generation-lfsr.md
tldr.md
103 How Indexes Work On Partitioned And Sharded Data
103-how-indexes-work-on-partitioned-and-sharded-data.md
tldr.md
104 Some Data Partitioning Strategies For Distributed Data Stores
104-some-data-partitioning-strategies-for-distributed-data-stores.md
tldr.md
105 Data Partitioning
105-data-partitioning.md
tldr.md
106 Leaderless Replication
106-leaderless-replication.md
tldr.md
107 Conflict Resolution
107-conflict-resolution.md
tldr.md
108 Conflict Detection
108-conflict-detection.md
tldr.md
109 Multi Master Replication
109-multi-master-replication.md
tldr.md
110 Monotonic Reads
110-monotonic-reads.md
tldr.md
111 Read Your Write Consistency
111-read-your-write-consistency.md
tldr.md
112 Handling Outages Master Replica
112-handling-outages-master-replica.md
tldr.md
113 Replication Formats
113-replication-formats.md
tldr.md
114 Replication Strategies
114-replication-strategies.md
tldr.md
115 Master Replica Replication
115-master-replica-replication.md
tldr.md
116 Durability
116-durability.md
tldr.md
117 Isolation
117-isolation.md
tldr.md
118 Atomicity
118-atomicity.md
tldr.md
119 Consistency
119-consistency.md
tldr.md
120 Architectures In Distributed Systems
120-architectures-in-distributed-systems.md
tldr.md
121 Mistaken Beliefs Of Distributed Systems
121-mistaken-beliefs-of-distributed-systems.md
tldr.md
122 Fork Bomb
122-fork-bomb.md
tldr.md
123 Chained Operators Python
123-chained-operators-python.md
tldr.md
124 Taxonomy On Sql
124-taxonomy-on-sql.md
tldr.md
125 The Weird Walrus
125-the-weird-walrus.md
tldr.md
126 Fully Persistent Arrays
126-fully-persistent-arrays.md
tldr.md
127 Persistent Data Structures Introduction
127-persistent-data-structures-introduction.md
tldr.md
128 Constant Folding Python
128-constant-folding-python.md
tldr.md
129 String Interning Python
129-string-interning-python.md
tldr.md
130 Recursion Visualizer Python
130-recursion-visualizer-python.md
tldr.md
131 Flajolet Martin
131-flajolet-martin.md
tldr.md
132 2q Cache
132-2q-cache.md
tldr.md
133 Israeli Queues
133-israeli-queues.md
tldr.md
134 1d Terrain
134-1d-terrain.md
tldr.md
135 Jaccard Minhash
135-jaccard-minhash.md
tldr.md
136 Ts Smoothing
136-ts-smoothing.md
tldr.md
137 Lfu
137-lfu.md
tldr.md
138 Morris Counter
138-morris-counter.md
tldr.md
139 Slowsort
139-slowsort.md
tldr.md
140 Bitcask
140-bitcask.md
tldr.md
141 Phi Accrual
141-phi-accrual.md
tldr.md
142 10x Engineer
142-10x-engineer.md
tldr.md
143 Decipher Repeated Key Xor
143-decipher-repeated-key-xor.md
tldr.md
144 Decipher Single Xor
144-decipher-single-xor.md
tldr.md
145 Python Iterable Integers
145-python-iterable-integers.md
tldr.md
146 Inheritance C
146-inheritance-c.md
tldr.md
147 Rum
147-rum.md
tldr.md
148 Consistent Hashing
148-consistent-hashing.md
tldr.md
149 Python Caches Integers
149-python-caches-integers.md
tldr.md
150 Fractional Cascading
150-fractional-cascading.md
tldr.md
151 Copy On Write
151-copy-on-write.md
tldr.md
152 Midpoint Insertion Caching Strategy
152-midpoint-insertion-caching-strategy.md
tldr.md
153 Fsm Python
153-fsm-python.md
tldr.md
154 Bayesian Average
154-bayesian-average.md
tldr.md
155 Sliding Window Ratelimiter
155-sliding-window-ratelimiter.md
tldr.md
156 Idf
156-idf.md
tldr.md
157 Better Programmer
157-better-programmer.md
tldr.md
158 Python Prompts
158-python-prompts.md
tldr.md
159 Rule 30 Cellular Automata
159-rule-30-cellular-automata.md
tldr.md
160 Function Overloading
160-function-overloading.md
tldr.md
161 Isolation Forest
161-isolation-forest.md
tldr.md
162 Image Steganography
162-image-steganography.md
tldr.md
163 Long Integers Python
163-long-integers-python.md
tldr.md
164 I Changed My Python
164-i-changed-my-python.md
tldr.md
165 Benchmark And Compare Pagination Approach In Mongodb
165-benchmark-and-compare-pagination-approach-in-mongodb.md
tldr.md
166 Mongodb Cursor Skip Is Slow
166-mongodb-cursor-skip-is-slow.md
tldr.md
167 Fast And Efficient Pagination In Mongodb
167-fast-and-efficient-pagination-in-mongodb.md
tldr.md
168 Making Http Requests Using Netcat
168-making-http-requests-using-netcat.md
tldr.md

Binary Search could be tweaked to output the position of the target value, or return the position of the smallest number greater than the target value i.e. position where the target value should have been present in the list.

Things become more interesting when we have to perform an iterative binary search on k lists in which we find the target value in each of the k lists independently. The problem statement could be formally defined as

Given k lists of n sorted integers each, and a target value x, return the position of the smallest value greater than or equal to x in each of the k lists. Preprocessing of the list is allowed before answering the queries.

The naive approach - k binary searches

The expected output of this iterative search is the position of the smallest value greater than or equal to x in each of the k lists. This is a classical Binary Search problem and hence in this approach, we fire k binary searches on k lists for the target value x and collect the positions.

k-binary searches

Python has an in-built module called bisect which has the function bisect_left which outputs the smallest value greater than or equal to x in a list which is exactly what we need to output and hence python-based solution using this k-binary searches approach could be

Plain text

Time and Space Complexity

Each of the k lists have size n and we know the time complexity of performing a binary search in one list of n elements is O(log(n)). Hence we deduce that the time complexity of this k-binary searches approach is O(klog(n)).

This approach does not really require any additional space and hence the space complexity is O(1).

The k-binary searches approach is thus super-efficient on space but not so much on time. Hence by trading some space, we could reap some benefits on time, and on this exact principle, the unified binary search approach is based.

Unified binary search

This approach uses some extra space, preprocessing and computations to reduce search time. The preprocessing actually involves precomputing the positions of all elements in all the k lists. This precomputation enables us to perform just one binary search and get the required precalculated positions in one go.

Preprocess

The preprocessing is done in two phases; in the first phase, we compute a position tuple for each element and associate it with the same. In phase two of preprocessing, we create an auxiliary list containing all the elements of all the lists, on which we then perform a binary search for the given target value.

Computing position tuple for each element

Position tuple is a k item tuple where every ith item denotes the position of the associated element in the ith list. We compute this tuple by performing a binary search on all the k lists treating the element as the target value.

From the example above, the position tuple of 4th element in the 4th list i.e 79 will be [3, 5, 4, 3] which denotes its position in all 4 lists. In list 1, 79 is at index 3, in list 2, 79 is actually out of bounds but would be inserted at index 5 hence the output 5, we could also have returned a value marking out of bounds, like -2, in list 3, 79 is not present but the smallest number greater than 79 is 94 and which is at index 4 and in list 4, 79 is present at index 3. This makes the position tuple for 79 to be [3, 5, 4, 3].

Given a 2-dimensional array arr we compute the position tuple for an element (i, j) by performing a binary search on all k lists as shown in python code below

Plain text
Creating a huge list

Once we have all the position tuples and they are well associated with the corresponding elements, we create an auxiliary list of size k * n that holds all the elements from all the k lists. This auxiliary list is again kept sorted so that we could perform a binary search on it.

Working

Given a target value, we perform a binary search in the above auxiliary list and get the smallest element greater than or equal to this target value. Once we get the element, we now get the associated position tuple. This position tuple is precisely the position of the target element in all the k lists. Thus by performing one binary search in this huge list, we are able to get the required positions.

unified binary search

Complexity

We are performing binary search just once on the list of size k * n hence, the time complexity of this approach is O(log(kn)) which is a huge improvement over the k-binary searches approach where it was O(klog(n)).

This approach, unlike k-binary searches, requires an additional space of O(k.kn) since each element holds k item position tuple and there are in all k * n elements.

Fractional cascading is something that gives us the best of both worlds by creating bridges between the lists and narrowing the scope of binary searches on subsequent iterations. Let’s find out how.

Fractional Cascading

Fractional cascading is a technique through which we speed up the iterative binary searches by creating bridges between the lists. The main idea behind this approach is to dampen the need to perform binary searches on subsequent lists after performing the search on one.

In the k-binary searches approach, we solved the problem by performing k binary searches on k lists. If, after the binary search on the first list, we would have known a range within which the target value was present in the 2nd list, we would have limited our search within that subset which helps us save a bunch of computation time. The bridges, defined above, provides us with a shortcut to reach the subset of the other list where that target value would be present.

Fractional Cascading the Idea

Fractional cascading is just an idea through which we could speed up binary searches, implementations vary with respect to the underlying data. The bridges could be implemented using pointers, graphs, or array indexes.

Preprocess

Preprocessing is a super-critical step in fractional cascading because it is responsible for speeding up the iterative binary searches. Preprocessing actually sets up all the bridges from all the elements from one list to the range of items in the lower list where the element could be found. These bridges then cascade to all the lists on the lower levels.

Create Auxiliary Lists

The first step in pre-processing is to create k auxiliary lists from k original lists. These lists are created bottom-up which means lists on the lower levels are created first - M(i+1) is created before M(i). An auxiliary list M(i) is created as a sorted list of elements of the original list L(i) and half of the previously created auxiliary list M(i+1). The half elements of auxiliary lists are chosen by picking every other element from it.

Create Auxiliary Lists

By picking every other element from lower-level lists, we fill the gaps in value ranges in the original list L(i), giving us a uniform spread of values across all auxiliary lists. Another advantage of picking every other element is that we eradicate the need for performing binary searches on subsequent lists altogether. Now we only need to perform a binary search for list M(0) and for every other list, we only need to check the element we reach via the bridge and an element before that - a constant time comparison.

Position tuples

A position tuple for Fractional Cascading is a 2 item tuple, associated with each element of the auxiliary list, where the first item is the position of the element in the original list on the same level - serving as the required position - and the second element is the position of the element in the auxiliary list on the lower level - serving as the bridge from one level to another.

Create position pointerss

The position tuple for each element in the auxiliary array can be created by doing a binary search on the original list and the auxiliary list on the lower level. Given a 2-dimensional array arr and auxiliary lists m_arr we compute the position tuples for element (i, j) by performing a binary search on all k original and auxiliary lists as shown in python code below

Plain text

Fractional Cascading in action

We start by performing a binary search on the first auxiliary list M(0) from which we get the element corresponding to the target value. The position tuple for this element contains the position corresponding to the original list L(0) and bridge that will take us to the list M(1). Now when we move to the list M(1) through the bridge and have reached the index b.

Since auxiliary lists have uniform range spread, because of every other element being promoted, we are sure that the target value should be checked again at the index b and b - 1; because if the value was any lower it would have been promoted and bridged to other value and hence the trail we trace would be different from what we are tracing now.

Once we know which of the b and b-1 index to pick (depending on the values at the index and the target value) we add the first item of the position tuple to the solution set and move the auxiliary list on the lower level and the entire process continues.

Once we reach the last auxiliary list and process the position tuple there and pick the element, our solution set contains the required positions and we can stop the iteration.

Plain text

The entire working code could be found here github.com/arpitbbhayani/fractional-cascading

Time and space complexity

In Fractional Cascading, we perform binary search once on the auxiliary list M(0) and then make k constant comparisons for each of the subsequent levels; hence the time complexity is O(k + log(n)).

The auxiliary lists could at most contain all the elements from the original list plus 1/2 |L(n)| + 1/4 |L(n-1)| + 1/8 |L(n-2)| + ... which is less than all elements of the original list combined. Thus the total size of the auxiliary list cannot exceed twice the original lists. The position tuple for each of the elements is also a constant 2 item tuple thus the space complexity of Fractional Cascading is O(kn).

Thus Fractional Cascading has time complexity very close to the k-binary searches approach with a very low space complexity as compared to the unified binary searches approach; thus giving us the best of both worlds.

Fractional Cascading in real world

Fractional Cascading is used in FD-Trees which are used in databases to address the asymmetry of read-write speeds in tree indexing on the flash disk. Fractional cascading is typically used in range search data structures like Segment Trees to speed up lookups and filters.

References

  • Fractional Cascading - Wikipedia
  • Fractional Cascading - Original Paper by Bernard Chazelle and Leonidas Guibas
  • Fractional Cascading Revisited
  • Fractional Cascading - Brown University