Note: The passage is originally written in Chinese and translated into English via ChatGPT.
B-Tree, LLRB
Hug usually transitions smoothly when introducing new concepts, but the transitions written in the notes can be a bit verbose, so I provide a more concise version here.
Limitations of BST
- First, let’s introduce two properties of a BST: height and average depth.
- Height: Determines the worst-case runtime to find a node.
- Average depth: Determines the average-case runtime to find a node.
- Now let’s look at various properties of BSTs.
- Worst case: $\Theta(N)$ height
- Best case: $\Theta(\log N)$ height
- Random trees have $\Theta(\log N)$ average depth and height (bushy).
- Random trees perform well, but a problem arises when we keep inserting keys in sorted order (so every new key goes to the right), which leads to imbalance, such as (a short sketch of the resulting degenerate tree follows this list):
- add(“01-Jan-2019, 10:31:00”)
- add(“01-Jan-2019, 18:51:00”)
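As a quick illustration of this degenerate case, here is a minimal sketch (not course code; the class names and the ISO-style timestamps are my own, chosen so that lexicographic order matches chronological order). Inserting already-sorted keys into a plain BST makes every new node the right child of the previous one, so the "tree" collapses into a linked list of height $N - 1$:

```java
public class SpindlyBstDemo {
    static class Node {
        String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    static Node insert(Node x, String key) {
        if (x == null) return new Node(key);
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left  = insert(x.left, key);
        else if (cmp > 0) x.right = insert(x.right, key);
        return x;
    }

    static int height(Node x) {
        if (x == null) return -1;
        return 1 + Math.max(height(x.left), height(x.right));
    }

    public static void main(String[] args) {
        String[] timestamps = {
            "2019-01-01 10:31:00", "2019-01-01 18:51:00",
            "2019-01-02 08:00:00", "2019-01-02 09:45:00", "2019-01-03 07:10:00"
        };
        Node root = null;
        for (String t : timestamps) root = insert(root, t);   // every insert goes right
        System.out.println("height = " + height(root));       // prints 4, i.e. N - 1
    }
}
```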
B-Trees / 2-3 trees / 2-3-4 trees
To address the issue of imbalance, instead of always hanging new leaves off the bottom of the tree, we keep inserting new keys into existing leaf nodes and split a node when it holds too many keys. This gives rise to a new family of trees called B-trees.
B-trees should be called juicy trees.
Definition
- B-trees are self-balancing trees and can be classified into different types:
- B-trees of order L=3 (like the ones used today) are also called a 2-3-4 tree or a 2-4 tree (as shown in the figure).
- B-trees of order L=2 are also called a 2-3 tree (as shown in the figure).
- B-trees of larger L are used in practice for databases, etc.
Construction Mechanism
(node-split)
- Insert new keys into a leaf node until the number of keys in it exceeds L.
- When a node has too many keys, send its second key (index 1, counting from 0) up to the parent and split the node into two (a small code sketch of this split step appears below).
Examples:
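As a concrete illustration of the split rule, here is a minimal, self-contained sketch (the class and method names are my own, not course code) that splits an overfull node into a left half, a promoted key, and a right half:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitDemo {
    /**
     * Splits an overfull node (more than L keys) into [leftKeys, promotedKey, rightKeys].
     * The promoted key is the "left-middle" one, i.e. index 1 for both 2-3 (L = 2)
     * and 2-3-4 (L = 3) trees; in a real B-tree it would be pushed into the parent node.
     */
    static List<Object> split(List<Integer> overfullKeys) {
        int mid = (overfullKeys.size() - 1) / 2;   // index 1 for 3 or 4 keys
        List<Integer> left  = new ArrayList<>(overfullKeys.subList(0, mid));
        List<Integer> right = new ArrayList<>(overfullKeys.subList(mid + 1, overfullKeys.size()));
        return Arrays.asList(left, overfullKeys.get(mid), right);
    }

    public static void main(String[] args) {
        // A 2-3 tree leaf (L = 2) that just received its third key:
        List<Integer> leaf = new ArrayList<>(Arrays.asList(17, 21, 25));
        System.out.println(split(leaf));   // [[17], 21, [25]] -> 21 moves up to the parent
    }
}
```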
Properties
(or Invariants)
Based on the construction mechanism mentioned above, we can derive some properties of B-trees. B-trees are balanced in that:
- All leaves must be the same distance from the source.
- Why: the height only changes through a root split; when the root splits, the height increases by 1 and all existing nodes descend one level together, so the leaves stay equidistant from the root.
- If there is no root split, the height remains the same.
- A non-leaf node with k items must have exactly k+1 children.
Runtime Analysis
- Height: $\Theta(\log N)$ for both worst-case and best-case scenarios.
- In the best case, each node has L items (full), so the height grows with $\Theta(\log_{L+1}N)$.
- In the worst case, each node has only 1 item, so the height grows with $\Theta(\log_2N)$, the same as a BST.
- Operation runtime: both `contains()` and `add()` run in $O(\log N)$ (a small worked example follows this list).
    - Worst case for `contains()`: $C = \underbrace{\log_{L+1} N}_{\text{number of layers}} \cdot \underbrace{1}_{\text{nodes per layer}} \cdot \underbrace{L}_{\text{work per node}} = O(\log N)$.
    - Worst case for `add()`: it involves some split operations, but after simplification the complexity remains $O(\log N)$.
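For a quick, illustrative sanity check (the numbers are my own, not from the notes): take a 2-3 tree ($L = 2$) with $N = 1{,}000{,}000$ items. Then

$$\log_{3} 10^6 \approx 12.6 \;\le\; \text{height} \;\le\; \log_{2} 10^6 \approx 19.9,$$

so the tree is roughly 13–20 levels tall, and a worst-case `contains()` examines one node per level with at most $L = 2$ keys each, i.e. on the order of $2 \times 20 = 40$ comparisons.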
Limitations of B-Trees
Can you guess why the code implementation is not included above?
Left Leaning Red-Black Trees
Definition
A BST with left glue links that represents a 2-3 tree is often called a “Left Leaning Red Black Binary Search Tree” or LLRB.
In other words, LLRB is a BST that corresponds to a specific B-Tree. Some remarks about LLRB trees:
- LLRBs are normal BSTs!
- There is a one-to-one correspondence between an LLRB and an equivalent 2-3 tree.
- The red color is just a convenient representation. Red links don’t have any special functionality.
Properties
Similar to BSTs, LLRB trees have the following properties:
- All leaves must be the same distance from the source (counting only black links).
- No node has two red links.
Construction Mechanism
- In principle, an LLRB tree is obtained from a 2-3 tree by moving the smaller item of a multi-item node down as a left child connected by a red ("glue") link. For example:
- However, it is not practical to implement a 2-3 tree first and then convert it into an LLRB tree. In the code, LLRB trees are implemented directly using red links, rotations, and color flips:
    - Always use a red link when inserting (analogous to adding an item to an existing node in a 2-3 tree).
    - When the new item ends up on the right (a right-leaning red link), use `rotateLeft()`.
    - When inserting on the left twice in a row (two consecutive red links on the left), use `rotateRight()`.
    - A new rule allows representing a temporary 4-node as a BST node with two red links.
    - When a temporary 4-node appears, use `flipColors()`.
    - If a rotation or color flip causes a new violation, keep fixing it on the way up the tree.
Code Implementation
The following code includes the implementation of these methods: the modified `Node` class, `put()`, the newly added rotate methods, `flipColors()`, and `isRed()`.
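Here is a minimal sketch of these methods, written in the style of the standard textbook LLRB insertion (the class name `LLRBMap` and the generic parameters are my own choices, so treat this as an illustration rather than the exact course code):

```java
public class LLRBMap<Key extends Comparable<Key>, Value> {
    private static final boolean RED = true;
    private static final boolean BLACK = false;

    private Node root;

    private class Node {
        Key key;
        Value val;
        Node left, right;
        boolean color;               // color of the link from the parent to this node

        Node(Key key, Value val, boolean color) {
            this.key = key;
            this.val = val;
            this.color = color;
        }
    }

    // Null links count as black.
    private boolean isRed(Node x) {
        return x != null && x.color == RED;
    }

    // Turn a right-leaning red link into a left-leaning one.
    private Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    // Turn two consecutive left red links into a balanced temporary 4-node shape.
    private Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    // Split a temporary 4-node: push the middle key up by recoloring.
    private void flipColors(Node h) {
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }

    public void put(Key key, Value val) {
        root = put(root, key, val);
        root.color = BLACK;          // the root link is always black
    }

    private Node put(Node h, Key key, Value val) {
        if (h == null) return new Node(key, val, RED);   // always insert with a red link

        int cmp = key.compareTo(h.key);
        if      (cmp < 0) h.left  = put(h.left,  key, val);
        else if (cmp > 0) h.right = put(h.right, key, val);
        else              h.val   = val;

        // Fix-ups on the way back up, mirroring the rules listed above.
        if (isRed(h.right) && !isRed(h.left))     h = rotateLeft(h);
        if (isRed(h.left)  && isRed(h.left.left)) h = rotateRight(h);
        if (isRed(h.left)  && isRed(h.right))     flipColors(h);

        return h;
    }
}
```

Note that `rotateLeft()` and `rotateRight()` preserve the BST ordering; only the link colors and the local shape change, which is why each insertion can repair the tree purely locally on the way back up the recursion.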
Runtime Analysis
- Height: $O(\log N)$
- `contains()`: $O(\log N)$
- `insert()`: $O(\log N)$
    - $O(\log N)$ to add the new node
    - $O(\log N)$ for rotation and color flip operations per insert (after simplification and ignoring constant factors, the total remains $O(\log N)$)
Summary of search trees
Recap of the logic
Cited from the slides [lec 18, 19sp]
- Binary search trees are simple, but they are subject to imbalance.
- 2-3 trees (B-trees) are balanced, but painful to implement and relatively slow.
- LLRBs: insertion is simple to implement (but deletion is hard).
    - Works by maintaining a mathematical bijection with a 2-3 tree.
- Java’s TreeMap is a red-black tree (not left-leaning); a tiny usage example follows this list.
    - Maintains a correspondence with a 2-3-4 tree (not a 1-1 correspondence).
    - Allows glue links on either side (see Red-Black Tree).
    - More complex implementation, but significantly (?) faster.
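Since `java.util.TreeMap` came up above, here is a tiny usage sketch (the keys and values are arbitrary examples of my own) showing the ordered-map operations that a balanced search tree makes cheap:

```java
import java.util.TreeMap;

public class TreeMapDemo {
    public static void main(String[] args) {
        TreeMap<String, Integer> visits = new TreeMap<>();   // red-black tree under the hood
        visits.put("2019-01-01 10:31:00", 1);
        visits.put("2019-01-01 18:51:00", 2);
        visits.put("2019-01-02 08:00:00", 3);

        // All of these run in O(log N) because the tree stays balanced,
        // even though the keys were inserted in sorted order.
        System.out.println(visits.firstKey());                        // 2019-01-01 10:31:00
        System.out.println(visits.floorKey("2019-01-01 23:59:59"));   // 2019-01-01 18:51:00
        System.out.println(visits.containsKey("2019-01-02 08:00:00")); // true
    }
}
```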
Complexity/runtime comparison
WC = worst case
|              | BST                        | B-Trees                              | LLRB                            |
| ------------ | -------------------------- | ------------------------------------ | ------------------------------- |
| Height       | $O(\log N)$<br>WC: $O(N)$  | $\Theta(\log N)$<br>WC: $O(\log N)$  | $O(\log N)$<br>WC: $O(\log N)$  |
| `contains()` | $O(\log N)$<br>WC: $O(N)$  | $O(\log N)$<br>WC: $O(\log N)$       | $O(\log N)$<br>WC: $O(\log N)$  |
| `insert()`   | $O(\log N)$<br>WC: $O(N)$  | $O(\log N)$<br>WC: $O(\log N)$       | $O(\log N)$<br>WC: $O(\log N)$  |
- Explanation: B-trees and LLRB trees are self-balancing, which prevents the extreme case of a BST degenerating into a linked list; this is what keeps their worst-case operations logarithmic, unlike a plain BST.
In summary:
- B-trees achieve self-balancing to avoid the worst-case complexity of regular BSTs.
- LLRB trees not only inherit the self-balancing property of B-trees but also retain the ease of implementation characteristic of regular BSTs.