Note: The passage is originally written in Chinese and translated into English via ChatGPT.
B-Tree, LLRB
Hug usually transitions smoothly when introducing new concepts, but the transitions written in the notes can be a bit verbose, so I provide a more concise version here.
Limitations of BST
- First, let’s introduce two properties of a BST: height and average depth.
- Height: Determines the worst-case runtime to find a node.
- Average depth: Determines the average-case runtime to find a node.
- Now let’s look at various properties of BSTs.
- Worst case: $\Theta(N)$ height
- Best case: $\Theta(\log N)$ height
- Random trees have $\Theta(\log N)$ average depth and height (bushy).
- Random trees perform well, but a problem arises when we keep inserting keys in sorted order (so every new key goes to the right), which leads to imbalance, such as (a short sketch of the resulting degenerate tree follows this list):
- add(“01-Jan-2019, 10:31:00”)
- add(“01-Jan-2019, 18:51:00”)
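As a quick illustration of this degenerate case, here is a minimal sketch (not course code; the class names and the ISO-style timestamps are my own, chosen so that lexicographic order matches chronological order). Inserting already-sorted keys into a plain BST makes every new node the right child of the previous one, so the "tree" collapses into a linked list of height $N - 1$:

```java
public class SpindlyBstDemo {
    static class Node {
        String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    static Node insert(Node x, String key) {
        if (x == null) return new Node(key);
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left  = insert(x.left, key);
        else if (cmp > 0) x.right = insert(x.right, key);
        return x;
    }

    static int height(Node x) {
        if (x == null) return -1;
        return 1 + Math.max(height(x.left), height(x.right));
    }

    public static void main(String[] args) {
        String[] timestamps = {
            "2019-01-01 10:31:00", "2019-01-01 18:51:00",
            "2019-01-02 08:00:00", "2019-01-02 09:45:00", "2019-01-03 07:10:00"
        };
        Node root = null;
        for (String t : timestamps) root = insert(root, t);   // every insert goes right
        System.out.println("height = " + height(root));       // prints 4, i.e. N - 1
    }
}
```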
B-Trees / 2-3 trees / 2-3-4 trees
To address the issue of imbalance, instead of always hanging new leaves off the bottom of the tree, we keep inserting new keys into existing leaf nodes and split a node when it holds too many keys. This gives rise to a new family of trees called B-trees.
B-trees should be called juicy trees.
Definition
- B-trees are self-balancing trees and can be classified into different types:
- B-trees of order L=3 (like the ones used today) are also called a 2-3-4 tree or a 2-4 tree (as shown in the figure).
- B-trees of order L=2 are also called a 2-3 tree (as shown in the figure).
- B-trees of larger L are used in practice for databases, etc.
Construction Mechanism
(node-split)
- Insert new keys into a leaf node until the number of keys in it exceeds L.
- When a node has too many keys, send its second key (index 1, counting from 0) up to the parent and split the node into two (a small code sketch of this split step appears below).
Examples:
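As a concrete illustration of the split rule, here is a minimal, self-contained sketch (the class and method names are my own, not course code) that splits an overfull node into a left half, a promoted key, and a right half:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitDemo {
    /**
     * Splits an overfull node (more than L keys) into [leftKeys, promotedKey, rightKeys].
     * The promoted key is the "left-middle" one, i.e. index 1 for both 2-3 (L = 2)
     * and 2-3-4 (L = 3) trees; in a real B-tree it would be pushed into the parent node.
     */
    static List<Object> split(List<Integer> overfullKeys) {
        int mid = (overfullKeys.size() - 1) / 2;   // index 1 for 3 or 4 keys
        List<Integer> left  = new ArrayList<>(overfullKeys.subList(0, mid));
        List<Integer> right = new ArrayList<>(overfullKeys.subList(mid + 1, overfullKeys.size()));
        return Arrays.asList(left, overfullKeys.get(mid), right);
    }

    public static void main(String[] args) {
        // A 2-3 tree leaf (L = 2) that just received its third key:
        List<Integer> leaf = new ArrayList<>(Arrays.asList(17, 21, 25));
        System.out.println(split(leaf));   // [[17], 21, [25]] -> 21 moves up to the parent
    }
}
```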
Properties
(or Invariants)
Based on the construction mechanism mentioned above, we can derive some properties of B-trees. B-trees are balanced in that:
- All leaves must be the same distance from the source.
- Why: the height only changes through a root split; when the root splits, the height increases by 1 and all existing nodes descend one level together, so the leaves stay equidistant from the root.
- If there is no root split, the height remains the same.
- A non-leaf node with k items must have exactly k+1 children.
Runtime Analysis
- Height: $\Theta(\log N)$ for both worst-case and best-case scenarios.
- In the best case, each node has L items (full), so the height grows with $\Theta(\log_{L+1}N)$.
- In the worst case, each node has only 1 item, so the height grows with $\Theta(\log_2N)$, the same as a BST.
- Operation runtime: both `contains()` and `add()` run in $O(\log N)$ (a small worked example follows this list).
    - Worst case for `contains()`: $C = \underbrace{\log_{L+1} N}_{\text{number of layers}} \cdot \underbrace{1}_{\text{nodes per layer}} \cdot \underbrace{L}_{\text{work per node}} = O(\log N)$.
    - Worst case for `add()`: it involves some split operations, but after simplification the complexity remains $O(\log N)$.
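For a quick, illustrative sanity check (the numbers are my own, not from the notes): take a 2-3 tree ($L = 2$) with $N = 1{,}000{,}000$ items. Then

$$\log_{3} 10^6 \approx 12.6 \;\le\; \text{height} \;\le\; \log_{2} 10^6 \approx 19.9,$$

so the tree is roughly 13–20 levels tall, and a worst-case `contains()` examines one node per level with at most $L = 2$ keys each, i.e. on the order of $2 \times 20 = 40$ comparisons.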
Limitations of B-Trees
Can you guess why the code implementation is not included above?
Left Leaning Red-Black Trees
Definition
A BST with left glue links that represents a 2-3 tree is often called a “Left Leaning Red Black Binary Search Tree” or LLRB.
In other words, LLRB is a BST that corresponds to a specific B-Tree. Some remarks about LLRB trees:
- LLRBs are normal BSTs!
- There is a one-to-one correspondence between an LLRB and an equivalent 2-3 tree.
- The red color is just a convenient representation. Red links don’t have any special functionality.
Properties
Similar to BSTs, LLRB trees have the following properties:
- All leaves must be the same distance from the source (counting only black links).
- No node has two red links.
Construction Mechanism
- In principle, an LLRB tree is obtained from a 2-3 tree by moving the smaller item of a multi-item node down as a left child connected by a red ("glue") link. For example:
- However, it is not practical to implement a 2-3 tree first and then convert it into an LLRB tree. In the code, LLRB trees are implemented directly using red links, rotations, and color flips:
    - Always use a red link when inserting (analogous to adding an item to an existing node in a 2-3 tree).
    - When the new item ends up on the right (a right-leaning red link), use `rotateLeft()`.
    - When inserting on the left twice in a row (two consecutive red links on the left), use `rotateRight()`.
    - A new rule allows representing a temporary 4-node as a BST node with two red links.
    - When a temporary 4-node appears, use `flipColors()`.
    - If a rotation or color flip causes a new violation, keep fixing it on the way up the tree.
Code Implementation
The following code includes the implementation of these methods: the modified `Node` class, `put()`, the newly added rotate methods, `flipColors()`, and `isRed()`.
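Here is a minimal sketch of these methods, written in the style of the standard textbook LLRB insertion (the class name `LLRBMap` and the generic parameters are my own choices, so treat this as an illustration rather than the exact course code):

```java
public class LLRBMap<Key extends Comparable<Key>, Value> {
    private static final boolean RED = true;
    private static final boolean BLACK = false;

    private Node root;

    private class Node {
        Key key;
        Value val;
        Node left, right;
        boolean color;               // color of the link from the parent to this node

        Node(Key key, Value val, boolean color) {
            this.key = key;
            this.val = val;
            this.color = color;
        }
    }

    // Null links count as black.
    private boolean isRed(Node x) {
        return x != null && x.color == RED;
    }

    // Turn a right-leaning red link into a left-leaning one.
    private Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    // Turn two consecutive left red links into a balanced temporary 4-node shape.
    private Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    // Split a temporary 4-node: push the middle key up by recoloring.
    private void flipColors(Node h) {
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }

    public void put(Key key, Value val) {
        root = put(root, key, val);
        root.color = BLACK;          // the root link is always black
    }

    private Node put(Node h, Key key, Value val) {
        if (h == null) return new Node(key, val, RED);   // always insert with a red link

        int cmp = key.compareTo(h.key);
        if      (cmp < 0) h.left  = put(h.left,  key, val);
        else if (cmp > 0) h.right = put(h.right, key, val);
        else              h.val   = val;

        // Fix-ups on the way back up, mirroring the rules listed above.
        if (isRed(h.right) && !isRed(h.left))     h = rotateLeft(h);
        if (isRed(h.left)  && isRed(h.left.left)) h = rotateRight(h);
        if (isRed(h.left)  && isRed(h.right))     flipColors(h);

        return h;
    }
}
```

Note that `rotateLeft()` and `rotateRight()` preserve the BST ordering; only the link colors and the local shape change, which is why each insertion can repair the tree purely locally on the way back up the recursion.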
Runtime Analysis
- Height: $O(\log N)$
- `contains()`: $O(\log N)$
- `insert()`: $O(\log N)$
    - $O(\log N)$ to add the new node
    - $O(\log N)$ for rotation and color flip operations per insert (after simplification and ignoring constant factors, the total remains $O(\log N)$)
Summary of search trees
Recap of the logic
Cited from the slides [lec 18, 19sp]
- Binary search trees are simple, but they are subject to imbalance.
- 2-3 trees (B-trees) are balanced, but painful to implement and relatively slow.
- LLRBs: insertion is simple to implement (but deletion is hard).
    - Works by maintaining a mathematical bijection with a 2-3 tree.
- Java’s TreeMap is a red-black tree (not left-leaning); a tiny usage example follows this list.
    - Maintains a correspondence with a 2-3-4 tree (not a 1-1 correspondence).
    - Allows glue links on either side (see Red-Black Tree).
    - More complex implementation, but significantly (?) faster.
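Since `java.util.TreeMap` came up above, here is a tiny usage sketch (the keys and values are arbitrary examples of my own) showing the ordered-map operations that a balanced search tree makes cheap:

```java
import java.util.TreeMap;

public class TreeMapDemo {
    public static void main(String[] args) {
        TreeMap<String, Integer> visits = new TreeMap<>();   // red-black tree under the hood
        visits.put("2019-01-01 10:31:00", 1);
        visits.put("2019-01-01 18:51:00", 2);
        visits.put("2019-01-02 08:00:00", 3);

        // All of these run in O(log N) because the tree stays balanced,
        // even though the keys were inserted in sorted order.
        System.out.println(visits.firstKey());                        // 2019-01-01 10:31:00
        System.out.println(visits.floorKey("2019-01-01 23:59:59"));   // 2019-01-01 18:51:00
        System.out.println(visits.containsKey("2019-01-02 08:00:00")); // true
    }
}
```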
Complexity/runtime comparison
WC = worst case
|              | BST                        | B-Trees                              | LLRB                            |
| ------------ | -------------------------- | ------------------------------------ | ------------------------------- |
| Height       | $O(\log N)$<br>WC: $O(N)$  | $\Theta(\log N)$<br>WC: $O(\log N)$  | $O(\log N)$<br>WC: $O(\log N)$  |
| `contains()` | $O(\log N)$<br>WC: $O(N)$  | $O(\log N)$<br>WC: $O(\log N)$       | $O(\log N)$<br>WC: $O(\log N)$  |
| `insert()`   | $O(\log N)$<br>WC: $O(N)$  | $O(\log N)$<br>WC: $O(\log N)$       | $O(\log N)$<br>WC: $O(\log N)$  |
- Explanation: B-trees and LLRB trees are self-balancing, which prevents the extreme case of a BST degenerating into a linked list; this is what keeps their worst-case operations logarithmic, unlike a plain BST.
In summary:
- B-trees achieve self-balancing to avoid the worst-case complexity of regular BSTs.
- LLRB trees not only inherit the self-balancing property of B-trees but also retain the ease of implementation characteristic of regular BSTs.