Note: The passage is originally written in Chinese and translated into English via ChatGPT.


B-Tree, LLRB

Hug usually smoothly transitions and introduces new concepts, but the transitions written in the notes can be a bit verbose, so I will provide a concise version here.

Limitations of BST

  • First, let’s introduce two properties of BST: height and average depth.
    • Height: Determines the worst-case runtime to find a node.
    • Average depth: Determines the average-case runtime to find a node.
  • Now let’s look at various properties of BSTs.
    • Worst case: $\Theta(N)$ height
    • Best case: $\Theta(\log N)$ height
    • Random trees have $\Theta(\log N)$ average depth and height (bushy).
  • Random trees perform well, but the problem arises when we keep inserting data from the right side, leading to imbalance, such as:
    • add(“01-Jan-2019, 10:31:00”)
    • add(“01-Jan-2019, 18:51:00”)

B-Trees / 2-3 trees / 2-3-4 trees

To address the issue of imbalance, we keep inserting new keys into existing nodes and perform node splitting. This gives rise to a new type of tree called B-trees.

B-trees should be called juicy trees.

Definition

  • B-trees are self-balancing trees and can be classified into different types:
    • B-trees of order L=3 (like the ones used today) are also called a 2-3-4 tree or a 2-4 tree (as shown in the figure).
    • B-trees of order L=2 are also called a 2-3 tree (as shown in the figure).
    • B-trees of larger L are used in practice for databases, etc.
image-20230328102217909

Construction Mechanism

(node-split)

    1. Insert into a leaf until the number of keys exceeds L.
    1. In case of excessive keys, send the second (1st) node up and split the node.

Examples:

【No cascading reaction】
【Involves cascading reaction】
【Involves root split】

Properties

(or Invariants)

Based on the construction mechanism mentioned above, we can derive some properties of B-trees. B-trees are balanced in that:

  • All leaves must be the same distance from the source.
    • Consider: When a root split occurs, the height increases by 1, and all nodes descend one level.
    • If there is no root split, the height remains the same.
  • A non-leaf node with k items must have exactly k+1 children.

Runtime Analysis

  • Height: $\Theta(\log N)$ for both worst-case and best-case scenarios.
    • In the best case, each node has L items (full), so the height grows with $\Theta(\log_{L+1}N)$.
    • In the worst case, each node has only 1 item, so the height grows with $\Theta(\log_2N)$, the same as a BST.
  • Operation runtime: Both contains() and add() have a complexity of $O(\log N)$.
    • Worst case for contains(): $C = (\text{number of layers}) \log_{L+1}{N} \cdot (\text{nodes per layer}) 1 \cdot (\text{work per node}) L$
    • Worst case for add(): It involves some split operations, but the simplified complexity remains the same.

Limitations of B-Trees

Can you guess why the code implementation is not included above?

Left Leaning Red-Black Trees

Definition

A BST with left glue links that represents a 2-3 tree is often called a “Left Leaning Red Black Binary Search Tree” or LLRB.

In other words, LLRB is a BST that corresponds to a specific B-Tree. Some remarks about LLRB trees:

  • LLRBs are normal BSTs!
  • There is a one-to-one correspondence between an LLRB and an equivalent 2-3 tree.
  • The red color is just a convenient representation. Red links don’t have any special functionality.

Properties

Similar to BSTs, LLRB trees have the following properties:

  • All leaves must be the same distance from the source (counting only black links).
  • No node has two red links.

Construction Mechanism

  • In principle, LLRB trees are based on 2-3 trees, where the smaller item in a node is moved down. For example:
  • However, it is not practical to implement a 2-3 tree and then convert it to an LLRB tree. In the code

, LLRB trees are implemented directly using red links, rotation, and color flipping:

  • Always use a red link when inserting (analogous to adding items to a node in a 2-3 tree).

    image-20230328211407567
  • When inserting items on the right, use rotateLeft().

    image-20230328211424939
  • When inserting items on the left twice, use rotateRight().

    image-20230328212334776
  • A new rule allows representing temporary 4-nodes as BST nodes with two red links.

    image-20230328212423779
  • In case of temporary 4-nodes, use flipColors().

  • If a rotation or color flip operation causes an additional violation, fix it.

    image-20230328213407205 image-20230328213419511

Code Implementation

The following code includes the implementation of these methods, including the modified Node class, put(), the newly added rotate methods, flipColors(), and isRed().

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
// LLRB insertion (p.439)
public class RedBlackBST<Key extends Comparable<Key>, Value> {
    private Node root;
    private class Node // BST node with color bit (see page 433)
    private boolean isRed(Node h)    // See page 433.
    private Node rotateLeft(Node h)  // See page 434.
    private Node rotateRight(Node h) // See page 434.
    private void flipColors(Node h)  // See page 436.
    private int size()               // See page 398.
    public void put(Key key, Value val) {  
        // Search for key. Update value if found; grow table if new.
        root = put(root, key, val);
        root.color = BLACK;
    }
    private Node put(Node h, Key key, Value val) {
        if (h == null)  { // Do standard insert, with red link to parent.
           return new Node(key, val, 1, RED);
        }
        int cmp = key.compareTo(h.key);
        if (cmp < 0) {
            h.left  = put(h.left,  key, val);
        }
        else if (cmp > 0) {
            h.right = put(h.right, key, val);
        } else {
            h.val = val;
        } if (isRed(h.right) && !isRed(h.left)) { // 基于BST只需要修改这里三个clause
            h = rotateLeft(h);
        } if (isRed(h.left) && isRed(h.left.left)) {
            h = rotateRight(h);
        } if (isRed(h.left) && isRed(h.right)) {
            flipColors(h);
            h.N = size(h.left) + size(h.right) + 1;
            return h;
        }
    }


	// some omitted methods within RedBlackBST Class
    private boolean isRed(Node x) {
       if (x == null) return false;
       return x.color == RED;
    }

    private Node rotateLeft(Node h) { // rotateRight() is similar
       Node x = h.right;
       h.right = x.left;
       x.left = h;
       x.color = h.color;
       h.color = RED;
       x.N = h.N;
       h.N = 1 + size(h.left)
               + size(h.right);
       return x;
    }

    private void flipColors(Node h) {
       h.color = RED;
       h.left.color = BLACK;
       h.right.color = BLACK;
    }
    
}

// delete比较麻烦,详细见p.441.

Runtime Analysis

  • Height: $O(logN)$
  • contains(): $O(logN)$
  • insert(): $O(logN)$
    • $O(logN)$ to add the new node
    • $O(logN)$ for rotation and color flip operations per insert (After simplification and ignoring constant terms, the time complexity remains O(log N).)

Summary of search trees

逻辑梳理

Cited from the slides [lec 18, 19sp]

  • Binary search trees** are simple, but they are subject to imbalance.

  • 2-3 Trees (B Trees) are balanced, but painful to implement and relatively slow.

  • LLRBs insertion is simple to implement (but delete is hard).

    • Works by maintaining mathematical bijection with a 2-3 trees.
  • Java’s TreeMap is a red-black tree (not left leaning).

    • Maintains correspondence with 2-3-4 tree (is not a 1-1 correspondence).

    • Allows glue links on either side (see Red-Black Tree).

    • More complex implementation, but significantly (?) faster.

Complexity/runtime对比

WC = worst case

BST B-Trees LLRB
Height $O(logN)$ $\Theta(logN)$ $O(logN)$
WC:$O(N)$ WC:$O(log N)$ WC:$O(log N)$
contains() $O(logN)$ $O(log N)$ $O(logN)$
WC:$O(N)$ WC:$O(log N)$ WC:$O(log N)$
insert() $O(logN)$ $O(log N)$ $O(logN)$
WC:$O(N)$ WC:$O(log N)$ WC:$O(log N)$
  • Explanation: B-trees and LLRB trees are self-balanced, which prevents the extreme case of a BST degenerating into a linked list. This property makes them faster compared to regular BSTs.

In summary:

  • B-trees achieve self-balancing to avoid the worst-case complexity of regular BSTs.
  • LLRB trees not only inherit the self-balancing property of B-trees but also retain the ease of implementation characteristic of regular BSTs.