B-Tree, LLRB

Hug一般会顺滑地衔接/引入知识点,但这些衔接部分写在笔记里多少有点啰嗦,所以这里会略写。

BST的局限

  • 先引入两个BST的属性:height和average depth
    • height:determines the worst case runtime to find a node
    • average depth:determines the average case runtime to find a node
  • 然后看看BST的各种属性
    • Worst case $ \Theta( N)$ height
    • Best case $ \Theta(log N)$ height
    • Random trees have $ \Theta(log N)$ average depth and height (bushy)
  • 随机情况下表现还不错,问题是现实中会不断从右边insert数据,这会导致不平衡,比如
    • add(“01-Jan-2019, 10:31:00”)
    • add(“01-Jan-2019, 18:51:00”)

B-Trees / 2-3 trees / 2-3-4 trees

接上,为了防止不平衡,我们不断把新的key insert进原来的node中,然后进行node splitting。

于是就出现了新的树B-trees。

B-trees should be called juicy trees.

定义

  • B-tree是一种自平衡树,总的分为以下几类:
    • B-trees of order L=3 (like we used today) are also called a 2-3-4 tree or a 2-4 tree(如图)
    • B-trees of order L=2 are also called a 2-3 tree (如图)
    • B-trees of large L are used in practice for databases etc.
image-20230328102217909

构成机制

(node-split)

  • 1/ Insert into a leaf until the number of keys exceeds L

  • 2/ In case of excessive keys, send the second (1 st) node up, and split the node

几个例子:

【非连锁反应】
【涉及到连锁反应】
【涉及到root split】

性质

(or Invariants)

基于以上构成机制,可以得出B-tree的一些性质。B-trees are balanced in that:

  • All leaves must be the same distance from the source

    • 考虑:涉及到root-split,height+1,所有node都会下降一层;
    • 如不涉及root-split,height不变
  • A non-leaf node with k items must have exactly k+1 children

Runtime Analysis

  • Height:Θ(log N) for both worst cast and best case
    • best case是每个node都有L个items(满的) $\Rightarrow$ Height grows with $\Theta(log_{L+1}N)$
    • worst case是每个node只有1个item $\Rightarrow$ 所以和BST一样Height grows with $\Theta(log_2N)$
  • operation runtime: contains() and add() are both $O(log N)$
    • 【worst case】contains(): $C = (层数)log_{L+1}{N} \space·\space (node \space per \space layer)1 \space·\space (work \space per \space node)L$
    • 【worst case】add(): 多了一些split operation,但化简后是一样的

B-Tree的局限

猜猜为什么上面没有代码实现部分

Left Leaning Red-Black Trees

定义

A BST with left glue links that represents a 2-3 tree is often called a “Left Leaning Red Black Binary Search Tree” or LLRB

可以理解为:LLRB是一种和某一个B-Tree对应的BST。具体备注:

  • LLRBs are normal BSTs!
  • There is a 1-1 correspondence between an LLRB and an equivalent 2-3 tree.
  • The red is just a convenient fiction. Red links don’t “do” anything special

性质

和BST一样

尤其需要注意的几个性质:

  • All leaves must be the same distance from the source(只数black links)
  • No node has two red links

构成机制

  • 原理上是基于2-3 tree,将node中较小的item下移,如
  • 实际上,因为不可能先实现2-3 tree,再调整成LLRB。在代码中会利用red link, rotate, flip color直接实现LLRB:

    • always use a red link when inserting (类比在2-3中会先往node中添加item)

      image-20230328211407567
    • when inserting items on the right, rotateLeft()

      image-20230328211424939
    • when inserting on the left twice, rotateRight()

      image-20230328212334776
    • (a new rule) allows representing temporary 4-nodes as BST nodes with two red links

      image-20230328212423779
    • In case of tmp 4-nodes, flipcolor().

    • If a rotation or flip operation cause an additional violation, fix it

      image-20230328213407205 image-20230328213419511

代码实现

以下是这些方法的代码实现,包括调整过的Node Class、put()、新增的rotate系列、flipColors()isred()

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
// LLRB insertion (p.439)
public class RedBlackBST<Key extends Comparable<Key>, Value> {
    private Node root;
    private class Node // BST node with color bit (see page 433)
    private boolean isRed(Node h)    // See page 433.
    private Node rotateLeft(Node h)  // See page 434.
    private Node rotateRight(Node h) // See page 434.
    private void flipColors(Node h)  // See page 436.
    private int size()               // See page 398.
    public void put(Key key, Value val) {  
        // Search for key. Update value if found; grow table if new.
        root = put(root, key, val);
        root.color = BLACK;
    }
    private Node put(Node h, Key key, Value val) {
        if (h == null)  { // Do standard insert, with red link to parent.
           return new Node(key, val, 1, RED);
        }
        int cmp = key.compareTo(h.key);
        if (cmp < 0) {
            h.left  = put(h.left,  key, val);
        }
        else if (cmp > 0) {
            h.right = put(h.right, key, val);
        } else {
            h.val = val;
        } if (isRed(h.right) && !isRed(h.left)) { // 基于BST只需要修改这里三个clause
            h = rotateLeft(h);
        } if (isRed(h.left) && isRed(h.left.left)) {
            h = rotateRight(h);
        } if (isRed(h.left) && isRed(h.right)) {
            flipColors(h);
            h.N = size(h.left) + size(h.right) + 1;
            return h;
        }
    }


	// some omitted methods within RedBlackBST Class
    private boolean isRed(Node x) {
       if (x == null) return false;
       return x.color == RED;
    }

    private Node rotateLeft(Node h) { // rotateRight() is similar
       Node x = h.right;
       h.right = x.left;
       x.left = h;
       x.color = h.color;
       h.color = RED;
       x.N = h.N;
       h.N = 1 + size(h.left)
               + size(h.right);
       return x;
    }

    private void flipColors(Node h) {
       h.color = RED;
       h.left.color = BLACK;
       h.right.color = BLACK;
    }
    
}

// delete比较麻烦,详细见p.441.

Runtime Analysis

  • Height: $O(logN)$
  • contains(): $O(logN)$
  • insert(): $O(logN)$
    • $O(logN)$ to add the new node
    • $O(logN)$ for rotation and color flip operations per insert (化简后忽略常数 所以还是 $O(logN)$)

Summary of search trees

逻辑梳理

Cited from the slides [lec 18, 19sp]

  • Binary search trees** are simple, but they are subject to imbalance.

  • 2-3 Trees (B Trees) are balanced, but painful to implement and relatively slow.

  • LLRBs insertion is simple to implement (but delete is hard).

    • Works by maintaining mathematical bijection with a 2-3 trees.
  • Java’s TreeMap is a red-black tree (not left leaning).

    • Maintains correspondence with 2-3-4 tree (is not a 1-1 correspondence).

    • Allows glue links on either side (see Red-Black Tree).

    • More complex implementation, but significantly (?) faster.

Complexity/runtime对比

WC = worst case

BST B-Trees LLRB
Height $O(logN)$ $\Theta(logN)$ $O(logN)$
WC:$O(N)$ WC:$O(log N)$ WC:$O(log N)$
contains() $O(logN)$ $O(log N)$ $O(logN)$
WC:$O(N)$ WC:$O(log N)$ WC:$O(log N)$
insert() $O(logN)$ $O(log N)$ $O(logN)$
WC:$O(N)$ WC:$O(log N)$ WC:$O(log N)$

说明:B树和LLRB是self-balanced(不会出现BST变成LinkedList的极端情况),所以比较快。

总的来说:

  • B-trees的自平衡避免了BST的worst-cast complexity,
  • 而LLRB不仅继承了B-trees的自平衡特性,还继承了BST易于实现的特点。