B-Tree, LLRB
Hug一般会顺滑地衔接/引入知识点,但这些衔接部分写在笔记里多少有点啰嗦,所以这里会略写。
BST的局限
- 先引入两个BST的属性:height和average depth
- height:determines the worst case runtime to find a node
- average depth:determines the average case runtime to find a node
- 然后看看BST的各种属性
- Worst case $ \Theta( N)$ height
- Best case $ \Theta(log N)$ height
- Random trees have $ \Theta(log N)$ average depth and height (bushy)
- 随机情况下表现还不错,问题是现实中会不断从右边insert数据,这会导致不平衡,比如
- add(“01-Jan-2019, 10:31:00”)
- add(“01-Jan-2019, 18:51:00”)
B-Trees / 2-3 trees / 2-3-4 trees
接上,为了防止不平衡,我们不断把新的key insert进原来的node中,然后进行node splitting。
于是就出现了新的树B-trees。
B-trees should be called juicy trees.
定义
- B-tree是一种自平衡树,总的分为以下几类:
- B-trees of order L=3 (like we used today) are also called a 2-3-4 tree or a 2-4 tree(如图)
- B-trees of order L=2 are also called a 2-3 tree (如图)
- B-trees of large L are used in practice for databases etc.
构成机制
(node-split)
-
1/ Insert into a leaf until the number of keys exceeds L
-
2/ In case of excessive keys, send the second (1 st) node up, and split the node
几个例子:
性质
(or Invariants)
基于以上构成机制,可以得出B-tree的一些性质。B-trees are balanced in that:
-
All leaves must be the same distance from the source
- 考虑:涉及到root-split,height+1,所有node都会下降一层;
- 如不涉及root-split,height不变
-
A non-leaf node with k items must have exactly k+1 children
Runtime Analysis
- Height:Θ(log N) for both worst cast and best case
- best case是每个node都有L个items(满的) $\Rightarrow$ Height grows with $\Theta(log_{L+1}N)$
- worst case是每个node只有1个item $\Rightarrow$ 所以和BST一样Height grows with $\Theta(log_2N)$
- operation runtime:
contains()
andadd()
are both $O(log N)$- 【worst case】
contains()
: $C = (层数)log_{L+1}{N} \space·\space (node \space per \space layer)1 \space·\space (work \space per \space node)L$ - 【worst case】
add()
: 多了一些split operation,但化简后是一样的
- 【worst case】
B-Tree的局限
猜猜为什么上面没有代码实现部分
Left Leaning Red-Black Trees
定义
A BST with left glue links that represents a 2-3 tree is often called a “Left Leaning Red Black Binary Search Tree” or LLRB
可以理解为:LLRB是一种和某一个B-Tree对应的BST。具体备注:
- LLRBs are normal BSTs!
- There is a 1-1 correspondence between an LLRB and an equivalent 2-3 tree.
- The red is just a convenient fiction. Red links don’t “do” anything special
性质
和BST一样
尤其需要注意的几个性质:
- All leaves must be the same distance from the source(只数black links)
- No node has two red links
构成机制
- 原理上是基于2-3 tree,将node中较小的item下移,如
-
实际上,因为不可能先实现2-3 tree,再调整成LLRB。在代码中会利用red link, rotate, flip color直接实现LLRB:
-
always use a red link when inserting (类比在2-3中会先往node中添加item)
-
when inserting items on the right, rotateLeft()
-
when inserting on the left twice, rotateRight()
-
(a new rule) allows representing temporary 4-nodes as BST nodes with two red links
-
In case of tmp 4-nodes, flipcolor().
-
If a rotation or flip operation cause an additional violation, fix it
-
代码实现
以下是这些方法的代码实现,包括调整过的Node Class、put()
、新增的rotate
系列、flipColors()
和isred()
。
|
|
Runtime Analysis
- Height: $O(logN)$
contains()
: $O(logN)$insert()
: $O(logN)$- $O(logN)$ to add the new node
- $O(logN)$ for rotation and color flip operations per insert (化简后忽略常数 所以还是 $O(logN)$)
Summary of search trees
逻辑梳理
Cited from the slides [lec 18, 19sp]
-
Binary search trees** are simple, but they are subject to imbalance.
-
2-3 Trees (B Trees) are balanced, but painful to implement and relatively slow.
-
LLRBs insertion is simple to implement (but delete is hard).
- Works by maintaining mathematical bijection with a 2-3 trees.
-
Java’s TreeMap is a red-black tree (not left leaning).
-
Maintains correspondence with 2-3-4 tree (is not a 1-1 correspondence).
-
Allows glue links on either side (see Red-Black Tree).
-
More complex implementation, but significantly (?) faster.
-
Complexity/runtime对比
WC = worst case
BST | B-Trees | LLRB | |
---|---|---|---|
Height | $O(logN)$ | $\Theta(logN)$ | $O(logN)$ |
WC:$O(N)$ | WC:$O(log N)$ | WC:$O(log N)$ | |
contains() |
$O(logN)$ | $O(log N)$ | $O(logN)$ |
WC:$O(N)$ | WC:$O(log N)$ | WC:$O(log N)$ | |
insert() |
$O(logN)$ | $O(log N)$ | $O(logN)$ |
WC:$O(N)$ | WC:$O(log N)$ | WC:$O(log N)$ |
说明:B树和LLRB是self-balanced(不会出现BST变成LinkedList的极端情况),所以比较快。
总的来说:
- B-trees的自平衡避免了BST的worst-cast complexity,
- 而LLRB不仅继承了B-trees的自平衡特性,还继承了BST易于实现的特点。