UP | HOME

第二十七课 四种 Balanced Search Tree

Table of Contents

2-3-4 TREES

Balaced tree(Not all are binary):

  1. 234 tree(a kind of B tree): ordered perfectly balance tree
  2. splay tree (BST)
  3. AVL tree (BST)(height-balanced)
  4. Black-Red tree (BST)(convert from/to 234-tree)(black-height)

Last lecture, we learned about the Ordered Dictionary ADT, and we learned one data structure for implementing it: *binary search trees*. Today we learn a faster one.

A 2-3-4 tree is a perfectly balanced tree. It has a big advantage over regular binary search trees: because the tree is perfectly balanced, find, insert, and remove operations take O(log n) time, all of the leaves have the same depth, even in the worst case.

2-3-4 trees are thus named because every node has 2, 3, or 4 children, except leaves, which are all at the bottom level of the tree. Each node stores 1, 2, or 3 entries, which determine how other entries are distributed among its children's subtrees.

Number of childern = keys + 1

2 keys node has 3 children

                | key1 |  key2 |
                /     |        \
             /--      |         ---\
| []< key1 |   | key1 <[]< key2 |   | [] > key2 |

Each internal (non-leaf) node has one more child than entries. For example, a node with keys [20, 40, 50] has four children.

  • Eack key k in the subtree rooted at the first child satisfies k <= 20;
  • at the second child, 20 <= k <= 40;
  • at the third child, 40 <= k <= 50;
  • and at the fourth child, k >= 50.

like this 1

234-tree 也是 order 的,因为 binarytree 就是 ordered

Main api of 234-tree

WARNING: The algorithms for insertion and deletion I'll discuss today are different from those discussed by Goodrich and Tamassia. The text presents "bottom-up" 2-3-4 trees, so named because the effects of node splits at the bottom of the tree can work their way back up toward the root. I'll discuss "top-down" 2-3-4 trees, in which insertion and deletion finish at the leaves. Top-down 2-3-4 trees are usually faster than bottom-up ones, because both trees search down from the root to the leaves, but only the bottom-up trees sometimes go back up to the root. Goodrich and Tamassia call 2-3-4 trees "(2, 4) trees".

2-3-4 trees are a type of B-tree, which you may learn about someday in connection with fast disk access for database systems. B-trees on disks usually use the top-down insertion/deletion algorithms, because accessing a disk track is slow, so you'd rather not revisit it multiple times.

[1] Entry find(Object k);

Finding an entry is straightforward.        ==========
Start at the root.  At each node,           +20 40 50+
check for the key k; if it's not         /--==========------\
present, move down to the           /---/      /  \          \-----\
appropriate child chosen by     ----      ----      ----            =======
comparing k against the keys.   |14|      |32|      |43|            +70 79+
Continue until k is found,      ----      ----      ----            =======
or k is not found at a          /  \      /  \      /  \            /  |  \            <<lec27-pic1>>
leaf node.  For example,     ---- ---- ---- ---- ---- ---- ---------- ==== ----
find(74) visits the          |10| |18| |25| |33| |42| |47| |57 62 66| +74+ |81|
double-lined boxes at right. ---- ---- ---- ---- ---- ---- ---------- ==== ----

Incidentally, you can define an inorder traversal on 2-3-4 trees analogous to that on binary trees, and it visits the keys in sorted order.

[2] Entry insert(Object k, Object e);

insert(), like find(), walks down the tree in search of the key k. If it finds an entry with key k, it proceeds to that entry's "left child" and continues.

这里 left child 是针对某个 key 的,不是针对整个 node,比如 node = [20,40,80], 80 的 left-child就是 40~80 的这个 child。

一般我们会限制每个 node 能存储的最多 keys,比如这里 max(# of keys)=3 insert() 最大的问题是如何在插入之后仍然保持整棵树的平衡,解决方法就是 检测分割提升法:在插入之前按照 find()方法从 root 开始找位置,不同的是在找的过程中一旦 检测 到达到存储上线的 node就对其 分割提升 中间节点为父节点。最后再在叶子 node 中的适当位置插入。这个方法只在叶子 node 进行添加,不会在某个父节点进行,父节点的改变主要是满载时改变 or 子节点满载上呈给父节点。

Unlike find(), insert() sometimes modifies             ----         -------
nodes of the tree as it walks down.                    |20|         |11 20|
Specifically, whenever insert() encounters             ----         -------
a 3-key node, the middle key is ejected,               /  \   =>    /  |  \
and is placed in the parent node instead.     ========== ----    ---- ---- ----
Since the parent was previously treated the   +10 11 12+ |30|    |10| |12| |30|
same way, the parent has at most two keys,    ========== ----    ---- ---- ----
and always has room for a third.  The other
two keys in the 3-key node are split into two separate 1-key nodes, which are
divided underneath the old middle key (as the figure illustrates).

For example, suppose we                      ----
insert 60 into the tree                      |40|                                  <<lec27-pic2>>
depicted in [1].  The                      /------\
first node visited is                 /---/        \----\
the root, which has three          ----                  ----
keys; so we kick the               |20|                  |50|
middle key (40) upstairs.          ----                /------\
Since the root node has           /    \              /        \
no parent, a new node         ----      ----      ----          ----------
is created to hold 40         |14|      |32|      |43|          |62 70 79|
and becomes the root.         ----      ----      ----          ----------
Similarly, 62 is kicked       /  \      /  \      /  \          /  |  |   \
upstairs when insert()     ---- ---- ---- ---- ---- ---- ------- ---- ---- ----
finds the node containing  |10| |18| |25| |33| |42| |47| |57 60| |66| |74| |81|
it.  This ensures us that  ---- ---- ---- ---- ---- ---- ------- ---- ---- ----
when we arrive at the leaf
(labeled 57 in this example), there's room to add the new key 60.

Observe that along the way, we created a new 3-key node "62 70 79". We do not kick its middle key upstairs until the next time it is visited.

Again, the reasons why we split every 3-key node we encounter (and move its middle key up one level) are

  1. to make sure there's room for the new key in the leaf node, and
  2. to make sure that above the leaves, there's room for any key that gets kicked upstairs.

Sometimes, an insertion operation increases the height of the tree by one by creating a new root.

[3] Entry remove(Object k);

2-3-4 tree remove() is similar to remove() on binary search trees: you find the entry you want to remove (having key k).

  • If it's in a leaf, you remove it.
  • If it's in an internal node, you replace it with the entry with the next higher key. ( 似乎这个也可以实现类似 binary search tree 的 spell 自动纠错功能 ),前面这个想法是错的。这里的意思是说,一旦找到这个 key,就用右子树中最小的 (smallest higher)代替这个 key 的位置.

That entry is always in a leaf(原因同 insert(),因为在实际插入/删除之前都要进行‘检测 分割/合并 上升/下降 法’). 意思是,在找这个节点的过程中(directily leaf or changging internal with smallest higher),要对只有一个 entry 的节点进行处理。目的是不让删除节点造成 234-树整体的不平衡. In either case, you remove an entry from a leaf in the end.

Like insert(), remove() changes nodes of the tree as it walks down. Whereas insert() eliminates 3-key nodes (moving keys up the tree) to make room for new keys, remove() eliminates 1-key nodes (pulling keys down the tree) so that a key can be removed from a leaf without leaving it empty. There are three ways 1-key nodes (except the root) are eliminated.

1)  When remove() encounters a 1-key  -------                  -------
node (except the root), it tries       |20 40|                  |20 50|
to steal a key from an adjacent        -------                  -------
sibling.  But we can't just steal      /  |  \          =>     /   |   \
the sibling's key without          ---- ==== ----------    ---- ------- -------
violating the search tree          |10| +30+ |50 51 52|    |10| |30 40| |51 52|
invariant.  This figure shows      ---- ==== ----------    ---- ------- -------
remove's action, called a           /\   /\   / |  | \      /\   / | \   / | \
"rotation", when it reaches "30".            S                        S
We move a
key _from the sibling to the parent_, and we move a key _from the parent to the 1-key node_.  We also move
a subtree S from the sibling to the 1-key node (now a 2-key node).

Goodrich & Tamassia call rotations "transfer" operations. Note that we CAN'T steal a key from a non-adjacent sibling.

当相邻的兄弟有多于一个 entry, 用旋转,兄传父,父传我

2)  If no adjacent sibling has more than one     -------               ----
key, a rotation can't be used.  In this case,     |20 40|               |40|
the _1-key node steals a key from its parent_.    -------               ----
Since the parent was previously treated the       /  |  \    =>         /  \
same way (unless it's the root), it has at    ==== ---- ----    ---------- ----
least two keys, and can spare one.  The       +10+ |30| |50|    |10 20 30| |50|
sibling is also absorbed, and the 1-key node  ==== ---- ----    ---------- ----
becomes a 3-key node.  The figure illustrates
remove's action when it reaches "10".  This is called a "fusion" operation.

当相邻的兄弟只有一个 entry 且父节点不止一个节点,用融合,偷父合兄

3)  If the parent is the root and contains only one key, and the sibling
contains only one key, then the current 1-key node, its 1-key sibling, and the
1-key root are fused into one 3-key node that serves as the new root.  The
height of the tree decreases by one.

当父节点是根节点且只有一个 entry,兄弟节点也只有一个 entry, 和三为一,树深度减一;

Eventually we reach a leaf. After we process the leaf, it has at least two keys (if there are at least two keys in the tree), so we can delete the key and still have one key in the leaf.

For example, suppose we                  ----------
remove 40 from the large                 |20 xx 50|
tree depicted in  . The               /-----------------\
root node contains 40,            /--/      /   \        \-----\
which we mark "xx" to         ----      ----      ----          ----------
remind us that we plan to     |14|      |32|      |43|          |62 70 79|
replace it with the           ----      ----      ----          ----------
smallest key in the root      /  \      /  \      /  \          /  |  |   \
node's right subtree.  To  ---- ---- ---- ---- ---- ---- ------- ---- ---- ----
find that key, we move on  |10| |18| |25| |33| |42| |47| |57 60| |66| |74| |81|
to the 1-key node labeled  ---- ---- ---- ---- ---- ---- ------- ---- ---- ----
50.  Following our rules
for 1-key nodes, we fuse 50 with its sibling and parent to create a new 3-key
root labeled "20 xx 50".

有没有可能父节点不是根节点,且也只有一个 entry 的情况?

不可能,因为我是从上往下检查的,如果父节点只有一个,那肯定在检查到他时就已经处理了。

Next, we visit the node                     ----------
labeled 43.  Again                          |20 xx 62|
following our rules for                 /--------------------\
1-key nodes, we rotate            /----/    /       \         \-----\
62 from a sibling to the      ----      ----      -------            -------
root, and move 50 from        |14|      |32|      |43 50|            |70 79|
the root to the node          ----      ----      -------            -------
containing 43.                /  \      /  \     /   |   \           /  |  \
                           ---- ---- ---- ---- ---- ---- ------- ---- ---- ----
                           |10| |18| |25| |33| |42| |47| |57 60| |66| |74| |81|
                           ---- ---- ---- ---- ---- ---- ------- ---- ---- ----

Finally, we move down to                    ----------
the node labeled 42.  A                     |20 xx 62|
different rule for 1-key               /--------------------\
nodes requires us to             /----/        /  \          \-----\
fuse the nodes labeled       ----      -------/    \------          -------
42 and 47 into a 3-key       |14|      |32|           |50|          |70 79|
node, stealing 43 from       ----      ----           ----          -------
the parent node.             /  \      /  \           /  \          /  |  \
                          ---- ---- ---- ---- ---------- ------- ---- ---- ----
                          |10| |18| |25| |33| |42 43 47| |57 60| |66| |74| |81|
                          ---- ---- ---- ---- ---------- ------- ---- ---- ----

The last step is to remove 42 from the leaf and replace "xx" with 42.

总结

                                                                 |                        /--> yes ====> 1+1+1 = 3
                                                                 | parent is 1-key root ? |
      /->leaf    ---->|                                     /--->|                        \--> No  ====> 偷父合兄
      |               |                              /-> yes
target|               | adjecent sibling 1-key node ?
      |               |                              \-> No
      \->internal---->|                                     \===> 兄->父->我

Running Times of 234-tree

A 2-3-4 tree with height h has between 2^h and 4^h leaves. If n is the total number of entries (including entries in internal nodes), then n >= 2^(h+1) - 1. By taking the logarithm of both sides, we find that h is in O(log n).

The time spent visiting a 2-3-4 node is typically longer than in a binary search tree (because the nodes and the rotation and fusion operations are complicated), but the time per node is still in O(1).

The number of nodes visited is proportional to the height of the tree. Hence, the running times of the find(), insert(), and remove() operations are in O(h) and hence in O(log n), even in the worst case.

Compare this with the Theta(n) worst-case time of ordinary binary search trees.

Another Approach to Duplicate Keys

Rather than have a separate node for each entry, we might wish to collect all the entries that share a common key in one node. In this case, each node's entry becomes a list of entries.

This simplifies the implementation of findAll(), which finds all the entries with a specified key. It also speeds up other operations by leaving fewer nodes in the tree data structure. Obviously, this is a change in the implementation, but not a change in the dictionary ADT.

______________________
|  |  |  |  |  |  |  |
----|-----------------
    V
    ---   ---   ---
    |3|-->| |-->| |-->...  this is the entries with key = 3
    ---   ---   ---
    |
    v
    ---   ---   ---
    |7|-->| |-->| |-->...  this is the entries with key = 7
    ---   ---   ---

This idea can be used with hash tables, binary search trees, and 2-3-4 trees.

AVL TREE

AVL tree is a height-balanced BST

  1. 找到 avl-tree 深度的上界,并且不断 sharp 上界
    1. 开始用递归估算法,然后用给假设演绎证明法
  2. 找出 avl-tree 的三个 structure features
  3. 通过 3 个 features 找到与深度有关的更多 features
  4. insert(),need v,x,y,z(1st unbalanced when find from inserted node:v, y is z's child, x is y's child, in this line) rotation,
    1. 要求是 middle key endedup being at thetop
    2. ll or rr -> rotate once; rotate(y,z)
    3. lr or rl -> rotate twice; rotate(x,y), rotate(x,z)
    4. 所有的 rotate 操作都只需要 constant time,O(1),因为只需要改变几个链接就行,有大量的保持不变
  5. delete(),很麻烦,-1 的效果有可能一直上传,而不仅仅是 xyz 三者,所以找到 z(fist unbalance node),rotate to balance this subtree, then if this balance tree's height -1, then go up to next unbalance node, which influenced by the rebalanced tree whose height reduce 1.

Red-Black Tree-1

what we want is using R&B tree as BST, but not losing its red and black property

R&B-tree 如何保持平衡的呢,如果他只衡量黑 node 的个数作为高度?

最高(所有点都算)是 2*|黑点| 最矮的情况是 1*|黑点|保证通往某个点的每一条路线上经过的黑点数量一致所以 rotate 或 recolor 之后,仍需要保证两点:所有的父辈的路线高度原来是多少,处理之后还是多少。

Red-Black tree is a kind of BST case (1),just throw problem up 2 levels, (2)(3) really solver the problem.

so case(1) will throw problem to his grandpa, grandpa go on checking and throwing up, until encount (2) (3).

case (2) is bad orientation, (3) is a good orientation.

case (2) must rotate to get to case (3)

insert, we can ONLY insert red, because black will change the black-height: Key point is check uncle is Red or Black

1) insert a red ~left of right~ whose ~[father, uncle] -> Red~ , ~grandfather -> Black~.
   recolor(~[fater,uncle,grandfather] -> opposite~), then goto (1)or (2)or (3)
2) insert a red, R(L) of L(R), whose ~father -> Red~, ~[uncle, grandfater] -> Black~
   [zigzag or zagzig]: rotate to (3)
3) insert a red, R(L) of R(L), whose ~father -> Red~, ~[uncle, grandfater] -> Black~
   [zigzig or zagzag]: recolor(~[father,grandfather] -> opposite~) rotate

Red-Black Tree-2

不但每一个 external 节点的 black height 一样,每一个子树的 black height 也一样。跟 AVL 树一样,都要统计某种高度,这里就是 black-height, avl 是统计 balance height.

  • balance height 的统计方法是,每个 node 都算 1, avl 树要求,每个节点的左右子树 balance height 相差最多为 1
  • black height 的统计方法时,上面有几个 black-node, r&b 树要求所有 external node 的 black height 都一样

delete() 跟 BST 一样的概念,就是你删除的实际是:leaf(can be Red or Black) or a leaf' parent(by analysis, it ONLY has one situtation: parent is Black and childern is Red),如果某个节点只有左孩子或右孩子,

[注意] 这里的 delete 只是删除 r or b 节点,不是指删除 external node。 在 RB 树中,external node 并不算做节点,节点只有两种:R and B 只有这三种情况,对于最后一种,可以分析 BB,RR,RB,BR 四种组合,最后发现删除操作只会出现在这种组合 BR 中

[注意] 删除一个 rorb 节点,就用一个 external node 代替这个节点。比如图 1



   (1)            (2)                     (3)
  -----          -----                   -----
 (  R  )        (  B  )                 ( B:19)
  --X--          --X--                   --X--
   / \            / \                     / \
  /   \          /   \                   /   \
 /     \        /     \                 /     \
+-+    +-+     +-+    +-+            +-+     -----
+-+    +-+     +-+    +-+            +-+    ( R:17)
                                             --X--
    |             |                           / \
    |             |                          /   \
    |             |                         /     \
    |             |                        +-+    +-+
    |             |                        +-+    +-+
    |             |
    |             |                       |
    |             |                       |
    |             |                       |
    |             |                       V
    |             |                     (2)
    |             |                    -----
    V             V                   (  B  )
                                       --X--
   +-+            ??                    / \
   +-+                                 /   \
                                      /     \
                                     +-+    +-+
                                     +-+    +-+

(1)(3) 情况都好处理,就是(2)会造成 external 的 black height 减少一,所以怎么办,依旧是 rotation。red black 的所有操作都可以和 234tree 对比。基本相互呼应。red black 的删除 delete() 操作对于图(2)的情况,哪个父亲的 black-node 被删除了,就以这个父亲为考虑起点,往下分析,功能分析出六种情况。

summary:lec25 to lec27

Datastructure logical physical feature running time application
Priority Queue Dictionary: - I/O: queue - order by value: Ordered(level-orderd: parent_value < children_value) keys indicate by indices of array Complete(except leaves of bottom level) Binary Tree Array or LinkedList find min fast   event queue, 'key' is time the evet will take place, 'value' is the event
Hash Table Dictionary: Array + LinkedList, array for storing keys, linkedList for entry with, same keys(conflict occur) farst for find    
Binary Search Tree Dictionary: - order by key: Ordered(key: l < parent < r) Binary Tree LinkedList find min/max;find nearst   auto modification for miss spelling( by binary search tree can find nearst around target)
Balanced Search Tree (234-tree) Dictionary: - order by key: - Orderd Complete Binary Tree - one node has 1/2/3 entry - one node has 2/3/4 children LinkedList find min/max it's a little complex to keep balance when insert() or remove() O(log n)