This is part of a university project at FIIT STU, Bratislava, Slovakia. The goal is to compare different self-balancing trees to each other, as well as to the regular binary search tree (BST).
The project comes with a small interactive program, `treeio`. It can be used to manually insert and search for nodes, as well as to pretty-print the whole tree.
The project contains a `Makefile` that compiles the `treeio` program. The default is to compile it with the AVL tree, but you can change this behavior:

- You can manually specify the default behavior in the `Makefile`.
- You can pass the `target=[tree_name]` argument to the `make` command.
If you want to compile the program with a different tree, you also need to change the `#include` statement inside `treeio.c` to match the tree you want to compile. E.g., to compile the 2-3 tree, include the `2_3tree.h` header and run:

```sh
$ make target=2_3tree
```
| Tree type | Target name |
|---|---|
| BST | `bst` |
| AVL | `avltree` |
| 2-3 | `2_3tree` |
| Red-black | `rbtree` |
Target names are the names of the folders that contain the source files of the given tree.
If you wish to run unit tests, you can add scenarios inside the assert file of the tree and include its header in `treeio.c`. You will also be required to pass an additional argument to the `make` command in the shell. For example, if you've included the `avltree_assert.h` header in `treeio.c` and wish to run the tests, you can add a function call inside `main`:
```c
/* treeio.c */
int main(void)
{
    run_internal_tests();
    ...
}
```
This will run the scenarios specified in the assert file. To compile the program with tests enabled, run:

```sh
$ make assert=true
```
The AVL tree, where AVL stands for its creators, Adelson-Velsky and Landis, is a self-balancing binary search tree. It was the first such tree to be invented. The AVL tree has O(log n) time complexity for its operations. It is often compared to the red-black tree, since both are height balanced.
Each node in an AVL tree has a balance factor, which is calculated by subtracting the height of the left subtree from the height of the right subtree, i.e.:

B(N) = H(R(N)) - H(L(N)),

where B(N) is the balance factor of the node N, R(N) is the right subtree of the node N, L(N) is the left subtree of the node N, and H(T) is a function that calculates the height of the subtree T.
A binary search tree is an AVL tree if and only if each node of the tree has a balance factor of -1, 0, or 1. A node N with a balance factor of -1 is left heavy, a node N with a balance factor of 1 is right heavy, and a node N with a balance factor of 0 is balanced.
This implementation is based on the Wikipedia article as well as the GeeksforGeeks article about the AVL tree. It includes only the search and insert operations.
For each node we can define a structure:
```c
struct _node {
    struct _node *left;
    struct _node *right;
    char bfactor;
    long key;
};
```

where `bfactor` stands for the balance factor.
The search routine is the same as for the regular binary search tree, so we will skip it and go straight to insertion.
The insertion operation of the AVL tree is similar to that of a regular binary search tree; a simple recursive traversal will do the trick. The important bit comes after the insertion itself: rebalancing the tree. Each time we insert a new node, we recalculate the balance factors along the insertion path, and if any of them is not -1, 0, or 1, we need to rebalance the tree. This is done with node rotation operations: the left rotation and the right rotation.
To left-rotate a node, we replace its right subtree with the left subtree of its right child, and then make the node itself the left child of its former right child. I.e.:
```c
struct _node *rotate_left(struct _node *node)
{
    struct _node *right = node->right;
    node->right = right->left;  /* adopt the left subtree of the right child */
    right->left = node;         /* the node becomes the left child of its former right child */
    set_bfactor(node);
    return set_bfactor(right);  /* the former right child is the new subtree root */
}
```
The right rotation is the mirror image of the left rotation: we replace the node's left subtree with the right subtree of its left child, and then make the node itself the right child of its former left child. I.e.:
```c
struct _node *rotate_right(struct _node *node)
{
    struct _node *left = node->left;
    node->left = left->right;   /* adopt the right subtree of the left child */
    left->right = node;         /* the node becomes the right child of its former left child */
    set_bfactor(node);
    return set_bfactor(left);   /* the former left child is the new subtree root */
}
```
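Both rotations call a `set_bfactor` helper that recomputes a node's balance factor and returns that node. The helper itself is not listed here, so below is a minimal sketch using a naive recursive height computation. It assumes that a positive factor means the left subtree is taller, which is the convention the rebalancing routine further below relies on (the formula in the previous section uses the opposite sign); the project's actual helper may also cache heights instead of recomputing them.

```c
/* Sketch only, not the project's verified code.  A factor greater than 1
 * means "left heavy", matching the checks in the rebalance() routine below. */
static int height(struct _node *node)
{
    if (!node)
        return 0;
    int hl = height(node->left);
    int hr = height(node->right);
    return 1 + (hl > hr ? hl : hr);
}

static struct _node *set_bfactor(struct _node *node)
{
    node->bfactor = (char)(height(node->left) - height(node->right));
    return node;
}
```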
When a new node is inserted into the AVL tree, there are four cases:

- Left-Left case.
- Left-Right case.
- Right-Right case.
- Right-Left case.
Based on these cases, we can define the rebalancing routine as follows (the Left-Left and Right-Right cases need a single rotation, while the Left-Right and Right-Left cases need a double rotation):
```c
struct _node *rebalance(struct _node *node, long key)
{
    if (!node)
        return NULL;
    if (node->left && node->bfactor > 1) {
        /* Left-Left case: a single right rotation */
        if (key < node->left->key)
            return rotate_right(node);
        /* Left-Right case: left-rotate the left child, then right-rotate */
        node->left = rotate_left(node->left);
        return rotate_right(node);
    }
    if (node->right && node->bfactor < -1) {
        /* Right-Right case: a single left rotation */
        if (key > node->right->key)
            return rotate_left(node);
        /* Right-Left case: right-rotate the right child, then left-rotate */
        node->right = rotate_right(node->right);
        return rotate_left(node);
    }
    return node;
}
```
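For completeness, here is a minimal sketch of how a recursive insertion might tie the traversal, `set_bfactor`, and `rebalance` together. The `new_node` helper and the exact structure are assumptions for illustration, not the project's actual code:

```c
#include <stdlib.h>

/* Hypothetical helper: allocate a zero-initialized node holding the key. */
static struct _node *new_node(long key)
{
    struct _node *node = calloc(1, sizeof(*node));
    node->key = key;
    return node;
}

/* Sketch of the recursive insert: descend as in a plain BST, then update
 * the balance factor and rebalance on the way back up. */
struct _node *insert(struct _node *node, long key)
{
    if (!node)
        return new_node(key);
    if (key < node->key)
        node->left = insert(node->left, key);
    else if (key > node->key)
        node->right = insert(node->right, key);
    else
        return node;            /* duplicate keys are ignored in this sketch */
    set_bfactor(node);
    return rebalance(node, key);
}
```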
The 2-3 tree is a self-balancing search tree where each node (often referred to as an internal node) has either two children and one key, or three children and two keys. Leaves of a 2-3 tree are always on the same level, meaning the tree is always balanced. The traversal algorithm of the 2-3 tree goes as follows:
1. Let T be a 2-3 tree and d be the data element we want to find. If T is empty, then d is not in T and we're done.
2. Let r be the root of T.
3. Suppose r is a leaf. If d is not in r, then d is not in T. Otherwise, d is in T. In particular, d can be found at a leaf node. We need no further steps and we're done.
4. Suppose r is a 2-node with left child L and right child R. Let e be the data element in r. There are three cases:
   - If d is equal to e, then we've found d in T and we're done.
   - If d < e, then set T to L, which by definition is a 2-3 tree, and go back to step 2.
   - If d > e, then set T to R and go back to step 2.
5. Suppose r is a 3-node with left child L, middle child M, and right child R. Let a and b be the two data elements of r, where a < b. There are four cases:
   - If d is equal to a or b, then d is in T and we're done.
   - If d < a, then set T to L and go back to step 2.
   - If a < d < b, then set T to M and go back to step 2.
   - If d > b, then set T to R and go back to step 2.
To insert a new node into the tree, we search for the correct position for the node with the traversal algorithm and insert it there. If the node becomes a 4-node, we split it into two nodes, promoting the middle key to the parent node. This process repeats recursively until we reach a 2-node, or the root, in which case we split the root as well, making a new one out of the middle key. For example, inserting 3 into a tree whose root is the 3-node {1, 2} creates a temporary 4-node {1, 2, 3}; it is split into the 2-nodes {1} and {3}, and the middle key 2 becomes the new root.
This implementation is based on lecture notes that are part of the CS230 Data Structures course at Princeton University. It includes only the search and insert operations.
For each internal node we can define a structure:
```c
struct _node {
    bool isfull;
    int low_key;
    int high_key;
    struct _node *left, *middle, *right;
    struct _node *parent;
};
```

where `isfull` indicates whether the node is a 3-node (has two keys and three children).
The search routine for the 2-3 tree is similar to the one for the regular BST; the only difference is an extra condition for the recursive call when the node is a 3-node and the search value lies between the node's keys. I.e.:
```c
struct _node *search(struct _node *node, int key)
{
    if (!node)
        return NULL;
    if (key == node->low_key || (node->isfull && key == node->high_key))
        return node;
    if (key < node->low_key)
        return search(node->left, key);
    if (node->right && key > node->high_key)
        return search(node->right, key);    /* 3-node, key above both keys */
    return search(node->middle, key);       /* 2-node, or key between the two keys */
}
```
Insertion is the most complex part of this implementation. The steps are the same as described in the Introduction. After traversing the tree and finding a spot for the new node, we insert the key:
```c
struct _node *_insert_key(struct _node *node, int key)
{
    if (node->isfull) {
        node = _split(node, key);   /* 3-node: split and promote the middle key */
    } else {
        node = _add_key(node, key); /* 2-node: there is room for one more key */
    }
    return _get_root(node);
}
```
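The call to `_get_root` is needed because a split can create a new root. The helper is not listed in this document; a plausible sketch, assuming it simply walks the parent pointers upward:

```c
/* Assumed behavior of _get_root (not the project's verified code):
 * climb the parent pointers until the node with no parent, i.e. the
 * current root, is reached. */
static struct _node *_get_root(struct _node *node)
{
    while (node->parent)
        node = node->parent;
    return node;
}
```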
If the node is not full, then we just compare the new key with the existing key and insert it into the correct position. If the new key is a duplicate, we throw `EDUPNODE` (error: duplicate node) and simply return the node unchanged. "Throw" is a bit misleading here, since it will only print an error message:
```c
struct _node *_add_key(struct _node *node, int key)
{
    if (key == node->low_key) {
        throw(EDUPNODE);
        return node;
    }
    if (node->low_key < key) {
        node->high_key = key;
    } else {
        /* the new key is the smaller one: shift the old key up */
        node->high_key = node->low_key;
        node->low_key = key;
    }
    return (node->isfull = true, node);     /* the node is now a 3-node */
}
```
But if the node is full, then we are at step 5 of our insertion algorithm, meaning we have to split the node. To do so, we sort all three keys (the two keys of the node, and the key that is being inserted) and create two new nodes from the smallest and the largest keys. The middle key gets promoted to the parent:
```c
struct _node *_split(struct _node *node, int key)
{
    if (key == node->low_key || key == node->high_key) {
        throw(EDUPNODE);
        return node;
    }
    int *keys = _sort_keys(node->low_key, node->high_key, key);
    int prom_key = keys[1];     /* the middle key gets promoted to the parent */
    struct _node *left, *middle;
    left = _node((struct _node) {
        .isfull = false,
        .low_key = keys[0],     /* smallest key */
        .parent = node->parent
    });
    middle = _node((struct _node) {
        .isfull = false,
        .low_key = keys[2],     /* largest key */
        .parent = node->parent
    });
    free(keys);
    return _merge_with_parent((struct _merge_args) {
        .node = node,
        .left = left,
        .middle = middle,
        .prom_key = prom_key
    });
}
```
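The `_sort_keys` helper is not shown in this document. The code above treats its result as a heap-allocated array of the three keys in ascending order, which is freed after use; a minimal sketch under that assumption:

```c
#include <stdlib.h>

/* Assumed contract of _sort_keys (not the project's verified code):
 * return a malloc'd array with the three keys in ascending order;
 * the caller is responsible for freeing it. */
static int *_sort_keys(int a, int b, int c)
{
    int *keys = malloc(3 * sizeof(*keys));
    keys[0] = a;
    keys[1] = b;
    keys[2] = c;
    /* bubble the three elements into ascending order */
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2 - i; j++)
            if (keys[j] > keys[j + 1]) {
                int tmp = keys[j];
                keys[j] = keys[j + 1];
                keys[j + 1] = tmp;
            }
    return keys;
}
```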
After that we need to merge our two new nodes with the parent node, as well as add the promoted key to its list of keys. This step can be divided into three scenarios:
- Node does not have a parent.
- Parent node is not full (2-node).
- Parent node is full (3-node).
In the first case, we simply return a new root, which has a single key (the promoted key) and two children (the two newly made nodes).
In the second case, we add the promoted key to the parent and attach the two new children in place of the child that was split.
In the third case, we split the parent itself and promote its middle key to its own parent.
As expected, this process repeats recursively until we reach the first 2-node, or the root.
```c
struct _node *_merge_with_parent(struct _merge_args args)
{
    __DESTRUCT_MERGE_ARGS__;    /* unpacks args into node, left, middle, prom_key */
    if (!node->parent) {
        /* Scenario 1: no parent; the split node itself becomes the new root */
        left->parent = middle->parent = node;
        return _update_node(node, (struct _node) {
            .isfull = false,
            .low_key = prom_key,
            .left = left,
            .middle = middle
        });
    }
    struct _node *parent = node->parent;
    free(node);
    if (!parent->isfull) {
        /* Scenario 2: the parent is a 2-node; it absorbs the promoted key */
        if (parent->low_key < prom_key) {
            parent->high_key = prom_key;
            parent->middle = left;
            parent->right = middle;
        } else {
            parent->high_key = parent->low_key;
            parent->low_key = prom_key;
            parent->left = left;
            parent->right = parent->middle;
            parent->middle = middle;
        }
        return (parent->isfull = true, parent);
    }
    /* Scenario 3: the parent is a 3-node; split it and recurse upwards */
    int merge_prom_key;
    struct _node *parent_left, *parent_middle;
    if (prom_key < parent->low_key) {
        parent_left = _node((struct _node) {
            .isfull = false,
            .low_key = prom_key,
            .left = left,
            .middle = middle,
            .parent = parent->parent
        });
        parent_middle = _node((struct _node) {
            .isfull = false,
            .low_key = parent->high_key,
            .left = parent->middle,
            .middle = parent->right,
            .parent = parent->parent
        });
        merge_prom_key = parent->low_key;
    } else if (prom_key > parent->high_key) {
        parent_left = _node((struct _node) {
            .isfull = false,
            .low_key = parent->low_key,
            .left = parent->left,
            .middle = parent->middle,
            .parent = parent->parent
        });
        parent_middle = _node((struct _node) {
            .isfull = false,
            .low_key = prom_key,
            .left = left,
            .middle = middle,
            .parent = parent->parent
        });
        merge_prom_key = parent->high_key;
    } else {
        parent_left = _node((struct _node) {
            .isfull = false,
            .low_key = parent->low_key,
            .left = parent->left,
            .middle = left,
            .parent = parent->parent
        });
        parent_middle = _node((struct _node) {
            .isfull = false,
            .low_key = parent->high_key,
            .left = middle,
            .middle = parent->right,
            .parent = parent->parent
        });
        merge_prom_key = prom_key;
    }
    return _merge_with_parent((struct _merge_args) {
        .node = parent,
        .left = parent_left,
        .middle = parent_middle,
        .prom_key = merge_prom_key
    });
}
```
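Putting the pieces together, the public insert entry point might look roughly like the sketch below. The `_find_leaf` helper and its name are assumptions for illustration, not the project's actual code; it descends to the leaf (or to a node that already contains the key) and hands the insertion over to `_insert_key`, which returns the possibly new root:

```c
/* Hypothetical glue code, not taken from the repository. */
static struct _node *_find_leaf(struct _node *node, int key)
{
    if (!node->left)    /* leaves have no children */
        return node;
    if (key == node->low_key || (node->isfull && key == node->high_key))
        return node;    /* duplicate: _insert_key() will throw EDUPNODE */
    if (key < node->low_key)
        return _find_leaf(node->left, key);
    if (node->isfull && key > node->high_key)
        return _find_leaf(node->right, key);
    return _find_leaf(node->middle, key);
}

struct _node *insert(struct _node *root, int key)
{
    if (!root)          /* empty tree: the new key becomes a 2-node root */
        return _node((struct _node) { .isfull = false, .low_key = key });
    return _insert_key(_find_leaf(root, key), key);
}
```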
The red-black tree is a self-balancing binary search tree where each node is colored either red or black and the following properties hold:

- The root of the red-black tree is always black.
- Each leaf is colored black.
- If a node is red, then both of its children are black.
- Every path from a given node to any of its descendant NIL nodes goes through the same number of black nodes.

Like the AVL tree, the red-black tree is height balanced. The balancing rule is that the path from the root to the farthest leaf is no more than twice as long as the path from the root to the nearest leaf.
This implementation was taken from www.programiz.com and serves only for comparison purposes. Several alterations to the source code had to be made, primarily to adapt the code for testing and comparison. Importantly, these changes did not affect the underlying logic or the performance of the implementation:

- the global variable `root` was removed;
- the `insertion` function was renamed to `insert` and was tweaked to work without the global variable;
- the `deletion` function was renamed to `delete` and tweaked to work without the global variable (although this function is never used);
- an additional `search` function was implemented;
- an additional `print_node` function was implemented;
- an additional `print` function was implemented.
This implementation defines an internal node as a structure:
```c
struct rbNode {
    int data, color;
    struct rbNode *link[2];
};
```
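The added `search` function is not listed in this document. A sketch of how it might look over this node layout, assuming `link[0]` is the left child and `link[1]` is the right child:

```c
/* A sketch only, not copied from the repository: the comparison result
 * selects the branch directly, since link[0] is assumed to be the left
 * child and link[1] the right child. */
struct rbNode *search(struct rbNode *node, int data)
{
    while (node && node->data != data)
        node = node->link[data > node->data];
    return node;
}
```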
All of the implementations were tested for time efficiency during insertion and search with the set of keys listed in the `randnumbers` file in the `data` directory of the project. The results were as follows:
| Tree type | Average insertion time | Average search time | Node size |
|---|---|---|---|
| BST | ~0.000060s | ~0.000002s | 24 bytes |
| AVL | ~0.000160s | ~0.000001s | 24 bytes |
| 2-3 | ~0.000050s | ~0.000001s | 48 bytes |
| Red-black | ~0.000052s | ~0.000002s | 24 bytes |
As we can see, the fastest insertions were in the 2-3 and red-black trees, although the 2-3 tree requires twice as much memory per node as the red-black or AVL trees.
The slowest insertions were in the AVL tree; however, compared to the red-black tree, the AVL tree keeps a stricter height balance, so lookup operations are faster in the AVL tree.
Nearly every function of the implementations was tested with unit tests (except for the red-black tree and the regular BST). Test functions, including some test scenarios, can be found in the same directory as the source code of the implementation. E.g., tests for the 2-3 tree implementation are located in `2_3tree/2_3tree_assert.c`.
- en.wikipedia.org, AVL tree
- www.geeksforgeeks.org, AVL tree | Set 1 (Insertion)
- en.wikipedia.org, 2-3 tree
- CS230 Data Structures, Prof. Lyn Turbak, 2-3 Trees
- en.wikipedia.org, Red-black tree
- www.programiz.com, Red-black tree
2020, FIIT STU, Bratislava, Slovakia.