Binary Search Trees
Introduction
Many algorithms make use of data structures that represent dynamic sets, that is, collections of elements that can grow, shrink, or otherwise change over time. Stacks, queues, priority queues, and dictionaries may all be viewed as dynamic sets. If algorithms are to use dynamic sets efficiently, the appropriate data structures must be chosen carefully. The choice of implementation may depend on the types of element involved and on the relative frequencies of the different operations performed on the set. In this note we introduce search trees, which support a variety of operations on dynamic sets.
Search Trees
A heap is a vertically-ordered tree; a search tree is horizontally ordered. A binary search tree is a binary tree whose nodes are labelled by items in such a way that an in-order traversal of the tree gives an ordered list of items. Searching for an item in a search tree is an O(h) operation, where h is the height of the tree. Balanced trees are important because the height of a balanced tree is O(lg n), where n is the number of nodes in the tree. In this section we look at functions to insert and retrieve elements from a binary search tree without worrying about keeping the tree balanced. Techniques for balancing will be covered later.
Recall the type declaration for a binary tree. We declare it in a signature for later use.
```sml
signature BTreeSig =
sig
  datatype 'a Tree = Lf
                   | Nd of 'a Tree * 'a * 'a Tree
end
```
We will implement sets of items as search trees of type Item Tree, where the type Item is equipped with an ordering, <. A binary tree is a search tree if, and only if, for each internal node Nd(lt, v, rt), every label in the left subtree, lt, is less than v, and every label in the right subtree, rt, is greater than v.
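The requirement that Item be "equipped with an ordering" can be expressed in SML with a signature and a functor. The sketch below is hypothetical: the names ORDERED, lt, SearchTreeCheck, IntOrd and isSearchTree are ours, not part of the TREESET functor. It checks the invariant directly by passing bounds down the tree, because comparing each node only with its immediate children is not enough: in Nd(Nd(Lf, 1, Nd(Lf, 5, Lf)), 3, Lf) every parent/child pair looks fine, yet 5 lies in the left subtree of 3.

```sml
(* Hypothetical signature: an item type with a strict ordering lt. *)
signature ORDERED =
sig
  type Item
  val lt : Item * Item -> bool
end

datatype 'a Tree = Lf | Nd of 'a Tree * 'a * 'a Tree

(* Hypothetical functor that checks the search-tree invariant. *)
functor SearchTreeCheck (O : ORDERED) =
struct
  (* v must lie strictly between the optional bounds inherited from its
     ancestors; NONE means there is no bound on that side. *)
  fun within (lo, hi) v =
      (case lo of NONE => true | SOME l => O.lt (l, v))
      andalso (case hi of NONE => true | SOME h => O.lt (v, h))

  fun check _ Lf = true
    | check (lo, hi) (Nd (left, v, right)) =
        within (lo, hi) v
        andalso check (lo, SOME v) left    (* left labels must be < v *)
        andalso check (SOME v, hi) right   (* right labels must be > v *)

  fun isSearchTree t = check (NONE, NONE) t
end

(* Integers with their usual ordering. *)
structure IntOrd : ORDERED =
struct
  type Item = int
  val lt = Int.<
end

structure IntCheck = SearchTreeCheck (IntOrd)
```

With this check, IntCheck.isSearchTree accepts Nd(Nd(Lf, 1, Lf), 2, Nd(Lf, 3, Lf)) and rejects the tree with 5 under 3 described above.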
The basic idea is to build into our data-structure the divide-and-conquer strategy used in algorithms like quicksort and mergesort. In the quicksort algorithm, we use a pivot to divide the sorting problem into two independent parts. In a binary search tree, each internal node divides the data-structure into two independent parts. We place smaller items in the left sub-tree, and larger items in the right sub-tree. An in-order traversal of a binary search tree gives an ordered list.
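A minimal in-order traversal makes the last claim concrete. This is a sketch, assuming int labels for the example; the helper name go and the sample tree are our own.

```sml
datatype 'a Tree = Lf | Nd of 'a Tree * 'a * 'a Tree

(* In-order traversal: left subtree, then the label, then right subtree.
   The accumulator acc holds the labels already produced to the right,
   so no list appends are needed. *)
fun inorder t =
    let
      fun go Lf acc = acc
        | go (Nd (left, v, right)) acc = go left (v :: go right acc)
    in
      go t []
    end

(* A small search tree: 1 < 3, and 4 and 6 are both greater than 3, with 4 < 6 *)
val example = Nd (Nd (Lf, 1, Lf), 3, Nd (Nd (Lf, 4, Lf), 6, Lf))
val sorted = inorder example   (* [1, 3, 4, 6] *)
```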
Here is a function, based on our first implementation of quicksort, that builds a binary search tree from a list.
```sml
local
  (* divide x l splits l into the items not greater than the pivot x
     and the items greater than x *)
  fun divide x (h :: t) =
        let val (low, high) = divide x t
        in if x < h then (low, h :: high)
           else (h :: low, high)
        end
    | divide _ [] = ([], [])
in
  fun mkTree (h :: t) =
        let val (x, y) = divide h t
        in Nd (mkTree x, h, mkTree y) end
    | mkTree [] = Lf
end
```
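Because an in-order traversal of a search tree is ordered, reading back the tree built by mkTree sorts the original list. The sketch below assumes int items and adds hypothetical names inorder and treeSort. Note that divide sends items equal to the pivot to the left, so duplicates survive the sort.

```sml
datatype 'a Tree = Lf | Nd of 'a Tree * 'a * 'a Tree

local
  (* Items greater than the pivot x go to high; the rest go to low. *)
  fun divide (x : int) (h :: t) =
        let val (low, high) = divide x t
        in if x < h then (low, h :: high) else (h :: low, high) end
    | divide _ [] = ([], [])
in
  fun mkTree (h :: t) =
        let val (x, y) = divide h t
        in Nd (mkTree x, h, mkTree y) end
    | mkTree [] = Lf
end

fun inorder t =
    let fun go Lf acc = acc
          | go (Nd (l, v, r)) acc = go l (v :: go r acc)
    in go t [] end

(* Quicksort as "build the tree, then read it back in order". *)
fun treeSort xs = inorder (mkTree xs)
```

For example, treeSort [5, 2, 8, 2, 1, 9] gives [1, 2, 2, 5, 8, 9].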
We can picture the action of quicksort by building this tree, and then producing an in-order traversal—except that in quicksort we don’t actually bother to build the tree. It may pay to build the tree, in order to implement operations, such as member, that involve searching. The function to look for a given element may be written as
```sml
fun member Lf _ = false
  | member (Nd (lt, k', rt)) k =
      if k < k' then member lt k
      else if k' < k then member rt k
      else true (* k = k' *)
```
The cost of a call to member is bounded by the height of the tree; if the tree is balanced, this is O(lg n). The work invested in building the tree is O(n lg n), again provided the tree comes out balanced. If we only expect to make O(lg n) calls to member, we might as well represent our set as a list, with an O(n) implementation of member. Otherwise, the investment is probably worthwhile.
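The trade-off can be written out as a rough cost comparison. Here m is our name for the number of calls to member on a set of n items; the tree is assumed balanced and constant factors are ignored.

```latex
\begin{aligned}
\text{list:}\quad & m \cdot O(n) = O(mn) \\
\text{tree:}\quad & O(n \lg n) + m \cdot O(\lg n) = O\big((n + m)\lg n\big)
\end{aligned}
```

The tree pays off once mn clearly exceeds (n + m) lg n, that is, once m grows beyond about lg n. For instance, with n = 1024 (so lg n = 10), m = 10 lookups cost about 10 × 1024 = 10240 steps with a list against (1024 + 10) × 10 = 10340 with a tree, while m = 100 lookups cost 102400 against 11240.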
Insertion
To insert a new element, we replace a leaf by a new node that contains the element and has two leaves as its children. We recurse down the tree to find the appropriate position for the new node.
```sml
fun insert (e, Lf) = Nd (Lf, e, Lf)
  | insert (e, Nd (lt, r, rt)) =
      if e < r then Nd (insert (e, lt), r, rt)
      else if r < e then Nd (lt, r, insert (e, rt))
      else Nd (lt, r, rt) (* e = r *)
```
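The version above rebuilds every node on the path from the root even when e is already in the tree. The variant below avoids this: ins raises the local exception NoChange as soon as it finds e, and insert handles it by returning the original tree t unchanged.

```sml
local
  exception NoChange
  fun ins (e, Lf) = Nd (Lf, e, Lf)
    | ins (e, Nd (lt, r, rt)) =
        if e < r then Nd (ins (e, lt), r, rt)
        else if r < e then Nd (lt, r, ins (e, rt))
        else raise NoChange (* e is already present *)
in
  (* return t itself, rather than a copy, when e is already present *)
  fun insert (e, t) = ins (e, t) handle NoChange => t
end
```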
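Because insert takes an (element, tree) pair, it has exactly the shape foldl expects, so a tree can be built from a list with foldl insert Lf. The sketch below specialises items to int and adds a hypothetical height helper to show why balance matters: inserting already-sorted keys produces a chain whose height equals the number of items, so member degrades to O(n), while a suitable insertion order gives height 3 for 7 items.

```sml
datatype 'a Tree = Lf | Nd of 'a Tree * 'a * 'a Tree

(* Insertion as above, specialised to int labels; an element that is
   already present leaves the tree unchanged. *)
fun insert (e : int, Lf) = Nd (Lf, e, Lf)
  | insert (e, t as Nd (lt, v, rt)) =
      if e < v then Nd (insert (e, lt), v, rt)
      else if v < e then Nd (lt, v, insert (e, rt))
      else t

(* Hypothetical helper: number of nodes on the longest path from the root. *)
fun height Lf = 0
  | height (Nd (lt, _, rt)) = 1 + Int.max (height lt, height rt)

(* insert : int * int Tree -> int Tree has the shape foldl expects. *)
fun fromList xs = foldl insert Lf xs

val chain = fromList [1, 2, 3, 4, 5, 6, 7]   (* sorted input: height 7 *)
val bushy = fromList [4, 2, 6, 1, 3, 5, 7]   (* medians first: height 3 *)
```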
Maximum
We can use a binary search tree to implement a priority queue. The largest label in the tree is found at the end of the right-most branch. The function getmax removes it, returning the remaining tree together with the maximum.
```sml
fun getmax (Nd (lt, v, Lf)) = (lt, v)
  | getmax (Nd (lt, v, rt)) =
      let val (r, m) = getmax rt
      in (Nd (lt, v, r), m) end
  | getmax Lf = raise Empty (* an empty tree has no maximum *)
```
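As a sketch of the priority-queue use, repeatedly removing the maximum with getmax lists the labels in decreasing order. The helper drain and the sample tree are hypothetical; getmax is repeated so the block stands alone, and raising the Basis exception Empty on an empty tree is our choice.

```sml
datatype 'a Tree = Lf | Nd of 'a Tree * 'a * 'a Tree

(* Remove the largest label, returning (remaining tree, maximum). *)
fun getmax (Nd (l, v, Lf)) = (l, v)
  | getmax (Nd (l, v, r)) =
      let val (r', m) = getmax r in (Nd (l, v, r'), m) end
  | getmax Lf = raise Empty

(* Serve the queue largest-first until it is empty. *)
fun drain Lf = []
  | drain t = let val (t', m) = getmax t in m :: drain t' end

val queue = Nd (Nd (Lf, 2, Nd (Lf, 3, Lf)), 5, Nd (Nd (Lf, 7, Lf), 9, Lf))
val served = drain queue   (* [9, 7, 5, 3, 2] *)
```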
Deletion
The delete operation is more interesting. The entry to be deleted may occur anywhere in the tree, so we must be able to reconstitute a binary search tree from what remains. Fortunately, it suffices to consider only one case. If we can write a function join that reconstitutes a binary search tree from the two orphan children left when we remove the root node of a tree, we can implement delete as follows:
```sml
fun delete (e, Lf) = Lf
  | delete (e, Nd (lt, v, rt)) =
      if e < v then Nd (delete (e, lt), v, rt)
      else if v < e then Nd (lt, v, delete (e, rt))
      else join lt rt
```
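Here join makes the largest label of the left subtree, extracted with getmax, the new root: it is greater than every label remaining on the left and less than every label on the right. Since delete calls join, join must be declared first in an actual program.

```sml
fun join Lf x = x
  | join x Lf = x
  | join lt rt =
      let val (l, m) = getmax lt
      in Nd (l, m, rt) end
```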
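Putting the pieces together in declaration order (getmax, then join, then delete), the following sketch deletes a root with two non-empty children. Items are specialised to int, and inorder and the sample tree t0 are hypothetical additions used only to inspect the result.

```sml
datatype 'a Tree = Lf | Nd of 'a Tree * 'a * 'a Tree

fun getmax (Nd (l, v, Lf)) = (l, v)
  | getmax (Nd (l, v, r)) =
      let val (r', m) = getmax r in (Nd (l, v, r'), m) end
  | getmax Lf = raise Empty

(* Rebuild a search tree from the two orphaned subtrees of a deleted root:
   the largest label on the left becomes the new root. *)
fun join Lf x = x
  | join x Lf = x
  | join l r = let val (l', m) = getmax l in Nd (l', m, r) end

(* join is declared before delete, which uses it. *)
fun delete (_ : int, Lf) = Lf
  | delete (e, Nd (l, v, r)) =
      if e < v then Nd (delete (e, l), v, r)
      else if v < e then Nd (l, v, delete (e, r))
      else join l r

fun inorder t =
    let fun go Lf acc = acc
          | go (Nd (l, v, r)) acc = go l (v :: go r acc)
    in go t [] end

val t0 = Nd (Nd (Lf, 2, Nd (Lf, 3, Lf)), 5, Nd (Nd (Lf, 7, Lf), 9, Lf))
val t1 = delete (5, t0)   (* the root goes; 3 becomes the new root *)
```

Deleting an absent element, such as delete (4, t0), simply rebuilds the search path and leaves the labels unchanged.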
An implementation of several set operations is provided by the functor TREESET given in Figure 1.
A binary search tree can also be used to support dictionary operations, as shown in Figure 2.
We implement a dictionary as a search tree of Key * Item pairs. TREESET provides most of the operations, but we need to access the representation directly to implement lookup.
©Michael Fourman 1994-2006
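Figure 2 is not reproduced here. As a rough, hypothetical illustration of why lookup needs the representation, the sketch below stores (key, item) pairs in the same Tree type and compares keys only. The names update and lookup, and the choice of int keys, are assumptions, not the interface of TREESET or Figure 2.

```sml
datatype 'a Tree = Lf | Nd of 'a Tree * 'a * 'a Tree

(* Hypothetical dictionary operations on a tree of (key, item) pairs,
   ordered by key alone; keys are ints purely for illustration. *)

(* Add an entry for key k, replacing any existing entry for k. *)
fun update (k : int, x, Lf) = Nd (Lf, (k, x), Lf)
  | update (k, x, Nd (lt, entry as (k', _), rt)) =
      if k < k' then Nd (update (k, x, lt), entry, rt)
      else if k' < k then Nd (lt, entry, update (k, x, rt))
      else Nd (lt, (k, x), rt)

(* lookup inspects the pairs stored in the nodes: it compares keys and
   returns the item paired with a matching key, if any. *)
fun lookup Lf (_ : int) = NONE
  | lookup (Nd (lt, (k', x), rt)) k =
      if k < k' then lookup lt k
      else if k' < k then lookup rt k
      else SOME x

val d = foldl (fn ((k, x), t) => update (k, x, t)) Lf
              [(3, "three"), (1, "one"), (4, "four"), (1, "uno")]
```

Here the second entry for key 1 replaces the first, so lookup d 1 returns SOME "uno", and lookup d 2 returns NONE.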