diff options
Diffstat (limited to 'book.tex')
| -rw-r--r-- | book.tex | 125 |
1 files changed, 124 insertions, 1 deletions
@@ -1269,11 +1269,134 @@ which was the claim we needed to prove. \chapter{Integer Structures} +\topics{van Emde Boas, x-fast trees, y-fast trees, fusion trees.} -van Emde Boas, x-fast trees, y-fast trees, fusion trees. +In this chapter we look at data structures operating on integers in the word machine. +By limiting ourselves to only act on integers we can achieve time bounds as a function of the word size. + +The general problem we want to solve is \algo{Predecessor}: we maintain a structure containing $n$ words, +and would like to support insertion, deletion, and predecessor and/or successor. +The universe $U$ we get the integers from is asusmed to be a power of two $2^w$; we call $w$ the word size. +We also assume that $n \leq \log w$, that is, that the number of elements in the structure is representable in a word. + +In a comparison based model, the precesessor problem has a lower bound of $\Theta(\log n)$. +Table~\ref{tab:integer-structures} summarizes the structures we will look at, with time and space complexities. +Note that vEB with hashing or y-fast trees are faster than fusion trees if $w$ is small. + +\begin{table}[ht] + \centering + \begin{tabular}{c c c} + Name & Operation & Space\\\midrule + van Emde Boas Trees & $O(\log w)$ & $\Theta(U)$\\ + vEB + Hashing & $O(\log w)$ whp. & $\Theta(n)$\\ + y-fast trees & $O(\log w)$ whp. & $\Theta(n)$\\ + Fusion trees & $O(\log_w n)$ whp. & $\Theta(n)$\\ + \end{tabular} + \caption{% +\label{tab:integer-structures} + The integer structures we look at in this chapter. + } +\end{table} \section{van Emde Boas Tree}% \label{sec:vEB} +The general idea of the van Emde Boas Tree is to try to obtain the time recurrence $T(n) = T(\sqrt(n)) + O(1)$. +We can do this by splitting the universe into $\sqrt{u}$ clusters, each of size $\sqrt{u}$, and recurse on the cluster. +For universes that are a power of 2, this means splitting the bits of the number if half. +Consider a word $a = \big<c, i\big>$; +we call the most significant bits $c$ and the least significant bits $i$. +This is simply to do in the word machine: we simply mask out the lower bits to get $i$, and we shift down and mask to get $c$. + +The van Emde Boas tree is a recursive structure. +At each level we have four things: +$\sqrt{u}$ \emph{Clusters}, a vEB trees of size $\sqrt{u}$ which signals which of the clusters are empty; +one \emph{Summary}, which is also a vEB tree of size $\sqrt{u}$; +the minimum element in the tree; and the maximum element in the tree. +The minimum element is not stored elsewhere in the tree, but the maximum element is. + +\subsection{Operations} +With this layout, we can see how we will do the recursion: +we index the cluster using $c$, the most significant bits, and then continue by using $i$ in the next query. +We will call the vEB tree $v$, such that $v.cluster[0]$ is the first cluster in the current tree. +We assume no key duplication. + +\subsubsection{Successor} +If the queried element is smaller than $v.cluster[c].max$, we find the successor for $i$ in $v.cluster[c]$. +When returning we also have to ``rebuild'' our number, by appending $c$ in front. +If the element is larger than $v.max$, we have to find the next non-empty cluster, and select the minimum element in it, since +this will be our successor. Note that we know one such element will exist, since $x < v.max$. +To find this we do a sucessor query of $c$ in $v.summary$. +When rebuilding we now have to use $c'$, the successor of $c$ in the summary, instead of $c$. + +In total, we get one recursive call in either case. + +\subsubsection{Insert} +At first this might seem simple. We find the correct cluster, and recurse into it. However, we also need to update the summary, if +the cluter was empty. +This makes for \emph{two} recursive calls, which we cannot afford, since $T(u) = 2T(\sqrt{u}) + O(1)$ solves to $O(\log u)$, rather than $O(\log \log u)$ which is what we want ($w = \log u$). +However, we observe that if the cluster is empty, the insert call to it will be trivial since we only set $v.min$ which is not stored elsewhere in the tree. + +\subsubsection{Delete} +Deleting an element is slightly more complex. +First we check if $x = v.min$. +If the tree only contains one element we remove $min$ and $max$, and return; +else we take the min from the first cluster, which we find by taking the min in the summary, +and set \emph{our} min to be this min. +Since min keys are only stored in one place, we continue with $x = v.min$; +that is, by overwriting $v.min$ we deleted the original queried key, but +we have to clean up by deleing the key we just copied. + +Now we are either still looking for the original $x$, or we have changed $x$. +In any case, we recurse on cluster $c$, and delete $i$. +After deleting, we have to check if the cluster we recursed on got its $min$ removed, +because we then have to mark it in the summary. +Like with insert, we risk having two recursive calls here, but again one of these calls will be trivial; +this time it is the first call that is trivial, since if we removed the last element in the tree we just +set $v.max = v.min = None$, and did no further recursing. + + +\section{Saving Space} +The strucutre presented in Section~\ref{sec:vEB} takes $O(U)$ space, which is a lot. +We look at ways to reduce this. + +The obvious way to shave off some space is to not store the clusters in an array, but in a hash table. +This way we only pay for the tables we actually use. +Somehow, this gives us a space bound of $O(n \log w)$, which can be improved to $O(n)$ using indirection. + +\subsection{Tree View} +We can look at the van Emde Boas tree in a different view. +Consider a balanced binary tree of height $w$, where the leaf nodes are element markers (that is, the path from the root to a leaf is the number, +and the leaf is either 0 or 1, if the element is not or is in the tree). +The internal nodes are the \texttt{OR} of their children. +The upper half (in terms of height) of the tree can be considered at the Summary, and each subtree as one cluster. + +In this structure, updates are $O(w)$. Queries can be done faster by noticing that a path from a leaf to the root is monotone: +it is a string of 0s followed by 1s (unless the tree is empty), +so we can binary search it to find the transition point. +Then, we can look at the other child of the node that has the first $1$ in the path, +and get the min or max, depending on if it is to the left or right. +If all subtrees store this, this is constant. +What if we are looking for successor, but get predecessor from this method? +If all 1 leaves store pointers to the next and previous 1 leaves, this can also be done in constant time. + +\subsection{X-fast Trees} +We take inspiration from the Tree view in the previous section to build a X-fast tree, +where we store all of the 1s in the tree, as binary strings, in a hash table. +This lets us perform the binary search. +Updates are still $O(\log w)$, since we have to search through the path, which is $w$ long, +queries can use the same binary search trick, to get a time of $\Theta(\log w)$, and +space is $O(nw)$. + +\subsection{Y-fast Trees} +We use indirection on the X-fast tree to get faster updates and smaller space. +Split the tree into two structures, an upper and lower structure. +The upper structure consists of $O(n/w)$ of the tree nodes, and the bottom ones are BBSTs of size $O(\log w)$. +Queries are $O(\log w)$ in both structures, but we make updates $O(\log w)$ amortized, since +for each $w$ update in the bottom stucture we only need one update in the top structure. +Now the space is only $O(n/w w + n) = O(n)$. Linear space! + +\section{Fusion Trees} +\todo{write this :)} \chapter{Succinct Structures} Rank, Select |
