From 931388cf620d1ab347a3d7ffdd1063b35f8e6dc0 Mon Sep 17 00:00:00 2001 From: Martin Hafskjold Thoresen Date: Sat, 22 Jul 2017 16:36:37 +0200 Subject: Finish Fusion trees --- book.tex | 154 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 153 insertions(+), 1 deletion(-) (limited to 'book.tex') diff --git a/book.tex b/book.tex index a97f3f6..c25fdc2 100644 --- a/book.tex +++ b/book.tex @@ -1396,7 +1396,159 @@ for each $w$ update in the bottom stucture we only need one update in the top st Now the space is only $O(n/w w + n) = O(n)$. Linear space! \section{Fusion Trees} -\todo{write this :)} +Fusion trees are more a hypotethical structure: is is fast when the word size is very large. +We want a strcuture with $O(\log_w n)$ operations. +Thus, using fusion trees or vEB trees we can get a time bound of $\min\{{\log_w n, \log w}\} \leq \sqrt{\log n}$. +We look at the static version of fusion trees; a dynamic version is possible with $O(\log_w n + \log\log n)$ time operations. + +The general idea of fusion trees is to have a B-tree with a branching factor of $w^{1/5}$. +This will make the height of the tree $h = \Theta(\log_w n)$. +However, simply searching through the keys will be to slow --- we need $O(1)$ time in each node +(binary search in each node would be $\Theta(\log w \log_w n )$). +Let $k = w^{1/5}$ be the number of keys stored in each node, and let $x_0 < x_1 < \cdots < x_k$ be the keys in the node. +We would like to pack the bits in $O(w)$ space, and have $O(1)$ predecessor/successor queries on $x_i$. +Note that the queried key $x'$ does not have to be one of the stored keys $x_i$. +We will allow polynomial preprocessing. + +\subsection{Overview} +We look at how to approach this problem. +The general idea follows three steps: key storage, node search, and key retrieval. +We find a way to store the keys in the given space, then a way to find the predecessor of the queired key, and then +a way to get back the actual key stored in the node --- since we cannot simply store all keys, we have stored a different representation. + +\subsubsection{Distinguishing the Keys} +We make a mental trie of the bits of all keys in the node, and note that certain levels are more ``interresting'', +namely the ones with branching nodes, of which there are $k-1$. +Let $b_i$ be important bit $i$, and let $sketch(x)$ be all interesting bits from $x$. +We claim that $sketch(x_i) < sketch(x_{i + 1})$. +To see why, we consider the first different bit in the numbers. Since $x_i < x_{i + 1}$ this bit have to be +0 for $x_i$ and 1 for $x_{i+1}$. Since this is a branch, this bit is a part of the sketch, so $sketch(x_i)$ is the smaller. + +Since each sketch is $k$ bits long, and there are $k$ of them, we can fit all sketches into a single word ($k^2 = {(w^{1/5})}^2 = w^{2/5}$). +This packing is computable in $O(1)$ time. +We would like to somehow search for our queried element in this packing. + +\subsubsection{Node Search} +For a query $q$ we would like to compare it ``in parallel'' to all $x_i$s, in $O(1)$ time. +While this might seem obviously impossible, we remind ourselves that the size of all $x_i$s in our representation +is $O(w)$, so all standard operaions are done in constant time for the entire set. + +\subsubsection{Desketchifying} +So we find out that $sketch(x_i) \leq sketch(q) < sketch(x_{i + 1})$; now what? +Simply retrieving $x_i$ somehow might not be the correct answer: +if $q$ branches on some level that is not considered important by $x_i$, we have effectively ignored a bit of $q$ in its $sketch$. +Consider $\mathbf{x} = [\texttt{\underline{0}0\underline{0}0}, \texttt{\underline{0}0\underline{1}0}, \texttt{\underline{1}1\underline{0}0}, \texttt{\underline{1}1\underline{1}1}]$, +where the important bits are underlined. If we query for $q=\texttt{0101}$, we get $sketch(q) = \texttt{00}$, so $sketch(x_0) = sketch(q)$ even though $x_1 < q$! +Even worse, this is not a off-by-one error; they can be very far from each other. + +However, all hope is not lost. We can look at the \LCA{} of $x_i$ or $q$ and the \LCA{} of $x_{i+1}$ and $q$ (choose the longer): +this is the nodes where we first fell off the tree. +In our example above, we went right when both $x_0$ and $x_1$ went left. +Then we know that the subtree we are in after the \LCA{} node contains no keys. +If there had been a key there, we would have gotten this key as either $x_i$ or $x_{i+1}$, +depending on the other important bits of the key. +So now, if we are looking for the predecessor, we can simply find the max key in the left subtree of the \LCA{} node. +We can do this by searching for $e=y0111\dots1$ in the left tree. In contrary to the $q$ case, this will work, since +any sketch we search for will be less than the sketch of $e$, since it will only contain 1s. +Therefore, we will get the max sketch in the subtree, which is the max value\footnote{Note that the reason the sketch search failed for $q$ was that +it branched on some bit that was not in the sketch. With $e$, we only care about getting the maximum/minimum sketch.}. +If we are looking for the successor and went left when the $x_i$s went right, $e=y100\dots0$. + +\subsubsection{} +So to sum up the big picture: +we start out by computing $sketch(q)$. Then we find $i$ such that $sketch(x_i) \leq sketch(q) < sketch(x_{i + 1})$, +and set $y=\mathcal{LCA}(q, x_i) or \mathcal{LCA}(q, x_{i+1})$, depending on which is the longer. +We then compute $e$, find $sketch(e)$, and at last find $j$ such that $sketch(x_j) \leq sketch(e) < sketch(x_{j + 1})$, +We claim that this will give us the right answer, for both successor and predecessor queries. + +However, there are a lot of details that needs explanation. +How do we make the sketches? How do we search in the $x_i$s? +What about computing the \LCA{}? +We look at these questions, in order. + +\subsection{Approximate Sketch} +We remember that the perfect sketch takes the exact bits we care about, and packs them right after another, +using $O(w^{2/5})$ space. +How can we do this? +We could start off by masking out the important bits; this is easy. +However, packing them together to be consecutive is harder. +Maybe we do not need to perfectly pack them together? We can allow to have some zeroes in between the important bits, +if they are in a predictable pattern, since this does not change the ordering of the sketches. +We will make an \emph{approximate sketch}, of size $O(w^{4/5})$ bits, making the total bits in a node $O(w)$, which is our limit. + +We start out by masking out the important bits: +\[x' = x \texttt{ AND } \sum\limits^{r-1}_{i = 0} 2^{b_i} = \sum\limits^{r-1}_{i = 0} x_{b_i} 2^{b_i}\] +In order to pack the bits somewhat together, we will use multiplication. +%Recall that binary multiplication is basically shifting and adding for each high bit in one of the operands: +\[x' \cdot m = (\sum\limits^{r-1}_{i = 0} x_{b_i} 2^{b_i}) (\sum\limits^{r-1}_{j = 0} 2^{m_j}) = +\sum\limits^{r-1}_{i = 0} \sum\limits^{r-1}_{j = 0} x_{b_i} 2^{b_i + m_j} +\] +That is, $b_i + m_j$ is something. +We claim that there exist an $m$ such that +\begin{enumerate} + \item\label{en:as-distinct} $b_i + m_j$ are all distinct, so we avoid collisions, + \item\label{en:as-order} $b_0 + m_0 < \dots b_{r-1} + m_{r-1}$, so we preserve ordering, and + \item\label{en:as-small} $b_{r-1} + m_{r-1} - b_0 + m_0 = O(r^4) = O(w^{4/5})$, so the number is small. +\end{enumerate} +Then +\[approx\_sketch = \big[(x \cdot m) \texttt{ AND } \sum\limits^{r-1}_{i=0}2^{b_i + m_i}\big] \gg (b_0 + m_0)\] +Note that we have discarded all terms where $i \neq j$. + +We first consider~\ref{en:as-distinct}. +Choose $m'_0, \dots m'_{r-1} < r^3$, such that all $b_i + m'_j$ are distinct mod $r^3$. +Do the choosing by induction. When we want to pick $m'_t$, +we must avoid all numbers equal to $m'_i + b_j - b_k\ \forall_{i, j, k}$ (we moved $b_i$ from the left to $b_k$ on the right). +$i$ ranges from 0 to $t$, and $j$ and $k$ ranges from 0 to $r$, so the total numbers we must avoid is $tr^2$. +But $t < r$, and we are working $\mod r^3$, so there is a number that is available. + +Next, we let $m_i = m'_i + (w - b_i + ir^3) \text{ rounded down to a multiple of }r^3$. +We use the $ir^3$ part to spread the bits out, since each $m_i < r^3$; this achieves~\ref{en:as-order}. +In order not to mess up the collision freeness we have achieved, we need the term we add to be +a multiple of $r_3$, so that $m_i \equiv m'_i \mod r^3$. +\todo{Not sure why we need $-b_i$, since $+ir^3$ guarantees ordering?}. +Since $m_{r-1} = O(r^4)$ and $m_0 \approx m'_0 < r^3$ and $m'_0 \geq 0$, we get $m_{r-1} - m_0 = O(r^4)$. + +\subsection{Parallel Comparison} +Now all sketches for all keys in a node takes $O(w)$ space. We want to search in all of them, in constant time. +We use subtraction to compare the keys. +Let $sketch(node) = 1\ sketch(x_0)\ 1\dots1\ sketch(x_{k-1})$, and let $sketch{(q)}^k = 0\ sketch(q)\ 0\dots0\ sketch(q)$; +this can be computed by multiplying the sketch of q with $00^n10^n1\dots0^n1$, where $n$ is the length of a sketch minus 1. +Now we can look at $sketch(node) - sketch{(q)}^k$: +the 1 bits in front of the sketch will be zero if and only if $sketch(q) > sketch(x_0)$. We can mask out these bits. +Note that since the $x_i$s are in order, this sequence is monotone, and of the form $0^a1^b$. + +Now we are interested in knowing where the transition is. +The way we will do this is to multiply the masked number with $0^n1\dots0^n1$: +this multiplication will make all of the 1s in the number to hit the same bit in the product, +which will be summed. +That is, instead of finding the position of the transition, we simply count the number of high bits. +This is the number of $x_i$s that are larger than $q$. +Note that we also have space in the sketch room for the entire sum, since the length of a sketch is at most $k - 1$, +and we only have at most $k$ 1s to sum. + +\subsection{LCA} +Lastly, we need to compute the LCA for our mental tree. +This is equivalent to the first set bit in the \texttt{XOR} of the two numbers, also called the most significant bit. +We split the word into $\sqrt{w}$ clusters of $\sqrt{w}$ bits each. +Then we find the non-empty clusters: get the \texttt{msb}s of the clusters, and clear the \texttt{msb}s from $x$. +Subtract the cleared $x$ from the \texttt{msb} mask, and mask out all bits except the \texttt{msb}. +\texttt{XOR} with the mask again, to flip the \texttt{msb}s. +Now, each block is either $\texttt{00\dots00}$ or $\texttt{10\dots00}$, depending on wether the bits except \texttt{msb} was 0 or not. +The special case we still need to handle is when a cluster is $\texttt{10\dots00}$. \texttt{OR} in the \texttt{msb}s, to fix this. +Now all non-empty clusters are $\texttt{10\dots00}$, and empty clusters are $\texttt{00\dots00}$. + +Next we need to find the first non-empty cluster, and then find the first set bit in that cluster. +To do this we would first like to compress the bits into a string of just the bits we care about, of length $\sqrt{w}$. +Now we use \emph{perfect} sketch (details omitted). +Then, to find the most significant set bit, we use parallel comparison, by comparing the sketch repeated to one-hot strings of length $\sqrt{w}$. +By finding the largest power of two the string is greater than we find the \texttt{msb}. +Note that since the size of the sketch is $\sqrt{w}$, and there are $\sqrt{w}$ one-hot strings of length $\sqrt{w}$, +the total space for all the one-hot strings is $w$, so it fits. + +Now we have found the first cluster that has a set bit. Next, we go to that cluster, and use the exact same method again to find the +\texttt{msb} there. This works since the sketch and the cluter are the exact same size. +At last, we combine the two indices (cluster index and internal index to the cluster) to get the global index. + \chapter{Succinct Structures} Rank, Select -- cgit v1.2.3