From 931388cf620d1ab347a3d7ffdd1063b35f8e6dc0 Mon Sep 17 00:00:00 2001
From: Martin Hafskjold Thoresen <martinhath@gmail.com>
Date: Sat, 22 Jul 2017 16:36:37 +0200
Subject: Finish Fusion trees

---
 book.tex | 154 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 153 insertions(+), 1 deletion(-)

(limited to 'book.tex')

diff --git a/book.tex b/book.tex
index a97f3f6..c25fdc2 100644
--- a/book.tex
+++ b/book.tex
@@ -1396,7 +1396,159 @@ for each $w$ update in the bottom stucture we only need one update in the top st
 Now the space is only $O(n/w w + n) = O(n)$. Linear space!
 
 \section{Fusion Trees}
-\todo{write this :)}
+Fusion trees are more a hypotethical structure: is is fast when the word size is very large.
+We want a strcuture with $O(\log_w n)$ operations.
+Thus, using fusion trees or vEB trees we can get a time bound of $\min\{{\log_w n, \log w}\} \leq \sqrt{\log n}$.
+We look at the static version of fusion trees; a dynamic version is possible with $O(\log_w n + \log\log n)$ time operations.
+
+The general idea of fusion trees is to have a B-tree with a branching factor of $w^{1/5}$.
+This will make the height of the tree $h = \Theta(\log_w n)$.
+However, simply searching through the keys will be to slow --- we need $O(1)$ time in each node
+(binary search in each node would be $\Theta(\log w \log_w n )$).
+Let $k = w^{1/5}$ be the number of keys stored in each node, and let $x_0 < x_1 < \cdots < x_k$ be the keys in the node.
+We would like to pack the bits in $O(w)$ space, and have $O(1)$ predecessor/successor queries on $x_i$.
+Note that the queried key $x'$ does not have to be one of the stored keys $x_i$.
+We will allow polynomial preprocessing.
+
+\subsection{Overview}
+We look at how to approach this problem.
+The general idea follows three steps: key storage, node search, and key retrieval.
+We find a way to store the keys in the given space, then a way to find the predecessor of the queired key, and then
+a way to get back the actual key stored in the node --- since we cannot simply store all keys, we have stored a different representation.
+
+\subsubsection{Distinguishing the Keys}
+We make a mental trie of the bits of all keys in the node, and note that certain levels are more ``interresting'',
+namely the ones with branching nodes, of which there are $k-1$.
+Let $b_i$ be important bit $i$, and let $sketch(x)$ be all interesting bits from $x$.
+We claim that $sketch(x_i) < sketch(x_{i + 1})$.
+To see why, we consider the first different bit in the numbers. Since $x_i < x_{i + 1}$ this bit have to be
+0 for $x_i$ and 1 for $x_{i+1}$. Since this is a branch, this bit is a part of the sketch, so $sketch(x_i)$ is the smaller.
+
+Since each sketch is $k$ bits long, and there are $k$ of them, we can fit all sketches into a single word ($k^2 = {(w^{1/5})}^2 = w^{2/5}$).
+This packing is computable in $O(1)$ time.
+We would like to somehow search for our queried element in this packing.
+
+\subsubsection{Node Search}
+For a query $q$ we would like to compare it ``in parallel'' to all $x_i$s, in $O(1)$ time.
+While this might seem obviously impossible, we remind ourselves that the size of all $x_i$s in our representation
+is $O(w)$, so all standard operaions are done in constant time for the entire set.
+
+\subsubsection{Desketchifying}
+So we find out that $sketch(x_i) \leq sketch(q) < sketch(x_{i + 1})$; now what?
+Simply retrieving $x_i$ somehow might not be the correct answer:
+if $q$ branches on some level that is not considered important by $x_i$, we have effectively ignored a bit of $q$ in its $sketch$.
+Consider $\mathbf{x} = [\texttt{\underline{0}0\underline{0}0}, \texttt{\underline{0}0\underline{1}0}, \texttt{\underline{1}1\underline{0}0}, \texttt{\underline{1}1\underline{1}1}]$,
+where the important bits are underlined. If we query for $q=\texttt{0101}$, we get $sketch(q) = \texttt{00}$, so $sketch(x_0) = sketch(q)$ even though $x_1 < q$!
+Even worse, this is not a off-by-one error; they can be very far from each other.
+
+However, all hope is not lost. We can look at the \LCA{} of $x_i$ or $q$ and the \LCA{} of $x_{i+1}$ and $q$ (choose the longer):
+this is the nodes where we first fell off the tree.
+In our example above, we went right when both $x_0$ and $x_1$ went left.
+Then we know that the subtree we are in after the \LCA{} node contains no keys.
+If there had been a key there, we would have gotten this key as either $x_i$ or $x_{i+1}$,
+depending on the other important bits of the key.
+So now, if we are looking for the predecessor, we can simply find the max key in the left subtree of the \LCA{} node.
+We can do this by searching for $e=y0111\dots1$ in the left tree. In contrary to the $q$ case, this will work, since
+any sketch we search for will be less than the sketch of $e$, since it will only contain 1s.
+Therefore, we will get the max sketch in the subtree, which is the max value\footnote{Note that the reason the sketch search failed for $q$ was that
+it branched on some bit that was not in the sketch. With $e$, we only care about getting the maximum/minimum sketch.}.
+If we are looking for the successor and went left when the $x_i$s went right, $e=y100\dots0$.
+
+\subsubsection{}
+So to sum up the big picture:
+we start out by computing $sketch(q)$. Then we find $i$ such that $sketch(x_i) \leq sketch(q) < sketch(x_{i + 1})$,
+and set $y=\mathcal{LCA}(q, x_i) or \mathcal{LCA}(q, x_{i+1})$, depending on which is the longer.
+We then compute $e$, find $sketch(e)$, and at last find $j$ such that $sketch(x_j) \leq sketch(e) < sketch(x_{j + 1})$,
+We claim that this will give us the right answer, for both successor and predecessor queries.
+
+However, there are a lot of details that needs explanation.
+How do we make the sketches? How do we search in the $x_i$s?
+What about computing the \LCA{}?
+We look at these questions, in order.
+
+\subsection{Approximate Sketch}
+We remember that the perfect sketch takes the exact bits we care about, and packs them right after another,
+using $O(w^{2/5})$ space.
+How can we do this?
+We could start off by masking out the important bits; this is easy.
+However, packing them together to be consecutive is harder.
+Maybe we do not need to perfectly pack them together? We can allow to have some zeroes in between the important bits,
+if they are in a predictable pattern, since this does not change the ordering of the sketches.
+We will make an \emph{approximate sketch}, of size $O(w^{4/5})$ bits, making the total bits in a node $O(w)$, which is our limit.
+
+We start out by masking out the important bits:
+\[x' = x \texttt{ AND } \sum\limits^{r-1}_{i = 0} 2^{b_i} = \sum\limits^{r-1}_{i = 0} x_{b_i} 2^{b_i}\]
+In order to pack the bits somewhat together, we will use multiplication.
+%Recall that binary multiplication is basically shifting and adding for each high bit in one of the operands:
+\[x' \cdot m = (\sum\limits^{r-1}_{i = 0} x_{b_i} 2^{b_i}) (\sum\limits^{r-1}_{j = 0} 2^{m_j}) =
+\sum\limits^{r-1}_{i = 0} \sum\limits^{r-1}_{j = 0} x_{b_i} 2^{b_i + m_j}
+\]
+That is, $b_i + m_j$ is something.
+We claim that there exist an $m$ such that
+\begin{enumerate}
+    \item\label{en:as-distinct} $b_i + m_j$ are all distinct, so we avoid collisions,
+    \item\label{en:as-order} $b_0 + m_0 < \dots b_{r-1} + m_{r-1}$, so we preserve ordering, and
+    \item\label{en:as-small} $b_{r-1} + m_{r-1} - b_0 + m_0 = O(r^4) = O(w^{4/5})$, so the number is small.
+\end{enumerate}
+Then
+\[approx\_sketch = \big[(x \cdot m) \texttt{ AND } \sum\limits^{r-1}_{i=0}2^{b_i + m_i}\big] \gg (b_0 + m_0)\]
+Note that we have discarded all terms where $i \neq j$.
+
+We first consider~\ref{en:as-distinct}.
+Choose $m'_0, \dots m'_{r-1} < r^3$, such that all $b_i + m'_j$ are distinct mod $r^3$.
+Do the choosing by induction. When we want to pick $m'_t$,
+we must avoid all numbers equal to $m'_i + b_j - b_k\ \forall_{i, j, k}$ (we moved $b_i$ from the left to $b_k$ on the right).
+$i$ ranges from 0 to $t$, and $j$ and $k$ ranges from 0 to $r$, so the total numbers we must avoid is $tr^2$.
+But $t < r$, and we are working $\mod r^3$, so there is a number that is available.
+
+Next, we let $m_i = m'_i + (w - b_i + ir^3) \text{ rounded down to a multiple of }r^3$.
+We use the $ir^3$ part to spread the bits out, since each $m_i < r^3$; this achieves~\ref{en:as-order}.
+In order not to mess up the collision freeness we have achieved, we need the term we add to be
+a multiple of $r_3$, so that $m_i \equiv m'_i \mod r^3$.
+\todo{Not sure why we need $-b_i$, since $+ir^3$ guarantees ordering?}.
+Since $m_{r-1} = O(r^4)$ and $m_0 \approx m'_0 < r^3$ and $m'_0 \geq 0$, we get $m_{r-1} - m_0 = O(r^4)$.
+
+\subsection{Parallel Comparison}
+Now all sketches for all keys in a node takes $O(w)$ space.  We want to search in all of them, in constant time.
+We use subtraction to compare the keys.
+Let $sketch(node) = 1\ sketch(x_0)\ 1\dots1\ sketch(x_{k-1})$, and let $sketch{(q)}^k = 0\ sketch(q)\ 0\dots0\ sketch(q)$;
+this can be computed by multiplying the sketch of q with $00^n10^n1\dots0^n1$, where $n$ is the length of a sketch minus 1.
+Now we can look at $sketch(node) - sketch{(q)}^k$:
+the 1 bits in front of the sketch will be zero if and only if $sketch(q) > sketch(x_0)$. We can mask out these bits.
+Note that since the $x_i$s are in order, this sequence is monotone, and of the form $0^a1^b$.
+
+Now we are interested in knowing where the transition is.
+The way we will do this is to multiply the masked number with $0^n1\dots0^n1$:
+this multiplication will make all of the 1s in the number to hit the same bit in the product,
+which will be summed.
+That is, instead of finding the position of the transition, we simply count the number of high bits.
+This is the number of $x_i$s that are larger than $q$.
+Note that we also have space in the sketch room for the entire sum, since the length of a sketch is at most $k - 1$,
+and we only have at most $k$ 1s to sum.
+
+\subsection{LCA}
+Lastly, we need to compute the LCA for our mental tree.
+This is equivalent to the first set bit in the \texttt{XOR} of the two numbers, also called the most significant bit.
+We split the word into $\sqrt{w}$ clusters of $\sqrt{w}$ bits each.
+Then we  find the non-empty clusters: get the \texttt{msb}s of the clusters, and clear the \texttt{msb}s from $x$.
+Subtract the cleared $x$ from the \texttt{msb} mask, and mask out all bits except the \texttt{msb}.
+\texttt{XOR} with the mask again, to flip the \texttt{msb}s.
+Now, each block is either $\texttt{00\dots00}$ or $\texttt{10\dots00}$, depending on wether the bits except \texttt{msb} was 0 or not.
+The special case we still need to handle is when a cluster is $\texttt{10\dots00}$. \texttt{OR} in the \texttt{msb}s, to fix this.
+Now all non-empty clusters are $\texttt{10\dots00}$, and empty clusters are $\texttt{00\dots00}$.
+
+Next we need to find the first non-empty cluster, and then find the first set bit in that cluster.
+To do this we would first like to compress the bits into a string of just the bits we care about, of length $\sqrt{w}$.
+Now we use \emph{perfect} sketch (details omitted).
+Then, to find the most significant set bit, we use parallel comparison, by comparing the sketch repeated to one-hot strings of length $\sqrt{w}$.
+By finding the largest power of two the string is greater than we find the \texttt{msb}.
+Note that since the size of the sketch is $\sqrt{w}$, and there are $\sqrt{w}$ one-hot strings of length $\sqrt{w}$,
+the total space for all the one-hot strings is $w$, so it fits.
+
+Now we have found the first cluster that has a set bit. Next, we go to that cluster, and use the exact same method again to find the
+\texttt{msb} there. This works since the sketch and the cluter are the exact same size.
+At last, we combine the two indices (cluster index and internal index to the cluster) to get the global index.
+
 
 \chapter{Succinct Structures}
 Rank, Select
-- 
cgit v1.2.3