In the $G$-sampling problem, the goal is to output an index $i$ of a vector $f \in\mathbb{R}^n$, such that for all coordinates $j \in [n]$, \[\textbf{Pr}[i=j] = (1 \pm \epsilon) \frac{G(f_j)}{\sum_{k\in[n]} G(f_k)} + \gamma,\] where $G:\mathbb{R} \to \mathbb{R}_{\geq 0}$ is some non-negative function. If $\epsilon = 0$ and $\gamma = 1/\text{poly}(n)$, the sampler is called perfect. In the data stream model, $f$ is defined implicitly by a sequence of updates to its coordinates, and the goal is to design such a sampler in small space. Jayaram and Woodruff (FOCS 2018) gave the first perfect $L_p$ samplers in turnstile streams, where $G(x)=|x|^p$, using $\text{polylog}(n)$ space for $p\in(0,2]$. However, to date all known sampling algorithms are not truly perfect, since their output distribution is only point-wise $\gamma = 1/\text{poly}(n)$ close to the true distribution. This small error can be significant when samplers are run many times on successive portions of a stream, and leak potentially sensitive information about the data stream. In this work, we initiate the study of truly perfect samplers, with $\epsilon = \gamma = 0$, and comprehensively investigate their complexity in the data stream and sliding window models. Abstract truncated due to arXiv limits; please see paper for full abstract.
翻译:在 $G$ 的抽样问题中, 目标是输出一个向量 $x 的 mexicle $f\ in\ mathb{R ⁇ {R ⁇ } n$ 的指数 。 对于所有向量 $ $ $\ n],\\\\ textbf{Pr} [i=j] = 1\ pm\\\\ epsilon} G(f_j)\\\\\ sum} k\ in[ n} G(f_k) + pomma,\] 其中向向量的向量的向量 方向输出 : g:\ mexcialb{R}\ misslexxb{rookral_rexexral_ricer= $n] 。 如果 $\\ textblations real_\\\\\\ prentral_rentral= dies, 其所有的流数据在运行量 $xxxxxxxxxxxxxxxxlentrentrental 数据中, 则其所有的流流流流流数据为为为 =xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx。