Many problems on data streams have been studied at two extremes of difficulty: either allowing randomized algorithms, in the static setting (where they should err with bounded probability on the worst case stream); or when only deterministic and infallible algorithms are required. Some recent works have considered the adversarial setting, in which a randomized streaming algorithm must succeed even on data streams provided by an adaptive adversary that can see the intermediate outputs of the algorithm. In order to better understand the differences between these models, we study a streaming task called "Missing Item Finding". In this problem, for $r < n$, one is given a data stream $a_1,\ldots,a_r$ of elements in $[n]$, (possibly with repetitions), and must output some $x \in [n]$ which does not equal any of the $a_i$. We prove that, for $r = n^{\Theta(1)}$ and $\delta = 1/\mathrm{poly}(n)$, the space required for randomized algorithms that solve this problem in the static setting with error $\delta$ is $\Theta(\mathrm{polylog}(n))$; for algorithms in the adversarial setting with error $\delta$, $\Theta((1 + r^2 / n) \mathrm{polylog}(n))$; and for deterministic algorithms, $\Theta(r / \mathrm{polylog}(n))$. Because our adversarially robust algorithm relies on free access to a string of $O(r \log n)$ random bits, we investigate a "random start" model of streaming algorithms where all random bits used are included in the space cost. Here we find a conditional lower bound on the space usage, which depends on the space that would be needed for a pseudo-deterministic algorithm to solve the problem. We also prove an $\Omega(r / \mathrm{polylog}(n))$ lower bound for the space needed by a streaming algorithm with $< 1/2^{\mathrm{polylog}(n)}$ error against "white-box" adversaries that can see the internal state of the algorithm, but not predict its future random decisions.
翻译:数据流的许多问题已在两个极端困难的地方进行了研究:要么允许随机化的算法,在静态环境中(在最坏的个案流中,它们应该错误地使用固定的概率);或者只要需要确定性和不可错错的算法。最近的一些工作考虑了对抗性设置,在这个设置中,随机化的流算法必须成功,即使是在能够看到算法的中间输出的适应性对手提供的数据流上。为了更好地了解这些模型之间的差异,我们研究了一个名为“丢失项目查找”的流式任务。在这个设置中,对于 $(n) 来说, 美元(r) 的算法, 美元(r) 美元; 美元(r) 美元(r) 的算法, 美元(r) 美元(r) 的算法运算法(r) 。