In the pooled data problem we are given $n$ agents with hidden state bits, either $0$ or $1$. The hidden states are unknown and can be seen as the underlying ground truth $\sigma$. To uncover that ground truth, we are given a querying method that queries multiple agents at a time. Each query reports the sum of the states of the queried agents. Our goal is to learn the hidden state bits using as few queries as possible. So far, most literature deals with exact reconstruction of all hidden state bits. We study a more relaxed variant in which we allow a small fraction of agents to be classified incorrectly. This becomes particularly relevant in the noisy variant of the pooled data problem where the queries' results are subject to random noise. In this setting, we provide a doubly regular test design that assigns agents to queries. For this design we analyze an approximate reconstruction algorithm that estimates the hidden bits in a greedy fashion. We give a rigorous analysis of the algorithm's performance, its error probability, and its approximation quality. As a main technical novelty, our analysis is uniform in the degree of noise and the sparsity of $\sigma$. Finally, simulations back up our theoretical findings and provide strong empirical evidence that our algorithm works well for realistic sample sizes.
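To make the setting concrete, here is a minimal Python sketch of noisy pooled data with a simple greedy decoder. It is an illustration under stated assumptions, not the paper's design or algorithm: each query pools a uniformly random half of the agents (the abstract's design is doubly regular), the noise is additive Gaussian, the number of ones `k` is known to the decoder, and the names `simulate_and_decode`, `m`, and `noise_sd` are hypothetical.

```python
import random


def simulate_and_decode(n=200, k=20, m=400, noise_sd=1.0, seed=0):
    """Simulate noisy pooled queries and decode greedily (illustrative only)."""
    rng = random.Random(seed)

    # Hidden ground truth sigma: exactly k of the n agents have state 1.
    ones = set(rng.sample(range(n), k))
    sigma = [1 if i in ones else 0 for i in range(n)]

    # Assumption: each query pools n // 2 agents chosen uniformly at random.
    # This is NOT the doubly regular design described in the abstract.
    queries = [rng.sample(range(n), n // 2) for _ in range(m)]

    # Each query reports the sum of the pooled bits.
    # Assumption: the noise is additive Gaussian with standard deviation noise_sd.
    results = [sum(sigma[i] for i in q) + rng.gauss(0.0, noise_sd) for q in queries]

    # Score each agent by the average result of the queries it takes part in.
    totals = [0.0] * n
    degrees = [0] * n
    for q, r in zip(queries, results):
        for i in q:
            totals[i] += r
            degrees[i] += 1
    scores = [totals[i] / max(degrees[i], 1) for i in range(n)]

    # Greedy step: declare the k highest-scoring agents to have state 1.
    # Assumption: k is known to the decoder.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    estimate = [0] * n
    for i in top:
        estimate[i] = 1

    # Approximate reconstruction: count misclassified agents.
    errors = sum(a != b for a, b in zip(sigma, estimate))
    return sigma, estimate, errors


if __name__ == "__main__":
    _, _, errors = simulate_and_decode()
    print(f"misclassified agents: {errors} of 200")
```

Raising `noise_sd` or lowering `m` in this sketch tends to increase the number of misclassified agents, which is the trade-off between query count and approximation quality that the relaxed problem variant allows.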