This paper considers the problem of detecting whether two databases, each consisting of $n$ users with $d$ Gaussian features, are correlated. Under the null hypothesis, the databases are independent. Under the alternative hypothesis, the features are correlated across databases with correlation coefficient $\rho$, up to an unknown row permutation. A simple test is developed to show that detection is achievable when $\rho^2$ is above roughly $\frac{1}{d}$. For the converse, the truncated second moment method is used to establish that detection is impossible when $\rho^2$ is below roughly $\frac{1}{d\sqrt{n}}$. These results are compared to the corresponding recovery problem, where the goal is to decode the row permutation, and for which a converse bound of roughly $\rho^2 \approx 1 - n^{-4/d}$ was previously shown. For certain choices of parameters, the detection achievability bound outperforms this recovery converse bound, demonstrating that detection can be easier than recovery in this scenario.
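The abstract does not spell out the test, so the following is only a minimal sketch under stated assumptions, not the paper's procedure. It assumes unit-variance features and that, under the alternative, each row of the second database equals $\rho$ times its matching row plus independent Gaussian noise scaled by $\sqrt{1-\rho^2}$. The statistic shown, the sum of all $n^2$ cross inner products, is chosen here only because it does not depend on the row order. The function names are hypothetical.

```python
import math
import random


def sample_databases(n, d, rho, rng):
    """Draw two n-by-d databases; rho = 0 gives the null (independent) case."""
    x = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    noise_scale = math.sqrt(1.0 - rho * rho)
    # Assumed correlation model: y_row = rho * x_row + sqrt(1 - rho^2) * noise.
    y = [[rho * v + noise_scale * rng.gauss(0.0, 1.0) for v in row] for row in x]
    rng.shuffle(y)  # hide the row correspondence with a uniform permutation
    return x, y


def sum_statistic(x, y):
    """Sum of all n^2 cross inner products, divided by n * sqrt(d)."""
    n, d = len(x), len(x[0])
    sx = [sum(col) for col in zip(*x)]  # column sums of the first database
    sy = [sum(col) for col in zip(*y)]  # column sums of the second database
    t = sum(a * b for a, b in zip(sx, sy))
    return t / (n * math.sqrt(d))


def mean_statistic(n, d, rho, trials, seed):
    """Average the normalized statistic over independent draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x, y = sample_databases(n, d, rho, rng)
        total += sum_statistic(x, y)
    return total / trials


if __name__ == "__main__":
    print("null       :", mean_statistic(10, 100, 0.0, 100, seed=1))
    print("alternative:", mean_statistic(10, 100, 0.5, 100, seed=2))
```

Under this assumed model the normalized statistic has mean 0 and variance 1 under the null, and mean $\rho\sqrt{d}$ under the alternative, so the two cases separate once $\rho^2 d$ is large. That matches the $\rho^2 \approx \frac{1}{d}$ scaling quoted for achievability, though the test analyzed in the paper may differ.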