This paper considers a noisy data structure recovery problem. The goal is to investigate the following question: Given a noisy observation of a permuted data set, according to which permutation was the original data sorted? The focus is on scenarios where data is generated according to an isotropic Gaussian distribution, and the noise is additive Gaussian with an arbitrary covariance matrix. This problem is posed within a hypothesis testing framework. The objective is to study the linear regime in which the optimal decoder has a polynomial complexity in the data size, and it declares the permutation by simply computing a permutation-independent linear function of the noisy observations. The main result of the paper is a complete characterization of the linear regime in terms of the noise covariance matrix. Specifically, it is shown that this matrix must have a very flat spectrum with at most three distinct eigenvalues to induce the linear regime. Several practically relevant implications of this result are discussed, and the error probability incurred by the decision criterion in the linear regime is also characterized. A core technical component consists of using linear algebraic and geometric tools, such as Steiner symmetrization.
翻译:本文审议了一个吵闹的数据结构回收问题。 目的是调查以下问题: 在对一个固定数据集进行噪音观测的情况下, 将原始数据排序为变异; 重点是根据异向高斯分布生成数据的情景; 噪音是带有任意共变矩阵的加加高西安。 这个问题在假设测试框架内出现。 目标是研究最优解密器在数据大小方面具有多元复杂性的线性系统, 并且通过简单地计算噪音观察的偏移独立的线性功能来宣布变异性。 纸张的主要结果就是对噪音共变矩阵的线性系统的完整描述。 具体地说, 显示该矩阵必须有一个非常平坦纳的频谱, 最多有三种不同的电子值来引导线性系统。 讨论这一结果的几种实际影响, 线性系统决定标准产生的误差概率也作了描述。 核心技术组件包括使用线性代谢和几何测量工具, 如 Steiner 。