This paper studies a classic maximum entropy sampling problem (MESP), which aims to select the most informative principal submatrix of a prespecified size from a covariance matrix. MESP has been widely applied to many areas, including healthcare, power system, manufacturing and data science. By investigating its Lagrangian dual and primal characterization, we derive a novel convex integer program for MESP and show that its continuous relaxation yields a near-optimal solution. The results motivate us to study an efficient sampling algorithm and develop its approximation bound for MESP, which improves the best-known bound in literature. We then provide an efficient deterministic implementation of the sampling algorithm with the same approximation bound. By developing new mathematical tools for the singular matrices and analyzing the Lagrangian dual of the proposed convex integer program, we investigate the widely-used local search algorithm and prove its first-known approximation bound for MESP. The proof techniques further inspire us with an efficient implementation of the local search algorithm. Our numerical experiments demonstrate that these approximation algorithms can efficiently solve medium-sized and large-scale instances to near-optimality. Our proposed algorithms are coded and released as open-source software. Finally, we extend the analyses to the A-Optimal MESP (A-MESP), where the objective is to minimize the trace of the inverse of the selected principal submatrix.
翻译:本文研究了经典的最大熵采样问题(MESP),该问题旨在从协方差矩阵中选择最具信息量的特定大小的主子矩阵。MESP已被广泛应用于许多领域,包括医疗保健、电力系统、制造业和数据科学。通过研究其Lagrangian dual和primal,我们为MESP导出了一个新的凸整数规划,并证明其连续松弛可以得到近似最优解。结果激发我们研究一个高效的采样算法并为MESP开发其近似界限,这提高了文献中已知的最好界限。然后,我们提供了一个具有相同近似边界的高效确定性采样算法实现。通过为奇异矩阵开发新的数学工具并分析所提出的凸整数规划的Lagrangian dual,我们调查了广泛使用的局部搜索算法并证明了其针对MESP的第一个近似界限。该证明技术进一步启发了我们开发局部搜索算法的高效实现。我们的数值实验表明,这些近似算法可以有效地解决中等规模和大规模实例,接近最优解。我们提出的算法已编码并发布为开源软件。最后,我们将分析扩展到A-Optimal MESP(A-MESP),其中目标是最小化所选主子矩阵的逆的痕迹。