Matrix completion is a ubiquitous tool in machine learning and data analysis. Most work in this area has focused on the number of observations necessary to obtain an accurate low-rank approximation. In practice, however, the cost of observations is an important limiting factor, and experimentalists may have on hand multiple modes of observation with differing noise-vs-cost trade-offs. This paper considers matrix completion subject to such constraints: a budget is imposed and the experimentalist's goal is to allocate this budget between two sampling modalities in order to recover an accurate low-rank approximation. Specifically, we consider that it is possible to obtain low noise, high cost observations of individual entries or high noise, low cost observations of entire columns. We introduce a regression-based completion algorithm for this setting and experimentally verify the performance of our approach on both synthetic and real data sets. When the budget is low, our algorithm outperforms standard completion algorithms. When the budget is high, our algorithm has comparable error to standard nuclear norm completion algorithms and requires much less computational effort.
翻译:矩阵完成是机器学习和数据分析中无处不在的工具。 这一领域的大部分工作侧重于获得准确的低级近似值所需的观测数量。 然而,在实践中,观测成本是一个重要的限制因素,实验家可能手持多种观察模式,有不同的噪音- 反成本权衡。 本文认为矩阵完成受这些限制: 预算是强制实行的,实验家的目标是在两种抽样模式之间分配这一预算, 以便恢复准确的低级近似值。 具体地说, 我们认为有可能获得低噪声、 高成本的单个条目或高噪声观测、 低成本的整列观测。 我们为此设定了基于回归的完成算法, 并实验性地验证我们在合成和真实数据集上的方法绩效。 当预算低时, 我们的算法超过标准完成算法。 当预算高时, 我们的算法有与标准的核规范完成算法相似的错误, 并且需要的计算努力要少得多。