The problem of completing a large matrix with lots of missing entries has received widespread attention in the last couple of decades. Two popular approaches to the matrix completion problem are based on singular value thresholding and nuclear norm minimization. Most of the past works on this subject assume that there is a single number $p$ such that each entry of the matrix is available independently with probability $p$ and missing otherwise. This assumption may not be realistic for many applications. In this work, we replace it with the assumption that the probability that an entry is available is an unknown function $f$ of the entry itself. For example, if the entry is the rating given to a movie by a viewer, then it seems plausible that high value entries have greater probability of being available than low value entries. We propose two new estimators, based on singular value thresholding and nuclear norm minimization, to recover the matrix under this assumption. The estimators are shown to be consistent under a low rank assumption. We also provide a consistent estimator of the unknown function $f$.
翻译:在过去几十年中,用大量缺失条目完成一个大矩阵的问题引起了广泛的关注。两种对矩阵完成问题的流行方法都以单值阈值和核规范最小化为基础。过去关于这一主题的多数工作假设有一个单数的p美元,因此矩阵的每个条目都可以独立获得,概率为p美元,否则则缺失。这一假设对许多应用程序来说可能不现实。在这项工作中,我们用以下假设来取代它:一个条目的可用概率是该条目本身的未知函数$f美元。例如,如果条目是观众给电影的评级,那么,高值条目的可用概率似乎比低值条目大。我们建议根据单值阈值和核规范最小化两个新的估计值,以便在这一假设下恢复矩阵。根据低级假设,估计值被证明是一致的。我们还提供了一个未知函数的一致的估算值$f美元。