The problem of completing a large matrix with lots of missing entries has received widespread attention in the last couple of decades. Two popular approaches to the matrix completion problem are based on singular value thresholding and nuclear norm minimization. Most of the past works on this subject assume that there is a single number $p$ such that each entry of the matrix is available independently with probability $p$ and missing otherwise. This assumption may not be realistic for many applications. In this work, we replace it with the assumption that the probability that an entry is available is an unknown function $f$ of the entry itself. For example, if the entry is the rating given to a movie by a viewer, then it seems plausible that high value entries have greater probability of being available than low value entries. We propose two new estimators, based on singular value thresholding and nuclear norm minimization, to recover the matrix under this assumption. The estimators involve no tuning parameters, and are shown to be consistent under a low rank assumption. We also provide a consistent estimator of the unknown function $f$.
翻译:在过去几十年中,完成一个包含大量缺失条目的大矩阵的问题引起了广泛的关注。对于矩阵完成问题,两种流行的方法都是以单值阈值和核规范最小化为基础。过去关于这一主题的大部分工作假设有一个单数美元,因此矩阵的每个条目都可以独立获得,概率为1美元,否则则缺失。对于许多应用程序来说,这一假设可能不现实。在这项工作中,我们用以下假设来取代它:一个条目的可用概率是该条目本身的未知函数$f美元。例如,如果条目是观众给电影的评级,那么,高值条目的可用概率似乎比低值条目大。我们提议根据单值阈值和核规范最小化两个新的估计值,以便在这一假设下恢复矩阵。估计值不涉及调整参数,并且显示在低级假设下是一致的。我们还提供了一个未知函数的一致的估算值$f美元。