In many real-world settings, only incomplete measurement data are available, which can pose a problem for learning. Unsupervised learning of the signal model using a fixed incomplete measurement process is impossible in general, as there is no information in the nullspace of the measurement operator. This limitation can be overcome by using measurements from multiple operators. While this idea has been successfully applied in a variety of settings, a precise characterization of the conditions for learning is still lacking. In this paper, we fill this gap by presenting necessary and sufficient conditions for learning the signal model that reveal the interplay between the number of distinct measurement operators $G$, the number of measurements per operator $m$, the dimension of the model $k$ and the dimension of the signals $n$. In particular, we show that unsupervised learning is generically possible if the number of measurements per operator satisfies $m>k+n/G$. Our results are agnostic to the learning algorithm and have implications for a wide range of practical algorithms, from low-rank matrix recovery to deep neural networks.
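The sufficient condition can be made concrete with a little arithmetic. The Python sketch below assumes the condition exactly as stated above ($m>k+n/G$, holding generically) and computes the smallest integer $m$ that satisfies it. The function names and the example sizes are hypothetical, and the sketch does not encode the paper's necessary conditions.

```python
def min_measurements_per_operator(n: int, k: int, G: int) -> int:
    """Smallest integer m with m > k + n / G (sufficient condition as stated).

    n: signal dimension, k: model dimension, G: number of distinct operators.
    """
    if n <= 0 or G <= 0 or k < 0:
        raise ValueError("expected n > 0, G > 0 and k >= 0")
    # For integers k, n and G > 0, floor(k + n/G) == k + n // G, so the smallest
    # integer strictly above k + n/G is one more than that floor (exact, no floats).
    return k + n // G + 1


def satisfies_sufficient_condition(m: int, n: int, k: int, G: int) -> bool:
    """True if m > k + n/G, checked in integer arithmetic as G*m > G*k + n."""
    if G <= 0:
        raise ValueError("expected G > 0")
    return G * m > G * k + n


if __name__ == "__main__":
    n, k = 784, 10  # hypothetical sizes, e.g. 28x28 images and a 10-dim model
    for G in (1, 10, 40, 100):
        m = min_measurements_per_operator(n, k, G)
        print(f"G={G:3d}: m >= {m} per operator ({G * m} in total)")
```

With $n=784$ and $k=10$, the threshold per operator is 795 for $G=1$ (more than $n$, so no incomplete operator with $m<n$ can meet it when only one operator is used), 89 for $G=10$, 30 for $G=40$ and 18 for $G=100$. Increasing $G$ lowers the per-operator requirement towards $k+1$, at the cost of more measurements in total.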