The vast majority of work in self-supervised learning, both theoretical and empirical (though mostly the latter), has focused on recovering good features for downstream tasks, with the definition of "good" often intricately tied to the downstream task itself. This lens is undoubtedly interesting, but suffers from the problem that there isn't a "canonical" set of downstream tasks to focus on -- in practice, this problem is usually resolved by competing on the benchmark dataset du jour. In this paper, we present an alternative lens: one of parameter identifiability. More precisely, we consider data coming from a parametric probabilistic model, and train a self-supervised learning predictor with a suitably chosen parametric form. We then ask whether we can read off the ground-truth parameters of the probabilistic model from the optimal predictor. We focus on the widely used self-supervised learning method of predicting masked tokens, which is popular for both natural language and visual data. While incarnations of this approach have already been successfully used for simpler probabilistic models (e.g. learning fully-observed undirected graphical models), we focus instead on latent-variable models capturing sequential structure -- namely Hidden Markov Models with both discrete and conditionally Gaussian observations. We show that there is a rich landscape of possibilities, in which some prediction tasks yield identifiability while others do not. Our results, born of a theoretical grounding of self-supervised learning, could thus usefully inform practice. Moreover, we uncover close connections with uniqueness of tensor rank decompositions -- a widely used tool in studying identifiability through the lens of the method of moments.