Through the resolution of pretext tasks, self-supervised learning (SSL) leverages unlabeled data to extract useful latent representations that replace traditional input features in downstream tasks. A common pretext task consists of pretraining an SSL model on pseudo-labels derived from the original signal. This technique is particularly relevant for speech, where a wide variety of meaningful signal processing features may serve as pseudo-labels. However, the process of selecting pseudo-labels, for speech or other types of data, remains largely unexplored and currently relies on observing the results on the final downstream task. This methodology is not sustainable at scale, due to its substantial computational (and hence carbon) cost. This paper therefore introduces a practical and theoretical framework for selecting relevant pseudo-labels with respect to a given downstream task. More precisely, we propose a functional estimator of pseudo-label utility grounded in conditional independence theory, which does not require any training. Experiments conducted on speaker recognition and automatic speech recognition validate our estimator, showing a significant correlation between the performance observed on the downstream task and the utility estimates obtained with our approach, thereby facilitating the exploration of relevant pseudo-labels for self-supervised speech representation learning.
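As a rough illustration of what a training-free utility estimate can look like, the sketch below ranks candidate pseudo-labels by a kernel dependence measure against the downstream labels. This is a minimal sketch under assumptions of our own: the abstract does not specify the estimator, so the HSIC instantiation and the names rbf_kernel and hsic_score are hypothetical, chosen only because HSIC is a standard kernel statistic in conditional independence analysis.

```python
# Hypothetical sketch: scoring candidate pseudo-labels without any training,
# via a biased HSIC (Hilbert-Schmidt Independence Criterion) estimate of the
# dependence between a pseudo-label and the downstream labels. This is NOT
# the paper's exact estimator; it only illustrates the general idea.
import numpy as np

def rbf_kernel(x, gamma=None):
    """Gram matrix of an RBF kernel; gamma defaults to the median heuristic."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    if gamma is None:
        med = np.median(sq_dists[sq_dists > 0])
        gamma = 1.0 / med if med > 0 else 1.0
    return np.exp(-gamma * sq_dists)

def hsic_score(pseudo_labels, downstream_labels):
    """Biased HSIC estimate: higher values indicate stronger statistical
    dependence between the candidate pseudo-label and the downstream target."""
    n = len(pseudo_labels)
    K = rbf_kernel(pseudo_labels)
    L = rbf_kernel(downstream_labels)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy usage: rank two candidate pseudo-labels against a downstream target.
rng = np.random.default_rng(0)
y = rng.normal(size=200)                     # stand-in downstream labels
z_relevant = y + 0.1 * rng.normal(size=200)  # informative pseudo-label
z_noise = rng.normal(size=200)               # uninformative pseudo-label
print(hsic_score(z_relevant, y) > hsic_score(z_noise, y))  # expected: True
```

The point of the sketch is purely structural: each candidate pseudo-label receives a scalar score computed directly from data, so candidates can be ranked before any SSL pretraining is run, which is the cost-saving property the abstract emphasizes.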