Common representation learning (CRL) learns a shared embedding between two or more modalities to improve performance on a given task over using only one of the modalities. CRL from different data types, such as images and time-series data (e.g., audio or text data), requires a deep metric learning loss that minimizes the distance between the modality embeddings. In this paper, we propose to use the triplet loss, which uses positive and negative identities to create sample pairs with different labels, for CRL between image and time-series modalities. By adapting the triplet loss for CRL, higher accuracy in the main (time-series classification) task can be achieved by exploiting additional information from the auxiliary (image classification) task. Our experiments on synthetic data and on handwriting recognition data from sensor-enhanced pens show improved classification accuracy, faster convergence, and better generalizability.
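The triplet loss mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: in a cross-modal setting, the anchor would come from one modality's embedding (e.g., time series) and the positive/negative samples from the other (e.g., images), with the positive sharing the anchor's label and the negative having a different one. The function name, Euclidean distance, and margin value are all assumptions for illustration.

```python
import numpy as np

def cross_modal_triplet_loss(anchor, positive, negative, margin=1.0):
    """Hypothetical sketch of a triplet loss for CRL.

    anchor:   embedding from one modality (e.g., time series)
    positive: embedding from the other modality, same label as anchor
    negative: embedding from the other modality, different label

    Pulls the anchor toward the positive and pushes it away from the
    negative until their distance gap exceeds the margin.
    """
    d_pos = np.linalg.norm(anchor - positive)  # distance to same-label sample
    d_neg = np.linalg.norm(anchor - negative)  # distance to different-label sample
    return max(0.0, d_pos - d_neg + margin)

# When the negative is already farther than the positive by more than the
# margin, the loss is zero; otherwise it is positive and drives learning.
a = np.array([0.0, 0.0])
p = np.array([0.5, 0.0])   # close positive
n = np.array([3.0, 0.0])   # far negative
loss_satisfied = cross_modal_triplet_loss(a, p, n)  # 0.0: constraint met

n_hard = np.array([0.6, 0.0])  # hard negative, almost as close as positive
loss_violated = cross_modal_triplet_loss(a, p, n_hard)  # > 0: still learning
```

In practice, such a loss would be minimized jointly over batches of paired image and time-series embeddings produced by two encoder networks, aligning the two modalities in a shared space.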