Deep Representation Learning with an Information-theoretic Loss

This paper proposes a deep representation learning method that uses an information-theoretic loss to increase inter-class distances as well as within-class similarity in the embedded space. Tasks such as anomaly detection and out-of-distribution detection, in which test samples come from classes unseen in training, are problematic for deep neural networks. For such tasks, it is not sufficient to merely discriminate between known classes. Our intuition is to represent the known classes in compact and well-separated regions of the embedded space, in order to decrease the possibility that known and unseen classes overlap there. We derive a loss from the Information Bottleneck principle that reflects both the inter-class distances and the compactness within classes, thereby extending the existing deep data description models. Our empirical study shows that the proposed model improves the segmentation of normal classes in the deep feature space, and subsequently contributes to identifying out-of-distribution samples.
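The abstract does not state the loss itself, so the following is not the paper's method. It is only a minimal sketch, in plain Python, of the two geometric quantities the abstract refers to (within-class compactness and inter-class distance), plus a nearest-centroid distance as one simple way to score a possibly unseen sample. All function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def class_centroids(embeddings, labels):
    """Return the mean embedding of each class as {label: [coords]}."""
    groups = defaultdict(list)
    for vec, lab in zip(embeddings, labels):
        groups[lab].append(vec)
    return {
        lab: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lab, vecs in groups.items()
    }


def within_class_scatter(embeddings, labels, centroids):
    """Mean squared distance of samples to their own class centroid.

    Lower values mean more compact classes.
    """
    total = sum(
        math.dist(vec, centroids[lab]) ** 2
        for vec, lab in zip(embeddings, labels)
    )
    return total / len(embeddings)


def min_centroid_gap(centroids):
    """Smallest distance between any two class centroids.

    Higher values mean better separated classes.
    """
    keys = sorted(centroids)
    return min(
        math.dist(centroids[a], centroids[b])
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    )


def nearest_centroid_distance(vec, centroids):
    """Distance from a sample to the closest known-class centroid.

    A large value suggests the sample may not belong to any known class.
    """
    return min(math.dist(vec, c) for c in centroids.values())


# Toy 2-D embeddings for two known classes (illustrative values only).
embeddings = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centroids = class_centroids(embeddings, labels)
compactness = within_class_scatter(embeddings, labels, centroids)
separation = min_centroid_gap(centroids)
```

With compact, well-separated classes, a sample that falls between the two clusters, such as `(5.0, 1.0)`, is farther from every centroid than any training sample is, which is the intuition behind using such a distance as an out-of-distribution score.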

Representation learning

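Representation learning learns vector representations from training data, which overcomes the limitations of hand-engineered methods. It is usually divided into two broad categories: unsupervised and supervised representation learning. Most unsupervised methods use the latent variables of an autoencoder (for example, a denoising or sparse autoencoder) as the representation. The more recent variational autoencoders tolerate noise and outliers better. However, inferring the latent structure of given data exactly is almost impossible, so approximate inference strategies are used instead. In addition, some unsupervised methods aim to approximate a specific similarity measure. One such approach is an unsupervised similarity-preserving representation learning framework that uses matrix factorization to preserve pairwise DTW (dynamic time warping) similarities. Another learns DTW-preserving shapelets, so that the Euclidean distance in the transformed space approximates the true DTW distance between the original data. Supervised methods can exploit the label information of the data to better capture its semantic structure. Siamese networks and triplet networks are two currently popular models; both aim to maximize the distance between classes and minimize the distance within each class.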
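The objectives behind Siamese and triplet networks can be illustrated with their usual loss functions. The sketch below shows one common form of each, using Euclidean distance on already-computed embeddings; the `margin` value, the function names, and the exact variants chosen are assumptions made for illustration, not something the text above specifies.

```python
import math


def contrastive_loss(a, b, same_class, margin=1.0):
    """Pairwise loss often used with Siamese networks (one common form).

    Pulls same-class pairs together and pushes different-class pairs
    apart until they are at least `margin` away from each other.
    """
    d = math.dist(a, b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss (one common form).

    Asks that the anchor be closer to the positive (same class) than to
    the negative (different class) by at least `margin`.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# A well-ordered triplet incurs no loss; swapping positive and negative does.
good = triplet_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0))
bad = triplet_loss((0.0, 0.0), (3.0, 4.0), (0.0, 1.0))
```

Both losses are zero once the embedding already satisfies the margin, so training effort concentrates on pairs or triplets that still violate the desired class geometry.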
