An important challenge in metric learning is scalability to both the size and the dimension of input data. Online metric learning algorithms have been proposed to address this challenge. Existing methods are commonly based on the Passive-Aggressive (PA) approach and can therefore rapidly process large volumes of data with an adaptive learning rate. However, these algorithms rely on the hinge loss and so are not robust against outliers and label noise. Moreover, existing online methods usually assume that training triplets or pairwise constraints are available in advance, whereas many real-world datasets come only as input data with their associated labels. We address these challenges by formulating the online Distance-Similarity learning problem with the robust Rescaled hinge loss function. The proposed model is quite general and can be applied to any PA-based online Distance-Similarity algorithm. We also develop an efficient, robust one-pass triplet construction algorithm. Finally, to provide scalability in high-dimensional DML environments, we present a low-rank version of the proposed methods that not only reduces the computational cost significantly but also preserves the predictive performance of the learned metrics; it also yields a straightforward extension of our methods to deep Distance-Similarity learning. We conduct several experiments on datasets from various applications. The results confirm that the proposed methods outperform state-of-the-art online DML methods by a large margin in the presence of label noise and outliers.
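
For context, the Rescaled hinge loss mentioned above is usually obtained by passing the ordinary hinge loss through a bounded exponential rescaling. A standard form, following Xu et al.'s robust SVM work (and assuming this is the variant the abstract refers to), is

    \ell_{\text{rhinge}}(z) = \beta \left[ 1 - \exp\!\big( -\eta\, \ell_{\text{hinge}}(z) \big) \right],
    \qquad \beta = \frac{1}{1 - e^{-\eta}},

where \eta > 0 controls the rescaling and \ell_{\text{hinge}}(z) = \max(0, 1 - z). Because the loss saturates at \beta as \ell_{\text{hinge}} grows, a single outlier or mislabelled triplet can contribute only a bounded amount to the objective, which is the source of the robustness claimed above.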
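To make the PA mechanism concrete, the following is a minimal sketch of a PA-I-style triplet update for a bilinear similarity s_W(a, b) = aᵀ W b (as in OASIS-style online similarity learning), with the step size damped by the derivative of the rescaled loss. This is an illustrative reconstruction, not the paper's exact update rule; the function name, the damping weight, and all parameter defaults are assumptions.

    import numpy as np

    def pa_rescaled_update(W, x, x_pos, x_neg, C=0.1, eta=0.5):
        # Triplet constraint: s_W(x, x_pos) >= s_W(x, x_neg) + 1.
        hinge = max(0.0, 1.0 - x @ W @ x_pos + x @ W @ x_neg)
        if hinge == 0.0:
            return W  # passive step: constraint already satisfied
        V = np.outer(x, x_pos - x_neg)  # gradient of the violated margin w.r.t. W
        # The rescaled loss's derivative, eta * exp(-eta * hinge), shrinks toward
        # zero for large hinge values, so suspected outliers get small updates.
        weight = eta * np.exp(-eta * hinge)
        tau = min(C, weight * hinge / (np.linalg.norm(V) ** 2 + 1e-12))
        return W + tau * V

Under a plain hinge loss the aggressiveness tau grows with the violation, so noisy triplets dominate; the exponential damping above is one plausible way to fold the bounded loss into the PA step.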
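The abstract also assumes triplets must be built on the fly from a labelled stream. A hypothetical sketch of the basic one-pass idea is given below: keep a small FIFO buffer per class and pair each incoming example with a recent same-class and a recent other-class example. The paper's algorithm is additionally described as robust, so it presumably filters or down-weights suspicious triplets, which this sketch omits.

    from collections import defaultdict, deque

    def one_pass_triplets(stream, buffer_size=50):
        # stream yields (x, y) pairs; each example is seen exactly once.
        buffers = defaultdict(lambda: deque(maxlen=buffer_size))
        for x, y in stream:
            positives = buffers[y]
            negatives = next((b for lbl, b in buffers.items() if lbl != y and b), None)
            if positives and negatives:
                yield x, positives[-1], negatives[-1]  # (anchor, positive, negative)
            buffers[y].append(x)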


