The objective of ordinal embedding is to find a Euclidean representation of a set of abstract items, using only answers to triplet comparisons of the form "Is item $i$ closer to item $j$ or to item $k$?". In recent years, numerous algorithms have been proposed to solve this problem. However, there has been no fair and thorough assessment of these embedding methods, and several key questions therefore remain unanswered: Which algorithms scale better with increasing sample size or dimension? Which ones perform better when the embedding dimension is small or when few triplet comparisons are available? In our paper, we address these questions and provide the first comprehensive and systematic empirical evaluation of existing algorithms as well as a new neural network approach. In the large triplet regime, we find that simple, relatively unknown non-convex methods consistently outperform all other algorithms, including elaborate approaches based on neural networks or landmarks. This finding can be explained by our insight that many of the non-convex optimization approaches do not suffer from local optima. In the low triplet regime, our neural network approach is either competitive with, or significantly outperforms, all other methods. Our comprehensive assessment is enabled by our unified library of popular embedding algorithms, which leverages GPU resources and allows fast and accurate embedding of millions of data points.
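To make the triplet objective concrete, the following is a minimal, illustrative sketch (not the paper's library or any of the benchmarked implementations): it fits an embedding by plain gradient descent on a hinge-style triplet loss over squared distances, a toy instance of the non-convex objectives the abstract refers to. The data, the margin, the learning rate, and the triplet count are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth planar points, used only to generate triplet answers.
n, d = 50, 2
X_true = rng.normal(size=(n, d))

# Sample triplets (i, j, k), reordered so that item i is closer to item j.
m = 5000
idx = np.array([rng.choice(n, size=3, replace=False) for _ in range(m)])
d_ij = np.sum((X_true[idx[:, 0]] - X_true[idx[:, 1]]) ** 2, axis=1)
d_ik = np.sum((X_true[idx[:, 0]] - X_true[idx[:, 2]]) ** 2, axis=1)
swap = d_ij > d_ik
idx[swap, 1], idx[swap, 2] = idx[swap, 2], idx[swap, 1]
i, j, k = idx[:, 0], idx[:, 1], idx[:, 2]

# Gradient descent on the hinge loss max(0, margin + d(i,j) - d(i,k)).
X = rng.normal(size=(n, d))      # random (non-convex) initialization
lr, margin = 0.05, 1.0           # illustrative hyperparameters
for epoch in range(300):
    dij = np.sum((X[i] - X[j]) ** 2, axis=1)
    dik = np.sum((X[i] - X[k]) ** 2, axis=1)
    viol = margin + dij - dik > 0    # triplets violated within the margin
    grad = np.zeros_like(X)
    np.add.at(grad, i[viol], 2 * (X[k[viol]] - X[j[viol]]))
    np.add.at(grad, j[viol], 2 * (X[j[viol]] - X[i[viol]]))
    np.add.at(grad, k[viol], 2 * (X[i[viol]] - X[k[viol]]))
    X -= lr * grad / m

# Fraction of training triplets the learned embedding reproduces correctly.
dij = np.sum((X[i] - X[j]) ** 2, axis=1)
dik = np.sum((X[i] - X[k]) ** 2, axis=1)
print(f"triplet agreement: {np.mean(dij < dik):.2%}")
```

Note that the loss is non-convex in the embedding coordinates $X$; the abstract's observation is that, empirically, such objectives tend not to get stuck in poor local optima despite this.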