Data analysis requires a pairwise proximity measure over objects. Recent work has extended this setting to cases where the distance information is given only as comparisons of distances among three objects (triplets). Humans find such comparison tasks much easier than computing exact distances, and this kind of data can be easily obtained in large quantities via crowdsourcing. In this work, we propose triplets augmentation, an efficient method that extends triplet data by inferring the implicit information hidden in the existing triplets. Triplets augmentation improves the quality of both kernel-based and kernel-free data analytics. We also propose a novel set of algorithms for common data analysis tasks based on triplets. These methods work directly with triplets and avoid kernel evaluations, and are therefore scalable to big data. We demonstrate that our methods outperform the current best-known techniques and are robust to noisy data.
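To make the idea of inferring implicit triplets concrete, the following is a minimal sketch of one plausible form of augmentation, not necessarily the procedure used in this work. It assumes a triplet `(a, b, c)` encodes "a is closer to b than to c" and that implied triplets are derived by transitivity among triplets sharing the same anchor `a`. The function name `augment_triplets` and the tuple encoding are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def augment_triplets(triplets):
    """Return the input triplets plus those implied by transitivity.

    Assumption (illustrative only): a triplet (a, b, c) means
    d(a, b) < d(a, c). If (a, b, c) and (a, c, e) are both present,
    then (a, b, e) is implied.
    """
    # For each anchor a, build a directed graph where an edge b -> c
    # records that b is closer to a than c is.
    closer = defaultdict(lambda: defaultdict(set))
    for a, b, c in triplets:
        closer[a][b].add(c)

    augmented = set()
    for a, graph in closer.items():
        for start in list(graph):
            # Depth-first search: every node reachable from `start`
            # is farther from the anchor than `start` is.
            stack = list(graph[start])
            seen = set()
            while stack:
                node = stack.pop()
                if node in seen or node == start:
                    # Skip revisits; also skip cycles back to `start`,
                    # which can only arise from contradictory (noisy) input.
                    continue
                seen.add(node)
                stack.extend(graph.get(node, ()))
            augmented.update((a, start, far) for far in seen)
    return augmented


if __name__ == "__main__":
    observed = [("x", "p", "q"), ("x", "q", "r")]
    for triplet in sorted(augment_triplets(observed)):
        print(triplet)
```

With the two observed triplets above, the sketch also yields `("x", "p", "r")`, which was never stated explicitly. Contradictory triplets in noisy data would propagate under this simple rule, so a practical method would need additional handling for them.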