Humans innately measure the distance between instances in an unlabeled dataset using an unknown similarity function. Distance metrics can only serve as a proxy for this similarity when retrieving similar instances. Learning a good similarity function from human annotations improves retrieval quality. This work uses deep metric learning to learn such user-defined similarity functions from only a few annotations on a large football trajectory dataset. We adapt an entropy-based active learning method using recent work on triplet mining to collect easy-to-answer but still informative annotations from human participants, and we use these annotations to train a deep convolutional network that generalizes to unseen samples. Our user study shows that our approach improves the quality of information retrieval compared to a previous deep metric learning approach that relies on a Siamese network. Specifically, we shed light on the strengths and weaknesses of passive sampling heuristics and active learners alike by analyzing the participants' response efficacy. To this end, we record accuracy, algorithmic time complexity, the participants' fatigue and time-to-response, and their qualitative self-assessments and statements, and we examine how mixed-expertise annotators and their consistency affect model performance and transfer learning.
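The two ingredients named in the abstract, triplet-based metric learning and entropy-based query selection, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the margin value, and the choice to model the annotator's answer as a sigmoid of the distance difference are all assumptions made for illustration, and plain lists stand in for the embeddings a trained network would produce.

```python
import itertools
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: zero once the negative is at least
    `margin` farther from the anchor than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def answer_entropy(anchor, cand_a, cand_b):
    """Binary entropy (in bits) of the model's belief that cand_a is more
    similar to the anchor than cand_b.

    Assumption: the belief is a sigmoid of the distance difference."""
    p = _sigmoid(euclidean(anchor, cand_b) - euclidean(anchor, cand_a))
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def most_uncertain_query(anchor, candidates):
    """Return the index pair (i, j) whose comparison the model is least sure
    about, i.e. the query with maximal answer entropy."""
    pairs = itertools.combinations(range(len(candidates)), 2)
    return max(pairs, key=lambda ij: answer_entropy(anchor, candidates[ij[0]], candidates[ij[1]]))
```

In this sketch, a query whose two candidates lie equally far from the anchor has maximal entropy, so it would be shown to an annotator first. The annotator's answer then fixes which candidate acts as the positive and which as the negative in the triplet loss. Maximal entropy alone does not guarantee that a query is easy to answer, which is the trade-off the abstract addresses through triplet mining.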