An interactive image retrieval system learns which images in the database belong to a user's query concept by analyzing the example images and feedback provided by the user. The challenge is to retrieve the relevant images with minimal user interaction. In this work, we pose the problem as a binary classification task: every image in the database is classified as relevant or irrelevant to the user's query concept. Our method combines active learning with graph-based semi-supervised learning (GSSL). Active learning reduces the number of user interactions by querying the labels of the most informative points, and GSSL makes it possible to use abundant unlabeled data alongside the limited labeled data provided by the user. To find the most informative point efficiently, we use an uncertainty-sampling-based method that queries the label of the point nearest to the decision boundary of the classifier. We estimate this decision boundary using our adaptive threshold heuristic. To exploit huge volumes of unlabeled data, we use an efficient approximation-based method that reduces the complexity of GSSL from $O(n^3)$ to $O(n)$, making GSSL scalable. We make the classifier robust to the diversity and noisy labels of images in large databases by incorporating information from multiple modalities, such as visual information extracted from deep-learning-based models and semantic information extracted from WordNet. In our experiments with concepts defined on the AnimalWithAttributes and ImageNet (1.2 million images) datasets, high F1 scores within a few relevance feedback rounds indicate the effectiveness and scalability of our approach.
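To make the querying step concrete, the following is a minimal sketch of one relevance feedback round, not the implementation evaluated in this work. It assumes a dense similarity matrix `W`, a standard iterative label-spreading update as the GSSL scorer, and the median score as a stand-in for the adaptive threshold; the function names `propagate_scores` and `pick_query` are hypothetical. It does not include the $O(n)$ approximation, so it is only suitable for small graphs.

```python
import numpy as np


def propagate_scores(W, labels, alpha=0.9, iters=50):
    """Score every node by iterative label spreading on a similarity graph.

    W      : (n, n) symmetric, non-negative similarity matrix (dense, assumed)
    labels : dict mapping node index -> +1 (relevant) or -1 (irrelevant)
    """
    n = W.shape[0]
    degree = W.sum(axis=1)
    degree[degree == 0] = 1.0  # avoid division by zero for isolated nodes
    S = W / np.sqrt(np.outer(degree, degree))  # symmetric normalization

    y = np.zeros(n)
    for index, value in labels.items():
        y[index] = value

    f = y.copy()
    for _ in range(iters):
        # Blend neighbor scores with the user-provided labels.
        f = alpha * (S @ f) + (1 - alpha) * y
    return f


def pick_query(scores, labels, threshold):
    """Uncertainty sampling: the unlabeled node whose score is nearest the threshold."""
    candidates = [i for i in range(len(scores)) if i not in labels]
    return min(candidates, key=lambda i: abs(scores[i] - threshold))


# Toy example: a chain of five images, 0 - 1 - 2 - 3 - 4.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0

labels = {0: +1, 4: -1}               # user feedback so far
scores = propagate_scores(W, labels)

# Assumed stand-in for the adaptive threshold: the median score.
threshold = float(np.median(scores))
query = pick_query(scores, labels, threshold)
print("next image to show the user:", query)
```

In this toy graph the two labeled endpoints pull their neighbors in opposite directions, so the middle image is the most uncertain and is queried next. After the user labels it, `labels` is updated and the round repeats.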