Object recognition has made great advances in the last decade, but it still predominantly relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications, from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation these applications will face when deployed in the real world. To close this gap, we present the ORBIT dataset and benchmark, grounded in a real-world application of teachable object recognizers for people who are blind/low-vision. The dataset contains 3,822 videos of 486 objects recorded on mobile phones by people who are blind/low-vision, and the benchmark reflects a realistic, highly challenging recognition problem, providing a rich playground to drive research in robustness to few-shot, high-variation conditions. We establish the first state-of-the-art on the benchmark and show that there is substantial scope for further innovation, with the potential to impact a broad range of real-world vision applications, including tools for the blind/low-vision community. The dataset is available at https://bit.ly/2OyElCj and the code to run the benchmark at https://bit.ly/39YgiUW.
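To make the few-shot setting concrete, the sketch below shows how a single evaluation episode might be formed: each object class contributes only a handful of labeled "support" examples, and held-out "query" examples are classified by the nearest class centroid. This is a minimal illustrative example with hypothetical data, not the ORBIT benchmark code.

```python
import random

def make_episode(examples_per_class, n_support):
    # Split each class's examples into a small labeled support set
    # and a held-out query set, mimicking a few-shot episode.
    support, query = {}, {}
    for label, feats in examples_per_class.items():
        shuffled = feats[:]
        random.shuffle(shuffled)
        support[label] = shuffled[:n_support]
        query[label] = shuffled[n_support:]
    return support, query

def centroid(vectors):
    # Mean of a list of equal-length feature vectors.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(x, centroids):
    # Assign x to the class with the nearest centroid
    # (squared Euclidean distance).
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Toy 2-D features standing in for embeddings of two user objects.
data = {
    "mug":  [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.2, 0.1]],
    "keys": [[10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [9.9, 10.0]],
}
support, query = make_episode(data, n_support=2)
cents = {label: centroid(vs) for label, vs in support.items()}
accuracy = sum(
    classify(x, cents) == label for label, xs in query.items() for x in xs
) / sum(len(xs) for xs in query.values())
```

With only two support examples per class, the query accuracy here is perfect because the toy clusters are well separated; the point of ORBIT-style high-variation data is precisely that real user videos are not this clean.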