Object recognition has made great advances in the last decade, but it still predominantly relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications, from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation these applications will face when deployed in the real world. To close this gap, we present the ORBIT dataset and benchmark, grounded in the real-world application of teachable object recognizers for people who are blind/low-vision. The dataset contains 3,822 videos of 486 objects recorded on mobile phones by people who are blind/low-vision. The benchmark reflects a realistic, highly challenging recognition problem, providing a rich playground to drive research in robustness to few-shot, high-variation conditions. We set the benchmark's first state-of-the-art and show there is massive scope for further innovation, holding the potential to impact a broad range of real-world vision applications, including tools for the blind/low-vision community. We release the dataset at https://doi.org/10.25383/city.14294597 and benchmark code at https://github.com/microsoft/ORBIT-Dataset.