Over the past few years, there has been growing interest in developing a universal, general-purpose computer vision system. Such a system could solve a wide range of vision tasks simultaneously, without being restricted to a specific problem or data domain, which is crucial for practical, real-world computer vision applications. In this study, we focus on million-scale, multi-domain universal object detection. This problem poses several challenges, including duplicated category labels across datasets, label conflicts, and the need to handle hierarchical taxonomies. Furthermore, finding a resource-efficient way to leverage large pre-trained vision models for million-scale cross-dataset object detection remains an open challenge. To address these challenges, we present our approach to label handling, hierarchy-aware loss design, and resource-efficient training with a large pre-trained model. Our method ranked second in the object detection track of the Robust Vision Challenge 2022 (RVC 2022). We hope this detailed study will serve as a useful reference and an alternative approach for similar problems in the computer vision community. The code is available at https://github.com/linfeng93/Large-UniDet.
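The abstract names hierarchical taxonomies and a hierarchy-aware loss without describing either. As a rough illustration of the general idea, and explicitly not the Large-UniDet loss, the sketch below shows one common way a sigmoid classifier can respect a label hierarchy. A box labeled with a class is also a positive for every ancestor of that class. Its descendants are ignored, because a coarse label such as "animal" does not say whether the object is a "dog". All other classes are negatives. The toy taxonomy `PARENT` and the helpers `hierarchy_targets` and `hierarchy_aware_bce` are hypothetical names chosen for this example.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy: child class -> parent class (None marks a root).
PARENT: Dict[str, Optional[str]] = {
    "animal": None,
    "dog": "animal",
    "cat": "animal",
    "vehicle": None,
    "car": "vehicle",
}
CLASSES: List[str] = list(PARENT)


def ancestors(c: str) -> List[str]:
    """Walk parent links up to the root and return every ancestor of c."""
    out: List[str] = []
    p = PARENT[c]
    while p is not None:
        out.append(p)
        p = PARENT[p]
    return out


def descendants(c: str) -> List[str]:
    """Return every class that has c somewhere on its ancestor chain."""
    return [k for k in CLASSES if c in ancestors(k)]


def hierarchy_targets(gt: str) -> Tuple[List[float], List[float]]:
    """Per-class (targets, weights): gt and its ancestors are positives,
    descendants of gt get weight 0 (ignored), all other classes are negatives."""
    positives = {gt, *ancestors(gt)}
    ignored = set(descendants(gt))
    targets = [1.0 if k in positives else 0.0 for k in CLASSES]
    weights = [0.0 if k in ignored else 1.0 for k in CLASSES]
    return targets, weights


def hierarchy_aware_bce(logits: List[float], gt: str) -> float:
    """Weighted sigmoid BCE averaged over the non-ignored classes."""
    targets, weights = hierarchy_targets(gt)
    total = 0.0
    for x, t, w in zip(logits, targets, weights):
        # Numerically stable BCE-with-logits: max(x, 0) - x*t + log(1 + exp(-|x|)).
        total += w * (max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x))))
    return total / max(sum(weights), 1.0)
```

For example, `hierarchy_targets("animal")` marks "animal" positive, ignores "dog" and "cat", and treats "vehicle" and "car" as negatives. Ignoring descendants, rather than treating them as negatives, avoids penalizing a correct fine-grained prediction when a dataset only annotates the coarse parent class.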