Over the past few years, building a general-purpose computer vision system has become an active research topic. Such a system would solve diverse vision tasks simultaneously without being restricted to a specific problem or data domain, which matters greatly in real-world computer vision applications. This study advances that direction by concentrating on million-scale multi-domain universal object detection. The problem is non-trivial because of duplicated category labels across datasets, label conflicts, and the need to handle hierarchical taxonomies. Moreover, how to use emerging large pre-trained vision models in a resource-efficient way for million-scale cross-dataset object detection remains an open challenge. This paper addresses these challenges by presenting our practices in label handling, hierarchy-aware loss design, and resource-efficient model training with a large pre-trained model. Our method ranked second in the object detection track of the Robust Vision Challenge 2022 (RVC 2022). We hope this detailed study will serve as an alternative practice paradigm for similar problems in the community. The code is available at https://github.com/linfeng93/Large-UniDet.