Multi-view Detection (MVD) is highly effective for occlusion reasoning in crowded environments. While recent deep learning works have made significant advances in the field, they have overlooked generalization, which makes them impractical for real-world deployment. The key novelty of our work is to formalize three critical forms of generalization and propose experiments to evaluate them: generalization i) to a varying number of cameras, ii) to varying camera positions, and iii) to new scenes. We find that existing state-of-the-art models generalize poorly because they overfit to a single scene and camera configuration. To address these concerns: (a) we propose a novel Generalized MVD (GMVD) dataset, assimilating diverse scenes with varying times of day, camera configurations, and numbers of cameras, and (b) we discuss the properties essential to bring generalization to MVD and propose a barebones model to incorporate them. We perform a comprehensive set of experiments on the WildTrack, MultiViewX, and GMVD datasets to motivate the necessity of evaluating the generalization abilities of MVD methods and to demonstrate the efficacy of the proposed approach. The code and the proposed dataset can be found at https://github.com/jeetv/GMVD