Multi-view Detection (MVD) is highly effective for occlusion reasoning in crowded environments. While recent works using deep learning have made significant advances in the field, they have overlooked the generalization aspect, which makes them \emph{impractical for real-world deployment}. The key novelty of our work is to \emph{formalize} three critical forms of generalization and \emph{propose experiments to evaluate them}: generalization with i) a varying number of cameras, ii) varying camera positions, and finally, iii) to new scenes. We find that existing state-of-the-art models show poor generalization by overfitting to a single scene and camera configuration. To address these concerns: (a) we propose a novel Generalized MVD (GMVD) dataset, assimilating diverse scenes with changing daytime, camera configurations, and varying numbers of cameras, and (b) we discuss the properties essential to bring generalization to MVD and propose a barebones model to incorporate them. We perform a comprehensive set of experiments on the WildTrack, MultiViewX, and GMVD datasets to motivate the necessity of evaluating the generalization abilities of MVD methods and to demonstrate the efficacy of the proposed approach. The code and the proposed dataset can be found at \url{https://github.com/jeetv/GMVD}.