TV shows depict a wide variety of human behaviors and have been studied extensively as a potentially rich source of data for many applications. However, most existing work focuses on 2D recognition tasks. In this paper, we observe that TV shows exhibit a certain persistence, i.e., the same environments and the same people appear repeatedly, which makes 3D reconstruction of this content possible. Building on this insight, we propose an automatic approach that operates on an entire season of a TV show and aggregates information in 3D: we build a 3D model of the environment and compute camera information, static 3D scene structure, and body scale information. We then demonstrate how this information acts as rich 3D context that can guide and improve the recovery of 3D human pose and position in these environments. Moreover, we show that reasoning about humans and their environment in 3D enables a broad range of downstream applications: re-identification, gaze estimation, cinematography, and image editing. We apply our approach to environments from seven iconic TV shows and perform an extensive evaluation of the proposed system.