We present a dataset of 1000 video sequences of human portraits recorded in real, uncontrolled conditions with a handheld smartphone accompanied by an external high-quality depth camera. The collected dataset contains 200 people captured in different poses and locations, and its main purpose is to bridge the gap between raw measurements obtained from a smartphone and downstream applications such as state estimation, 3D reconstruction, and view synthesis. The sensors employed in data collection are the smartphone's camera and Inertial Measurement Unit (IMU), and an external Azure Kinect DK depth camera that is software-synchronized with the smartphone system to sub-millisecond precision. During recording, the smartphone flash provides a periodic secondary source of lighting. An accurate mask of the foremost person is provided, along with an analysis of its impact on camera alignment accuracy. For evaluation, we compare multiple state-of-the-art camera alignment methods against ground truth from a Motion Capture system. We provide a smartphone visual-inertial benchmark for portrait capture, report results for multiple methods, and motivate further use of the trajectories included in the dataset for view synthesis and 3D reconstruction tasks.
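Because the depth camera is only software-synchronized to the smartphone clock, a consumer of the dataset still has to pair each smartphone RGB frame with its nearest depth frame in time. The sketch below is purely illustrative and not part of the dataset tooling: the function name, timestamp format (seconds on a common clock), and the 0.5 ms tolerance are assumptions chosen to reflect the stated sub-millisecond synchronization.

```python
import bisect

def match_depth_to_rgb(rgb_timestamps, depth_timestamps, max_offset_s=0.0005):
    """Pair each smartphone RGB frame with the closest Azure Kinect depth frame.

    Both timestamp lists are assumed to be in seconds on a common clock,
    which the software synchronization described above is meant to provide.
    Pairs further apart than `max_offset_s` are dropped.
    """
    depth_timestamps = sorted(depth_timestamps)
    pairs = []
    for i, t in enumerate(rgb_timestamps):
        # Insertion point of the RGB timestamp in the sorted depth timestamps.
        j = bisect.bisect_left(depth_timestamps, t)
        # Check the neighbours on either side of the insertion point.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(depth_timestamps)]
        best = min(candidates, key=lambda k: abs(depth_timestamps[k] - t))
        if abs(depth_timestamps[best] - t) <= max_offset_s:
            pairs.append((i, best))
    return pairs

# Hypothetical usage: two 30 fps streams with a constant 0.2 ms residual offset.
rgb_ts = [k / 30.0 for k in range(300)]
depth_ts = [k / 30.0 + 0.0002 for k in range(300)]
print(len(match_depth_to_rgb(rgb_ts, depth_ts)))  # -> 300 matched pairs
```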