Scene understanding is an active research area. Commercial depth sensors, such as Kinect, have enabled the release of several RGB-D datasets over the past few years, which in turn spawned novel methods in 3D scene understanding. More recently, with the launch of the LiDAR sensor in Apple's iPads and iPhones, high-quality RGB-D data is accessible to millions of people on a device they use every day. This opens a whole new era in scene understanding for the Computer Vision community as well as app developers. The fundamental research in scene understanding, together with advances in machine learning, can now impact people's everyday experiences. However, transforming these scene understanding methods into real-world experiences requires additional innovation and development. In this paper we introduce ARKitScenes. It is not only the first RGB-D dataset captured with a now widely available depth sensor, but, to the best of our knowledge, it is also the largest indoor scene understanding dataset released to date. In addition to the raw and processed data from the mobile device, ARKitScenes includes high-resolution depth maps captured using a stationary laser scanner, as well as manually labeled 3D oriented bounding boxes for a large taxonomy of furniture. We further analyze the usefulness of the data for two downstream tasks: 3D object detection and color-guided depth upsampling. We demonstrate that our dataset can help push the boundaries of existing state-of-the-art methods, and that it introduces new challenges that better represent real-world scenarios.