Given an image or a video captured from a monocular camera, amodal layout estimation is the task of predicting semantics and occupancy in bird's eye view. The term amodal implies that we also reason about entities in the scene that are occluded or truncated in image space. While several recent efforts have tackled this problem, there is a lack of standardization in task specification, datasets, and evaluation protocols. We address these gaps with AutoLay, a dataset and benchmark for amodal layout estimation from monocular images. AutoLay encompasses driving imagery from two popular datasets: KITTI and Argoverse. In addition to fine-grained attributes such as lanes, sidewalks, and vehicles, we also provide semantically annotated 3D point clouds. We implement several baselines and bleeding-edge approaches, and release our data and code.