Many basic indoor activities, such as eating or writing, are usually conducted on various tabletops (e.g., coffee tables, writing desks). Understanding tabletop scenes is therefore indispensable for 3D indoor scene parsing applications. Unfortunately, this demand is hard to meet by directly deploying data-driven algorithms, since 3D tabletop scenes are rarely available in current datasets. To remedy this defect, we introduce TO-Scene, a large-scale dataset focusing on tabletop scenes, which contains 20,740 scenes with three variants. To acquire the data, we design an efficient and scalable framework, in which a crowdsourcing UI is developed to transfer CAD objects from ModelNet and ShapeNet onto tables from ScanNet; the resulting tabletop scenes are then simulated into realistic scans and annotated automatically. Further, a tabletop-aware learning strategy is proposed to better perceive the small-sized tabletop instances. Notably, we also provide a real scanned test set, TO-Real, to verify the practical value of TO-Scene. Experiments show that algorithms trained on TO-Scene indeed work on the realistic test data, and our proposed tabletop-aware learning strategy greatly improves the state-of-the-art results on both 3D semantic segmentation and object detection tasks. The dataset and code are available at https://github.com/GAP-LAB-CUHK-SZ/TO-Scene.
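To illustrate the object-transfer step of the data-acquisition framework, the following is a minimal sketch in Python/NumPy of how a CAD model could be rescaled and dropped onto the top surface of a table extracted from a scan. The function name `place_object_on_table`, the 2 cm top-slab heuristic, and the toy data are illustrative assumptions, not the actual TO-Scene pipeline or API.

```python
import numpy as np

def place_object_on_table(obj_points, table_points, target_size=0.2, rng=None):
    """Hypothetical helper: rescale an (N,3) CAD point set to `target_size`
    meters and translate it so it rests on a random spot of the table's top
    surface given by (M,3) table points. Not the official TO-Scene code."""
    rng = rng or np.random.default_rng()

    # Center the CAD object and normalize it to a plausible physical size.
    obj = obj_points - obj_points.mean(axis=0)
    scale = target_size / np.max(obj.max(axis=0) - obj.min(axis=0))
    obj = obj * scale

    # Estimate the table's top plane height and keep a thin slab (~2 cm)
    # of points near it as the supporting surface.
    top_z = np.quantile(table_points[:, 2], 0.95)
    top_surface = table_points[table_points[:, 2] > top_z - 0.02]

    # Pick a random anchor on the top surface and shift the object so that
    # its lowest point touches the table plane.
    anchor = top_surface[rng.integers(len(top_surface))]
    offset = np.array([anchor[0], anchor[1], top_z]) - np.array([0.0, 0.0, obj[:, 2].min()])
    return obj + offset

if __name__ == "__main__":
    # Toy data: a flat square "table" at z = 0.75 m and a random blob as the object.
    table = np.column_stack([np.random.uniform(-0.5, 0.5, (2000, 2)),
                             np.full(2000, 0.75)])
    blob = np.random.randn(500, 3) * 0.05
    placed = place_object_on_table(blob, table)
    print("object now rests at z >=", round(placed[:, 2].min(), 3))
```

In practice, the paper's crowdsourcing UI lets annotators choose and arrange objects interactively rather than placing them purely at random; this sketch only conveys the geometric idea of transferring CAD objects onto scanned table surfaces.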