Openable part detection is the task of detecting the openable parts of an object in a single-view image and predicting the corresponding motion parameters. Prior work investigated the unrealistic setting in which every input image contains only a single openable object. We generalize this task to scenes with multiple objects, each potentially possessing openable parts, and create a corresponding dataset based on real-world scenes. We then address this more challenging scenario with OPDFormer, a part-aware transformer architecture. Our experiments show that the OPDFormer architecture significantly outperforms prior work. The more realistic multiple-object scenarios we investigated remain challenging for all methods, indicating opportunities for future work.