Segmenting unseen objects is a critical task in many different domains. For example, a robot may need to grasp an unseen object, which requires visually separating that object from the background and/or other objects. Mean shift clustering is a common method in object segmentation tasks. However, the traditional mean shift clustering algorithm is not easily integrated into an end-to-end neural network training pipeline. In this work, we propose the Mean Shift Mask Transformer (MSMFormer), a new transformer architecture that simulates the von Mises-Fisher (vMF) mean shift clustering algorithm, allowing the feature extractor and the clustering to be trained and run at inference jointly. Its central component is a hypersphere attention mechanism, which updates object queries on a hypersphere. To illustrate the effectiveness of our method, we apply MSMFormer to Unseen Object Instance Segmentation, where it achieves a new state-of-the-art Boundary F-measure of 87.3 on the real-world Object Clutter Indoor Dataset (OCID). Code is available at https://github.com/YoungSean/UnseenObjectsWithMeanShift
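For reference, here is a minimal NumPy sketch of plain, non-learned vMF mean shift on unit-normalized feature vectors, the kind of clustering that MSMFormer is described as simulating. It is not the MSMFormer implementation. The concentration `kappa`, the iteration count, the merge threshold, the toy data, and the helper names `vmf_mean_shift` and `assign_clusters` are all illustrative assumptions. Each update computes a softmax over scaled cosine similarities and re-projects the weighted mean onto the unit sphere. This is one way to see why the procedure resembles an attention step whose queries stay on a hypersphere.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Project vectors onto the unit hypersphere."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def vmf_mean_shift(features, kappa=10.0, num_iters=10):
    """Non-blurring vMF mean shift (illustrative; kappa and num_iters are assumptions).

    Seeds start at the data points and are repeatedly moved to the
    vMF-kernel-weighted mean of the fixed data points, then re-normalized.
    """
    z = l2_normalize(features)  # data points on the sphere, shape (N, D)
    x = z.copy()                # one seed (query) per data point
    for _ in range(num_iters):
        logits = kappa * x @ z.T                       # scaled cosine similarities, (N, N)
        logits -= logits.max(axis=1, keepdims=True)    # subtract row max for numerical stability
        w = np.exp(logits)
        w /= w.sum(axis=1, keepdims=True)              # softmax over data points
        x = l2_normalize(w @ z)                        # weighted mean, projected back to the sphere
    return x


def assign_clusters(modes, threshold=0.9):
    """Greedily merge converged seeds whose cosine similarity exceeds a
    (hypothetical) threshold; returns per-point labels and cluster centers."""
    labels = -np.ones(len(modes), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for c, center in enumerate(centers):
            if m @ center > threshold:
                labels[i] = c
                break
        else:  # no existing center is close enough: start a new cluster
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)


# Toy usage: two noisy groups of 3-D "pixel embeddings" around two directions.
rng = np.random.default_rng(0)
a = np.array([1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal((20, 3))
b = np.array([0.0, 1.0, 0.0]) + 0.05 * rng.standard_normal((20, 3))
labels, centers = assign_clusters(vmf_mean_shift(np.vstack([a, b])))
print(len(centers))  # expected: 2
```

In this toy setting the within-group cosine similarity is close to 1 and the between-group similarity is close to 0, so with `kappa=10` each seed is pulled almost entirely toward its own group and the two groups end up as separate clusters. In a learned model, the fixed hand-tuned pieces here (kernel concentration, number of iterations, merge rule) are the parts one would expect to interact with training.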