In this paper, we present TF-Grasp, a transformer-based architecture for robotic grasp detection. The framework has two key designs that make it well suited to visual grasping tasks. First, we adopt local window attention to capture local contextual information and detailed features of graspable objects, and then apply cross-window attention to model long-range dependencies between distant pixels. In this way, object knowledge, environmental configuration, and relationships between different visual entities are aggregated for subsequent grasp detection. Second, we build a hierarchical encoder-decoder architecture with skip connections that deliver shallow features from the encoder to the decoder, enabling multi-scale feature fusion. Owing to the attention mechanism, TF-Grasp can simultaneously capture local information (e.g., the contours of objects) and model long-range connections such as the relationships between distinct visual concepts in clutter. Extensive computational experiments demonstrate that TF-Grasp outperforms state-of-the-art convolutional grasping models, attaining accuracies of 97.99% and 94.6% on the Cornell and Jacquard grasping datasets, respectively. Real-world experiments using a 7-DoF Franka Emika Panda robot also demonstrate its capability of grasping unseen objects in a variety of scenarios. The code and pre-trained models will be available at https://github.com/WangShaoSUN/grasp-transformer
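To make the distinction between local window attention and cross-window attention concrete, the following is a minimal NumPy sketch and not the TF-Grasp implementation. It makes several assumptions that the abstract does not state: cross-window mixing is approximated by cyclically shifting the feature map before partitioning (one common way to let windows exchange information), attention is single-head with identity query/key/value projections, and the masking of wrapped-around pixels is omitted. The function names `window_partition`, `window_reverse`, and `window_attention` are hypothetical.

```python
import numpy as np


def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows.

    Returns an array of shape (num_windows, ws * ws, C).
    """
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0, "H and W must be multiples of ws"
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)


def window_reverse(windows, ws, H, W):
    """Inverse of window_partition: rebuild the (H, W, C) feature map."""
    C = windows.shape[-1]
    x = windows.reshape(H // ws, W // ws, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def window_attention(x, ws, shift=0):
    """Self-attention restricted to ws x ws windows.

    Assumptions (illustrative only): single head, identity Q/K/V
    projections, and no mask for pixels that wrap around when shift > 0.
    With shift == 0 each pixel only attends inside its own window; with
    shift > 0 the map is cyclically shifted first, so the windows straddle
    the original window borders and information crosses between them.
    """
    H, W, C = x.shape
    if shift:
        x = np.roll(x, (-shift, -shift), axis=(0, 1))
    win = window_partition(x, ws)                      # (nW, N, C)
    scores = win @ win.transpose(0, 2, 1) / np.sqrt(C)  # (nW, N, N)
    out = softmax(scores) @ win                        # (nW, N, C)
    out = window_reverse(out, ws, H, W)
    if shift:
        out = np.roll(out, (shift, shift), axis=(0, 1))
    return out


# Usage: perturb pixel (3, 3) and observe its effect on pixel (4, 4),
# which lies in a different 4 x 4 window of an 8 x 8 map.
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 4))
feat_perturbed = feat.copy()
feat_perturbed[3, 3] += 1.0

local_a = window_attention(feat, ws=4)
local_b = window_attention(feat_perturbed, ws=4)
cross_a = window_attention(feat, ws=4, shift=2)
cross_b = window_attention(feat_perturbed, ws=4, shift=2)

# Local windows: (4, 4) does not see the change at (3, 3).
print(np.allclose(local_a[4, 4], local_b[4, 4]))
# Shifted windows: (3, 3) and (4, 4) now share a window, so it does.
print(np.allclose(cross_a[4, 4], cross_b[4, 4]))
```

Alternating the two modes lets information propagate across the whole image after a few layers while keeping the cost of each attention call bounded by the window size; that trade-off is the motivation for window-based attention in general, whatever the exact cross-window scheme a given model uses.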