This work presents a next-generation human-robot interface that can infer and realize the user's manipulation intention through gaze alone. Specifically, we develop a system that integrates near-eye tracking and robotic manipulation to enable user-specified actions (e.g., grasp, pick-and-place), where visual information is merged with human attention to create a mapping to the desired robot actions. To enable sight-guided manipulation, a head-mounted near-eye-tracking device is developed to track eyeball movements in real time, so that the user's visual attention can be identified. To improve grasping performance, a transformer-based grasp model is then developed. Stacked transformer blocks are used to extract hierarchical features: at each stage the number of channels is expanded while the resolution of the feature maps is reduced. Experimental validation demonstrates that the eye-tracking system yields low gaze estimation error and that the grasping system yields promising results on multiple grasping datasets. This work is a proof of concept for a gaze-interaction-based assistive robot, which holds great promise for helping elderly people and people with upper-limb disabilities in their daily lives. A demo video is available at \url{https://www.youtube.com/watch?v=yuZ1hukYUrM}.
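To make the hierarchical feature extraction concrete, the following minimal sketch computes the feature-map shape after each stage of a stacked backbone in which channels expand while resolution shrinks. The abstract gives no numbers, so every value here is an assumption chosen for illustration: a 224x224 input, a patch size of 4, 96 base channels, four stages, channel doubling, and resolution halving. The function name `stage_shapes` is hypothetical and is not taken from the authors' code.

```python
from typing import List, Tuple


def stage_shapes(
    height: int,
    width: int,
    patch: int = 4,            # assumed patch-embedding stride (not from the paper)
    base_channels: int = 96,   # assumed channel count of the first stage
    num_stages: int = 4,       # assumed number of stages
) -> List[Tuple[int, int, int]]:
    """Return (channels, height, width) of the feature map after each stage.

    Assumes each later stage doubles the channels and halves the spatial
    resolution, which is one common way to build a feature hierarchy.
    """
    total_stride = patch * 2 ** (num_stages - 1)
    if height % total_stride or width % total_stride:
        raise ValueError(f"input size must be divisible by {total_stride}")

    shapes = []
    c, h, w = base_channels, height // patch, width // patch
    for _ in range(num_stages):
        shapes.append((c, h, w))
        # Next stage: expand the channels, squeeze the resolution.
        c, h, w = c * 2, h // 2, w // 2
    return shapes


if __name__ == "__main__":
    for i, (c, h, w) in enumerate(stage_shapes(224, 224), start=1):
        print(f"stage {i}: {c} channels, {h}x{w}")
```

Under these assumed settings the spatial size falls from 56x56 to 7x7 while the channel count rises from 96 to 768, which is the trade-off the abstract describes; the actual model may use different factors.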