Acquiring food items with a fork poses an immense challenge to a robot-assisted feeding system because material properties and visual appearances vary widely across food groups. Deformable foods necessitate different skewering strategies than firm ones, but inferring such characteristics for several previously unseen items on a plate remains nontrivial. Our key insight is to leverage visual and haptic observations during interaction with an item to rapidly and reactively plan skewering motions. We learn a generalizable, multimodal representation for a food item from raw sensory inputs, and this representation informs the optimal skewering strategy. Given this representation, we propose a zero-shot framework to sense visuo-haptic properties of a previously unseen item and reactively skewer it, all within a single interaction. Real-robot experiments with foods of varying levels of visual and textural diversity demonstrate that our multimodal policy outperforms baselines that do not exploit both visual and haptic cues or do not reactively plan. Across 6 plates of different food items, our proposed framework achieves 71% success over 69 skewering attempts in total. Supplementary material, datasets, code, and videos are available on our website: https://sites.google.com/view/hapticvisualnet-corl22/home