Acquiring food items with a fork poses an immense challenge to a robot-assisted feeding system, due to the wide range of material properties and visual appearances present across food groups. Deformable foods necessitate different skewering strategies than firm ones, but inferring such characteristics for several previously unseen items on a plate remains nontrivial. Our key insight is to leverage visual and haptic observations during interaction with an item to rapidly and reactively plan skewering motions. We learn a generalizable, multimodal representation for a food item from raw sensory inputs which informs the optimal skewering strategy. Given this representation, we propose a zero-shot framework to sense visuo-haptic properties of a previously unseen item and reactively skewer it, all within a single interaction. Real-robot experiments with foods of varying levels of visual and textural diversity demonstrate that our multimodal policy outperforms baselines which do not exploit both visual and haptic cues or do not reactively plan. Across 6 plates of different food items, our proposed framework achieves 71\% success over 69 skewering attempts total. Supplementary material, datasets, code, and videos can be found on our $\href{https://sites.google.com/view/hapticvisualnet-corl22/home}{website}$.