Natural language provides a powerful modality for programming robots to perform temporal tasks. Linear temporal logic (LTL) provides unambiguous semantics for formal descriptions of temporal tasks. However, existing approaches cannot accurately and robustly translate English sentences to their equivalent LTL formulas in unseen environments. To address this problem, we propose Lang2LTL, a novel modular system that leverages pretrained large language models to first extract referring expressions from a natural language command, then ground the expressions to real-world landmarks and objects, and finally translate the command into an LTL task specification for the robot. It enables any robotic system to interpret natural language navigation commands without additional training, provided that it tracks its position and has a semantic map with landmarks labeled with free-form text. We demonstrate the state-of-the-art ability of Lang2LTL to generalize to multi-scale navigation domains such as OpenStreetMap (OSM) and CleanUp World (a simulated household environment). Lang2LTL achieves an average accuracy of 88.4% in translating challenging LTL formulas in 22 unseen OSM environments, as evaluated on a new corpus of over 10,000 commands, 22 times better than the previous state of the art. Without modification, the best-performing Lang2LTL model on the OSM dataset can translate commands in CleanUp World with 82.8% accuracy. As part of our proposed comprehensive evaluation procedures, we collected a new labeled dataset of English commands representing 2,125 unique LTL formulas. It is the largest dataset to date of natural language commands paired with LTL specifications for robotic tasks, and it has the most diverse LTL formulas, 40 times more than the previous largest dataset. Finally, we integrated Lang2LTL with a planner to command a quadruped mobile robot to perform multi-step navigational tasks in an analog real-world environment created in the lab.
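The three-stage structure described above (extract referring expressions, ground them to landmarks, translate to LTL) can be illustrated with a toy sketch. This is not the Lang2LTL implementation: every function name, the example map, and the landmark identifiers are hypothetical, substring matching stands in for the language-model extraction step, string similarity from Python's `difflib` stands in for grounding, and a single hard-coded "visit in order" template stands in for the learned translation step.

```python
import difflib

# Hypothetical semantic map: landmark id -> free-form text label.
SEMANTIC_MAP = {
    "bookstore_1": "Main Street Bookstore",
    "cafe_1": "Blue Bottle Cafe",
}


def extract_referring_expressions(command, candidates):
    """Stand-in for the extraction stage (an LLM in the paper).

    Returns the candidate phrases that occur in the command,
    ordered by where they first appear.
    """
    lowered = command.lower()
    found = [c for c in candidates if c.lower() in lowered]
    return sorted(found, key=lambda c: lowered.index(c.lower()))


def ground(expression, semantic_map):
    """Stand-in for the grounding stage.

    Picks the landmark whose text label is most similar to the
    expression, using plain string similarity (an assumption made
    only for this sketch).
    """
    def similarity(item):
        label = item[1].lower()
        return difflib.SequenceMatcher(None, expression.lower(), label).ratio()

    return max(semantic_map.items(), key=similarity)[0]


def sequenced_visit(props):
    """Stand-in for the translation stage.

    Builds one fixed LTL pattern, "eventually visit each proposition
    in the given order": F (a & F (b & ... F z)).
    """
    formula = f"F {props[-1]}"
    for prop in reversed(props[:-1]):
        formula = f"F ({prop} & {formula})"
    return formula


def lang2ltl_sketch(command, candidates, semantic_map):
    """Chain the three toy stages: extract, ground, translate."""
    expressions = extract_referring_expressions(command, candidates)
    props = [ground(e, semantic_map) for e in expressions]
    return sequenced_visit(props)


if __name__ == "__main__":
    command = "Go to the cafe, then visit the bookstore"
    candidates = ["the bookstore", "the cafe"]  # hypothetical phrase list
    print(lang2ltl_sketch(command, candidates, SEMANTIC_MAP))
```

Keeping grounding separate from translation means the formula template never sees environment-specific names, which is one way to read the abstract's claim that a new environment only requires a new semantic map.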