How can a robot efficiently extract a desired object from a shelf when it is fully occluded by other objects? Prior works propose geometric approaches for this problem but do not consider object semantics. Shelves in pharmacies, restaurant kitchens, and grocery stores are often organized such that semantically similar objects are placed close to one another. Can large language models (LLMs) serve as semantic knowledge sources to accelerate robotic mechanical search in semantically arranged environments? With Semantic Spatial Search on Shelves (S^4), we use LLMs to generate affinity matrices, where entries correspond to the semantic likelihood of physical proximity between objects. We derive semantic spatial distributions by synthesizing semantics with learned geometric constraints. S^4 incorporates Optical Character Recognition (OCR) and semantic refinement with predictions from ViLD, an open-vocabulary object detection model. Simulation experiments suggest that semantic spatial search reduces search time relative to pure spatial search by an average of 24% across three domains: pharmacy, kitchen, and office shelves. A manually collected dataset of 100 semantic scenes suggests that OCR and semantic refinement improve object detection accuracy by 35%. Lastly, physical experiments on a pharmacy shelf suggest a 47.1% improvement over pure spatial search. Supplementary material can be found at https://sites.google.com/view/s4-rss/home.
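To make the idea of combining semantic affinities with geometric constraints concrete, the following is a minimal sketch and not the S^4 implementation. It assumes a one-dimensional shelf divided into cells, a hand-written affinity row standing in for LLM output, Gaussian bumps around visible objects as the semantic term, and an elementwise product with a geometric prior. The function name, parameters, and all numeric values are hypothetical.

```python
import math


def semantic_spatial_distribution(geometric_prior, visible, affinity_to_target, sigma=1.0):
    """Combine a geometric prior over shelf cells with semantic affinities.

    geometric_prior: list of non-negative floats, one per shelf cell
        (assumed to encode where the target could physically be hidden).
    visible: dict mapping a visible object's name to the cell where it is seen.
    affinity_to_target: dict mapping an object's name to its affinity in [0, 1]
        with the target (a stand-in for one row of an affinity matrix).
    sigma: assumed spatial spread, in cells, of each object's semantic influence.
    """
    scores = []
    for cell in range(len(geometric_prior)):
        # Semantic term: affinity-weighted Gaussian bumps around visible objects.
        semantic = sum(
            affinity_to_target.get(name, 0.0)
            * math.exp(-((cell - pos) ** 2) / (2.0 * sigma ** 2))
            for name, pos in visible.items()
        )
        # Small floor so feasible cells far from every object keep some mass.
        scores.append(geometric_prior[cell] * (semantic + 1e-3))

    total = sum(scores)
    if total == 0:
        raise ValueError("geometric prior rules out every cell")
    return [s / total for s in scores]


# Hypothetical pharmacy shelf with six cells; the target is "aspirin".
geometric_prior = [0.0, 1.0, 1.0, 1.0, 1.0, 0.0]  # cells 0 and 5 cannot hide the target
visible = {"ibuprofen": 1, "shampoo": 4}
affinity_to_target = {"ibuprofen": 0.9, "shampoo": 0.1}  # made-up values, not LLM output

dist = semantic_spatial_distribution(geometric_prior, visible, affinity_to_target)
best_cell = max(range(len(dist)), key=dist.__getitem__)
```

Under these assumptions, the probability mass concentrates near the object with the highest affinity to the target, while cells that the geometric prior excludes stay at zero. A search policy could then examine cells in decreasing order of `dist`.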