Although large language models (LLMs) have recently become effective tools for language-conditioned control in embodied systems, instability, slow convergence, and hallucinated actions continue to limit their direct application to continuous control. This work proposes a modular neuro-symbolic control framework that cleanly separates high-level semantic reasoning from low-level motion execution: a locally deployed LLM interprets tasks symbolically, while a lightweight neural delta controller executes bounded, incremental actions in continuous space. We evaluate the proposed method in a planar manipulation setting in which spatial relations between objects are specified in language. Extensive experiments across numerous tasks and local language models, including Mistral, Phi, and LLaMA-3.2, compare LLM-only control, neural-only control, and the proposed LLM+DL framework. The results show that the neuro-symbolic integration consistently improves both success rate and efficiency over LLM-only baselines, reducing the average number of steps by more than 70% and achieving speedups of up to 8.83x while remaining robust to language model quality. By constraining the LLM to symbolic outputs and delegating low-level continuous execution to a neural controller trained on synthetic geometric data, the proposed framework improves interpretability, stability, and generalization without requiring reinforcement learning or costly rollouts. These results provide empirical evidence that neuro-symbolic decomposition offers a scalable and principled way to integrate language understanding with continuous control, advancing the development of reliable and efficient language-guided embodied systems.