"Get ready for a party": Exploring smarter smart spaces with help from large language models

The right response to someone who says "get ready for a party" is deeply influenced by meaning and context. For a smart home assistant (e.g., Google Home), the ideal response might be to survey the available devices in the home and change their state to create a festive atmosphere. Current practical systems cannot service such requests, since doing so requires the ability to (1) infer meaning behind an abstract statement and (2) map that inference to a concrete course of action appropriate for the context (e.g., changing the settings of specific devices). In this paper, we leverage the observation that recent task-agnostic large language models (LLMs) like GPT-3 embody a vast amount of cross-domain, sometimes unpredictable contextual knowledge that existing rule-based home assistant systems lack, which can make them powerful tools for inferring user intent and generating appropriate context-dependent responses during smart home interactions. We first explore the feasibility of a system that places an LLM at the center of command inference and action planning, showing that LLMs have the capacity to infer intent behind vague, context-dependent commands like "get ready for a party" and respond with concrete, machine-parseable instructions that can be used to control smart devices. We furthermore demonstrate a proof-of-concept implementation that puts an LLM in control of real devices, showing its ability to infer intent and change device state appropriately with no fine-tuning or task-specific training. Our work hints at the promise of LLM-driven systems for context-awareness in smart environments, motivating future research in this area.

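The abstract describes a loop in which the LLM sees the home's device state together with a vague command and replies with machine-parseable instructions. The sketch below is one possible illustration of that loop, not the paper's implementation: the prompt wording, the JSON reply format, the device names, and the helpers `build_prompt`, `apply_response`, and `fake_llm` are all assumptions made for this example. `fake_llm` returns a canned reply in place of a real model call.

```python
import json


def build_prompt(command, devices):
    """Combine the user's command with a JSON snapshot of device state.

    The prompt wording is an assumption, not taken from the paper.
    """
    return (
        "You control a smart home. Current device state as JSON:\n"
        + json.dumps(devices, sort_keys=True)
        + "\nUser command: " + command
        + "\nReply with only a JSON object mapping device names "
        "to their new settings."
    )


def apply_response(response_text, devices):
    """Parse the model's reply and return an updated copy of the state.

    Devices or settings that do not already exist are ignored, so the
    model cannot invent hardware. Unparseable replies change nothing.
    """
    try:
        changes = json.loads(response_text)
    except json.JSONDecodeError:
        return devices
    if not isinstance(changes, dict):
        return devices

    # Copy one level deep so the caller's state is not mutated.
    updated = {name: dict(settings) for name, settings in devices.items()}
    for name, new_settings in changes.items():
        if name not in updated or not isinstance(new_settings, dict):
            continue
        for key, value in new_settings.items():
            if key in updated[name]:
                updated[name][key] = value
    return updated


def fake_llm(prompt):
    """Hypothetical stand-in for a real LLM call; returns a canned reply."""
    return (
        '{"living_room_light": {"on": true, "color": "purple"}, '
        '"speaker": {"playing": true}, '
        '"oven": {"on": true}}'
    )


if __name__ == "__main__":
    devices = {
        "living_room_light": {"on": False, "color": "white"},
        "speaker": {"playing": False},
    }
    prompt = build_prompt("get ready for a party", devices)
    print(apply_response(fake_llm(prompt), devices))
```

Filtering the reply against the known device list is a design choice for this sketch: the canned reply mentions an "oven" that the home does not have, and that entry is dropped instead of being acted on.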