Conversational agents show promise for enabling users to interact with mobile devices through language. However, performing diverse UI tasks with natural language typically requires developers to create separate datasets and models for each specific task, which is expensive and labor-intensive. Recently, pre-trained large language models (LLMs) have been shown to generalize to various downstream tasks when prompted with a handful of examples from the target task. This paper investigates the feasibility of enabling versatile conversational interactions with mobile UIs using a single LLM. We designed prompting techniques to adapt an LLM to mobile UIs, and experimented with four important modeling tasks that address various scenarios in conversational interaction. Our method achieved competitive performance on these challenging tasks without requiring dedicated datasets or training, offering a lightweight and generalizable approach to enabling language-based mobile interaction.
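The approach above can be illustrated with a minimal sketch of few-shot prompting for one UI task (screen summarization). The HTML-like screen encoding, element fields, and function names here are illustrative assumptions, not the paper's exact prompt format:

```python
# Hypothetical sketch: serialize a mobile screen as HTML-like text so a
# text-only LLM can read it, then assemble a few-shot prompt from
# exemplar (screen, summary) pairs. All names are illustrative.

def ui_to_html(elements):
    """Encode UI elements (class, id, text) as HTML-like tags."""
    tags = []
    for e in elements:
        tags.append('<{cls} id="{rid}">{text}</{cls}>'.format(
            cls=e["class"], rid=e["id"], text=e["text"]))
    return "<screen>" + "".join(tags) + "</screen>"

def build_prompt(exemplars, target_elements):
    """Pair each exemplar screen with its summary; leave the target
    screen's summary blank for the LLM to complete."""
    parts = []
    for elements, summary in exemplars:
        parts.append("Screen: " + ui_to_html(elements))
        parts.append("Summary: " + summary)
    parts.append("Screen: " + ui_to_html(target_elements))
    parts.append("Summary:")
    return "\n".join(parts)

exemplars = [
    ([{"class": "button", "id": "send", "text": "Send"}],
     "A messaging screen with a send button."),
]
target = [{"class": "input", "id": "query", "text": "Search"}]
prompt = build_prompt(exemplars, target)
```

The resulting `prompt` string would be sent to an LLM, whose completion after the final "Summary:" serves as the prediction; swapping the exemplars and the trailing instruction adapts the same single model to a different UI task without retraining.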