Grounding natural language instructions on the web to perform previously unseen tasks enables accessibility and automation. We introduce a task and dataset to train AI agents from open-domain, step-by-step instructions originally written for people. We build RUSS (Rapid Universal Support Service) to tackle this problem. RUSS consists of two models: First, a BERT-LSTM with pointers parses instructions into ThingTalk, a domain-specific language we design for grounding natural language on the web. Then, a grounding model retrieves the unique IDs of any webpage elements requested in ThingTalk. RUSS may interact with the user through a dialogue (e.g., asking for an address) or execute a web operation (e.g., clicking a button) inside the web runtime. To augment training, we synthesize natural language instructions mapped to ThingTalk. Our dataset consists of 80 different customer service problems from help websites, with a total of 741 step-by-step instructions and their corresponding actions. RUSS achieves 76.7% end-to-end accuracy in predicting agent actions from single instructions. It outperforms state-of-the-art models that directly map instructions to actions without ThingTalk. Our user study shows that actual users prefer RUSS over navigating the web themselves.
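The two-stage pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the toy parsing rule, and the dictionary-based page representation are all invented here, standing in for RUSS's BERT-LSTM semantic parser and learned grounding model.

```python
# Hypothetical sketch of the two-stage RUSS pipeline.
# All names and rules below are invented for illustration; the real
# system uses a BERT-LSTM parser and a learned grounding model.

def parse_to_thingtalk(instruction):
    """Stand-in for the semantic parser: map an instruction to a
    ThingTalk-like program (represented as a dict for simplicity)."""
    low = instruction.lower()
    if low.startswith("click"):
        # Web operation: keep the element description for grounding.
        return {"op": "click",
                "element_desc": instruction[len("click"):].strip()}
    # Otherwise treat it as a dialogue act addressed to the user.
    return {"op": "ask_user", "text": instruction}

def ground_element(desc, page_elements):
    """Stand-in for the grounding model: retrieve the unique ID of the
    page element whose label matches the description."""
    for elem_id, label in page_elements.items():
        if desc.lower() in label.lower():
            return elem_id
    return None

# Toy webpage: element IDs mapped to human-readable labels.
page = {"btn-42": "the Submit button", "inp-7": "the Address field"}

program = parse_to_thingtalk("Click the Submit button")
element_id = ground_element(program["element_desc"], page)
```

Separating parsing from grounding in this way lets the parser stay page-independent: the same ThingTalk program can be grounded against whichever page the runtime is currently on.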