Understanding user interface (UI) functionality is a useful yet challenging task for both machines and people. In this paper, we investigate a machine learning approach for screen correspondence, which allows reasoning about UIs by mapping their elements onto previously encountered examples with known functionality and properties. We describe and implement a model that incorporates element semantics, appearance, and text to support correspondence computation without requiring any labeled examples. Through a comprehensive performance evaluation, we show that our approach improves upon baselines by incorporating multi-modal properties of UIs. Finally, we show three example applications where screen correspondence facilitates better UI understanding for humans and machines: (i) instructional overlay generation, (ii) semantic UI element search, and (iii) automated interface testing.