Representing the semantics of GUI screens and components is crucial to data-driven computational methods for modeling user-GUI interactions and mining GUI designs. Existing GUI semantic representations are limited to encoding either the textual content, the visual design and layout patterns, or the app contexts. Many representation techniques also require significant manual data annotation effort. This paper presents Screen2Vec, a new self-supervised technique for generating embedding-vector representations of GUI screens and components that encode all of the above GUI features without requiring manual annotation, by using the context of user interaction traces. Screen2Vec is inspired by the word embedding method Word2Vec, but uses a new two-layer pipeline that is informed by the structure of GUIs and interaction traces and that incorporates screen- and app-specific metadata. Through several sample downstream tasks, we demonstrate Screen2Vec's key useful properties: representing between-screen similarity through nearest neighbors, composability, and the capability to represent user tasks.
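To make the two-layer idea concrete, the sketch below shows one plausible way such a pipeline could be organized: a first layer that embeds each GUI component from its text content and UI class, and a second layer that combines pooled component embeddings with layout and app-context features into a screen embedding, trained Word2Vec-style by predicting a screen from its neighbors in an interaction trace. All module names, dimensions, and pooling choices here are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: hypothetical names and dimensions (assumptions),
# mirroring the abstract's Word2Vec-inspired, self-supervised two-layer pipeline.
import torch
import torch.nn as nn


class ComponentEncoder(nn.Module):
    """Layer 1: embed a GUI component from its text vector and UI class type."""

    def __init__(self, text_dim=768, num_classes=26, out_dim=768):
        super().__init__()
        self.class_emb = nn.Embedding(num_classes, 64)
        self.proj = nn.Linear(text_dim + 64, out_dim)

    def forward(self, text_vec, class_id):
        # text_vec: (batch, text_dim) from a pretrained sentence encoder
        # class_id: (batch,) integer UI class (e.g., button, text field)
        return self.proj(torch.cat([text_vec, self.class_emb(class_id)], dim=-1))


class ScreenEncoder(nn.Module):
    """Layer 2: combine component embeddings with layout and app-context features."""

    def __init__(self, comp_dim=768, layout_dim=64, app_dim=768, out_dim=768):
        super().__init__()
        self.proj = nn.Linear(comp_dim + layout_dim + app_dim, out_dim)

    def forward(self, comp_embs, layout_vec, app_vec):
        # comp_embs: (batch, num_components, comp_dim); mean-pooled here for brevity
        pooled = comp_embs.mean(dim=1)
        return self.proj(torch.cat([pooled, layout_vec, app_vec], dim=-1))


# Self-supervised usage idea: predict a screen's embedding from the embeddings of
# its neighboring screens in an interaction trace (CBOW-style), so no manual
# labels are needed.
if __name__ == "__main__":
    comp_enc, screen_enc = ComponentEncoder(), ScreenEncoder()
    text = torch.randn(2, 5, 768)          # 2 screens, 5 components each
    cls = torch.randint(0, 26, (2, 5))     # component UI classes
    comps = comp_enc(text.view(-1, 768), cls.view(-1)).view(2, 5, -1)
    screens = screen_enc(comps, torch.randn(2, 64), torch.randn(2, 768))
    print(screens.shape)  # torch.Size([2, 768])
```

A nearest-neighbor query over such screen embeddings (e.g., cosine similarity) would then support the between-screen similarity task mentioned above.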