Screen recordings of mobile apps are a popular and readily available way for users to share how they interact with apps, such as in online tutorial videos, user reviews, or as attachments in bug reports. Unfortunately, both people and systems can find it difficult to reproduce touch-driven interactions from video pixel data alone. In this paper, we introduce an approach to extract and replay user interactions in videos of mobile apps, using only pixel information in video frames. To identify interactions, we apply heuristic-based image processing and convolutional deep learning to segment screen recordings, classify the interaction in each segment, and locate the interaction point. To replay interactions on another device, we match elements on app screens using UI element detection. We evaluate the feasibility of our pixel-based approach using two datasets: the Rico mobile app dataset and a new dataset of 64 apps with both iOS and Android versions. We find that our end-to-end approach can successfully replay a majority of interactions (iOS--84.1%, Android--78.4%) on different devices, which is a step towards supporting a variety of scenarios, including automatically annotating interactions in existing videos, automated UI testing, and creating interactive app tutorials.
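The abstract describes a two-stage pipeline: extract interactions from pixels (segment, classify, localize), then replay them on another device by matching UI elements. The sketch below is purely illustrative of that structure; every function and type name here is hypothetical, and the real detection, classification, and matching components are CNN-based models and heuristics not shown.

```python
# Illustrative sketch of the two-stage pipeline outlined in the abstract.
# All names are hypothetical; the actual models (heuristic image processing,
# CNN classifier/localizer, UI element detector) are injected as callables.
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frame = bytes  # stand-in for a decoded video frame

@dataclass
class Interaction:
    kind: str                # e.g. "tap", "long_press", "swipe"
    point: Tuple[int, int]   # touch location in source-screen coordinates

def extract_interactions(
    frames: Sequence[Frame],
    segment: Callable[[Sequence[Frame]], List[Sequence[Frame]]],
    classify: Callable[[Sequence[Frame]], str],
    locate: Callable[[Sequence[Frame]], Tuple[int, int]],
) -> List[Interaction]:
    """Stage 1: segment the recording, then classify each segment's
    interaction type and locate its touch point."""
    return [
        Interaction(kind=classify(seg), point=locate(seg))
        for seg in segment(frames)
    ]

def replay_on_device(
    interactions: List[Interaction],
    match_point: Callable[[Tuple[int, int]], Tuple[int, int]],
    dispatch: Callable[[str, Tuple[int, int]], None],
) -> None:
    """Stage 2: map each source touch point to the matching UI element
    on the target device (via element detection) and inject the event."""
    for it in interactions:
        dispatch(it.kind, match_point(it.point))
```

In this framing, cross-device replay (e.g., recording on iOS and replaying on Android) reduces to the quality of `match_point`, which stands in for the paper's UI-element-matching step.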