We present a new task and dataset, ScreenQA, for screen content understanding via question answering. Existing screen datasets focus either on structure- and component-level understanding, or on much higher-level composite tasks such as navigation and task completion. We attempt to bridge the gap between the two by annotating 80,000+ question-answer pairs over the RICO dataset, in the hope of benchmarking screen reading comprehension capability.