We create WebQAmGaze, a multilingual, low-cost eye-tracking-while-reading dataset designed to support the development of fair and transparent NLP models. WebQAmGaze includes webcam eye-tracking data from 332 participants naturally reading English, Spanish, and German texts. Each participant performs two reading tasks, a normal reading task and an information-seeking task, each composed of five texts. After preprocessing the data, we find that fixations on relevant spans seem to indicate correctness when answering the comprehension questions. Additionally, we compare the collected data with high-quality eye-tracking data. The results show a moderate correlation between the features obtained with webcam eye tracking (webcam-ET) and those obtained with a commercial eye-tracking (ET) device. We believe this data can advance webcam-based reading studies and open the way to cheaper and more accessible data collection. WebQAmGaze can be used to study the cognitive processes behind question answering (QA) and to apply these insights to computational models of language understanding.