We create WebQAmGaze, a multilingual, low-cost eye-tracking-while-reading dataset designed to support the development of fair and transparent NLP models. WebQAmGaze includes webcam eye-tracking (ET) data from 332 participants naturally reading English, Spanish, and German texts. Each participant performs two reading tasks composed of five texts: a normal reading task and an information-seeking task. After preprocessing the data, we find that fixations on relevant spans seem to indicate whether participants answer the comprehension questions correctly. Additionally, we compare the collected data with high-quality eye-tracking data. The results show a moderate correlation between the features obtained with webcam ET and those obtained with a commercial ET device. We believe this data can advance webcam-based reading studies and open the way to cheaper and more accessible data collection. WebQAmGaze is useful for learning about the cognitive processes behind question answering (QA) and for applying these insights to computational models of language understanding.
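The abstract does not state which correlation measure or which reading features were used to compare webcam ET with the commercial device. The following is a minimal sketch that assumes per-word features (for example, total fixation duration in milliseconds) were extracted from both devices for the same words, and that a rank correlation (Spearman's ρ) is used. The function names and the feature values are hypothetical and for illustration only; they are not WebQAmGaze data or the authors' pipeline.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-word total fixation durations (ms) for the same five words.
webcam = [210, 0, 450, 180, 320]
commercial = [230, 90, 380, 150, 400]

print(f"{spearman(webcam, commercial):.2f}")  # prints 0.90
```

A rank correlation is used here because webcam gaze estimates are noisier and differently scaled than those from a dedicated tracker, so agreement in the ordering of words by reading time may be more informative than agreement in raw milliseconds. With real data, `scipy.stats.spearmanr` computes the same statistic.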