We present WebQAmGaze, a multilingual low-cost eye-tracking-while-reading dataset, designed as the first webcam-based eye-tracking corpus of reading to support the development of explainable computational language processing models. WebQAmGaze includes webcam eye-tracking data from 600 participants of a wide age range naturally reading English, German, Spanish, and Turkish texts. Each participant performs two reading tasks composed of five texts each, a normal reading and an information-seeking task, followed by a comprehension question. We compare the collected webcam data to high-quality eye-tracking recordings. The results show a moderate to strong correlation between the eye movement measures obtained with the webcam compared to those obtained with a commercial eye-tracking device. When validating the data, we find that higher fixation duration on relevant text spans accurately indicates correctness when answering the corresponding questions. This dataset advances webcam-based reading studies and opens avenues to low-cost and diverse data collection. WebQAmGaze is beneficial to learn about the cognitive processes behind question-answering and to apply these insights to computational models of language understanding.
翻译:暂无翻译