Eye movement recordings from reading are one of the richest signals of human language processing. Corpora of eye movements recorded during the reading of contextualized running text are a way of making such records available for natural language processing purposes. Such corpora already exist for some languages. We present CopCo, the Copenhagen Corpus of eye tracking recordings from natural reading of Danish texts. It is the first eye tracking corpus of its kind for the Danish language. CopCo includes 1,832 sentences with 34,897 tokens of Danish text extracted from a collection of speech manuscripts. This first release of the corpus contains eye tracking data from 22 participants. It will be extended continuously with more participants and texts from other genres. We assess the data quality of the recorded eye movements and find that the extracted features are in line with related research. The dataset is available here: https://osf.io/ud8s5/.