There is growing interest in incorporating eye-tracking data and other implicit measures of human language processing into natural language processing (NLP) pipelines. Data from human language processing contain unique insight into human linguistic understanding that language models could exploit. However, many questions remain unanswered about the nature of these data and how they can best be used in downstream NLP tasks. In this paper, we present eyeStyliency, an eye-tracking dataset for human processing of stylistic text (e.g., politeness). We develop a variety of methods to derive style saliency scores over text using the collected eye-tracking data. We further investigate how these saliency data compare to both human annotation methods and model-based interpretability metrics. We find that while eye-tracking data are unique, they also intersect with both human annotations and model-based importance scores, providing a possible bridge between human- and machine-based perspectives. In downstream few-shot learning tasks, adding salient words to prompts generally improved style classification, with eye-tracking-based and annotation-based salient words achieving the highest accuracy.