This paper investigates whether human gaze signals can be leveraged to improve state-of-the-art search engine performance, and how to incorporate this new input signal, marked by human attention, into existing neural retrieval models. We propose GazBy ({\bf Gaz}e-based {\bf B}ert model for document relevanc{\bf y}), a lightweight joint model that integrates human gaze fixation estimation into transformer models to predict document relevance, incorporating more nuanced information about cognitive processing into information retrieval (IR). We evaluate our model on the Text Retrieval Conference (TREC) Deep Learning (DL) 2019 and 2020 Tracks. Our experiments show encouraging results and illustrate both effective and ineffective entry points for using human gaze to aid transformer-based neural retrievers. With the rise of virtual reality (VR) and augmented reality (AR), human gaze data will become increasingly available. We hope this work serves as a first step toward exploring the use of gaze signals in modern neural search engines.