Searching for research datasets is as important as it is laborious. Because the choice of research data is important for all further research, this decision must be made carefully. Moreover, with data volumes growing in almost all areas, research data has become a central artifact in the empirical sciences. Consequently, research dataset recommendations can usefully supplement scientific publication search. We formulated the recommendation task as a retrieval problem by focusing on broad similarities between research datasets and scientific publications. In a multistage approach, initial recommendations were retrieved with the BM25 ranking function and dynamic queries. Subsequently, the initial ranking was re-ranked using click feedback and document embeddings. The proposed system was evaluated live on real user interaction data using the STELLA infrastructure in the LiLAS Lab at CLEF 2021. Before the live evaluation, the experimental system could be fine-tuned efficiently by pre-testing it with a pseudo test collection built from prior user interaction data from the live system. The results indicate that the experimental system outperforms the other participating systems.
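The two-stage idea described above (BM25 retrieval followed by re-ranking with click feedback and document embeddings) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and not the system's actual implementation: the toy dataset descriptions, the query derived from a publication title, the hand-written embedding vectors, the click counts, the Lucene-style IDF variant, and the linear combination with equal weights.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a proper analyzer.
    return text.lower().split()


class BM25:
    """Small BM25 scorer over an in-memory list of dataset descriptions."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        tokens = [tokenize(d) for d in docs]
        self.tf = [Counter(t) for t in tokens]
        self.doc_len = [len(t) for t in tokens]
        self.avgdl = sum(self.doc_len) / len(docs)
        n = len(docs)
        df = Counter(term for t in tokens for term in set(t))
        # Non-negative IDF variant (assumed here; other variants exist).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * self.doc_len[i] / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def rank(self, query, top_k=10):
        # Keep only documents that match at least one query term.
        scored = [(i, self.score(query, i)) for i in range(len(self.tf))]
        scored = [(i, s) for i, s in scored if s > 0]
        return sorted(scored, key=lambda pair: -pair[1])[:top_k]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rerank(candidates, query_vec, doc_vecs, clicks, w_bm25=1.0, w_emb=1.0, w_click=1.0):
    """Combine normalized BM25, embedding similarity, and damped click counts.

    The weights and the linear combination are hypothetical choices.
    """
    if not candidates:
        return []
    max_bm25 = max(s for _, s in candidates)
    combined = [
        (i, w_bm25 * s / max_bm25
            + w_emb * cosine(query_vec, doc_vecs[i])
            + w_click * math.log1p(clicks.get(i, 0)))
        for i, s in candidates
    ]
    return sorted(combined, key=lambda pair: -pair[1])


# Toy data (all hypothetical).
datasets = [
    "survey data on voting behaviour in europe",
    "panel study of household income",
    "election survey voting turnout germany",
]
query = "voting survey"           # e.g. terms taken from a publication title
query_vec = [1.0, 0.0]            # stand-in for a publication embedding
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
clicks = {0: 5}                   # dataset 0 was clicked five times before

initial = BM25(datasets).rank(query)
final = rerank(initial, query_vec, doc_vecs, clicks)
print([i for i, _ in initial], [i for i, _ in final])
```

In this toy setup the first stage prefers the shorter matching description, while the second stage promotes the dataset with prior clicks and the closer embedding. How the signals should actually be weighted is a tuning question, which is where pre-testing on a pseudo test collection built from logged interactions would come in.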