Decisions in agriculture are increasingly data-driven; however, valuable agricultural knowledge is often locked away in free-text reports, manuals and journal articles. Specialised search systems are needed that can mine agricultural information to provide relevant answers to users' questions. This paper presents AgAsk -- an agent able to answer natural language agriculture questions by mining scientific documents. We survey and analyse farmers' information needs. On the basis of these needs we release an information retrieval test collection comprising real questions, a large collection of scientific documents split into passages, and ground truth relevance assessments indicating which passages are relevant to each question. We implement and evaluate a number of information retrieval models to answer farmers' questions, including two state-of-the-art neural ranking models. We show that neural rankers are highly effective at matching passages to questions in this context. Finally, we propose a deployment architecture for AgAsk that includes a client based on the Telegram messaging platform and a retrieval model deployed on commodity hardware. The test collection we provide is intended to stimulate more research into methods for matching natural language questions to answers in scientific documents. While the retrieval models were evaluated in the agriculture domain, they are generalisable and of interest to others working on similar problems. The test collection is available at: \url{https://github.com/ielab/agvaluate}.
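To make the passage-retrieval setting concrete, the following is a minimal sketch of splitting documents into passages and ranking those passages against a natural language question. It makes several assumptions that are not taken from the paper: the fixed-sentence splitting scheme, the function names `split_into_passages` and `bm25_rank`, the toy documents, and the use of a plain BM25 lexical scorer as a stand-in ranker. It is not one of the neural ranking models the paper evaluates, and it does not use the released test collection.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_into_passages(doc_id, text, sentences_per_passage=1):
    """Split a document into passages of a fixed number of sentences.

    This splitting scheme is a hypothetical choice for illustration only.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    passages = []
    for i in range(0, len(sentences), sentences_per_passage):
        passage_id = f"{doc_id}-{i // sentences_per_passage}"
        passages.append((passage_id, " ".join(sentences[i:i + sentences_per_passage])))
    return passages


def bm25_rank(question, passages, k1=1.2, b=0.75):
    """Return (passage_id, score) pairs sorted by descending BM25 score."""
    tokenized = [(pid, tokenize(text)) for pid, text in passages]
    n = len(tokenized)
    if n == 0:
        return []
    # Guard against a zero average length when every passage is empty.
    avg_len = (sum(len(toks) for _, toks in tokenized) / n) or 1.0

    # Document frequency: number of passages containing each term.
    df = Counter()
    for _, toks in tokenized:
        df.update(set(toks))

    query_terms = set(tokenize(question))
    scored = []
    for pid, toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scored.append((pid, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy documents invented for this example (not from the test collection).
documents = {
    "d1": "Stripe rust is a fungal disease of wheat. "
          "Fungicide applied early can reduce yield loss.",
    "d2": "Canola is usually sown in autumn. Sowing depth affects emergence.",
}

passages = []
for doc_id, text in documents.items():
    passages.extend(split_into_passages(doc_id, text))

ranking = bm25_rank("How do I control stripe rust in wheat?", passages)
for passage_id, score in ranking:
    print(passage_id, round(score, 3))
```

In a setting like the one described, a neural ranker would typically replace or re-rank the output of `bm25_rank`, while the passage splitting and the question-to-passage interface stay the same.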