Existing tools for Question Answering (QA) face challenges that limit their use in practice. They can be complex to set up or to integrate with existing infrastructure, do not offer configurable interactive interfaces, and do not cover the full set of subtasks that frequently comprise the QA pipeline (query expansion, retrieval, reading, and explanation/sensemaking). To help address these issues, we introduce NeuralQA - a usable library for QA on large datasets. NeuralQA integrates well with existing infrastructure (e.g., ElasticSearch instances and reader models trained with the HuggingFace Transformers API) and offers helpful defaults for QA subtasks. It introduces and implements contextual query expansion (CQE) using a masked language model (MLM), as well as relevant snippets (RelSnip) - a method for condensing large documents into smaller passages that can be speedily processed by a document reader model. Finally, it offers a flexible user interface to support workflows for research exploration (e.g., visualization of gradient-based explanations to support qualitative inspection of model behaviour) and large-scale search deployment. Code and documentation for NeuralQA are available as open source on GitHub (https://github.com/victordibia/neuralqa).
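The following is a minimal sketch of the contextual query expansion (CQE) idea described above, using the HuggingFace fill-mask pipeline: each query term is masked in turn and the MLM's top contextual predictions are kept as expansion terms. It is an illustration of the approach under stated assumptions, not NeuralQA's implementation; the model choice and the expand_query() helper are hypothetical.

```python
# Sketch of MLM-based contextual query expansion (assumptions noted above).
from transformers import pipeline

# Any masked language model supported by the fill-mask pipeline works here.
unmasker = pipeline("fill-mask", model="distilroberta-base")
MASK = unmasker.tokenizer.mask_token


def expand_query(query: str, top_k: int = 3) -> str:
    """Mask each query term in turn and keep the MLM's top predictions
    as additional, contextually plausible expansion terms."""
    terms = query.split()
    expansions = []
    for i in range(len(terms)):
        # Replace the i-th term with the mask token and ask the MLM
        # for substitutes that fit the surrounding query context.
        masked = " ".join(terms[:i] + [MASK] + terms[i + 1:])
        for pred in unmasker(masked, top_k=top_k):
            candidate = pred["token_str"].strip()
            if candidate.lower() not in (t.lower() for t in terms):
                expansions.append(candidate)
    # The expanded query would then be sent to the retriever
    # (e.g., an ElasticSearch index) in place of the original query.
    return query + " " + " ".join(dict.fromkeys(expansions))


print(expand_query("what causes influenza outbreaks"))
```

In practice, expansion terms can be filtered by prediction score or restricted to selected query terms to limit noise before retrieval.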