The widespread availability of search APIs (both free and commercial) brings the promise of increased coverage and quality of search results for metasearch engines, while decreasing the maintenance costs of the crawling and indexing infrastructures. However, merging strategies frequently comprise complex pipelines that require careful tuning, which is often overlooked in the literature. In this work, we describe NeuralSearchX, a metasearch engine based on a multi-purpose large reranking model to merge results and highlight sentences. Due to the homogeneity of our architecture, we could focus our optimization efforts on a single component. We compare our system with Microsoft's Biomedical Search and show that our design choices led to a much more cost-effective system with competitive queries per second (QPS) while achieving close to state-of-the-art results on a wide range of public benchmarks. Human evaluation on two domain-specific tasks shows that our retrieval system outperformed the Google API by a large margin in terms of nDCG@10 scores. By describing our architecture and implementation in detail, we hope that the community will build on our design choices. The system is available at https://neuralsearchx.nsx.ai.
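For readers unfamiliar with the nDCG@10 metric used in the human evaluation, the following is a minimal sketch of how it is commonly computed. It assumes the linear-gain form of DCG and takes the ideal ranking from the same list of judged results; the function names `dcg_at_k` and `ndcg_at_k` are illustrative and are not taken from the NeuralSearchX implementation, whose exact evaluation script is not described here.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k results (linear gain).

    relevances[i] is the graded relevance of the result at rank i + 1.
    """
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so the discount is log2(position + 1)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG@k normalized by the DCG@k of the ideal (descending) ordering.

    Assumption: the ideal ordering is built from the same judged list,
    i.e. every judged document appears in `relevances`.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A ranking that already lists results in descending order of relevance scores 1.0, and a ranking with no relevant results scores 0.0 under this convention.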