We present Spacerini, a modular framework for building and deploying interactive search applications, designed to facilitate the qualitative analysis of large-scale research datasets. Spacerini integrates features from both the Pyserini toolkit and the Hugging Face ecosystem to ease the indexing of text collections and their deployment as search engines for ad-hoc exploration, making the retrieval of relevant data points quick and efficient. The user-friendly interface enables searching through massive datasets in a no-code fashion, making Spacerini accessible to anyone looking to qualitatively audit their text collections. This is useful both to IR researchers aiming to demonstrate the capabilities of their indexes in a simple and interactive way, and to NLP researchers looking to better understand and audit the failure modes of large language models. The framework is open source and available on GitHub at https://github.com/castorini/hf-spacerini; it includes utilities to load, pre-process, index, and deploy local and web search applications. A portfolio of applications created with Spacerini for a variety of use cases is available at https://hf.co/spacerini.
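To make the load, pre-process, index, and search stages concrete, the following is a minimal, self-contained sketch in plain Python. It does not use the Spacerini or Pyserini APIs: `ToyBM25Index`, `tokenize`, the parameter values, and the sample documents are all hypothetical names and data introduced here for illustration, and the in-memory BM25 index only stands in for the kind of sparse index a toolkit such as Pyserini would build.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Pre-process: lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyBM25Index:
    """Hypothetical in-memory inverted index with BM25 scoring."""

    def __init__(self, k1=0.9, b=0.4):
        # k1 and b are illustrative BM25 parameters, not values from the source.
        self.k1 = k1
        self.b = b
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_len = {}                  # doc_id -> number of tokens
        self.docs = {}                     # doc_id -> raw text

    def add(self, doc_id, text):
        """Index one document."""
        tokens = tokenize(text)
        self.docs[doc_id] = text
        self.doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def search(self, query, k=10):
        """Return up to k (doc_id, score) pairs, best match first."""
        n_docs = len(self.docs)
        if n_docs == 0:
            return []
        avg_len = sum(self.doc_len.values()) / n_docs
        scores = defaultdict(float)
        for term in set(tokenize(query)):
            matches = self.postings.get(term, {})
            if not matches:
                continue
            df = len(matches)
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            for doc_id, tf in matches.items():
                norm = 1 - self.b + self.b * self.doc_len[doc_id] / avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    # Load: a tiny made-up collection standing in for a research dataset.
    collection = {
        "d1": "Pyserini builds sparse indexes for text collections.",
        "d2": "Hugging Face hosts datasets and interactive demos.",
        "d3": "Language models can memorize parts of their training data.",
    }
    index = ToyBM25Index()
    for doc_id, text in collection.items():
        index.add(doc_id, text)
    for doc_id, score in index.search("interactive demos"):
        print(doc_id, round(score, 3), collection[doc_id])
```

In an actual deployment the toy index would be replaced by a persistent index and the `search` call would sit behind a web interface; the sketch only shows how the stages fit together.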