Retrieval-Augmented Generation (RAG) systems combine Large Language Models (LLMs) with external knowledge, and their performance depends heavily on how that knowledge is represented. This study investigates how different Knowledge Graph (KG) construction strategies influence RAG performance. We compare several approaches: standard vector-based RAG, GraphRAG, and retrieval over KGs built from ontologies derived either from relational databases or from textual corpora. Results show that ontology-guided KGs incorporating chunk information achieve performance competitive with state-of-the-art frameworks, substantially outperforming vector retrieval baselines. Moreover, the findings reveal that ontology-guided KGs built from relational databases perform competitively with those built from ontologies extracted from text, while offering a dual advantage: they require only a one-time ontology learning process, substantially reducing LLM usage costs, and they avoid the complexity of ontology merging inherent to text-based approaches.