In recent years, the rapid growth of academic publications across many fields has made paper analysis increasingly difficult: scientists struggle to track the latest research findings and methodologies in a timely and comprehensive way. Key concept extraction has proven to be an effective analytical paradigm, and the widespread adoption of language models in industry and science has made it possible to automate. However, existing paper databases are mostly limited to similarity matching and basic classification of key concepts, and do not explore the relational networks between concepts in depth. This paper builds on the open-source OpenAlex knowledge graph. By analyzing data on nearly 8,000 openly available papers from Novosibirsk State University, we found that the distribution patterns of papers' key concept paths correlate strongly with both innovation points and rare paths. We propose a key concept path analysis method based on prompt engineering. The method uses small language models to extract key concepts and identify innovation points precisely, and it builds an agent constrained by the knowledge graph to improve analysis accuracy. By fine-tuning the Qwen and DeepSeek models, we achieved significant improvements in accuracy; the fine-tuned models are publicly available on the Hugging Face platform.
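To make the notion of a rare key concept path concrete, the following is a minimal sketch rather than the method used in this paper. It assumes that a key concept path is an ordered tuple of concept names running from a broad concept to a narrow one, and that a path is "rare" when it occurs in at most a given number of papers. The function names `path_frequencies` and `rare_paths`, the threshold `max_papers`, and the toy data are all hypothetical and introduced only for illustration.

```python
from collections import Counter


def path_frequencies(papers):
    """Count in how many papers each concept path occurs.

    `papers` maps a paper id to a list of concept paths, where each path
    is a tuple of concept names (an assumed representation).
    """
    counts = Counter()
    for paths in papers.values():
        counts.update(set(paths))  # count each path at most once per paper
    return counts


def rare_paths(papers, max_papers=1):
    """Return the paths that occur in at most `max_papers` papers."""
    counts = path_frequencies(papers)
    return {path for path, n in counts.items() if n <= max_papers}


# Hypothetical toy data, not taken from the analyzed corpus.
common = ("Computer science", "Machine learning", "Language model")
papers = {
    "paper_a": [common],
    "paper_b": [common],
    "paper_c": [common, ("Physics", "Optics", "Laser")],
}

print(rare_paths(papers))
```

In this toy example, only the path that appears in a single paper is flagged. A real analysis would need a threshold chosen relative to the size of the corpus.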