Despite the growing adoption of large language models (LLMs) in academic workflows, their ability to support high-quality scientific writing remains limited. Most existing systems are designed for general-purpose scientific text generation and fail to meet the sophisticated demands of research communication beyond surface-level polishing, such as maintaining conceptual coherence across sections. Furthermore, academic writing is inherently iterative and revision-driven, a process that direct prompting-based paradigms do not support well. To address these challenges, we propose a human-AI collaboration framework for academic paper revision centered on criteria-guided intent alignment and context-aware modeling. To validate the framework, we curate a dataset of 7,000 research papers from top-tier venues, annotated with 140,000 instruction-response pairs that reflect realistic, section-level scientific revisions. We instantiate the framework in XtraGPT, the first suite of open-source LLMs (1.5B to 14B parameters) for context-aware, instruction-guided writing assistance. Extensive experiments show that XtraGPT significantly outperforms same-scale baselines and approaches the quality of proprietary systems. Both automated preference assessments and human evaluations confirm the effectiveness of XtraGPT in improving scientific drafts.