The rapid growth of scientific publications has made it increasingly difficult to keep literature reviews comprehensive and up-to-date. While prior work has focused on automating retrieval and screening, the writing phase of systematic reviews remains largely under-explored, especially with regard to readability and factual accuracy. To address this gap, we present LiRA (Literature Review Agents), a collaborative multi-agent workflow that emulates the human literature review process. LiRA uses specialized agents for content outlining, subsection writing, editing, and reviewing to produce cohesive and comprehensive review articles. Evaluated on SciReviewGen and a proprietary ScienceDirect dataset, LiRA outperforms current baselines such as AutoSurvey and MASS-Survey in writing and citation quality, while maintaining competitive similarity to human-written reviews. We further evaluate LiRA in real-world scenarios with document retrieval and assess its robustness to variation in the reviewer model. Our findings highlight the potential of agentic LLM workflows, even without domain-specific tuning, to improve the reliability and usability of automated scientific writing.
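To make the four agent roles named above concrete, the following is a minimal, purely illustrative Python sketch of an outline-write-edit-review pipeline. Every name here (call_llm, outline_agent, the prompts, the fixed headings, the revision loop) is an assumption for exposition, not the authors' actual implementation or API.

```python
# Hypothetical sketch of a LiRA-style multi-agent pipeline.
# All names, prompts, and control flow are illustrative assumptions,
# not the paper's actual system.
from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call; returns a canned string here."""
    return f"[model output for: {prompt[:50]}...]"


@dataclass
class Section:
    heading: str
    draft: str = ""


def outline_agent(topic: str, abstracts: list[str]) -> list[Section]:
    """Content-outlining agent: proposes section headings for the review."""
    call_llm(f"Outline a literature review on '{topic}' given "
             f"{len(abstracts)} paper abstracts.")
    # A real system would parse headings from the model output;
    # fixed placeholders are used here to keep the sketch runnable.
    return [Section(h) for h in ("Background", "Methods", "Open Problems")]


def writer_agent(section: Section, abstracts: list[str]) -> None:
    """Subsection-writing agent: drafts one section, citing source papers."""
    section.draft = call_llm(f"Write the '{section.heading}' section, "
                             f"citing these abstracts: {abstracts}")


def editor_agent(sections: list[Section]) -> str:
    """Editing agent: merges section drafts into a cohesive article."""
    return call_llm("Merge and smooth these drafts:\n"
                    + "\n\n".join(s.draft for s in sections))


def reviewer_agent(article: str) -> str:
    """Reviewing agent: critiques the draft; feedback drives revision."""
    return call_llm("Review this article for factual accuracy and "
                    f"readability:\n{article}")


def run_pipeline(topic: str, abstracts: list[str], rounds: int = 2) -> str:
    """Outline, write, edit, then iterate review-and-revise cycles."""
    sections = outline_agent(topic, abstracts)
    for section in sections:
        writer_agent(section, abstracts)
    article = editor_agent(sections)
    for _ in range(rounds):
        feedback = reviewer_agent(article)
        article = call_llm(f"Revise per feedback:\n{feedback}\n---\n{article}")
    return article


if __name__ == "__main__":
    print(run_pipeline("automated literature review generation",
                       ["abstract 1 ...", "abstract 2 ..."]))
```

The key design point the sketch captures is the separation of concerns: drafting and quality control are handled by distinct agents, with the reviewer's feedback looped back into revision rather than a single model writing the review end-to-end.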