The rapid growth of scientific publications has made it increasingly difficult to keep literature reviews comprehensive and up-to-date. While prior work has focused on automating retrieval and screening, the writing phase of systematic reviews remains largely under-explored, especially with regard to readability and factual accuracy. To address this, we present LiRA (Literature Review Agents), a multi-agent collaborative workflow that emulates the human literature review process. LiRA uses specialized agents for content outlining, subsection writing, editing, and reviewing, producing cohesive and comprehensive review articles. Evaluated on SciReviewGen and a proprietary ScienceDirect dataset, LiRA outperforms current baselines such as AutoSurvey and MASS-Survey in writing and citation quality, while maintaining competitive similarity to human-written reviews. We further evaluate LiRA in real-world scenarios using document retrieval and assess its robustness to reviewer model variation. Our findings highlight the potential of agentic LLM workflows, even without domain-specific tuning, to improve the reliability and usability of automated scientific writing.