Understanding the current capabilities and risks of AI Scientist systems is essential for ensuring trustworthy and sustainable AI-driven scientific progress while preserving the integrity of the academic ecosystem. To this end, we develop Jr. AI Scientist, a state-of-the-art autonomous AI scientist system that mimics the core research workflow of a novice student researcher: given a baseline paper from a human mentor, it analyzes the paper's limitations, formulates novel hypotheses for improvement, validates them through rigorous experimentation, and writes a paper reporting the results. Unlike previous approaches that assume full automation or operate on small-scale code, Jr. AI Scientist follows a well-defined research workflow and leverages modern coding agents to handle complex, multi-file implementations, leading to scientifically valuable contributions. For evaluation, we conducted automated assessments using AI Reviewers, author-led evaluations, and submissions to Agents4Science, a venue dedicated to AI-driven scientific contributions. The findings demonstrate that Jr. AI Scientist generates papers that receive higher review scores than those of existing fully automated systems. Nevertheless, we identify important limitations from both the author evaluation and the Agents4Science reviews, indicating the potential risks of directly applying current AI Scientist systems and key challenges for future research. Finally, we comprehensively report various risks identified during development. We hope these insights will deepen understanding of current progress and risks in AI Scientist development.