Large Language Models (LLMs) like ChatGPT are now widely used in writing and reviewing scientific papers. While this trend accelerates publication growth and reduces human workload, it also introduces serious risks. Papers written or reviewed by LLMs may lack real novelty, contain fabricated or biased results, or mislead downstream research that others depend on. Such issues can damage reputations, waste resources, and even endanger lives when flawed studies influence medical or safety-critical systems. This research explores both the offensive and defensive sides of this growing threat. On the attack side, we demonstrate how an author can inject hidden prompts inside a PDF that secretly guide or "jailbreak" LLM reviewers into giving overly positive feedback and biased acceptance. On the defense side, we propose an "inject-and-detect" strategy for editors, where invisible trigger prompts are embedded into papers; if a review repeats or reacts to these triggers, it reveals that the review was generated by an LLM, not a human. This method turns prompt injections from vulnerability into a verification tool. We outline our design, expected model behaviors, and ethical safeguards for deployment. The goal is to expose how fragile today's peer-review process becomes under LLM influence and how editorial awareness can help restore trust in scientific evaluation.
翻译:以ChatGPT为代表的大型语言模型(LLMs)当前已广泛应用于科学论文的撰写与评审。这一趋势虽加速了论文发表增长并减轻了人工负担,但也带来了严重风险。由LLMs撰写或评审的论文可能缺乏真正的新颖性、包含捏造或有偏见的结果,甚至误导后续依赖该成果的研究。此类问题可能损害学术声誉、浪费资源,当存在缺陷的研究影响医疗或安全关键系统时,甚至会危及生命。本研究探讨了这一日益增长的威胁的攻防两面。在攻击层面,我们展示了作者如何在PDF文件中嵌入隐藏提示,以暗中引导或“越狱”LLM评审者,使其给出过度积极的反馈和有偏向的接收意见。在防御层面,我们为编辑提出了一种“注入-检测”策略:将不可见的触发提示嵌入论文中;若评审报告重复或响应这些触发内容,则表明该评审由LLM生成而非人工完成。此方法将提示注入从系统漏洞转化为验证工具。我们阐述了方案设计、预期模型行为及部署所需的伦理保障措施。本研究旨在揭示当前同行评审流程在LLM影响下的脆弱性,并探讨编辑如何通过提升警觉性以重建科学评估体系的公信力。