Large language models (LLMs) have gained popularity across various fields for their exceptional ability to generate human-like text. Their potential misuse has raised social concerns about plagiarism in academic contexts. However, effective detection of artificial scientific text is a non-trivial task due to several challenges, including 1) the lack of a clear understanding of the differences between machine-generated and human-written scientific text, 2) the poor generalization performance of existing methods caused by out-of-distribution issues, and 3) the limited support for human-machine collaboration with sufficient interpretability during the detection process. In this paper, we first identify the critical distinctions between machine-generated and human-written scientific text through a quantitative experiment. We then propose a mixed-initiative workflow that combines human experts' prior knowledge with machine intelligence, along with a visual analytics prototype, to facilitate efficient and trustworthy scientific text detection. Finally, we demonstrate the effectiveness of our approach through two case studies and a controlled user study with proficient researchers. We also provide design implications for interactive artificial text detection tools in high-stakes decision-making scenarios.