Distinguishing ChatGPT(-3.5, -4)-generated and human-written papers through Japanese stylometric analysis

Text-generating artificial intelligence (AI), including OpenAI's ChatGPT, which is built on GPT-3.5 and GPT-4, has attracted considerable attention worldwide. In this study, we first compared the Japanese stylometric features of texts generated by GPT (-3.5 and -4) with those of texts written by humans. We performed multidimensional scaling (MDS) to confirm the classification of 216 texts into three classes (72 academic papers written by 36 single authors, 72 texts generated by GPT-3.5, and 72 texts generated by GPT-4 on the basis of the titles of the aforementioned papers), focusing on the following stylometric features: (1) bigrams of parts of speech, (2) bigrams of postpositional particle words, (3) positioning of commas, and (4) rate of function words. MDS revealed distinct distributions for GPT (-3.5 and -4) texts and human texts on each stylometric feature. Although GPT-4 is more powerful than GPT-3.5 because it has more parameters, the GPT-3.5 and GPT-4 distributions are likely to overlap. These results indicate that even if the number of parameters increases in the future, AI-generated texts may not come close to human-written texts in terms of stylometric features. Second, we verified the classification performance of a random forest (RF) classifier for two classes (GPT and human), focusing on the same Japanese stylometric features. RF performed well on each stylometric feature individually: the classifier using the rate of function words achieved 98.1% accuracy, and the classifier using all stylometric features reached 100% on all performance indexes (accuracy, recall, precision, and F1 score). We conclude that, at this stage, ChatGPT-generated text can be distinguished from human-written text by its stylometric features, at least for Japanese.
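To make the feature types above concrete, the following is a minimal sketch of how bigram rates, comma positioning, and a function-word rate could be computed from text that has already been tokenized and tagged. It is not the authors' pipeline. The function names, the tag labels (`"particle"`, `"auxiliary"`), the hand-tagged sample sentence, and the reading of "positioning of commas" as "which character precedes each comma" are all assumptions made for illustration; a real analysis would take tokens and tags from a Japanese morphological analyzer.

```python
from collections import Counter


def bigram_rates(tokens):
    """Relative frequency of each adjacent pair in a token (or tag) sequence."""
    pairs = Counter(zip(tokens, tokens[1:]))
    total = sum(pairs.values())
    return {pair: n / total for pair, n in pairs.items()} if total else {}


def comma_position_rates(text, comma="、"):
    """Relative frequency of the character immediately before each comma.

    Assumption: comma positioning is summarized by the preceding character.
    """
    before = Counter(
        text[i - 1] for i, ch in enumerate(text) if ch == comma and i > 0
    )
    total = sum(before.values())
    return {ch: n / total for ch, n in before.items()} if total else {}


def function_word_rate(tagged, function_tags=frozenset({"particle", "auxiliary"})):
    """Share of tokens whose tag is in function_tags (hypothetical tag set)."""
    if not tagged:
        return 0.0
    return sum(tag in function_tags for _, tag in tagged) / len(tagged)


# Hypothetical hand-tagged sentence standing in for analyzer output.
tagged = [
    ("私", "pronoun"),
    ("は", "particle"),
    ("本", "noun"),
    ("を", "particle"),
    ("読む", "verb"),
]

pos_bigrams = bigram_rates([tag for _, tag in tagged])
comma_rates = comma_position_rates("私は、本を、読む")
fw_rate = function_word_rate(tagged)
```

Each function returns a per-text feature value or dictionary. Stacking these over all texts into a matrix would give the kind of input that MDS or a random forest classifier could consume.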
