ChatGPT is becoming a new reality. In this paper, we demonstrate a method for distinguishing ChatGPT-generated publications from those written by scientists. We introduce a newly designed, supervised, network-driven algorithm for predicting machine-generated content. The premise is that ChatGPT content exhibits distinctive behavior that sets it apart from scientific articles. The algorithm was trained and tested on publications covering three specific diseases, with each model constructed from 100 abstracts. It also underwent k-fold calibration (depending on the availability of data) to establish a lower-upper bound range of acceptance. The network model trained on ChatGPT content showed fewer nodes and more edges than the models trained on real article abstracts. In single mode, the algorithm predicted the class of one type of dataset at a time and achieved >94%. In multi-mode, it was run on mixed documents of ChatGPT and PubMed abstracts. The algorithm predicted real articles with a precision of 100% and, on rare occasions, 96%-98%. However, ChatGPT content was often misclassified as real publications, with up to 88% accuracy, in all datasets of the three diseases. Our results also showed that the publication year of the articles mixed with ChatGPT-generated content may be a factor in detecting the correct class: the older the publication, the better the prediction.
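To make the node/edge comparison and the lower-upper bound calibration concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the algorithm described in the paper: the abstract does not specify how the network is built, so this sketch assumes nodes are lowercase word tokens and edges link adjacent words, and it uses the edge-to-node ratio per fold as a stand-in feature. The function names `build_graph`, `edge_node_ratio`, and `kfold_bounds` are hypothetical.

```python
import re


def build_graph(texts):
    """Build a word network from a list of abstracts.

    Assumption (not from the paper): nodes are lowercase word tokens and
    an undirected edge links two different words that appear side by side.
    """
    nodes, edges = set(), set()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        nodes.update(words)
        for a, b in zip(words, words[1:]):
            if a != b:
                edges.add(frozenset((a, b)))
    return nodes, edges


def edge_node_ratio(texts):
    """Return the number of edges divided by the number of nodes."""
    nodes, edges = build_graph(texts)
    return len(edges) / len(nodes) if nodes else 0.0


def kfold_bounds(texts, k):
    """Split the abstracts into k folds and return the (lower, upper)
    range of the edge-to-node ratio observed across the folds.

    Assumption: this range plays the role of an acceptance interval.
    """
    folds = [texts[i::k] for i in range(k)]
    ratios = [edge_node_ratio(fold) for fold in folds if fold]
    return min(ratios), max(ratios)
```

Under these assumptions, a new document whose ratio falls inside the range calibrated on one class would be accepted as a member of that class, for example `lower <= edge_node_ratio([doc]) <= upper`.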