This article discusses, for the first time, the problems of estimating the similarity index of mathematical and other scientific publications that contain equations and formulas. The presence of equations and formulas (as well as figures, drawings, and tables) is shown to significantly complicate the analysis of such texts. A method that determines the similarity index of publications by taking into account individual mathematical symbols and parts of equations and formulas is shown to be ineffective and can lead to erroneous and even completely absurd conclusions. The capabilities of iThenticate, the most popular software system currently used by scientific journals to detect plagiarism and self-plagiarism, are investigated. Results are presented from iThenticate's processing of specific examples and special test problems containing equations (PDEs and ODEs), exact solutions, and some formulas. It is established that, when analyzing inhomogeneous texts, this software system is often unable to distinguish self-plagiarism from pseudo-self-plagiarism (false self-plagiarism). A complex model situation is considered in which identifying self-plagiarism requires highly qualified, narrowly specialized experts. Various ways to improve software systems for comparing inhomogeneous texts are proposed. This article will be useful to researchers and university teachers in mathematics, physics, and engineering sciences; to programmers working on image recognition and digital image processing; and to a wide range of readers interested in issues of plagiarism and self-plagiarism.