In this article, we propose combining deep learning and transformer architectures with classical machine learning algorithms to detect and identify anomalies in text. Deep learning models capture crucial contextual information by converting the text into numerical representations. To predict anomalies, we used several methods, including Sentence Transformers, autoencoders, logistic regression, and distance-based methods. We tested these methods on text data, using synthetic data from different sources either injected into the original text as anomalies or used as targets. We explain the different outlier detection methods and algorithms and present the results of the best-performing technique. These results suggest that our algorithm could reduce false positive rates compared with the other anomaly detection methods we tested.
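The abstract does not say which distance measure or decision threshold the distance-based methods use. The following is a minimal sketch that assumes cosine distance from each text embedding to the centroid of presumed-normal embeddings, with a percentile-based threshold. The function names (`fit_detector`, `predict_anomalies`) and the 95th-percentile cutoff are hypothetical. Random vectors stand in for real sentence embeddings, which in practice might come from a Sentence Transformers model's `encode` method. The "injected anomalies" are simply vectors pointing in a different direction.

```python
import numpy as np


def cosine_distance_to_centroid(embeddings: np.ndarray, centroid: np.ndarray) -> np.ndarray:
    """Cosine distance (1 - cosine similarity) of each embedding row to the centroid."""
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(centroid)
    return 1.0 - (embeddings @ centroid) / norms


def fit_detector(train_embeddings: np.ndarray, percentile: float = 95.0):
    """Hypothetical detector: fit a centroid on presumed-normal embeddings.

    The threshold is an assumed percentile of the training distances.
    """
    centroid = train_embeddings.mean(axis=0)
    train_dist = cosine_distance_to_centroid(train_embeddings, centroid)
    threshold = float(np.percentile(train_dist, percentile))
    return centroid, threshold


def predict_anomalies(embeddings: np.ndarray, centroid: np.ndarray, threshold: float) -> np.ndarray:
    """Return a boolean mask that is True where an embedding lies farther than the threshold."""
    return cosine_distance_to_centroid(embeddings, centroid) > threshold


# Toy stand-in for sentence embeddings: normal texts cluster around one direction,
# and synthetic "injected" anomalies cluster around a different direction.
rng = np.random.default_rng(0)
dim = 16
normal_direction = np.zeros(dim)
normal_direction[0] = 1.0
anomaly_direction = np.zeros(dim)
anomaly_direction[1] = 1.0

train = normal_direction + 0.1 * rng.standard_normal((200, dim))
test_normal = normal_direction + 0.1 * rng.standard_normal((5, dim))
test_anomalous = anomaly_direction + 0.1 * rng.standard_normal((5, dim))
test = np.vstack([test_normal, test_anomalous])

centroid, threshold = fit_detector(train)
flags = predict_anomalies(test, centroid, threshold)
print(flags)
```

Because the threshold is taken from the distribution of normal training distances, raising the percentile trades missed anomalies for fewer false positives. False positives are the quantity the abstract reports improving on.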