With the help of online tools, unscrupulous authors can now generate a pseudo-scientific article and attempt to publish it. Some of these tools work by replacing or paraphrasing existing text to produce new content, but they tend to generate nonsensical expressions. A recent study introduced the concept of the 'tortured phrase': an unexpected, odd phrase that appears in place of an established expression, e.g. 'counterfeit consciousness' instead of 'artificial intelligence'. The present study investigates how tortured phrases that are not yet listed can be detected automatically. We conducted several experiments, including non-neural binary classification, neural binary classification, and cosine similarity comparison of the phrase tokens, yielding notable results.
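To make the cosine similarity comparison of phrase tokens concrete, the following is a minimal sketch and not the method used in the study. It assumes that each token has a vector representation and that a candidate phrase is compared token by token with the established expression. The names `cosine_similarity`, `tokenwise_similarity` and `TOY_EMBEDDINGS` are hypothetical, and the vectors are invented for illustration only; in practice they would come from a trained embedding model.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Cosine is undefined for a zero vector; 0.0 is a convention chosen here.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, invented for this example (not real embeddings).
TOY_EMBEDDINGS = {
    "artificial": [0.9, 0.1, 0.3],
    "counterfeit": [0.8, 0.3, 0.1],
    "intelligence": [0.2, 0.9, 0.4],
    "consciousness": [0.3, 0.8, 0.5],
}


def tokenwise_similarity(candidate, reference, embeddings):
    """Compare two phrases token by token, position by position.

    Assumes both phrases have the same number of whitespace-separated
    tokens and that every token is present in `embeddings`.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if len(cand_tokens) != len(ref_tokens):
        raise ValueError("phrases must have the same number of tokens")
    return [
        cosine_similarity(embeddings[c], embeddings[r])
        for c, r in zip(cand_tokens, ref_tokens)
    ]


scores = tokenwise_similarity(
    "counterfeit consciousness", "artificial intelligence", TOY_EMBEDDINGS
)
for pair, score in zip(["token 1", "token 2"], scores):
    print(f"{pair}: {score:.3f}")
```

Under these assumptions, a candidate phrase whose tokens are each close in vector space to the tokens of a known expression, while differing on the surface, could be flagged for review. How such scores would be thresholded or combined is left open here.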