Recent techniques for short text clustering often rely on word embeddings as a transfer learning component. This paper shows that sentence vector representations from Transformers, combined with different clustering methods, can be successfully applied to the task. Furthermore, we demonstrate that the algorithm for enhancing clustering via iterative classification can further improve the initial clustering performance when used with different classifiers, including those based on pre-trained Transformer language models.
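As a rough illustration of the iterative-classification idea described in the abstract, the sketch below refines a noisy initial clustering by repeatedly training a classifier on the most central members of each cluster and relabeling the rest. It is a simplified sketch under stated assumptions, not the procedure evaluated in the paper: the toy 2-D vectors stand in for Transformer sentence embeddings, a nearest-centroid rule stands in for the classifier, distance to the cluster centroid stands in for outlier detection, and the function name and the `keep_fraction` and `max_iters` parameters are hypothetical.

```python
import math


def centroid(points):
    """Mean vector of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def enhance_by_iterative_classification(vectors, labels,
                                        keep_fraction=0.8, max_iters=20):
    """Refine an initial clustering by repeatedly retraining a classifier.

    vectors:       embedding vectors (assumed stand-ins for sentence embeddings)
    labels:        initial cluster id per vector, from any clustering method
    keep_fraction: hypothetical share of each cluster kept as training data
    """
    labels = list(labels)
    for _ in range(max_iters):
        # Group item indices by their current cluster label.
        clusters = {}
        for i, lab in enumerate(labels):
            clusters.setdefault(lab, []).append(i)

        train = set()
        class_centroids = {}
        for lab, members in clusters.items():
            c = centroid([vectors[i] for i in members])
            # Items far from the cluster centroid are treated as outliers
            # and held out; the closest ones form the training set.
            members.sort(key=lambda i: math.dist(vectors[i], c))
            n_keep = max(1, round(keep_fraction * len(members)))
            kept = members[:n_keep]
            train.update(kept)
            # "Training" the nearest-centroid classifier: one centroid per
            # class, computed from the kept items only.
            class_centroids[lab] = centroid([vectors[i] for i in kept])

        # Relabel the held-out items with the trained classifier.
        new_labels = list(labels)
        for i, vec in enumerate(vectors):
            if i not in train:
                new_labels[i] = min(
                    class_centroids,
                    key=lambda lab: math.dist(vec, class_centroids[lab]),
                )

        if new_labels == labels:  # stop once the labels are stable
            break
        labels = new_labels
    return labels


# Toy data: two well-separated groups with one mislabeled item in each.
vectors = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0], [10.5, 10.5],
]
initial = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
refined = enhance_by_iterative_classification(vectors, initial)
print(refined)
```

On this toy input the two mislabeled items sit far from their assigned cluster centroids, so they are held out of training and reassigned to the nearer class. In practice the vectors would come from a sentence encoder and the nearest-centroid rule could be swapped for any classifier, including a fine-tuned Transformer.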