Sentence encoders have been shown to achieve superior performance on many downstream text-mining tasks and are therefore claimed to be fairly general. Inspired by this, we performed a detailed study of how to leverage these sentence encoders for the "zero-shot topic inference" task, where the topics are defined or provided by users in real time. Extensive experiments on seven different datasets show that Sentence-BERT is more general than the other encoders, while Universal Sentence Encoder can be preferred when efficiency is a top priority.
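To make the task concrete, here is a minimal sketch of one common way to frame zero-shot topic inference: embed the document and each user-provided topic description, then keep the topics whose cosine similarity to the document meets a threshold. The abstract does not describe the study's actual pipeline, so this is an illustration under stated assumptions, not the authors' method. The `toy_encode` function is a hypothetical bag-of-words stand-in for a real sentence encoder such as Sentence-BERT or Universal Sentence Encoder, and the names `infer_topics` and `threshold`, the value 0.2, and the example topics are also assumptions.

```python
import math
import re
from collections import Counter


def toy_encode(text):
    """Stand-in encoder (assumption): a bag-of-words count vector.

    A real system would call a sentence encoder here instead.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def infer_topics(document, topics, encode=toy_encode, threshold=0.2):
    """Return (topic, score) pairs at or above the threshold, best first.

    `topics` maps a topic name to a short description supplied at query
    time; no topic-specific training data is used.
    """
    doc_vec = encode(document)
    scored = [(name, cosine(doc_vec, encode(desc))) for name, desc in topics.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical user-defined topics and a hypothetical review sentence.
topics = {
    "battery": "battery life charge power",
    "screen": "screen display resolution brightness",
}
review = "The battery drains fast and the charge takes hours."
result = infer_topics(review, topics)
```

Because the encoder is passed in as a parameter, swapping the toy function for a real sentence encoder would only require a function that returns vectors compatible with the similarity function; the thresholding logic stays the same.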