Topic modeling is a dominant method for exploring document collections on the web and in digital libraries. Recent approaches to topic modeling use pretrained contextualized language models and variational autoencoders. However, large neural topic models have a considerable memory footprint. In this paper, we propose a knowledge distillation framework to compress a contextualized topic model without loss in topic quality. In particular, the proposed distillation objective is to minimize the cross-entropy of the soft labels produced by the teacher and the student models, as well as to minimize the squared 2-Wasserstein distance between the latent distributions learned by the two models. Experiments on two publicly available datasets show that the student trained with knowledge distillation achieves topic coherence much higher than that of the original student model, and even surpasses the teacher while containing far fewer parameters. The distilled model also outperforms several other competitive topic models in topic coherence.
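To make the two-term objective concrete, the sketch below shows one plausible way to compute it in PyTorch, assuming the latent distributions are diagonal Gaussians (as is typical for VAE-based topic models), in which case the squared 2-Wasserstein distance reduces to ||mu_s - mu_t||^2 + ||sigma_s - sigma_t||^2. The function name, the weights `alpha` and `beta`, and the `temperature` parameter are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      mu_s, logvar_s, mu_t, logvar_t,
                      temperature=1.0, alpha=1.0, beta=1.0):
    """Illustrative sketch of a two-term distillation objective:
    (1) cross-entropy between teacher and student soft labels, and
    (2) squared 2-Wasserstein distance between the two latent
        diagonal Gaussians (closed form: squared difference of means
        plus squared difference of standard deviations).
    """
    # Soft-label cross-entropy: -sum_k p_teacher(k) * log p_student(k)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    ce = -(p_teacher * log_p_student).sum(dim=-1).mean()

    # Squared 2-Wasserstein distance between diagonal Gaussians
    sigma_s = torch.exp(0.5 * logvar_s)
    sigma_t = torch.exp(0.5 * logvar_t)
    w2_sq = ((mu_s - mu_t) ** 2).sum(dim=-1) \
            + ((sigma_s - sigma_t) ** 2).sum(dim=-1)

    return alpha * ce + beta * w2_sq.mean()
```

In practice, such a loss would be added to the student's usual variational training objective, with the teacher's parameters held fixed during distillation.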