Probabilistic linear discriminant analysis (PLDA) is widely used in open-set verification tasks such as speaker verification. A key concern with PLDA is that the model is too simple (linear Gaussian) to deal with complicated data; however, this simplicity is itself a major advantage of PLDA, as it leads to desirable generalization. An interesting research question is therefore how to improve the modeling capacity of PLDA while retaining its simplicity. This paper presents a decoupling approach, which involves a global model that is simple and generalizable, and a local model that is complex and expressive. While the global model holds a bird's-eye view of the entire data, the local model represents the details of individual classes. We conduct a preliminary study in this direction and investigate a simple decoupling model that includes both the global and local models. The new model, which we call decoupled PLDA, is tested on a speaker verification task. Experimental results show that it consistently outperforms vanilla PLDA when the model is based on raw speaker vectors. However, when the speaker vectors are processed by length normalization, the advantage of decoupled PLDA is largely lost, suggesting future research on non-linear local models.