Unsupervised speaker clustering is becoming increasingly important because of its potential uses in semi-supervised learning. In practice, we are often presented with enormous amounts of unlabeled data from multi-party meetings and discussions. An effective unsupervised clustering approach would allow us to significantly increase the amount of training data at no additional annotation cost. Recently, methods based on graph convolutional networks (GCNs) have received growing attention for unsupervised clustering, as they exploit the connectivity patterns between nodes to improve learning performance. In this work, we present a GCN-based approach for semi-supervised learning. Given a pre-trained embedding extractor, a graph convolutional network is trained on the labeled data and then clusters the unlabeled data to produce "pseudo-labels". We present a self-correcting training mechanism that iteratively runs the cluster-train-correct process on the pseudo-labels. We show that the proposed approach makes effective use of unlabeled data and improves speaker recognition accuracy.
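To illustrate how a GCN can exploit connectivity between nodes, the following is a minimal sketch of a single graph-convolution step over speaker embeddings. It is not the method of this work: the abstract does not specify the graph construction or layer form, so the cosine k-nearest-neighbour graph, the symmetrically normalized propagation rule ReLU(D^{-1/2}(A+I)D^{-1/2} H W), and the function names `knn_adjacency`, `normalize_adjacency`, and `gcn_layer` are all assumptions chosen for illustration.

```python
import numpy as np


def knn_adjacency(emb, k):
    """Symmetric k-nearest-neighbour adjacency from cosine similarity (assumed graph construction)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a node is never its own neighbour
    nbrs = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar nodes per row
    adj = np.zeros(sim.shape)
    rows = np.repeat(np.arange(len(emb)), k)
    adj[rows, nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # keep an edge if either endpoint selected it


def normalize_adjacency(adj):
    """Add self-loops, then scale symmetrically by inverse square-root degree."""
    a_hat = adj + np.eye(len(adj))
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w):
    """One propagation step: average over neighbours, project, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)


# Toy usage: random vectors stand in for the output of a pre-trained extractor.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 4))   # 6 utterances, 4-dimensional embeddings
weights = rng.normal(size=(4, 3))      # untrained, illustrative layer weights
a_norm = normalize_adjacency(knn_adjacency(embeddings, k=2))
hidden = gcn_layer(a_norm, embeddings, weights)
```

In this sketch each utterance's new representation mixes in those of its nearest neighbours, which is the sense in which connectivity informs the features later used for clustering; in an actual system the weights would be learned on the labeled data.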