Unsupervised learning is widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem for GMMs, which aims to leverage potentially similar GMM parameter structures among tasks to obtain improved learning performance over single-task learning. We propose a multi-task GMM learning procedure based on the EM algorithm that not only effectively utilizes the unknown similarity between related tasks but is also robust against a fraction of outlier tasks from arbitrary sources. The proposed procedure is shown to achieve the minimax optimal rate of convergence for both the parameter estimation error and the excess mis-clustering error over a wide range of regimes. Moreover, we generalize our approach to the transfer learning setting for GMMs, where similar theoretical results are derived. Finally, we demonstrate the effectiveness of our methods through simulations and a real data analysis. To the best of our knowledge, this is the first work studying multi-task and transfer learning on GMMs with theoretical guarantees.