Semi-supervised learning is a model training method that uses both labeled and unlabeled data. This paper proposes a fully Bayes semi-supervised learning algorithm that can be applied to any multi-category classification problem. We assume the labels are missing at random when using unlabeled data in a semi-supervised setting. Suppose we have $K$ classes in the data. We assume that the observations follow $K$ multivariate normal distributions depending on their true class labels after some common unknown transformation is applied to each component of the observation vector. The function is expanded in a B-splines series, and a prior is added to the coefficients. We consider a normal prior on the coefficients and constrain the values to meet the normality and identifiability constraints requirement. The precision matrices of the Gaussian distributions are given a conjugate Wishart prior, while the means are given the improper uniform prior. The resulting posterior is still conditionally conjugate, and the Gibbs sampler aided by a data-augmentation technique can thus be adopted. An extensive simulation study compares the proposed method with several other available methods. The proposed method is also applied to real datasets on diagnosing breast cancer and classification of signals. We conclude that the proposed method has a better prediction accuracy in various cases.
翻译:暂无翻译