Neural networks have been used successfully as classification models, yielding state-of-the-art results when trained on a large number of labeled samples. These models, however, are more difficult to train for semi-supervised problems, where only a small number of labeled instances is available along with a large number of unlabeled instances. This work explores a new training method for semi-supervised learning that is based on similarity function learning, using a Siamese network to obtain a suitable embedding. The learned representations are discriminative in Euclidean space, and hence can be used to label unlabeled instances with a nearest-neighbor classifier. Confident predictions on unlabeled instances are treated as true labels for retraining the Siamese network on the expanded training set. This process is applied iteratively. We perform an empirical study of this iterative self-training algorithm. For improving the predictions on unlabeled instances, local learning with global consistency [22] is also evaluated.
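The iterative procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `fit_embedding` is a hypothetical stand-in that returns the identity map where a Siamese network would be trained, and the names `knn_predict` and `self_train`, the neighbor count `k`, the confidence measure (fraction of agreeing neighbors), and the `threshold` value are all illustrative choices not taken from the abstract.

```python
import numpy as np


def knn_predict(emb_train, y_train, emb_query, k=3):
    """k-NN vote in Euclidean space; returns (labels, confidences)."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((emb_query[:, None, :] - emb_train[None, :, :]) ** 2).sum(axis=-1)
    k = min(k, len(y_train))
    nn_idx = np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest
    nn_labels = y_train[nn_idx]              # shape (n_query, k)
    classes = np.unique(y_train)
    # Count votes per class, shape (n_query, n_classes).
    votes = (nn_labels[:, :, None] == classes[None, None, :]).sum(axis=1)
    pred = classes[votes.argmax(axis=1)]
    conf = votes.max(axis=1) / k             # fraction of agreeing neighbors
    return pred, conf


def fit_embedding(X, y):
    """Hypothetical placeholder for Siamese training (assumption).

    A real version would learn an embedding from similar/dissimilar
    pairs; here the identity map is returned so the loop is runnable.
    """
    return lambda Z: Z


def self_train(X_lab, y_lab, X_unl, k=3, threshold=1.0, max_rounds=10):
    """Iteratively move confidently predicted instances into the labeled set."""
    X_lab, y_lab, X_unl = X_lab.copy(), y_lab.copy(), X_unl.copy()
    for _ in range(max_rounds):
        if len(X_unl) == 0:
            break
        embed = fit_embedding(X_lab, y_lab)  # retrain on the expanded set
        pred, conf = knn_predict(embed(X_lab), y_lab, embed(X_unl), k)
        confident = conf >= threshold
        if not confident.any():
            break                            # nothing new to add; stop
        X_lab = np.vstack([X_lab, X_unl[confident]])
        y_lab = np.concatenate([y_lab, pred[confident]])
        X_unl = X_unl[~confident]
    return X_lab, y_lab, X_unl
```

Calling `self_train(X_lab, y_lab, X_unl)` returns the expanded labeled set together with any instances that never reached the confidence threshold. Swapping `fit_embedding` for an actual Siamese training routine would leave the rest of the loop unchanged.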