In some machine learning applications, labeled instances for supervised classification are scarce while unlabeled instances are abundant. Semi-supervised learning algorithms address these scenarios by attempting to exploit the information contained in the unlabeled examples. In this paper, we address the question of how to evolve neural networks for semi-supervised problems. We introduce neuroevolutionary approaches that exploit unlabeled instances by using neuron coverage metrics computed on the neural network architecture encoded by each candidate solution. Neuron coverage metrics resemble the code coverage metrics used in software testing, but they quantify how well the different components of a neural network are covered by test instances. In our neuroevolutionary approach, we define fitness functions that combine classification accuracy computed on labeled examples with neuron coverage metrics evaluated on unlabeled examples. We assess the impact of these functions on semi-supervised problems with varying amounts of labeled instances. Our results show that using neuron coverage metrics makes neuroevolution less sensitive to the scarcity of labeled data and can, in some cases, lead to more robust generalization of the learned classifiers.
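To make the idea of a combined fitness function concrete, the following is a minimal sketch, not the authors' implementation. It assumes a small ReLU multilayer perceptron, a threshold-based neuron coverage (the fraction of hidden neurons activated above a threshold by at least one unlabeled input, one common definition among several), and a weighted sum controlled by a hypothetical parameter `alpha`. The abstract does not specify which coverage metrics or which combination rule are used, so the names `forward`, `neuron_coverage`, and `fitness`, as well as `alpha` and `threshold`, are illustrative assumptions.

```python
import numpy as np


def forward(weights, biases, x):
    """Run a ReLU MLP; return output logits and the list of hidden activations."""
    activations = []
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ w + b)  # hidden layer with ReLU
        activations.append(h)
    logits = h @ weights[-1] + biases[-1]  # linear output layer
    return logits, activations


def neuron_coverage(activations, threshold=0.0):
    """Fraction of hidden neurons exceeding `threshold` on at least one input.

    Assumption: this threshold-based definition stands in for whichever
    coverage metrics the paper actually evaluates.
    """
    covered = sum(int(np.any(a > threshold, axis=0).sum()) for a in activations)
    total = sum(a.shape[1] for a in activations)
    return covered / total


def fitness(weights, biases, x_lab, y_lab, x_unlab, alpha=0.5, threshold=0.0):
    """Weighted sum of labeled accuracy and unlabeled neuron coverage.

    Assumption: the linear weighting by `alpha` is illustrative only.
    """
    logits, _ = forward(weights, biases, x_lab)
    accuracy = float(np.mean(np.argmax(logits, axis=1) == y_lab))
    _, acts = forward(weights, biases, x_unlab)
    coverage = neuron_coverage(acts, threshold)
    return alpha * accuracy + (1.0 - alpha) * coverage


if __name__ == "__main__":
    # Tiny hand-built candidate: 2 inputs, 2 hidden neurons, 2 classes.
    weights = [np.array([[1.0, -1.0], [0.0, 0.0]]), np.eye(2)]
    biases = [np.zeros(2), np.zeros(2)]
    x_lab = np.array([[1.0, 0.0], [-1.0, 0.0]])
    y_lab = np.array([0, 1])
    x_unlab = np.array([[1.0, 0.0]])
    print(fitness(weights, biases, x_lab, y_lab, x_unlab))
```

In a neuroevolutionary loop, a function of this kind would be evaluated once per candidate network, with the labeled set supplying the accuracy term and the larger unlabeled set supplying the coverage term. Setting `alpha=1.0` in this sketch recovers a purely supervised fitness.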