The testing accuracy of trained neural network (NN) models has improved remarkably over the past several years, especially with the advent of deep learning. However, even the most accurate NNs can be biased toward a specific output class because of bias inherent in the available training datasets, and this bias may propagate to real-world implementations. This paper deals with robustness bias, i.e., the bias a trained NN exhibits when its robustness to noise is significantly larger for one output class than for the remaining output classes. The bias is shown to result from imbalanced datasets, i.e., datasets in which not all output classes are equally represented. To address this, we propose the UnbiasedNets framework, which leverages K-means clustering and the NN's noise tolerance to diversify the given training dataset, even when that dataset is relatively small. This generates balanced datasets and reduces the bias within the datasets themselves. To the best of our knowledge, this is the first framework catering to the robustness bias problem in NNs. We use real-world datasets to demonstrate the efficacy of UnbiasedNets for data diversification, for both binary and multi-label classifiers. The results are compared to well-known tools aimed at generating balanced datasets, and illustrate that existing works have limited success in addressing robustness bias. In contrast, UnbiasedNets provides a notable improvement over existing works, and in some cases significantly reduces the robustness bias, as observed by comparing NNs trained on the diversified and original datasets.
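To make the notion of robustness bias concrete, the following is a minimal sketch of one way it could be estimated empirically. It is not the UnbiasedNets procedure: the function `per_class_robustness`, the toy classifier `toy_predict`, the uniform noise model, and the `noise_bound` parameter are all assumptions introduced here for illustration. For each class, the sketch reports the fraction of noisy copies of correctly classified inputs whose predicted label stays unchanged; a large gap between classes would suggest a bias of the kind described above.

```python
import numpy as np


def per_class_robustness(predict, X, y, noise_bound, n_trials=100, seed=0):
    """Estimate, per true class, how often bounded noise leaves the label unchanged.

    `predict` maps an (n, d) array to an (n,) array of integer labels.
    Noise is drawn uniformly from [-noise_bound, noise_bound) per feature
    (an assumed noise model, chosen only for illustration).
    """
    rng = np.random.default_rng(seed)
    base = predict(X)
    scores = {}
    for c in np.unique(y):
        # Only inputs the model already classifies correctly are perturbed.
        Xc = X[(y == c) & (base == c)]
        if len(Xc) == 0:
            scores[int(c)] = float("nan")
            continue
        kept = 0
        for _ in range(n_trials):
            noise = rng.uniform(-noise_bound, noise_bound, size=Xc.shape)
            kept += int(np.sum(predict(Xc + noise) == c))
        scores[int(c)] = kept / (n_trials * len(Xc))
    return scores


def toy_predict(X):
    """Hypothetical 1-D classifier whose boundary sits close to class 1."""
    return (X[:, 0] > 0.8).astype(int)


# Imbalanced toy data: three samples of class 0, two of class 1.
X = np.array([[0.0], [0.1], [0.2], [0.9], [1.0]])
y = np.array([0, 0, 0, 1, 1])

scores = per_class_robustness(toy_predict, X, y, noise_bound=0.3)
print(scores)
```

In this toy setting, no class 0 input can cross the decision boundary under the chosen noise bound, while class 1 inputs sometimes do, so the class 0 score is higher. Under these assumptions, that gap is what a robustness bias toward class 0 would look like.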