Overfitting occurs when a model fits a specific data set so closely that its generalization weakens, which may ultimately reduce its accuracy in predicting future data. In this research we used an EHR dataset concerning breast cancer metastasis to study overfitting in deep feedforward neural network (FNN) prediction models. We included 11 hyperparameters of the deep FNN models and took an empirical approach to study how each of these hyperparameters affected both prediction performance and overfitting across a large range of values. We also studied how some interesting pairs of hyperparameters interacted to influence model performance and overfitting. The 11 hyperparameters we studied are activation function, weight initializer, number of hidden layers, learning rate, momentum, decay, dropout rate, batch size, epochs, L1, and L2. Our results show that most of the individual hyperparameters are either negatively or positively correlated with model prediction performance and overfitting. In particular, we found that overfitting overall tends to correlate negatively with learning rate, decay, batch size, and L2, but positively with momentum, epochs, and L1. According to our results, learning rate, decay, and batch size may have a more significant impact on both overfitting and prediction performance than most of the other hyperparameters, including L1, L2, and dropout rate, which were designed to minimize overfitting. We also found some interesting interacting pairs of hyperparameters, such as learning rate and momentum, learning rate and decay, and batch size and epochs.

Keywords: Deep learning, overfitting, prediction, grid search, feedforward neural networks, breast cancer metastasis.
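The empirical approach summarized in the abstract, sweeping hyperparameter values and relating each one to overfitting, can be illustrated with a minimal Python sketch. It is not the study's actual code. Everything named here is an assumption made for illustration: the `Scorer` callable (a hypothetical function that trains a model for one hyperparameter setting and returns a training score and a test score), the use of the train-minus-test score gap as the overfitting measure, and the use of the Pearson coefficient as the correlation measure.

```python
from itertools import product
from math import sqrt
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: given one hyperparameter setting, train a model and
# return (training_score, test_score), for example an AUC on each split.
Scorer = Callable[[Dict[str, float]], Tuple[float, float]]


def sweep(scorer: Scorer, grid: Dict[str, Sequence[float]]) -> List[dict]:
    """Evaluate every combination in `grid` and record the train/test gap."""
    names = list(grid)
    rows = []
    for values in product(*(grid[name] for name in names)):
        setting = dict(zip(names, values))
        train_score, test_score = scorer(setting)
        rows.append({
            **setting,
            "test_score": test_score,
            # Assumed overfitting measure: training score minus test score.
            "gap": train_score - test_score,
        })
    return rows


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlation_with_gap(rows: List[dict], name: str) -> float:
    """Correlate one numeric hyperparameter with the overfitting gap."""
    return pearson([row[name] for row in rows], [row["gap"] for row in rows])
```

Under these assumptions, a call such as `sweep(scorer, {"learning_rate": [0.001, 0.01, 0.1], "momentum": [0.0, 0.5, 0.9]})` covers a pair of hyperparameters jointly, and `correlation_with_gap(rows, "learning_rate")` gives the sign of the relationship for one of them. Categorical hyperparameters such as the activation function would need a different comparison, since a Pearson coefficient only applies to numeric values.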