Tabular biomedical data is often high-dimensional but has very few samples. Although recent work has shown that well-regularised simple neural networks can outperform more sophisticated architectures on tabular data, they remain prone to overfitting on tiny datasets with many potentially irrelevant features. To combat these issues, we propose the Weight Predictor Network with Feature Selection (WPFS) for learning neural networks from high-dimensional, small-sample data by reducing the number of learnable parameters while simultaneously performing feature selection. In addition to the classification network, WPFS uses two small auxiliary networks that together output the weights of the first layer of the classification model. We evaluate WPFS on nine real-world biomedical datasets and demonstrate that it outperforms both standard and more recent methods typically applied to tabular data. Furthermore, we investigate the proposed feature selection mechanism and show that it improves performance while providing useful insights into the learning task.
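To make the described architecture concrete, the following is a minimal PyTorch sketch of the idea, under stated assumptions: the names `feature_embeddings`, `weight_predictor`, and `sparsity_net`, as well as the layer sizes, are illustrative choices rather than the authors' exact design, and the per-feature embeddings are shown as learnable parameters although they could instead be fixed summaries computed from the data.

```python
import torch
import torch.nn as nn


class WPFSSketch(nn.Module):
    """Illustrative sketch: the first-layer weight matrix of the classifier is
    not learned directly but emitted by two small auxiliary networks, one
    predicting the weights per feature and one predicting a feature gate."""

    def __init__(self, num_features, embed_dim, hidden_dim, num_classes):
        super().__init__()
        # One small embedding per input feature (assumption: a compact
        # per-feature representation fed to the auxiliary networks).
        self.feature_embeddings = nn.Parameter(torch.randn(num_features, embed_dim))
        # Auxiliary net 1: predicts the weights connecting each feature
        # to the first hidden layer of the classifier.
        self.weight_predictor = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.Tanh(), nn.Linear(64, hidden_dim)
        )
        # Auxiliary net 2: predicts a gate in [0, 1] per feature,
        # acting as a soft feature-selection mask.
        self.sparsity_net = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.Tanh(), nn.Linear(64, 1), nn.Sigmoid()
        )
        # The remaining classifier layers are ordinary learnable parameters.
        self.classifier_tail = nn.Sequential(nn.ReLU(), nn.Linear(hidden_dim, num_classes))

    def forward(self, x):
        # Predicted first-layer weights, shape (num_features, hidden_dim).
        w1 = self.weight_predictor(self.feature_embeddings)
        # Per-feature gates, shape (num_features, 1); small gates suppress features.
        gates = self.sparsity_net(self.feature_embeddings)
        w1 = w1 * gates
        # First layer uses the predicted weights instead of a learnable matrix.
        hidden = x @ w1
        return self.classifier_tail(hidden)
```

In this sketch the number of learnable parameters tied to the input dimension scales with the small embedding size rather than with the full `num_features × hidden_dim` weight matrix, which is the parameter-reduction effect the abstract refers to; the gate output doubles as a feature-importance score.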