High-dimensional, low sample-size (HDLSS) data problems have been a topic of immense importance for the last couple of decades. A vast literature proposes a wide variety of approaches to this setting, among which variable selection is a compelling idea. On the other hand, deep neural networks can model complicated relationships and interactions between the response and the features that are hard to capture with a linear or an additive model. In this paper, we discuss the current status of variable selection techniques for neural network models. We show that stage-wise algorithms with neural networks suffer from drawbacks; for example, variables entering the model at later stages may not be selected consistently. We then propose an ensemble method that achieves better variable selection, and we prove that the probability of selecting a false variable tends to zero. We further discuss additional regularization to control over-fitting and improve regression and classification performance. We study various statistical properties of the proposed method. Extensive simulations and real data examples are provided to support the theory and methodology.
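As a rough illustration of the general idea of ensemble-based variable selection with neural networks (not the procedure proposed in the paper), the following minimal Python sketch fits small networks on bootstrap resamples, lets each member vote for its most important features via permutation importance, and keeps features selected by a majority of members. All function names, thresholds, and network sizes here are hypothetical choices for the example only.

```python
# Illustrative sketch only: ensemble variable selection by aggregating
# feature-importance votes across neural networks fit on bootstrap samples.
# This is NOT the paper's algorithm; names and thresholds are hypothetical.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.inspection import permutation_importance

def ensemble_select(X, y, n_members=20, top_k=5, vote_threshold=0.5, seed=0):
    """Return indices of features selected by a majority of ensemble members."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    votes = np.zeros(p)
    for m in range(n_members):
        idx = rng.integers(0, n, size=n)              # bootstrap resample
        net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                           random_state=m).fit(X[idx], y[idx])
        imp = permutation_importance(net, X[idx], y[idx],
                                     n_repeats=5, random_state=m)
        top = np.argsort(imp.importances_mean)[-top_k:]  # member's top-k vote
        votes[top] += 1
    return np.where(votes / n_members >= vote_threshold)[0]

# Toy HDLSS example: 50 samples, 200 features, only the first 3 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 200))
y = X[:, 0] + 2 * X[:, 1] - X[:, 2] + 0.1 * rng.normal(size=50)
print(ensemble_select(X, y))
```

Averaging selection decisions over resampled fits is what makes the false-selection rate controllable in such schemes: a noise feature must appear important in many independent fits before it crosses the voting threshold.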