In this work, we introduce Y-Drop, a regularization method that biases the dropout algorithm towards dropping more important neurons with higher probability. The backbone of our approach is neuron conductance, an interpretable measure of neuron importance that calculates the contribution of each neuron towards the end-to-end mapping of the network. We investigate the impact of the uniform dropout selection criterion on performance by assigning higher dropout probability to the more important units. We show that forcing the network to solve the task at hand in the absence of its important units yields a strong regularization effect. Further analysis indicates that Y-Drop yields solutions where more neurons are important, i.e have high conductance, and yields robust networks. In our experiments we show that the regularization effect of Y-Drop scales better than vanilla dropout w.r.t. the architecture size and consistently yields superior performance over multiple datasets and architecture combinations, with little tuning.
翻译:暂无翻译