GCWSNet: Generalized Consistent Weighted Sampling for Scalable and Accurate Training of Neural Networks

We develop "generalized consistent weighted sampling" (GCWS) for hashing the "powered-GMM" (pGMM) kernel, which has a tuning parameter $p$. GCWS provides a numerically stable scheme for applying a power transformation to the original data, regardless of the magnitude of $p$ and of the data. The power transformation is often effective for boosting performance, in many cases considerably so. We feed the hashed data to neural networks on a variety of public classification datasets and name the method "GCWSNet". Our extensive experiments show that GCWSNet often improves classification accuracy. The experiments also show that GCWSNet converges substantially faster: it often reaches a reasonable accuracy within (less than) one epoch of training. This property is highly desirable because many applications, such as advertisement click-through rate (CTR) prediction models or data streams (i.e., data seen only once), often train for just one epoch. A further benefit is that the computations in the first layer of the neural network become additions instead of multiplications, because the input data become binary (and highly sparse). Empirical comparisons with (normalized) random Fourier features (NRFF) are provided. We also propose reducing the model size of GCWSNet with count-sketch, and we develop the theory for analyzing the impact of count-sketch on the accuracy of GCWS. Our analysis shows that an "8-bit" strategy should work well, in that we can always apply 8-bit count-sketch hashing to the output of GCWS hashing without hurting the accuracy much. There are many other ways to take advantage of GCWS when training deep neural networks. For example, one can apply GCWS to the outputs of the last layer to boost the accuracy of trained deep neural networks.
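
The abstract does not spell out the hashing step, so the following is only a minimal sketch under stated assumptions. It assumes that the pGMM kernel is $\sum_i \min(\tilde u_i, \tilde v_i)^p / \sum_i \max(\tilde u_i, \tilde v_i)^p$ on data whose coordinates have been split into positive and negative parts, and that GCWS follows the usual consistent-weighted-sampling recipe with $\log \tilde u_i$ replaced by $p \log \tilde u_i$. All function names (`to_nonnegative`, `pgmm`, `make_params`, `gcws_hash`, `active_indices`) and the choice to keep only the low `b` bits of the sampled index are illustrative, not taken from the paper.

```python
import math
import random


def to_nonnegative(u):
    """Split every coordinate into (positive part, negative part), both >= 0."""
    out = []
    for x in u:
        out.extend((x, 0.0) if x > 0 else (0.0, -x))
    return out


def pgmm(u, v, p=1.0):
    """Assumed pGMM kernel: sum min^p / sum max^p on the transformed vectors.

    Computed directly, so x ** p can overflow for large p; for reference only.
    Both vectors must not be all-zero.
    """
    a, b = to_nonnegative(u), to_nonnegative(v)
    num = sum(min(x, y) ** p for x, y in zip(a, b))
    den = sum(max(x, y) ** p for x, y in zip(a, b))
    return num / den


def make_params(dim, k, seed=0):
    """Draw (r, c, beta) per hash and per transformed coordinate (2 * dim of them)."""
    rng = random.Random(seed)
    return [
        [(rng.gammavariate(2.0, 1.0), rng.gammavariate(2.0, 1.0), rng.random())
         for _ in range(2 * dim)]
        for _ in range(k)
    ]


def gcws_hash(u, params, p=1.0):
    """Return k samples (i_star, t_star); u must have at least one nonzero entry.

    The power p only multiplies log(x), so x ** p is never formed explicitly.
    """
    a = to_nonnegative(u)
    hashes = []
    for row in params:
        best = None
        for i, (x, (r, c, beta)) in enumerate(zip(a, row)):
            if x <= 0:
                continue  # zero coordinates are never sampled
            t = math.floor(p * math.log(x) / r + beta)
            score = math.log(c) - r * (t + 1 - beta)
            if best is None or score < best[0]:
                best = (score, i, t)
        hashes.append((best[1], best[2]))
    return hashes


def active_indices(hashes, b=8):
    """Positions of the ones in a k * 2**b binary vector (one per hash).

    Illustrative encoding: keep the low b bits of i_star and discard t_star.
    """
    width = 1 << b
    return [j * width + (i_star % width) for j, (i_star, _) in enumerate(hashes)]
```

Under these assumptions, the fraction of positions at which `gcws_hash(u, params, p)` and `gcws_hash(v, params, p)` agree estimates `pgmm(u, v, p)`, provided both calls share the same `params`. Because `active_indices` yields exactly one active position per hash, a first dense layer applied to this encoding reduces to summing the selected weight rows, which is the "additions instead of multiplications" effect the abstract describes.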
