This work introduces the Directed-Evolution (DE) method for sparsification of neural networks, in which the relevance of parameters to network accuracy is assessed directly and the parameters whose tentative zeroing least affects accuracy are indeed zeroed. The DE method avoids the potentially combinatorial explosion of candidate sets of parameters to be zeroed in large networks by mimicking evolution in the natural world. DE operates in a distillation context [5]: the original network is the teacher, and DE evolves the student network toward the sparsification goal while maintaining minimal divergence between teacher and student. After DE reaches the desired sparsification level in each layer of the network, a variety of quantization alternatives is applied to the surviving parameters to find the lowest number of bits for their representation with acceptable loss of accuracy. A procedure to find the optimal distribution of quantization levels in each sparsified layer is presented, and a suitable lossless encoding of the surviving quantized parameters is used for the final parameter representation. DE was applied to a sample of representative neural networks of progressively larger size trained on the MNIST, FashionMNIST and COCO data sets. An 80-class YOLOv3 network with more than 60 million parameters, trained on the COCO data set, reached 90% sparsification and, with 4-bit parameter quantization, correctly identifies and segments all objects identified by the original network with more than 80% confidence, yielding compression factors between 40x and 80x. It has not escaped the authors' notice that techniques from different methods can be nested: once the best parameter set for sparsification is identified in a DE cycle, a decision to zero only a subset of those parameters can be made using a combination of criteria such as parameter magnitude and Hessian approximations.
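To make the selection mechanism concrete, the following is a minimal sketch of one DE cycle under simplifying assumptions: a single linear layer, random candidate sets playing the role of mutations, and the output mismatch on a probe batch standing in for the teacher/student divergence. All names (`de_cycle`, `divergence`, the probe data) are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def divergence(student_w, teacher_w, x):
    """Proxy for teacher/student output divergence on a probe batch x."""
    return np.linalg.norm(x @ teacher_w - x @ student_w)

def de_cycle(w, teacher_w, x, n_candidates=16, zero_frac=0.01):
    """Tentatively zero random candidate sets; keep the least harmful one."""
    k = max(1, int(zero_frac * w.size))
    alive = np.flatnonzero(w)            # only still-surviving parameters compete
    best_w, best_d = None, np.inf
    for _ in range(n_candidates):        # a small population of candidate mutations
        idx = rng.choice(alive, size=k, replace=False)
        trial = w.copy()
        trial.flat[idx] = 0.0            # tentative zeroing
        d = divergence(trial, teacher_w, x)
        if d < best_d:                   # selection: least effect on accuracy proxy
            best_w, best_d = trial, d
    return best_w

teacher = rng.normal(size=(32, 8))       # toy "original network" layer
student = teacher.copy()
probe = rng.normal(size=(64, 32))        # probe batch for the divergence proxy
for _ in range(50):                      # evolve toward the sparsification goal
    student = de_cycle(student, teacher, probe)
print("sparsity:", np.mean(student == 0))
```

Evaluating a small population of candidate zero-sets per cycle and keeping only the fittest one is what keeps the search linear in the number of cycles rather than combinatorial in the number of parameters.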
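The post-sparsification quantization step can be sketched in the same spirit: walk down candidate bit widths, uniformly quantizing the surviving parameters of a layer, and keep the smallest width whose divergence stays within a tolerance. The function names, the uniform quantizer, and the tolerance value below are illustrative assumptions; here the unquantized sparsified layer serves as the reference, isolating quantization loss, whereas the paper's criterion is accuracy against the original network.

```python
import numpy as np

def quantize(w, bits):
    """Uniform quantization of the surviving (nonzero) weights to 2**bits - 1 levels."""
    q = w.copy()
    nz = w != 0
    lo, hi = w[nz].min(), w[nz].max()
    step = (hi - lo) / (2 ** bits - 1)
    if step == 0:                        # degenerate layer: nothing to quantize
        return q
    q[nz] = lo + np.round((w[nz] - lo) / step) * step
    return q

def lowest_bits(w, ref_w, x, tol, candidates=(8, 6, 4, 3, 2)):
    """Descend through bit widths; return the smallest one within tolerance."""
    best = None
    for bits in sorted(candidates, reverse=True):
        d = np.linalg.norm(x @ ref_w - x @ quantize(w, bits))
        if d <= tol:
            best = bits                  # still acceptable: keep descending
        else:
            break                        # tolerance exceeded: stop at previous width
    return best

rng = np.random.default_rng(0)
dense = rng.normal(size=(32, 8))
sparse = np.where(rng.random(dense.shape) < 0.9, 0.0, dense)  # 90% zeroed layer
probe = rng.normal(size=(64, 32))
print("lowest acceptable bits:", lowest_bits(sparse, sparse, probe, tol=5.0))
```

Because the surviving parameters of each layer have their own range, the bit-width search runs per layer, which is what allows a non-uniform distribution of quantization levels across the sparsified network.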