Deep convolutional neural network (CNN) training via iterative optimization has had remarkable success in finding high-quality parameters. However, modern CNN architectures often contain millions of parameters, so any given model for a single architecture resides in a massive parameter space. Models with similar loss can have drastically different characteristics, such as adversarial robustness, generalizability, and quantization robustness. For deep learning on the edge, quantization robustness is often crucial, yet finding a quantization-robust model can require significant effort. Recent works using Graph Hypernetworks (GHN) have shown remarkable performance in predicting high-performing parameters for varying CNN architectures. Inspired by these successes, we ask whether the graph representations of GHN-2 can also be leveraged to predict quantization-robust parameters, an approach we call GHN-Q. We conduct the first study exploring the use of graph hypernetworks for predicting the parameters of unseen quantized CNN architectures. Focusing on a reduced CNN search space, we find that GHN-Q can indeed predict quantization-robust parameters for various 8-bit quantized CNNs. Decent quantized accuracies are observed even under 4-bit quantization, despite GHN-Q never being trained on it. Quantized finetuning of GHN-Q at lower bitwidths may bring further improvements and is currently being explored.
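To make the evaluation setting concrete, the sketch below shows one common way to simulate b-bit post-training quantization of a CNN's weights in PyTorch. It is a minimal illustration only: the symmetric per-tensor scheme, the `fake_quantize` helper, and the stand-in torchvision model are assumptions for exposition, not the paper's actual quantization pipeline or GHN-predicted parameters.

```python
import torch
import torchvision

def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    # Symmetric, per-tensor uniform quantization (an assumed scheme):
    # map weights onto the integer grid [-2^(b-1), 2^(b-1)-1] and back,
    # so the returned tensor carries the rounding error that a real
    # b-bit deployment would introduce.
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -(2 ** (num_bits - 1))
    scale = w.abs().max() / qmax  # per-tensor scale; other choices exist
    if scale == 0:
        return w.clone()
    return torch.clamp(torch.round(w / scale), qmin, qmax) * scale

# Quantize every conv/linear weight of a CNN in place. A torchvision
# ResNet-18 stands in here for a network whose parameters would, in the
# GHN-Q setting, be predicted by the hypernetwork before quantization.
model = torchvision.models.resnet18(num_classes=10)
with torch.no_grad():
    for m in model.modules():
        if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear)):
            m.weight.copy_(fake_quantize(m.weight, num_bits=8))
```

Evaluating the quantized model's accuracy against its full-precision counterpart, at 8-bit and then at lower bitwidths such as 4-bit, mirrors the kind of robustness comparison the abstract describes.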