Understanding the training dynamics of deep neural networks (DNNs) is important, as it can lead to improved training efficiency and task performance. Recent works have demonstrated that representing DNN wirings as static graphs cannot capture how DNNs change over the course of training. Thus, in this work, we propose a compact, expressive temporal graph framework that effectively captures the dynamics of many workhorse architectures in computer vision. Specifically, it extracts an informative summary of graph properties (e.g., eigenvector centrality) over a sequence of DNN graphs obtained during training. We demonstrate that our framework captures useful dynamics by accurately predicting trained task performance from a summary computed over early training epochs (<5), across four different architectures and two image datasets. Moreover, by using a novel, highly scalable DNN graph representation, we also show that the proposed framework captures generalizable dynamics, as summaries extracted from smaller-width networks remain effective when evaluated on larger widths.
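As a rough illustration of the kind of pipeline the abstract refers to, the sketch below builds a weighted graph from an MLP's weight matrices at each training snapshot, computes eigenvector centrality, and reduces it to a few scalar statistics per snapshot. The neuron-level graph construction, the choice of statistics, and the toy 4-8-2 network are assumptions made for illustration only; they are not the paper's actual framework or its scalable graph representation.

```python
import numpy as np
import networkx as nx

def mlp_to_graph(weight_matrices):
    """Build an undirected weighted graph whose nodes are neurons and whose
    edge weights are absolute values of the connecting weights (illustrative)."""
    G = nx.Graph()
    offset = 0
    for W in weight_matrices:              # W has shape (fan_out, fan_in)
        fan_out, fan_in = W.shape
        for i in range(fan_in):
            for j in range(fan_out):
                G.add_edge(offset + i, offset + fan_in + j, weight=abs(W[j, i]))
        offset += fan_in
    return G

def centrality_summary(weight_snapshots):
    """For each training snapshot, compute eigenvector centrality and reduce
    it to a few scalar statistics (mean, std, max) as a temporal summary."""
    summary = []
    for weights in weight_snapshots:       # one list of weight matrices per epoch
        G = mlp_to_graph(weights)
        cent = nx.eigenvector_centrality_numpy(G, weight="weight")
        vals = np.array(list(cent.values()))
        summary.append([vals.mean(), vals.std(), vals.max()])
    return np.array(summary)               # shape: (num_epochs, 3)

# Toy usage: two snapshots of a hypothetical 4-8-2 MLP with random weights.
rng = np.random.default_rng(0)
snapshots = [[rng.normal(size=(8, 4)), rng.normal(size=(2, 8))] for _ in range(2)]
print(centrality_summary(snapshots).shape)  # (2, 3)
```

In this sketch, the per-epoch statistics would serve as the input features from which trained task performance is predicted; the paper's framework summarizes such graph properties over the early-epoch sequence rather than a single snapshot.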