Comparing the functional behavior of neural network models, whether it is a single network over time or two (or more) networks during or after training, is an essential step in understanding what they are learning (and what they are not), and in identifying strategies for regularization or efficiency improvements. Despite recent progress, e.g., comparing vision transformers to CNNs, systematic comparison of function, especially across different networks, remains difficult and is often carried out layer by layer. Approaches such as canonical correlation analysis (CCA) are applicable in principle, but have been used sparingly so far. In this paper, we revisit a (less widely known) measure from statistics, called distance correlation (and its partial variant), designed to evaluate correlation between feature spaces of different dimensions. We describe the steps necessary to deploy it for large-scale models -- this opens the door to a surprising array of applications, ranging from conditioning one deep model w.r.t. another and learning disentangled representations, to optimizing diverse models that are directly more robust to adversarial attacks. Our experiments suggest a versatile regularizer (or constraint) with many advantages, which avoids some of the common difficulties one faces in such analyses. Code is at https://github.com/zhenxingjian/Partial_Distance_Correlation.