In this paper, we aim to design a quantitative similarity function between two neural architectures. Specifically, we define model similarity using input-gradient transferability: we generate adversarial samples for two networks and measure each network's average accuracy on the other's adversarial samples. If two networks are highly correlated, the attack transferability will be high, resulting in a high similarity score. Using this similarity score, we investigate two questions: (1) Which network components contribute to model diversity? (2) How does model diversity affect practical scenarios? We answer the first question with a feature-importance analysis and a clustering analysis. The second question is validated in two different scenarios: model ensemble and knowledge distillation. Our findings show that model diversity plays a key role when different neural architectures interact. For example, we find that greater diversity leads to better ensemble performance. We also observe that the relationship between teacher-student similarity and distillation performance depends on the choice of base architecture for the teacher and student networks. We expect our analysis tool to support a high-level understanding of the differences between various neural architectures, as well as to offer practical guidance when using multiple architectures together.
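The similarity measure described above can be sketched in a few lines. The following is a minimal NumPy illustration, not the paper's implementation: it uses tiny linear softmax classifiers and FGSM-style adversarial samples, and it assumes (as one plausible choice) that similarity is defined as one minus the average cross-network accuracy on each other's adversarial samples, so lower cross-accuracy (higher transferability) yields higher similarity.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fgsm(W, X, y, eps):
    """FGSM adversarial samples for a linear softmax model (logits = X @ W).

    The input gradient of the cross-entropy loss is (p - onehot(y)) @ W.T.
    """
    p = softmax(X @ W)
    p[np.arange(len(y)), y] -= 1.0          # p - onehot(y)
    grad = p @ W.T                           # dL/dX
    return X + eps * np.sign(grad)

def accuracy(W, X, y):
    return float((np.argmax(X @ W, axis=1) == y).mean())

def similarity(W_a, W_b, X, y, eps=0.5):
    """Attack-transferability similarity between two models (hypothetical form).

    Evaluates each model on the other's adversarial samples; low cross-accuracy
    means the attack transfers well, i.e. the models are similar.
    """
    adv_a = fgsm(W_a, X, y, eps)             # attacks crafted on model A
    adv_b = fgsm(W_b, X, y, eps)             # attacks crafted on model B
    acc_ab = accuracy(W_b, adv_a, y)         # B tested on A's attacks
    acc_ba = accuracy(W_a, adv_b, y)         # A tested on B's attacks
    return 1.0 - 0.5 * (acc_ab + acc_ba)
```

In this toy form the score lies in [0, 1]; the paper's networks and attack method differ, but the cross-evaluation structure is the same.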