Characterizing the implicit structure of the computation within neural networks is a foundational problem in the area of deep learning interpretability. Can their inner decision process be captured symbolically in some familiar logic? We show that any transformer neural network can be translated into an equivalent fixed-size first-order logic formula which may also use majority quantifiers. The idea is to simulate transformers with highly uniform threshold circuits and leverage known theoretical connections between circuits and logic. Our findings also reveal the surprising fact that the entire transformer computation can be reduced merely to the division of two (large) integers. While our results are most pertinent for transformers, they apply equally to a broader class of neural network architectures, namely those with a fixed-depth uniform computation graph made up of standard neural net components, which includes feedforward and convolutional networks.
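To make the target logic concrete, here is a small illustrative example of our own (not drawn from the abstract itself): FO(M) extends first-order logic over string positions with a majority quantifier M, where a sentence M i. φ(i) holds iff φ(i) is true at more than half of the positions i.

```latex
% Illustrative FO(M) sentences over binary strings (our own examples).
% Q_1(i) is the standard unary predicate "the token at position i is 1".

% "More than half of the input tokens are 1":
\[ \mathsf{M} i .\; Q_1(i) \]

% Majority quantifiers compose with ordinary ones, e.g.
% "at more than half of the positions, the token is a 1 followed by a 1":
\[ \mathsf{M} i .\; \bigl( Q_1(i) \land \exists j\, (j = i + 1 \land Q_1(j)) \bigr) \]
```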
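To give intuition for the integer-division claim, the following is a minimal sketch of our own (not the paper's construction), assuming activations are encoded as fixed-point integers; `SCALE`, `to_fixed`, and `attention_average` are hypothetical names. It shows how an attention-style weighted average, a core step of a transformer layer, collapses into a single floor division of two large integers.

```python
# A minimal sketch (our own illustration, not the paper's construction):
# with activations encoded as fixed-point integers, an attention-style
# weighted average reduces to one floor division of two big integers.

SCALE = 1 << 32  # fixed-point scale; the number of fractional bits is a free choice


def to_fixed(x: float) -> int:
    """Encode a real number as a fixed-point integer."""
    return round(x * SCALE)


def attention_average(weights: list[float], values: list[float]) -> float:
    """Compute sum_i w_i * v_i / sum_i w_i exactly in integer arithmetic.

    The non-negative weights stand in for exponentiated attention scores;
    Python's unbounded ints play the role of the "(large) integers".
    """
    w = [to_fixed(wi) for wi in weights]
    v = [to_fixed(vi) for vi in values]
    numerator = sum(wi * vi for wi, vi in zip(w, v))  # one big integer, ~ (sum w*v) * SCALE^2
    denominator = sum(w)                              # one big integer, ~ (sum w)   * SCALE
    # The entire computation collapses to a single integer division:
    return (numerator // denominator) / SCALE


# Example: weights (1, 3) and values (2.0, 4.0) give (1*2 + 3*4)/4 = 3.5.
assert attention_average([1.0, 3.0], [2.0, 4.0]) == 3.5
```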