The present paper reviews and discusses work from computer science that proposes to identify concepts in the internal representations (hidden layers) of DNNs. It is examined, first, how existing methods actually identify the concepts that are supposedly represented in DNNs. Second, it is discussed how conceptual spaces -- sets of concepts in internal representations -- are shaped by a tradeoff between predictive accuracy and compression. These issues are critically examined by drawing on philosophy. While there is evidence that DNNs are able to represent non-trivial inferential relations between concepts, our ability to identify concepts is severely limited.