We study the computational hardness of feature search and conjunction search through the lens of circuit complexity. Let $x = (x_1, \ldots, x_n)$ (resp., $y = (y_1, \ldots, y_n)$) be Boolean variables, each of which takes the value one if and only if a neuron at position $i$ detects a feature (resp., another feature). We then formulate feature search and conjunction search as the Boolean functions ${\rm FTR}_n(x) = \bigvee_{i=1}^n x_i$ and ${\rm CONJ}_n(x, y) = \bigvee_{i=1}^n (x_i \wedge y_i)$, respectively. We employ a threshold circuit or a discretized circuit (such as a sigmoid circuit or a ReLU circuit with discretization) as our model of neural networks, and consider the following four computational resources: [i] the number of neurons (size), [ii] the number of levels (depth), [iii] the number of active neurons outputting non-zero values (energy), and [iv] the synaptic weight resolution (weight). We first prove that any threshold circuit $C$ of size $s$, depth $d$, energy $e$ and weight $w$ satisfies $\log {\rm rk}(M_C) \le ed (\log s + \log w + \log n)$, where ${\rm rk}(M_C)$ is the rank of the communication matrix $M_C$ of the $2n$-variable Boolean function that $C$ computes. Since ${\rm CONJ}_n$ has rank $2^n$, any threshold circuit computing it satisfies $n \le ed (\log s + \log w + \log n)$. Thus, an exponential lower bound on size holds even for sublinear-depth threshold circuits whenever the energy and weight are sufficiently small. Since ${\rm FTR}_n$ is computable with constant resources independently of $n$, our result suggests that the computational capacities required for feature search and conjunction search differ. We also show that the inequality is tight up to a constant factor when $ed = o(n/\log n)$. We next show that a similar inequality holds for any discretized circuit. Thus, if we regard the number of gates outputting non-zero values as a measure of sparse activity, our results suggest that larger depth helps neural networks acquire sparse activity.
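The following is a minimal sketch, not taken from the paper, illustrating the two search tasks as Boolean functions together with naive threshold-circuit realizations of each; the list-of-gates encoding and the helper names (`threshold_gate`, `ftr_circuit`, `conj_circuit`) are our own illustrative conventions rather than the paper's formal model.

```python
# Hypothetical illustration of FTR_n, CONJ_n, and simple threshold circuits.
from itertools import product

def FTR(x):                      # feature search: OR of the feature bits
    return int(any(x))

def CONJ(x, y):                  # conjunction search: OR of x_i AND y_i
    return int(any(xi and yi for xi, yi in zip(x, y)))

def threshold_gate(weights, theta, inputs):
    """A single threshold gate: outputs 1 iff the weighted sum reaches theta."""
    return int(sum(w * v for w, v in zip(weights, inputs)) >= theta)

def ftr_circuit(x):
    """Depth-1 circuit for FTR_n: one gate, weight 1, energy at most 1."""
    out = threshold_gate([1] * len(x), 1, x)
    return out, out              # (output, number of gates outputting 1)

def conj_circuit(x, y):
    """Naive depth-2 circuit for CONJ_n: n AND-gates plus one OR-gate.

    Here the energy (number of gates outputting non-zero values) can be as
    large as n + 1, in line with the intuition that conjunction search is
    the costly case unless depth or weight is increased."""
    ands = [threshold_gate([1, 1], 2, [xi, yi]) for xi, yi in zip(x, y)]
    out = threshold_gate([1] * len(ands), 1, ands)
    return out, sum(ands) + out  # (output, energy on this input)

# Sanity check on all inputs for a small n (hypothetical test harness).
n = 3
for x in product([0, 1], repeat=n):
    assert ftr_circuit(list(x))[0] == FTR(x)
    for y in product([0, 1], repeat=n):
        assert conj_circuit(list(x), list(y))[0] == CONJ(x, y)
```

The sketch only shows that ${\rm FTR}_n$ admits a circuit whose size, depth, energy and weight are constants independent of $n$, whereas the obvious circuit for ${\rm CONJ}_n$ spends energy that grows with $n$; the paper's rank-based inequality is what rules out circumventing this with small size, depth, energy and weight simultaneously.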