The success of gradient descent in ML, and especially for learning neural networks, is remarkable and robust. In the context of how the brain learns, one aspect of gradient descent that appears biologically difficult to realize (if not implausible) is that its updates rely on feedback from later layers to earlier layers through the same connections. Such bidirectional connections are relatively rare in brain networks, and even when reciprocal connections exist, they may not be equi-weighted. Random Feedback Alignment (Lillicrap et al., 2016), where the backward weights are random and fixed, has been proposed as a biologically plausible alternative and found empirically to be effective. We investigate how and when feedback alignment (FA) works, focusing on one of the most basic problems with layered structure -- low-rank matrix factorization. In this problem, given a matrix $Y_{n\times m}$, the goal is to find a low-rank factorization $Z_{n \times r}W_{r \times m}$ that minimizes the error $\|ZW-Y\|_F$. Gradient descent solves this problem optimally. We show that FA converges to the optimal solution when $r\ge \mbox{rank}(Y)$. We also shed light on how FA works. It has been observed empirically that the forward weight matrices and the (random) feedback matrices become increasingly aligned during FA updates. Our analysis rigorously derives this phenomenon and shows how it facilitates the convergence of FA. We also show that FA can be far from optimal when $r < \mbox{rank}(Y)$. This is the first provable separation result between gradient descent and FA. Moreover, the representations found by gradient descent and FA can be almost orthogonal even when their errors $\|ZW-Y\|_F$ are approximately equal.
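To make the two update rules concrete: for the loss $\tfrac{1}{2}\|ZW-Y\|_F^2$, gradient descent updates $Z \leftarrow Z - \eta(ZW-Y)W^\top$ and $W \leftarrow W - \eta\, Z^\top(ZW-Y)$, whereas FA replaces the backward weights $Z^\top$ in the $W$-update with a fixed random matrix $B$. The NumPy sketch below illustrates this contrast; the dimensions, step size, iteration count, and feedback scaling are illustrative assumptions and not the paper's experimental setup.

```python
import numpy as np

# A minimal sketch (assumed hyperparameters, not the paper's setup): gradient descent (GD)
# vs. feedback alignment (FA) on the factorization objective  min_{Z,W} ||ZW - Y||_F.

rng = np.random.default_rng(0)
n, m, r = 20, 30, 5
Y = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))  # target with rank(Y) <= r

def factorize(Y, r, use_fa, eta=1e-2, iters=20000):
    n, m = Y.shape
    Z = 0.1 * rng.standard_normal((n, r))              # "later layer" forward weights
    W = 0.1 * rng.standard_normal((r, m))              # "earlier layer" forward weights
    B = rng.standard_normal((r, n)) / np.sqrt(n)       # fixed random feedback matrix (FA only;
                                                       # the 1/sqrt(n) scaling is an illustrative choice)
    for _ in range(iters):
        E = Z @ W - Y                                  # residual
        dZ = E @ W.T                                   # last-layer update: identical for GD and FA
        dW = (B if use_fa else Z.T) @ E                # GD sends the error back through Z.T; FA uses B
        Z -= eta * dZ
        W -= eta * dW
    return np.linalg.norm(Z @ W - Y)                   # Frobenius-norm error

print("GD error:", factorize(Y, r, use_fa=False))
print("FA error:", factorize(Y, r, use_fa=True))
```

Since $r = \mbox{rank}(Y)$ in this toy instance, both variants are expected to drive the error toward zero, consistent with the convergence result stated above; tracking how $Z^\top$ and $B$ correlate over the iterations would illustrate the alignment phenomenon the analysis makes rigorous.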