Computing the matrix square root or its inverse in a differentiable manner is important in a variety of computer vision tasks. Previous methods either adopt the Singular Value Decomposition (SVD) to explicitly factorize the matrix or use the Newton-Schulz iteration (NS iteration) to derive an approximate solution. However, neither method is computationally efficient enough in either the forward or the backward pass. In this paper, we propose two more efficient variants to compute the differentiable matrix square root. For the forward propagation, one method uses the Matrix Taylor Polynomial (MTP), and the other uses the Matrix Pad\'e Approximants (MPA). The backward gradient is computed by iteratively solving the continuous-time Lyapunov equation using the matrix sign function. Both methods yield a considerable speed-up over the SVD and the NS iteration. Experimental results on de-correlated batch normalization and the second-order vision transformer demonstrate that our methods also achieve competitive and even slightly better performance. The code is available at \href{https://github.com/KingJamesSong/FastDifferentiableMatSqrt}{https://github.com/KingJamesSong/FastDifferentiableMatSqrt}.
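To make the forward pass concrete, below is a minimal PyTorch sketch of the MTP idea described above: the input is pre-normalized by its Frobenius norm so that the truncated Taylor series of $(\mathbf{I}-\mathbf{Z})^{1/2}$ stays within its convergence region. The function name \texttt{mtp\_sqrt}, the truncation degree \texttt{K}, and the assumption of a symmetric positive (semi-)definite input are illustrative choices, not the authors' released implementation.

\begin{verbatim}
import torch

def mtp_sqrt(A: torch.Tensor, K: int = 8) -> torch.Tensor:
    """Sketch of a Matrix Taylor Polynomial approximation of A^{1/2}.

    A is assumed symmetric positive (semi-)definite. We write
    A^{1/2} = ||A||_F^{1/2} (I - Z)^{1/2} with Z = I - A / ||A||_F
    and truncate the binomial series of (I - Z)^{1/2} at degree K.
    All operations are differentiable, so autograd can backpropagate
    through this forward pass directly.
    """
    norm = torch.linalg.matrix_norm(A, ord='fro')
    I = torch.eye(A.shape[-1], dtype=A.dtype, device=A.device)
    Z = I - A / norm

    approx = I.clone()      # k = 0 term of the series
    Z_power = I.clone()     # running power Z^k
    coef = 1.0              # binomial coefficient C(1/2, k)
    for k in range(1, K + 1):
        coef *= (0.5 - (k - 1)) / k          # C(1/2, k) via recurrence
        Z_power = Z_power @ Z
        approx = approx + coef * (-1) ** k * Z_power

    return torch.sqrt(norm) * approx

# Usage sketch: compare against an eigendecomposition-based square root.
if __name__ == "__main__":
    X = torch.randn(16, 16, dtype=torch.float64)
    A = X @ X.T + 1e-3 * torch.eye(16, dtype=torch.float64)
    err = torch.linalg.matrix_norm(mtp_sqrt(A, K=12) @ mtp_sqrt(A, K=12) - A)
    print(f"reconstruction error: {err.item():.3e}")
\end{verbatim}

The MPA forward pass and the Lyapunov-based backward pass follow the same normalization but replace the truncated series with a rational approximant and a matrix-sign iteration, respectively; they are omitted here for brevity.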