When training the parameters of a linear dynamical model, the gradient descent algorithm is likely to fail to converge if the squared-error loss is used as the training loss function. Restricting the parameter space to a smaller subset and running gradient descent within that subset can allow learning stable dynamical systems, but this strategy does not work for unstable systems. In this work, we examine the dynamics of the gradient descent algorithm and pinpoint what causes the difficulty of learning unstable systems. We show that observations taken at different times from the system to be learned influence the dynamics of the gradient descent algorithm to substantially different degrees. We introduce a time-weighted logarithmic loss function to correct this imbalance and demonstrate its effectiveness in learning unstable systems.
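To make the contrast concrete, below is a minimal sketch in Python of the phenomenon the abstract describes, under illustrative assumptions: a scalar system x_{t+1} = a*x_t with |a| > 1, a time weighting w_t = rho^(-2t) for an assumed known growth bound rho, and a per-sample loss log(1 + residual^2). The exact weighting and loss form in the paper may differ; the constants a_true, rho, and the learning rates are hypothetical choices for the demonstration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a scalar unstable linear system x_{t+1} = a_true * x_t + noise.
a_true = 1.2                     # unstable: |a_true| > 1
T = 30
x = np.empty(T + 1)
x[0] = 1.0
for t in range(T):
    x[t + 1] = a_true * x[t] + 0.01 * rng.standard_normal()

def grad_squared(a):
    # Gradient of the squared-error loss (1/T) * sum_t (a*x_t - x_{t+1})^2.
    # Late residuals are scaled by x_t ~ a_true^t, so the loss curvature
    # grows exponentially with the horizon T: later observations dominate.
    r = a * x[:-1] - x[1:]
    return np.mean(2.0 * r * x[:-1])

def grad_logweighted(a, rho=1.25):
    # Gradient of an illustrative time-weighted logarithmic loss
    #   (1/T) * sum_t w_t * log(1 + (a*x_t - x_{t+1})^2),  w_t = rho^(-2t),
    # where rho is an assumed upper bound on the system's growth rate.
    # The weights offset the exponential growth of x_t^2, and the log keeps
    # every sample's gradient contribution bounded.
    r = a * x[:-1] - x[1:]
    w = rho ** (-2.0 * np.arange(T))
    return np.mean(w * 2.0 * r * x[:-1] / (1.0 + r ** 2))

def fit(grad, lr, steps=5000, a0=0.5):
    a = a0
    for _ in range(steps):
        a -= lr * grad(a)
        if not np.isfinite(a) or abs(a) > 1e6:
            return float("nan")  # gradient descent diverged
    return a

# Same learning rate for both losses: squared error diverges on the
# unstable trajectory, while the time-weighted log loss recovers a_true.
print("squared-error fit:    ", fit(grad_squared, lr=0.05))
print("time-weighted log fit:", fit(grad_logweighted, lr=0.05))
```

The design point of this sketch is the imbalance the abstract refers to: under squared error, the gradient contribution of the observation at time t scales like a_true^(2t), so a step size that is stable for the late samples barely moves the estimate with respect to the early ones, and any fixed step size eventually diverges as the horizon grows. Weighting each term down in time and passing residuals through a logarithm bounds each observation's influence, so one horizon-independent step size suffices in this toy setting.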