Generalization of time series prediction remains an important open issue in machine learning, where earlier methods either suffer from large generalization errors or become trapped in local minima. We develop an analytically solvable, unsupervised learning scheme that extracts the most informative components for predicting future inputs, termed predictive principal component analysis (PredPCA). Our scheme can effectively remove unpredictable noise and minimize test prediction error through convex optimization. Mathematical analyses demonstrate that, provided with sufficient training samples and sufficiently high-dimensional observations, PredPCA can asymptotically identify hidden states, system parameters, and dimensionalities of canonical nonlinear generative processes, with a global convergence guarantee. We demonstrate the performance of PredPCA using sequential visual inputs comprising handwritten digits, rotating 3D objects, and natural scenes. It reliably estimates distinct hidden states and predicts future outcomes of previously unseen test input data, based exclusively on noisy observations. The simple architecture and low computational cost of PredPCA are highly desirable for neuromorphic hardware.
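As a rough illustration of the idea described above, the following is a minimal sketch, not the authors' implementation: it assumes the simplest linear setting in which the next observation is predicted from the current one by ordinary least squares (a convex problem), and principal components are then taken from the predicted observations rather than the raw inputs. The names `predpca`, `X`, `d`, and `lag` are illustrative choices, not part of the original text.

```python
import numpy as np


def predpca(X, d, lag=1):
    """Minimal PredPCA-style sketch (assumed linear, one-step setting).

    X   : (T, N) array of observations, rows ordered in time
    d   : number of predictive components to keep
    lag : prediction horizon in time steps
    """
    past, future = X[:-lag], X[lag:]

    # Convex (least-squares) fit of a linear predictor Q so that past @ Q ~ future.
    Q, *_ = np.linalg.lstsq(past, future, rcond=None)

    # Predicted future observations; unpredictable noise is largely filtered out here.
    pred = past @ Q
    pred = pred - pred.mean(axis=0)

    # PCA on the predictions: keep the d directions carrying the most predictable variance.
    cov = pred.T @ pred / pred.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1][:d]
    W = eigvec[:, order]  # predictive principal components (N, d)

    return W, Q


# Example usage on synthetic data (purely illustrative):
# X = np.random.randn(1000, 50)
# W, Q = predpca(X, d=5)
# latent_estimate = X @ W
```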