The Gaussian stochastic process (GaSP) has been widely used as a prior over functions because of its flexibility and tractability in modeling. However, evaluating the likelihood costs $O(n^3)$ operations, where $n$ is the number of observed points in the process, because it requires inverting the covariance matrix. This bottleneck prevents GaSP from being widely used for large-scale data. We propose a general class of nonseparable GaSP models for multiple functional observations, together with a fast algorithm whose computational cost is linear ($O(n)$) and which evaluates the likelihood exactly, with no approximation. We show that the commonly used linear regression and separable models are special cases of the proposed nonseparable GaSP model. In an epigenetic application, the proposed nonseparable GaSP model accurately predicts genome-wide DNA methylation levels and compares favorably with alternative methods such as linear regression, random forest, and the localized Kriging method. The fast algorithm is implemented in the ${\tt FastGaSP}$ R package on CRAN.
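Here is a minimal sketch of why an exact GaSP likelihood can cost $O(n)$ instead of $O(n^3)$. It assumes one-dimensional inputs with distinct values, a zero mean, no noise term, and the exponential covariance $K_{ij} = \sigma^2 \exp(-|x_i - x_j|/\gamma)$. Under that covariance the process is Markov (Ornstein-Uhlenbeck), so the joint density factorizes into one-step conditional densities. This is an illustration of the general idea only, not the ${\tt FastGaSP}$ implementation or the proposed nonseparable model. The function names `ou_loglik_linear` and `dense_loglik` are hypothetical.

```python
import numpy as np

def ou_loglik_linear(x, y, sigma2, gamma):
    """Exact log-likelihood of y ~ N(0, K), K_ij = sigma2 * exp(-|x_i - x_j| / gamma).

    Illustrative sketch (assumed exponential kernel, 1-D distinct inputs, no nugget):
    uses the Markov property, so the cost is O(n) after sorting.
    """
    order = np.argsort(x)
    x = np.asarray(x, dtype=float)[order]
    y = np.asarray(y, dtype=float)[order]
    # First point: marginal density N(0, sigma2).
    ll = -0.5 * (np.log(2 * np.pi * sigma2) + y[0] ** 2 / sigma2)
    for i in range(1, len(y)):
        rho = np.exp(-(x[i] - x[i - 1]) / gamma)  # correlation with the previous point
        mean = rho * y[i - 1]                      # conditional mean given y[i-1]
        var = sigma2 * (1.0 - rho ** 2)            # conditional variance given y[i-1]
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return ll

def dense_loglik(x, y, sigma2, gamma):
    """Same log-likelihood via a dense Cholesky factorization, O(n^3)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    K = sigma2 * np.exp(-np.abs(x[:, None] - x[None, :]) / gamma)
    L = np.linalg.cholesky(K)        # K = L L^T
    alpha = np.linalg.solve(L, y)    # y^T K^{-1} y = ||L^{-1} y||^2
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + alpha @ alpha)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 200)
y = rng.standard_normal(200)
print(ou_loglik_linear(x, y, 1.5, 2.0), dense_loglik(x, y, 1.5, 2.0))
```

Both functions should return the same value up to floating-point error. The linear-time version never forms or inverts the $n \times n$ covariance matrix. Repeated input locations would make the conditional variance zero, so that case is excluded by assumption.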