Gaussian Processes (GPs) are widely used to model dependency in spatial statistics and machine learning, yet the exact computation suffers an intractable time complexity of $O(n^3)$. Vecchia approximation allows scalable Bayesian inference of GPs in $O(n)$ time by introducing sparsity in the spatial dependency structure that is characterized by a directed acyclic graph (DAG). Despite the popularity in practice, it is still unclear how to choose the DAG structure and there are still no theoretical guarantees in nonparametric settings. In this paper, we systematically study the Vecchia GPs as standalone stochastic processes and uncover important probabilistic properties and statistical results in methodology and theory. For probabilistic properties, we prove that the conditional distributions of the Mat\'{e}rn GPs, as well as the Vecchia approximations of the Mat\'{e}rn GPs, can be characterized by polynomials. This allows us to prove a series of results regarding the small ball probabilities and RKHSs of Vecchia GPs. For statistical methodology, we provide a principled guideline to choose parent sets as norming sets with fixed cardinality and provide detailed algorithms following such guidelines. For statistical theory, we prove posterior contraction rates for applying Vecchia GPs to regression problems, where minimax optimality is achieved by optimally tuned GPs via either oracle rescaling or hierarchical Bayesian methods. Our theory and methodology are demonstrated with numerical studies, where we also provide efficient implementation of our methods in C++ with R interfaces.
翻译:暂无翻译