In this article we perform an asymptotic analysis of parallel Bayesian logspline density estimators. Such estimators are useful for the analysis of datasets that are partitioned into subsets and stored in separate databases without the capability of accessing the full dataset from a single computer. The parallel estimator we introduce is in the spirit of a kernel density estimator introduced in recent studies. We provide a numerical procedure that produces the normalized density estimator itself in place of the sampling algorithm. We then derive an error bound for the mean integrated squared error of the full dataset posterior estimator. The error bound depends upon the parameters that arise in logspline density estimation and the numerical approximation procedure. In our analysis, we identify the choices for the parameters that result in the error bound scaling optimally in relation to the number of samples. This provides our method with increased estimation accuracy, while also minimizing the computational cost.
翻译:暂无翻译