Multivariate histograms are difficult to construct due to the curse of dimensionality. Motivated by $k$-d trees in computer science, we show how to construct an efficient data-adaptive partition of Euclidean space that possesses the following two properties: With high confidence the distribution from which the data are generated is close to uniform on each rectangle of the partition; and despite the data-dependent construction we can give guaranteed finite sample simultaneous confidence intervals for the probabilities (and hence for the average densities) of each rectangle in the partition. This partition will automatically adapt to the sizes of the regions where the distribution is close to uniform. The methodology produces confidence intervals whose widths depend only on the probability content of the rectangles and not on the dimensionality of the space, thus avoiding the curse of dimensionality. Moreover, the widths essentially match the optimal widths in the univariate setting. The simultaneous validity of the confidence intervals allows to use this construction, which we call {\sl Beta-trees}, for various data-analytic purposes. We illustrate this by using Beta-trees for visualizing data and for multivariate mode-hunting.
翻译:暂无翻译