Many statistical problems require estimating a density function, say $f$, from data samples. In this work, for example, we are interested in highest-density regions (HDRs), i.e., minimum-volume sets that contain a given probability mass. HDRs are typically computed using a density quantile approach, which, when the density is unknown, requires estimating it. This task is far from trivial, especially in higher dimensions and when data are sparse and exhibit complex structures (e.g., multimodality or particular dependence patterns). We address this challenge by exploring alternative approaches to building HDRs that avoid direct (multivariate) density estimation. First, we generalize the density quantile method, currently implementable on the basis of a consistent estimator of the density, to neighbourhood measures, i.e., measures that preserve the order induced in the sample by $f$. Second, we discuss a number of suitable probabilistic- and distance-based measures, such as the $k$-nearest-neighbour Euclidean distance. Third, motivated by the ubiquitous role of copula modeling in modern statistics, we explore its use in the context of probabilistic-based measures. An extensive comparison among the introduced measures is provided, and their implications for computing HDRs in real-world problems are discussed.
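
As an illustration of the quantile idea with a distance-based measure, the following minimal sketch labels sample points as inside or outside an empirical HDR using the Euclidean distance to the $k$-th nearest neighbour. It is not the authors' implementation: the function names `knn_distance` and `hdr_membership`, the choice $k = 10$, the simulated bimodal sample, and the brute-force distance computation are all assumptions made for the example. Because a small $k$-nearest-neighbour distance corresponds to a high density, the sketch keeps the points whose distance falls at or below the empirical quantile at the target coverage, rather than above a density threshold.

```python
import numpy as np


def knn_distance(sample, k):
    """Euclidean distance from each sample point to its k-th nearest neighbour.

    Hypothetical helper: brute-force O(n^2) pairwise distances, fine for small n.
    """
    diffs = sample[:, None, :] - sample[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest other point.
    return np.sort(dists, axis=1)[:, k]


def hdr_membership(sample, coverage=0.9, k=10):
    """Flag the sample points that fall in the empirical HDR with the given coverage.

    Assumption: the k-NN distance is used as the ordering measure, with
    smaller distances treated as higher density.
    """
    d_k = knn_distance(sample, k)
    threshold = np.quantile(d_k, coverage)
    return d_k <= threshold, threshold


# Simulated bimodal sample in two dimensions (illustrative data only).
rng = np.random.default_rng(0)
sample = np.vstack([
    rng.normal(-2.0, 0.5, size=(200, 2)),
    rng.normal(2.0, 0.5, size=(200, 2)),
])

inside, threshold = hdr_membership(sample, coverage=0.9, k=10)
```

Here `inside` is a boolean mask over the sample. Under the same assumption, a new point could be classified by comparing its distance to the $k$-th nearest sample point with `threshold`.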