The increasing need to analyse large volumes of data has led to the development of Symbolic Data Analysis as a promising field to tackle the data challenges of our time. New data types, such as interval-valued data, have brought fresh theoretical and methodological problems to be solved. In this paper, we derive explicit formulas for computing the Mallows' distance, also known as the $L_2$ Wasserstein distance, between two \textit{p}-dimensional intervals, using information regarding the distribution of the microdata. We show that this distance can be written as a Mahalanobis distance between two 2\textit{p}-dimensional vectors. Our analysis leads to a generalisation of the definitions of the expected value and covariance matrix of an interval-valued random vector. These novel results bring theoretical support and interpretability to state-of-the-art contributions. Additionally, we discuss real examples that illustrate how different levels of available information on the microdata can be modelled, leading to proper estimates of the measures of location and association.