A simple way of obtaining robust estimates of the "center" (or the "location") and of the "scatter" of a dataset is to use the maximum likelihood estimate with a class of heavy-tailed distributions, regardless of the "true" distribution generating the data. We observe that the maximum likelihood problem for the Cauchy distributions, which have particularly heavy tails, is geodesically convex and therefore efficiently solvable (Cauchy distributions are parametrized by the upper half plane, i.e. by the hyperbolic plane). Moreover, it has an appealing geometrical meaning: the datapoints, living on the boundary of the hyperbolic plane, are attracting the parameter by unit forces, and we search the point where these forces are in equilibrium. This picture generalizes to several classes of multivariate distributions with heavy tails, including, in particular, the multivariate Cauchy distributions. The hyperbolic plane gets replaced by symmetric spaces of noncompact type. Geodesic convexity gives us an efficient numerical solution of the maximum likelihood problem for these distribution classes. This can then be used for robust estimates of location and spread, thanks to the heavy tails of these distributions.
翻译:暂无翻译