Archetypal analysis (AA) was originally proposed in 1994 by Adele Cutler and Leo Breiman as a computational procedure for extracting distinct aspects, so-called archetypes, from observations, with each observation approximated as a mixture (i.e., a convex combination) of these archetypes. AA thereby provides simple, interpretable, and explainable representations for feature extraction and dimensionality reduction, which helps reveal the structure of high-dimensional data and has enabled a wide range of applications across the sciences. However, AA also faces challenges, particularly because the associated optimization problem is non-convex. This is the first survey to give researchers and data mining practitioners an overview of the methodologies and opportunities that AA offers. It reviews the many applications of AA across disparate fields of science, best practices for modeling data with AA, and the method's limitations. The survey concludes by outlining crucial directions for future research on AA.
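
For orientation, the optimization problem referred to above can be sketched in the form commonly used in the AA literature. The notation is an assumption introduced here for illustration and is not taken from the abstract: $X \in \mathbb{R}^{m \times n}$ holds the $n$ observations as columns, $K$ is the chosen number of archetypes, $C$ builds the archetypes $Z = XC$ as convex combinations of the observations, and $S$ holds the mixture weights that reconstruct each observation from the archetypes.

```latex
\begin{aligned}
\min_{C,\,S}\quad & \lVert X - X C S \rVert_F^2 \\
\text{subject to}\quad
  & C \in \mathbb{R}^{n \times K},\quad c_{ik} \ge 0,\quad \textstyle\sum_{i=1}^{n} c_{ik} = 1 \quad (k = 1,\dots,K), \\
  & S \in \mathbb{R}^{K \times n},\quad s_{kj} \ge 0,\quad \textstyle\sum_{k=1}^{K} s_{kj} = 1 \quad (j = 1,\dots,n).
\end{aligned}
```

Under these assumptions, the problem is a constrained least-squares problem in $C$ when $S$ is held fixed, and in $S$ when $C$ is held fixed, but it is not jointly convex in $(C, S)$ because of the product $CS$. This is the non-convexity mentioned above, and it is why solutions typically depend on initialization.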