Compositional data are multivariate observations that carry only relative information between components. Applying standard multivariate statistical methods directly to compositional data can lead to paradoxes and misinterpretations. Compositional data appear frequently in insurance, especially in telematics information. However, this type of data has not received the special treatment it deserves in most of the existing actuarial literature. In this paper, we investigate the use of exponential family principal component analysis (EPCA) to analyze compositional data in insurance. The method is applied to a dataset obtained from the U.S. Mine Safety and Health Administration. The numerical results show that EPCA is able to produce principal components that are significant predictors and that improve the prediction accuracy of the regression model. EPCA can therefore be a promising tool for actuaries who analyze compositional data.
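To illustrate what "only relative information" means, the following minimal sketch shows that rescaling a positive vector leaves its composition unchanged. It uses the centered log-ratio (clr) transform, a standard tool from compositional data analysis, purely as an illustration; it is not the EPCA method studied in this paper. The function names `closure` and `clr` and the example values are assumptions made for this sketch.

```python
import numpy as np

def closure(x):
    """Rescale each row so its parts sum to 1 (a composition)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centered log-ratio transform; assumes strictly positive parts."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

# Hypothetical observation: time split across three activities.
raw = np.array([[2.0, 6.0, 12.0]])
scaled = 10.0 * raw  # same proportions, different total

# Both vectors describe the same composition ...
same_composition = np.allclose(closure(raw), closure(scaled))
# ... and therefore have the same clr coordinates.
same_clr = np.allclose(clr(raw), clr(scaled))
# clr coordinates of each row sum to zero by construction.
clr_row_sum = clr(raw).sum(axis=-1)
```

Because the parts of a composition are constrained to a constant sum, methods that treat them as unconstrained variables can mislead, which is one reason such data call for dedicated treatment.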