Applications of Clustering with Mixed Type Data in Life Insurance

Death benefits are generally the largest cash flow item affecting the financial statements of life insurers, yet some insurers still lack a systematic process for tracking and monitoring death claims experience. In this article, we explore data clustering to examine and understand how actual death claims differ from expected claims, an early step in developing a monitoring system that is crucial for risk management. We extend the $k$-prototypes clustering algorithm to draw inference from a life insurance dataset using only the insured's characteristics and policy information, without regard to known mortality. The extended algorithm efficiently handles categorical, numerical, and spatial attributes. The optimal number of clusters is chosen using the gap statistic, and the resulting clusters are then used to compare the actual to expected death claims experience of the life insurance portfolio. Our empirical data contain observations from 2014 on approximately 1.14 million policies with a total insured amount of over 650 billion dollars. For this portfolio, the algorithm produced three natural clusters, each with actual death claims lower than expected but with differing variability. The analytical results give management a process for identifying the policyholder attributes that dominate significant mortality deviations, thereby supporting decisions on necessary actions.
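
To make the two core ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows a $k$-prototypes style dissimilarity for mixed-type records (squared Euclidean distance on numerical attributes plus a weighted count of categorical mismatches) and a per-cluster actual-to-expected (A/E) claims ratio. The abstract does not state how spatial attributes enter the distance, so the squared great-circle term, the weights `gamma` and `lam`, the record layout, and all function names here are illustrative assumptions.

```python
import math


def haversine_rad(lat1, lon1, lat2, lon2):
    """Great-circle central angle (radians) between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * math.asin(min(1.0, math.sqrt(a)))


def mixed_distance(record, prototype, gamma=1.0, lam=1.0):
    """Dissimilarity between a policy record and a cluster prototype.

    Numerical part: squared Euclidean distance (as in k-prototypes).
    Categorical part: number of mismatched categories, weighted by gamma.
    Spatial part: squared great-circle angle, weighted by lam (an assumption).
    """
    d_num = sum((a - b) ** 2 for a, b in zip(record["num"], prototype["num"]))
    d_cat = sum(a != b for a, b in zip(record["cat"], prototype["cat"]))
    d_geo = haversine_rad(*record["geo"], *prototype["geo"]) ** 2
    return d_num + gamma * d_cat + lam * d_geo


def assign(records, prototypes, gamma=1.0, lam=1.0):
    """Assign each record to the index of its nearest prototype."""
    return [
        min(range(len(prototypes)),
            key=lambda k: mixed_distance(r, prototypes[k], gamma, lam))
        for r in records
    ]


def actual_to_expected(labels, actual, expected):
    """Per-cluster ratio of total actual claims to total expected claims."""
    totals = {}
    for k, a, e in zip(labels, actual, expected):
        act, exp = totals.get(k, (0.0, 0.0))
        totals[k] = (act + a, exp + e)
    # Skip clusters with no expected claims to avoid division by zero.
    return {k: act / exp for k, (act, exp) in totals.items() if exp > 0}
```

In this sketch, an A/E ratio below 1 for a cluster means its actual death claims came in under expectation, which is the kind of comparison the abstract describes for each of the three clusters. Numerical attributes are assumed to be scaled beforehand so that no single attribute dominates the distance.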
