Insurance companies gather a growing variety of data for use in the insurance process, but most traditional ratemaking models are not designed to support them. In particular, many emerging data sources (text, images, sensors) may complement traditional data and provide better insight into the future losses of an insurance contract. This paper presents some of these emerging data sources and proposes a unified framework that lets actuaries incorporate them into existing ratemaking models. Our approach stems from representation learning, whose goal is to create representations of raw data. A useful representation transforms the original data into a dense vector space where the ultimate predictive task is simpler to model. We present methods to transform non-vectorial data into vectorial representations and provide examples for actuarial science.
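The core idea (turn non-vectorial data into a fixed-length dense vector, then feed it to an existing ratemaking model alongside traditional rating variables) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's method: the text representation is a hashed random-projection embedding (mean of deterministic Gaussian word vectors) standing in for a learned one, the ratemaking model is a log-link Poisson GLM with a log(exposure) offset fitted by plain gradient ascent, and all names (`word_vector`, `embed_text`, `design_row`, `fit_poisson_glm`, `predict_frequency`) and the demo portfolio are hypothetical.

```python
import hashlib
import math
import random

EMBED_DIM = 8  # hypothetical size of the dense text representation


def word_vector(word: str, dim: int = EMBED_DIM) -> list[float]:
    """Deterministic pseudo-random dense vector for one word.

    Stand-in for a learned word embedding (assumption): each word is hashed
    to a seed and mapped to a fixed Gaussian random vector.
    """
    seed = int.from_bytes(hashlib.sha256(word.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def embed_text(text: str, dim: int = EMBED_DIM) -> list[float]:
    """Map free text of any length to a fixed-length dense vector (mean pooling)."""
    words = text.lower().split()
    if not words:
        return [0.0] * dim
    vectors = [word_vector(w, dim) for w in words]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def design_row(rating_vars: list[float], text: str) -> list[float]:
    """Intercept + traditional rating variables + text representation."""
    return [1.0] + list(rating_vars) + embed_text(text)


def fit_poisson_glm(X, y, exposure, lr=0.01, epochs=2000):
    """Log-link Poisson GLM with log(exposure) offset, fitted by gradient ascent.

    The log-likelihood gradient is sum_i (y_i - mu_i) * x_i,
    where mu_i = exposure_i * exp(x_i . beta).
    """
    beta = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for xi, yi, ei in zip(X, y, exposure):
            mu = ei * math.exp(sum(b * x for b, x in zip(beta, xi)))
            for j, xij in enumerate(xi):
                grad[j] += (yi - mu) * xij
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def predict_frequency(beta, x, exposure=1.0):
    """Expected claim count for one policy."""
    return exposure * math.exp(sum(b * xi for b, xi in zip(beta, x)))


if __name__ == "__main__":
    # Hypothetical portfolio: (scaled driver age, scaled vehicle value),
    # free-text description, observed claim count, exposure in policy-years.
    policies = [
        ([0.2, -0.5], "daily commute city traffic", 1, 1.0),
        ([-0.8, 0.3], "weekend use garage parked", 0, 1.0),
        ([0.5, 0.9], "delivery driver city traffic night", 3, 1.0),
        ([-0.1, -0.2], "garage parked rarely driven", 0, 0.5),
    ]
    X = [design_row(r, t) for r, t, _, _ in policies]
    y = [c for _, _, c, _ in policies]
    exposure = [e for _, _, _, e in policies]
    beta = fit_poisson_glm(X, y, exposure)
    for row, (_, text, _, e) in zip(X, policies):
        print(f"{text!r}: expected claims = {predict_frequency(beta, row, e):.3f}")
```

The design choice worth noting is that the GLM itself is unchanged: only the design matrix grows. Swapping the random-projection `word_vector` for a learned representation (for example, pretrained word embeddings or an autoencoder's code layer) would leave `design_row` and the fitting step as they are, which is the sense in which a representation can plug into an existing ratemaking model.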