Recent years have seen a rapid increase in the application of machine learning to insurance loss reserving. Machine learning methods yield most value when applied to large data sets, such as individual claims or large claim triangles. In short, they are likely to be useful in the analysis of any data set whose volume is sufficient to obscure a naked-eye view of its features. Unfortunately, such large data sets are in short supply in the actuarial literature, so one needs to turn to synthetic data. Although the ultimate objective of these methods is application to real data, the use of synthetic data containing features commonly observed in real data is also to be encouraged. While a number of claims simulators exist, each valuable within its own context, the inclusion of a number of desirable (but complicated) data features requires further development. Accordingly, in this paper we review those desirable features and propose a new simulator of individual claim experience called `SynthETIC`. Our simulator is publicly available and open source, and it fills a gap in the non-life actuarial toolkit. The simulator specifically allows for desirable (but optionally complicated) data features typically occurring in practice, such as variations in settlement rates and development patterns, superimposed inflation, and various discontinuities. It also enables various dependencies between variables. The user has full control of the mechanics of the evolution of an individual claim. As a result, the complexity of the generated data set (meaning the level of difficulty of its analysis) may be dialled anywhere from extremely simple to extremely complex.