Data on a continuous variable are often summarized by means of histograms or displayed in tabular format: the range of the data is partitioned into consecutive interval classes, and the number of observations falling within each class is provided to the analyst. Computations can then be carried out in a nonparametric way by assuming a uniform distribution of the variable within each class, by concentrating all the observed values at the class center, or by spreading them to the class extremities. Alternatively, smoothing methods can be applied to estimate the underlying density, or a parametric model can be fitted to the grouped data. For insurance loss data, some additional information is often provided about the observed values contained in each class, typically class-specific sample moments such as the mean, the variance, or even the skewness and the kurtosis. The question is then how to include this additional information in the estimation procedure. The present paper proposes a method for density and quantile estimation based on such augmented information and illustrates it on car insurance data.
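To make the classical nonparametric computations mentioned above concrete, here is a minimal Python sketch of the baseline only, not of the method proposed in the paper. It assumes a uniform distribution within each class for the quantile and uses class midpoints for the mean. The function names `grouped_quantile` and `grouped_mean`, as well as the class limits and counts, are hypothetical and chosen purely for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def grouped_quantile(edges, counts, p):
    """Quantile of order p from grouped data, assuming the variable is
    uniformly distributed within each class (linear interpolation of
    the empirical CDF between class limits)."""
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more class limit than counts")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    n = sum(counts)
    if n == 0:
        raise ValueError("no observations")
    cum = list(accumulate(counts))   # cumulative counts at right limits
    target = p * n                   # number of observations to leave below
    j = bisect_left(cum, target)     # first class reaching the target
    if counts[j] == 0:               # only possible when target == 0
        return float(edges[j])
    below = cum[j - 1] if j > 0 else 0
    frac = (target - below) / counts[j]
    return edges[j] + frac * (edges[j + 1] - edges[j])


def grouped_mean(edges, counts):
    """Mean from grouped data using class midpoints. The same value is
    obtained under the uniform-within-class assumption and when all
    observations are concentrated at the class center."""
    n = sum(counts)
    mids = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    return sum(m * c for m, c in zip(mids, counts)) / n


# Hypothetical tabulated losses: class limits and observation counts.
edges = [0, 1000, 2000, 5000, 10000]
counts = [40, 30, 20, 10]

q25 = grouped_quantile(edges, counts, 0.25)
q50 = grouped_quantile(edges, counts, 0.50)
q75 = grouped_quantile(edges, counts, 0.75)
mean = grouped_mean(edges, counts)
print(q25, q50, q75, mean)
```

Neither function uses class-specific sample moments. That is the gap the abstract points to: once the class means or variances are also tabulated, the uniform and midpoint conventions generally ignore information that is available.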