In recent years it has become possible to collect GPS data from drivers and to incorporate these data into a driver's automobile insurance pricing. The data are collected continuously and processed nightly into metadata consisting of mileage and time summaries of each discrete trip, together with a set of behavioral scores describing attributes of the trip (e.g., driver fatigue or driver distraction). We examine whether the metadata can be used to identify periods of increased risk by classifying trips that occur immediately before a trip in which an incident led to a claim for that driver. Identifying periods of increased risk for a driver is valuable because it creates an opportunity for intervention and, potentially, avoidance of a claim. We examine the metadata for each trip a driver takes and train a classifier to predict whether \textit{the following trip} is one in which a claim occurs for that driver. By achieving an area under the receiver operating characteristic curve (AUC) above 0.6, we show that it is possible to predict claims in advance. Additionally, we compare the predictive power, as measured by AUC, of XGBoost classifiers trained to predict whether a driver will have a claim using exposure features, such as miles driven, with that of classifiers trained using behavioral features, such as a computed speed score.
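The labeling scheme and the evaluation metric described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the record fields (`trip_id`, `driver_id`, `start_time`, `claim`) and both function names are assumptions made for the example, and the classifier itself (XGBoost in the paper) is left out. A trip is labeled positive when the same driver's next trip is one in which a claim occurs, and the AUC is computed as the probability that a randomly chosen positive trip receives a higher score than a randomly chosen negative one.

```python
from collections import defaultdict


def label_pre_claim_trips(trips):
    """Map trip_id -> 1 if the same driver's following trip had a claim, else 0.

    Field names are hypothetical; the paper does not specify a schema.
    """
    by_driver = defaultdict(list)
    for trip in trips:
        by_driver[trip["driver_id"]].append(trip)

    labels = {}
    for driver_trips in by_driver.values():
        # Order each driver's trips chronologically before pairing them.
        driver_trips.sort(key=lambda t: t["start_time"])
        for current, following in zip(driver_trips, driver_trips[1:]):
            labels[current["trip_id"]] = int(following["claim"])
        # A driver's last trip has no following trip, so it stays unlabeled.
    return labels


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: driver A has a claim on the third trip, driver B has none.
trips = [
    {"trip_id": "a1", "driver_id": "A", "start_time": 1, "claim": False},
    {"trip_id": "a2", "driver_id": "A", "start_time": 2, "claim": False},
    {"trip_id": "a3", "driver_id": "A", "start_time": 3, "claim": True},
    {"trip_id": "b1", "driver_id": "B", "start_time": 1, "claim": False},
    {"trip_id": "b2", "driver_id": "B", "start_time": 2, "claim": False},
]
labels = label_pre_claim_trips(trips)
```

In this toy data only trip `a2` is labeled positive, because it immediately precedes the claim trip `a3`. Comparing exposure features with behavioral features would then amount to training one classifier per feature set and passing each set of predicted scores to `roc_auc`. The pairwise AUC is quadratic in the number of trips, which is fine for illustration but would be replaced by a rank-based routine on real data.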