This tutorial demonstrates workflows for incorporating text data into actuarial classification and regression tasks, with a focus on methods that employ transformer-based models. Two datasets are used to demonstrate these techniques: a dataset of car accident descriptions with an average length of 400 words, available in English and German, and a dataset of short property insurance claim descriptions. The case studies tackle challenges related to a multilingual setting and long input sequences. They also show ways to interpret model output and to assess and improve model performance by fine-tuning the models to the domain of application or to a specific prediction task. Finally, the tutorial provides practical approaches to handling classification tasks when little or no labeled data is available. The results, achieved by using the language-understanding skills of off-the-shelf natural language processing (NLP) models with only minimal pre-processing and fine-tuning, clearly demonstrate the power of transfer learning for practical applications.