Accurately predicting customer churn from large-scale time-series data is a common problem in many business domains. Creating model features across different time windows for training and testing can be particularly challenging because of the temporal issues common to time-series data. In this paper, we explore the application of extreme gradient boosting (XGBoost) to a customer dataset with a wide variety of temporal features in order to build a highly accurate customer churn model. In particular, we describe an effective method for handling temporally sensitive feature engineering. The proposed model was submitted to the WSDM Cup 2018 Churn Challenge and achieved first place out of 575 teams.
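To make the temporal issue concrete, the following is a minimal sketch of window-based feature construction relative to a cutoff date. It is not the method described in the paper: the function name `window_features`, the `(user_id, event_date, value)` event layout, and the 7-day and 30-day windows are all assumptions chosen for illustration. The point it illustrates is that features for a given prediction date should be computed only from events that occurred before that date, so that training and test features are built the same way and no future information leaks in.

```python
from collections import defaultdict
from datetime import date


def window_features(events, cutoff, windows=(7, 30)):
    """Aggregate per-user counts and sums over trailing windows ending at `cutoff`.

    `events` is an iterable of (user_id, event_date, value) tuples
    (a hypothetical layout, not the competition's schema).
    Only events strictly before `cutoff` contribute to the features.
    """
    feats = defaultdict(
        lambda: {f"{name}_{w}d": 0 for w in windows for name in ("count", "sum")}
    )
    for user_id, event_date, value in events:
        if event_date >= cutoff:
            continue  # on or after the cutoff: using it would leak future data
        row = feats[user_id]  # create the row even if no window matches
        age_days = (cutoff - event_date).days  # always >= 1 here
        for w in windows:
            if age_days <= w:
                row[f"count_{w}d"] += 1
                row[f"sum_{w}d"] += value
    return dict(feats)


# Illustrative data only.
events = [
    ("u1", date(2017, 2, 25), 10.0),
    ("u1", date(2017, 2, 10), 5.0),
    ("u1", date(2017, 3, 2), 99.0),   # after the cutoff, ignored
    ("u2", date(2017, 1, 15), 7.0),   # older than every window
]
cutoff = date(2017, 3, 1)
features = window_features(events, cutoff)
```

Calling the same function with an earlier cutoff for the training period and a later cutoff for the test period keeps the two feature sets aligned in time; the churn label for each cutoff would then come from the period after it.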