Customer churn prediction is a valuable task in many industries. In telecommunications it is particularly challenging, given the high dimensionality of the data and the difficulty of identifying underlying frustration signatures, which may be an important driver of future churn behaviour. Here, we propose a novel Bayesian hierarchical joint model that characterises customer profiles based on how many events take place within different television watching journeys and how much time elapses between events. The model drastically reduces the dimensionality of the data, from thousands of observations per customer to 11 customer-level parameter estimates and random effects. We test our methodology using data from 40 BT customers (20 active and 20 who eventually cancelled their subscription) whose TV watching behaviours were recorded from October to December 2019, totalling approximately half a million observations. Using the parameter estimates and random effects from the Bayesian hierarchical model as features for different machine learning techniques yielded up to 92% accuracy in predicting churn, with 100% true positive rates and false positive rates as low as 14% on a validation set. Our proposed methodology is an efficient way of reducing the dimensionality of the data while maintaining high descriptive and predictive capabilities. We provide code to implement the Bayesian model at https://github.com/rafamoral/profiling_tv_watching_behaviour.