Understanding the customer behaviours behind transactional data has high commercial value in the grocery retail industry. Customers generate millions of transactions every day, choosing and buying products to satisfy specific shopping needs. Product availability may vary geographically with local demand and local supply, which makes it important to analyse transactions within their store and regional context. Topic models are a powerful tool for analysing transactional data: they identify topics that group products frequently bought together and summarise each transaction as a mixture of topics. We use the Segmented Topic Model (STM) to capture customer behaviours that are nested within stores. STM provides not only topics and transaction summaries but also topical summaries at the store level, which can be used to identify regional topics. We summarise the posterior distribution of STM by post-processing multiple posterior samples and selecting semantic modes represented as recurrent topics. We use linear Gaussian process regression to model topic prevalence across British territory while accounting for spatial autocorrelation. We apply our methods to transactional data from a major UK grocery retailer and show that shopping behaviours may vary regionally and that nearby stores tend to exhibit similar regional demand.