Sequential recommendation (SR) aims to model users' dynamic preferences from their historical interactions. Recently, Transformers and convolutional neural networks (CNNs) have shown great success in learning representations for SR. Nevertheless, Transformers mainly focus on capturing content-based global interactions, while CNNs effectively exploit local features in practical recommendation scenarios. Thus, how to effectively combine CNNs and Transformers to model both \emph{local} and \emph{global} dependencies in historical item sequences remains an open challenge and is rarely studied in SR. To this end, we inject a locality inductive bias into the Transformer by combining its global attention mechanism with a local convolutional filter, and adaptively determine their mixing importance on a personalized basis through layer-aware adaptive mixture units; we name the resulting model AdaMCT. Moreover, since softmax-based attention may encourage unimodal activation, we introduce Squeeze-Excitation Attention (with sigmoid activation) into sequential recommendation to capture multiple relevant items (keys) simultaneously. Extensive experiments on three widely used benchmark datasets demonstrate that AdaMCT significantly outperforms previous Transformer- and CNN-based models by an average of 18.46% and 60.85%, respectively, in terms of NDCG@5, achieving state-of-the-art performance.
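The two mechanisms described above can be illustrated with a minimal PyTorch sketch. Everything below is an assumption made for illustration rather than the paper's actual implementation: the class names, the scalar input-conditioned gate standing in for the layer-aware adaptive mixture units, the kernel size, and the choice to apply squeeze-excitation re-weighting to each branch separately.

\begin{verbatim}
import torch
import torch.nn as nn

class SqueezeExcitationAttention(nn.Module):
    """SE-style attention with a sigmoid gate over sequence positions.

    Unlike softmax, the sigmoid gate can assign high weights to several
    relevant items (keys) at once, as motivated in the abstract.
    """
    def __init__(self, seq_len: int, reduction: int = 2):
        super().__init__()
        self.excite = nn.Sequential(
            nn.Linear(seq_len, seq_len // reduction),
            nn.ReLU(),
            nn.Linear(seq_len // reduction, seq_len),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden). Squeeze the hidden dim to one
        # scalar per item, then excite into per-item weights in (0, 1).
        z = x.mean(dim=-1)             # (batch, seq_len)
        w = self.excite(z)             # (batch, seq_len)
        return x * w.unsqueeze(-1)     # re-weight each item


class AdaMCTLayer(nn.Module):
    """Hypothetical CNN-Transformer layer with an adaptive mixture gate.

    The gate g is computed from the layer input, so the blend of local
    (convolutional) and global (attention) features adapts per sequence
    and per layer, mirroring the abstract's description.
    """
    def __init__(self, hidden: int, seq_len: int,
                 n_heads: int = 2, kernel: int = 3):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.conv = nn.Conv1d(hidden, hidden, kernel, padding=kernel // 2)
        self.se_global = SqueezeExcitationAttention(seq_len)
        self.se_local = SqueezeExcitationAttention(seq_len)
        self.gate = nn.Linear(hidden, 1)   # stand-in for the mixture unit
        self.norm = nn.LayerNorm(hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global branch: content-based self-attention over all items.
        g_out, _ = self.attn(x, x, x)
        g_out = self.se_global(g_out)
        # Local branch: 1D convolution over neighboring items.
        l_out = self.conv(x.transpose(1, 2)).transpose(1, 2)
        l_out = self.se_local(l_out)
        # Adaptive mixture: per-sequence scalar gate in (0, 1).
        g = torch.sigmoid(self.gate(x.mean(dim=1))).unsqueeze(1)
        return self.norm(x + g * g_out + (1 - g) * l_out)


if __name__ == "__main__":
    layer = AdaMCTLayer(hidden=64, seq_len=50)
    items = torch.randn(8, 50, 64)     # batch of 8 user histories
    print(layer(items).shape)          # torch.Size([8, 50, 64])
\end{verbatim}

A residual connection and layer normalization wrap the mixed output, following standard Transformer practice; the key design choice sketched here is that a single sigmoid gate trades off the two branches rather than simply summing them.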