In the context of sequential recommendation, a pivotal issue pertains to the comparative analysis between bi-directional/auto-encoding (AE) and uni-directional/auto-regressive (AR) attention mechanisms, where the conclusions regarding architectural and performance superiority remain inconclusive. Previous efforts in such comparisons primarily involve summarizing existing works to identify a consensus or conducting ablation studies on peripheral modeling techniques, such as choices of loss functions. However, far fewer efforts have been made in (1) theoretical and (2) extensive empirical analysis of the self-attention module, the very pivotal structure on which performance and designing insights should be anchored. In this work, we first provide a comprehensive theoretical analysis of AE/AR attention matrix in the aspect of (1) sparse local inductive bias, a.k.a neighborhood effects, and (2) low rank approximation. Analytical metrics reveal that the AR attention exhibits sparse neighborhood effects suitable for generally sparse recommendation scenarios. Secondly, to support our theoretical analysis, we conduct extensive empirical experiments on comparing vanilla and variant AE/AR attention on five popular benchmarks with AR performing better overall. Results based on adaptive tuning, modularized design and Huggingface are reported. Lastly, we shed light on future design choices for performant self-attentive recommenders. We make our code and data available at https://github.com/yueqirex/Self-Attention-Direction-Check.
翻译:暂无翻译