This study investigates dwell time prediction and proposes a Transformer framework based on interaction behavior modeling. The method first represents user interaction sequences on the interface by integrating dwell duration, click frequency, scrolling behavior, and contextual features, which are mapped into a unified latent space through embedding and positional encoding. On this basis, a multi-head self-attention mechanism captures long-range dependencies, while a feed-forward network applies deep nonlinear transformations to model the dynamic patterns of dwell time. Comparative experiments are conducted under identical conditions against BiLSTM, DRFormer, FEDformer, and iTransformer baselines. The results show that the proposed method achieves the best performance on MSE, RMSE, MAPE, and RMAE, and captures the complex patterns in interaction behavior more accurately. In addition, sensitivity experiments on hyperparameters and environments examine how the number of attention heads, the sequence window length, and the device environment affect prediction performance, further demonstrating the robustness and adaptability of the method. Overall, this study offers a new solution for dwell time prediction from both theoretical and methodological perspectives and verifies its effectiveness from multiple angles.
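The pipeline described above (feature embedding with positional encoding, multi-head self-attention, and a feed-forward network feeding a regression head) can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' exact architecture: the feature set, dimensions, use of a learned positional embedding, and the choice to predict from the last timestep are all assumptions.

```python
import torch
import torch.nn as nn

class DwellTimeTransformer(nn.Module):
    """Sketch of a Transformer regressor over interaction-feature sequences.

    Each timestep holds a feature vector (e.g. dwell duration, click
    frequency, scroll behavior, contextual features); the sizes below are
    illustrative, not the paper's configuration.
    """

    def __init__(self, n_features=4, d_model=64, n_heads=4,
                 n_layers=2, max_len=128):
        super().__init__()
        # Map raw interaction features into a unified latent space.
        self.embed = nn.Linear(n_features, d_model)
        # Learned positional encoding over sequence positions.
        self.pos = nn.Embedding(max_len, d_model)
        # Multi-head self-attention + feed-forward encoder blocks.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Regression head producing a scalar dwell-time prediction.
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):
        # x: (batch, seq_len, n_features)
        positions = torch.arange(x.size(1), device=x.device)
        h = self.embed(x) + self.pos(positions)
        h = self.encoder(h)
        # Predict from the last timestep's representation.
        return self.head(h[:, -1, :]).squeeze(-1)

model = DwellTimeTransformer()
batch = torch.randn(8, 16, 4)  # 8 sequences, 16 steps, 4 features each
pred = model(batch)
print(pred.shape)  # torch.Size([8])
```

The sequence window length and number of attention heads studied in the sensitivity experiments correspond to `max_len`/`seq_len` and `n_heads` here; training would minimize MSE between `pred` and observed dwell times.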