Clickstreams on individual websites have been studied for decades to gain insight into user interests and to improve website experiences. This paper proposes and examines a novel sequence modeling approach for web clickstreams that also considers multi-tab branching and backtracking actions across websites, capturing the full action sequence of a user while browsing. The modeling is done with machine learning on the client side, which gives a more comprehensive view of browsing behavior while preserving privacy. We evaluate our formalism with a model trained on data collected in a user study with three browsing tasks based on different human information-seeking strategies from the psychological literature. Our results show that the model can successfully distinguish between browsing behaviors and correctly predict future actions. A subsequent qualitative analysis identified five common web browsing patterns in the collected behavior data, which help to interpret the model. More generally, this illustrates the power of overparameterization in machine learning and offers a new way of modeling, reasoning about, and predicting observable sequential human interaction behaviors.