Finite mixture models have long been used for unsupervised learning, and their use within the semi-supervised paradigm is becoming more common. Clickstream data is an emerging data type that demands particular attention because few statistical learning approaches are currently available for it. A mixture of first-order continuous-time Markov models is introduced for unsupervised and semi-supervised learning of clickstream data. Unlike existing mixture model-based approaches, this approach assumes continuous time; in practice, this allows the amount of time each user spends on each webpage to be taken into account. The approach is evaluated on simulated and real data and compared with the discrete-time approach.
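To make the role of dwell times concrete, the following is a minimal sketch of how a single clickstream could be scored under a mixture of first-order continuous-time Markov chains. It is not the authors' implementation. The parameterization (initial probabilities `alpha`, a jump matrix `jump` with zero diagonal, and per-page exit rates `rate`), the function names, and the toy numbers are all assumptions made for illustration, as is the choice to treat the dwell time on the last page as right-censored.

```python
import math

def ctmc_loglik(path, alpha, jump, rate):
    """Log-likelihood of one clickstream under a single CTMC component.

    path  : list of (page, dwell_time) pairs, in visit order
    alpha : initial page probabilities (assumed parameterization)
    jump  : jump[i][j] = probability of moving from page i to page j
    rate  : rate[i] = exit rate of page i (exponential dwell times)
    """
    first_page, _ = path[0]
    ll = math.log(alpha[first_page])
    # Each completed visit contributes an exponential dwell-time density
    # and the probability of the jump that ended it.
    for (page, dwell), (next_page, _) in zip(path, path[1:]):
        ll += math.log(rate[page]) - rate[page] * dwell
        ll += math.log(jump[page][next_page])
    # Assumption: the last dwell time is right-censored, so it contributes
    # only the survival probability exp(-rate * dwell).
    last_page, last_dwell = path[-1]
    ll -= rate[last_page] * last_dwell
    return ll

def responsibilities(path, weights, components):
    """Posterior component membership probabilities for one clickstream."""
    logs = [math.log(w) + ctmc_loglik(path, **c)
            for w, c in zip(weights, components)]
    # Log-sum-exp for numerical stability.
    m = max(logs)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logs))
    return [math.exp(v - log_norm) for v in logs]

# Toy example (hypothetical values): two components that share the same
# transition structure and differ only in how long users stay on a page.
uniform_jump = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
components = [
    dict(alpha=[1 / 3] * 3, jump=uniform_jump, rate=[2.0, 2.0, 2.0]),  # fast browsers
    dict(alpha=[1 / 3] * 3, jump=uniform_jump, rate=[0.2, 0.2, 0.2]),  # slow browsers
]
path = [(0, 5.0), (1, 4.0), (2, 6.0)]  # pages visited with dwell times
post = responsibilities(path, [0.5, 0.5], components)
```

In this toy setting the two components are indistinguishable from the page sequence alone, so a discrete-time model could not separate them; the long dwell times are what push the posterior toward the slow-browsing component.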