In recent years, there has been significant interest in understanding users' online content consumption patterns. However, the unstructured, high-dimensional, and dynamic nature of such data makes extracting valuable insights challenging. Here, we propose a model that combines the simplicity of matrix factorization with the flexibility of neural networks to efficiently extract nonlinear patterns from massive text data collections relevant to consumers' online consumption patterns. Our model decomposes a user's content consumption journey into nonlinear user and content factors, which are used to model the user's dynamic interests. This natural decomposition allows us to summarize each user's content consumption journey with a dynamic probabilistic weighting over a set of underlying content attributes. The model is fast to estimate, easy to interpret, and can harness external data sources as an empirical prior. These advantages make our method well suited to the challenges posed by modern datasets. We use our model to understand the dynamic news consumption interests of Boston Globe readers over five years. Thorough qualitative studies, including a crowdsourced evaluation, highlight our model's ability to accurately identify nuanced and coherent consumption patterns. These results are supported by our model's superior and robust predictive performance relative to several competitive baseline methods.
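As a rough illustration of the decomposition described above, the toy sketch below passes a user's state at each time period through a small nonlinear layer, scores it against a set of content-attribute factors, and normalizes the scores into a probabilistic weighting. It is not the authors' model: the tanh layer, the softmax normalization, the function names, the dimensions, and the random values are all assumptions chosen only to make the idea concrete.

```python
import math
import random


def softmax(scores):
    """Turn arbitrary real scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def nonlinear_factor(vector, layer):
    """One tanh layer: a hypothetical stand-in for a 'nonlinear factor'."""
    return [math.tanh(sum(w * x for w, x in zip(row, vector))) for row in layer]


def attribute_weights(user_state, attribute_factors, user_layer):
    """Probabilistic weighting of one user state over K content attributes."""
    hidden = nonlinear_factor(user_state, user_layer)
    # Inner product of the user factor with each attribute factor,
    # as in matrix factorization, followed by normalization.
    scores = [sum(h * a for h, a in zip(hidden, attr)) for attr in attribute_factors]
    return softmax(scores)


rng = random.Random(0)
DIM, HIDDEN, K, PERIODS = 4, 3, 5, 3  # arbitrary toy sizes (assumptions)

# Randomly initialized parameters; a real model would estimate these from data.
user_layer = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(HIDDEN)]
attribute_factors = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(K)]
# One random state per period stands in for a user's reading history over time.
user_states = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(PERIODS)]

# The "journey": one probability vector over the K attributes per period.
journey = [attribute_weights(s, attribute_factors, user_layer) for s in user_states]
for period, weights in enumerate(journey):
    print(period, [round(w, 3) for w in weights])
```

Each row of `journey` is a distribution over the K hypothetical attributes, so comparing rows across periods gives a simple picture of how a user's interests shift over time.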