Click-through rate (CTR) prediction aims to predict the ratio of clicks to impressions for a specific link. The task is challenging because (1) the features are usually categorical, so the inputs become extremely high-dimensional under one-hot encoding; (2) not only the original features but also their interactions matter; and (3) an effective prediction may rely on different features and interactions in different time periods. To overcome these difficulties, we propose a new interaction detection method named Online Random Intersection Chains (ORIC). The method, which is based on the idea of frequent itemset mining, detects informative interactions by observing the intersections of randomly chosen samples. The discovered interactions are highly interpretable, since they can be read as logical expressions. ORIC can be updated whenever new data is collected, without being retrained on historical data. Moreover, the relative importance of historical and recent data can be controlled by a tuning parameter. A framework is designed to handle the streaming interactions, so almost all existing CTR prediction models can be applied after interaction detection. Empirical results on three benchmark datasets demonstrate the efficiency and effectiveness of ORIC.
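To make the idea of intersecting randomly chosen samples concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each sample is a set of (feature, value) items, draws short random chains of samples, and counts the itemsets that survive intersection along each chain. The function names, the chain length, the number of chains, and the exponential weighting in `update_scores` are all hypothetical choices made for illustration; the abstract does not specify how ORIC weights historical against recent data.

```python
import random
from collections import Counter


def intersection_chain(samples, chain_length, rng):
    """Intersect `chain_length` samples drawn at random without replacement.

    Each sample is a set of (feature, value) items. Items that survive
    the whole chain co-occur in every drawn sample.
    """
    chain = rng.sample(samples, chain_length)
    common = set(chain[0])
    for sample in chain[1:]:
        common &= sample
        if not common:
            break  # nothing left to intersect
    return frozenset(common)


def detect_interactions(samples, n_chains=200, chain_length=3, seed=0):
    """Count the itemsets of size >= 2 that survive random chains.

    Frequently surviving itemsets are candidate interactions, in the
    spirit of frequent itemset mining.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_chains):
        tail = intersection_chain(samples, chain_length, rng)
        if len(tail) >= 2:
            counts[tail] += 1
    return counts


def update_scores(scores, new_counts, decay=0.5):
    """Blend stored scores with counts from the newest batch.

    ASSUMPTION: `decay` stands in for the tuning parameter mentioned in
    the abstract; exponential weighting is an illustrative choice only.
    """
    keys = set(scores) | set(new_counts)
    return {
        k: decay * scores.get(k, 0.0) + (1 - decay) * new_counts.get(k, 0)
        for k in keys
    }
```

In this sketch, `detect_interactions` would be run on each new batch of clicked samples (a list of sets), and `update_scores` would fold the resulting counts into the running scores, so earlier batches never need to be revisited. A surviving itemset such as {("device", "mobile"), ("hour", "evening")} reads directly as the logical expression "device is mobile AND hour is evening".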