Large-scale commercial platforms usually span numerous business domains serving diverse business strategies, and they expect their recommendation systems to provide click-through rate (CTR) predictions for multiple domains simultaneously. Widely used multi-domain models discover domain relationships by explicitly constructing domain-specific networks, but their computation and memory costs grow significantly as the number of domains increases. To reduce computational complexity, industrial applications commonly group domains manually according to particular business strategies. However, this pre-defined data partitioning relies heavily on prior knowledge and may neglect the underlying data distribution of each domain, which limits the model's representation capability. To address these issues, we propose an elegant and flexible multi-distribution modeling paradigm, named Adaptive Distribution Hierarchical Model (AdaptDHM), an end-to-end optimized hierarchical structure consisting of a clustering process and a classification process. Specifically, we design a distribution adaptation module with a customized dynamic routing mechanism. Instead of introducing prior knowledge for pre-defined data allocation, this routing algorithm adaptively provides a distribution coefficient for each sample to determine which cluster it belongs to. Each cluster corresponds to a particular distribution, so the model can sufficiently capture the commonalities and distinctions between these clusters. Extensive experiments on both public and large-scale Alibaba industrial datasets verify the effectiveness and efficiency of AdaptDHM: our model achieves impressive prediction accuracy, and its time cost during the training stage is more than 50% less than that of other models.
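The cluster-then-classify idea can be illustrated with a minimal forward-pass sketch. The abstract does not specify the routing algorithm, so everything here is an assumption made for illustration: the distribution coefficient is taken to be a softmax over negative squared distances to fixed cluster centers, each cluster gets a simple logistic "tower", and the names `distribution_coefficients`, `route_and_predict`, `centers`, `tower_weights`, and `tower_biases` are hypothetical. It is not the AdaptDHM implementation, and it omits end-to-end training entirely.

```python
import numpy as np


def distribution_coefficients(embeddings, centers, temperature=1.0):
    """Soft assignment of each sample to each cluster.

    Assumption: coefficients are a softmax over negative squared
    Euclidean distances to fixed centers. The paper's routing may differ.
    """
    # Broadcast (n, 1, d) against (1, k, d) to get (n, k) squared distances.
    diffs = embeddings[:, None, :] - centers[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=-1)
    logits = -sq_dist / temperature
    # Subtract the row maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def route_and_predict(embeddings, centers, tower_weights, tower_biases):
    """Assign each sample to its highest-coefficient cluster, then score it
    with that cluster's (hypothetical) logistic tower."""
    coeffs = distribution_coefficients(embeddings, centers)
    cluster_ids = coeffs.argmax(axis=1)        # (n,) hard cluster choice
    w = tower_weights[cluster_ids]             # (n, d) per-sample tower weights
    b = tower_biases[cluster_ids]              # (n,) per-sample tower biases
    logits = np.sum(embeddings * w, axis=1) + b
    ctr = 1.0 / (1.0 + np.exp(-logits))        # sigmoid gives a CTR estimate
    return coeffs, cluster_ids, ctr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 4))                # 6 samples, 4-dim embeddings
    centers = rng.normal(size=(3, 4))          # 3 hypothetical clusters
    coeffs, ids, ctr = route_and_predict(
        x, centers, rng.normal(size=(3, 4)), np.zeros(3)
    )
    print(ids, ctr.round(3))
```

In this sketch the number of towers depends on the number of clusters rather than the number of business domains, which is the property the abstract credits for the lower training cost.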