Transformer encoding networks have proven to be a powerful tool for understanding natural language. They play a critical role in native ads services, which recommend appropriate ads based on a user's web browsing history. For efficient recommendation, conventional methods generate user and ad embeddings independently with a siamese transformer encoder, so that approximate nearest neighbour (ANN) search can be leveraged. Because the underlying semantics of users and ads can be complicated, such independently generated embeddings are prone to information loss, which leads to inferior recommendation quality. Another encoding strategy, the cross encoder, can be much more accurate, but it incurs a huge running cost and is infeasible for realtime services such as native ads recommendation. In this work, we propose the hybrid encoder, which makes efficient and precise native ads recommendation through two consecutive steps: retrieval and ranking. In the retrieval step, the user and ad are encoded with a siamese component, which enables relevant candidates to be retrieved via ANN search. In the ranking step, the model further represents each ad with disentangled embeddings and each user with ad-related embeddings, which contributes to the fine-grained selection of high-quality ads from the candidate set. Both steps are lightweight, thanks to pre-computed and cached intermediate results. To optimize the hybrid encoder's performance in this two-stage workflow, we develop a progressive training pipeline that builds up the model's capability in the retrieval and ranking tasks step by step. The hybrid encoder's effectiveness is experimentally verified: with very little additional cost, it significantly outperforms the siamese encoder and achieves recommendation quality comparable to that of the cross encoder.
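The two-step retrieve-then-rank workflow can be illustrated with a minimal sketch. It is not the paper's implementation: all function names are hypothetical, the embeddings are assumed to be pre-computed and cached plain Python lists, an exhaustive inner-product search stands in for a real ANN index, and both the softmax attention used to form the ad-related user embedding and the max-over-embeddings scoring rule are assumptions made only for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(user_vec, ad_vecs, k):
    """Step 1 (retrieval): top-k ads by inner product with the user vector.

    Exhaustive search is used here as a stand-in for an ANN index.
    """
    order = sorted(range(len(ad_vecs)),
                   key=lambda i: dot(user_vec, ad_vecs[i]),
                   reverse=True)
    return order[:k]


def ad_related_user_embedding(user_parts, ad_vec):
    """Pool cached user vectors with weights conditioned on one ad.

    Assumption: softmax attention with the ad's siamese vector as the query.
    """
    weights = softmax([dot(p, ad_vec) for p in user_parts])
    dim = len(ad_vec)
    return [sum(w * p[d] for w, p in zip(weights, user_parts))
            for d in range(dim)]


def rank(user_parts, ad_vecs, ad_multi, candidates):
    """Step 2 (ranking): reorder the retrieved candidates.

    Each ad i has several (disentangled) embeddings in ad_multi[i].
    Assumption: the score is the best match over those embeddings.
    """
    def score(i):
        u = ad_related_user_embedding(user_parts, ad_vecs[i])
        return max(dot(u, e) for e in ad_multi[i])

    return sorted(candidates, key=score, reverse=True)


def recommend(user_vec, user_parts, ad_vecs, ad_multi, k):
    """Run both steps: cheap retrieval first, finer ranking second."""
    candidates = retrieve(user_vec, ad_vecs, k)
    return rank(user_parts, ad_vecs, ad_multi, candidates)
```

The design point this sketch tries to capture is that the expensive encoders never run at request time: retrieval touches only one cached vector per ad, and ranking touches only the few cached vectors of the retrieved candidates.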