Recent advances in large-scale pretrained models such as BERT, GPT-3, CLIP, and Gopher have shown astonishing achievements across various task domains. Unlike vision recognition and language modeling, general-purpose user representation learning at scale remains underexplored. Here we explore the possibility of general-purpose user representation learning by training a universal user encoder at large scale. We demonstrate that the scaling law holds in the user representation learning domain, where the training error scales as a power law with the amount of computation. Our Contrastive Learning User Encoder (CLUE) optimizes task-agnostic objectives, and the resulting user embeddings exceed our expectations of what is possible across various downstream tasks. CLUE also shows strong transferability to other domains and companies, as an online experiment shows significant improvements in Click-Through Rate (CTR). Furthermore, we investigate how model performance is influenced by scaling factors such as training data size, model capacity, sequence length, and batch size. Finally, we discuss the broader impacts of CLUE in general.
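For concreteness, the power-law relationship referenced above can be written in the functional form common in the scaling-law literature; this is a sketch of the assumed form, and the constants $C_c$ and $\alpha_C$ are illustrative fit parameters rather than values reported here:

$$ L(C) \approx \left( \frac{C_c}{C} \right)^{\alpha_C}, $$

where $L$ denotes the training error and $C$ the amount of training compute.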