Overfitting has long been considered a common issue for large neural network models in sequential recommendation. In our study, we observe an interesting phenomenon: overfitting is temporary. As the model scale increases, performance first ascends, then descends (i.e., overfitting), and finally ascends again, a trend we name double ascent in this paper. We therefore hypothesise that a considerably larger model will generalise better and achieve higher performance. In the extreme case of infinite width, performance is expected to reach the limit of the given architecture. Unfortunately, it is impractical to directly build such a huge model due to resource constraints. In this paper, we propose the Overparameterised Recommender (OverRec), which utilises a recurrent neural tangent kernel (RNTK) as a similarity measure between user sequences, successfully bypassing the hardware restrictions on huge models. We further prove that the RNTK for the tied input-output embeddings used in recommendation is the same as the RNTK for general untied input-output embeddings, which makes the RNTK theoretically suitable for recommendation. Since the RNTK is analytically derived, OverRec does not require any training, avoiding physically building the huge model. Extensive experiments on four datasets verify the state-of-the-art performance of OverRec.
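To make the "kernel as similarity, no training" idea concrete, the following is a minimal sketch of how a precomputed sequence kernel can drive recommendation via kernel ridge regression. The `rntk` function here is a stand-in (a simple item-overlap proxy), not the paper's closed-form RNTK recursion, and the names `kernel_scores`, `train_seqs`, and `reg` are illustrative assumptions.

```python
import numpy as np

def rntk(seq_a, seq_b):
    """Placeholder for the analytically derived RNTK between two user
    sequences; the paper's closed-form recursion would go here.
    For illustration we use a simple item-overlap proxy."""
    overlap = len(set(seq_a) & set(seq_b))
    return overlap / np.sqrt(len(seq_a) * len(seq_b))

def kernel_scores(train_seqs, train_labels, test_seq, reg=1e-3):
    """Kernel ridge regression with a precomputed Gram matrix:
    no gradient-based training, only a linear solve."""
    n = len(train_seqs)
    K = np.array([[rntk(a, b) for b in train_seqs] for a in train_seqs])
    k_star = np.array([rntk(test_seq, b) for b in train_seqs])
    # Solve (K + reg * I) alpha = labels, then predict k_star^T alpha.
    alpha = np.linalg.solve(K + reg * np.eye(n), train_labels)
    return k_star @ alpha

# Toy usage: three training sequences, next-item targets over 4 items.
train_seqs = [[1, 2, 3], [2, 3, 4], [1, 4]]
train_labels = np.eye(4)[[3, 1, 2]]
print(kernel_scores(train_seqs, train_labels, [1, 2]))  # one score per item
```

The design point mirrors the abstract: because the kernel is given analytically, prediction reduces to evaluating kernel entries and a linear solve, so the infinite-width model is never instantiated.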