We leverage the Neural Tangent Kernel (NTK) and its equivalence to training infinitely-wide neural networks to devise $\infty$-AE: an autoencoder with infinitely-wide bottleneck layers. The outcome is a highly expressive yet simple recommendation model with a single hyper-parameter and a closed-form solution. Leveraging $\infty$-AE's simplicity, we also develop Distill-CF for synthesizing tiny, high-fidelity data summaries which distill the most important knowledge from the extremely large and sparse user-item interaction matrix for efficient and accurate subsequent data usage, such as model training, inference, and architecture search. This takes a data-centric approach to recommendation, where we aim to improve the quality of logged user-feedback data for subsequent modeling, independent of the learning algorithm. In particular, we use differentiable Gumbel-sampling to handle the inherent heterogeneity, sparsity, and semi-structuredness of the data, while scaling to datasets with hundreds of millions of user-item interactions. Both of our proposed approaches significantly outperform their respective state-of-the-art baselines, and when used together, we observe 96-105% of $\infty$-AE's performance on the full dataset with as little as 0.1% of the original dataset size, leading us to explore the counter-intuitive question: Is more data what you need for better recommendation?
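Concretely, $\infty$-AE's closed-form solution amounts to kernel ridge regression with the network's NTK over the user-item matrix, $\hat{X} = K(K + \lambda I)^{-1} X$. Below is a minimal sketch of this idea, assuming the `neural-tangents` library; the layer widths are placeholders (only the architecture's shape matters in the infinite-width limit), and `lam` stands in for the single regularization hyper-parameter:

```python
# Sketch of the infinite-width closed form, assuming the neural-tangents
# library; widths are placeholders that drop out in the infinite-width limit.
import jax.numpy as jnp
from neural_tangents import stax

_, _, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(), stax.Dense(512)
)

def infty_ae_predict(X, lam=1.0):
    """X: (num_users, num_items) float interaction matrix.
    lam: the single ridge-regularization hyper-parameter."""
    K = kernel_fn(X, X, 'ntk')                  # user-user NTK Gram matrix
    alpha = jnp.linalg.solve(K + lam * jnp.eye(K.shape[0]), X)
    return K @ alpha                            # reconstructed preference scores
```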
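Likewise, the differentiable Gumbel-sampling underlying Distill-CF can be illustrated with a Gumbel-softmax relaxation in which each row of the synthetic summary is a soft, learnable selection over real users. The names `sample_summary`, `logits`, and `tau` below are illustrative, not the paper's API:

```python
# Sketch of differentiable Gumbel-softmax sampling for data summarization.
import jax
import jax.numpy as jnp

def sample_summary(logits, X, key, tau=0.5):
    """logits: (summary_size, num_users) learnable selection scores.
    X:      (num_users, num_items) full interaction matrix.
    Each output row is a differentiable soft mixture of real users that
    approaches a hard one-user pick as tau -> 0."""
    gumbel = jax.random.gumbel(key, logits.shape)   # Gumbel(0, 1) noise
    weights = jax.nn.softmax((logits + gumbel) / tau, axis=-1)
    return weights @ X                              # (summary_size, num_items)
```

In the spirit of Distill-CF, `logits` would then be optimized end-to-end so that a model fit on the sampled summary (e.g., $\infty$-AE, via its closed form) performs well on the full data; the exact objective and sampling scheme follow the paper rather than this sketch.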