How can we predict missing values in multi-dimensional data (tensors) more accurately? Tensor completion is crucial in many applications, such as personalized recommendation, image and video restoration, and link prediction in social networks. Many tensor factorization and neural network-based tensor completion algorithms have been developed to predict missing entries in partially observed tensors. However, they can produce inaccurate estimates because real-world tensors are very sparse and these methods tend to overfit the small amount of observed data. We address these shortcomings with data augmentation for tensors: we propose DAIN, a general data augmentation framework that enhances the prediction accuracy of neural tensor completion methods. Specifically, DAIN first trains a neural model and computes the importance of each tensor cell using influence functions. It then aggregates the cell importances to calculate the importance of each entity (i.e., an index of a dimension). Finally, DAIN augments the tensor by sampling entities with weights given by their importance and assigning values with a value predictor. Extensive experiments on four diverse real-world tensors show that DAIN outperforms all data augmentation baselines in improving the imputation accuracy of neural tensor completion. Ablation studies substantiate the effectiveness of each component of DAIN. Furthermore, we show that DAIN scales near-linearly to large datasets.
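The aggregation and sampling steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes per-cell importance scores are already available (here they are made-up numbers, not outputs of influence functions), aggregates them by summing absolute values per entity, and uses a placeholder `predict` function in place of a trained value predictor. The function names `entity_importance`, `sample_new_cells`, and `augment` are hypothetical.

```python
import numpy as np

def entity_importance(indices, cell_importance, shape):
    """Aggregate cell importance into one score per entity, per mode.

    Assumption: aggregation is a sum of absolute cell importances.
    """
    scores = []
    for mode, size in enumerate(shape):
        s = np.zeros(size)
        # Unbuffered add so repeated entity indices accumulate correctly.
        np.add.at(s, indices[:, mode], np.abs(cell_importance))
        scores.append(s)
    return scores

def sample_new_cells(scores, n_new, rng):
    """Sample one entity per mode, with probability proportional to its score.

    Assumes each mode has at least one entity with nonzero importance.
    This sketch does not filter out cells that are already observed.
    """
    cols = [rng.choice(len(s), size=n_new, p=s / s.sum()) for s in scores]
    return np.stack(cols, axis=1)

def augment(indices, values, cell_importance, shape, n_new, predict, seed=0):
    """Append n_new sampled cells, with values from `predict`, to the tensor."""
    rng = np.random.default_rng(seed)
    scores = entity_importance(indices, cell_importance, shape)
    new_idx = sample_new_cells(scores, n_new, rng)
    new_val = predict(new_idx)
    return np.vstack([indices, new_idx]), np.concatenate([values, new_val])

# Toy 3-mode sparse tensor of shape (3, 2, 2) with three observed cells.
shape = (3, 2, 2)
indices = np.array([[0, 0, 0], [0, 1, 1], [2, 0, 1]])
values = np.array([1.0, 2.0, 3.0])
cell_importance = np.array([0.5, -1.5, 1.0])  # illustrative placeholder scores

# Placeholder predictor: returns the mean observed value for every new cell.
predict = lambda idx: np.full(len(idx), values.mean())

scores = entity_importance(indices, cell_importance, shape)
aug_idx, aug_val = augment(indices, values, cell_importance, shape,
                           n_new=5, predict=predict)
```

In this toy example, entity 1 of the first mode appears in no observed cell, so its importance is zero and it is never sampled. A real pipeline would replace the placeholder scores and predictor with the outputs of the trained models.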