Deep learning techniques have achieved great success in many fields, but deep learning models are becoming increasingly complex and expensive to compute, which severely hinders their wide application. To alleviate this problem, model distillation has emerged as an effective means of compressing a large model into a smaller one without a significant drop in accuracy. In this paper, we study a related but orthogonal issue, data distillation, which aims to distill the knowledge of a large training dataset into a much smaller, synthetic one. It has the potential to address the growing cost of neural network training by allowing models to be trained on this small distilled dataset instead of the full data. We develop a novel data distillation method for text classification and evaluate it on eight benchmark datasets. The results are striking: distilled data amounting to only 0.1% of the original text data achieves approximately 90% of the performance obtained with the original data.
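To make the idea of data distillation concrete, the following is a minimal sketch of the generic dataset-distillation loop in PyTorch, not the authors' exact method: synthetic examples are represented as learnable continuous embeddings and optimized so that a classifier briefly trained on them performs well on the real data. All names, shapes, and hyperparameters (distill, emb_dim, inner_lr, outer_lr, etc.) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def distill(real_x, real_y, num_classes, emb_dim=64,
            outer_steps=200, inner_lr=0.1, outer_lr=0.01):
    """Illustrative sketch: learn one synthetic embedding per class."""
    # The distilled "dataset": one learnable vector per class, plus fixed labels.
    syn_x = torch.randn(num_classes, emb_dim, requires_grad=True)
    syn_y = torch.arange(num_classes)
    opt = torch.optim.Adam([syn_x], lr=outer_lr)

    for _ in range(outer_steps):
        # Inner step: one gradient update of a freshly initialized linear
        # classifier on the synthetic data (kept differentiable w.r.t. syn_x).
        w = 0.01 * torch.randn(emb_dim, num_classes)
        w.requires_grad_(True)
        inner_loss = F.cross_entropy(syn_x @ w, syn_y)
        (g,) = torch.autograd.grad(inner_loss, w, create_graph=True)
        w_updated = w - inner_lr * g

        # Outer step: measure how well the briefly trained classifier fits the
        # real data, and update the synthetic examples to improve that fit.
        outer_loss = F.cross_entropy(real_x @ w_updated, real_y)
        opt.zero_grad()
        outer_loss.backward()
        opt.step()

    return syn_x.detach(), syn_y


# Usage with placeholder "real" text features (e.g., pre-computed embeddings):
# real_x = torch.randn(1000, 64); real_y = torch.randint(0, 5, (1000,))
# syn_x, syn_y = distill(real_x, real_y, num_classes=5)
```

The key point of the sketch is the bilevel structure: the inner update trains a throwaway model on the synthetic data, and the outer update backpropagates the real-data loss through that training step into the synthetic examples themselves, which is what lets a tiny distilled set stand in for the full training corpus.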