Recent advances in deep learning have led to significant performance gains on several NLP tasks; however, the models have become increasingly computationally demanding. This paper therefore addresses the domain of computationally efficient algorithms for NLP tasks. In particular, it investigates distributed representations of the n-gram statistics of texts. The representations are formed using an embedding enabled by hyperdimensional computing. These representations then serve as features, which are used as input to standard classifiers. We investigate the applicability of the embedding on one large and three small standard classification datasets using nine classifiers. The embedding achieved on-par F1 scores while reducing time and memory requirements several-fold compared to conventional n-gram statistics; e.g., for one of the classifiers on a small dataset, the memory reduction was 6.18 times, while train and test speed-ups were 4.62 and 3.84 times, respectively. For many classifiers on the large dataset, the memory reduction was ca. 100 times, and the train and test speed-ups were over 100 times. Importantly, using distributed representations formed via hyperdimensional computing decouples the dimensionality of the representation from the n-gram size, thus opening room for tradeoffs.
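To make the mechanism behind the embedding concrete, the following is a minimal sketch of how n-gram statistics can be mapped to a fixed-size distributed representation with hyperdimensional computing: each symbol is assigned a random bipolar hypervector, the symbols of an n-gram are bound by position-wise permutation and elementwise multiplication, and the n-gram hypervectors are bundled by summation into one text vector. The dimensionality, alphabet, and function names below are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

# Illustrative settings (assumptions, not the paper's exact setup).
DIM = 1000                          # dimensionality of the hypervectors
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
rng = np.random.default_rng(0)

# Item memory: one random bipolar (+1/-1) hypervector per symbol.
ITEM_MEMORY = {c: rng.choice([-1.0, 1.0], size=DIM) for c in ALPHABET}

def embed_ngram_stats(text: str, n: int = 3) -> np.ndarray:
    """Embed the n-gram statistics of `text` into a single DIM-dimensional vector."""
    embedding = np.zeros(DIM)
    for i in range(len(text) - n + 1):
        # Bind the n symbols of one n-gram: cyclically shift (permute) each
        # symbol's hypervector by its position, then multiply elementwise.
        ngram_hv = np.ones(DIM)
        for pos, ch in enumerate(text[i:i + n]):
            ngram_hv *= np.roll(ITEM_MEMORY[ch], pos)
        # Bundle (sum) the n-gram hypervectors into the text representation.
        embedding += ngram_hv
    return embedding

# The resulting fixed-size vector serves as the input features to a standard
# classifier; its size stays DIM regardless of n, unlike explicit n-gram counts.
features = embed_ngram_stats("hyperdimensional computing embeds ngram statistics")
print(features.shape)  # (1000,)
```

Note how the representation size is fixed by DIM rather than by the number of possible n-grams, which is what allows trading representation dimensionality against classification performance independently of the n-gram size.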