We propose a novel framework for text understanding that converts sentences or articles into video-like 3-dimensional tensors. Each frame, corresponding to one slice of the tensor, is a word image rendered from the word's shape. The length of the tensor equals the number of words in the sentence or article. This transformation from text to a 3-dimensional tensor makes it convenient to implement an $n$-gram model with convolutional neural networks for text analysis. Concretely, we apply a 3-dimensional convolutional kernel to the 3-dimensional text tensor. The first two dimensions of the kernel equal the size of the word image, and the last dimension is $n$. That is, at each position as the kernel slides along the word sequence, the convolution covers $n$ word images and outputs a scalar. Repeating this for every $n$-gram in the sentence or article with multiple kernels yields a 2-dimensional feature map. A 1-dimensional max-over-time pooling is then applied to this feature map, followed by three fully-connected layers that perform the text classification. Experiments on several text classification datasets show that the proposed model performs surprisingly well, outperforming existing methods.
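The sliding-kernel step and the max-over-time pooling described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `ngram_conv3d` and `max_over_time` are hypothetical, the word images are random placeholders rather than rendered word shapes, the kernels are random rather than learned, and the three fully-connected layers are omitted.

```python
import numpy as np


def ngram_conv3d(text_tensor, kernels):
    """Slide 3-D kernels along the word axis of a text tensor.

    text_tensor: shape (T, H, W), one H x W word image per word.
    kernels:     shape (K, n, H, W), K kernels that each span n word images.
    Returns a feature map of shape (K, T - n + 1).
    """
    num_words, height, width = text_tensor.shape
    num_kernels, n, k_height, k_width = kernels.shape
    if (k_height, k_width) != (height, width):
        raise ValueError("kernel must cover the full word image")
    if num_words < n:
        raise ValueError("text must contain at least n words")

    positions = num_words - n + 1
    feature_map = np.empty((num_kernels, positions))
    for t in range(positions):
        window = text_tensor[t:t + n]  # the n word images of one n-gram
        # Each kernel reduces the whole n-gram window to a single scalar.
        feature_map[:, t] = np.tensordot(
            kernels, window, axes=([1, 2, 3], [0, 1, 2])
        )
    return feature_map


def max_over_time(feature_map):
    """Keep each kernel's strongest response across all n-gram positions."""
    return feature_map.max(axis=1)


# Assumed toy setup: 7 words, 16 x 16 placeholder images, 4 trigram kernels.
rng = np.random.default_rng(0)
text = rng.random((7, 16, 16))
kernels = rng.standard_normal((4, 3, 16, 16))

fmap = ngram_conv3d(text, kernels)
pooled = max_over_time(fmap)
print(fmap.shape, pooled.shape)  # (4, 5) (4,)
```

Because each kernel spans the full word image, it can only move along the word axis. This is why the output for $K$ kernels is a 2-dimensional map of size $K \times (T - n + 1)$, and why pooling over positions leaves one value per kernel regardless of text length.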