We propose Super Characters, a method for sentiment classification. It converts sentiment classification into an image classification problem by projecting text onto images and then applying CNN models to those images. Text features are extracted automatically from the generated Super Characters images, so no explicit step is needed to embed words or characters into numerical vector representations. Experiments on ten large social media datasets containing millions of items in four languages (Chinese, Japanese, Korean, and English) show that the Super Characters method consistently outperforms other methods on sentiment classification and topic classification tasks.
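The text-to-image projection described in the abstract can be illustrated with a small layout sketch. The abstract does not specify how characters are arranged, so everything below is an assumption made for illustration: the square grid, the 224-pixel canvas, the 64-character cap, and the function name `layout_characters` are all hypothetical. The sketch only computes where each character would be drawn; rendering glyphs and training a CNN are left out.

```python
import math


def layout_characters(text, image_size=224, max_chars=64):
    """Assign each character a square cell on an image_size x image_size canvas.

    Hypothetical helper, not the authors' implementation.
    Returns a list of (char, left, top, cell_size) tuples in row-major order.
    """
    chars = list(text[:max_chars])  # truncate text that would not fit
    if not chars:
        return []
    # Smallest square grid that holds every character (assumed layout).
    per_row = math.ceil(math.sqrt(len(chars)))
    cell = image_size // per_row
    placed = []
    for i, ch in enumerate(chars):
        row, col = divmod(i, per_row)
        placed.append((ch, col * cell, row * cell, cell))
    return placed


if __name__ == "__main__":
    for item in layout_characters("good movie"):
        print(item)
```

Under these assumptions, each tuple gives the top-left corner and size of one character cell. A drawing library could render the glyphs at those positions, and the resulting image could then be passed to any image classifier, which is what removes the need for a separate embedding step.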