In recent years, with the advent of highly scalable text representation methods based on artificial neural networks, the field of natural language processing has seen unprecedented growth and sophistication. Building on the distributional hypothesis, it has become possible to distill complex linguistic information from text into dense, multidimensional numeric vectors. As a consequence, text representation methods have been evolving so quickly that the research community struggles to keep track of the methods and their interrelations. We address this lack of compilation, composition, and systematization with a threefold contribution. First, we survey current approaches. Second, we arrange them in a genealogy. Third, we conceptualize a taxonomy of text representation methods to examine and explain the state of the art. Our research serves as a guide and reference for artificial intelligence researchers and practitioners interested in natural language processing applications such as recommender systems, chatbots, and sentiment analysis.