Stock market movements are influenced by public and private information shared through news articles, company reports, and social media discussions. Analyzing these vast data sources can give market participants an edge in making a profit. However, most studies in the literature rely on traditional approaches that fall short in analyzing vast, unstructured textual data. In this study, we review the extensive existing literature on text-based stock market analysis. We present the input data types and cover the main textual data sources and their variations. We then present feature representation techniques, followed by the analysis techniques, and create a taxonomy of the main stock market forecasting models. Importantly, we discuss representative work in each category of the taxonomy and analyze its respective contributions. Finally, we identify unaddressed open problems and offer suggestions for future work. The aim of this study is to survey the main stock market analysis models, the text representation techniques used for financial market prediction, and the shortcomings of existing techniques, and to propose promising directions for future research.