In this paper, we propose a new parallel architecture based on Big Data technologies for real-time sentiment analysis of microblogging posts. Polypus is a modular framework that provides the following functionalities: (1) massive text extraction from Twitter, (2) distributed non-relational storage optimized for time-range queries, (3) memory-based inter-module buffering, (4) real-time sentiment classification, (5) near real-time aggregation of keyword sentiment into time series, (6) an HTTP API to interact with the Polypus cluster, and (7) a web interface to analyze results visually. The whole architecture is self-deployable and based on Docker containers.
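To make functionality (5) concrete, the following is a minimal Python sketch of how classified posts might be aggregated into per-keyword time series. It is an illustration only and not the Polypus implementation: the function names `bucket_start` and `aggregate`, the 60-second window, the whitespace tokenization, and the representation of sentiment as a numeric score in [-1, 1] are all assumptions introduced here.

```python
from collections import defaultdict
from datetime import datetime, timezone

BUCKET_SECONDS = 60  # assumed window width; not specified in the paper


def bucket_start(ts: datetime, width: int = BUCKET_SECONDS) -> datetime:
    """Round a timezone-aware timestamp down to the start of its window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % width, tz=timezone.utc)


def aggregate(posts, keywords, width: int = BUCKET_SECONDS):
    """Aggregate sentiment per (keyword, window).

    posts    -- iterable of (timestamp, text, score) tuples, where score is a
                hypothetical sentiment value in [-1, 1] from the classifier
    keywords -- set of lowercase keywords to track
    Returns {(keyword, window_start): {"count": n, "mean": mean_score}}.
    """
    sums = defaultdict(lambda: [0, 0.0])  # (keyword, window) -> [count, total]
    for ts, text, score in posts:
        words = set(text.lower().split())  # naive tokenization (assumption)
        for kw in keywords:
            if kw in words:
                acc = sums[(kw, bucket_start(ts, width))]
                acc[0] += 1
                acc[1] += score
    return {
        key: {"count": n, "mean": total / n}
        for key, (n, total) in sums.items()
    }
```

In a streaming setting, the same accumulation would presumably run incrementally over the classifier output rather than over a finished list, with each window written to the time-series store once it closes.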