Modern data science applications increasingly draw on heterogeneous data sources and analytics. This has led to growing interest in polystore systems, especially analytical polystores. In this work, we focus on emerging multi-data-model analytics workloads over social media data that fluidly straddle relational, graph, and text analytics. Instead of a generic polystore, we build a "tri-store" system that is more aware of the underlying data models, which lets it better optimize execution for scalability and runtime efficiency. We name our system AWESOME (Analytics WorkbEnch for SOcial MEdia). It features a powerful domain-specific language named ADIL. ADIL builds on top of underlying query engines (e.g., SQL and Cypher) and features native data types for succinctly specifying cross-engine queries and NLP operations, as well as automatic in-memory and query optimizations. Using real-world tri-model analytical workloads and datasets, we empirically demonstrate the functionalities of AWESOME for scalable data science over social media data and evaluate its efficiency.