Many Big Data applications in business and science require the management and analysis of huge amounts of graph data. Previous approaches to graph analytics, such as graph databases and parallel graph processing systems (e.g., Pregel), lack either sufficient scalability or sufficient flexibility and expressiveness. We are therefore developing a new end-to-end approach for graph data management and analysis based on the Hadoop ecosystem, called Gradoop (Graph analytics on Hadoop). Gradoop is designed around the Extended Property Graph Data Model (EPGM), which supports semantically rich, schema-free graph data spread across many distinct graphs. A set of high-level operators is provided for analyzing both single graphs and collections of graphs. Based on these operators, we propose a domain-specific language for defining analytical workflows. The Gradoop graph store currently uses HBase for distributed storage of graph data in Hadoop clusters. An initial version of Gradoop has been used to analyze graph data for business intelligence and social network analysis.
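To make the idea of schema-free graph data within many distinct graphs more concrete, the following is a minimal in-memory sketch, not Gradoop's implementation or API. All class and function names (`Vertex`, `Edge`, `Graph`, `GraphStore`, `aggregate`, `select`) are hypothetical and chosen only for illustration. The sketch assumes that vertices, edges, and graphs each carry a label and free-form properties, and that a vertex or edge may belong to several graphs at once. It also shows one operator on a single graph and one on a collection of graphs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


@dataclass
class Vertex:
    id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)  # schema-free key/value pairs
    graphs: Set[int] = field(default_factory=set)  # ids of graphs containing this vertex


@dataclass
class Edge:
    id: int
    label: str
    source: int  # id of the source vertex
    target: int  # id of the target vertex
    properties: Dict[str, Any] = field(default_factory=dict)
    graphs: Set[int] = field(default_factory=set)


@dataclass
class Graph:
    # Assumption: a graph is itself a labeled element with properties.
    id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class GraphStore:
    """Hypothetical in-memory stand-in for a distributed graph store."""
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: Dict[int, Edge] = field(default_factory=dict)
    graphs: Dict[int, Graph] = field(default_factory=dict)

    def vertices_of(self, graph_id: int) -> List[Vertex]:
        return [v for v in self.vertices.values() if graph_id in v.graphs]

    def edges_of(self, graph_id: int) -> List[Edge]:
        return [e for e in self.edges.values() if graph_id in e.graphs]


def aggregate(store: GraphStore, graph_id: int, key: str,
              fn: Callable[[GraphStore, int], Any]) -> Graph:
    """Single-graph operator: compute a value over one graph and
    store it as a property of that graph."""
    graph = store.graphs[graph_id]
    graph.properties[key] = fn(store, graph_id)
    return graph


def select(store: GraphStore, graph_ids: List[int],
           predicate: Callable[[Graph], bool]) -> List[int]:
    """Collection operator: keep the ids of graphs that satisfy the predicate."""
    return [g for g in graph_ids if predicate(store.graphs[g])]


# Two overlapping graphs: Alice belongs to both, Bob only to graph 1.
store = GraphStore()
store.graphs[1] = Graph(1, "Community", {"topic": "databases"})
store.graphs[2] = Graph(2, "Community", {"topic": "hadoop"})
store.vertices[10] = Vertex(10, "Person", {"name": "Alice"}, {1, 2})
store.vertices[11] = Vertex(11, "Person", {"name": "Bob"}, {1})
store.edges[100] = Edge(100, "knows", 10, 11, {"since": 2014}, {1})

# Annotate every graph with its vertex count, then filter the collection.
for gid in (1, 2):
    aggregate(store, gid, "vertexCount", lambda s, g: len(s.vertices_of(g)))

large = select(store, [1, 2], lambda g: g.properties["vertexCount"] >= 2)
print(large)  # [1]
```

Chaining the two calls at the end hints at how operators of this kind could be composed into an analytical workflow: a per-graph aggregation feeds a selection over the collection.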