Various computing and data resources on the Web are being enhanced with machine-interpretable semantic descriptions to facilitate better search, discovery, and integration. This interconnected metadata constitutes the Semantic Web, whose volume can potentially grow to the scale of the Web. Efficient management of Semantic Web data, expressed using the W3C's Resource Description Framework (RDF), is crucial for supporting new data-intensive, semantics-enabled applications. In this work, we study and compare two approaches to distributed RDF data management, one based on emerging cloud computing technologies and the other on traditional relational database clustering technologies. In particular, we design distributed RDF data storage and querying schemes for HBase and MySQL Cluster and conduct an empirical comparison of these approaches on a cluster of commodity machines using datasets and queries from the Third Provenance Challenge and the Lehigh University Benchmark. Our study reveals interesting patterns in query evaluation, shows that our algorithms are promising, and suggests that cloud computing has great potential for scalable Semantic Web data management.