We introduce Hyperbard, a dataset of diverse relational data representations derived from Shakespeare's plays. Our representations range from simple graphs capturing character co-occurrence in single scenes to hypergraphs encoding complex communication settings and character contributions as hyperedges with edge-specific node weights. By making multiple intuitive representations readily available for experimentation, we facilitate rigorous representation robustness checks in graph learning, graph mining, and network analysis, highlighting the advantages and drawbacks of specific representations. Leveraging the data released in Hyperbard, we demonstrate that many solutions to popular graph mining problems are highly dependent on the representation choice, thus calling current graph curation practices into question. As an homage to our data source, and asserting that science can also be art, we present all our points in the form of a play.
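The same scene data can yield different answers depending on the representation chosen. The Python sketch below illustrates this on hypothetical toy data; it does not use Hyperbard's data or any Hyperbard API. It builds two representations. The first is a simple co-occurrence graph that joins two characters whenever they share a scene. The second is a hypergraph with one hyperedge per scene and edge-specific node weights, which are assumed here to be the number of lines each character speaks in that scene. All character names, counts, and helper functions are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data (NOT taken from Hyperbard): each scene maps the
# characters on stage to the number of lines they speak in that scene.
scenes = [
    {"Alpha": 30, "Beta": 2, "Gamma": 1},
    {"Beta": 5, "Delta": 4},
    {"Beta": 1, "Epsilon": 6},
]

def cooccurrence_graph(scenes):
    """Simple graph: an edge joins two characters who share a scene.
    The edge weight counts shared scenes; group structure is lost."""
    edges = defaultdict(int)
    for scene in scenes:
        for u, v in combinations(sorted(scene), 2):
            edges[(u, v)] += 1
    return dict(edges)

def weighted_hypergraph(scenes):
    """Hypergraph: one hyperedge per scene, with edge-specific node
    weights (here assumed to be lines spoken in that scene)."""
    return [dict(scene) for scene in scenes]

def graph_degree(edges):
    """Number of distinct co-occurring characters per character."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def hypergraph_strength(hyperedges):
    """Sum of a character's node weights over all hyperedges."""
    strength = defaultdict(int)
    for hyperedge in hyperedges:
        for node, weight in hyperedge.items():
            strength[node] += weight
    return dict(strength)

g = cooccurrence_graph(scenes)
h = weighted_hypergraph(scenes)
deg = graph_degree(g)
strength = hypergraph_strength(h)

print(max(deg, key=deg.get))            # Beta: most distinct scene partners
print(max(strength, key=strength.get))  # Alpha: speaks the most lines
```

In this toy example, degree in the co-occurrence graph ranks Beta first because Beta shares scenes with the most characters. Weighted strength in the hypergraph ranks Alpha first because Alpha dominates the only scene it appears in. Neither ranking is "correct". The point is only that the "most central character" is a property of the representation as much as of the underlying play.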