Improvements in entity-relationship (E-R) search techniques have been hampered by a lack of test collections, particularly for complex queries involving multiple entities and relationships. In this paper, we describe a method for generating E-R test queries to support comprehensive E-R search experiments. Queries and relevance judgments are created from content that exists in tabular form, where columns represent entity types and the table structure implies one or more relationships among the entities. Editorial work involves creating natural language queries based on the relationships represented by the entries in the table. We have publicly released the RELink test collection, comprising 600 queries and relevance judgments obtained from a sample of Wikipedia List-of-lists-of-lists tables. The relevance judgments comprise tuples of entities that are extracted from columns and labelled with the corresponding entity types and the relationships they represent. To facilitate research in complex E-R retrieval, we have created and released as open source the RELink Framework, which includes Apache Lucene indexing and search specifically tailored to E-R retrieval. RELink includes entity and relationship indexing based on the ClueWeb-09-B Web collection with FACC1 text span annotations linked to Wikipedia entities. With ready-to-use search resources and a comprehensive test collection, we support the community in pursuing E-R research at scale.
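The table-to-judgments step described above can be illustrated with a minimal sketch. It assumes a table is given as a header row plus data rows, that an editor has already written the natural language query, and that the relevance judgments are simply the rows projected onto the chosen columns. The names `ERQuery` and `build_query`, the field names, and the sample table are hypothetical and are not taken from the RELink Framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ERQuery:
    """Hypothetical container for one E-R test query (not the RELink schema)."""
    query_id: str
    text: str                   # natural language query written by an editor
    entity_types: tuple         # one entity type per selected column
    relevant_tuples: frozenset  # entity tuples treated as relevant


def build_query(query_id, text, header, rows, columns):
    """Project table rows onto the chosen columns to obtain relevance judgments."""
    idx = [header.index(c) for c in columns]
    tuples = frozenset(
        tuple(row[i] for i in idx)
        for row in rows
        if all(row[i] for i in idx)  # skip rows with an empty cell in a chosen column
    )
    return ERQuery(query_id, text, tuple(columns), tuples)


# Hypothetical table: columns are entity types, each row is one related tuple.
header = ["Company", "Founder", "Headquarters"]
rows = [
    ["Apple Inc.", "Steve Jobs", "Cupertino"],
    ["Microsoft", "Bill Gates", "Redmond"],
    ["Unknown Co", "", "Nowhere"],  # incomplete row, dropped for this query
]

query = build_query(
    "q1", "companies and their founders", header, rows, ["Company", "Founder"]
)
```

Selecting two columns yields a query over one relationship; selecting three or more columns from the same table would give a query that chains several relationships, which is the complex case the abstract targets.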