Information Retrieval evaluation has traditionally focused on defining principled ways of assessing the relevance of a ranked list of documents with respect to a query. Several methods extend this type of evaluation beyond relevance, making it possible to evaluate different aspects of a document ranking (e.g., relevance, usefulness, or credibility) using a single measure (multi-aspect evaluation). However, these methods either (i) are tailor-made for specific aspects and do not extend to other types or numbers of aspects, or (ii) have theoretical anomalies; for example, they assign the maximum score to a ranking in which all documents are labelled with the lowest grade with respect to all aspects (e.g., not relevant and not credible). We present a theoretically principled multi-aspect evaluation method that can be used for any number, and any type, of aspects. A thorough empirical evaluation using up to 5 aspects and a total of 425 runs officially submitted to 10 TREC tracks shows that our method is more discriminative than the state of the art and overcomes its theoretical limitations.