We address two fundamental problems in authorship verification (AV): topic variability and miscalibration. Variations in the topic of two disputed texts are a major cause of error for most AV systems. In addition, the probability estimates produced by deep-learning AV systems often do not match the actual case counts in the respective training data; that is, these estimates are poorly calibrated. To address both problems, we extend our framework from PAN 2020 with Bayes factor scoring (BFS) and an uncertainty adaptation layer (UAL). Experiments with the PAN 2020/21 AV shared task data show that the proposed method significantly reduces sensitivity to topic variation and significantly improves the system's calibration.
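To make "poorly calibrated" concrete, the sketch below computes the expected calibration error (ECE), a common generic measure of how far predicted probabilities deviate from observed frequencies. It is an illustration only, assuming binary same-author labels and equal-width probability bins; the function name `expected_calibration_error` and the bin count are hypothetical choices, and this is not presented as the BFS/UAL method or as the exact metric used in the experiments.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Hypothetical ECE helper for binary same-author predictions.

    probs:  predicted probability that a text pair shares an author.
    labels: 1 if the pair truly shares an author, 0 otherwise.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if len(probs) == 0:
        raise ValueError("need at least one prediction")

    n = len(probs)
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Equal-width binning over [0, 1]; p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(p for p, _ in b) / len(b)  # average predicted probability
        frac_pos = sum(y for _, y in b) / len(b)   # observed same-author rate
        # Weight each bin's confidence/frequency gap by its share of samples.
        ece += len(b) / n * abs(mean_conf - frac_pos)
    return ece


# Calibrated: 0.25 predicted, 1 of 4 pairs actually same-author.
print(expected_calibration_error([0.25] * 4, [1, 0, 0, 0]))
# Overconfident: 0.95 predicted, but only 2 of 4 pairs are same-author.
print(expected_calibration_error([0.95] * 4, [1, 1, 0, 0]))
```

In the first case the predicted probability matches the observed rate, so the error is zero; in the second, the model claims 95% confidence while only half the pairs share an author, which yields an error of about 0.45. This gap between estimated probabilities and actual case counts is the kind of miscalibration described above.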