We present a new Icelandic-English parallel corpus, the Icelandic Parallel Abstracts Corpus (IPAC), composed of abstracts from student theses and dissertations. The texts were collected from the Skemman repository, which keeps records of all theses, dissertations, and final projects from students at Icelandic universities. The corpus was aligned with Bleualign, using sentence-level BLEU scores computed on NMT model output in both translation directions. The result is a corpus of 64k sentence pairs from over 6,000 parallel abstracts.
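The idea of aligning on sentence-level BLEU scores in both translation directions can be illustrated with a minimal sketch. This is not Bleualign's implementation or API: the function names (`sentence_bleu`, `best_matches`, `align_both_directions`) are hypothetical, the BLEU variant is a simplified add-one-smoothed score over unigrams and bigrams, and the search is a plain mutual-best-match rule standing in for Bleualign's own alignment search. The sketch assumes each side of an abstract has already been sentence-split and machine-translated into the other language.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=2):
    """Simplified sentence-level BLEU (add-one smoothing, whitespace tokens)."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    if not hyp or not ref:
        return 0.0
    # No shared words at all: treat the pair as unrelated.
    if not (ngrams(hyp, 1) & ngrams(ref, 1)):
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((h & r).values())  # clipped n-gram matches
        total = sum(h.values())
        log_precision += math.log((overlap + 1) / (total + 1))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    brevity_penalty = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity_penalty * math.exp(log_precision / max_n)


def best_matches(translated, candidates, threshold):
    """Map each translated sentence index to its best-scoring candidate index."""
    matches = {}
    if not candidates:
        return matches
    for i, sentence in enumerate(translated):
        scores = [sentence_bleu(sentence, c) for c in candidates]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            matches[i] = j
    return matches


def align_both_directions(src, tgt, src_mt, tgt_mt, threshold=0.3):
    """Keep only pairs (i, j) that are best matches in both directions.

    src_mt[i] is the machine translation of src[i] into the target language;
    tgt_mt[j] is the machine translation of tgt[j] into the source language.
    The threshold value is an arbitrary assumption for this sketch.
    """
    forward = best_matches(src_mt, tgt, threshold)   # source index -> target index
    backward = best_matches(tgt_mt, src, threshold)  # target index -> source index
    return sorted((i, j) for i, j in forward.items() if backward.get(j) == i)
```

Requiring agreement between the two directions trades recall for precision: a pair survives only if each sentence picks the other as its best match, which filters out spurious matches caused by a weak translation in one direction.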