In this paper, we present FoodChem, a new Relation Extraction (RE) model for identifying the chemicals present in the composition of food entities, based on textual information from peer-reviewed biomedical literature. We treat the RE task as a binary classification problem: deciding whether the "contains" relation holds for a given food-chemical entity pair. This is accomplished by fine-tuning the BERT, BioBERT and RoBERTa transformer models. For evaluation purposes, we generate a novel dataset of food-chemical entity pairs annotated with "contains" relations, in a golden and a silver version. The models are combined in a voting scheme to produce the silver version of the dataset, which we use to augment the training of the individual models, while the manually annotated golden version is used for their evaluation. Of the three evaluated models, BioBERT achieves the best results, with a macro-averaged F1 score of 0.902 in the unbalanced augmentation setting.
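As a rough illustration of the two mechanics mentioned above, the sketch below combines the binary predictions of several classifiers by voting and computes a macro-averaged F1 score. The abstract does not specify the voting rule, so strict majority voting is an assumption here, and the function names `majority_vote` and `macro_f1` are hypothetical, not part of FoodChem.

```python
def majority_vote(predictions):
    """Combine per-model binary labels into one label per example.

    `predictions` is a list with one entry per model; each entry is a list
    of 0/1 labels (1 = the "contains" relation is predicted to hold).
    Assumption: a strict majority of models must agree on the positive label.
    """
    n_models = len(predictions)
    combined = []
    for labels in zip(*predictions):  # iterate example by example
        positives = sum(labels)
        combined.append(1 if 2 * positives > n_models else 0)
    return combined


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy predictions from three hypothetical models on three entity pairs
silver_labels = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
score = macro_f1([1, 0, 1, 0], [1, 0, 0, 0])
```

Because macro averaging weights both classes equally, it is a natural choice when positive and negative pairs are unbalanced.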