Healthcare predictive analytics aids medical decision-making, diagnosis prediction, and drug review analysis. Prediction accuracy is therefore an important criterion, which in turn necessitates robust predictive language models. However, deep learning models have been shown to be vulnerable to imperceptibly perturbed inputs that humans are unlikely to misclassify. Recent work has generated adversarial examples using rule-based synonyms and BERT-MLMs in the general domain, but the ever-growing biomedical literature poses unique challenges. We propose BBAEG (Biomedical BERT-based Adversarial Example Generation), a black-box attack algorithm for biomedical text classification that combines domain-specific synonym replacement for biomedical named entities with BERT-MLM predictions, spelling variation, and number replacement. Through automatic and human evaluation on two datasets, we demonstrate that BBAEG performs stronger attacks with better language fluency and semantic coherence than prior work.
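The perturbation types listed above (synonym replacement, spelling variation, number replacement) can be sketched as simple candidate generators. This is a minimal illustrative sketch, not the paper's implementation: the synonym table, the swap-based spelling rule, and the number-delta scheme are all assumptions, and the BERT-MLM prediction component is omitted since it requires a pretrained model.

```python
import re

# Hypothetical sketch of BBAEG-style candidate perturbation generation.
# The full method would also query a BERT-MLM for masked-token replacements;
# that step is omitted here for brevity.

# Toy domain-specific synonym table for biomedical entities (assumed, not
# from the paper; a real system would use a biomedical ontology like UMLS).
SYNONYMS = {
    "aspirin": ["acetylsalicylic acid"],
    "heart attack": ["myocardial infarction"],
}

def synonym_candidates(text):
    """Replace biomedical entities with domain-specific synonyms."""
    out = []
    for term, syns in SYNONYMS.items():
        if term in text:
            for syn in syns:
                out.append(text.replace(term, syn))
    return out

def spelling_candidates(word):
    """Generate spelling variants by swapping each adjacent character pair."""
    return [word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(len(word) - 1)]

def number_candidates(text, delta=1):
    """Perturb each number in the text by +/- delta."""
    out = []
    for m in re.finditer(r"\d+", text):
        n = int(m.group())
        for v in (n - delta, n + delta):
            out.append(text[:m.start()] + str(v) + text[m.end():])
    return out
```

A black-box attack would score each candidate against the target classifier and keep the one that most reduces the true-label probability while preserving semantics.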