As one of the most powerful topic models, Latent Dirichlet Allocation (LDA) has been used in a vast range of tasks, including document understanding, information retrieval, and peer-reviewer assignment. Despite its tremendous popularity, the security of LDA has rarely been studied. This poses severe risks to security-critical tasks built on LDA, such as sentiment analysis and peer-reviewer assignment. In this paper, we investigate whether LDA models are vulnerable to adversarial perturbations of benign document examples at inference time. We formalize the evasion attack on LDA models as an optimization problem and prove that it is NP-hard. We then propose a novel and efficient algorithm, EvaLDA, to solve it. We demonstrate the effectiveness of EvaLDA through extensive empirical evaluations. For instance, on the NIPS dataset, EvaLDA can, on average, promote the rank of a target topic from 10 to around 7 by replacing only 1% of the words in a victim document with similar words. Our work provides significant insights into the power and limitations of evasion attacks on LDA models.