Past work has shown that large language models are susceptible to privacy attacks, where adversaries generate sequences from a trained model and detect which sequences are memorized from the training set. In this work, we show that the success of these attacks is largely due to duplication in commonly used web-scraped training sets. We first show that the rate at which language models regenerate training sequences is superlinearly related to a sequence's count in the training set. For instance, a sequence that is present 10 times in the training data is on average generated ~1000 times more often than a sequence that is present only once. We next show that existing methods for detecting memorized sequences have near-chance accuracy on non-duplicated training sequences. Finally, we find that after applying methods to deduplicate training data, language models are considerably more secure against these types of privacy attacks. Taken together, our results motivate an increased focus on deduplication in privacy-sensitive applications and a reevaluation of the practicality of existing privacy attacks.
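To make the measurement concrete, the following is a minimal sketch (not the paper's released code) of the setup the abstract describes: sample sequences from a trained language model, detect which samples regenerate training sequences verbatim, and group the regeneration rate by how many times each sequence was duplicated in the training set. The model/tokenizer names and the `train_counts` mapping are illustrative assumptions.

```python
# Hedged sketch of measuring regeneration rate vs. training-set duplicate count.
# Assumptions: "gpt2" stands in for the trained LM, and `train_counts` is a
# hypothetical map from training sequence -> number of occurrences in the corpus.
from collections import Counter, defaultdict

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

train_counts: dict[str, int] = {
    "an example sequence that appears once in the corpus": 1,
    "an example sequence that appears ten times in the corpus": 10,
}

regenerations = Counter()
num_samples = 1000
prompt = torch.tensor([[tokenizer.bos_token_id]])  # start from the BOS token

for _ in range(num_samples):
    out = model.generate(
        prompt,
        do_sample=True,
        max_new_tokens=64,
        pad_token_id=tokenizer.eos_token_id,
    )
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    # Exact substring match counts as a regenerated training sequence.
    for seq in train_counts:
        if seq in text:
            regenerations[seq] += 1

# Average regeneration rate bucketed by duplicate count; the abstract reports
# this relationship is superlinear (e.g. 10 duplicates -> ~1000x more generations).
rate_by_count = defaultdict(list)
for seq, dup_count in train_counts.items():
    rate_by_count[dup_count].append(regenerations[seq] / num_samples)
for dup_count, rates in sorted(rate_by_count.items()):
    print(f"duplicates={dup_count}  mean regeneration rate={sum(rates) / len(rates):.4f}")
```

This only illustrates the shape of the experiment; the paper's actual evaluation operates over full web-scraped training sets rather than a handcrafted `train_counts` dictionary.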