Memorization, or the tendency of large language models (LLMs) to output entire sequences from their training data verbatim, is a key concern for safely deploying language models. In particular, it is vital to minimize a model's memorization of sensitive datapoints such as those containing personally identifiable information (PII). The prevalence of such undesirable memorization can pose issues for model trainers, and may even require discarding an otherwise functional model. We therefore seek to predict which sequences will be memorized before a large model's full training run is complete, by extrapolating the memorization behavior of lower-compute trial runs. We measure memorization in the Pythia model suite and find that intermediate checkpoints are better predictors of a model's memorization behavior than smaller fully-trained models. We additionally present novel findings on the distribution of memorization scores across models and data.