To continuously improve quality and reflect changes in data, machine learning applications have to regularly retrain and update their core models. We show that a differential analysis of language model snapshots before and after an update can reveal a surprising amount of detailed information about changes in the training data. We propose two new metrics---\emph{differential score} and \emph{differential rank}---for analyzing the leakage due to updates of natural language models. We perform leakage analysis using these metrics across models trained on several different datasets using different methods and configurations. We discuss the privacy implications of our findings, propose mitigation strategies and evaluate their effect.
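Below is a minimal, illustrative sketch of the kind of snapshot comparison the abstract refers to. The concrete definitions used here are assumptions for illustration only: the differential score is taken to be the sum of per-token probability increases from the old snapshot to the updated one, and the differential rank is the position of a target phrase among candidates ordered by that score. The toy `ModelSnapshot` dictionaries, the function names, and the example phrases are all hypothetical and not taken from the paper.

\begin{verbatim}
# Hedged sketch: differential analysis of two language-model snapshots.
# The definitions of differential score/rank below are assumptions made
# for illustration; the paper's exact formulation may differ.
from typing import Dict, List, Tuple

# Toy stand-in for a model snapshot: maps a context (tuple of previous
# tokens) to a next-token probability distribution.
ModelSnapshot = Dict[Tuple[str, ...], Dict[str, float]]

def token_probability(model: ModelSnapshot,
                      context: Tuple[str, ...], token: str) -> float:
    """Probability the snapshot assigns to `token` after `context` (0 if unseen)."""
    return model.get(context, {}).get(token, 0.0)

def differential_score(phrase: List[str],
                       before: ModelSnapshot, after: ModelSnapshot) -> float:
    """Assumed definition: sum of per-token probability increases between
    snapshots; higher means the update made the phrase more likely."""
    score = 0.0
    for i, token in enumerate(phrase):
        context = tuple(phrase[:i])
        score += (token_probability(after, context, token)
                  - token_probability(before, context, token))
    return score

def differential_rank(target: List[str], candidates: List[List[str]],
                      before: ModelSnapshot, after: ModelSnapshot) -> int:
    """Rank of `target` among candidates sorted by decreasing differential score."""
    ordered = sorted(candidates,
                     key=lambda p: differential_score(p, before, after),
                     reverse=True)
    return ordered.index(target)

if __name__ == "__main__":
    # Hypothetical snapshots: the update's training data contained "alice likes tea".
    before = {(): {"alice": 0.1, "bob": 0.1},
              ("alice",): {"likes": 0.2},
              ("alice", "likes"): {"tea": 0.1, "coffee": 0.1}}
    after = {(): {"alice": 0.2, "bob": 0.1},
             ("alice",): {"likes": 0.5},
             ("alice", "likes"): {"tea": 0.6, "coffee": 0.05}}
    candidates = [["alice", "likes", "tea"], ["alice", "likes", "coffee"], ["bob"]]
    for phrase in candidates:
        print(phrase, round(differential_score(phrase, before, after), 2))
    print("rank of target:",
          differential_rank(["alice", "likes", "tea"], candidates, before, after))
\end{verbatim}

In this toy setup, the phrase present in the update's training data receives the largest differential score and rank 0, illustrating how comparing snapshots can surface changes in the training data.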