This article introduces the Membership Inference Test (MINT), a novel approach that aims to empirically assess whether given data were used during the training of AI/ML models. Specifically, we propose two MINT architectures designed to learn the distinct activation patterns that emerge when an Audited Model is exposed to data used during its training process. These architectures are based on Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). The experimental framework focuses on the challenging task of Face Recognition, considering three state-of-the-art Face Recognition systems. Experiments are carried out using six publicly available databases, comprising over 22 million face images in total. Different experimental scenarios are considered depending on the context of the AI model under test. Our proposed MINT approach achieves promising results, with up to 90\% accuracy, indicating its potential to recognize whether an AI model has been trained with specific data. The proposed MINT approach can serve to enforce privacy and fairness in several AI applications, e.g., revealing whether sensitive or private data were used for training or tuning Large Language Models (LLMs).
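To make the core idea concrete, the following is a minimal sketch (not the paper's implementation) of an MLP-based MINT module: an auxiliary classifier trained on activation vectors extracted from an Audited Model, predicting whether an input was part of that model's training data. The activation dimensionality, the synthetic "member" vs "non-member" distributional shift, and all hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: activation vectors from the Audited Model, where "member"
# samples (seen during training) exhibit a slight distributional shift
# relative to "non-member" samples. Here both are synthetic stand-ins.
D = 32                                      # activation dimensionality (assumed)
n = 500                                     # samples per class (synthetic)
members = rng.normal(0.5, 1.0, size=(n, D))
non_members = rng.normal(0.0, 1.0, size=(n, D))

X = np.vstack([members, non_members])
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = used in training

# Minimal one-hidden-layer MLP, trained with plain gradient descent on
# binary cross-entropy to output a membership score in [0, 1].
H = 16
W1 = rng.normal(0, 0.1, (D, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.maximum(0, X @ W1 + b1)          # ReLU hidden layer
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))    # sigmoid membership score
    return h, p.ravel()

lr = 0.1
for _ in range(300):
    h, p = forward(X)
    g = (p - y)[:, None] / len(y)           # dBCE/dlogit, averaged
    gW2 = h.T @ g; gb2 = g.sum(0)
    gh = g @ W2.T
    gh[h <= 0] = 0                          # ReLU gradient mask
    gW1 = X.T @ gh; gb1 = gh.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, p = forward(X)
accuracy = ((p > 0.5) == y).mean()
```

In the CNN variant described in the article, the same idea would apply with convolutional layers consuming spatially structured activation maps instead of flattened vectors; the auditing decision remains a binary classification over the Audited Model's internal responses.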