This paper presents solutions to the Machine Learning Model Attribution Challenge (MLMAC), jointly organized by MITRE, Microsoft, Schmidt Futures, Robust Intelligence, Lincoln Network, and the Hugging Face community. The challenge provides twelve open-source base versions of popular language models developed by well-known organizations, together with twelve language models fine-tuned for text generation. The names and architectural details of the fine-tuned models are kept hidden, and participants can access them only through REST APIs provided by the organizers. Given these constraints, the goal of the contest is to identify which base model each fine-tuned model originated from. To solve this challenge, we assume that a fine-tuned model and its corresponding base version share a similar vocabulary and a matching syntactic writing style that carries over into their generated outputs. Our strategy is to develop a set of queries to interrogate the base and fine-tuned models, and then perform one-to-many pairing between them based on similarities in their generated responses, where more than one fine-tuned model can pair with a base model but not vice versa. We employ four distinct approaches to measure the resemblance between the responses generated by the models of the two sets. The first approach uses machine translation evaluation metrics, and the second uses a vector space model. The third approach uses state-of-the-art Transformer models for multi-class text classification. Lastly, the fourth approach uses a set of Transformer-based binary text classifiers, one for each provided base model, to perform multi-class text classification in a one-vs-all fashion. This paper reports the implementation details, comparisons, and experimental studies of these approaches, along with the final results obtained.
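As a rough illustration of the pairing strategy described above, the sketch below implements a minimal version of the vector-space approach: each model's responses to a shared query set are concatenated into a single document, the documents are embedded with TF-IDF, and every fine-tuned model is assigned the base model whose responses are most similar by cosine similarity. The query set and the query_model helper are hypothetical stand-ins for the organizers' REST-API access, and this is a simplified sketch rather than the exact pipeline used in the paper.

```python
# Minimal sketch of one-to-many attribution via a vector space model (TF-IDF + cosine similarity).
# query_model(model_id, query) is a hypothetical helper that returns a model's generated text;
# in the challenge, such responses would be obtained through the organizers' REST APIs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def collect_responses(model_ids, queries, query_model):
    """Concatenate each model's responses to the shared query set into one document."""
    return {m: " ".join(query_model(m, q) for q in queries) for m in model_ids}


def attribute(base_docs, finetuned_docs):
    """Pair every fine-tuned model with the most similar base model.

    The mapping is one-to-many: several fine-tuned models may be assigned
    to the same base model, but each fine-tuned model gets exactly one base.
    """
    base_ids = list(base_docs)
    ft_ids = list(finetuned_docs)

    # Fit a single TF-IDF vocabulary over all response documents.
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(
        [base_docs[b] for b in base_ids] + [finetuned_docs[f] for f in ft_ids]
    )
    base_vecs = matrix[: len(base_ids)]
    ft_vecs = matrix[len(base_ids):]

    # Similarity matrix of shape (n_finetuned, n_base); pick the best base per row.
    sims = cosine_similarity(ft_vecs, base_vecs)
    return {f: base_ids[sims[i].argmax()] for i, f in enumerate(ft_ids)}
```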