Model extraction attacks have renewed interest in the classic problem of learning neural networks from queries. In this work we give the first polynomial-time algorithm for learning arbitrary one hidden layer neural networks activations provided black-box access to the network. Formally, we show that if $F$ is an arbitrary one hidden layer neural network with ReLU activations, there is an algorithm with query complexity and running time that is polynomial in all parameters that outputs a network $F'$ achieving low square loss relative to $F$ with respect to the Gaussian measure. While a number of works in the security literature have proposed and empirically demonstrated the effectiveness of certain algorithms for this problem, ours is the first with fully polynomial-time guarantees of efficiency even for worst-case networks (in particular our algorithm succeeds in the overparameterized setting).
翻译:模型抽取攻击令人们重新关注从查询中学习神经网络的经典问题。 在这项工作中,我们给出了第一个用于学习任意的、隐蔽的层神经网络激活的多元时间算法,提供了黑盒进入网络的机会。 形式上,我们证明,如果F$是一个任意的、隐蔽的层神经网络,有RELU激活功能,那么,在所有参数中,有一个具有查询复杂性和运行时间的多元算法,使得一个网络在Gaussian测量方面实现了相对于$F$的低平方块损失。 虽然安全文献中的一些作品已经提出并用经验证明了某些算法对于这一问题的有效性,但我们是第一个甚至对最坏的网络(特别是我们的算法在过分的参数设置上取得了成功 ) 。