We present a framework that certifies the degree of fairness of a model through an interactive, privacy-preserving test. The framework verifies any trained model, regardless of its training process and architecture, and thus allows us to empirically evaluate any deep learning model against multiple fairness definitions. We address two scenarios: one in which the test data is available only privately to the tester, and one in which it is publicly known in advance, even to the model creator. We investigate the soundness of the proposed approach through theoretical analysis and present statistical guarantees for the interactive test. Finally, we provide a cryptographic technique that automates fairness testing and certified inference with only black-box access to the model while hiding the participants' sensitive data.
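To make the black-box setting concrete, the sketch below estimates one common fairness measure, the demographic parity gap, using only query access to a trained classifier. This is a minimal illustration, not the paper's certification protocol: the names `predict`, `X`, `sensitive`, and the threshold `tau` are hypothetical, and the interactive test's statistical guarantees are not implemented here.

```python
import numpy as np

def demographic_parity_gap(predict, X, sensitive):
    """Estimate the demographic parity gap of a black-box classifier.

    predict:   black-box function mapping a batch of inputs to 0/1 labels
    X:         test inputs, shape (n, d)
    sensitive: binary group membership per input, shape (n,)
    """
    y_hat = np.asarray(predict(X))
    rate_0 = y_hat[sensitive == 0].mean()  # positive prediction rate, group 0
    rate_1 = y_hat[sensitive == 1].mean()  # positive prediction rate, group 1
    return abs(rate_0 - rate_1)

# Hypothetical usage: accept the model as fair if the empirical gap is below
# a threshold tau; in the paper's setting, tau and the required sample size
# would be derived from the statistical guarantees of the interactive test.
```

Other group fairness definitions (e.g., equalized odds) fit the same query-only pattern, which is what lets the test treat the model as a black box.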