Modern cameras are not designed with computer vision or machine learning as the target application. There is a need for a new class of vision sensors that are privacy-preserving by design: sensors that do not leak private information and that collect only the information necessary for a target machine learning task. In this paper, we introduce key-nets, which are convolutional networks paired with a custom vision sensor that applies an optical/analog transform such that the key-net can perform exact encrypted inference on the transformed image, while the image itself is not interpretable by a human or by any other key-net. We provide five sufficient conditions for an optical transformation suitable for a key-net, and show that generalized stochastic matrices (e.g., scale, bias, and fractional pixel shuffling) satisfy these conditions. We motivate the key-net by showing that, without it, a network fine-tuned directly on optically transformed images exhibits a utility/privacy tradeoff for face identification and object detection. Finally, we show that a key-net is equivalent to homomorphic encryption using a Hill cipher, with an upper bound on memory and runtime that scales quadratically with a user-specified privacy parameter. Therefore, the key-net is the first practical, efficient, and privacy-preserving vision sensor based on optical homomorphic encryption.
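To make the idea of exact inference on a transformed image concrete, the following is a minimal sketch for a single linear layer only, not the construction used in the paper. It assumes the sensor applies an invertible scaled pixel shuffle A (one special case of the transforms listed above; bias and fractional shuffling are omitted) and that the layer weights are replaced by W A^{-1}, so that (W A^{-1})(A x) = W x. The helper names `make_key`, `encrypt`, `key_weights`, and `linear` are hypothetical, and arithmetic is done over rationals rather than modulo an integer as in a classical Hill cipher.

```python
import random
from fractions import Fraction


def make_key(n, rng):
    """Draw a secret key: a pixel permutation plus one nonzero scale per pixel."""
    perm = list(range(n))
    rng.shuffle(perm)
    scale = [Fraction(rng.randint(1, 9), rng.randint(1, 9)) for _ in range(n)]
    return perm, scale


def encrypt(x, key):
    """Sensor-side transform y = A x, where (A x)[i] = scale[i] * x[perm[i]]."""
    perm, scale = key
    return [scale[i] * x[perm[i]] for i in range(len(x))]


def key_weights(W, key):
    """Fold the inverse transform into the layer: W_keyed = W A^{-1}."""
    perm, scale = key
    return [[row[perm[i]] / scale[i] for i in range(len(perm))] for row in W]


def linear(W, x):
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


rng = random.Random(0)
n, m = 6, 3  # toy sizes: 6 "pixels", 3 outputs

x = [Fraction(rng.randint(0, 255)) for _ in range(n)]  # flattened image
W = [[Fraction(rng.randint(-5, 5)) for _ in range(n)] for _ in range(m)]

key = make_key(n, rng)
y = encrypt(x, key)            # what the sensor would emit
W_keyed = key_weights(W, key)  # weights matched to this key

plain_out = linear(W, x)
keyed_out = linear(W_keyed, y)
print(plain_out == keyed_out)
```

Exact rational arithmetic is used so that the equality check is not blurred by floating-point rounding. A network keyed with a different permutation and scale would generally produce a different output on the same `y`, which is the intuition behind the image being usable only by its matching key-net.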