This paper designs a facial recognition solution based on convolutional neural networks, intended for use in a camera-based home-entry access control system. More specifically, it addresses the supervised classification problem of taking an image of a person as input and classifying that person as one of the authors or as someone else. Two approaches are proposed: (1) building and training a neural network, called WoodNet, from scratch, and (2) leveraging transfer learning by taking a network pre-trained on the ImageNet database and adapting it to this project's data and classes. To train the models to recognize the authors, a dataset of more than 150 000 images was created, balanced across the authors and others. Image extraction from videos and image augmentation techniques were instrumental in creating the dataset. The result is two models that classify the individuals in the dataset with over 99% accuracy on held-out test data. The pre-trained model fitted significantly faster than WoodNet and seems to generalize better. These results come with caveats, however. Because of the way the dataset was compiled, and given the high accuracy, there is reason to believe the models over-fitted to the data to some degree. A further consequence of the compilation method is that the test dataset may not be sufficiently different from the training data, which limits its ability to validate how well the models generalize. Nevertheless, using the models in a webcam-based system that classifies faces in real time shows promising results and indicates that the models generalized fairly well for at least some of the classes (see the accompanying video).
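The caveat about the test set can be made concrete with a small sketch. When frames are extracted from videos, neighbouring frames are nearly identical, so a random frame-level split can place near-duplicates on both sides of the train/test boundary. The Python below is an illustration only and is not the procedure used in this paper: it assumes a hypothetical layout in which each frame is a `(video_id, frame_path, label)` tuple, and the function name `split_by_video` and its parameters are invented for the example. It holds out whole videos so that no source video contributes frames to both sets.

```python
import random
from collections import defaultdict


def split_by_video(frames, test_fraction=0.2, seed=0):
    """Split frames into train/test sets without sharing a source video.

    `frames` is a list of (video_id, frame_path, label) tuples. This layout
    is a hypothetical assumption for illustration, not the paper's format.
    """
    # Group frames by the video they were extracted from.
    by_video = defaultdict(list)
    for video_id, frame_path, label in frames:
        by_video[video_id].append((video_id, frame_path, label))

    # Shuffle video ids deterministically, then hold out whole videos.
    video_ids = sorted(by_video)
    random.Random(seed).shuffle(video_ids)
    n_test = max(1, round(len(video_ids) * test_fraction))
    test_ids = set(video_ids[:n_test])

    train, test = [], []
    for video_id in video_ids:
        target = test if video_id in test_ids else train
        target.extend(by_video[video_id])
    return train, test


if __name__ == "__main__":
    # Toy data: 10 videos with 5 frames each, alternating between two labels.
    toy_frames = [
        (f"vid{v}", f"vid{v}/frame{i}.jpg", "author_a" if v % 2 else "other")
        for v in range(10)
        for i in range(5)
    ]
    train_set, test_set = split_by_video(toy_frames)
    print(len(train_set), len(test_set))
```

A split of this kind does not remove over-fitting to a person's clothing, lighting, or background if every video of that person was recorded under similar conditions, so it addresses only the near-duplicate part of the concern.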