
CHAPTER 4: THE PROPOSED MODEL AND IMPLEMENTATION

4.4.3. Experimental results and Discussion

Performance Comparison

In this performance comparison, I conduct further experiments to evaluate the output of current baseline models in the field of face recognition. Throughout the experiments, the same LFW and MLFW datasets are used for training, validation and testing so that the results can be assessed fairly afterwards. The methods selected for this experiment are FaceNet [57], the state-of-the-art Pairwise Differential Siamese Network (PDSN) [80] and my model, the Siamese Neural Network. The precision of each model after training is taken as the main criterion for comparing the different methods; a short sketch of how this figure can be computed follows the table. Table 4.3 summarizes the training, validation and testing image sets used across the models.

Table 4.3: Overview of the training, validation and testing image sets

| Datasets | FaceNet | Pairwise Differential Siamese Network | Siamese Neural Network | Siamese Neural Network + Ensemble Learning (our model) |
|---|---|---|---|---|
| Training Set | 80 images; 20 classes | 140 images; 20 classes | 300 images; 20 classes | 600 images; 20 classes |
| Validation Set | 20 images; 20 classes | 22 images; 20 classes | 22 images; 20 classes | 44 images; 20 classes |
| Testing Set | 40 images; 20 classes | 40 images; 20 classes | 40 images; 20 classes | 40 images; 20 classes |
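For all three methods, precision is measured on the held-out test pairs. The snippet below is a minimal sketch, under assumed names, of how such a per-model precision figure could be obtained; `embed` (the trained network's embedding function) and the 0.5 distance threshold are illustrative placeholders, not values from this thesis.

```python
# Minimal sketch of computing per-model precision on the 40-image test split.
# `embed` and the 0.5 threshold are illustrative assumptions.
import numpy as np

def is_match(emb_a, emb_b, threshold=0.5):
    """Declare a match when the Euclidean distance between embeddings is small."""
    return np.linalg.norm(emb_a - emb_b) < threshold

def pairwise_precision(test_pairs, same_person_labels, embed, threshold=0.5):
    """Precision = correctly predicted matches / all predicted matches."""
    predicted, correct = 0, 0
    for (img_a, img_b), same_person in zip(test_pairs, same_person_labels):
        if is_match(embed(img_a), embed(img_b), threshold):
            predicted += 1
            correct += int(same_person)
    return correct / max(predicted, 1)
```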

As depicted in Table 4.4, the performance of each model is compared based on its post-training precision, tested against 40 images taken from the LFW and MLFW testing sets. The main difference between the experiment on the Siamese Neural Network and the others is the use of four sub-models (each representing five classes) together with ensemble learning to aggregate the outputs of the sub-models, which keeps the accuracy roughly stable for both masked and unmasked face recognition. Although the Siamese Neural Network with a CNN backbone was constructed mainly for non-masked face recognition, it still reaches a precision of 70% when ensemble learning (described in detail in section 2.4) is used to consolidate the final recognition result; a sketch of this aggregation step is given below. By contrast, FaceNet built upon Inception-ResNet V1 reaches 77.5% precision when trained on the MLFW dataset, whereas the state-of-the-art PDSN achieves a significantly higher precision of 97.5% after training on pairs of images from the corresponding LFW and MLFW datasets, but this comes with a trade-off in complexity and computational cost, described in Table 4.5.
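The snippet below is an illustrative sketch, not the implementation from section 2.4, of how the four Siamese sub-models could be combined: each sub-model scores the probe image only against the five identities it was trained on, and the identity with the highest similarity across all sub-models is returned. The names `sub_models` and `galleries`, and the max-score rule itself, are assumptions made for illustration.

```python
# Hypothetical sketch of the ensemble aggregation over four Siamese sub-models,
# each enrolled with 5 of the 20 identities. The max-score rule is an
# illustrative assumption; the actual rule is described in section 2.4.
def ensemble_identify(probe_embedding, sub_models, galleries):
    """
    sub_models: dict mapping a sub-model name to a similarity function
                similarity(probe_embedding, reference_embedding) -> float.
    galleries:  dict mapping the same sub-model name to a dict of
                {identity: reference_embedding} for its 5 enrolled classes.
    Returns the identity with the highest similarity across all sub-models.
    """
    best_identity, best_score = None, float("-inf")
    for name, similarity in sub_models.items():
        for identity, reference in galleries[name].items():
            score = similarity(probe_embedding, reference)
            if score > best_score:
                best_identity, best_score = identity, score
    return best_identity, best_score
```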

Table 4.4: Summary of performance outcomes of different face recognition baselines. “#Models” denotes the number of models used by each method for evaluation

| # | Methods | Backbones | #Models | Datasets | Training Data | Support Data | Classes | Precision |
|---|---|---|---|---|---|---|---|---|
| 1 | FaceNet | Inception-ResNet V1 | 1 | MLFW | 100 | - | 20 | 0.775 |
| 2 | Pairwise Differential Siamese Network | CNN + Siamese Network | 1 | LFW + MLFW | 162 | - | 20 | 0.975 |
| 3 | Siamese Neural Network | CNN | 4 | LFW | 300 | 22 | 20 | 0.800 |
| 4 | Siamese Neural Network + Ensemble Learning (our model) | CNN | 4 | LFW + MLFW | 600 | 44 | 20 | 0.700 |

Comparison of Computational Efficiency

In machine learning, computational efficiency is a well-known performance criterion that covers the cost and complexity of a model during training and testing. The primary goal of our model is to achieve high prediction accuracy, since access to the working office is a critical element of each company's privacy and security, while keeping the computational cost low enough to be practical. Because the system is optimized for a small-capacity office of each company department, and because using a CNN as the backbone of the Siamese Neural Network is cheaper than the state-of-the-art Inception or ResNet models [81] while remaining accurate (especially when training with masked-face datasets), this experiment examines the accumulated testing result of our model, whose novelty lies in training a single model exclusively for each small group of employees. The model-training time per epoch and the testing time per input image for each method, evaluated on the same 20 classes, are reported in Table 4.5.

Table 4.5: Comparison of model-training time per epoch and model-testing time per image, in seconds, for different face recognition models

| # | Methods | Training Time (s) | Testing Time (s) |
|---|---|---|---|
| 1 | FaceNet | 54 | 0.5 |
| 2 | Pairwise Differential Siamese Network | 72 | 1.1 |
| 3 | Siamese Neural Network + Ensemble Learning (our model) | 12 | 0.8 |
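The figures above are wall-clock measurements. The snippet below is a generic timing harness of the kind that could produce such numbers, not the measurement code actually used in the thesis; `train_one_epoch`, `predict`, `model`, `loader` and `test_images` are hypothetical placeholders.

```python
# Generic timing harness: wall-clock seconds per training epoch and per test
# image, as reported in Table 4.5. All callables passed in are hypothetical
# placeholders for the real training and inference functions.
import time

def seconds_per_epoch(train_one_epoch, model, loader):
    start = time.perf_counter()
    train_one_epoch(model, loader)      # one full pass over the training set
    return time.perf_counter() - start

def seconds_per_image(predict, model, test_images):
    start = time.perf_counter()
    for image in test_images:
        predict(model, image)           # single-image inference
    return (time.perf_counter() - start) / len(test_images)
```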

For the model-training phase, FaceNet required 54 s per epoch on the MLFW dataset, while PDSN took around 72 s. By comparison, each new Siamese Neural Network + Ensemble Learning model for a small group of employees costs only 12 s per epoch to train, which matters given the time-consuming effort of re-training that FaceNet and PDSN would require whenever a new class is added; a sketch of how a new group can be enrolled as a separate small model is given below. At testing time, FaceNet is the cheapest but offers only fair precision despite being trained solely on the masked-face dataset (as shown in Table 4.4), while our model keeps the cost of testing each image below 1 s and PDSN takes more than 1 s to identify a person, with both maintaining a considerably high precision.
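To illustrate why adding a new class is cheap in this setting, the sketch below shows, under assumed helper names, how a new group of up to five employees could be enrolled by training a brand-new small sub-model and registering it alongside the existing ones, which are left untouched. `build_siamese_model`, `train_siamese`, `embed` and the dictionaries are hypothetical placeholders, not the thesis implementation.

```python
# Hypothetical sketch: enrolling a new employee group as its own Siamese
# sub-model instead of re-training a single global model. Existing sub-models
# and their galleries are not modified, which is why per-group training stays
# at roughly the 12 s/epoch reported in Table 4.5.
def enroll_new_group(sub_models, galleries, group_name, group_images, group_labels,
                     build_siamese_model, train_siamese, embed):
    """group_images/group_labels describe at most 5 new identities."""
    new_model = build_siamese_model()                      # fresh, small CNN-backbone model
    train_siamese(new_model, group_images, group_labels)   # trained only on the new group
    sub_models[group_name] = new_model
    # store reference embeddings for the new identities only
    galleries[group_name] = {
        label: embed(new_model, image)
        for image, label in zip(group_images, group_labels)
    }
    return new_model
```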

Limitations

Despite the good performance and accuracy obtained in the experiments, the practicality of training one model per small group of employees becomes limited when business expansion brings a large number of staff into each office. In addition, a CNN is vulnerable when specific details of the face change or are covered by different types of masks or daily wearings such as glasses and hats. Future work should therefore include multi-source training data; a deeper study of ensemble learning, whose generalization performance relies on the combination of several individual models and which could shed light on the aforementioned issue; and, last but not least, a practical route towards reducing the time and computational cost of the PDSN model.
