An Embedded Machine Learning System For Real-time Face Mask Detection And Human Temperature Measurement
Lien Nguyen∗, Trang N.M Cao†, Lam Huynh-Anh‡, Hanh Dang-Ngoc§
Faculty of Electrical and Electronics Engineering, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
Email: ∗lien.nguyen1812799@hcmut.edu.vn, †trang.cao1814391@hcmut.edu.vn, ‡lam.huynh05042000@hcmut.edu.vn, §hanhdn@hcmut.edu.vn
Abstract—In this paper, an efficient embedded machine learning system is proposed to automatically detect face masks and measure human temperature in real-time applications. In particular, our system uses a Raspberry-Pi camera to collect real-time video in public places. It detects face masks by running a classification model on a Raspberry Pi 3. The face mask detector is built based on MobileNetV2, with ImageNet pre-trained weights, to detect three cases: correctly wearing, incorrectly wearing, and not wearing a mask. We also design a human temperature measurement framework by deploying a temperature sensor on the Raspberry Pi 3. The numerical results demonstrate the practicality and effectiveness of our embedded system compared to several state-of-the-art studies. The accuracy in detecting the three cases of wearing a face mask is 98.61% on the training results and 97.63% on the validation results. Meanwhile, our proposed system needs only about 6 seconds for each person to go through the whole process of face mask detection and forehead temperature measurement.
Index Terms—COVID-19, embedded machine learning system,
face mask detection, human temperature measurement
I. INTRODUCTION
The COVID-19 virus mainly spreads through droplets that emerge from a person infected with the coronavirus (SARS-CoV-2), posing a risk to others. The risk of transmission is highest in public places [1]. After a person is infected, it can take up to fourteen days for the virus to develop in the host's body and affect them. In the meantime, the infected person can spread the virus to the people in contact with them. As indicated by the World Health Organization [2], [3], one of the best ways to avoid infection is to wear a face mask in public areas. Furthermore, elevated body temperature is a common symptom of COVID-19 [4]. However, the usual way of measuring human temperature with handheld devices requires a close distance of less than 2 meters, which may expose people to infection. Therefore, a stand-alone system for both face mask detection and non-contact forehead temperature measurement in public places has become a crucial embedded machine learning application for tackling this global problem.
Many computer vision-based systems have been deployed since December 2019, when SARS-CoV-2 spread around the world from Wuhan (China). The authors in [1]–[4] used MobileNetV2 and OpenCV for their face mask detection frameworks and reported high accuracy in the training phase. Their frameworks followed two main steps. First, human faces were detected and auto-cropped; then, all the images were labeled for each person. A bounding box was used to indicate whether people were wearing face masks. Other researchers used YOLOv3 and Haar cascade classifiers [5], with an accuracy of 90.1%. In [6], the authors proposed a transfer learning method based on the combination of MobileNetV2 and a support vector machine. However, these methods were designed as initial studies toward an automatic face mask detection system, and they might not be practical for day-to-day operation without human supervision. In fact, the main shortcoming of [2]–[6] is that their datasets contained only two classes: with and without a mask. This causes missed detections for people who wear masks incorrectly or who intentionally cover their faces with scarves or handkerchiefs.
In the embedded machine learning research field, the authors in [7] proposed a system with three phases: person detection, safe distance measurement between detected people, and face mask detection using single shot object detection with MobileNetV2 and OpenCV. In [8], the authors proposed a subsystem installed at the entrance door for temperature and face mask detection, with a smartphone application for security guards. They used an Arduino UNO equipped with an infrared thermal camera to measure human temperature, and an ESP8266 Wi-Fi module to send alert messages to the security guards. Despite these promising results, the system consisted of various hardware components connected to a laptop that ran the corresponding software. This complicated deployment makes the system in [8] not flexible enough to work 24/7 in public places.
In this paper, we propose an embedded machine learning system deployed on the Raspberry Pi 3 for automatically detecting face masks and measuring human forehead temperature with an MLX90614 temperature sensor. We divide our detection problem into two classes: “Mask” for cases of correctly wearing masks, and “No mask” for cases of incorrectly wearing or not wearing masks. In order to […] Raspberry Pi 3, then the results of these two phases are displayed on the LCD screen. The Raspberry Pi 3 gives a warning sound if there is an overheated case. The whole system can work 24/7 using a power adapter.
The remainder of this paper is organized as follows. Section I reviews some state-of-the-art research in the field of computer vision-based systems and discusses aspects of related embedded machine learning research that need to be improved. Section II presents our proposed system in two main stages: building the face mask detection model and deploying the whole system on the Raspberry Pi 3. Section III presents the experimental results and discussion. Finally, Section IV concludes the paper and summarizes the contributions of our work.
II. METHODOLOGY
There are two main stages in our proposed system: (i) training and testing a machine learning model to detect cases of wearing face masks, and (ii) deploying the face mask detection model along with human temperature measurement on the Raspberry Pi 3.
Our dataset has 5481 images of various sizes in total, covering three cases of wearing masks, as shown in Fig. 1:
1915 images of correctly wearing masks;
1782 images of incorrectly wearing masks¹;
1784 images of not wearing masks².
All the images of correctly wearing masks are labeled as “Mask”. Both the incorrectly wearing and not wearing mask cases are labeled as “No mask”. We use 80% of the dataset for training and the remaining 20% for testing, as shown in Table I.
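The mapping from the three wearing cases to the two training labels can be written as a simple lookup. This also makes the resulting class balance visible. The sketch below is illustrative: the case keys and the function name are hypothetical, and only the image counts come from the text.

```python
from collections import Counter

# Image counts per wearing case, as reported in the dataset description.
CASE_COUNTS = {
    "correctly_wearing": 1915,
    "incorrectly_wearing": 1782,
    "not_wearing": 1784,
}

# Two-label scheme: only correctly worn masks are labeled "Mask".
CASE_TO_LABEL = {
    "correctly_wearing": "Mask",
    "incorrectly_wearing": "No mask",
    "not_wearing": "No mask",
}


def label_counts(case_counts, case_to_label):
    """Collapse per-case image counts into per-label counts."""
    counts = Counter()
    for case, n_images in case_counts.items():
        counts[case_to_label[case]] += n_images
    return dict(counts)


if __name__ == "__main__":
    print(label_counts(CASE_COUNTS, CASE_TO_LABEL))
```

With the reported counts, this gives 1915 “Mask” images and 3566 “No mask” images. That is roughly 1.9 “No mask” images per “Mask” image, which is worth keeping in mind when reading per-class precision and recall.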
¹ “MaskedFace-Net - A dataset of correctly/incorrectly masked face images in the context of COVID-19”, Adnane Cabani, Karim Hammoudi, Halim Benhabiles, and Mahmoud Melkemi, Smart Health, ISSN 2352-6483, Elsevier, 2020. https://doi.org/10.1016/j.smhl.2020.100144
² “Masked Face Recognition Dataset and Application”, Zhongyuan Wang, Guangcheng Wang, Baojin Huang, Zhangyang Xiong, Qi Hong, Hao Wu, Peng Yi, Kui Jiang, Nanxi Wang, Yingjiao Pei, Heling Chen, Yu Miao, Zhibing Huang, Jinbi Liang, abs/2003.09093, 2020. https://arxiv.org/abs/2003.09093

Fig. 1: Dataset for face mask detection model. (a) Correctly wearing a mask (b) Correctly wearing a pattern mask (c) Incorrectly wearing a mask (d) Incorrectly wearing masks in public (e) Not wearing a mask (f) Not wearing masks in public

TABLE I: Dataset for training and testing phases
Training phase | Testing phase | Total images
4384 | 1097 | 5481

A. The Face Mask Detection Model
1) Pre-processing: The pre-processing steps are as follows.
First, each image is resized to 224×224 pixels, converted into array format, and its pixel intensities are scaled to the range [-1, 1] by a preprocessing function.
Then, one-hot encoding is applied to the labels to represent the categorical variables as binary vectors. This process converts our two labels, “Mask” and “No mask”, into specific vectors: a training image of the “Mask” class is encoded as [1, 0], and a “No mask” image as [0, 1].
Next, we split the data into 80% for training and 20% for testing.
Finally, in the data augmentation step, we use the ImageDataGenerator to rotate, zoom, shift, shear, and horizontally flip all the images in the training set.
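The pre-processing steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' code:

- Image loading and resizing to 224×224 are omitted; an image library such as OpenCV would handle them.
- The helper names are hypothetical.
- The split uses a seeded random permutation.

The scaling x / 127.5 - 1 is the same mapping that Keras' `mobilenet_v2.preprocess_input` applies. Rounding the test size up (as scikit-learn's `train_test_split` also does) yields 1097 test images out of 5481, which matches the count used in Section III.

```python
import math

import numpy as np

# Index order fixes the encoding: "Mask" -> [1, 0], "No mask" -> [0, 1].
LABELS = ["Mask", "No mask"]


def scale_pixels(images):
    """Map uint8 intensities in [0, 255] to float32 values in [-1, 1]."""
    return images.astype(np.float32) / 127.5 - 1.0


def one_hot(labels):
    """Encode label strings as one-hot row vectors."""
    encoded = np.zeros((len(labels), len(LABELS)), dtype=np.float32)
    for row, name in enumerate(labels):
        encoded[row, LABELS.index(name)] = 1.0
    return encoded


def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle indices and hold out ceil(test_fraction * n_samples) for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = math.ceil(test_fraction * n_samples)
    return order[n_test:], order[:n_test]
```

For example, `split_indices(5481)` returns 4384 training indices and 1097 testing indices with no overlap.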
2) Training Model: MobileNetV2 builds upon the ideas of MobileNet [9], using depthwise separable convolutions as efficient building blocks [10]. A depthwise separable convolution replaces a full convolutional operator with a factorized version that splits convolution into two separate layers:
- The first layer, called a depthwise convolution, performs lightweight filtering by applying a single convolutional filter per input channel.
- The second layer is a 1×1 convolution, called a pointwise convolution. It builds new features by computing linear combinations of the input channels.
On top of these blocks, MobileNetV2 introduces inverted residuals and linear bottlenecks [10]. All the layers used in the proposed model are implemented using the Keras layers API.
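To make the saving from this factorization concrete, the short sketch below counts the weights (biases ignored) of a standard k×k convolution and of its depthwise separable replacement. The channel sizes in the example are illustrative and are not taken from MobileNetV2's actual layer configuration.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights of a depthwise k x k convolution followed by a 1x1 pointwise one."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1x1 conv: linear combinations of input channels
    return depthwise + pointwise


if __name__ == "__main__":
    full = standard_conv_params(3, 32, 64)
    separable = depthwise_separable_params(3, 32, 64)
    print(full, separable, round(separable / full, 4))
```

Consider a 3×3 layer mapping 32 to 64 channels. It needs 18432 weights as a standard convolution but only 2336 as a depthwise separable one. The ratio equals 1/c_out + 1/k², the reduction factor derived in [9].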
To improve the accuracy of the pre-trained model for face mask detection, we combine Keras layers with MobileNetV2 initialized with ImageNet pre-trained weights. MobileNetV2 is used as the base model without its fully connected head. The input shape of the base model is 224×224 with 3 channels.

We then construct a new head to replace the original one. It is composed of Keras layers such as AveragePooling2D, Flatten, Dense, and Dropout. The AveragePooling2D layer calculates the average output of each feature map of the previous layer, and a 50% dropout rate is used to prevent overfitting. The fully connected head model is placed on top of the base model; this is the actual model that will be trained. Finally, the model is compiled with the Adam optimizer and the binary cross-entropy loss function.

Fig. 2: Block diagram of our proposed system.
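A minimal Keras sketch of the architecture described above follows. Several choices come from the text:

- the layer types and the 224×224×3 input;
- the 50% dropout;
- the Adam optimizer with a 0.0001 learning rate (Section III);
- the binary cross-entropy loss.

The following are assumptions, not details confirmed by the paper:

- The 7×7 pooling window, chosen to cover the 7×7×1280 feature map MobileNetV2 produces for 224×224 inputs.
- The 128-unit Dense layer.
- Freezing the base model.
- The two-unit softmax output.

```python
import tensorflow as tf


def build_mask_classifier(weights="imagenet", learning_rate=1e-4):
    """MobileNetV2 base without its top, plus a pooling/dense/dropout head."""
    base = tf.keras.applications.MobileNetV2(
        weights=weights,
        include_top=False,  # drop the ImageNet fully connected head
        input_shape=(224, 224, 3),
    )
    base.trainable = False  # assumption: only the new head is trained

    x = tf.keras.layers.AveragePooling2D(pool_size=(7, 7))(base.output)  # 7x7 -> 1x1
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(128, activation="relu")(x)  # hypothetical width
    x = tf.keras.layers.Dropout(0.5)(x)  # 50% dropout rate
    outputs = tf.keras.layers.Dense(2, activation="softmax")(x)  # "Mask", "No mask"

    model = tf.keras.Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_mask_classifier()` downloads the ImageNet weights. Passing `weights=None` instead builds an untrained network offline, which is convenient for checking the output shape.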
3) Testing Model: In the testing phase, OpenCV is used for face detection. We use both videos collected from YouTube and real-time video captured by the camera of the Raspberry Pi 3. The testing procedure is as follows:
- For each frame of a test video, we grab the frame dimensions and construct a blob from the frame.
- The blob is passed through the network to obtain a bounding box for each face in the face detection step.
- The faces in the frame and their bounding boxes are added to their respective lists.
- The mask prediction is performed only if at least one face was detected.
- Finally, we loop over the detected face locations and their corresponding predictions. The output frame shows the class label “Mask” or “No mask” on each bounding box rectangle.
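The per-frame post-processing can be sketched as follows. The sketch assumes the face detector is an OpenCV DNN SSD model, for example a ResNet-10 SSD face detector loaded with `cv2.dnn.readNetFromCaffe`. For such a model, `net.forward()` on a blob built by `cv2.dnn.blobFromImage` returns an array of shape (1, 1, N, 7). The 0.5 confidence threshold and the function names are hypothetical. The sketch only converts detections into pixel boxes and attaches a “Mask”/“No mask” label from the classifier's two scores.

```python
import numpy as np


def extract_faces(detections, frame_w, frame_h, min_confidence=0.5):
    """Convert SSD-style detections to pixel boxes, keeping confident faces.

    detections: array of shape (1, 1, N, 7); each row is
    [image_id, class_id, confidence, x_min, y_min, x_max, y_max],
    with box coordinates normalized to [0, 1].
    """
    boxes = []
    for det in detections[0, 0]:
        if float(det[2]) < min_confidence:
            continue  # skip weak detections
        scale = np.array([frame_w, frame_h, frame_w, frame_h])
        x1, y1, x2, y2 = det[3:7] * scale
        # Clip to the frame so face crops never fall outside the image.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(frame_w - 1, int(x2)), min(frame_h - 1, int(y2))
        boxes.append((x1, y1, x2, y2))
    return boxes


def label_faces(boxes, predictions):
    """Pair each box with "Mask" or "No mask" from [p_mask, p_no_mask] scores."""
    if not boxes:  # predict only if at least one face was detected
        return []
    return [
        (box, "Mask" if p_mask > p_no_mask else "No mask")
        for box, (p_mask, p_no_mask) in zip(boxes, predictions)
    ]
```

In a full loop, each box would be cropped from the frame, pre-processed as in Section II-A, and batched through the classifier to obtain `predictions`. The labeled boxes would then be drawn with `cv2.rectangle` and `cv2.putText`.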
B. Face Mask Detection and Human Temperature Measurement on Raspberry Pi 3
This section describes how face mask detection and temperature measurement are coordinated. These two functions are combined on the Raspberry Pi. The Pi is connected to peripherals such as the Raspberry-Pi camera, a temperature sensor, a 16x2 LCD, and a buzzer, which together complete the system shown in the block diagram in Fig. 2.
The system operates as follows:
- Real-time video from the Raspberry-Pi camera is the input for the pre-trained model that detects face masks. In this initial step, the Raspberry-Pi camera also detects human presence.
- Once the face detection step is done, the human-presence signal turns on the non-contact MLX90614 temperature sensor. The sensor then sends the forehead temperature to the Raspberry Pi for analysis.
- The temperature threshold is set at 37 degrees Celsius, since this is the common temperature of a healthy person [11].
- Meanwhile, the buzzer gives a warning sound if any “No mask” case is detected.
- Afterward, the measured temperature is displayed on the 16x2 LCD, and the buzzer gives a warning sound if an overheated case is detected.
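The sensor read-out and warning decision described above can be sketched as follows. The MLX90614 constants come from the sensor's datasheet, not from this paper: default SMBus address 0x5A, object-temperature RAM register 0x07, and 0.02 K per least-significant bit. The `smbus2`-style bus object, the function names, and the strict “greater than 37 °C” comparison are assumptions.

```python
MLX90614_ADDR = 0x5A  # factory-default SMBus address (datasheet)
MLX90614_TOBJ1 = 0x07  # RAM register holding object temperature 1
TEMP_THRESHOLD_C = 37.0  # threshold stated in the text


def raw_to_celsius(raw):
    """MLX90614 RAM temperatures are in units of 0.02 K."""
    return raw * 0.02 - 273.15


def read_forehead_temp(bus):
    """Read the object temperature through an smbus2.SMBus-like object."""
    raw = bus.read_word_data(MLX90614_ADDR, MLX90614_TOBJ1)
    return raw_to_celsius(raw)


def check_person(label, temp_c, threshold=TEMP_THRESHOLD_C):
    """Return (mask_warning, fever_warning) for one tested person."""
    return label == "No mask", temp_c > threshold
```

On a Raspberry Pi, the forehead temperature could be read once the face detection signal arrives. One way is `with SMBus(1) as bus: temp = read_forehead_temp(bus)` after `from smbus2 import SMBus`. The result is then passed to `check_person` together with the detected label.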
III. RESULTS AND DISCUSSION
The experimental setup is a computer with an Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz and 16.0GB of RAM.
A. Training/Testing the face mask detection model using images in the dataset
The training parameters are set as follows: the learning rate is 0.0001, the number of epochs is 20, and the batch size is 32. Our proposed framework for face mask detection uses 80% of the 5481 images for the training phase. As shown in Fig. 3, the training loss and validation loss reach 4.66% and 5.56%, respectively.
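As a quick check on these settings, the sketch below computes how many mini-batches each epoch contains. It makes two assumptions. First, the 80/20 split rounds the test set up to 1097 images, the testing count reported in this section. Second, every training image is seen once per epoch. The function name is hypothetical.

```python
import math


def training_schedule(n_images, test_fraction=0.2, batch_size=32, epochs=20):
    """Return (training images, mini-batches per epoch, total weight updates)."""
    n_test = math.ceil(test_fraction * n_images)  # test set rounded up
    n_train = n_images - n_test
    steps_per_epoch = math.ceil(n_train / batch_size)  # a last partial batch counts
    return n_train, steps_per_epoch, steps_per_epoch * epochs
```

With 5481 images, this gives 4384 training images. That is exactly 137 batches of 32 per epoch, or 2740 weight updates over 20 epochs.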
A total of 1097 images are used for the testing phase. The results are classified into four categories:
- True positives (TP) and true negatives (TN) are the observations that are correctly detected.
- A false positive (FP) is a sample whose detected category is inconsistent with its actual category.
- A false negative (FN) is an actual sample of the considered class that is detected as the opposite class or not detected at all.
All the positive cases predicted by the model are (TP + FP). The proportion of real positive cases (TP) among them is called the precision rate, as shown in equation (1). As shown in Table II, our model achieves precision rates of 96% and 99% for the “Mask” and “No mask” cases, respectively. These results show that the model rarely assigns a sample of one class to the other, such as “Mask” to “No mask” and vice versa.
Fig. 3: Training accuracy/loss vs. number of epochs.

TABLE II: Performance results of our proposed system
Class | Testing images | Precision | Recall | F1-score
Mask |  | 96% | 98% | 97%
No mask |  | 99% | 98% | 98%
Accuracy | 1097 |  |  | 98%
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$
The recall rate measures how well the model correctly predicts positive observations out of all observations in the actual class [5], as written in (2). It is also called the sensitivity of the model in detecting the considered positive images among all the labeled ones. As shown in Table II, our model achieves a recall rate of 98% for both the “Mask” and “No mask” cases. This means that, for each class, 98% of the actual positive images are correctly detected. The actual positives are the correctly predicted positives (TP) plus the positives incorrectly detected as negative (FN).
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$
Equations (1) and (2) show that there is a trade-off between the precision rate and the recall rate. The F1-score is the harmonic mean of precision and recall, as shown in equation (3), so it takes both FP and FN cases into account. As shown in Table II, our model achieves F1-scores of 97% and 98% for the “Mask” and “No mask” cases, respectively. Because accuracy considers both TP and TN cases, we also report the overall accuracy of 98% in Table II.
$$F_1\text{-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$
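Equations (1)-(3), together with accuracy, can be computed from confusion-matrix counts as in the sketch below. The function name is ours, and the counts in the usage example are hypothetical; they do not correspond to Table II.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision (1), recall (2), F1-score (3) and accuracy from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy
```

For example, `classification_metrics(tp=90, tn=100, fp=10, fn=5)` gives a precision of 0.9, a recall of about 0.947, an F1-score of about 0.923 (equal to 2TP / (2TP + FP + FN)), and an accuracy of about 0.927.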
B. Testing the model with real-time video
Fig. 4 shows some captured frames of successfully detected results from four videos collected from YouTube³,⁴,⁵,⁶:
- In Fig. 4(a), Fig. 4(b), Fig. 4(c) and Fig. 4(d), our model achieves high accuracy for the “No mask” and “Mask” cases, for both a single person and multiple people.
- Our proposed model can also correctly recognize whether human faces are covered with scarves, handkerchiefs, or fabric face masks, as in Fig. 4(e) and Fig. 4(h).
- Because incorrectly wearing and not wearing mask images are combined into one label named “No mask”, our model detects incorrect cases where the face mask does not fully cover the face with high accuracy, as shown in Fig. 4(f) and Fig. 4(g).
- Fig. 4(i) and Fig. 4(j) show further successfully detected results, which demonstrate the effectiveness of our proposed model in recognizing incorrect and correct mask-wearing cases.
Fig. 5 shows some missed detections, i.e., captured frames in which face masks were not recognized.
Since we aim to deploy our model to process people one by one in real time, this can be improved by setting an appropriate distance between our system and people, which helps the camera capture all the details of human faces and masks. Based on our experimental results, the most appropriate distances are about 1.5 meters for face mask detection with the Raspberry-Pi camera and about 4 centimeters for temperature measurement with the MLX90614 sensor. At these distances, both devices work at their best in collecting information on human faces and human temperature.

³ Coronavirus outbreak: Mixed messaging about mandatory face masks, https://www.youtube.com/watch?v=hekZBf8oUq0
⁴ DWCRA Women Face to Face - Making Masks In Tirupati - Chittoor District - Sakshi TV, https://www.youtube.com/watch?v=EIY9xJc4s0Q
⁵ https://www.youtube.com/watch?v=nf4bZgHsa5E
⁶ Nigerians React To Wearing Face Mask - Street Login, https://www.youtube.com/watch?v=V54hhnyAntU

Fig. 4: Example of successfully detected cases. (a) Video 1 - Frame 1 (b) Video 1 - Frame 2 (c) Video 2 - Frame 1 (d) Video 2 - Frame 2 (e) Video 2 - Frame 3 (f) Video 2 - Frame 4 (g) Video 3 - Frame 1 (h) Video 3 - Frame 2 (i) Video 4 - Frame 1 (j) Video 4 - Frame 2

Fig. 5: Example of missed detected cases. (a) Video 1 - Frame 3 (b) Video 2 - Frame 3

C. Testing the face mask detection model with real-time video captured by the built-in camera on Raspberry Pi
We use a Raspberry Pi 3 Model B with a 1.2 GHz 64-bit quad-core ARMv8 CPU to deploy the face mask detection phase after training the model on our experimental setup computer. To prepare the model for deployment in public places, we conduct a testing phase with real-time video taken from the built-in camera on the Raspberry Pi. In this step, we use VNC Viewer, a cross-platform screen-sharing system, to remotely control the Raspberry Pi from our experimental setup computer. This lets us collect the successfully detected cases shown in Fig. 6:
- Our model can detect whether people use different common types of face masks, such as medical face masks or fabric pattern face masks, as shown in Fig. 6(a) and Fig. 6(b), respectively.
- Because of the variety of our dataset, we can also evaluate cases in which people wear both a face mask and sunglasses, as shown in Fig. 6(c) and Fig. 6(d); these also give high accuracy.
- Our model can also detect people who do not wear a mask or who use their hand to cover their face, and label them as “No mask”, as shown in Fig. 6(e) and Fig. 6(f).
- When people wear a mask incorrectly, as shown in Fig. 6(g) and Fig. 6(h), our model detects them as “No mask” with high accuracy, and the buzzer is turned on as a warning.
Fig. 6: Example of successfully detected cases. (a) Mask (b) Fabric mask (c) Mask with sunglasses (d) Fabric mask with sunglasses (e) Not wearing face mask (f) Face covered by hand (g) Incorrectly wearing mask (h) Incorrectly wearing mask

Fig. 7: Example of failed detected cases. (a) Face covered by a notebook (b) Face covered by a handkerchief
Fig. 7 shows some failed detections, in which people intentionally cover their faces with objects other than masks, such as a notebook or a handkerchief. Fig. 7(a) and Fig. 7(b) show a low accuracy rate for the “Mask” label. This suggests that these issues could be mitigated by enlarging our dataset with more incorrectly wearing mask cases.
Fig. 8: Our experimental system
D. The face mask detection and human forehead temperature measurement real-time application
As described in Fig. 2, we use a Raspberry Pi with connected peripherals to complete the whole system, as shown in Fig. 8. This packaged system is mounted on a camera stick holder at an appropriate height of 1.6 meters. In the experiment, three people pass through our system indoors under good lighting conditions. The test proceeds as follows:
- Each person steps in, one at a time, and stands 1.5 meters in front of the system for face mask detection. Detecting whether a face mask is worn takes about 2 seconds.
- The person then steps closer to the system to have their forehead temperature measured. The appropriate distance between the MLX90614 sensor and the person is about 4 centimeters; a higher-quality temperature sensor could lengthen this distance.
- The signal from face mask detection activates the temperature sensor, and the temperature measurement takes about 1 second.
- The face mask detection and measured temperature results are finally displayed on the 16x2 LCD, as shown in Fig. 8.
- If the system detects a “No mask” case and/or an overheated case, the buzzer gives a warning sound for 3 seconds. The two cases use different warning sounds so that they can be distinguished.
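One hypothetical way to make the “No mask” and overheated warnings distinguishable is to use different beep patterns while keeping each alarm at 3 seconds, as sketched below. The patterns, names, and the combined-alarm case are our assumptions, not the authors' settings. On the Raspberry Pi, each (state, duration) step could drive the buzzer pin with `RPi.GPIO`'s `GPIO.output`, followed by `time.sleep(duration)`.

```python
# Hypothetical (on_seconds, off_seconds, repeats) patterns; each lasts 3 s.
BEEP_PATTERNS = {
    "no_mask": (0.5, 0.5, 3),  # three long beeps
    "fever": (0.1, 0.1, 15),  # fifteen short beeps
    "both": (3.0, 0.0, 1),  # one continuous 3 s tone
}


def warning_schedule(mask_warning, fever_warning):
    """Return a list of (buzzer_on, duration_s) steps, or [] if no warning."""
    if mask_warning and fever_warning:
        key = "both"
    elif mask_warning:
        key = "no_mask"
    elif fever_warning:
        key = "fever"
    else:
        return []
    on_s, off_s, repeats = BEEP_PATTERNS[key]
    steps = []
    for _ in range(repeats):
        steps.append((True, on_s))
        if off_s > 0:
            steps.append((False, off_s))
    return steps
```

Keeping the patterns in a table separates the timing design from the GPIO driving loop, so the sounds can be tuned without touching the hardware code.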
In summary, our system takes about 6 seconds for each person to complete the whole process of face mask detection and forehead temperature measurement. Our system can be deployed at the entrance of public places with good lighting conditions. Because it relies on an RGB camera, our system needs extra lighting to work at night. Finally, to ensure the practicality of our system, we use a 220 V AC to 5 V DC power adapter, so that the system can work 24/7 for long periods without human supervision.
IV. CONCLUSION
In this paper, we propose an embedded machine learning system for automatically detecting face mask wearing and measuring human temperature. The face mask detector is built based on MobileNetV2, with ImageNet pre-trained weights, to detect three cases of correctly wearing, incorrectly wearing and not wearing a mask. The results […] to previous works. It takes only about 6 seconds for each person to have their face mask detected and temperature measured. Our proposed system can be further developed with higher-quality peripherals to obtain better results.
ACKNOWLEDGMENT
This research is funded by Ho Chi Minh City University of Technology – VNUHCM, under grant number SVCQ-2020-DDT-118.
REFERENCES
[1] A. Das, M. W. Ansari, and R. Basak, “Covid-19 face mask detection using tensorflow, keras and opencv,” 12 2020, pp. 1–5.
[2] H. Adusumalli, D. Kalyani, R. Sri, M. Pratapteja, and P. V. R. D. P. Rao, “Face mask detection using opencv,” in 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), 2021, pp. 1304–1309.
[3] S. A. Sanjaya and S. Adi Rakhmawan, “Face mask detection using mobilenetv2 in the era of covid-19 pandemic,” in 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI), 2020, pp. 1–5.
[4] P. Nagrath, R. Jain, A. Madan, R. Arora, P. Kataria, and J. Hemanth, “Ssdmnv2: A real time dnn-based face mask detection system using single shot multibox detector and mobilenetv2,” Sustainable Cities and Society, vol. 66, p. 102692, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2210670720309070
[5] T. Q. Vinh and N. T. N. Anh, “Real-time face mask detector using yolov3 algorithm and haar cascade classifier,” in 2020 International Conference on Advanced Computing and Applications (ACOMP), 2020, pp. 146–149.
[6] K. Suresh, M. Palangappa, and S. Bhuvan, “Face mask detection by using optimistic convolutional neural network,” in 2021 6th International Conference on Inventive Computation Technologies (ICICT), 2021, pp. 1084–1089.
[7] S. Yadav, “Deep learning based safe social distancing and face mask detection in public areas for covid-19 safety guidelines adherence,” International Journal for Research in Applied Science and Engineering Technology, vol. 8, pp. 1368–1375, 07 2020.
[8] A. M., S. K., S. K. R., and Y. I., “Contactless temperature detection of multiple people and detection of possible corona virus affected persons using ai enabled ir sensor camera,” in 2021 Sixth International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), 2021, pp. 166–170.
[9] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” 2017.
[10] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” 2019.
[11] N. S. Yamanoor, S. Yamanoor, and K. Srivastava, “Low cost design of non-contact thermometry for diagnosis and monitoring,” in 2020 IEEE Global Humanitarian Technology Conference (GHTC), 2020, pp. 1–6.