An embedded machine learning system for real time face mask detection and human temperature measurement


Lien Nguyen∗, Trang N.M. Cao†, Lam Huynh-Anh‡, Hanh Dang-Ngoc§

Faculty of Electrical and Electronics Engineering, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam

Email: ∗lien.nguyen1812799@hcmut.edu.vn, †trang.cao1814391@hcmut.edu.vn, ‡lam.huynh05042000@hcmut.edu.vn, §hanhdn@hcmut.edu.vn

Abstract—In this paper, an efficient embedded machine learning system is proposed to automatically detect face masks and measure human temperature in a real-time application. In particular, our system uses a Raspberry-Pi camera to collect real-time video and detects face masks by running a classification model on a Raspberry Pi 3 in public places. The face mask detector is built on MobileNetV2, with ImageNet pre-trained weights, to detect three cases: correctly wearing, incorrectly wearing, and not wearing a mask. We also design a human temperature measurement framework by deploying a temperature sensor on the Raspberry Pi 3. The numerical results demonstrate the practicality and effectiveness of our embedded system compared with some state-of-the-art studies. The accuracy in detecting the three cases of wearing a face mask is 98.61% on the training set and 97.63% on the validation set. Meanwhile, our proposed system needs only about 6 seconds per person to complete the whole process of face mask detection and forehead temperature measurement.

Index Terms—COVID-19, embedded machine learning system, face mask detection, human temperature measurement

I. INTRODUCTION

COVID-19 spreads mainly through droplets emitted by a person infected with the coronavirus (SARS-CoV-2), which poses a risk to others. The risk of transmission is highest in public places [1]. After a person is infected, it takes almost fourteen days for the virus to grow in the body of its host and affect them. In the meantime, it spreads to almost everyone who is in contact with that person. One of the best ways to avoid infection is to wear a face mask in public areas, as indicated by the World Health Organization [2], [3]. Furthermore, elevated body temperature can be a common symptom of COVID-19 [4], but the usual practice of measuring human temperature with handheld devices at a distance of less than 2 meters may itself expose people to infection. Therefore, a stand-alone system for both face mask detection and non-contact forehead temperature measurement in public places has become a crucial embedded machine learning system for tackling this global problem.

Many computer vision-based systems have been deployed since December 2019, when SARS-CoV-2 spread around the world from Wuhan (China). The authors in [1]–[4] used MobileNetV2 and OpenCV for their face mask detection frameworks and reported high accuracy in the training phase. Their frameworks followed two main steps: detecting and auto-cropping human faces, then labeling all the images for each person. A bounding box was used to indicate whether people were wearing face masks. Other researchers used YOLOv3 and Haar cascade classifiers [5], with an accuracy of 90.1%. In [6], the authors proposed a transfer learning method based on the combination of MobileNetV2 and a support vector machine. However, these methods were designed as initial studies toward an automatic face mask detection system, and might not be practical for day-to-day operation without human supervision. In fact, the main shortcoming of [2]–[6] was that the researchers used datasets containing only two classes, with and without a mask, which caused missed detections for people who wore masks incorrectly or intentionally covered their faces with scarves or handkerchiefs.

In the embedded machine learning research field, the authors in [7] proposed a system that includes three phases: person detection, safe distance measurement between detected people, and face mask detection using single shot object detection with MobileNetV2 and OpenCV. Other authors proposed a subsystem installed at an entrance door for temperature detection and face mask detection, with a smartphone application for security guards [8]. They used an Arduino UNO equipped with an infrared thermal camera to measure human temperature and sent alert messages to the security guards through an ESP8266 Wi-Fi module. Despite the optimistic results, the system consisted of various hardware components connected to a laptop that contained the corresponding software. This complicated deployment made the system in [8] not flexible enough to work 24/7 in public places.

In this paper, we propose an embedded machine learning system deployed on the Raspberry Pi 3 for automatically detecting face masks and measuring human forehead temperature with an MLX90614 temperature sensor. We divide our detection problem into two classes: “Mask” for cases of correctly wearing masks and “No mask” for cases of incorrectly wearing or not wearing masks. In order to [...] Raspberry Pi 3, and the results of these two phases are then displayed on the LCD screen. The Raspberry Pi 3 gives a warning sound if there is an overheated case. The whole system can work 24/7 using a power adapter.

The remainder of this paper is organized as follows. Section I reviews some state-of-the-art studies in the field of computer vision-based systems, then discusses some aspects that need to be improved in related embedded machine learning research. Our proposed system is presented in Section II, with two main stages: building the face mask detection model and deploying the whole system on the Raspberry Pi 3. The experimental results and discussion are presented in Section III. Finally, Section IV concludes the paper and summarizes the contributions of our work.

II. METHODOLOGY

There are two main stages in our proposed system: (i) training and testing a machine learning model to detect cases of wearing face masks, and (ii) deploying the face mask detection model along with human temperature measurement on the Raspberry Pi 3.

Our dataset has 5481 images in total, of various sizes, covering three cases of wearing masks: 1915 images of correctly wearing masks, 1782 images of incorrectly wearing masks¹, and 1784 images of not wearing masks², as shown in Fig. 1. All the images of correctly wearing masks are labeled “Mask”. Both the incorrectly wearing and the not wearing cases are labeled “No mask”. We use 80% of the dataset for training and the remaining 20% for testing, as shown in Table I.

A. The Face Mask Detection Model

1) Pre-processing: The pre-processing steps include resizing each image to 224×224 pixels, converting it into array format, and scaling the pixel intensities of the input image to the range [-1, 1] with preprocessing functions. Then, one-hot encoding is applied to the labels to represent the categorical variables as binary vectors. Essentially, this process converts our two labels, “Mask” and “No mask”, into specific vectors: a training image of the “Mask” class is assigned [1, 0], and a “No mask” image is assigned [0, 1]. In the next step, we split the data into 80% for training and 20% for testing. In the data augmentation step, we use the ImageDataGenerator to rotate, zoom, shift, shear, and horizontally flip all the images in the training set.

¹ “MaskedFace-Net - A dataset of correctly/incorrectly masked face images in the context of COVID-19”, Adnane Cabani, Karim Hammoudi, Halim Benhabiles, and Mahmoud Melkemi, Smart Health, ISSN 2352-6483, Elsevier, 2020. https://doi.org/10.1016/j.smhl.2020.100144

² “Masked Face Recognition Dataset and Application”, Zhongyuan Wang, Guangcheng Wang, Baojin Huang, Zhangyang Xiong, Qi Hong, Hao Wu, Peng Yi, Kui Jiang, Nanxi Wang, Yingjiao Pei, Heling Chen, Yu Miao, Zhibing Huang, Jinbi Liang, abs/2003.09093, 2020. https://arxiv.org/abs/2003.09093

Fig. 1: Dataset for face mask detection model. (a) Correctly wearing a mask; (b) Correctly wearing a pattern mask; (c) Incorrectly wearing a mask; (d) Incorrectly wearing masks in public; (e) Not wearing a mask; (f) Not wearing masks in public.

TABLE I: Dataset for training and testing phases (columns: Training phase, Testing phase, Total images).
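The pre-processing steps described in 1) can be illustrated with a minimal sketch in plain Python. The linear mapping x/127.5 − 1 is an assumption about how the intensities are scaled to [-1, 1] (the text only mentions "preprocessing functions"), and the helper names `scale_pixel`, `one_hot`, and `train_test_split` are hypothetical, not taken from the authors' code.

```python
import random

# Label-to-vector mapping taken from the text: "Mask" -> [1, 0], "No mask" -> [0, 1].
ONE_HOT = {"Mask": [1, 0], "No mask": [0, 1]}


def scale_pixel(value):
    """Map an 8-bit intensity in [0, 255] to the range [-1, 1] (assumed linear mapping)."""
    return value / 127.5 - 1.0


def one_hot(label):
    """Return the binary vector for a class label."""
    return list(ONE_HOT[label])


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

In a Keras pipeline these steps would normally be delegated to library utilities; the sketch only makes the arithmetic behind each step explicit.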

2) Training Model: MobileNetV2 builds upon the ideas of MobileNet [9], using depthwise separable convolutions as efficient building blocks [10]. The key idea of the depthwise separable convolutions in MobileNetV2 is to replace a full convolutional operator with a factorized version that splits the convolution into two separate layers. The first layer, called a depthwise convolution, performs lightweight filtering by applying a single convolutional filter per input channel. The second layer is a 1×1 convolution, called a pointwise convolution, which is responsible for building new features by computing linear combinations of the input channels. All the layers used in the proposed model are implemented with the Keras layers API.
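To see why this factorization is lightweight, the following sketch compares the number of weights in a full convolution with that of a depthwise layer followed by a pointwise layer. The layer sizes (3×3 kernel, 32 input channels, 64 output channels) are hypothetical values chosen for illustration, and bias terms are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a full k x k convolution (bias terms ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise k x k layer followed by a 1x1 pointwise layer."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution mixing the channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
full = standard_conv_params(3, 32, 64)
separable = separable_conv_params(3, 32, 64)
print(full, separable, round(full / separable, 1))  # prints 18432 2336 7.9
```

For this example layer the factorized version needs roughly eight times fewer weights, which is the kind of saving that makes the network suitable for a device such as the Raspberry Pi 3.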

Fig. 2: Block diagram of our proposed system.

In order to improve the accuracy of the pre-trained model for face mask detection, we combine some Keras layers with MobileNetV2 loaded with pre-trained ImageNet weights. MobileNetV2 is used as the base model, with its fully connected head left off. The input shape for MobileNetV2 as the base model is 224×224 with 3 channels. We then construct the layers that replace the head of the base model: Average Pooling 2D, Flatten, Dense, and Dropout. The Average Pooling 2D layer calculates the average output of each feature map in the previous layer and, in order to prevent overfitting, we use a 50% dropout rate. The fully connected head model is placed on top of the base model; this is the actual model that will be trained. Finally, the model is compiled with the Adam optimizer and the binary cross-entropy loss function.
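As a small worked example of the loss named above, the sketch below evaluates binary cross-entropy for a single two-element output vector, averaging over the two entries. The clipping constant `eps` and the example probabilities are assumptions made for illustration; they are not values reported in the paper.

```python
import math


def binary_cross_entropy(target, predicted, eps=1e-7):
    """Mean binary cross-entropy over the entries of one output vector."""
    total = 0.0
    for t, p in zip(target, predicted):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(target)


# A "Mask" sample ([1, 0]) scored by a confident and by an undecided model.
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
undecided = binary_cross_entropy([1, 0], [0.5, 0.5])
```

The confident prediction yields a smaller loss than the undecided one, which is the signal the optimizer uses to push the head's outputs toward the one-hot labels.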

3) Testing Model: In the testing phase, OpenCV is used for object detection. We use both videos collected from YouTube and real-time video captured by the camera of the Raspberry Pi 3. For each frame of a video in the testing set, the frame dimensions are grabbed and a blob is constructed from the frame. The blob is then passed through the network to obtain bounding boxes in the face detection step. The faces found in the frame and their bounding boxes are added to their respective lists. The mask prediction is performed only if at least one face was detected. Finally, the detected faces and their corresponding locations are looped over, and the output frame shows the class label “Mask” or “No mask” on the bounding box rectangle.
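The per-frame logic described in 3) can be sketched without any vision library, assuming the face detector returns normalized boxes with a confidence score. The function names, the `classify` callback, and the 0.5 confidence threshold are hypothetical; the paper does not state the threshold it uses.

```python
def to_pixel_box(box, frame_w, frame_h):
    """Scale a normalized (x0, y0, x1, y1) box to pixels, clamped to the frame."""
    x0, y0, x1, y1 = box
    left = max(0, int(x0 * frame_w))
    top = max(0, int(y0 * frame_h))
    right = min(frame_w - 1, int(x1 * frame_w))
    bottom = min(frame_h - 1, int(y1 * frame_h))
    return left, top, right, bottom


def label_faces(detections, frame_w, frame_h, classify, min_confidence=0.5):
    """Return (box, label) pairs for detections above the confidence threshold.

    detections: iterable of (confidence, normalized_box) from a face detector.
    classify:   callable taking a pixel box and returning (p_mask, p_no_mask).
    """
    faces = [to_pixel_box(box, frame_w, frame_h)
             for confidence, box in detections
             if confidence >= min_confidence]
    results = []
    # Mask prediction runs only if at least one face was kept.
    for box in faces:
        p_mask, p_no_mask = classify(box)
        results.append((box, "Mask" if p_mask > p_no_mask else "No mask"))
    return results
```

In a real deployment `classify` would crop the face from the frame, pre-process it, and call the trained model; here it is left as a callback so that the control flow can be read on its own.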

B. Face Mask Detection and Human Temperature Measurement on Raspberry Pi 3

This section describes the coordination of face mask detection and temperature measurement. The combination of these two functions is carried out on the Raspberry Pi, which is connected to peripherals such as the Raspberry-Pi camera, a temperature sensor, a 16x2 LCD, and a buzzer to complete the system, as shown in the block diagram in Fig. 2.

First, real-time video from the Raspberry-Pi camera serves as the input for the pre-trained model to detect face masks. In this initial step, the Raspberry-Pi camera also helps to detect human presence. Whenever the face detection step is done, the non-contact temperature sensor MLX90614 is turned on by the human presence signal, so that it can send the human forehead temperature to the Raspberry Pi for analysis. The default temperature is set at 37 degrees Celsius, since it is the common temperature of a healthy person [11]. Meanwhile, the buzzer gives a warning sound if any “No mask” case is detected. Afterward, the measured temperature is displayed on the 16x2 LCD, and the buzzer gives a warning sound to indicate that an overheated case has been detected.
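The decision step described above can be summarized in a short sketch. It assumes that the 37 degrees Celsius default acts as the overheating threshold and that a reading strictly above it triggers the warning; the function name, the dictionary keys, and the LCD text layout are hypothetical, and no sensor, LCD, or buzzer driver is called.

```python
FEVER_THRESHOLD_C = 37.0  # default temperature stated in the text, assumed to be the threshold


def screen_person(mask_label, forehead_temp_c):
    """Return the two LCD lines and the buzzer flags for one screened person."""
    no_mask = mask_label == "No mask"
    overheated = forehead_temp_c > FEVER_THRESHOLD_C  # strict comparison is an assumption
    # A 16x2 LCD shows two lines of at most 16 characters each.
    lines = (mask_label[:16], f"Temp: {forehead_temp_c:.1f} C"[:16])
    return {"lcd": lines, "buzz_no_mask": no_mask, "buzz_overheated": overheated}
```

Keeping the two buzzer flags separate leaves room for a different warning sound per condition.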

III. RESULTS AND DISCUSSION

The experimental setup computer has an Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz with 16.0GB RAM.

A. Training/Testing the face mask detection model using images in the dataset

The parameters are initialized as follows: the learning rate is 0.0001, the number of epochs is 20, and the batch size is 32. Our proposed framework for face mask detection uses 80% of the total of 5481 images for the training phase. As shown in Fig. 3, the training loss and validation loss reached 4.66% and 5.56%, respectively.
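As a rough worked example of what these hyperparameters imply, the sketch below counts the weight updates in a training run. It assumes the training set is the total minus the 1097 testing images reported in this section, that one update is made per batch, and that augmentation does not change the number of batches per epoch; none of these is stated explicitly in the paper.

```python
import math

total_images = 5481
test_images = 1097          # testing-set size reported in this section
batch_size = 32
epochs = 20

train_images = total_images - test_images            # assumed training-set size
steps_per_epoch = math.ceil(train_images / batch_size)
total_updates = steps_per_epoch * epochs
print(train_images, steps_per_epoch, total_updates)  # prints 4384 137 2740
```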

A total of 1097 images are used for the testing phase. The results are classified into four categories: true positive, true negative, false positive, and false negative. True positives (TP) and true negatives (TN) are the observations that are correctly detected. A false positive (FP) is a sample that the model assigns to the positive class although it actually belongs to the other class, and a false negative (FN) is an actual positive sample that is detected as the opposite class or not detected at all. Because the cases predicted as positive by the model number (TP + FP), the proportion of true cases (TP) among them is called the precision rate, as shown in equation (1). As shown in Table II, our model achieves precision rates of 96% and 99% for the “Mask” and “No mask” cases, respectively. These results show the ability of the classifier to accurately separate the considered positive class from the other, such as “Mask” from “No mask” and vice versa.

Fig. 3: Training accuracy/loss vs. number of epochs.

TABLE II: Performance results of our proposed system (columns: Testing images, Precision, Recall, F1-score).

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$

The recall rate measures the ability of the model to correctly predict positive observations out of all observations in the actual class [5], as written in (2). It is also called the sensitivity of the model in detecting the considered positive images among all the images labeled as such. As shown in Table II, our model achieves a recall rate of 98% for both the “Mask” and “No mask” cases, meaning that it recovers 98% of the actual positive images, which consist of the correctly predicted positives (TP) and the positives incorrectly detected as negative (FN).

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$

Equations (1) and (2) show that there is a trade-off between the precision rate and the recall rate. The F1-score is the harmonic mean of the precision rate and the recall rate, as shown in equation (3), so it takes both FP and FN cases into account. As shown in Table II, our model achieves F1-scores of 97% and 98% for the “Mask” and “No mask” cases, respectively. Because our classification problem also considers the TP and TN cases, we report our accuracy rate of 98% in Table II.

$$F_1\text{-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$
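Equations (1) to (3) can be checked numerically with a few lines of Python. The accuracy formula (TP + TN) divided by all samples is the standard definition and is assumed here, since the paper does not write it out. The confusion-matrix counts are hypothetical and serve only to exercise the formulas; they are not the paper's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1-score and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


# Hypothetical counts for illustration only; they are not the paper's data.
precision, recall, f1, accuracy = classification_metrics(tp=90, fp=10, fn=30, tn=70)
```

With these counts the precision is 0.90 and the recall is 0.75, and the F1-score of about 0.82 falls between the two, closer to the lower value, as expected of a harmonic mean.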

the model work with real-time video

Fig. 4 shows some captured frames of successfully detected results from four videos collected from YouTube³,⁴,⁵,⁶. As shown in Fig. 4(a), Fig. 4(b), Fig. 4(c), and Fig. 4(d), our model gives a high accuracy rate for “No mask” and “Mask” cases, for both a single person and multiple people. Our proposed model can also correctly recognize whether human faces are covered with scarves, handkerchiefs, or fabric face masks, as in Fig. 4(e) and Fig. 4(h). Furthermore, by combining the incorrectly wearing and not wearing mask images under one label named “No mask”, our model gives high accuracy in detecting incorrect cases in which the face mask does not fully cover the face, as shown in Fig. 4(f) and Fig. 4(g). Last but not least, Fig. 4(i) and Fig. 4(j) show further successfully detected results, which supports the effectiveness of our proposed model in recognizing incorrectly and correctly worn masks. Looking at the missed detections in Fig. 5, there are some captured frames in which face masks are not recognized.

In fact, since we aim to deploy our model to process people one by one in real time, this can be improved by setting an appropriate distance between our system and the person, which helps the camera capture all the details of human faces and masks. Based on our experimental results, the most appropriate distance from our system to people is 4 centimeters, which helps both the Raspberry-Pi camera and the MLX90614 temperature sensor to work at their best in collecting information from human faces and human temperature.

C. Testing the face mask detection model by real-time video captured by a built-in camera on Raspberry Pi

We use a Raspberry Pi 3 Model B with a 1.2 GHz 64-bit quad-core ARMv8 CPU to deploy the face mask detection phase after training the model on our experimental setup computer. In order to deploy the model in public places, we conduct a testing phase with real-time video taken from the built-in camera on the Raspberry Pi. In this step, we use VNC Viewer, a cross-platform screen sharing system, to remotely control the Raspberry Pi from our experimental setup laptop and collect some successfully detected cases, as shown in Fig. 6. Our model can detect whether people use different common types of face masks, such as medical face masks or fabric pattern face masks, as shown in Fig. 6(a) and Fig. 6(b), respectively. Besides, because of the variety of our dataset, we can also evaluate cases in which people wear both a face mask and sunglasses, as shown in Fig. 6(c) and Fig. 6(d), which also give high accuracy results. Furthermore, our model can detect people who do not wear a mask or who use their hand to cover their face, and labels them “No mask”, as shown in Fig. 6(e) and Fig. 6(f). For cases in which people wear a mask incorrectly, as shown in Fig. 6(g) and Fig. 6(h), our model detects them as “No mask” with high accuracy, and the buzzer is then turned on as a warning.

³ Coronavirus outbreak: Mixed messaging about mandatory face masks, https://www.youtube.com/watch?v=hekZBf8oUq0

⁴ DWCRA Women Face to Face - Making Masks In Tirupati - Chittoor District - Sakshi TV, https://www.youtube.com/watch?v=EIY9xJc4s0Q

⁵ https://www.youtube.com/watch?v=nf4bZgHsa5E

⁶ Nigerians React To Wearing Face Mask - Street Login, https://www.youtube.com/watch?v=V54hhnyAntU

Fig. 4: Examples of successfully detected cases. (a) Video 1 - Frame 1; (b) Video 1 - Frame 2; (c) Video 2 - Frame 1; (d) Video 2 - Frame 2; (e) Video 2 - Frame 3; (f) Video 2 - Frame 4; (g) Video 3 - Frame 1; (h) Video 3 - Frame 2; (i) Video 4 - Frame 1; (j) Video 4 - Frame 2.

Fig. 5: Examples of missed detections. (a) Video 1 - Frame 3; (b) Video 2 - Frame 3.

Fig. 6: Examples of successfully detected cases. (a) Mask; (b) Fabric mask; (c) Mask with sunglasses; (d) Fabric mask with sunglasses; (e) Not wearing face mask; (f) Face covered by hand; (g) Incorrectly wearing mask; (h) Incorrectly wearing mask.

Fig. 7: Examples of failed detections. (a) Face covered by a notebook; (b) Face covered by a handkerchief.

Looking at the failed detections in Fig. 7, there are some missed cases in which people intentionally try to cover their face with objects other than masks, such as a notebook or a handkerchief. Fig. 7(a) and Fig. 7(b) show a low accuracy rate for the “Mask” label, which suggests that these issues can be mitigated by enlarging our dataset with more incorrectly wearing mask cases.


Fig. 8: Experimental proposed system.

D. The face mask detection and human forehead temperature measurement real-time application

As described in Fig. 2, we use a Raspberry Pi with some connected peripherals to complete the whole system, as shown in Fig. 8. This packaged system is mounted on a camera stick holder at a height of 1.6 meters. In the experiment, 3 people go through our system indoors under good lighting conditions. They are required to step, one by one, to a distance of 1.5 meters in front of the system for face mask detection. Results show that it takes about 2 seconds to detect face mask wearing. After that, the person being tested steps closer to the system for forehead temperature measurement. The appropriate distance between the MLX90614 sensor and the person is about 4 centimeters. This distance can be lengthened by using a higher quality temperature sensor. The signal from face mask detection activates the temperature sensor, and the temperature measurement process takes about 1 second. The face mask detection and measured temperature results are finally displayed on the 16x2 LCD, as shown in Fig. 8. If the system detects a “No mask” and/or an overheated case, a buzzer is turned on and gives a warning sound for 3 seconds. The warning sounds for the two cases are different so that they can be distinguished.

In summary, our system takes about 6 seconds for each person to complete the whole process of face mask detection and forehead temperature measurement. Our system can be deployed at the entrance of public places with good lighting conditions. Because it uses an RGB camera, our system needs additional lighting to work at night. Last but not least, in order to ensure the practicality of our system, we use a 220 V AC to 5 V DC power adapter, so that our system can work 24/7 for a long time without human supervision.

IV. CONCLUSION

In this paper, we propose an embedded machine learning system for automatically detecting face mask wearing and measuring human temperature. The face mask detector is built [...] to previous works. It takes a short time of 6 seconds for each person to have their face mask detected and temperature measured. Our proposed system can be further developed with higher quality peripherals to obtain better results.

ACKNOWLEDGMENT

This research is funded by Ho Chi Minh City University of Technology – VNUHCM, under grant number SVCQ-2020-DDT-118.

REFERENCES

[1] A. Das, M. W. Ansari, and R. Basak, “Covid-19 face mask detection using tensorflow, keras and opencv,” 12 2020, pp. 1–5.

[2] H. Adusumalli, D. Kalyani, R. Sri, M. Pratapteja, and P. V. R. D. P. Rao, “Face mask detection using opencv,” in 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), 2021, pp. 1304–1309.

[3] S. A. Sanjaya and S. Adi Rakhmawan, “Face mask detection using mobilenetv2 in the era of covid-19 pandemic,” in 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI), 2020, pp. 1–5.

[4] P. Nagrath, R. Jain, A. Madan, R. Arora, P. Kataria, and J. Hemanth, “Ssdmnv2: A real time dnn-based face mask detection system using single shot multibox detector and mobilenetv2,” Sustainable Cities and Society, vol. 66, p. 102692, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2210670720309070

[5] T. Q. Vinh and N. T. N. Anh, “Real-time face mask detector using yolov3 algorithm and haar cascade classifier,” in 2020 International Conference on Advanced Computing and Applications (ACOMP), 2020, pp. 146–149.

[6] K. Suresh, M. Palangappa, and S. Bhuvan, “Face mask detection by using optimistic convolutional neural network,” in 2021 6th International Conference on Inventive Computation Technologies (ICICT), 2021, pp. 1084–1089.

[7] S. Yadav, “Deep learning based safe social distancing and face mask detection in public areas for covid-19 safety guidelines adherence,” International Journal for Research in Applied Science and Engineering Technology, vol. 8, pp. 1368–1375, 07 2020.

[8] A. M., S. K., S. K. R., and Y. I., “Contactless temperature detection of multiple people and detection of possible corona virus affected persons using ai enabled ir sensor camera,” in 2021 Sixth International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), 2021, pp. 166–170.

[9] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” 2017.

[10] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” 2019.

[11] N. S. Yamanoor, S. Yamanoor, and K. Srivastava, “Low cost design of non-contact thermometry for diagnosis and monitoring,” in 2020 IEEE Global Humanitarian Technology Conference (GHTC), 2020, pp. 1–6.

Title	An Embedded Machine Learning System For Real-Time Face Mask Detection And Human Temperature Measurement
Authors	Lien Nguyen, Trang N.M. Cao, Lam Huynh-Anh, Hanh Dang-Ngoc
University	Ho Chi Minh City University of Technology
Major	Electrical and Electronics Engineering
Type	Conference Paper
Year of publication	2021
City	Ho Chi Minh City
