
Research on cnn applied in face recognition of international school students


DOCUMENT INFORMATION

Basic information

Title: Research on CNN applied in face recognition of international school students
Author: Le Thuy Hang
Supervisor: Ph.D. Le Trung Thanh
School: Vietnam National University, Hanoi International School
Major: Informatics and Computer Engineering
Type: Graduation project
Year of publication: 2023
City: Hanoi
Number of pages: 57
File size: 3 MB


Structure

  • Chapter 1: Introduction to the topic
    • 1.1 The topic's scientific and practical importance
    • 1.2 What is the goal of the topic
    • 1.3 Why was this topic chosen?
    • 1.4 Research methods
    • 1.5 Project layout
  • Chapter 2: Theoretical basis
    • 2.1 Biological neural network
    • 2.2 Artificial neural network [4]
    • 2.3 Deep learning network model
      • 2.3.1 Some Deep Learning Network Components
  • Chapter 3: Design and build models
    • 3.1 Introduction to MATLAB
    • 3.2 Data Collection
    • 3.3 Data processing
      • 3.3.1 Data preprocessing
      • 3.3.2 Create train and test datasets
      • 3.3.3 Data Augmentation
  • Chapter 4: Results and Discussion
    • 4.1 Model evaluation results
    • 4.2 Discuss model accuracy and performance
  • Chapter 5: Conclusion and development direction
    • 5.1 Conclusion
    • 5.2 Development direction

Content

Introduction to the topic

The topic’s scientific and practical importance

The topic "Research on convolutional neural networks applied in face recognition of international school students at Vietnam National University, Hanoi" is important both scientifically and practically.

This work explores the application of convolutional neural networks to identifying students in images, contributing to advances in related techniques and technologies. As artificial intelligence evolves rapidly, the research helps develop methods that address challenges in the field and deepens understanding of model training and image processing techniques.

Implementing a convolutional neural network to identify students in photos has significant practical value for student management at the International School of Vietnam National University, Hanoi. The approach streamlines student management, making it more convenient while reducing the time and cost of administrative procedures. The research can also be applied in other educational institutions and organizations, improving the efficiency and quality of student management and of the overall educational experience.

What is the goal of the topic

The project "Research on CNN applied in international school student identification at Vietnam National University" aims to create a convolutional neural network (CNN) model that identifies International School students at Vietnam National University, Hanoi. The study pursues the following specific goals:

1. Collect student image data from different sources and preprocess the data to prepare for training the CNN model.

2. Train the CNN model on the student image data and evaluate the model's effectiveness.

3. Apply the trained CNN model to identify actual students and evaluate the accuracy of the model in recognition.

4. Compare the effectiveness of the CNN model with traditional identification methods to evaluate the contribution of this research.

Why was this topic chosen?

The project "Research on CNN applied in face recognition of International School students" was selected as my graduation project for the following reasons:

Applying artificial intelligence to challenges in student management is becoming an exciting and modern trend in education [3].

Artificial intelligence, particularly convolutional neural networks (CNNs), plays a significant role in student management by enabling the identification of students from photos. CNNs, commonly used in image and audio processing, excel at learning image features in order to classify objects. Consequently, the research titled "Application of Convolutional Neural Networks in Managing International School Students at Vietnam National University, Hanoi" is both relevant and practical. The study aims to streamline student management processes, ultimately saving time and resources for the institution.

This thesis will enhance my skills in artificial intelligence and image processing, both of which are rapidly evolving fields with significant future potential. Furthermore, the research topic is highly relevant and will contribute meaningfully to Vietnam National University, Hanoi.

Research methods

The content of the research "Research on convolutional neural networks applied in identifying International School students of Vietnam National University" is divided into the following main parts:

1. Overview of convolutional neural networks

2. Methods of identifying students in photos

4. Design and training of convolutional neural models

Project layout

The report is divided into five main chapters as follows:

• Chapter 1: Overview of the topic

• Chapter 2: Theoretical basis

• Chapter 3: System design and model building

• Chapter 4: Implementation results and discussion

• Chapter 5: Conclusion and development direction

In summary, this chapter highlights the rapid advancement of information technology in educational management and its significant advantages. This evidence inspired the selection of the topic for the graduation project and led to an exploration of the most effective methods for completing it.

Theoretical basis

Biological neural network

Biological neural networks consist of interconnected neurons that are chemically or functionally related, forming a complex structure with numerous neurons and connections. Each neuron can connect to many others, forming synapses where axons and dendrites meet, including dendrodendritic synapses. In addition to electrical signaling, these networks also communicate through neurotransmitter diffusion.

Artificial neural network [4]

Figure 2: Processing architecture of a neuron

Artificial neural networks attempt to mimic how people learn. Their structure is made up of an input layer, a hidden layer, and an output layer. Every node in every layer is connected to other nodes, and each node has a specific weight and threshold; a node whose value exceeds its threshold is classified as activated. The input values x1, x2, ..., xn form the input layer and are connected to the neuron by weights denoted w, and the bias (threshold) is denoted b. The value z of neuron j is calculated as the weighted sum of the inputs plus the bias.

In a neural network, each connection between neurons is assigned a weight that signifies the significance of the signal transmitted through that connection These weights are fine-tuned during the training process to enhance the network's output based on the input data The output, denoted as Yj, is determined by applying an activation function, such as sigmoid, ReLU, tanh, or softmax, through the formula Yj = f(z).
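The computation described above can be sketched in a few lines. The formula for z is not reproduced in the text, so this sketch assumes the standard weighted sum z = w1·x1 + ... + wn·xn + b followed by Yj = f(z); the input, weight, and bias values are hypothetical and are not taken from the thesis, which itself works in MATLAB.

```python
import math


def neuron_output(inputs, weights, bias, activation):
    """Return Yj = f(z) for one neuron, with z = sum(w_i * x_i) + b (assumed form)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Rectified linear unit activation."""
    return max(0.0, z)


# Hypothetical example values (not from the thesis).
x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.1]
b = 0.2

y_relu = neuron_output(x, w, b, relu)        # z = 0.5 - 0.5 + 0.3 + 0.2 = 0.5
y_sigmoid = neuron_output(x, w, b, sigmoid)  # sigmoid applied to the same z
```

Swapping the `activation` argument is enough to compare how different activation functions transform the same weighted sum.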

The theoretical basis of neural networks consists of two main parts: the neural model and the neural network.

Deep learning network model

The convolutional neural network (CNN) is currently one of the most popular deep learning models. It helps us build systems with high speed and accuracy [5].

Convolutional Neural Networks (CNNs) analyze images by examining them in segments known as features, which are essentially small two-dimensional arrays By comparing these features, CNNs identify similarities between different images, effectively treating each feature as a mini-image.

When a convolutional neural network (CNN) encounters a new image, it does not know in advance where the relevant features are located. Consequently, the CNN systematically evaluates every possible position to find them. This process leads to the development of a filter, which is essential for feature extraction and image analysis.

2.3.1 Some Deep Learning Network Components

Some basic layers of a deep learning network:

A convolutional neural network combines a multilayer perceptron with feature-generating (convolutional) layers and dimensionality reduction. Unlike traditional layer-based neural networks, convolutional neural networks (CNNs) can analyze and categorize multidimensional input in its original format, eliminating the need for prior feature extraction. This is possible because the features are generated within the CNN layers, which helps prevent the information loss that could result from user-provided features or data reordering.

Figure 4: Convolutional neural network operational structure

The figure depicts the process an input goes through before classification by one of the proposed CNN models.

Convolution and pooling (grouping) apply masks (patches) to the input, or to the output of a previously applied mask, in order to extract features and reduce their size. In this study, applying these masks produces M = 64 feature matrices.

Each feature matrix is generated by a mask with its own weights. First, a convolution mask of size a × b is applied to an input of size t × s, giving a feature matrix of size (t - a + 1) × (s - b + 1) that contains the extracted features. Max pooling with a window of size c × d is then applied, giving a feature matrix of size (t - a + 1) / c × (s - b + 1) / d when the pooling windows do not overlap. With t = 28 and s = 64, the dimensions of the masks are adjusted accordingly. Finally, the resulting feature matrices are connected to the output layer for classification. During convolution the masks overlap, which is not typical for pooling; this leads us to also investigate CNN topologies with overlapping pooling patches.

User-defined parameters significantly influence the performance of convolutional neural networks (CNNs). Key factors include the number of convolutional layers and the geometry of the neural arrays and pools. The learning rate, the stopping condition, the pooling stride, and the optimization procedure are the same across all evaluated structures. By modifying the remaining parameters, 18 different CNN architectures can be assessed in terms of their classification rates.

Figure 5: Description of the convolution layer

Convolution applies filters to an image in order to calculate feature intensity. For instance, to recognize a face in an image, the following steps are followed:

• Define the filters (also called kernels) that capture the features of the face.

• Suppose an image has a size of 5x5 (length and width), and choose a filter with a size of 3x3.

• Slide the filter over the image position by position and perform the convolution multiplication at each position, with stride = 1.

• Each convolution multiplication produces one new value.

• After scanning the entire image, we obtain a new image that is smaller than the original.

• The newly created image is called a feature map.

• Let (x1, y1) be the size of the original image and (x2, y2) the size of the filter. The size of the resulting feature map is (x1 - x2 + 1) x (y1 - y2 + 1).

Figure 6: Image convolved with filter = feature map
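The scanning procedure in the steps above can be illustrated with a small worked example. This is a minimal sketch in plain Python, not the MATLAB code used in the thesis; the 5x5 image and 3x3 filter values are hypothetical. As is usual in CNNs, the filter is applied without flipping (strictly a cross-correlation).

```python
def convolve2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` without padding and return the feature map."""
    x1, y1 = len(image), len(image[0])
    x2, y2 = len(kernel), len(kernel[0])
    out_rows = (x1 - x2) // stride + 1
    out_cols = (y1 - y2) // stride + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Multiply the filter element-wise with the window and sum the result.
            total = 0
            for i in range(x2):
                for j in range(y2):
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 5x5 binary image and 3x3 filter.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d_valid(image, kernel)
# feature_map is 3x3, i.e. (5 - 3 + 1) x (5 - 3 + 1):
# [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

With stride 1 and no padding, the output size matches the (x1 - x2 + 1) x (y1 - y2 + 1) rule stated in the last step.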

The pooling layer reduces the image size by emphasizing essential features while disregarding unnecessary details. This reduction in feature-map size also decreases the computational load on the model. Typically, pooling layers are configured with a size of (2, 2), a stride of 2, and no padding.
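As an illustration of the typical configuration mentioned here (window 2x2, stride 2, no padding), the following sketch applies max pooling to a small feature map. The input values are hypothetical, and the function is a plain-Python stand-in rather than the layer used in the thesis.

```python
def max_pool2d(feature_map, pool_size=2, stride=2):
    """Return the max-pooled feature map (no padding)."""
    rows, cols = len(feature_map), len(feature_map[0])
    out_rows = (rows - pool_size) // stride + 1
    out_cols = (cols - pool_size) // stride + 1
    pooled = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Collect the values inside the current window and keep the largest.
            window = [
                feature_map[r * stride + i][c * stride + j]
                for i in range(pool_size)
                for j in range(pool_size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]

pooled = max_pool2d(feature_map)  # [[6, 4], [8, 9]]
```

Each 2x2 block is replaced by its maximum, so the 4x4 input shrinks to 2x2 while the strongest responses are kept.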

After the picture has passed through several convolutional and pooling layers, the model has learned the main characteristics of the image (such as the eyes, nose, and face) and outputs the tensor of the final layer.

Fully connected layers are then used to combine the features of the image and produce the output of the model.

Nonlinear activation functions play a crucial role in training models, as they enable the classification of complex data that cannot be effectively managed by linear functions While linear activation functions represent a straight line, nonlinear functions create curves that enhance the model's ability to identify patterns in intricate datasets The introduction of nonlinear activation functions addresses the limitations of linear approaches, allowing for more accurate and efficient data classification.

Let us take a look at a nonlinear activation function such as softmax. Its graph is a curve that transforms the input over the segment [-1, 1].

The sigmoid function is a widely used nonlinear activation function that allows neural networks to learn complex relationships and generate non-linear outputs.

Formula 1: Calculating the sigmoid function

$$\operatorname{sigmoid}(x) = \frac{1}{1 + \exp(-x)}$$

where exp(-x) is the exponential function with base e (e is a constant of approximately 2.71828).

The sigmoid function, characterized by its S-shaped curve, is defined over the entire real number line. As the input tends towards positive infinity, the output of the sigmoid function approaches 1, and as the input tends towards negative infinity, the output approaches 0. The midpoint of the curve is at x = 0, where the sigmoid function yields a value of 0.5.

The sigmoid function is commonly used in the final layers of neural networks to transform outputs into probabilistic predictions, where values near 0 indicate low probability and values near 1 indicate high probability. It is widely applied in tasks such as binary classification and multiclass classification within neural network architectures.
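The limiting behaviour described above can be checked numerically. The sketch below assumes the standard definition sigmoid(x) = 1 / (1 + exp(-x)); the branch for negative inputs is only a numerical-stability detail and is not something the thesis discusses.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(-x) can overflow, so use the equivalent form exp(x) / (1 + exp(x)).
    e = math.exp(x)
    return e / (1.0 + e)


# Evaluate at a strongly negative input, at the midpoint, and at a strongly positive input.
values = {x: sigmoid(x) for x in (-10.0, 0.0, 10.0)}
```

The value at 0 is exactly 0.5, while the values at -10 and 10 are already very close to 0 and 1 respectively.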

The tanh function, or hyperbolic tangent function, is a nonlinear activation function commonly used in neural networks. Unlike the sigmoid function, the tanh function is symmetric around the origin, so the outputs of preceding layers that are passed as input to subsequent layers take both positive and negative values.

The tanh function can also be expressed through the sigmoid function:

$$\tanh(x) = 2\,\operatorname{sigmoid}(2x) - 1$$

That is, applying the sigmoid function to an input of 2x, multiplying the result by 2, and subtracting 1 gives the value of the tanh function.

The tanh function takes a value x as input and returns a value between -1 and 1. When the input x is close to 0, the tanh function returns a value of approximately 0. When x approaches negative infinity, the tanh function approaches -1, and when x approaches positive infinity, it approaches 1.
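The relationship between tanh and sigmoid described above (apply sigmoid to 2x, multiply by 2, subtract 1) can be verified numerically against Python's built-in `math.tanh`. The sample points are arbitrary and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x):
    """Compute tanh(x) as 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


# Compare against the library implementation at a few arbitrary points.
sample_points = [-3.0, -1.0, 0.0, 0.5, 2.0]
max_error = max(abs(tanh_via_sigmoid(x) - math.tanh(x)) for x in sample_points)
```

The two computations agree up to floating-point rounding, which confirms that tanh is a rescaled and shifted sigmoid.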

Design and build models

Introduction to MATLAB

MATLAB, developed by MathWorks, is a premier numerical computing environment and software utilized across various fields such as science and engineering It offers an extensive array of powerful functions and tools designed for numerical calculations, data analysis, programming, and graphical visualization.

MATLAB is a powerful programming language and interactive computing environment designed for numerical calculations, array and matrix manipulation, data analysis, and graph plotting It also offers users access to a variety of built-in tools and libraries for enhanced functionality.

MATLAB offers robust tools for data processing, analysis, and system design, enabling efficient simulation and development of algorithms and applications It is widely used for various tasks related to numerical computation and programming.

MATLAB is extensively utilized in research and education for tackling complex problems and conducting numerical analysis, especially in science, engineering, mathematics, physics, signal and image processing, and artificial intelligence.

In this thesis topic, MATLAB can be used as a powerful tool to perform many important tasks from data collection to processing and building neural network models.

Data Collection

The data collection process is crucial for creating the dataset used by the convolutional neural network for student identification. This thesis uses a dataset that includes each student's photographs alongside essential information such as the student ID and name.

The dataset was collected from the two students who are the primary implementers of the project. Data collection is carried out in MATLAB, which runs code that interfaces with a camera connected to the machine or device.

The data collection process is as follows:

• Step 1: Select Add-Ons / Hardware Support Packages / USB Webcam / Acquire images and video from UVC compliant webcam (MATLAB Support Package for USB Webcams).

This package makes it possible to connect to any webcam built into the machine or to other webcam devices attached through USB.

• Step 2: To connect to and interact with a camera or webcam from MATLAB, use functions such as 'webcamlist' to list the available devices and 'webcam' to inspect the properties of the selected webcam.

The properties reported for the webcam are explained below:

1. Name: The name of the webcam, in this case "HD User Facing".

2. AvailableResolutions: The resolutions available for the webcam. Three resolutions are listed: '1280x720', '640x480', and '640x360'.

3. Resolution: The current resolution set for the webcam, which is '1280x720'.

4. Saturation: The color saturation of the image. The value is 64.

5. WhiteBalanceMode: The webcam's current white balance mode. In this case, the mode is "auto".

6. Gain: Adjusts the brightness of the image. The value is 64.

7. Hue: Adjusts the color of the image. The value is 0.

8. BacklightCompensation: Adjusts the backlight. The value is 0.

9. ExposureMode: The webcam's automatic brightness adjustment mode. In this case, the mode is "auto".

10. Gamma: Adjusts the contrast of the image. The value is 300.

11. Sharpness: Adjusts the sharpness of the image. The value is 50.

12. WhiteBalance: Adjusts the white balance of the image. The value is 4600.

13. Exposure: Adjusts the brightness of the image. The value is -6.

14. Contrast: Adjusts the contrast of the image. The value is 50.

15. Brightness: Adjusts the brightness of the image. The value is -3.

Webcam settings such as resolution, brightness, contrast, and white balance can be adjusted to optimize image quality for face recognition. These adjustments are essential for capturing clear and consistent images that meet the requirements of facial recognition.

• Step 3: Make a connection to the webcam and determine how much data to take.

1. cam = webcam: This line creates a webcam object in MATLAB, which makes it possible to connect to and interact with the webcam attached to the computer.

2. faceDetector = vision.CascadeObjectDetector: This line creates a vision.CascadeObjectDetector object, an object detector based on the Viola-Jones algorithm. This detector is used to detect faces in images.

3. sample_images: This line assigns the value 500 to the sample_images variable. This parameter specifies the number of sample images that will be collected or processed in the next part of the code.

The number of samples to take is set to 500 continuous shots. When the sample count reaches the required number (sample_images), the webcam stops automatically.

The snapshot function takes an image from the webcam and saves it to the snaps variable. These snaps are then searched for faces.

Once the faceDetector object is created, it can be utilized to implement the Viola-Jones algorithm on images or video frames, providing the location of detected faces within the visuals.

ImageFacePart utilizes the faceDetector to identify faces in photographs, generating an imageFacePart matrix that details the position of each detected face Each row in this matrix represents a vector of four elements: [x y width height], where x and y denote the coordinates of the top-left corner of the crop area, while width and height indicate the dimensions of the crop Following the detection process, a small frame displays the cropped face image, and each facial movement is recorded and saved as a JPG file.
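To make the [x y width height] convention concrete, the following sketch crops a region from a small image stored as a list of rows. It is a plain-Python illustration with hypothetical pixel values; it assumes 0-based indices, whereas MATLAB counts from 1, and it does not reproduce the thesis code.

```python
def crop_face(image, bbox):
    """Return the sub-image selected by bbox = [x, y, width, height].

    `x` is the column and `y` the row of the top-left corner (0-based here).
    """
    x, y, width, height = bbox
    return [row[x:x + width] for row in image[y:y + height]]


# Hypothetical 4x6 grayscale image; the numbers stand in for pixel intensities.
image = [
    [10, 11, 12, 13, 14, 15],
    [20, 21, 22, 23, 24, 25],
    [30, 31, 32, 33, 34, 35],
    [40, 41, 42, 43, 44, 45],
]

# A detection whose top-left corner is at column 2, row 1, 3 pixels wide and 2 pixels high.
face = crop_face(image, [2, 1, 3, 2])  # [[22, 23, 24], [32, 33, 34]]
```

Each row of the detector's output matrix would be passed to such a crop step once per detected face.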

Figure 19: The Hangle dataset after being collected

Once enough data has been collected, the results are saved in the folder corresponding to each student.

Next, an 8-layer model is constructed as follows:

Layer 1 contains a convolution layer, the ReLU function, a normalization function, and max pooling.

Images with dimensions of 227x227 pixels and three RGB color channels serve as inputs, so each image is represented as a 3-D matrix of size 227 × 227 × 3. This structure allows the color information in the red, green, and blue channels to be processed.

Layer 1 of AlexNet is a convolutional layer with 11x11x3 filters. Each filter in this convolutional layer slides over the input image and computes the convolution between the filter and the corresponding elements of the image. This process creates a new feature map.

In the first layer of AlexNet, a total of 96 filters are used, resulting in 96 feature maps that are smaller than the original image. The dimensions of these feature maps are determined using the formula: output size = (input size - filter size + 2 * padding) / stride + 1.

The input images are 227x227 pixels, and the 11x11 filter is applied without padding (padding = 0) and, in this calculation, with a stride of 1. Applying the formula gives an output size of (227 - 11 + 2 * 0) / 1 + 1 = 217 pixels.

So, under these settings, the result is 96 feature maps with dimensions 217x217.
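The output-size formula can be evaluated directly. The sketch below reproduces the value of 217 under the settings used in the text (no padding, stride 1). For comparison, it also evaluates a stride of 4, which is the first-layer stride in the commonly published AlexNet configuration and gives 55; whether the network trained in the thesis uses that stride is not stated in the text, so treat the second value as context rather than as the thesis's result.

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Output size = (input size - filter size + 2 * padding) / stride + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1


# Settings used in the text: 227x227 input, 11x11 filter, no padding, stride 1.
size_stride_1 = conv_output_size(227, 11, padding=0, stride=1)  # 217

# Same input and filter with stride 4 (assumption: the standard AlexNet first layer).
size_stride_4 = conv_output_size(227, 11, padding=0, stride=4)  # 55
```

The comparison shows how strongly the stride affects the feature-map size even when the input and filter sizes are unchanged.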

Cross-Channel Normalization (CCN) is used after the convolution layer at layer 1 to normalize features across the channels of the feature maps. The parameters are as follows:

• Window channel size: 5: The size of the sliding window across channels. The normalized value of each element is calculated using values from 5 consecutive channels in the feature map.

• Alpha: 0.0001: This is a normalization formula parameter. It is used to change the degree of normalization. The Alpha value is set to 0.0001 in this scenario.

• Beta: 0.75: This is a normalization formula parameter that controls the strength of the competition between channels. The Beta value is set to 0.75 in this scenario.

• K: 1: This is an extra parameter that is used to avoid division by zero throughout the calculation. The K value is set to 1 in this situation.
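A minimal sketch of how these four parameters interact is given below. It assumes the normalization has the form x / (K + Alpha * ss / window)^Beta, where ss is the sum of squares over the channels in the window; this is my understanding of how MATLAB's cross-channel normalization is defined, and the way the window is clipped at the first and last channels is also an assumption. The activation values are hypothetical.

```python
def cross_channel_normalize(channels, window=5, alpha=1e-4, beta=0.75, k=1.0):
    """Normalize the activations at one pixel position across neighbouring channels.

    `channels` holds one activation per channel for a single spatial position.
    Assumed formula: x / (k + alpha * sum_of_squares / window) ** beta.
    """
    half = window // 2
    normalized = []
    for i, value in enumerate(channels):
        # Window of up to `window` consecutive channels centred on channel i,
        # clipped at the edges (an assumption of this sketch).
        lo = max(0, i - half)
        hi = min(len(channels), i + half + 1)
        sum_squares = sum(v * v for v in channels[lo:hi])
        denominator = (k + alpha * sum_squares / window) ** beta
        normalized.append(value / denominator)
    return normalized


# Hypothetical activations of 7 channels at the same pixel position.
activations = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
normalized = cross_channel_normalize(activations)
```

With K = 1 the denominator can never be zero, and larger activity in neighbouring channels increases the denominator and therefore damps the value being normalized.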

Max pooling is applied after the cross-channel normalization (CCN) layer to decrease the dimensions of the feature maps while highlighting the most significant features.

The following parameters are configured in the max pooling layer:

• Pool size: 3x3: The size of the pooling window. The window slides over the feature map and extracts the maximum value within each window to create a new feature map.

• Stride: 2x2: The stride when moving the pooling window over the feature map.

In this scenario, the 3x3 window jumps 2 rows or 2 columns at each move (stride = 2). This means that the feature maps are roughly halved in size after each pooling.

Data processing

To enhance the identification and classification of students in images, the Cropping technique is employed to focus solely on the student's face This method effectively eliminates irrelevant data, such as hair, collars, and background elements, ensuring that only the most critical features are retained for analysis.

Figure 20: Original size photo from collection

The original image, with a size of 1440x1800, contains non-essential components such as the collar and the background. Using the image in its entirety may introduce noise into the dataset and unnecessarily increase its size. To improve data quality and minimize noise, it is beneficial to crop the image to a smaller size. By doing so, we create images that devote more of their resolution to the face and focus on its most important details.

Figure 21: Original image after the surrounding parts have been removed

After applying the Cropping technique, the image measures 584x584 pixels. This cropping, centered on the face, reduces variations in shape and size across different images. Consequently, it allows the recognition model to concentrate on the overall facial features, enhancing the accuracy of face recognition.

Size normalization is essential when gathering image data from diverse sources, as it addresses discrepancies in image dimensions By resizing images to a consistent resolution, we ensure that all input images fed into the neural network are uniform in size This standardization simplifies the processing and computational tasks involved in image analysis.

Figure 22: The face in the picture is different when the surrounding is cut off

The comparison between the original image sized at 717x717 and the cropped version at 584x584 highlights the impact of defined object frames on image dimensions The cropping process results in a noticeable variation in the size of the face within the image.

In datasets with images of different sizes, the data will be inconsistent. Therefore, we need to resize these images to a uniform size of 224x224 pixels.

Figure 23: The image after it has been resized to the size 224x224

The importance of the image resizing technique:

Resizing images to a smaller size significantly reduces the computational cost of convolutional neural networks. Large original images lead to longer transfer times and higher resource consumption during processing. By reducing the image dimensions, we improve efficiency and accelerate both the training and prediction phases of the network.

When resizing an image, it is crucial to choose the appropriate dimensions and resizing method to retain essential details, such as the student's facial features By making informed size decisions, we can ensure that important information in the image remains intact.
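The idea of resizing to a fixed size can be sketched with nearest-neighbour sampling, the simplest resizing method. This is an illustrative simplification with hypothetical pixel values: the thesis does not state which interpolation method was used, and smoother methods (for example bilinear or bicubic) generally preserve facial detail better.

```python
def resize_nearest(image, new_height, new_width):
    """Resize a 2-D image (list of rows) using nearest-neighbour sampling."""
    old_height, old_width = len(image), len(image[0])
    return [
        [
            # Map each output pixel back to the closest source pixel.
            image[r * old_height // new_height][c * old_width // new_width]
            for c in range(new_width)
        ]
        for r in range(new_height)
    ]


# Hypothetical 4x4 image reduced to 2x2.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
small = resize_nearest(image, 2, 2)  # [[1, 3], [9, 11]]
```

Applying the same target size to every image, as with the 224x224 size above, guarantees that all inputs to the network have identical dimensions.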

3.3.2 Create train and test datasets

To divide the dataset into training and test sets, the percentage division method is used. The datasets in question are LeThuyHang and NguyenDinhHieu, each containing a number of images.

The dataset is divided by percentage as follows:

The training set makes up 70% of the data.

The test set makes up 30% of the data.

In the figure, we can see that the data has been divided equally between the 2 datasets of the 2 students.

Figure 24: The image of the dataset after it has been split to the training and validation part

The parameter 'randomized' is used to shuffle the data before dividing it into sub-datasets.

Selecting 'randomized' shuffles the data samples within each label before splitting, which increases the diversity and randomness of the resulting sub-datasets. This approach minimizes the risk of bias in model training that could be caused by the ordering of the data.

Shuffling the data prior to splitting prevents samples of the same label from being clustered or entirely isolated in sub-datasets. This practice promotes a balanced representation of labels in both training and testing phases.
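The randomized 70/30 split per label can be sketched as follows. MATLAB's `splitEachLabel` accepts a 'randomized' option, but whether the thesis uses exactly that function is an assumption; this plain-Python version only mirrors the behaviour described in the text. The file names are hypothetical, and 500 images per student is taken from the sample count mentioned for data collection.

```python
import random


def split_each_label(files_by_label, train_fraction=0.7, seed=0):
    """Shuffle the files of each label, then split them into train and test parts."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, files in files_by_label.items():
        shuffled = list(files)
        rng.shuffle(shuffled)  # randomize the order within this label only
        n_train = round(len(shuffled) * train_fraction)
        train[label] = shuffled[:n_train]
        test[label] = shuffled[n_train:]
    return train, test


# Hypothetical file names for the two students' folders.
dataset = {
    "LeThuyHang": [f"LeThuyHang/{i}.jpg" for i in range(1, 501)],
    "NguyenDinhHieu": [f"NguyenDinhHieu/{i}.jpg" for i in range(1, 501)],
}

train_set, test_set = split_each_label(dataset)
# Each label contributes 350 training images and 150 test images.
```

Because the split is done label by label, both students keep the same 70/30 proportion regardless of the order in which the images were captured.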

Data augmentation is a very popular technique in deep learning and is used to enlarge the training data. It creates different versions of the same image by altering its position, brightness, contrast, rotation, flipping, cropping, and size, so that multiple new images can be generated from the original without additional data collection.

By manipulating a photo of a person's face—such as rotating it, adjusting brightness, cropping, and flipping—we can generate various versions of the image These diverse representations enable the model to learn facial features from multiple angles, ultimately enhancing its accuracy.

The code uses the imageDataAugmenter function in MATLAB to create an augmenter object that applies random transformations to the image.

Specifically, the parameters used in the imageDataAugmenter function have the following meanings:

"RandRotation", [-90 90]: Applies a random rotation between -90 and 90 degrees. This allows the image to be rotated by a random angle to create diversity in the data.

"RandScale", [1 2]: Applies random scaling with a factor between 1 and 2. This allows the image to be randomly enlarged to create size variation.

"RandXReflection", true: Applies random horizontal reflection. This means that the image can be randomly flipped horizontally.

Figure 25: Using Data Augmentation technique

Using the imageAugmenter object during model training applies random transformations to the images, as specified by the imageDataAugmenter. This process enriches the training dataset with diverse variations, enabling the model to learn broader features and concepts and ultimately improving its ability to generalize to new data.
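The random choices behind these three parameters can be sketched as follows. This is a plain-Python illustration of the sampling idea, not the MATLAB implementation: the uniform distributions and the 50% reflection probability are assumptions of the sketch, and only the horizontal flip is actually applied to a (hypothetical) image.

```python
import random


def sample_augmentation(rng, rotation_range=(-90.0, 90.0),
                        scale_range=(1.0, 2.0), x_reflection=True):
    """Draw one random set of transformation parameters."""
    return {
        "rotation_degrees": rng.uniform(*rotation_range),
        "scale": rng.uniform(*scale_range),
        # Assumed: when enabled, the reflection is applied with probability 0.5.
        "reflect_x": x_reflection and rng.random() < 0.5,
    }


def reflect_horizontally(image):
    """Flip a 2-D image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
samples = [sample_augmentation(rng) for _ in range(5)]

# Hypothetical 2x3 image flipped horizontally.
flipped = reflect_horizontally([[1, 2, 3], [4, 5, 6]])  # [[3, 2, 1], [6, 5, 4]]
```

Because new parameters are drawn every time an image is requested, the network rarely sees exactly the same version of an image twice during training.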

Results and Discussion

Select parameters for the model:

The code uses the following settings:

• "InitialLearnRate",0.01: The initial learning rate is set to 0.01. This value determines how large the weight updates are at the start of training.

• "MiniBatchSize",128: The mini-batch size is set to 128. A mini-batch is a small subset of the data used to compute the gradient and update the weights during training. This parameter significantly influences both the training speed and the memory requirements.

• "Shuffle","every-epoch": Shuffles the data after each epoch. This ensures that the samples fed into the model in each epoch are reshuffled, which avoids effects caused by the ordering of the data.

• "sgdm": Stochastic Gradient Descent with Momentum (SGDM) is an optimization technique that combines stochastic gradients computed on random subsamples with momentum. This combination accelerates the optimization process while improving accuracy.

The following performance results are obtained:

Figure 27: Training results when setting InitialLearnRate 0.01

The training process concludes when the training loss becomes NaN (not a number), which occurs due to excessively large loss values.
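The way an overly large learning rate can drive the loss to NaN can be reproduced on a toy problem. The sketch below minimizes loss(theta) = theta^2 with a momentum update of the form v = momentum * v - rate * gradient, theta = theta + v. The momentum of 0.9 and the rate of 2.5 are assumptions chosen so that this one-parameter example diverges; they are not the thesis's settings, and a deep network can become unstable at much smaller rates such as 0.01.

```python
import math


def sgdm_minimize(learn_rate, momentum=0.9, steps=1000, start=1.0):
    """Minimize loss(theta) = theta**2 with gradient descent plus momentum.

    Returns the final loss value.
    """
    theta = start
    velocity = 0.0
    for _ in range(steps):
        gradient = 2.0 * theta  # derivative of theta**2
        velocity = momentum * velocity - learn_rate * gradient
        theta = theta + velocity
    return theta * theta


loss_small_rate = sgdm_minimize(learn_rate=0.01)  # converges towards 0
loss_large_rate = sgdm_minimize(learn_rate=2.5)   # overflows and ends as NaN
```

With the small rate the iterates shrink steadily, whereas with the large rate each step overshoots by more than the previous one until the values overflow to infinity and the arithmetic yields NaN, which is the symptom reported for the first training run.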

We review the parameters and data to fix this problem.

InitialLearnRate is changed to 0.001 and MiniBatchSize to 64.

The results obtained are as follows:

Figure 28: Training results when setting InitialLearnRate 0.001

In the initial iteration, the Mini-batch Accuracy was 35.94%, while the Validation Accuracy reached around 62.43%. The Mini-batch Loss was 1.8353, and the Validation Loss was 1.0235.

Adjusting the InitialLearnRate parameter to 0.001 is essential for decelerating the learning process within the network layers This reduction in the initial learning rate aims to guarantee that the connections between the layers are thoroughly established and effectively updated.

Setting a small initial learning rate allows the model to make smaller adjustments in the search space, which helps it find local minima and stabilizes the learning process. This approach improves the model's convergence.

Subsequent iterations significantly improved both the Mini-batch and the Validation accuracy. The reduction in the Mini-batch and Validation loss values indicates a marked improvement in the model's accuracy and generalization ability.

The figure shows that the Validation Accuracy increases steadily from approximately 60% to 100% over the iterations. Notably, after 40 epochs, the accuracy reaches 100%. At the same time, the loss value declines continuously, indicating that the model has reached its best performance.

4.2 Discuss model accuracy and performance

Figure 30: Predictions of the model compared with the true class (True Class)

Up to 100% of the labels are predicted correctly, which indicates that the model performs very well on this data.

Case 1: Using another image in the dataset

Figure 31: Other image prediction results in the dataset

The image is classified correctly, with a prediction score of up to 100%.

Case 2: Real-time identification through the webcam of a desktop device

Figure 32: Face prediction results through camera

The face is classified correctly, with a prediction score of up to 100%.

Case 3: Identification through photos on Instagram

Figure 33: Photo prediction results from another device

The result is weaker in this case: because the image is slightly blurred, the prediction score is not high and the face is confused with other people.

Test with a face that is not fully facing the camera:

Figure 34: Prediction for a face that is not fully facing the camera

Although the prediction was correct, the prediction score only reached 88.1%.

In this test case, a hand hides half of the face, so the face is only partly visible:

Figure 35: Picture predicted from any image

The prediction score is 99.9%, which is a very high value.

Test with a live camera on a desktop computer:

Figure 36: Predicted outcomes from a live camera

The model correctly identified Nguyen Dinh Hieu.

Test with the face of a different person who is not in the training data or the database:

Figure 37: Prediction results for a photo of a different face

When a face that was not part of the training data is supplied, the system indicates that it is not in the database.
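The text does not describe how unknown faces are rejected. One common approach, shown here purely as an assumption, is to compare the highest class score with a threshold and report "Not in database" when the score is too low. The class names, score values and threshold below are hypothetical illustration values.

```matlab
% Sketch: reject a face as unknown when the best class score is too low.
% classNames, scores and threshold are hypothetical illustration values.
classNames = ["StudentA", "StudentB", "StudentC"];
scores     = [0.40 0.35 0.25];   % per-class scores for one image
threshold  = 0.80;               % assumed cut-off, to be tuned on validation data

[topScore, idx] = max(scores);   % best score and its class index
if topScore >= threshold
    result = classNames(idx);    % confident enough: report the class
else
    result = "Not in database";  % otherwise treat the face as unknown
end
fprintf("%s (top score %.1f%%)\n", result, 100 * topScore);
```

With a trained network, the score vector would typically come from the second output of classify, for example [label, scores] = classify(net, img), where net and img are hypothetical variable names.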

Chapter V: Conclusion and development direction

5.1 Conclusion

In this project, we explored the application of convolutional neural networks (CNNs) to student identification. We collected and preprocessed facial image data to train the CNN model, and ultimately developed a facial recognition system that identifies students reliably.

5.2 Development direction

1. Improve accuracy: Improve model accuracy by enhancing the training data, optimizing hyperparameters, or using a more advanced deep neural network model. This will help the system recognize faces more accurately in difficult situations such as low light, different shooting angles, or changes in appearance.

2. Application expansion: Apply the student identification system in other areas, such as automatic attendance management in schools, security access control, or a mobile application that lets students authenticate with their face.

3. Other technology integration: Combine convolutional neural networks with other recognition technologies such as speech recognition, fingerprint recognition, or handwritten character recognition. This helps to build a multi-source information system and enhances the accuracy and reliability of the identification process.

4. Practical system integration: Develop the interface and integrate the student identification system into existing student management systems or access control systems. This ensures consistency and convenience when the system is used in a real-world environment.

5. Integrated deep learning: Explore and apply advanced deep learning techniques and neural network models to improve the system's recognition ability and accuracy. Models such as Transformer, GAN (Generative Adversarial Network), or multi-source neural network models can be studied and integrated to give better results.

In general, student identification is a promising field that can be developed in many different directions to improve performance and to be applied in practice.

[1] A K G S M P A Sakshi Indolia, „Procedia Computer Science,” in Conceptual

Understanding of Convolutional Neural Network- A Deep Learning Approach,

Banasthali Vidyapeeth, Rajasthan, India, ScientDirect, 2018., pp 679-688

[2] M M .K en M S B.S., Object recognition in images, Kochi, India: 2016 International Conference on Information Science (ICIS), 2016

[3] D M W a J R Allen, „Brookings,” 24 8 2018 [Online] Available: How artificial intelligence is transforming the world

[4] wikiversity, „wikiversity,” 3 11 2023 [Online] Available: https://en.wikiversity.org/wiki/Artificial_neural_network

[5] G D Luca, „Baeldung,” 17 9 2022 [Online] Available: https://www.baeldung.com/cs/ai-convolutional-neural-networks

[6] A A A H A Mahmoud Hassaballah, in Image Features Detection, Description and

Matching, South Valley University, Springer International Publishing (Verlag), 2016

[7] A Rosebrock, „PyimageSearch,” 21 5 2021 [Online] Available: https://pyimagesearch.com/2021/05/14/convolutional-neural-networks-cnns-and-layer- types/ [Geopend 12 5 2023]

[8] L Panneerselvam, "Activation Functions and their Derivatives – A Quick & Complete Guide," in Data Science Blogathon., 2021

[9] Z M Chng, „Machine Learning Mastery,” 4 6 2022 [Online] Available: https://machinelearningmastery.com/using-activation-functions-in-neural-networks/ [Geopend 13 5 2023]

[10] W D L Jason K Eshraghian, „The fine line between dead neurons and sparsity in binarized spiking neural networks,” Neural and Evolutionary Computing (cs.NE), 2022

[11] J Brownlee, „Machine Learning Mastery,” 9 1 2019 [Online] Available: https://machinelearningmastery.com/rectified-linear-activation-function-for-deep- learning-neural-networks/ [Geopend 20 5 2023]

[12] S K S B B C Shiv Ram Dubey, „Machine Learning (cs.LG),” in Activation

Functions in Deep Learning: A Comprehensive Survey and Benchmark, Kolkata, India,

[13] wikipedia, „Wikipedia,” 7 5 2021 [Online] Available: https://en.wikipedia.org/wiki/Vanishing_gradient_problem [Geopend 9 5 2023]

[14] J Deng, W Dong, R Socher, L.-J Li, K Li en L Fei-Fei, „ImageNet: A large-scale hierarchical image database,” 2009 IEEE Conference on Computer Vision and Pattern

[15] D Shah, „v7labs,” 26 1 2023 [Online] Available: https://www.v7labs.com/blog/cross- entropy-loss-guide [Geopend 21 5 2023]

[16] A Kumar, „vitalflux,” 3 4 2023 [Online] Available: https://vitalflux.com/mean- squared-error-vs-cross-entropy-loss-function/ [Geopend 22 5 2023]

[17] M u Hassan, „Neurohive,” 29 10 2018 [Online] Available: https://neurohive.io/en/popular-networks/alexnet-imagenet-classification-with-deep- convolutional-neural-networks/ [Geopend 23 5 2023].

Ngày đăng: 26/02/2025, 22:29

Nguồn tham khảo

Tài liệu tham khảo Loại Chi tiết
[1] A. K. G. S. M. P. A. Sakshi Indolia, „Procedia Computer Science,” in Conceptual Understanding of Convolutional Neural Network- A Deep Learning Approach, Banasthali Vidyapeeth, Rajasthan, India, ScientDirect, 2018., pp. 679-688 Sách, tạp chí
Tiêu đề: Conceptual Understanding of Convolutional Neural Network- A Deep Learning Approach
Tác giả: A. K. G. S. M. P. A. Sakshi Indolia
Nhà XB: Banasthali Vidyapeeth
Năm: 2018
[2] M. M. .K. en M. S. B.S., Object recognition in images, Kochi, India: 2016 International Conference on Information Science (ICIS), 2016 Sách, tạp chí
Tiêu đề: Object recognition in images
Tác giả: M. M. .K., M. S. B.S
Nhà XB: 2016 International Conference on Information Science (ICIS)
Năm: 2016
[3] D. M. W. a. J. R. Allen, „Brookings,” 24 8 2018. [Online]. Available: How artificial intelligence is transforming the world Sách, tạp chí
Tiêu đề: Brookings
Tác giả: D. M. W., J. R. Allen
Nhà XB: Brookings
Năm: 2018
[4] wikiversity, „wikiversity,” 3 11 2023. [Online]. Available: https://en.wikiversity.org/wiki/Artificial_neural_network Sách, tạp chí
Tiêu đề: wikiversity
Năm: 2023
[5] G. D. Luca, „Baeldung,” 17 9 2022. [Online]. Available: https://www.baeldung.com/cs/ai-convolutional-neural-networks Sách, tạp chí
Tiêu đề: Baeldung
Tác giả: G. D. Luca
Năm: 2022
[6] A. A. A. H. A. Mahmoud Hassaballah, in Image Features Detection, Description and Matching, South Valley University, Springer International Publishing (Verlag), 2016 Sách, tạp chí
Tiêu đề: Image Features Detection, Description and Matching
Tác giả: A. A. A. H. A. Mahmoud Hassaballah
Nhà XB: South Valley University
Năm: 2016
[7] A. Rosebrock, „PyimageSearch,” 21 5 2021. [Online]. Available: https://pyimagesearch.com/2021/05/14/convolutional-neural-networks-cnns-and-layer-types/. [Geopend 12 5 2023] Sách, tạp chí
Tiêu đề: PyimageSearch
Tác giả: A. Rosebrock
Năm: 2021
[8] L. Panneerselvam, "Activation Functions and their Derivatives – A Quick & Complete Guide," in Data Science Blogathon., 2021 Sách, tạp chí
Tiêu đề: Activation Functions and their Derivatives – A Quick & Complete Guide
Tác giả: L. Panneerselvam
Nhà XB: Data Science Blogathon
Năm: 2021
[9] Z. M. Chng, „Machine Learning Mastery,” 4 6 2022. [Online]. Available: https://machinelearningmastery.com/using-activation-functions-in-neural-networks/.[Geopend 13 5 2023] Sách, tạp chí
Tiêu đề: Machine Learning Mastery
Tác giả: Z. M. Chng
Năm: 2022
[10] W. D. L. Jason K. Eshraghian, „The fine line between dead neurons and sparsity in binarized spiking neural networks,” Neural and Evolutionary Computing (cs.NE), 2022 Sách, tạp chí
Tiêu đề: The fine line between dead neurons and sparsity in binarized spiking neural networks
Tác giả: W. D. L. Jason K. Eshraghian
Nhà XB: Neural and Evolutionary Computing
Năm: 2022
[11] J. Brownlee, „Machine Learning Mastery,” 9 1 2019. [Online]. Available: https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/. [Geopend 20 5 2023] Sách, tạp chí
Tiêu đề: Machine Learning Mastery
Tác giả: J. Brownlee
Năm: 2019
[12] S. K. S. B. B. C. Shiv Ram Dubey, „Machine Learning (cs.LG),” in Activation Functions in Deep Learning: A Comprehensive Survey and Benchmark, Kolkata, India, Neurocomputing, 2021 Sách, tạp chí
Tiêu đề: Activation "Functions in Deep Learning: A Comprehensive Survey and Benchmark
[13] wikipedia, „Wikipedia,” 7 5 2021. [Online]. Available: https://en.wikipedia.org/wiki/Vanishing_gradient_problem. [Geopend 9 5 2023] Sách, tạp chí
Tiêu đề: Wikipedia
Nhà XB: Wikipedia
Năm: 2021
[14] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li en L. Fei-Fei, „ImageNet: A large-scale hierarchical image database,” 2009 IEEE Conference on Computer Vision and Pattern Recognition, Vols. %1 van %2 20-25 June 2009, 2009 Sách, tạp chí
Tiêu đề: ImageNet: A large-scale hierarchical image database
Tác giả: J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei
Nhà XB: 2009 IEEE Conference on Computer Vision and Pattern Recognition
Năm: 2009
[15] D. Shah, „v7labs,” 26 1 2023. [Online]. Available: https://www.v7labs.com/blog/cross-entropy-loss-guide. [Geopend 21 5 2023] Sách, tạp chí
Tiêu đề: v7labs
Tác giả: D. Shah
Năm: 2023
[18] G. H. A. K. I. S. R. S. Nitish Srivastava, „Dropout: A Simple Way to Prevent Neural Networks from,” Journal of Machine Learning Research , nr. Journal of Machine Learning Research 15 (2014) , pp. 1929-1958, 2014 Sách, tạp chí
Tiêu đề: Dropout: A Simple Way to Prevent Neural Networks from
Tác giả: G. H. A. K. I. S. R. S. Nitish Srivastava
Nhà XB: Journal of Machine Learning Research
Năm: 2014
[19] M. Lotfinejad, „Dataquest,” 11 10 2022. [Online]. Available: https://www.dataquest.io/blog/regularization-in-machine-learning/. [Geopend 24 5 2023] Sách, tạp chí
Tiêu đề: Dataquest
Tác giả: M. Lotfinejad
Năm: 2022
[20] Wikipedia, „Wikipedia,” 1 6 2020. [Online]. Available: https://en.wikipedia.org/wiki/MATLAB. [Geopend 24 5 2023] Sách, tạp chí
Tiêu đề: Wikipedia
Năm: 2020
[16] A. Kumar, „vitalflux,” 3 4 2023. [Online]. Available: https://vitalflux.com/mean- squared-error-vs-cross-entropy-loss-function/. [Geopend 22 5 2023] Link
[17] M. u. Hassan, „Neurohive,” 29 10 2018. [Online]. Available: https://neurohive.io/en/popular-networks/alexnet-imagenet-classification-with-deep-convolutional-neural-networks/. [Geopend 23 5 2023] Link

TÀI LIỆU CÙNG NGƯỜI DÙNG

TÀI LIỆU LIÊN QUAN

🧩 Sản phẩm bạn có thể quan tâm

w