Hanoi University of Science and Technology
School of Information and Communication Technology
Master Thesis in Data Science
Unified Deep Neural Networks for Anatomical Site
Classification and Lesion Segmentation for Upper
Author's Declaration
I hereby declare that I am the sole author of this thesis. The results in this work
are not complete copies of any other works.
STUDENT
Nguyen Duy Manh
1.4 Outline of the thesis
2 Artificial Intelligence and Machine Learning
2.1 Basic concepts
2.2.1 Supervised learning
2.2.2 Unsupervised learning
2.2.3 Reinforcement learning
2.3 Techniques
Convolutional Neural Network
References
3.9 Overview comparison between FPN and FaPN [15]
3.10 Feature alignment module [15]
3.11 Feature selection module [15]
4.1 Demonstration of upper GI
4.2 Some samples in anatomical dataset
4.3 Some samples in lesion dataset
4.4 Some samples in HP dataset
4.7 EndoUNet - Confusion matrix on anatomical site classification task
4.8 SFMNet - Confusion matrix on anatomical site classification task on
2.3.2.3 Motivation
2.3.2.4 Activation function
2.3.5.1 The Transformer
2.3.5.2 Transformers for Vision
Squeeze and excitation module
3.2.5 Feature-aligned pyramid network
3.2.6 Classifiers
Metrics and loss functions
Abstract
Image processing is a subfield of computer vision concerned with comprehending and extracting data from digital images. There are several applications for image processing in various fields, including face recognition, optical character recognition, [...]
[...] main orientations: (1) as a computer-aided diagnosis to help physicians reach an efficient and early diagnosis, with better harmonization and fewer contradictory diagnoses; (2) to enhance the medical care of patients with better-personalized therapies; and (3) to improve human wellbeing, for example by analyzing the spread [...]
[...] multiple simultaneous tasks pertaining to the upper gastrointestinal (GI) tract. On a dataset of 11469 endoscopic images, the models were evaluated and produced relatively positive results.
Reinforcement Learning
Detailed settings of MiT-B2 and MiT-B3
Number of images in each anatomical site and lighting mode
Accuracy comparison on the three classification tasks
Dice Score comparison on the segmentation task
Number of parameters and speed of models
Reinforcement learning components
Illustration of a deep learning model [2]
Architecture of a CNN
Sparse connectivity, viewed from above [Đ]
Common activation functions [5]
Architecture of an FCN [6]
Architecture of VGG16 [J]
Architecture of EndoUNet
VGG19-based shared block
ResNet50-based shared block
DenseNet121-based shared block
EndoUNet decoder configuration
SFMNet architecture
Grouped compact generalized non-local (CGNL) module [13]