SCHOOL OF ELECTRONICS AND TELECOMMUNICATIONS
INTERNSHIP REPORT
Topic:
PERSON RE-IDENTIFICATION
Instructor: Dr Vo Le Cuong
Student: Nguyen Tuan Nghia
Student ID: 20122147
Class: ET AP K57
Hanoi, 3-2017
REVIEWS OF INTERNSHIP REPORT
Student’s name: Nguyen Tuan Nghia Student ID: 20122147
Class: Electronics and Telecoms AP Course: 57
Instructor: Dr Vo Le Cuong
Critical Officer:
1. Internship report content:
2. Reviews of Critical Officer:
Hanoi, / /2017 Critical Officer (sign and write full name)
INTRODUCTION
An internship is an important phase for every undergraduate student before starting the graduation thesis. Since there are many differences between theory and practice, an internship gives students a chance to develop their knowledge and apply it to practical situations. In addition, it gives students a view of a professional working environment, in which they also learn to work in a team, to communicate and to present their work to others. These are essential skills for every engineer.
It is becoming easier for students, especially those studying electronics and telecommunications, to find an internship. As there are many high-tech companies in Viet Nam, students can choose the ones that interest them. Working in an environment they chose helps them gain experience much faster. If everything goes well, they may also have the chance to continue working for the company after graduation.
I was fortunate to be accepted for an internship under Dr Vo Le Cuong’s instruction at AICS Lab, located in room 618 of Ta Quang Buu library at Hanoi University of Science and Technology. In this report, I introduce AICS Lab in Section 1. Section 2 focuses on my research during the internship. I would like to sincerely thank Dr Cuong and all the staff of the School of Electronics and Telecommunications for helping me complete my internship. I would also like to sincerely thank Prof Hyuk-Jae Lee of the Computer Architecture & Parallel Processing Lab, Seoul National University, for allowing me to use his workstation. Without his kindness, I could not have conducted any experiments due to the lack of hardware.
ABSTRACT
Person re-identification, the process of recognizing an individual across a camera network, is a fundamental task in automated surveillance. It has received attention for years. The task is challenging due to problems such as appearance variations of an individual across different cameras and low video quality or image resolution. Many methods have been proposed to improve its accuracy. In recent years, deep learning based approaches have been shown to outperform most traditional ones for person re-identification. In this report, I introduce the concept of deep learning and propose a method to optimize a multi-shot deep learning based approach for person re-identification using Recurrent Neural Networks. I conduct extensive experiments to compare different architectures and find that the Gated Recurrent Unit achieves the highest accuracy while keeping a reasonable number of parameters.
TABLE OF CONTENTS
INTRODUCTION
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
SECTION 1: AICS LAB
1.1 General information
1.2 Projects and research areas
SECTION 2: INTERNSHIP CONTENT
2.1 Deep learning
2.1.1 The concept
2.1.2 Deep learning for person re-identification
2.2 Multi-shot deep learning methods
2.2.1 Recurrent Neural Network
2.2.2 Long Short Term Memory Network
2.2.3 Gated Recurrent Unit
2.3 Experiment
2.3.1 Caffe
2.3.2 Datasets and evaluation settings
2.3.3 Network implementations
2.3.4 Classifier
2.3.5 Result
2.4 Conclusion
SECTION 3: GRADUATION THESIS PLAN
REFERENCE
APPENDIX: IMPLEMENTATION DETAILS
LIST OF FIGURES
Figure 2.1 Problems when choosing algorithm to map input x to category y [1]
Figure 2.2 Recurrent Neural Networks with loops [18]
Figure 2.3 Unrolled recurrent neural network [18]
Figure 2.4 The repeating module in a standard RNN [18]
Figure 2.5 RNN makes use of temporal information [18]
Figure 2.6 The problem of long term dependencies [18]
Figure 2.7 The repeating module in an LSTM [18]
Figure 2.8 Forget gate f [18]
Figure 2.9 Input gate i and candidate vector C̃ [18]
Figure 2.10 Updating cell state C [18]
Figure 2.11 Output gate o and hidden output h [18]
Figure 2.12 Repeating module of Gated Recurrent Unit [18]
Figure 2.13 Experiment procedure
Figure 2.14 Data split settings for PRID-2011
Figure 2.15 LSTM with Peephole connections [15] [18]
Figure 2.16 LSTM with coupled gate [18]
Figure 2.17 Recurrent Feature Aggregation Network [9]
LIST OF TABLES
Table 2.1 Performance of different LSTM architectures (Rank-1 accuracy)
Table 2.2 Size of different models (file caffemodel)
LIST OF ABBREVIATIONS
CNN Convolutional Neural Network
GRU Gated Recurrent Unit
LBP Local Binary Pattern
LSTM Long Short Term Memory
RFA Recurrent Feature Aggregation
RNN Recurrent Neural Network
SIFT Scale-Invariant Feature Transform
SVM Support Vector Machine
SECTION 1: AICS LAB
1.1 General information
AICS Lab, located in room 618 of Ta Quang Buu library, is a laboratory of the School of Electronics and Telecommunications and belongs to the research center of Hanoi University of Science and Technology. Its research fields include IC, computer vision and camera sensors. AICS Lab has been making a positive contribution to the development of the School of Electronics and Telecommunications.
AICS Lab was founded in 2010 by Dr Vo Le Cuong with 5 members. At first, there were many difficulties, such as a shortage of facilities and equipment. However, with youthful energy and a passion for research, its members have earned many achievements and completed numerous projects. Currently, there are 5 official members and 10 trainees working in different areas. Members of AICS Lab have a good chance of working for large companies after graduation or studying abroad in developed countries.
AICS Lab provides an open working environment. The lab room has the essential equipment for working, such as desks and computers. Members can also decorate their own workspaces as they like. Members work every weekday from 8 A.M. All work is reported to Dr Cuong twice a week, in one short discussion and one long meeting. The meeting schedule is agreed between the members and Dr Cuong via email. In the long meeting, each member presents their work and receives comments on what to do the following week. Besides working, members also take part in extracurricular activities such as eating lunch together.
1.2 Projects and research areas
There are currently three projects and two research areas:
Lens defect detection (in cooperation with Haesung Vina Co., Ltd): focuses on applying an efficient algorithm and building an automated lens defect detection system.
Rolling door application (in cooperation with Kato Company): focuses on building an application that allows users to operate and control a rolling door and to protect their house from thieves.
Football player tracker (in cooperation with Vietnam Television): focuses on building an automated system that can recognize and track football players.
Image processing on FPGA: focuses on implementing image processing algorithms on FPGA for a real-time object detection system.
Person re-identification: focuses on building an efficient algorithm for the person re-identification task.
SECTION 2: INTERNSHIP CONTENT
2.1 Deep learning
2.1.1 The concept
In computer science, machine learning is a subfield that gives computers the ability to learn without being explicitly programmed. Machine learning explores the study and construction of algorithms that can learn from and make predictions on data [1]. It is employed in a wide range of computing tasks, including computer vision and pattern recognition.
A machine learning algorithm is an algorithm that is able to learn from data. Mitchell (1997) provides the definition [1]: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” In this context, the task T is defined generally and changes to fit a specific problem. For example, if we want a robot to walk, then walking is the task; if we want a robot to speak, then speaking is the task. Similar to the human learning process, in which an individual improves through experience, a machine builds up experience E by measuring its performance P while trying to solve the task T. Every attempt to solve the task helps the machine learn and gradually construct a model that fits the given data.
Machine learning depends strictly on data. Simple data structures require only simple learning algorithms. As the data becomes more complicated, an algorithm of matching complexity is needed. Trying to model complicated data with a simple algorithm causes underfitting, which results in inaccurate predictions. Deep learning was proposed to meet the needs of such difficult problems.
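The underfitting problem can be illustrated with a small sketch. It is not taken from the experiments in this report: the data (a noiseless quadratic curve) and the helper names `fit_line` and `mse` are assumptions chosen only for illustration. A straight line fitted to quadratic data leaves a large error, while the same fitting routine applied to the squared input fits the data almost exactly.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b (closed-form solution)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def mse(predictions, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical data: inputs from -3.0 to 3.0, targets follow y = x^2.
xs = [x / 2 for x in range(-6, 7)]
ys = [x ** 2 for x in xs]

# Too-simple model: a straight line in x (underfits the curve).
a, b = fit_line(xs, ys)
line_error = mse([a * x + b for x in xs], ys)

# Model of matching complexity: a line in the feature x^2.
a2, b2 = fit_line([x ** 2 for x in xs], ys)
quad_error = mse([a2 * x ** 2 + b2 for x in xs], ys)

print("line model error:     ", line_error)
print("quadratic model error:", quad_error)
```

The straight line can do no better than predicting a constant on this symmetric data, so its error stays large no matter how it is fitted.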
Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data. The term ‘deep’ describes a property of this kind of learning algorithm. In neural computing, instead of having one hidden layer, a deep feedforward network can have ten times as many, with the output of each layer feeding into subsequent layers. Each feedforward network approximates a function f that maps an input x to a category y. Overall, the whole deep network forms a complicated function that is a composition of the many sub-functions inside it.
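The idea of a network as a composition of sub-functions can be sketched as follows. This is a minimal illustration, not the model used in this report: the layer sizes, the weights, the ReLU activation and the softmax output are all assumed values chosen for the example.

```python
import math


def dense(weights, biases, inputs):
    """One fully connected layer: weights * inputs + biases."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to one."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, x):
    """Compute f(x) = f_n(...f_2(f_1(x))) by applying layers in order."""
    for index, (weights, biases) in enumerate(layers):
        x = dense(weights, biases, x)
        if index < len(layers) - 1:
            x = relu(x)  # hidden layers use a non-linearity
    return softmax(x)    # last layer yields category probabilities


# Hypothetical weights: 2 inputs -> 3 hidden -> 2 hidden -> 2 categories.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.6, -0.1, 0.2], [-0.5, 0.3, 0.7]], [0.05, 0.0]),
    ([[1.0, -1.0], [-0.8, 0.9]], [0.0, 0.0]),
]

probs = forward(layers, [1.0, 2.0])
predicted = probs.index(max(probs))
print("category probabilities:", probs)
print("predicted category y:", predicted)
```

Adding more entries to `layers` makes the network deeper without changing `forward`, which is what composition of sub-functions means in practice.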
Deep learning used to be a theoretical idea rather than a practical tool because of its computational cost. However, as hardware has recently become more powerful, it has proven very effective in practice in many fields such as computer vision. Thus, using deep learning to solve a difficult task like person re-identification is reasonable.
Figure 2.1 Problems when choosing algorithm to map input x to category y [1]
2.1.2 Deep learning for person re-identification
Person re-identification (person re-id) deals with the problem of recognizing an individual across non-overlapping cameras. When a person appears in a camera, a re-id system should be able to distinguish that person from others. The core idea of re-id is classification. Thus, having a unique set of features for each person is essential.
Traditional methods focus on building an effective algorithm to extract features based on a specific characteristic of the input images. Numerous types of features have been explored to represent persons, including global features like color and texture histograms [2, 3] and local features such as SIFT [4] and LBP [5]. These features may perform well on some datasets but poorly on others. To solve this adaptation problem, some learning-based methods have been proposed. Deep learning is one that has been acknowledged for its strong and stable performance on multiple datasets.
In person re-id, various deep learning architectures such as convolutional deep neural networks [6] and recurrent neural networks [7] have been applied. The input of a network is typically an image or video of a person, and the output is a set of learned features that can be further classified with any metric-learning method. Building a deep learning model for a dataset involves two stages. First, the model learns the features of different persons in the training data. After a number of training iterations, its performance is measured on the testing data to see how distinctive the extracted features are or, equivalently, how well those features work with classifiers. This resembles the human learning process, in which students are taught for a period of time and their score on the final exam measures how well they learned from the course.
There are also two deep learning based approaches to the re-id problem. The first is single-shot methods, in which the input and the learned features come from a single image of a person. In practical surveillance, however, persons usually appear in a video rather than in a single image. Multi-shot methods have been proposed to make full use of temporal information. In comparison with single-shot features, multi-shot features are obtained from a sequence of images, which contains human pose changes and appearance variations as well as frame-level features. The disadvantage of this approach is its expensive computation, which currently makes it impractical for real applications. Therefore, it is of great importance to explore a more effective and efficient scheme that makes full use of the richer sequence information for person re-id. Also, the rapid growth of hardware for deep learning is expected to help bring this approach to real-world use.
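The difference between a single-shot and a multi-shot feature can be sketched with a very simple aggregation. The example below is an assumption made for illustration only: it averages hypothetical frame-level feature vectors over a sequence, which is one basic way to build a sequence-level feature and not the method proposed in this report.

```python
def average_pool(frame_features):
    """Average per-frame feature vectors into one sequence-level vector."""
    count = len(frame_features)
    dim = len(frame_features[0])
    return [sum(frame[d] for frame in frame_features) / count for d in range(dim)]


# Hypothetical 3-dimensional features extracted from three frames of one person.
frames = [
    [0.2, 0.9, 0.1],
    [0.4, 0.7, 0.3],
    [0.3, 0.8, 0.2],
]

single_shot = frames[0]             # feature from one image only
multi_shot = average_pool(frames)   # feature summarizing the whole sequence

print("single-shot feature:", single_shot)
print("multi-shot feature: ", multi_shot)
```

Averaging gives the same result for any ordering of the frames, so it uses the appearance variations in a sequence but discards the order in which they occur.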
2.2 Multi-shot deep learning methods
As stated above, multi-shot features are obtained from a sequence of images. Extracted features are able to describe the whole sequence rather than each image in isolation. In other words, given a set of images, the order of which each image is fed
2.2.1 Recurrent Neural Network
Recurrent Neural Networks (RNNs) were proposed to address the above issue. They are networks with loops in them, allowing information to persist.
Figure 2.2 Recurrent Neural Networks with loops [18]
In Figure 2.2, a chunk of neural network, A, looks at some input x_t and outputs a value h_t. The loop connection allows information at time step t to be fed to the next step t+1. An RNN can be thought of as multiple copies of the same network, each passing a message to its successor. Figures 2.3 and 2.4 show what happens if we unroll the loop.
Figure 2.3 Unrolled recurrent neural network [18]
Figure 2.4 The repeating module in a standard RNN [18]
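The unrolled loop can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses the common formulation h_t = tanh(W_xh x_t + W_hh h_(t-1) + b_h), and the weight values, the sizes and the names `rnn_step` and `rnn_forward` are hypothetical choices made for the example.

```python
import math


def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One repeating module: combine the input x_t with the previous state."""
    h_t = []
    for i in range(len(b_h)):
        total = b_h[i]
        total += sum(w * x for w, x in zip(W_xh[i], x_t))
        total += sum(w * h for w, h in zip(W_hh[i], h_prev))
        h_t.append(math.tanh(total))
    return h_t


def rnn_forward(sequence, W_xh, W_hh, b_h):
    """Unroll the loop: the same weights are reused at every time step."""
    h = [0.0] * len(b_h)  # initial hidden state
    outputs = []
    for x_t in sequence:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        outputs.append(h)
    return outputs


# Hypothetical weights for 2-dimensional inputs and a 2-dimensional state.
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [-0.4, 0.3]]
b_h = [0.0, 0.1]

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs = rnn_forward(sequence, W_xh, W_hh, b_h)
reversed_outputs = rnn_forward(sequence[::-1], W_xh, W_hh, b_h)

print("final state, original order:", outputs[-1])
print("final state, reversed order:", reversed_outputs[-1])
```

Because each state depends on the previous one, feeding the same inputs in a different order produces a different final state.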
There has been incredible success in applying RNNs to a variety of problems, including person re-id. McLaughlin et al. achieved a rank-1 accuracy of 70% [7] on the PRID-2011 dataset by combining RNNs with a CNN in their proposed model.
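Rank-1 accuracy is commonly computed as the fraction of probe samples whose closest gallery sample belongs to the same person. The sketch below illustrates that idea only: the feature vectors are made up, Euclidean distance is an assumed choice of metric, and the function name `rank1_accuracy` is hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank1_accuracy(probes, gallery):
    """Fraction of probes whose nearest gallery entry has the same identity.

    Both arguments are lists of (person_id, feature_vector) pairs.
    """
    hits = 0
    for probe_id, probe_feature in probes:
        best_id, _ = min(
            gallery, key=lambda entry: euclidean(probe_feature, entry[1])
        )
        if best_id == probe_id:
            hits += 1
    return hits / len(probes)


# Hypothetical features: one gallery entry per person, seen by camera B.
gallery = [("A", [0.0, 0.0]), ("B", [1.0, 1.0]), ("C", [3.0, 0.0])]
# The same persons seen by camera A; person C looks similar to person B here.
probes = [("A", [0.1, -0.1]), ("B", [0.9, 1.2]), ("C", [0.8, 0.9])]

accuracy = rank1_accuracy(probes, gallery)
print("rank-1 accuracy:", accuracy)
```

In this toy data two of the three probes are matched correctly, so the printed accuracy is two thirds.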
2.2.2 Long Short Term Memory Network
RNNs connect information from previous inputs to the present one, which is extremely useful for understanding the changes of a person in a video. They are capable of remembering information over time. However, as the time gap grows, RNNs become less effective, since they start to forget relevant information and keep only redundant information [8]. To solve this long-term dependency problem, Hochreiter & Schmidhuber introduced Long Short Term Memory (LSTM) [9], a special kind of RNN.
Figure 2.5 RNN makes use of temporal information [18]
Figure 2.6 The problem of long term dependencies [18]
LSTM introduces a more complicated structure inside one chunk of network, with a new output, the cell state, which works like a memory. LSTM has the ability to remove or add information to the cell state, carefully regulated by structures called gates. Gates are composed of a sigmoid neural network layer and a pointwise multiplication operation. The outputs of the sigmoid layer are between zero and one, describing how much of each