
Person re-identification in a surveillance camera network




MINISTRY OF EDUCATION AND TRAINING

UNIVERSITY OF SCIENCE AND TECHNOLOGY

NGUYEN THUY BINH


This study is completed at:

Hanoi University of Science and Technology

Supervisors:

1. Assoc. Prof. Pham Ngoc Nam

2. Assoc. Prof. Le Thi Lan

Reviewer 1: Assoc. Prof. Tran Duc Tan

Reviewer 2: Assoc. Prof. Le Nhat Thang

Reviewer 3: Assoc. Prof. Ngo Quoc Tao

This dissertation will be defended before the approval committee at Hanoi University of Science and Technology at 9:00 on 08 January 2021.

This dissertation can be found at:

1. Ta Quang Buu Library - Hanoi University of Science and Technology

2. Vietnam National Library


Motivation

The development of image processing and pattern recognition allows building an automatic video analysis system. Such a system contains four crucial steps: person detection, tracking, person re-identification, and recognition. Person re-identification (ReID) is defined as the problem of associating images or image sequences of a pedestrian when he or she moves in a non-overlapping camera network [7]. Although some important milestones have been achieved, person ReID remains challenging. Person ReID is commonly divided into single-shot and multi-shot approaches. In the single-shot approach, each person has only one image in both the gallery and probe sets. Inversely, each person has multiple images in the multi-shot approach. Note that the probe and gallery sets are captured by different cameras. The main challenges come from (1) the variation of human appearance across view-points, poses, etc., (2) the large number of images for each person in a camera view and the large number of persons, and (3) the effect of human detection and tracking results.

Objective

The thesis has three main objectives as follows:


Fusion schemes are proposed for both settings of multi-shot person ReID. Besides equal weights, feature weights are adaptively determined for each query based on the query characteristics.


Chapter 4 presents a fully automated person ReID system including human detection, tracking, and person ReID. The effect of the human detection and segmentation steps on the overall ReID performance is evaluated.

1.1 Datasets and evaluation metrics

Table 1.1 Benchmark datasets used in the thesis

Datasets      Time  #ID  #Cam  #Images  Label  Full frames  Resolution  Single-shot  Multiple-shot  Setting
VIPeR         2007  632  2     1,264    hand                128x48      X                           2
CAVIAR4REID   2011  72   2     1,220    hand                vary                     X              1
PRID-2011     2011  934  2     24,541   hand   +            128x65      X            X              2
iLIDS-VID     2016  300  2     42,495   hand                vary                     X              2

Five benchmark datasets, VIPeR, CAVIAR4REID, RAiD, PRID-2011, and iLIDS-VID, are used for performance evaluation of the proposed methods in this thesis. Among these five datasets, CAVIAR4REID and RAiD are set up following the first setting, while the other three follow the second setting.
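Results throughout the thesis are reported as CMC matching rates at rank k. For reference, below is a minimal sketch of how a CMC curve can be computed from a probe-gallery distance matrix; the function name and signature are illustrative, not from the thesis.

```python
import numpy as np

def cmc_curve(dist, probe_ids, gallery_ids, max_rank=20):
    """Matching rate at rank k: the fraction of probes whose true
    identity appears among the k nearest gallery entries."""
    order = np.argsort(dist, axis=1)            # gallery sorted by distance
    gallery_ids = np.asarray(gallery_ids)
    cmc = np.zeros(max_rank)
    for i, pid in enumerate(probe_ids):
        ranked = gallery_ids[order[i]][:max_rank]
        hit = np.flatnonzero(ranked == pid)
        if hit.size:                            # correct id found in top-k
            cmc[hit[0]:] += 1
    return cmc / len(probe_ids)
```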


Metric learning methods aim to find a sub-space on which the projected feature vectors satisfy the above-mentioned conditions.

1.4 Fusion schemes for person ReID

Fusion schemes combine the similarity scores of individual features through a fusion function to get the final score.

1.5 Representative frame selection

Most existing works use all frames in a sequence for person representation [6, 16, 24].

1.6 Fully automated person ReID systems

A fully automated person ReID system has three main phases: human detection, tracking, and person ReID.

CHAPTER 2. REPRESENTATIVE FRAMES SELECTION AND TEMPORAL FEATURE POOLING


The proposed framework consists of four main steps: representative frame selection, image-level feature extraction, temporal feature pooling, and person matching. The first step aims to determine the representative frames used for person representation. Three strategies of representative frame selection are introduced in this work: four key frames, a walking cycle, and all images. Once the frames used for person representation are determined, Gaussian of Gaussian (GOG) descriptors [18] are extracted from them. The Cross-view Quadratic Discriminant Analysis (XQDA) [14] technique is performed at the final step to compute the matched individuals for each given probe person.

Figure 2.1 The proposed framework consists of four main steps: representative image selection, image-level feature extraction, temporal feature pooling, and person matching.

Algorithm 2.2 is conducted online in the test phase.

Firstly, a representative walking cycle is chosen from the set of walking cycles of a person along the moving path, based on the Flow Energy Profile (FEP) [21]. Secondly, four key frames are extracted from this walking cycle: two frames corresponding to the local minimum and maximum points of the FEP, and two middle frames lying between the max- and min-frames.
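The sketch below shows one plausible reading of this selection rule, assuming the FEP values of a single walking cycle are given; the wrap-around choice for the second middle frame is an assumption, since the summary does not spell it out.

```python
import numpy as np

def four_key_frames(fep):
    """Pick four key frames from one walking cycle using its Flow
    Energy Profile: the min- and max-energy frames plus the two
    frames halfway between them on either side of the cycle."""
    fep = np.asarray(fep, dtype=float)
    i_min, i_max = int(np.argmin(fep)), int(np.argmax(fep))
    lo, hi = sorted((i_min, i_max))
    mid_in = (lo + hi) // 2                       # midpoint inside [lo, hi]
    # midpoint on the complementary arc of the cycle (wraps around)
    mid_out = (hi + (len(fep) - (hi - lo)) // 2) % len(fep)
    return sorted({i_min, i_max, mid_in, mid_out})
```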


Algorithm 2.1: Algorithm for training phase (off-line process).

Input: Image sequences on cross-view cameras: X = {X_i}, i = 1, ..., N_tr; Z = {Z_j}, j = 1, ..., N_tr, where N_tr is the number of persons used for training.

Output: Model parameters: W, M


Algorithm 2.2: Algorithm for test phase (on-line process).

Input: A query person Q_i; the gallery set; the parameters of the trained model: W, M

The three frame-selection schemes are compared in terms of accuracy, computational time, as well as occupied memory. Three pooling strategies, min-, average-, and max-pooling across all video frames, are applied in this work.
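A minimal sketch of the three pooling strategies applied to a T x D matrix of image-level features (one row per frame); the function name is illustrative.

```python
import numpy as np

def temporal_pool(frame_features, mode="max"):
    """Pool a sequence of image-level feature vectors (T x D) into a
    single sequence-level vector (D,) using min-, average-, or
    max-pooling across all frames."""
    F = np.asarray(frame_features, dtype=float)
    if mode == "min":
        return F.min(axis=0)
    if mode == "avg":
        return F.mean(axis=0)
    if mode == "max":
        return F.max(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")
```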


2.2.5 Person matching

The XQDA technique is an extended version of the Bayesian face and Keep It Simple and Straightforward MEtric (KISSME) [11] algorithms, in which the multi-class classification problem is cast as distinguishing intra-person from inter-person differences, which keeps the computational time low.
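Once W and M have been learned off-line (Algorithm 2.1), matching a probe against a gallery entry reduces to a Mahalanobis-like distance in the projected subspace. A minimal sketch, assuming the standard XQDA scoring form d(x, z) = (W^T x - W^T z)^T M (W^T x - W^T z):

```python
import numpy as np

def xqda_distance(x, z, W, M):
    """Distance between probe feature x and gallery feature z after
    projection by W, measured with the learned metric M."""
    diff = W.T @ (np.asarray(x) - np.asarray(z))   # difference in subspace
    return float(diff @ M @ diff)
```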

Table 2.1 shows the comparison of the three schemes in terms of person ReID accuracy, computational time, and memory requirement on the PRID-2011 dataset. The reported memory footprints of the four key frames, one walking cycle, and all frames schemes are 96 KB, 312 KB, and 2,400 KB, respectively.


The accuracies at rank-5 of using four key frames, one walking cycle, and all frames are 94.70%, 94.99%, and 97.98%, respectively, while those at rank-10 are 97.93%, 97.92%, and 99.55%.

Table 2.1 Comparison of the three frame-selection schemes in terms of accuracy at rank-1, computational time, and memory requirement on the PRID 2011 dataset

Methods          Accuracy at rank-1  Frame selection (s)  Feature extraction (s)  Feature pooling (s)  Person matching (s)  Total time (s)  Memory
Four key frames  77.19               7.500                3.960                   0.024                0.004                11.488          96 KB
Walking cycle    79.10               7.500                12.868                  0.084                0.004                20.452          312 KB
All frames       90.56               0.000                98.988                  1.931                0.004                100.919         2,400 KB

Table 2.2 shows the comparison between the results obtained by the proposed framework and state-of-the-art methods; the two best results are in bold. The table shows that the proposed method outperforms all state-of-the-art methods at rank-1, even in comparison with deep learning-based methods.


Table 2.2 Comparison between the proposed method and existing works on the PRID 2011 and iLIDS-VID datasets. Two best results are in bold.

                                      PRID 2011                  iLIDS-VID
Matching rates (%)            Rank=1  Rank=5  Rank=20    Rank=1  Rank=5  Rank=20
STFV3D + KISSME, ICCV 2015    64.1    87.3    92.0       44.3    71.7    91.7

The proposed method obtains competitive results in comparison with the STFV3D method [16]. Considering the three frame-selection schemes, using all frames yields the highest accuracy but requires high-cost computation and a large amount of memory, and this becomes a serious problem when dealing with a large-scale dataset.

2.4 Conclusions and Future work


This work is based on the assumption that a person stays in the camera's field of view for a certain time duration. In reality, this assumption does not always hold. In the future, the proposed method will be extended to handle this case.

CHAPTER 3. PERSON RE-IDENTIFICATION BASED ON FUSION SCHEMES

3.1 Introduction

This chapter shows that person ReID accuracy can still be improved through fusion schemes. Both kinds of features, hand-designed and deep-learned, are used for image representation. For hand-designed features, GOG [18] and Kernel Descriptor (KDES) [1] are considered, while for deep-learned features two of the strongest convolutional neural networks, GoogLeNet and Residual Neural Network (ResNet), are employed. Multi-shot person ReID can be divided further into two sub-categories: image-to-images and images-to-images.
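For context before the fusion strategies below: a common form of early fusion, and presumably the one sketched in Figure 3.1, is to normalize each feature vector and concatenate them before a classifier such as SVM. The L2 normalization here is an assumption, not the thesis's documented recipe.

```python
import numpy as np

def early_fusion(feature_vectors):
    """Early fusion sketch: L2-normalize each feature vector
    (e.g. GOG, KDES, CNN activations) and concatenate them into
    a single image representation."""
    normed = [f / (np.linalg.norm(f) + 1e-12) for f in feature_vectors]
    return np.concatenate(normed)
```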

3.2.1.1 The proposed framework

3.2.1.2 Feature fusion strategies


Figure 3.1 Image-to-images person ReID scheme (GOG, KDES, and CNN feature extraction; SVM prediction with early fusion in the training phase; product-rule and query-adaptive late fusion before matching and ranking in the testing phase)

The fused score is computed using the two common rules, the sum-rule and the product-rule. Besides equal weights, inspired by query-adaptive fusion ideas, feature weights are determined adaptively for each query.

- Product-rule with equal weights:

$S(q, g) = \prod_{m=1}^{M} s_m(q, g)$

- Product-rule with query-adaptive weights:

$S(q, g) = \prod_{m=1}^{M} s_m(q, g)^{w_m(q)}$

where $s_m(q, g)$ is the similarity score between query $q$ and gallery entry $g$ under the $m$-th feature, and $w_m(q)$ is the query-adaptive weight of that feature.
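A small sketch of product-rule fusion under both weighting schemes, assuming the weighted variant raises each per-feature score to the power of its weight as in the formulas above (the exact weighted form in the summary is garbled, so this is an assumption):

```python
import numpy as np

def product_rule_fusion(scores, weights=None):
    """Fuse M per-feature similarity score vectors by the product rule.

    scores  : array of shape (M, num_gallery), one row per feature.
    weights : optional per-feature weights w_m; equal weights when None.
              Assumed form: fused = prod_m scores[m] ** w_m.
    """
    S = np.asarray(scores, dtype=float)
    if weights is None:
        weights = np.ones(S.shape[0])            # equal weights
    w = np.asarray(weights, dtype=float)[:, None]
    return np.prod(S ** w, axis=0)
```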

Figure 3.2 shows the proposed framework for images-to-images person ReID. In this framework, the temporal linking between images of the same person is not required, and these images are treated independently. The images-to-images problem is formulated as a fusion function over image-to-images results.

Figure 3.2 Images-to-images person ReID scheme (image-to-images person re-identification per query image, followed by late fusion based on the product rule, then matching and ranking of person IDs)

3.2.3 Obtained results on the first setting

For the first setting, two benchmark datasets, CAVIAR4REID and RAiD, are used for evaluation. The fusion schemes improve matching rates by 2% to 5% compared to those in the case of using only KDES and CNN features.

3.2.3.2 Images-to-images person ReID

By applying the product-rule strategy, image-to-images person ReID is mapped into the images-to-images one. Figure 3.5 shows the performance of images-to-images person ReID in case A of the CAVIAR4REID dataset.

Figure 3.3 Performance of the three chosen features (GOG, KDES, CNN) over 10 trials on (a) CAVIAR4REID case A (rank-1: GOG 67.47%, KDES 65.50%, CNN 62.64%), (b) CAVIAR4REID case B (GOG 82.83%, KDES 81.19%, CNN 82.89%), and (c) RAiD (GOG 84.86%, KDES 81.60%, CNN 84.79%) in the image-to-images case

Figure 3.4 Comparison of the performance of the three fusion schemes when using two or three features over 10 trials in the image-to-images case, on (a) CAVIAR4REID case A (rank-1: SDALF 37.69%; KDES-based early fusion 67.31%, product-rule 70.64%, query-adaptive 70.61%; GOG-based early fusion 72.50%, product-rule 73.58%, query-adaptive 73.61%), (b) CAVIAR4REID case B (SDALF 49.97%; KDES-based 86.97%, 88.61%, 88.17%; GOG-based 88.17%, 90.33%, 89.83%), and (c) RAiD (SDALF 59.63%; KDES-based 86.85%, 87.63%, 87.27%; GOG-based 89.29%, 88.46%, 88.98%)

Figure 3.5 CMC curves in case A of images-to-images person ReID on the CAVIAR4REID dataset (rank-1: SDALF 67.50%; GOG+SVM 91.53%; KDES+SVM 91.39%; CNN+SVM 88.06%; early fusion 94.44%; product-rule 93.89%; query-adaptive 94.31%)


Table 3.2 Comparison of images-to-images and image-to-images schemes at rank-1 (*)

Figure 3.6 The proposed method for video-based person ReID, combining the fusion scheme (GOG and ResNet features, query-adaptive late fusion) with the metric learning technique

The fusion of GOG and ResNet features provides higher performance than the GOG descriptor alone, and matching rates at rank-1 are improved by 13.1%, 13.68%, and 14.13%. This can be explained by the fact that a deeper structure such as ResNet can learn the complex background and extract useful information for person representation. The above-mentioned experimental results are compared with several existing works in Table 3.3.

Figure: CMC curves of GOG, ResNet, and their product-rule fusion with a) four key frames, b) frames within a walking cycle, c) all frames

Table 3.3 Comparison between the proposed method and existing works on the PRID 2011 and iLIDS-VID datasets. Two best results are in bold.

Trang 20

The best matching rates at rank-1 of this method are 93.3% and 82.0%, higher by 1.8% and 0.2% than those of the proposed framework on PRID-2011 and iLIDS-VID, respectively; however, the CFFM method has to incorporate both CNN and RNN combined with multiple attention networks.

3.4 Conclusions

This chapter proposes several feature fusion schemes for person ReID in both settings.

CHAPTER 4. QUANTITATIVE EVALUATION OF AN END-TO-END PERSON REID PIPELINE

In a fully automated pipeline, pedestrians are first detected in every video frame. Then, person bounding boxes within a camera field of view (FoV) are connected through the person tracking step. Finally, person ReID associates images of the same person when he/she moves from one camera FoV to the others. It is worth noting that in some surveillance systems, person segmentation and person detection are coupled.
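The chain of steps can be pictured as glue code of roughly the following shape. Everything here is illustrative: detect, track, and rank stand in for concrete components (e.g. an ACF or YOLO detector and an XQDA-based matcher), and frames are assumed to be numpy-style arrays.

```python
def reid_pipeline(frames, detect, track, rank, gallery):
    """Sketch of the detection -> tracking -> ReID chain.
    detect(frame) yields (x, y, w, h) person boxes, track(box, t)
    returns a track id for frame index t, and rank(sequence, gallery)
    ranks gallery identities for one tracklet."""
    tracklets = {}
    for t, frame in enumerate(frames):
        for (x, y, w, h) in detect(frame):
            tid = track((x, y, w, h), t)
            # crop the detected person and append it to its tracklet
            tracklets.setdefault(tid, []).append(frame[y:y + h, x:x + w])
    # rank gallery identities for every tracklet
    return {tid: rank(seq, gallery) for tid, seq in tracklets.items()}
```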


Figure 4.1 The proposed framework for a fully automatic person ReID system

4.2.1 Pedestrian detection

Concerning person detection, three state-of-the-art techniques are considered: Aggregate Channel Features (ACF) [3], You Only Look Once (YOLO) [20], and Mask R-CNN [10]. For person segmentation, the Pedparsing [17] method is used thanks to its effectiveness. In order to bring person ReID to practical applications, the GOG descriptor is implemented in C++ and the optimal parameters of this descriptor are selected through intensive experiments. The experimental results show that the proposed approach allows a significant reduction of the feature extraction time.

Figure: CMC curves of the automatic scenarios (rank-1: 50.67% auto-detection; 42.14% auto-detection (ACF) + auto-segmentation; 37.87% auto-detection (YOLO) + auto-segmentation)

The ACF detector achieves better performance than the YOLO one in both cases (without/with segmentation). In addition, with the proposed method, the auto-detection + segmentation scenario reaches a rank-1 matching rate of 88.76%.

Figure 4.3 CMC curves of the three evaluated scenarios on the PRID 2011 dataset when applying the proposed method of Chapter 2

Figure 4.3 indicates the matching rates on the PRID 2011 dataset when employing the GOG descriptor.
