VISUAL ATTENTION IN DYNAMIC NATURAL SCENES


[Figure panels: N = 2, N = 3, N = 4, with PC2 and PC3 gist descriptors.]

Figure 4.12: Examples of movie frames with computed gist descriptors. Gist descriptors are colour coded for spatial scales.

Examples from scene category 1

Examples from scene category 2

Figure 4.13: Example scenes belonging to 2 different clusters. The unsupervised clustering method has allocated scenes to two different and semantically meaningful clusters.

had either nature scenes (mountains, forests, landscapes, etc.) or man-made scenes (tall buildings, railway stations, houses, etc.), as shown in Figure 4.13. Secondly, the employed gist descriptor method uses the spatial frequency signature to quantify scene gist, thus resulting in a much simpler final gist vector. In fact, Oliva and Torralba (2001) also defined very few scene categories (mountain, beach, forest, highway, street, and indoor) over a much more variable scene database.

To assess the quality of the clusters found, we used an isolation distance metric (Schmitzer-Torbert et al., 2005). The isolation distance metric is shown for different clusters in Figures 4.14 and 4.15. The method essentially gives a measure of separation between clusters by computing the Mahalanobis distance of the Kth closest point outside the cluster, where K is the total number of points inside the cluster. Thus, a larger value for any given cluster implies that it is more isolated from its neighbouring clusters.
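For reference, a minimal sketch of such an isolation-distance computation is given below (Python with NumPy). The function name, the use of a pseudo-inverse for the cluster covariance, and the choice to return the unsquared distance are illustrative assumptions rather than details taken from the thesis.

```python
import numpy as np

def isolation_distance(points, labels, cluster_id):
    """Isolation distance of one cluster (after Schmitzer-Torbert et al., 2005):
    the Mahalanobis distance, from the cluster centre and using the cluster's
    own covariance, of the K-th closest point outside the cluster, where K is
    the number of points inside the cluster."""
    inside = points[labels == cluster_id]
    outside = points[labels != cluster_id]
    k = len(inside)
    if k == 0 or len(outside) < k:
        return np.nan  # not enough outside points to take the K-th neighbour

    centre = inside.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(inside, rowvar=False))  # pseudo-inverse for stability

    diff = outside - centre
    # squared Mahalanobis distance of every outside point to the cluster centre
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
    d2_kth = np.sort(d2)[k - 1]
    # Some implementations report the squared distance; the unsquared value is returned here.
    return np.sqrt(d2_kth)
```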


[Figure: pairwise scatter plots of the reduced gist-descriptor dimensions (dim-1 to dim-8) for Simulation 927, Regions: 2x2. Cluster space of training frames: Cluster 1 (936 frames), Cluster 2 (208 frames). Isolation distance: [47.14, 16.68].]

Figure 4.14: Example of clusters found using the reduced dimension of the gist descriptor for the training frames. The example is shown for simulation 927, for which we found two scene categories (labeled using blue and red colours).

[Figure: pairwise scatter plots of the reduced gist-descriptor dimensions (dim-1 to dim-8) for Simulation 930, Regions: 2x2. Cluster space of training frames: Cluster 1 (155 frames), Cluster 2 (495 frames), Cluster 3 (494 frames). Isolation distance: [14.91, 13.93, 26.60].]

Figure 4.15: Additional examples of clusters found by the algorithm. These examples ...

4.1.3.4 Categorical Fixation Maps

Following the clustering of 1150 scenes (represented by the first frame of the corresponding scenes) into scene categories, we computed categorical fixation maps. In total, three types of categorical fixation maps were computed: a map with the centre bias intact, an average map, and a map without the centre bias.

The map with the centre bias intact was built using early fixations in the training data. For each scene category, we aggregated the early fixations from all subjects and over all the scenes in that particular category on a blank 2D map. We then convolved this map with a 2D Gaussian kernel. The standard deviation (σ) of the kernel was set to 2° of visual angle (≈ 40 x 40 pixels), corresponding to the high-acuity foveal zone.

An average fixation map was computed in a similar fashion, except that early fixations were aggregated across scene categories, thus yielding one average map per simulation and per choice of N. A third type of fixation map, without the centre bias, was computed to illustrate differences in the fixation pattern for each scene category. The centre bias was removed by subtracting the average fixation map from the centre-bias-intact map of each scene category (O'Connell and Walther, 2012).
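A minimal sketch of this construction is shown below (Python with NumPy and SciPy). The exact pixel value of the Gaussian standard deviation, the normalisation step, and the function names are illustrative assumptions; only the overall procedure (accumulate fixations, smooth with a 2D Gaussian, subtract the average map) follows the description above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def categorical_fixation_map(fixations, frame_shape, sigma_px=20.0):
    """Aggregate early fixations (x, y), pooled over subjects and over all scenes
    of one category, on a blank 2D map and smooth with a 2D Gaussian kernel.
    sigma_px is an assumed pixel value standing in for the 2-degree standard
    deviation quoted in the text (roughly a 40 x 40 pixel neighbourhood)."""
    h, w = frame_shape
    fmap = np.zeros((h, w))
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < h and 0 <= xi < w:
            fmap[yi, xi] += 1.0
    fmap = gaussian_filter(fmap, sigma=sigma_px)
    return fmap / (fmap.max() + 1e-12)  # normalisation is an assumption

def remove_centre_bias(category_map, average_map):
    """Centre-bias-free map: subtract the average map (fixations pooled across
    scene categories) from the category map (cf. O'Connell and Walther, 2012)."""
    return category_map - average_map
```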

Figure 4.16 shows examples of all three types of categorical fixation maps for two simulations and different choices of N. As is evident, the maps with no centre bias exhibit distinguishing fixation patterns for the different scene categories. This encourages further investigation of the idea that saliency maps, modulated with scene-category-appropriate fixation maps, would yield a better prediction of human visual attention. It is important to mention that, in each simulation, a few frames were assigned to the noise cluster by the clustering algorithm (Harris et al., 2000). Categorical fixation maps were therefore computed from a total number of frames smaller than 1150, as is also evident from the numbers listed over the average fixation maps in Figure 4.16.

4.1.3.5 Scene Classification in Test Data

All scenes (1150 in total) of the test data were classified into one of the scene categories obtained from the training data. The classification process was carried out as follows. For each learned scene category we computed a cluster centroid in gist-descriptor space; these centroids represented the centre of gravity of each scene category. Subsequently, to classify a scene in the test data, we first computed the Euclidean distance of the scene's gist descriptor (computed from its first frame) to each centroid. The test scene was then labelled with the scene category having the shortest distance. Figure 4.17 shows classification results for four simulations.
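This procedure amounts to a nearest-centroid classifier in gist-descriptor space; a brief sketch is given below (Python with NumPy), with array and function names chosen for illustration.

```python
import numpy as np

def nearest_centroid_classify(test_gists, train_gists, train_labels):
    """Assign each test scene (one gist descriptor per scene, computed from its
    first frame) to the learned scene category whose centroid in gist-descriptor
    space is closest in Euclidean distance."""
    categories = np.unique(train_labels)
    centroids = np.stack([train_gists[train_labels == c].mean(axis=0)
                          for c in categories])
    # pairwise Euclidean distances, shape (n_test, n_categories)
    dists = np.linalg.norm(test_gists[:, None, :] - centroids[None, :, :], axis=2)
    return categories[np.argmin(dists, axis=1)]
```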

4.1.3.6 Control Conditions for Gist Modulation

We had two control conditions for the gist-dependent modulation of saliency maps. In the first condition we modulated the saliency maps of a test scene with the fixation map of a different scene category; for example, in the two-category case, a test scene classified into scene category 1 would be modulated by the fixation map of scene category 2. We termed this the Gist scrambled condition. In the second condition we modulated the saliency maps of a test scene using the average fixation map, and termed it the average condition. These two control conditions enabled us to verify that improvements in the model's prediction after the integration of scene gist were not entirely due to the inclusion of the centre bias.
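A sketch of the gist, Gist scrambled, and average conditions is given below (Python with NumPy). The pointwise multiplication used as the modulation operator and the random choice of a different category (for more than two categories) are assumptions made for illustration; the text above does not specify either detail.

```python
import numpy as np

def modulate(saliency_map, fixation_map):
    # Assumed modulation operator: pointwise multiplication, then renormalisation.
    m = saliency_map * fixation_map
    return m / (m.max() + 1e-12)

def condition_maps(saliency_map, category_maps, predicted_cat, average_map, rng=None):
    """Gist condition, Gist scrambled control, and average control.

    category_maps : dict mapping scene-category label -> categorical fixation map
    predicted_cat : category assigned to the test scene (Section 4.1.3.5)
    """
    rng = rng or np.random.default_rng()
    other = [c for c in category_maps if c != predicted_cat]
    scrambled_cat = other[rng.integers(len(other))]  # any *different* category
    return {
        "gist": modulate(saliency_map, category_maps[predicted_cat]),
        "gist_scrambled": modulate(saliency_map, category_maps[scrambled_cat]),
        "average": modulate(saliency_map, average_map),
    }
```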


[Figure 4.16: categorical fixation maps (with centre bias intact, no centre bias, and the average map) for Simulation 930 and Simulation 361 at Regions: 2x2, 3x3, and 4x4. The per-category frame counts and the total frame count used for each average map (1142-1148 frames) are listed over the maps; the colour scale runs from no fixation to more fixations.]

[Figure: per-class frame-count histograms (frame count, 0 to 1500, against class labels) for four simulations, including Simulation 361 and Simulation 930, at region configurations 2x2, 3x3, and 4x4.]

Figure 4.17: A histogram of test scene classification for four different simulations. A given test scene was classified into one of the learned scene categories using the minimum Euclidean distance method.
