[Figure 4.12 panels: N = 2, N = 3, N = 4]
Figure 4.12: Examples of movie frames with computed gist descriptors. Gist descriptors are colour coded for spatial scales.
[Figure 4.13 panels: examples from scene category 1; examples from scene category 2]
Figure 4.13: Example scenes belonging to 2 different clusters. The unsupervised clustering method has allocated scenes to two different and semantically meaningful clusters.
had either nature scenes (mountains, forests, landscapes, etc.) or manmade scenes (tall buildings, railway stations, houses, etc.), as shown in Figure 4.13. Secondly, the employed gist descriptor method uses the spatial frequency signature to quantify scene gist, thus resulting in a much simpler final gist vector. In fact, Oliva and Torralba (2001) also defined very few scene categories (mountain, beach, forest, highway, street, and indoor) over a much more variable scene database.
To assess the quality of the clusters found, we used an isolation distance metric (Schmitzer-Torbert et al., 2005). The isolation distance is shown for different clusters in Figures 4.14 and 4.15. The metric measures the separation between clusters by computing the Mahalanobis distance of the Kth closest point outside the cluster, where K is the total number of points inside the cluster. Thus, a larger value for any given cluster implies that it is more isolated from its neighbouring clusters.
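The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in this work: the function name `isolation_distance` and its arguments are hypothetical, the distance is measured from the cluster mean using the cluster's own covariance, and the squared Mahalanobis distance is returned (take the square root if the plain distance is wanted). The case where fewer than K points lie outside the cluster is not described in the text; returning NaN there is an assumption of this sketch.

```python
import numpy as np

def isolation_distance(points, labels, cluster_id):
    """Squared Mahalanobis distance of the K-th closest outside point.

    points: (n_samples, n_dims) array; labels: (n_samples,) array.
    K is the number of points that belong to `cluster_id`.
    """
    inside = points[labels == cluster_id]
    outside = points[labels != cluster_id]
    k = len(inside)
    # Assumption of this sketch: undefined when too few outside points exist.
    if k < 2 or len(outside) < k:
        return float("nan")
    mean = inside.mean(axis=0)
    cov = np.atleast_2d(np.cov(inside, rowvar=False))
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse guards against singular covariance
    diff = outside - mean
    # Squared Mahalanobis distance of every outside point to the cluster.
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return float(np.sort(d2)[k - 1])
```

With this convention, moving all other clusters further away from a given cluster increases its isolation distance, which matches the interpretation given above.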
[Figure 4.14 panels: pairwise scatter plots of the eight reduced gist dimensions (dim-1 vs dim-2 through dim-7 vs dim-8), with points labelled Cluster1 and Cluster2. Simulation 927, regions: 2x2. Cluster space of training frames: Cluster1 (936 frames), Cluster2 (208 frames). Isolation distance: [47.14, 16.68].]
Figure 4.14: Example of clusters found using the reduced dimension of the gist descriptor for the training frames. The example is shown for simulation 927, for which we found two scene categories (labelled using blue and red colours).
[Figure 4.15 panels: pairwise scatter plots of the eight reduced gist dimensions (dim-1 vs dim-2 through dim-7 vs dim-8), with points labelled Cluster1, Cluster2 and Cluster3. Simulation 930, regions: 2x2. Cluster space of training frames: Cluster1 (155 frames), Cluster2 (495 frames), Cluster3 (494 frames). Isolation distance: [14.91, 13.93, 26.60].]
Figure 4.15: Additional examples of clusters found by the algorithm. These examples
4.1.3.4 Categorical Fixation Maps
Following the clustering of 1150 scenes (each represented by its first frame) into scene categories, we computed categorical fixation maps. In total, three types of categorical fixation maps were computed: a map with the centre bias intact, an average map, and a map without the centre bias.
The map with the centre bias intact was built using early fixations in the training data. For each scene category, we aggregated the early fixations from all subjects and over all the scenes in that category on a blank 2D map. We then convolved this map with a 2D Gaussian kernel. The standard deviation (σ) of the kernel was set to 2° of visual angle (≈ 40 x 40 pixels), corresponding to the high acuity foveal zone.
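The aggregation and smoothing steps can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the original implementation: the function names, the (row, col) pixel format of the fixations, and the truncation of the kernel at three standard deviations are all hypothetical choices. The default `sigma=40.0` pixels follows the value quoted above.

```python
import numpy as np

def gaussian_kernel_1d(sigma, truncate=3.0):
    """Normalised 1D Gaussian, truncated at `truncate` standard deviations."""
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()

def fixation_map(fixations, shape, sigma=40.0):
    """Accumulate fixations on a blank map, then smooth with a 2D Gaussian.

    fixations: iterable of (row, col) pixel coordinates.
    shape: (height, width); each side is assumed to be at least as long
    as the kernel (about 6 * sigma + 1 pixels).
    """
    counts = np.zeros(shape, dtype=float)
    for r, c in fixations:
        r, c = int(round(r)), int(round(c))
        if 0 <= r < shape[0] and 0 <= c < shape[1]:  # ignore off-screen fixations
            counts[r, c] += 1.0
    kernel = gaussian_kernel_1d(sigma)
    # A 2D Gaussian is separable: convolve along rows, then along columns.
    smoothed = np.apply_along_axis(np.convolve, 0, counts, kernel, mode="same")
    smoothed = np.apply_along_axis(np.convolve, 1, smoothed, kernel, mode="same")
    return smoothed
```

Calling `fixation_map` once per scene category, with the early fixations pooled over subjects and scenes of that category, would give one centre-bias-intact map per category.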
An average fixation map was computed in a similar fashion, except that early fixations were aggregated across scene categories, thus yielding one average map per simulation and per choice of N. A third type of fixation map, with no centre bias, was computed to illustrate differences in the fixation pattern for each scene category. The centre bias was removed by subtracting the average fixation map from the centre-bias-intact map of each scene category (O'Connell and Walther, 2012).
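The subtraction step can be illustrated with a short sketch. The names `normalise` and `remove_centre_bias` are hypothetical, and normalising every map to unit sum before subtracting is an assumption made here so that maps built from different numbers of fixations are comparable; the text does not state how the maps were scaled.

```python
import numpy as np

def normalise(fix_map):
    """Scale a map to unit sum (assumption of this sketch); leave empty maps as-is."""
    total = fix_map.sum()
    return fix_map / total if total > 0 else fix_map

def remove_centre_bias(categorical_maps, average_map):
    """Subtract the average map from each category's centre-bias-intact map.

    categorical_maps: dict mapping category label -> 2D array.
    average_map: 2D array built from fixations pooled over all categories.
    """
    avg = normalise(average_map)
    return {cat: normalise(m) - avg for cat, m in categorical_maps.items()}
```

Positive values in the result mark locations fixated more often in that category than on average, and negative values mark locations fixated less often.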
Figure 4.16 shows examples of all three types of categorical fixation maps for two simulations and different choices of N. As is evident, the maps with no centre bias exhibit distinct fixation patterns for different scene categories. This encourages further investigation of the idea that saliency maps, modulated with fixation maps appropriate to the scene category, would yield a better prediction of human visual attention. It is important to mention that a few frames in each simulation were assigned to a noise cluster by the clustering algorithm (Harris et al., 2000). Thus, categorical fixation maps were computed from fewer than 1150 frames in total, as is also evident from the numbers listed over the average fixation maps in Figure 4.16.
4.1.3.5 Scene Classification in Test Data
All scenes (1150 in total) of the test data were classified into one of the scene categories obtained from the training data. The classification process was carried out as follows. For each learned scene category, we computed a cluster centroid in gist descriptor space. These centroids represented the centre of gravity of each scene category. Subsequently, to classify a scene in the test data, we first computed the Euclidean distance from the scene's gist descriptor (computed from its first frame) to each centroid. The test scene was then labelled with the scene category at the shortest distance. Figure 4.17 shows classification results for four simulations.
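The classification procedure amounts to a nearest-centroid rule, sketched below. The function names and the optional `noise_label` argument (used to skip training frames assigned to a noise cluster) are hypothetical; the gist descriptors are assumed to be stored as rows of a NumPy array.

```python
import numpy as np

def category_centroids(train_gist, train_labels, noise_label=None):
    """Centre of gravity of each learned scene category in gist space."""
    categories = [c for c in np.unique(train_labels)
                  if noise_label is None or c != noise_label]
    centroids = np.stack([train_gist[train_labels == c].mean(axis=0)
                          for c in categories])
    return np.array(categories), centroids

def classify_scene(gist_vector, categories, centroids):
    """Label a test scene with the category whose centroid is nearest."""
    distances = np.linalg.norm(centroids - gist_vector, axis=1)  # Euclidean
    return categories[np.argmin(distances)]
```

Because every test scene is assigned to its nearest centroid, no test scene is left unclassified, even though some training frames may have been treated as noise.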
4.1.3.6 Control Conditions for Gist Modulation
We used two control conditions for the gist-dependent modulation of saliency maps. In the first condition, we modulated the saliency maps of a test scene with the fixation map from a different scene category. For example, in the two-category case, a test scene classified into scene category 1 would be modulated by the fixation map from scene category 2. We termed this the gist scrambled condition. In the second condition, we modulated the saliency maps of a test scene using the average fixation map and termed this the average condition. These two control conditions enabled us to verify that improvements in the model's predictions after the integration of scene gist were not entirely due to inclusion of the centre bias.
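The choice of modulating map under each condition can be summarised in a small sketch. The function name, the condition strings, and the random choice among the remaining categories when there are more than two are assumptions of this illustration; the text only specifies the two-category case. The modulation of the saliency map itself is not shown.

```python
import random

def modulation_map(condition, scene_category, categorical_maps, average_map, rng=None):
    """Return the fixation map used to modulate a test scene's saliency map.

    condition: "gist", "gist_scrambled" or "average" (hypothetical labels).
    categorical_maps: dict mapping category label -> fixation map.
    """
    if condition == "gist":
        return categorical_maps[scene_category]
    if condition == "gist_scrambled":
        # Use a map from any category other than the scene's own.
        others = sorted(c for c in categorical_maps if c != scene_category)
        chooser = rng if rng is not None else random
        return categorical_maps[chooser.choice(others)]
    if condition == "average":
        return average_map
    raise ValueError(f"unknown condition: {condition}")
```

With two categories, the gist scrambled condition is deterministic: a scene in category 1 always receives the map of category 2, as in the example above.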
[Figure 4.16 (image not recovered). Panel labels: categorical fixation maps; no centre bias. Simulations 930 and 361, each with regions 2x2, 3x3 and 4x4. Frame counts per category, with the total listed over the average map in parentheses: 155, 495, 494 (1144); 288, 634, 224 (1146); 644, 498 (1142); 104, 187, 116, 741 (1148); 678, 466 (1144); 653, 494 (1147). Colour scale: from no fixation to more fixations.]
[Figure 4.17 panels: histograms of frame count (0 to 1500) against class label for regions 2x2, 3x3 and 4x4; simulations 361 and 930 are labelled.]
Figure 4.17: A histogram of test scene classification for four different simulations. A given test scene was classified into one of the learned scene categories using the minimum Euclidean distance method.