Bing Zeng · Qingming Huang
Abdulmotaleb El Saddik · Hongliang Li
Advances in Multimedia Information Processing – PCM 2017
18th Pacific-Rim Conference on Multimedia
Harbin, China, September 28–29, 2017
Revised Selected Papers, Part I
Lecture Notes in Computer Science 10735
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
More information about this series at http://www.springer.com/series/7409
Bing Zeng
University of Electronic Science and Technology of China
China

Shuqiang Jiang
Chinese Academy of Sciences
Beijing, China

Xiaopeng Fan
Harbin Institute of Technology
Harbin, China
ISSN 0302-9743 ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-77379-7 ISBN 978-3-319-77380-3 (eBook)
https://doi.org/10.1007/978-3-319-77380-3
Library of Congress Control Number: 2018935899
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI
© Springer International Publishing AG, part of Springer Nature 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper
This Springer imprint is published by the registered company Springer International Publishing AG, part of Springer Nature.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
On behalf of the Organizing Committee, it is our great pleasure to welcome you to the proceedings of the 2017 Pacific-Rim Conference on Multimedia (PCM 2017). PCM serves as an international forum that brings together researchers and practitioners from academia and industry to discuss research on state-of-the-art Internet multimedia processing, multimedia services, analysis, and applications. PCM 2017 was the 18th conference in a series held annually since 2000. In 2017, PCM was held in Harbin, China.
Consistent with previous editions of PCM, we prepared a very attractive technical program with two keynote talks, one best paper candidate session, nine oral presentation sessions, two poster sessions, and six oral special sessions. Moreover, thanks to the co-organization with the IEEE CAS Beijing chapter, this year’s program featured a panel session titled “Advanced Multimedia Technology.” Social and intellectual interactions were enjoyed among students, young researchers, and leading scholars.
We received 264 submissions for regular papers this year. These submissions cover the areas of multimedia content analysis, multimedia signal processing and systems, multimedia applications and services, etc. We thank our 104 Technical Program Committee members for their efforts in reviewing papers and providing valuable feedback to the authors. From the total of 264 submissions, and based on at least two reviews per submission, the program chairs decided to accept 48 oral papers (18.2%) and 96 poster papers, i.e., an overall acceptance ratio of 54.9% for regular papers. Among the 48 oral papers, two received the Best Paper and Best Student Paper awards. Moreover, we accepted six special sessions with 35 papers.
The technical program is an important aspect but only delivers its full impact if surrounded by challenging keynotes. We are extremely pleased and grateful that two exceptional keynote speakers, Wenwu Zhu and Josep Lladós, accepted our invitation and presented interesting ideas and insights at PCM 2017. We would also like to express our sincere gratitude to all the other Organizing Committee members: the general chairs, Bing Zeng, Qingming Huang, and Abdulmotaleb El Saddik; the program chairs, Hongliang Li, Shuqiang Jiang, and Xiaopeng Fan; the panel chairs, Zhu Li and Debin Zhao; the organizing chairs, Shaohui Liu, Liang Li, and Yan Chen; the publication chairs, Shuhui Wang and Wen-Huang Cheng; the sponsorship chairs, Wangmeng Zuo, Luhong Liang, and Ke Lv; and the registration and finance chairs, Guorong Li and Weiqing Min, among others. Their outstanding effort contributed to the extremely rich and complex main program that characterizes PCM 2017. Last but not least, we thank
all the authors, session chairs, student volunteers, and supporters. Their contributions are much appreciated.
We sincerely hope that you will enjoy reading the proceedings of PCM 2017.
Qingming Huang
Abdulmotaleb El Saddik
Hongliang Li
Shuqiang Jiang
Xiaopeng Fan
Organization

Program Chairs
Hongliang Li University of Electronic Science and Technology of China
Shuqiang Jiang ICT, Chinese Academy of Sciences, China
Xiaopeng Fan Harbin Institute of Technology, China
Organizing Chairs
Shaohui Liu Harbin Institute of Technology, China
Liang Li University of Chinese Academy of Sciences, China
Yan Chen University of Electronic Science and Technology of China

Panel Chairs
Debin Zhao Harbin Institute of Technology, China
Technical Program Committee
Publication Chairs
Shuhui Wang ICT, Chinese Academy of Sciences, China
Wen-Huang Cheng Academia Sinica, Taiwan

Special Session Chairs
Feng Jiang Harbin Institute of Technology, China
Tutorial Chairs
Zheng-jun Zha Hefei Institute of Intelligent Machines,
Chinese Academy of Sciences, China
Chong-Wah Ngo City University of Hong Kong, SAR China
Publicity Chairs
Luis Herranz Computer Vision Center, Spain
Cees Snoek University of Amsterdam and Qualcomm Research, The Netherlands
Shin’ichi Satoh National Institute of Informatics, Japan
Sponsorship Chairs
Wangmeng Zuo Harbin Institute of Technology, China
Registration Chairs
Guorong Li University of Chinese Academy of Sciences, China
Shuyuan Zhu University of Electronic Science and Technology of China
Wenbin Yin Harbin Institute of Technology, China
Finance Chairs
Weiqing Min ICT, Chinese Academy of Sciences, China
Wenbin Che Harbin Institute of Technology, China
Contents – Part I

Best Paper Candidate

Deep Graph Laplacian Hashing for Image Retrieval  3
    Jiancong Ge, Xueliang Liu, Richang Hong, Jie Shao, and Meng Wang

Deep Video Dehazing  14
    Wenqi Ren and Xiaochun Cao

Image Tagging by Joint Deep Visual-Semantic Propagation  25
    Yuexin Ma, Xinge Zhu, Yujing Sun, and Bingzheng Yan

Exploiting Time and Frequency Diversities for High-Quality Linear Video Transmission: A MCast Framework  36
    Chaofan He, Huiying Wang, Yang Hu, Yan Chen, and Houqiang Li

Light Field Image Compression with Sub-apertures Reordering and Adaptive Reconstruction  47
    Chuanmin Jia, Yekang Yang, Xinfeng Zhang, Shiqi Wang, Shanshe Wang, and Siwei Ma
Video Coding

Fast QTBT Partition Algorithm for JVET Intra Coding Based on CNN  59
    Zhipeng Jin, Ping An, and Liquan Shen

A Novel Saliency Based Bit Allocation and RDO for HEVC  70
    Jiajun Xu, Qiang Peng, Bing Wang, Changbin Li, and Xiao Wu

Light Field Image Compression Scheme Based on MVD Coding Standard  79
    Xinpeng Huang, Ping An, Liquan Shen, and Kai Li

A Real-Time Multi-view AVS2 Decoder on Mobile Phone  89
    Yingfan Zhang, Zhenan Lin, Weilun Feng, Jun Sun, and Zongming Guo

Compressive Sensing Depth Video Coding via Gaussian Mixture Models and Object Edges  96
    Kang Wang, Xuguang Lan, Xiangwei Li, Meng Yang, and Nanning Zheng
Image Super-Resolution, Debluring, and Dehazing

AWCR: Adaptive and Weighted Collaborative Representations for Face Super-Resolution with Context Residual-Learning  107
    Tao Lu, Lanlan Pan, Jiaming Wang, Yanduo Zhang, Zhongyuan Wang, and Zixiang Xiong

Single Image Haze Removal Based on Global-Local Optimization for Depth Map  117
    Hongda Zhang, Yuanyuan Gao, Hai-Miao Hu, Qiang Guo, and Yukun Cui

Single Image Dehazing Using Deep Convolution Neural Networks  128
    Shengdong Zhang, Fazhi He, and Jian Yao

SPOS: Deblur Image by Using Sparsity Prior and Outlier Suppression  138
    Yiwei Zhang, Ge Li, Xiaoqiang Guo, Wenmin Wang, and Ronggang Wang

Single Image Super-Resolution Using Multi-scale Convolutional Neural Network  149
    Xiaoyi Jia, Xiangmin Xu, Bolun Cai, and Kailing Guo
Person Identity and Emotion

A Novel Image Preprocessing Strategy for Foreground Extraction in Person Re-identification  161
    Daiyin Wang, Wenbin Yao, and Yuesheng Zhu

Age Estimation via Pose-Invariant 3D Face Alignment Feature in 3 Streams of CNN  172
    Li Sun, Song Qiu, Qingli Li, Hongying Liu, and Mei Zhou

Face Alignment Using Local Probabilistic Features  184
    Qing Lu, Jun Yu, and Zengfu Wang

Multi-modal Emotion Recognition with Temporal-Band Attention Based on LSTM-RNN  194
    Jiamin Liu, Yuanqi Su, and Yuehu Liu
Multimodal Fusion of Spatial-Temporal Features for Emotion Recognition in the Wild  205
    Zuchen Wang and Yuchun Fang

A Fast and General Method for Partial Face Recognition  215
    Qianhao Wu and Zechao Li

Tracking and Action Recognition

Adaptive Correlation Filter Tracking with Weighted Foreground Representation  227
    Chunguang Qie, Hanzi Wang, Yan Yan, Guanjun Guo, and Jin Zheng

A Novel Method for Camera Pose Tracking Using Visual Complementary Filtering  238
    Xiangkai Lin and Ronggang Wang

Trajectory-Pooled 3D Convolutional Descriptors for Action Recognition  247
    Xiusheng Lu, Hongxun Yao, Xiaoshuai Sun, Shengping Zhang, and Yanhao Zhang

Temporal Interval Regression Network for Video Action Detection  258
    Qing Wang, Laiyun Qing, Jun Miao, and Lijuan Duan

Semantic Sequence Analysis for Human Activity Prediction  269
    Guolong Wang, Zheng Qin, and Kaiping Xu

Motion State Detection Based Prediction Model for Body Parts Tracking of Volleyball Players  280
    Fanglu Xie, Xina Cheng, and Takeshi Ikenaga

Detection and Classification

Adapting Generic Detector for Semi-Supervised Pedestrian Detection  293
    Shiyao Lei, Qiujia Ji, Shufeng Wang, and Si Wu

StairsNet: Mixed Multi-scale Network for Object Detection  303
    Weiyi Gao, Wenlong Cao, Jian Zhai, and Jianwu Rui

A Dual-CNN Model for Multi-label Classification by Leveraging Co-occurrence Dependencies Between Labels  315
    Peng-Fei Zhang, Hao-Yi Wu, and Xin-Shun Xu

Multi-level Semantic Representation for Flower Classification  325
    Chuang Lin, Hongxun Yao, Wei Yu, and Wenbo Tang
Multi-view Multi-label Learning via Optimal Classifier Chain  336
    Yiming Liu and Xingwei Hao

Tire X-ray Image Impurity Detection Based on Multiple Kernel Learning  346
    Shuai Zhao, Zhineng Chen, Baokui Li, and Bin Zhang

Multimedia Signal Reconstruction and Recovery

CRF-Based Reconstruction from Narrow-Baseline Image Sequences  359
    Yue Xu, Qiuyan Tao, Lianghao Wang, Dongxiao Li, and Ming Zhang

Better and Faster, when ADMM Meets CNN: Compressive-Sensed Image Reconstruction  370
    Chen Zhao, Ronggang Wang, and Wen Gao

Sparsity-Promoting Adaptive Coding with Robust Empirical Mode Decomposition for Image Restoration  380
    Rui Chen, Huizhu Jia, Xiaodong Xie, and Gao Wen

A Splicing Interpolation Method for Head-Related Transfer Function  390
    Chunling Ai, Xiaochen Wang, Yafei Wu, and Cheng Yang

Structured Convolutional Compressed Sensing Based on Deterministic Subsamplers  400
    Shu Wang, Zhongyuan Wang, and Yimin Luo

Blind Speech Deconvolution via Pretrained Polynomial Dictionary and Sparse Representation  411
    Jian Guan, Xuan Wang, Shuhan Qi, Jing Dong, and Wenwu Wang

Text and Line Detection/Recognition

Multi-lingual Scene Text Detection Based on Fully Convolutional Networks  423
    Shaohua Liu, Yan Shang, Jizhong Han, Xi Wang, Hongchao Gao, and Dongqin Liu

Cloud of Line Distribution for Arbitrary Text Detection in Scene/Video/License Plate Images  433
    Wenhai Wang, Yirui Wu, Shivakumara Palaiahnakote, Tong Lu, and Jun Liu

Affine Collaborative Representation Based Classification for In-Air Handwritten Chinese Character Recognition  444
    Jianshe Zhou, Zhaochun Xu, Jie Liu, Weiqiang Wang, and Ke Lu
Overlaid Chinese Character Recognition via a Compact CNN  453
    Hongzhu Li and Weiqiang Wang

Efficient and Robust Lane Detection Using Three-Stage Feature Extraction with Line Fitting  464
    Aming Wu and Yahong Han

Social Media

Saliency-GD: A TF-IDF Analogy for Landmark Image Mining  477
    Wei Li, Jianmin Li, and Bo Zhang

An Improved Clothing Parsing Method Emphasizing the Clothing with Complex Texture  487
    Juan Ji and Ruoyu Yang

Detection of Similar Geo-Regions Based on Visual Concepts in Social Photos  497
    Hiroki Takimoto, Magali Philippe, Yasutomo Kawanishi, Ichiro Ide, Takatsugu Hirayama, Keisuke Doman, Daisuke Deguchi, and Hiroshi Murase

Unsupervised Concept Learning in Text Subspace for Cross-Media Retrieval  505
    Mengdi Fan, Wenmin Wang, Peilei Dong, Ronggang Wang, and Ge Li

Image Stylization for Thread Art via Color Quantization and Sparse Modeling  515
    Kewei Yang, Zhengxing Sun, Shuang Wang, and Hui-Hsia Chen

Least-Squares Regulation Based Graph Embedding  526
    Si-Xing Liu, Timothy Apasiba Abeo, and Xiang-Jun Shen

SSGAN: Secure Steganography Based on Generative Adversarial Networks  534
    Haichao Shi, Jing Dong, Wei Wang, Yinlong Qian, and Xiaoyu Zhang

Generating Chinese Poems from Images Based on Neural Network  545
    Shuo Xing, Xueliang Liu, Richang Hong, and Ye Zhao

Detail-Enhancement for Dehazing Method Using Guided Image Filter and Laplacian Pyramid  555
    Dong Zhao and Long Xu
Personalized Micro-Video Recommendation via Hierarchical User Interest Modeling  564
    Lei Huang and Bin Luo

3D and Panoramic Vision

MCTD: Motion-Coordinate-Time Descriptor for 3D Skeleton-Based Action Recognition  577
    Qi Liang and Feng Wang

Dense Frame-to-Model SLAM with an RGB-D Camera  588
    Xiaodan Ye, Jianing Li, Lianghao Wang, Dongxiao Li, and Ming Zhang

Parallax-Robust Hexahedral Panoramic Video Stitching  598
    Sha Guo, Ronggang Wang, Xiubao Jiang, Zhenyu Wang, and Wen Gao

Image Formation Analysis and Light Field Information Reconstruction for Plenoptic Camera 2.0  609
    Li Liu, Xin Jin, and Qionghai Dai

Part Detection for 3D Shapes via Multi-view Rendering  619
    Youcheng Song, Zhengxing Sun, Mofei Song, and Yunjie Wu

Benchmarking Screen Content Image Quality Evaluation in Spatial Psychovisual Modulation Display System  629
    Yuanchun Chen, Guangtao Zhai, Ke Gu, Xinfeng Zhang, Weisi Lin, and Jiantao Zhou

A Fast Sample Adaptive Offset Algorithm for H.265/HEVC  641
    Yan Zhou and Zhenzhong Chen

Blind Quality Assessment for Screen Content Images by Texture Information  652
    Ning Lu and Guohui Li

Assessment of Visually Induced Motion Sickness in Immersive Videos  662
    Huiyu Duan, Guangtao Zhai, Xiongkuo Min, Yucheng Zhu, Wei Sun, and Xiaokang Yang

Hybrid Kernel-Based Template Prediction and Intra Block Copy for Light Field Image Coding  673
    Deyang Liu, Ping An, Ran Ma, Xinpeng Huang, and Liquan Shen

Asymmetric Representation for 3D Panoramic Video  683
    Guisen Xu, Yueming Wang, Zhenyu Wang, and Ronggang Wang
Deep Learning for Signal Processing and Understanding

Shallow and Deep Model Investigation for Distinguishing Corn and Weeds  693
    Yu Xia, Hongxun Yao, Xiaoshuai Sun, and Yanhao Zhang

Representing Discrimination of Video by a Motion Map  703
    Wennan Yu, Yuchao Sun, Feiwu Yu, and Xinxiao Wu

Multi-scale Discriminative Patches for Fined-Grained Visual Categorization  712
    Wenbo Tang, Hongxun Yao, Xiaoshuai Sun, and Wei Yu

Chinese Characters Recognition from Screen-Rendered Images Using Inception Deep Learning Architecture  722
    Xin Xu, Jun Zhou, Hong Zhang, and Xiaowei Fu

Visual Tracking by Deep Discriminative Map  733
    Wenyi Tang, Bin Liu, and Nenghai Yu

Hand Gesture Recognition by Using 3DCNN and LSTM with Adam Optimizer  743
    Siyu Jiang and Yimin Chen

Learning Temporal Context for Correlation Tracking with Scale Estimation  754
    Yuhao Cui, Haoqian Wang, Xingzheng Wang, and Yi Yang

Deep Combined Image Denoising with Cloud Images  764
    Sifeng Xia, Jiaying Liu, Wenhan Yang, Mading Li, and Zongming Guo

Vehicle Verification Based on Deep Siamese Network with Similarity Metric  773
    Qian Zhang, Mingtao Pei, Mei Chen, and Yunde Jia

Style Transfer with Content Preservation from Multiple Images  783
    Dilin Liu, Wei Yu, and Hongxun Yao

Task-Specific Neural Networks for Pose Estimation in Person Re-identification Task  792
    Kai Lv, Hao Sheng, Yanwei Zheng, Zhang Xiong, Wei Li, and Wei Ke

Mini Neural Networks for Effective and Efficient Mobile Album Organization  802
    Lingling Fa, Lifei Zhang, Xiangbo Shu, Yan Song, and Jinhui Tang
Sweeper: Design of the Augmented Path in Residual Networks  811
    Kang Shi and Weiqiang Wang

Large-Scale Multimedia Affective Computing

Sketch Based Model-Like Standing Style Recommendation  825
    Ying Zheng, Hongxun Yao, and Dong Wang

Joint L1 L2 Regularisation for Blind Speech Deconvolution  834
    Jian Guan, Xuan Wang, Zongxia Xie, Shuhan Qi, and Wenwu Wang

Multi-modal Emotion Recognition Based on Speech and Image  844
    Yongqiang Li, Qi He, Yongping Zhao, and Hongxun Yao

Analysis of Psychological Behavior of Undergraduates  854
    Chunchang Gao

Sensor-Enhanced Multimedia Systems

Compression Artifacts Reduction for Depth Map by Deep Intensity Guidance  863
    Pingping Zhang, Xu Wang, Yun Zhang, Lin Ma, Jianmin Jiang, and Sam Kwong

LiPS: Learning Social Relationships in Probe Space  873
    Chaoxi Li, Chengwen Luo, Junliang Chen, Hande Hong, Jianqiang Li, and Long Cheng

The Intelligent Monitoring for the Elderly Based on WiFi Signals  883
    Nan Bao, Chengyang Wu, Qiancheng Liang, Lisheng Xu, Guozhi Li, Ziyu Qi, Wanyi Zhang, He Ma, and Yan Li

Sentiment Analysis for Social Sensor  893
    Xiaoyu Zhu, Tian Gan, Xuemeng Song, and Zhumin Chen

Recovering Overlapping Partials for Monaural Perfect Harmonic Musical Sound Separation Using Modified Common Amplitude Modulation  903
    Yukai Gong, Xiangbo Shu, and Jinhui Tang
Author Index  913
Contents – Part II

Content Analysis

A Competitive Combat Strategy and Tactics in RTS Games AI and StarCraft  3
    Adil Khan, Kai Yang, Yunsheng Fu, Fang Lou, Worku Jifara, Feng Jiang, and Liu Shaohui

Indoor Scene Classification by Incorporating Predicted Depth Descriptor  13
    Yingbin Zheng, Jian Pu, Hong Wang, and Hao Ye

Multiple Thermal Face Detection in Unconstrained Environments Using Fully Convolutional Networks  24
    Yezhao Fan, Guangtao Zhai, Jia Wang, Menghan Hu, and Jing Liu

Object Proposal via Depth Connectivity Constrained Grouping  34
    Yuantian Wang, Lei Huang, Tongwei Ren, Sheng-Hua Zhong, Yan Liu, and Gangshan Wu

Edge-Aware Saliency Detection via Novel Graph Model  45
    Hanpei Yang and Weihai Li

Multiple Kernel Learning Based on Weak Learner for Automatic Image Annotation  56
    Hua Zhong, Xu Yuan, Zhikui Chen, Fangming Zhong, and Yonglin Leng

An Efficient Feature Selection for SAR Target Classification  68
    Moussa Amrani, Kai Yang, Dongyang Zhao, Xiaopeng Fan, and Feng Jiang

Fine-Art Painting Classification via Two-Channel Deep Residual Network  79
    Xingsheng Huang, Sheng-hua Zhong, and Zhijiao Xiao

Automatic Foreground Seeds Discovery for Robust Video Saliency Detection  89
    Lin Zhang, Yao Lu, and Tianfei Zhou

Semantic R-CNN for Natural Language Object Detection  98
    Shuxiong Ye, Zheng Qin, Kaiping Xu, Kai Huang, and Guolong Wang

Spatio-Temporal Context Networks for Video Question Answering  108
    Kun Gao and Yahong Han

Object Discovery and Cosegmentation Based on Dense Correspondences  119
    Yasi Wang, Hongxun Yao, Wei Yu, and Xiaoshuai Sun

Semantic Segmentation Using Fully Convolutional Networks and Random Walk with Prediction Prior  129
    Xiaoyu Lei, Yao Lu, Tingxi Liu, and Xiaoxue Shi

Multi-modality Fusion Network for Action Recognition  139
    Kai Huang, Zheng Qin, Kaiping Xu, Shuxiong Ye, and Guolong Wang

Fusing Appearance Features and Correlation Features for Face Video Retrieval  150
    Chenchen Jing, Zhen Dong, Mingtao Pei, and Yunde Jia

A Robust Image Reflection Separation Method Based on Sift-Edge Flow  161
    Shaomin Du, Xiaohui Liang, and Xiaochuan Wang

A Fine-Grained Filtered Viewpoint Informed Keypoint Prediction from 2D Images  172
    Qingnan Li, Ruimin Hu, Yixin Chen, Jingwen Yan, and Jing Xiao

More Efficient, Adaptive and Stable, A Virtual Fitting System Using Kinect  182
    Chang-Tai Xiong, Shun-Lei Tang, and Ruo-Yu Yang

Exploiting Sub-region Deep Features for Specific Action Recognition in Combat Sports Video  192
    Yongqiang Kong, Zhaoqiang Wei, Zhengang Wei, Shengke Wang, and Feng Gao

Face Anti-spoofing Based on Motion  202
    Ran Wang, Jing Xiao, Ruimin Hu, and Xu Wang

A Novel Action Recognition Scheme Based on Spatial-Temporal Pyramid Model  212
    Hengying Zhao and Xinguang Xiang

Co-saliency Detection via Sparse Reconstruction and Co-salient Object Discovery  222
    Bo Li, Zhengxing Sun, Jiagao Hu, and Junfeng Xu

Robust Local Effective Matching Model for Multi-target Tracking  233
    Hao Sheng, Li Hao, Jiahui Chen, Yang Zhang, and Wei Ke

Group Burstiness Weighting for Image Retrieval  244
    Mao Wang, Qiang Liu, Yuewei Ming, and Jianping Yin
Stereo Saliency Analysis Based on Disparity Influence and Spatial Dissimilarity  254
    Lijuan Duan, Fangfang Liang, Wei Ma, and Shuo Qiu

Object Classification of Remote Sensing Images Based on Rotation-Invariant Discrete Hashing  264
    Hui Xu, Yazhou Liu, and Quansen Sun

Robust Principal Component Analysis via Symmetric Alternating Direction for Moving Object Detection  275
    Zhenzhou Shao, Gaoyu Wu, Ying Qu, Zhiping Shi, Yong Guan, and Jindong Tan

Driver Head Analysis Based on Deeply Supervised Transfer Metric Learning with Virtual Data  286
    Keke Liu, Yazhou Liu, Quansen Sun, Sugiri Pranata, and Shengmei Shen

Joint Dictionary Learning via Split Bregman Iteration for Large-Scale Image Classification  296
    Yanyun Qu, Hanqian Li, and Yan Zhang

Multi-operator Image Retargeting with Preserving Aspect Ratio of Important Contents  306
    Qian Zhang, Zhenhua Tang, Hongbo Jiang, and Kan Chang

Human Action Recognition in Videos of Realistic Scenes Based on Multi-scale CNN Feature  316
    Yongsheng Zhou, Nan Pu, Li Qian, Song Wu, and Guoqiang Xiao

Automatic Facial Complexion Classification Based on Mixture Model  327
    Minjie Xu, Chunrong Guo, Yangyang Hu, Hong Lu, Xue Li, Fufeng Li, and Wenqiang Zhang

Spectral Context Matching for Video Object Segmentation Under Occlusion  337
    Xiaoxue Shi, Yao Lu, Tianfei Zhou, and Xiaoyu Lei

Hierarchical Tree Representation Based Face Clustering for Video Retrieval  347
    Pengyi Hao, Edwin Manhando, Cong Bai, and Yujiao Huang

Improved Key Poses Model for Skeleton-Based Action Recognition  358
    Xiaoqiang Li, Yi Zhang, and Junhui Zhang

Pic2Geom: A Fast Rendering Algorithm for Low-Poly Geometric Art  368
    Ruisheng Ng, Lai-Kuan Wong, and John See
Attention Window Aware Encoder-Decoder Model for Spoken Language Understanding  378
    Yiming Wang, Wenge Rong, Jingshuang Liu, Jingfei Han, and Zhang Xiong

A New Fast Algorithm for Sample Adaptive Offset  388
    Chentian Sun, Yang Wang, Xiaopeng Fan, and Debin Zhao

Motion-Compensated Deinterlacing Based on Scene Change Detection  397
    Xiaotao Zhu, Qian Huang, Feng Ye, Fan Liu, Shufang Xu, and Yanfang Wang

Center-Adaptive Weighted Binary K-means for Image Clustering  407
    Yinhe Lan, Zhenyu Weng, and Yuesheng Zhu

Aligned Local Descriptors and Hierarchical Global Features for Person Re-Identification  418
    Yihao Zhang, Wenmin Wang, and Jinzhuo Wang

A Novel Background Subtraction Method Based on ViBe  428
    Jian Liao, Hanzi Wang, Yan Yan, and Jin Zheng

Layout-Driven Top-Down Saliency Detection for Webpage  438
    Xixi Li, Di Liu, Kao Zhang, and Zhenzhong Chen

Saliency Detection by Superpixel-Based Sparse Representation  447
    Guangyao Chen and Zhenzhong Chen

Reading Two Digital Video Clocks for Broadcast Basketball Videos  457
    Xinguo Yu, Xiaopan Lyu, Lei Xiang, and Hon Wai Leong

Don’t Be Confused: Region Mapping Based Visual Place Recognition  467
    Dapeng Du, Na Liu, Xiangyang Xu, and Gangshan Wu

An Effective Head Detection Framework via Convolutional Neural Networks  477
    Canmiao Fu, Yule Yuan, Qiang Zeng, Siying He, and Yong Zhao

Identifying Gambling and Porn Websites with Image Recognition  488
    Longxi Li, Gaopeng Gou, Gang Xiong, Zigang Cao, and Zhen Li

Image-Set Based Collaborative Representation for Face Recognition in Videos  498
    Gaopeng Gou, Junzheng Shi, Gang Xiong, Peipei Fu, Zhen Li, and Zhenzhen Li

Vectorized Data Combination and Binary Search Oriented Reweight for CPU-GPU Based Real-Time 3D Ball Tracking  508
    Ziwei Deng, Yilin Hou, Xina Cheng, and Takeshi Ikenaga
Hot Topic Trend Prediction of Topic Based on Markov Chain and Dynamic Backtracking  517
    Feng Xu, Jue Liu, Ying He, and Yating Hou

Fast Circular Object Localization and Pose Estimation for Robotic Bin Picking  529
    Linyao Luo, Yanfei Luo, Hong Lu, Haowei Yuan, Xuehua Tang, and Wenqiang Zhang

Local Temporal Coherence for Object-Aware Keypoint Selection in Video Sequences  539
    Songlin Du and Takeshi Ikenaga

A Combined Feature Approach for Speaker Segmentation Using Convolution Neural Network  550
    Jiang Zhong, Pan Zhang, and Xue Li

DDSH: Deep Distribution-Separating Hashing for Image Retrieval  560
    Junjie Chen and Anran Wang

An Obstacle Detection Method Based on Binocular Stereovision  571
    Yihan Sun, Libo Zhang, Jiaxu Leng, Tiejian Luo, and Yanjun Wu

Coding, Compression, Transmission, and Processing

Target Depth Measurement for Machine Monocular Vision  583
    Jiafa Mao, Mingguo Zhang, Linan Zhu, Cong Bai, and Gang Xiao

Automatic Background Adjustment for Chinese Paintings Using Pigment Lines  596
    Jie Guo, Chunyou Li, and Jingui Pan

Content-Based Image Recovery  606
    Hong-Yu Zhou and Jianxin Wu

Integrating Visual Word Embeddings into Translation Language Model for Keyword Spotting on Historical Mongolian Document Images  616
    Hongxi Wei, Hui Zhang, and Guanglai Gao

The Analysis for Binaural Signal’s Characteristics of a Real Source and Corresponding Virtual Sound Image  626
    Jinshan Wang, Xiaochen Wang, Weiping Tu, Jun Chen, Tingzhao Wu, and Shanfa Ke

Primary-Ambient Extraction Based on Channel Pair for 5.1 Channel Audio Using Least Square  634
    Dingyan Song, Ge Gao, Yi Chen, and Xi Hu
Multi-scale Similarity Enhanced Guided Normal Filtering  645
    Wenbo Zhao, Xianming Liu, Shiqi Wang, and Debin Zhao

Deep Residual Convolution Neural Network for Single-Image Robust Crowd Counting  654
    Mingjie Lu and Bo Yan

An Efficient Method Using the Parameterized HRTFs for 3D Audio Real-Time Rendering on Mobile Devices  663
    Yucheng Song, Weiping Tu, Ruimin Hu, Xiaochen Wang, Wei Chen, and Cheng Yang

Efficient Logo Insertion Method for High-Resolution H.265/HEVC Compressed Video  674
    Qi Jing, Peng Xu, Jun Sun, and Zongming Guo

Image Decomposition Based Nighttime Image Enhancement  683
    Xuesong Jiang, Hongxun Yao, and Dilin Liu

PSNR Estimate for JPEG Compression  693
    Ci Wang, Ying Yang, and Jianhua Shen

Speech Intelligibility Enhancement in Strong Mechanical Noise Based on Neural Networks  702
    Feng Cheng, Xiaochen Wang, Li Gang, Weiping Tu, and Jinshan Wang

Interactive Temporal Visualization of Collaboration Networks  713
    Ming Jing, Xueqing Li, and Yupeng Hu

On the Impact of Environmental Sound on Perceived Visual Quality  723
    Wenhan Zhu, Guangtao Zhai, Wei Sun, Yi Xu, Jing Liu, Yucheng Zhu, and Xiaokang Yang

A Novel Texture Exemplars Extraction Approach Based on Patches Homogeneity and Defect Detection  735
    Hui Lai, Lulu Yin, Huisi Wu, and Zhenkun Wen

Repetitiveness Metric of Exemplar for Texture Synthesis  745
    Lulu Yin, Hui Lai, Huisi Wu, and Zhenkun Wen

Unsupervised Cross-Modal Hashing with Soft Constraint  756
    Yuxuan Zhou, Yaoxian Li, Rui Liu, Lingyun Hao, and Yuanliang Sun

Scalable Video Coding Based on the User’s View for Real-Time Virtual Reality Applications  766
    Hao Jiang, Gang He, Wenxin Yu, Zheng Wang, and Yunsong Li
Towards Visual SLAM with Memory Management for Large-Scale Environments  776
    Fu Li, Shaowu Yang, Xiaodong Yi, and Xuejun Yang

Entropy Based Sub-band Deletion for Multispectral Image Compression  787
    Worku J. Sori, Zhao Dongyang, Lou Fang, Fu Yunsheng, Liu Shaohui, Feng Jiang, and Khan Adil

Automatic Texture Exemplar Extraction Based on a Novel Textureness Metric  798
    Huisi Wu, Junrong Jiang, Ping Li, and Zhenkun Wen

In Defense of Fully Connected Layers in Visual Representation Transfer  807
    Chen-Lin Zhang, Jian-Hao Luo, Xiu-Shen Wei, and Jianxin Wu

Block Cluster Based Dictionary Learning for Image De-noising and De-blurring  818
    JianWei Zheng, Ping Yang, Shanshan Fang, and Cong Bai

Content Adaptive Constraint Based Image Upsampling  827
    Fan Yang, Huizhu Jia, Don Xie, Rui Chen, and Wen Gao

Image Quality Assessment for Video Surveillance System  838
    Jianhua Shen, Hongyan Zhang, and Ci Wang

Style Transfer Based on Style Primitive Discovery  847
    Hao Wu, Zhengxing Sun, Shuang Wang, Weihang Yuan, and Hui-Hsia Chen

Construction of Sampling Two-Channel Nonseparable Wavelet Filter Bank and Its Fusion Application for Multispectral Image Pansharpening  859
    Bin Liu, Weijie Liu, and Longxiang Xu

Data Reconstruction Based on Supervised Deep Auto-Encoder  869
    Ting Rui, Sai Zhang, Tongwei Ren, Jian Tang, and Junhua Zou

A Novel Fragile Watermarking Scheme for 2D Vector Map Authentication  880
    Guoyin Zhang, Qingan Da, Liguo Zhang, Jianguo Sun, Qilong Han, Liang Kou, and WenShan Wang

Hybrid Domain Encryption Method of Hyperspectral Remote Sensing Image  890
    Wenhao Geng, Jing Zhang, Lu Chen, Jiafeng Li, and Li Zhuo

Anomaly Detection with Passive Aggressive Online Gaussian Model Estimation  900
    Zheran Hong, Bin Liu, and Nenghai Yu
Multi-scale Convolutional Neural Networks for Non-blind Image Deconvolution  911
    Xuehui Wang, Feng Dai, Jinli Suo, Yongdong Zhang, and Qionghai Dai

Feature-Preserving Mesh Denoising Based on Guided Normal Filtering  920
    Renjie Wang, Wenbo Zhao, Shaohui Liu, Debin Zhao, and Chun Liu

Visual-Inertial RGB-D SLAM for Mobile Augmented Reality  928
    Williem, Andre Ivan, Hochang Seok, Jongwoo Lim, Kuk-Jin Yoon, Ikhwan Cho, and In Kyu Park

ODD: An Algorithm of Online Directional Dictionary Learning for Sparse Representation  939
    Dan Xu, Xinwei Gao, Xiaopeng Fan, Debin Zhao, and Wen Gao

A Low Energy Multi-hop Routing Protocol Based on Programming Tree for Large-Scale WSN  948
    Feng Xu, Yating Hou, Guozhong Qian, and Yunyu Yao

Sparse Stochastic Online AUC Optimization for Imbalanced Streaming Data  960
    Min Yang, Xufen Cai, Ruimin Hu, Long Ye, and Rong Zhu

Traffic Congestion Level Prediction Based on Video Processing Technology  970
    Wenyu Xu, Guogui Yang, Fu Li, and Yuanhang Yang

Coarse-to-Fine Multi-camera Network Topology Estimation  981
    Chang Xing, Sichen Bai, Yi Zhou, Zhong Zhou, and Wei Wu

An Adaptive Tuning Sparse Fast Fourier Transform  991
    Sheng Shi, Runkai Yang, Xinfeng Zhang, Haihang You, and Dongrui Fan
Author Index  1001
Best Paper Candidate
Deep Graph Laplacian Hashing for Image Retrieval
Jiancong Ge1(B), Xueliang Liu1, Richang Hong1, Jie Shao2, and Meng Wang1
1 Hefei University of Technology, Hefei, China
jiancongge@mail.hfut.edu.cn, liuxueliang@hfut.edu.cn
2 University of Electronic Science and Technology, Chengdu, China
Abstract. Due to their storage and retrieval efficiency, hashing techniques have been widely deployed for approximate nearest neighbor search in fast image retrieval on large-scale datasets. Hashing aims to map images to compact binary codes that approximately preserve the data relations in the Hamming space. However, most existing approaches learn hash functions using hand-crafted features, which cannot optimally capture the underlying semantic information of images. Inspired by the fast progress of deep learning techniques, in this paper we design a novel Deep Graph Laplacian Hashing (DGLH) method to simultaneously learn robust image features and hash functions in an unsupervised manner. Specifically, we devise a deep network architecture with graph Laplacian regularization to preserve the neighborhood structure in the learned Hamming space. At the top layer of the deep network, we minimize the quantization errors and enforce the bits to be balanced and uncorrelated, which makes the learned hash codes more efficient. We further utilize back-propagation to optimize the parameters of the network. It should be noted that our approach does not require labeled training data and is thus more practical for real-world applications than supervised hashing methods. Experimental results on three benchmark datasets demonstrate that DGLH outperforms state-of-the-art unsupervised hashing methods in image retrieval tasks.
Keywords: Image retrieval · Deep hashing learning · Graph Laplacian
1 Introduction

With the explosive growth of online images in recent years, large-scale image retrieval has attracted increasing attention [4,16,21]. It aims to return visually similar images that match a visual query from a large database. However, retrieving the exact nearest neighbors is computationally impracticable when the reference database becomes very large. To alleviate this problem, hashing [20] has been widely used to speed up the query process. The basic idea of hashing is to transform high-dimensional data into compact binary codes that preserve the semantic information. The distance between data points in the high-dimensional space can then be approximated by the Hamming distance between their codes.
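This Hamming-distance approximation is what makes retrieval fast: ranking by code distance needs only cheap bitwise comparisons. A minimal NumPy sketch (toy 4-bit {−1, +1} codes with made-up values, not from any real system) of Hamming ranking:

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query.

    Codes are {-1, +1} vectors; the Hamming distance is the number
    of positions where two codes disagree.
    """
    # For +/-1 codes, (k - q . h) / 2 counts the disagreeing bits.
    dists = (query_code.shape[0] - db_codes @ query_code) // 2
    return np.argsort(dists, kind="stable"), dists

# Toy example with 4-bit codes (illustrative values only).
db = np.array([[ 1,  1, -1,  1],
               [-1,  1, -1, -1],
               [ 1, -1,  1,  1]])
q = np.array([1, 1, -1, 1])
order, d = hamming_rank(q, db)
print(order, d)  # nearest database item first
```

The same ranking extends to bit-packed codes with XOR and popcount, which is how large-scale systems obtain sub-linear memory traffic per comparison.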
© Springer International Publishing AG, part of Springer Nature 2018
B. Zeng et al. (Eds.): PCM 2017, LNCS 10735, pp. 3–13, 2018.
Existing hashing methods can be classified into two categories: unsupervised and supervised. Unsupervised hashing methods exploit only unlabeled data to learn hash functions; examples include Locality Sensitive Hashing (LSH) [1], Spectral Hashing (SH) [22], Iterative Quantization (ITQ) [2], Spherical Hashing (SpH) [9], and Discrete Graph Hashing (DGH) [13]. Different from unsupervised methods, supervised methods utilize label information to learn hash codes that preserve the similarity relationships among data points in the Hamming space, such as Binary Reconstructive Embedding (BRE) [7], Minimal Loss Hashing (MLH) [17], and Kernel-based Supervised Hashing (KSH) [14]. However, these traditional hashing methods learn hash functions from hand-crafted features such as GIST [18] or SIFT [15], which are insufficient to represent the visual content and therefore yield suboptimal hash codes.
In recent years, with the rapid development of deep learning [6], combining deep learning with hashing has become a hot topic [27]. Deep hashing methods such as CNNH [23], DNNH [8], DHN [28], and DPSH [10] have shown better performance than traditional hashing methods because deep architectures generate more discriminative feature representations. However, most existing deep hashing methods are designed for the supervised scenario and rely on label information to preserve semantic similarity. With the rapid growth of visual data on the web, most online data have no label annotations. Moreover, labeling data is expensive, so it is difficult to acquire sufficient semantic labels in real applications. It is therefore more practical to develop unsupervised hashing methods that exploit unlabeled data directly. Yet existing unsupervised hashing methods still depend on hand-crafted features, which cannot effectively capture the semantic information of images.
To alleviate the above problems, in this paper we propose a novel unsupervised deep hashing method, named Deep Graph Laplacian Hashing (DGLH), to simultaneously learn deep feature representations and compact binary codes. We design a graph-based objective function [24–26] at the top layer of the deep network to preserve the adjacency relations of images without using label information. We also minimize the quantization loss between the real-valued network outputs and the binary codes. Moreover, we enforce the bits to be balanced and uncorrelated, which makes the learned hash codes more effective. Finally, the model parameters are learned by the back-propagation algorithm. DGLH is an end-to-end method: hash codes can be obtained from the input images directly. We compare the proposed DGLH method with state-of-the-art unsupervised hashing methods on several commonly used image retrieval benchmarks, and the experimental results demonstrate its effectiveness.

The rest of this paper is organized as follows: the proposed DGLH method is introduced in Sect. 2, Sect. 3 presents the experimental results and analysis, and Sect. 4 concludes the paper.
Fig. 1. The Deep Graph Laplacian Hashing (DGLH) model with a hash layer fch. The fc7 layer extracts the deep features for the input images, which are also used for graph construction. We enforce four objectives on the neurons at the top layer of the network to learn compact binary codes: (1) a graph Laplacian criterion preserves the adjacency relations of images from the original feature space to the Hamming space, (2) the bits should be uncorrelated, (3) the bits should be balanced, and (4) the quantization loss should be minimized.
2 The Proposed Method

In this paper, bold uppercase letters such as X denote matrices, and bold lowercase letters such as x denote vectors. 1 and 0 denote the all-ones and all-zeros vectors, respectively, and I_k denotes the k × k identity matrix. The Frobenius norm ‖·‖_F is written as ‖·‖.

Given a set of n data points X = [x_1, x_2, ..., x_n], where x_i ∈ R^D is the feature vector of the i-th data point, we aim to learn a set of nonlinear hash functions that map X into compact k-bit hash codes H = [h_1, h_2, ..., h_n], where h_i ∈ {−1, 1}^k. The learned hash codes should preserve the similarity of the data points in the Hamming space: similar data points should have similar hash codes. We denote by S_ij the similarity between x_i and x_j, so the similarity constraints are preserved as

    min_H Σ_{i,j} S_ij D(h_i, h_j),

where D(h_i, h_j) is the Hamming distance between h_i and h_j.
To simultaneously learn robust feature representations and compact hash codes, we utilize a deep neural network, AlexNet [6], in this work.
Figure 1 shows the framework of the proposed method. AlexNet [6] consists of five convolutional layers (conv1–conv5) and three fully connected layers (fc6–fc8). Each fc layer learns a nonlinear mapping as follows:

    f_l = p_l(W_l f_{l−1} + b_l),

where f_l, W_l, b_l, and p_l are the output, the weight and bias parameters, and the activation function of the l-th layer, respectively. To learn compact hash codes, we replace the fc8 layer with a new fully connected hash layer fch of k hidden units, which maps the 4096-dimensional representation f_i^7 to a k-dimensional output that is then quantized into the binary hash code h_i. Besides, to obtain more robust codes, the learned hash codes should be balanced and uncorrelated. To this end, our objective function is defined as follows:
    arg min_H  (1/2) Σ_{i,j} S_ij ‖h_i − h_j‖² + α ‖H1‖² + β ‖HH^T − n I_k‖²,  s.t. H ∈ {−1, 1}^{k×n}.   (6)

There are three terms in Eq. (6). The first term is used to preserve the similarity among the data points in the Hamming space:

    (1/2) Σ_{i,j} S_ij ‖h_i − h_j‖²,   (7)

where the similarity S_ij is computed from the fc7 features with a Gaussian kernel:

    S_ij = exp(−‖f_i^7 − f_j^7‖² / ρ).   (8)

In Eq. (8), ρ > 0 is the bandwidth parameter, and f_i^7 and f_j^7 are the 4096-dimensional representations extracted from the fc7 layer for x_i and x_j, respectively.
For simplicity, we define the graph Laplacian matrix L = D − S, where D is the diagonal degree matrix with D_ii = Σ_j S_ij. Eq. (7) can then be rewritten as

    tr(HLH^T).   (9)

To make the codes balanced, each bit should take the values 1 and −1 with equal probability over the training set; in other words, an ideal hash code satisfies Σ_{i=1}^n h_i = 0. On the learned hashing matrix H, we define the unbalance loss as follows:

    ‖H1‖².   (10)
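The rewriting of the similarity term via the Laplacian, (1/2) Σ_{i,j} S_ij ‖y_i − y_j‖² = tr(YLY^T) for symmetric S, can be verified numerically. A small NumPy sketch with random stand-in data (not real features):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 4
Y = rng.standard_normal((k, n))            # relaxed k-bit codes, one column per point
S = rng.random((n, n)); S = (S + S.T) / 2  # symmetric similarity matrix
D = np.diag(S.sum(axis=1))                 # diagonal degree matrix, D_ii = sum_j S_ij
L = D - S                                  # graph Laplacian

# Pairwise form: 1/2 * sum_ij S_ij * ||y_i - y_j||^2
pairwise = 0.5 * sum(S[i, j] * np.sum((Y[:, i] - Y[:, j]) ** 2)
                     for i in range(n) for j in range(n))
# Trace form: tr(Y L Y^T)
trace_form = np.trace(Y @ L @ Y.T)
print(np.allclose(pairwise, trace_form))  # True
```

The trace form is what makes the gradient of the similarity term compact (it contributes the 2YL term seen later in Eq. (14)).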
Moreover, the hash codes should be uncorrelated: each bit is desired to be independent of the others, so as to minimize the redundancy among the bits. We use the last term in Eq. (6) to measure the correlation among the hash codes, defined as:

    ‖HH^T − n I_k‖².   (11)

The overall problem, combining Eqs. (9)–(11), is

    arg min_H  tr(HLH^T) + α ‖H1‖² + β ‖HH^T − n I_k‖²,  s.t. H ∈ {−1, 1}^{k×n},   (12)

where α > 0 and β > 0 are the balance parameters of the regularization terms.
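Both penalties can be written directly on a code matrix. A toy NumPy sketch (hand-picked illustrative codes, not learned ones) showing that perfectly balanced, orthogonal bits incur zero loss for both terms:

```python
import numpy as np

def balance_loss(Y):
    # ||Y 1||^2: zero when each bit is +1 and -1 equally often across points
    return float(np.sum(Y.sum(axis=1) ** 2))

def decorrelation_loss(Y, n):
    # ||Y Y^T - n I_k||_F^2: zero when the bits are uncorrelated with unit variance
    k = Y.shape[0]
    M = Y @ Y.T - n * np.eye(k)
    return float(np.sum(M ** 2))

# Perfectly balanced, mutually orthogonal 2-bit codes over 4 points
Y = np.array([[ 1.0,  1.0, -1.0, -1.0],
              [ 1.0, -1.0,  1.0, -1.0]])
print(balance_loss(Y), decorrelation_loss(Y, n=4))  # 0.0 0.0
```

Any constant bit (all +1) or duplicated pair of bits would make the respective penalty strictly positive, which is exactly the redundancy the objective discourages.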
Note that the optimization problem defined in Eq. (12) is a discrete (NP-hard) optimization problem that is hard to solve directly. To tackle this challenging problem, inspired by [10], we relax the quantization h_i = sgn(y_i) of Eq. (4) to h_i ≈ y_i, where y_i = W_fch f_i^7 + b_fch is the real-valued output of the hash layer. The discrete optimization problem can then be solved via a constrained relaxation. We reformulate the problem as

    arg min_{Y,H}  tr(YLY^T) + α ‖Y1‖² + β ‖YY^T − n I_k‖² + ϕ ‖Y − H‖²,  s.t. H = sgn(Y),   (13)

where Y = [y_1, y_2, ..., y_n]. The last term in Eq. (13) minimizes the quantization loss, and ϕ > 0 is the regularization parameter.
The gradient of Eq. (13) with respect to Y is

    ∂J/∂Y = 2YL + 2αY11^T + 4β(YY^T − nI_k)Y + 2ϕ(Y − H).   (14)

We can then update the parameters W_fch and b_fch, together with the parameters of the five convolutional layers (conv1–conv5) and the two fully connected layers (fc6–fc7), by the back-propagation method.
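Eq. (14) can be sanity-checked against a numerical gradient of the relaxed objective, with H held fixed at sgn(Y) for the step. A small NumPy sketch with random stand-in data (the hyper-parameter values are only placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 3
alpha, beta, phi = 0.006, 0.18, 1.0

S = rng.random((n, n)); S = (S + S.T) / 2
L = np.diag(S.sum(axis=1)) - S
Y = rng.standard_normal((k, n))
H = np.sign(Y)                      # binary codes, fixed during this gradient step
one = np.ones((n, 1))

def J(Y):
    """Relaxed objective of Eq. (13)."""
    return (np.trace(Y @ L @ Y.T)
            + alpha * np.sum((Y @ one) ** 2)
            + beta * np.sum((Y @ Y.T - n * np.eye(k)) ** 2)
            + phi * np.sum((Y - H) ** 2))

# Analytic gradient, Eq. (14)
G = (2 * Y @ L + 2 * alpha * Y @ one @ one.T
     + 4 * beta * (Y @ Y.T - n * np.eye(k)) @ Y + 2 * phi * (Y - H))

# Numerical gradient by central differences
eps, N = 1e-6, np.zeros_like(Y)
for i in range(k):
    for j in range(n):
        Yp, Ym = Y.copy(), Y.copy()
        Yp[i, j] += eps; Ym[i, j] -= eps
        N[i, j] = (J(Yp) - J(Ym)) / (2 * eps)

print(np.allclose(G, N, atol=1e-4))  # True
```

In practice this gradient flows through y_i = W_fch f_i^7 + b_fch into the lower layers, which is exactly what back-propagation automates.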
3 Experiments

To evaluate the proposed method, we conduct large-scale similarity search experiments on three benchmark datasets: CIFAR-10, CIFAR-100, and Flickr-25K, and compare the proposed DGLH method with five unsupervised hashing methods: Locality Sensitive Hashing (LSH) [1], Spectral Hashing (SH) [22], Spherical Hashing (SpH) [9], Density Sensitive Hashing (DSH) [11], and PCA-Iterative Quantization (PCA-ITQ) [2].
The CIFAR-10 [5] dataset consists of 60,000 color images of size 32 × 32 in 10 classes (6,000 images per class). We randomly select 1,000 images (100 images per class) as the query set, and use the remaining 59,000 images as the database for image retrieval.

The CIFAR-100 [5] dataset consists of 60,000 color images in 100 classes (600 images per class); the size of each image is also 32 × 32. We randomly select 10,000 images (100 images per class) as the query set, and use the remaining 50,000 images as the database for image retrieval.

The Flickr-25K [3] dataset consists of 25,000 images collected from Flickr, annotated with 24 topics such as bird, sky, night, and people. We randomly select 2,000 images as the query set, and use the remaining images as the database for image retrieval. We resize each image in all datasets to the network input size.

For the DGLH method, we directly use the image pixels as input. For the compared unsupervised hashing methods, we represent each image by 4096-dimensional deep features extracted with AlexNet [6] pre-trained on the ImageNet dataset.
We implement the DGLH model based on MatConvNet [19]. Specifically, we employ the AlexNet architecture [6], reuse the parameters of the convolutional layers conv1–conv5 and the fully connected layers fc6–fc7 from the model pre-trained on the ImageNet dataset, and train the last hashing layer fch from scratch. The bandwidth parameter is set to ρ = 4, and the two hyper-parameters are α = 0.006 and β = 0.18. We set ϕ = 100 for the CIFAR datasets and ϕ = 1000 for Flickr-25K.
To evaluate the performance of the different hashing methods on image retrieval, we follow [12] and use the mean Average Precision (mAP) and the precision and recall at top-K positions (Precision@K, Recall@K) as evaluation metrics.
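For reference, mAP averages, over all queries, the precision measured at each rank where a relevant item is retrieved. A minimal hypothetical implementation (not the authors' evaluation code):

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP for one query: ranked_relevance is 1/0 per returned item, best-ranked first."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    cum = np.cumsum(rel)
    precision_at_k = cum / (np.arange(len(rel)) + 1)
    # Average precision over the positions of the relevant items
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(all_rankings):
    """mAP: mean of per-query APs."""
    return float(np.mean([average_precision(r) for r in all_rankings]))

# Two toy queries, relevance of items listed in ranked order
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 1, 0]]))
```

Precision@K and Recall@K are the entries `cum[K-1] / K` and `cum[K-1] / total_relevant`, respectively, on the same cumulative-relevance array.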
Table 1 shows the mAP results of DGLH and the compared unsupervised hashing methods on CIFAR-10, CIFAR-100, and Flickr-25K when learning 32-, 48-, and 64-bit hash codes. Figures 2, 3, and 4 show the precision and recall curves at 48 and 64 bits for DGLH and the compared unsupervised hashing methods on CIFAR-10, CIFAR-100, and Flickr-25K, respectively. From these results, it can be concluded that the proposed DGLH approach outperforms the compared hashing methods.
Fig. 2. The precision and recall curves with 48- and 64-bit hashing on CIFAR-10. Each panel plots Precision@K or Recall@K against the number of retrieved samples for PCA-ITQ, LSH, SH, SpH, DSH, and DGLH.
Fig. 3. The precision and recall curves with 48- and 64-bit hashing on CIFAR-100. Each panel plots Precision@K or Recall@K against the number of retrieved samples for PCA-ITQ, LSH, SH, SpH, DSH, and DGLH.
Fig. 4. The precision and recall curves with 48- and 64-bit hashing on Flickr-25K. Each panel plots Precision@K or Recall@K against the number of retrieved samples for PCA-ITQ, LSH, SH, SpH, DSH, and DGLH.
Table 1. mAP comparison results using Hamming ranking on three datasets with different hash bits.

Method   | CIFAR-10                  | CIFAR-100                 | Flickr-25K
         | 32 bits  48 bits  64 bits | 32 bits  48 bits  64 bits | 32 bits  48 bits  64 bits
---------+---------------------------+---------------------------+--------------------------
DGLH     | 0.2893   0.3326   0.3407  | 0.0675   0.0811   0.0968  | 0.0914   0.0926   0.0950
LSH      | 0.1513   0.1759   0.1737  | 0.0270   0.0376   0.0356  | 0.0644   0.0673   0.0698
SH       | 0.1389   0.1501   0.1459  | 0.0313   0.0250   0.0259  | 0.0726   0.0696   0.0757
SpH      | 0.2205   0.2281   0.2357  | 0.0508   0.0592   0.0668  | 0.0858   0.0856   0.0857
DSH      | 0.2311   0.2197   0.2460  | 0.0395   0.0466   0.0474  | 0.0684   0.0735   0.0816
PCA-ITQ  | 0.2541   0.2611   0.2677  | 0.0589   0.0672   0.0743  | 0.0743   0.0760   0.0782
4 Conclusion

In this paper, we proposed a novel unsupervised deep hashing method for large-scale image retrieval. Unlike most previous unsupervised hashing methods, we designed a graph-based deep hashing network that simultaneously learns robust feature representations and compact hash codes, preserving the adjacency relations of images without using label information. When training the model, we relaxed the discrete optimization problem and reduced the quantization loss while learning the hash codes. We also used two further constraints to make the learned hash codes more effective. The proposed DGLH method is an end-to-end model that can learn hash codes from image pixels directly. Experimental results on three image retrieval datasets demonstrate that DGLH outperforms other state-of-the-art unsupervised hashing methods.

Acknowledgments. This work was supported in part by the National Natural Science Foundation of China (NSFC) under grants 61472116 and 61502139, in part by the Natural Science Foundation of Anhui Province under grant 1608085MF128, and in part by the Open Projects Program of the National Laboratory of Pattern Recognition under grant 201600006.
References

1. Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)
2. Gong, Y., Lazebnik, S.: Iterative quantization: a Procrustean approach to learning binary codes. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 817–824 (2011)
3. Huiskes, M.J., Lew, M.S.: The MIR Flickr retrieval evaluation. In: ACM International Conference on Multimedia Information Retrieval (2008)
4. Jiang, S., Song, X., Huang, Q.: Relative image similarity learning with contextual information for internet cross-media retrieval. Multimed. Syst. 20(6), 645–657 (2014)
5. Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report, University of Toronto (2009)
6. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
7. Kulis, B., Darrell, T.: Learning to hash with binary reconstructive embeddings. In: Advances in Neural Information Processing Systems, pp. 1042–1050. Curran Associates Inc. (2009)
8. Lai, H., Pan, Y., Liu, Y., Yan, S.: Simultaneous feature learning and hash coding with deep neural networks. CoRR, abs/1504.03410 (2015)
9. Lee, Y.: Spherical hashing. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2957–2964 (2012)
10. Li, W., Wang, S., Kang, W.: Feature learning based deep supervised hashing with pairwise labels. CoRR, abs/1511.03855 (2015)
11. Lin, Y., Cai, D., Li, C.: Density sensitive hashing. CoRR, abs/1205.2930 (2012)
12. Liu, H., Ji, R., Wu, Y., Liu, W.: Towards optimal binary code learning via ordinal embedding. In: AAAI Conference on Artificial Intelligence, pp. 1258–1265 (2016)
13. Liu, W., Mu, C., Kumar, S., Chang, S.-F.: Discrete graph hashing. In: Advances in Neural Information Processing Systems, pp. 3419–3427 (2014)
14. Liu, W., Wang, J., Ji, R., Jiang, Y.-G., Chang, S.-F.: Supervised hashing with kernels. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2074–2081 (2012)
15. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
16. Nie, L., Yan, S., Wang, M., Hong, R., Chua, T.-S.: Harvesting visual concepts for image search with complex queries. In: ACM International Conference on Multimedia, pp. 59–68. ACM (2012)
17. Norouzi, M., Blei, D.M.: Minimal loss hashing for compact binary codes. In: International Conference on Machine Learning, pp. 353–360 (2011)
18. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)
19. Vedaldi, A., Lenc, K.: MatConvNet: convolutional neural networks for MATLAB. In: ACM International Conference on Multimedia, pp. 689–692 (2015)
20. Wang, J., Shen, H.T., Song, J., Ji, J.: Hashing for similarity search: a survey. CoRR, abs/1408.2927 (2014)
21. Wang, S., Jiang, S.: INSTRE: a new benchmark for instance-level object retrieval and recognition. ACM Trans. Multimed. Comput. Commun. Appl. 11(3), 37 (2015)
22. Weiss, Y., Torralba, A., Fergus, R.: Spectral hashing. In: Advances in Neural Information Processing Systems, pp. 1753–1760 (2009)
23. Xia, R., Pan, Y., Lai, H., Liu, C., Yan, S.: Supervised hashing for image retrieval via image representation learning. In: AAAI Conference on Artificial Intelligence, pp. 2156–2162 (2014)
24. Liu, X., Wang, M., Yin, B.-C., Huet, B., Li, X.: Event-based media enrichment using an adaptive probabilistic hypergraph model. IEEE Trans. Cybern. 45(11) (2015)
27. Zhang, H., Shen, F., Liu, W., He, X., Luan, H., Chua, T.-S.: Discrete collaborative filtering. In: International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 325–334 (2016)
28. Zhu, H., Long, M., Wang, J., Cao, Y.: Deep hashing network for efficient similarity retrieval. In: AAAI Conference on Artificial Intelligence, pp. 2415–2421 (2016)
Deep Video Dehazing
Wenqi Ren and Xiaochun Cao(B)
State Key Laboratory of Information Security (SKLOIS), Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
{renwenqi,caoxiaochun}@iie.ac.cn
Abstract. Haze is a major problem in videos captured outdoors. Unlike single-image dehazing, video-based approaches can take advantage of the abundant information that exists across neighboring frames. In this work, assuming that a scene point yields highly correlated transmission values between adjacent video frames, we develop a deep learning solution for video dehazing, in which a CNN is trained end-to-end to learn how to accumulate information across frames for transmission estimation. The estimated transmission map is subsequently used to recover a haze-free frame via the atmospheric scattering model. To train this network, we generate a dataset of synthetic hazy and haze-free videos for supervision based on the NYU depth dataset. We show that the features learned from this dataset are capable of removing haze arising in outdoor scenes in a wide range of videos. Extensive experiments demonstrate that the proposed algorithm performs favorably against state-of-the-art methods on both synthetic and real-world videos.

Keywords: Video dehazing · Defogging · Transmission map · Convolutional Neural Network
1 Introduction

Outdoor images and videos often suffer from limited visibility due to haze, fog, smoke, and other small particles in the air that scatter light in the atmosphere. Haze has two effects on captured videos: it attenuates the signal of the viewed scene, and it introduces an additive component to the image, termed the atmospheric light (the color of a scene point at infinity). The image degradation caused by haze increases with the distance from the camera, since the scene radiance decreases and the atmospheric light magnitude increases. Thus, a single hazy image or frame can be modeled as a per-pixel combination of a haze-free image, a scene transmission map, and the global atmospheric light as follows [1]:

    I(x) = J(x) t(x) + A (1 − t(x)),   (1)

where I(x) and J(x) are the observed hazy image and the clear scene radiance, A is the global atmospheric light, and t(x) is the scene transmission describing the portion of light that is not scattered and reaches the camera sensors.
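Eq. (1) is applied per pixel; a minimal NumPy sketch (toy values for J, t, and A, not from any real frame) that synthesizes a hazy frame and inverts the model when t and A are known:

```python
import numpy as np

def add_haze(J, t, A):
    """I(x) = J(x) t(x) + A (1 - t(x)); t is broadcast over the color channels."""
    return J * t[..., None] + A * (1.0 - t[..., None])

def dehaze(I, t, A, t_min=0.1):
    """Invert the scattering model: J(x) = (I(x) - A) / max(t(x), t_min) + A."""
    t = np.maximum(t, t_min)[..., None]   # clamp to avoid division blow-up where t ~ 0
    return (I - A) / t + A

J = np.random.default_rng(2).random((4, 4, 3))  # clear frame (toy)
t = np.full((4, 4), 0.6)                         # transmission map (toy)
A = 0.9                                          # atmospheric light (toy)
I = add_haze(J, t, A)
print(np.allclose(dehaze(I, t, A), J))  # True
```

The inversion is exact wherever t is above the clamp threshold, which is why accurate transmission estimation is the central problem in dehazing.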
© Springer International Publishing AG, part of Springer Nature 2018
B. Zeng et al. (Eds.): PCM 2017, LNCS 10735, pp. 14–24, 2018.
Our goal is to recover the haze-free frames and the corresponding transmission maps. This is an ill-posed problem with an under-determined system [2,3] of three equations and at least four unknowns per pixel, and an inherent ambiguity between haze and object radiance [4]. To handle this highly under-constrained problem, some previous works use additional information such as multiple images, while others assume an image prior and solve the problem from a single image.

The most successful video dehazing approaches use information from neighboring frames to estimate transmission maps from the input video [5], exploiting the fact that a hazy video is temporally coherent and thus the transmission values of an object are similar between adjacent frames. Based on this assumption, one can design a temporal coherence constraint, add it to the loss terms, and obtain the optimal transmission values for each frame by minimizing the overall cost [5]. Previous work [6] has shown significant improvement over traditional single-image dehazing approaches. One of the main challenges in aggregating information across multiple frames in previous work is that the differently hazed frames must be aligned, for example using optical flow [7]. However, warping-based alignment is not robust around dis-occlusions and areas with low texture, and often yields warping artifacts. Beyond the alignment computation cost, methods that rely on warping therefore have to disregard information from mis-aligned content or warping artifacts, which is hard to detect from local image patches alone.

To this end, we present the first end-to-end data-driven approach to video dehazing, the results of which can be seen in Sect. 5. We train on hazy videos synthesized from the NYU depth dataset [8], which comprises indoor images; however, we show that our dehazing method extends to outdoor hazy videos as well. In addition, we obtain good dehazing results without any alignment at all. Our main contribution is an end-to-end solution that trains a deep neural network to dehaze videos from a short stack of neighboring video frames. To train the deep network, we create a hazy video dataset using the image sequences and the corresponding depth maps from the NYU depth dataset [8]. We compare qualitatively to real videos previously used for video dehazing, and quantitatively on a synthesized dataset with ground truth.
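For synthetic data of this kind, transmission is commonly derived from depth via the standard exponential attenuation model t(x) = exp(−β d(x)); the β value below is purely illustrative, not the paper's setting:

```python
import numpy as np

def transmission_from_depth(depth, beta=0.5):
    """t(x) = exp(-beta * d(x)): farther scene points are more attenuated.

    beta is the scattering coefficient; larger beta means denser haze.
    """
    return np.exp(-beta * depth)

depth = np.array([[0.0, 1.0],
                  [2.0, 3.0]])   # toy depth map in arbitrary units
t = transmission_from_depth(depth)
print(t)  # decreases monotonically with depth; t = 1 at zero depth
```

Combining such a transmission map with a clear frame and a chosen atmospheric light via Eq. (1) yields a hazy/clear training pair with ground-truth transmission.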
a rough estimation of the transmission map. Then, they use an expensive matting strategy to refine the final transmission map. Zhu et al. [11] find that the difference between brightness and saturation in a clear image should be very small; they therefore propose a new color attenuation prior based on this observation for haze removal from a single hazy input image. Recently, Berman et al. [4] introduce a non-local method for single image dehazing, based on the assumption that an image can be faithfully represented with just a few hundred distinct colors.

All of the above approaches rely strongly on the accuracy of the assumed image priors and may therefore perform poorly when the hand-crafted priors are insufficient to describe real data. As a result, these approaches tend to be more fragile than aggregation-based methods [12], and often introduce undesirable artifacts such as amplified noise.
Multi-image aggregation: Multi-image aggregation methods directly combine multiple images in either the spatial or other domains (e.g., chromatic, luminance, and saliency) without solving any inverse problem. Most existing works merge multiple low-quality images into the final result [5,12]. Kim et al. [5] assume that a scene point yields highly correlated transmission values between adjacent frames, and add a temporal coherence cost to the contrast cost and the truncation loss cost to define the overall cost function. However, the pixel-level processing increases the computational complexity and may therefore be unsuitable for videos. Zhang et al. [13] first dehaze videos frame by frame, and then use optical flow to improve temporal coherence based on a Markov Random Field (MRF). Ancuti et al. [12] derive two different inputs (a white-balanced image and a contrast-enhanced image) from the original hazy image, and then filter their important features by computing three measures.

All of the above approaches have explicit formulations for how to fuse multiple images. In this work, we instead adopt a data-driven approach to learn how multiple images should be aggregated to generate a transmission map.
Data-driven approaches: Recently, CNNs have achieved leading results on a wide variety of reconstruction problems. These methods tend to work best when large training datasets are easily constructed, as in image denoising [14] and image deblurring [7]. However, those approaches address different problems with their own sets of challenges; in this work we focus on video dehazing, where neighboring hazy frames can provide abundant information for transmission map estimation.

CNNs have also been used for single image dehazing based on synthetic training data [15,16]. Ren et al. [15] propose a multi-scale CNN for transmission map estimation, and Cai et al. [16] present an end-to-end system for single image dehazing. However, these algorithms focus on static image dehazing and may yield flickering artifacts due to the lack of temporal coherence when applied to video. In our experiments, we show that multi-frame transmission maps can be directly estimated by leveraging multiple video frames with our proposed deep network.