Tru Cao · Ee-Peng Lim
Zhi-Hua Zhou · Tu-Bao Ho
Advances in Knowledge Discovery and Data Mining
19th Pacific-Asia Conference, PAKDD 2015
Ho Chi Minh City, Vietnam, May 19–22, 2015
Proceedings, Part I
Subseries of Lecture Notes in Computer Science
LNAI Series Editors
DFKI and Saarland University, Saarbrücken, Germany
LNAI Founding Series Editor
Joerg Siekmann
DFKI and Saarland University, Saarbrücken, Germany
Tru Cao · Ee-Peng Lim
Zhi-Hua Zhou · Tu-Bao Ho
David Cheung · Hiroshi Motoda (Eds.)
Advances in Knowledge Discovery and Data Mining
19th Pacific-Asia Conference, PAKDD 2015
Ho Chi Minh City, Vietnam, May 19–22, 2015
Proceedings, Part I
Tru Cao
Ho Chi Minh City University of Technology
Ho Chi Minh City
Nomi City
Japan
David Cheung
The University of Hong Kong
Hong Kong
Hong Kong SAR
Hiroshi Motoda
Osaka University
Osaka
Japan
Lecture Notes in Artificial Intelligence
DOI 10.1007/978-3-319-18038-0
Library of Congress Control Number: 2015936624
LNCS Sublibrary: SL7 – Artificial Intelligence
Springer Cham Heidelberg New York Dordrecht London
© Springer International Publishing Switzerland 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper
Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www.springer.com)
Ten years after PAKDD 2005 in Ha Noi, PAKDD was held again in Vietnam, during May 19–22, 2015, in Ho Chi Minh City. PAKDD 2015 is the 19th edition of the Pacific-Asia Conference series on Knowledge Discovery and Data Mining, a leading international conference in the field. The conference provides a forum for researchers and practitioners to present and discuss new research results and practical applications.
There were 405 papers submitted to PAKDD 2015, and they underwent a rigorous double-blind review process. Each paper was reviewed by three Program Committee (PC) members in the first round and meta-reviewed by one Senior Program Committee (SPC) member, who also conducted discussions with the reviewers. The Program Chairs then considered the recommendations from SPC members and looked into each paper and its reviews to make the final paper selections. In the end, 117 papers were selected for the conference program and proceedings, an acceptance rate of 28.9%; of these, 26 papers were given long presentations and 91 papers regular presentations.
The conference started with a day of six high-quality workshops. During the next three days, the Technical Program included 20 paper presentation sessions covering various subjects of knowledge discovery and data mining, three tutorials, a data mining contest, a panel discussion, and, notably, three keynote talks by world-renowned experts.
PAKDD 2015 would not have been so successful without the efforts, contributions, and support of many individuals and organizations. We sincerely thank the Honorary Chairs, Phan Thanh Binh and Masaru Kitsuregawa, for their kind advice and support during preparation of the conference. We would also like to thank Masashi Sugiyama, Xuan-Long Nguyen, and Thorsten Joachims for giving interesting and inspiring keynote talks.
We would like to thank all the Program Committee members and external reviewers for their hard work in providing timely and comprehensive reviews and recommendations, which were crucial to the final paper selection and production of the high-quality Technical Program. We would also like to express our sincere thanks to the following Organizing Committee members: Xiaoli Li and Myra Spiliopoulou, together with the individual Workshop Chairs, for organizing the workshops; Dinh Phung and U Kang, with the tutorial speakers, for arranging the tutorials; Hung Son Nguyen, Nitesh Chawla, and Nguyen Duc Dung for running the contest; Takashi Washio and Jaideep Srivastava for publicizing the conference to attract submissions and participants; Tran Minh-Triet and Vo Thi Ngoc Chau for handling the whole registration process; Tuyen N. Huynh for compiling all the accepted papers and for working with the Springer team to produce these proceedings; and Bich-Thuy T. Dong, Bac Le, Thanh-Tho Quan, and Do Phuc for the local arrangements that made the conference go smoothly.
We are grateful to all the sponsors of the conference, in particular AFOSR/AOARD (Air Force Office of Scientific Research/Asian Office of Aerospace Research and Development), for their generous sponsorship and support, and the PAKDD Steering Committee for its guidance and its sponsorship of the Student Travel Award and Early Career Research Award. We would also like to express our gratitude to the John von Neumann Institute, University of Technology, University of Science, and University of Information Technology of Vietnam National University at Ho Chi Minh City and the Japan Advanced Institute of Science and Technology for jointly hosting and organizing this conference. Last but not least, our sincere thanks go to all the local team members and volunteer helpers for their hard work to make the event possible.
We hope you have enjoyed PAKDD 2015 and your time in Ho Chi Minh City, Vietnam.
Ee-Peng Lim
Zhi-Hua Zhou
Tu-Bao Ho
David Cheung
Hiroshi Motoda
Honorary Co-chairs
Phan Thanh Binh Vietnam National University, Ho Chi Minh City, Vietnam
Masaru Kitsuregawa National Institute of Informatics, Japan
General Co-chairs
Tu-Bao Ho Japan Advanced Institute of Science and Technology, Japan
David Cheung University of Hong Kong, China
Hiroshi Motoda Institute of Scientific and Industrial Research, Osaka University, Japan
Program Committee Co-chairs
Tru Hoang Cao Ho Chi Minh City University of Technology, Vietnam
Ee-Peng Lim Singapore Management University, Singapore
Zhi-Hua Zhou Nanjing University, China
Tutorial Co-chairs
Dinh Phung Deakin University, Australia
U Kang Korea Advanced Institute of Science and Technology, Korea
Workshop Co-chairs
Xiaoli Li Institute for Infocomm Research, A*STAR, Singapore
Myra Spiliopoulou Otto-von-Guericke University Magdeburg, Germany
Publicity Co-chairs
Takashi Washio Institute of Scientific and Industrial Research, Osaka University, Japan
Jaideep Srivastava University of Minnesota, USA
Proceedings Chair
Tuyen N. Huynh John von Neumann Institute, Vietnam
Contest Co-chairs
Hung Son Nguyen University of Warsaw, Poland
Nitesh Chawla University of Notre Dame, USA
Nguyen Duc Dung Vietnam Academy of Science and Technology, Vietnam
Local Arrangement Co-chairs
Bich-Thuy T. Dong John von Neumann Institute, Vietnam
Bac Le Ho Chi Minh City University of Science, Vietnam
Thanh-Tho Quan Ho Chi Minh City University of Technology, Vietnam
Do Phuc University of Information Technology, Vietnam National University at Ho Chi Minh City, Vietnam
Treasurer
Graham Williams Togaware, Australia
Tu-Bao Ho Japan Advanced Institute of Science and Technology, Japan (Member since 2005, Co-chair 2012–2014, Chair 2015–2017, Life Member since 2013)
Ee-Peng Lim (Co-chair) Singapore Management University, Singapore (Member since 2006, Co-chair 2015–2017)
Jaideep Srivastava University of Minnesota, USA (Member since 2006)
Zhi-Hua Zhou Nanjing University, China (Member since 2007)
Takashi Washio Institute of Scientific and Industrial Research, Osaka University, Japan (Member since 2008)
Thanaruk Theeramunkong Thammasat University, Thailand (Member since 2009)
P. Krishna Reddy International Institute of Information Technology, Hyderabad (IIIT-H), India (Member since 2010)
Joshua Z. Huang Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China (Member since 2011)
Longbing Cao Advanced Analytics Institute, University of Technology, Sydney, Australia (Member since 2013)
Jian Pei School of Computing Science, Simon Fraser University, Canada (Member since 2013)
Myra Spiliopoulou Otto-von-Guericke-University Magdeburg, Germany (Member since 2013)
Vincent S. Tseng National Cheng Kung University, Taiwan (Member since 2014)
Life Members
Hiroshi Motoda AFOSR/AOARD and Institute of Scientific and Industrial Research, Osaka University, Japan (Member since 1997, Co-chair 2001–2003, Chair 2004–2006, Life Member since 2006)
Rao Kotagiri University of Melbourne, Australia (Member since 1997, Co-chair 2006–2008, Chair 2009–2011, Life Member since 2007)
Huan Liu Arizona State University, USA (Member since 1998, Treasurer 1998–2000, Life Member since 2012)
Ning Zhong Maebashi Institute of Technology, Japan (Member since 1999, Life Member since 2008)
Masaru Kitsuregawa Tokyo University, Japan (Member since 2000, Life Member since 2008)
David Cheung University of Hong Kong, China (Member since 2001, Treasurer 2005–2006, Chair 2006–2008, Life Member since 2009)
Graham Williams Australian National University, Australia (Member since 2001, Treasurer since 2006, Co-chair 2009–2011, Chair 2012–2014, Life Member since 2009)
Ming-Syan Chen National Taiwan University, Taiwan, ROC (Member since 2002, Life Member since 2010)
Kyu-Young Whang Korea Advanced Institute of Science and Technology, Korea (Member since 2003, Life Member since 2011)
Chengqi Zhang University of Technology, Sydney, Australia (Member since 2004, Life Member since 2012)
Senior Program Committee Members
Arbee Chen National Chengchi University, Taiwan
Bart Goethals University of Antwerp, Belgium
Charles Ling University of Western Ontario, Canada
Chih-Jen Lin National Taiwan University, Taiwan
Dacheng Tao University of Technology, Sydney, Australia
Dou Shen Baidu, China
George Karypis University of Minnesota, USA
Haixun Wang Google, USA
Hanghang Tong City University of New York, USA
Hui Xiong Rutgers University, USA
Ian Davidson University of California Davis, USA
James Bailey University of Melbourne, Australia
Jeffrey Yu The Chinese University of Hong Kong, Hong Kong
Jian Pei Simon Fraser University, Canada
Jianyong Wang Tsinghua University, China
Jieping Ye Arizona State University, USA
Jiuyong Li University of South Australia, Australia
Joshua Huang Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China
Kyuseok Shim Seoul National University, Korea
Longbing Cao University of Technology, Sydney, Australia
Masashi Sugiyama University of Tokyo, Japan
Michael Berthold University of Konstanz, Germany
Ming Li Nanjing University, China
Ming-Syan Chen National Taiwan University, Taiwan
Min-Ling Zhang Southeast University, China
Myra Spiliopoulou Otto-von-Guericke-University Magdeburg, Germany
Nikos Mamoulis University of Hong Kong, Hong Kong
Ning Zhong Maebashi Institute of Technology, Japan
Osmar Zaiane University of Alberta, Canada
P. Krishna Reddy International Institute of Information Technology, Hyderabad, India
Peter Christen Australian National University, Australia
Sanjay Chawla University of Sydney, Australia
Takashi Washio Institute of Scientific and Industrial Research, Osaka University, Japan
Vincent S. Tseng National Cheng Kung University, Taiwan
Wee Keong Ng Nanyang Technological University, Singapore
Wei Wang University of California at Los Angeles, USA
Wen-Chih Peng National Chiao Tung University, Taiwan
Xiaofang Zhou University of Queensland, Australia
Xiaohua Hu Drexel University, USA
Xifeng Yan University of California, Santa Barbara, USA
Xindong Wu University of Vermont, USA
Xing Xie Microsoft Research Asia, China
Yanchun Zhang Victoria University, Australia
Yu Zheng Microsoft Research Asia, China
Program Committee Members
Aijun An York University, Canada
Aixin Sun Nanyang Technological University, Singapore
Akihiro Inokuchi Kwansei Gakuin University, Japan
Alfredo Cuzzocrea ICAR-CNR and University of Calabria, Italy
Andrzej Skowron University of Warsaw, Poland
Anne Denton North Dakota State University, USA
Bettina Berendt Katholieke Universiteit Leuven, Belgium
Bin Zhou University of Maryland, Baltimore County, USA
Bing Tian Dai Singapore Management University, Singapore
Bo Zhang Tsinghua University, China
Bolin Ding Microsoft Research, USA
Bruno Cremilleux Université de Caen Basse-Normandie, France
Carson K. Leung University of Manitoba, Canada
Chandan Reddy Wayne State University, USA
Chedy Raissi Inria, France
Chengkai Li The University of Texas at Arlington, USA
Chia-Hui Chang National Central University, Taiwan
Chiranjib Bhattacharyya Indian Institute of Science, India
Choochart Haruechaiy National Electronics and Computer Technology Center, Thailand
Chun-Hao Chen Tamkang University, Taiwan
Chun-hung Li Hong Kong Baptist University, Hong Kong
Clifton Phua NCS, Singapore
Daoqiang Zhang Nanjing University of Aeronautics and Astronautics, China
Dao-Qing Dai Sun Yat-Sen University, China
David Taniar Monash University, Australia
David Lo Singapore Management University, Singapore
De-Chuan Zhan Nanjing University, China
Dejing Dou University of Oregon, USA
De-Nian Yang Academia Sinica, Taiwan
Dhaval Patel Indian Institute of Technology, Roorkee, India
Dinh Phung Deakin University, Australia
Dragan Gamberger Ruđer Bošković Institute, Croatia
Du Zhang California State University, Sacramento, USA
Duc Dung Nguyen Institute of Information Technology, Vietnam
Enhong Chen University of Science and Technology of China, China
Fei Liu Carnegie Mellon University, USA
Feida Zhu Singapore Management University, Singapore
Florent Masseglia Inria, France
Geng Li Oracle Corporation, USA
Giuseppe Manco Università della Calabria, Italy
Guandong Xu University of Technology, Sydney, Australia
Guo-Cheng Lan Industrial Technology Research Institute, Taiwan
Gustavo Batista University of São Paulo, Brazil
Hady Lauw Singapore Management University, Singapore
Harry Zhang University of New Brunswick, Canada
Hiroshi Mamitsuka Kyoto University, Japan
Hong Shen University of Adelaide, Australia
Hsuan-Tien Lin National Taiwan University, Taiwan
Hua Lu Aalborg University, Denmark
Hui Wang University of Ulster, UK
Hung Son Nguyen University of Warsaw, Poland
Hung-Yu Kao National Cheng Kung University, Taiwan
Irena Koprinska University of Sydney, Australia
J. Saketha Nath Indian Institute of Technology, India
Jaakko Hollmén Aalto University, Finland
Jake Chen Indiana University–Purdue University Indianapolis, USA
James Kwok Hong Kong University of Science and Technology, China
Jason Wang New Jersey Science and Technology University, USA
Jean-Marc Petit Université de Lyon, France
Jeffrey Ullman Stanford University, USA
Jen-Wei Huang National Cheng Kung University, Taiwan
Jerry Chun-Wei Lin Harbin Institute of Technology Shenzhen, China
Jia Wu University of Technology, Sydney, Australia
Jialie Shen Singapore Management University, Singapore
Jiayu Zhou Samsung Research America, USA
Jia-Yu Pan Google, USA
Jin Soung Yoo Indiana University–Purdue University Indianapolis, USA
Jingrui He IBM Research, USA
Jinyan Li University of Technology, Sydney, Australia
John Keane University of Manchester, UK
Jun Huan University of Kansas, USA
Jun Gao Peking University, China
Jun Luo Huawei Noah’s Ark Lab, Hong Kong
Jun Zhu Tsinghua University, China
Junbin Gao Charles Sturt University, Australia
Junjie Wu Beihang University, China
Junping Zhang Fudan University, China
K. Selcuk Candan Arizona State University, USA
Keith Chan Hong Kong Polytechnic University, Hong Kong
Khoat Than Hanoi University of Science and Technology, Vietnam
Kitsana Waiyamai Kasetsart University, Thailand
Krisztian Buza Semmelweis University, Budapest, Hungary
Kun-Ta Chuang National Cheng Kung University, Taiwan
Kuo-Wei Hsu National Chengchi University, Taiwan
Latifur Khan University of Texas at Dallas, USA
Ling Chen University of Technology, Sydney, Australia
Lipo Wang Nanyang Technological University, Singapore
Manabu Okumura Japan Advanced Institute of Science and Technology, Japan
Marco Maggini Università degli Studi di Siena, Italy
Marian Vajtersic University of Salzburg, Austria
Marut Buranarach National Electronics and Computer Technology Center, Thailand
Mary Elaine Califf Illinois State University, USA
Marzena Kryszkiewicz Warsaw University of Technology, Poland
Masashi Shimbo Nara Institute of Science and Technology, Japan
Meng Chang Chen Academia Sinica, Taiwan
Mengjie Zhang Victoria University of Wellington, New Zealand
Michael Hahsler Southern Methodist University, USA
Min Yao Zhejiang University, China
Mi-Yen Yeh Academia Sinica, Taiwan
Muhammad Cheema Monash University, Australia
Murat Kantarcioglu University of Texas at Dallas, USA
Ngoc-Thanh Nguyen Wrocław University of Technology, Poland
Nguyen Le Minh Japan Advanced Institute of Science and Technology, Japan
Pabitra Mitra Indian Institute of Technology Kharagpur, India
Patricia Riddle University of Auckland, New Zealand
Peixiang Zhao Florida State University, USA
Philippe Lenca Télécom Bretagne, France
Philippe Fournier-Viger University of Moncton, Canada
Qingshan Liu NLPR Institute of Automation, Chinese Academy of Sciences, China
Raymond Chi-Wing Wong Hong Kong University of Science and Technology, Hong Kong
Richi Nayak Queensland University of Technology, Australia
Rui Camacho Universidade do Porto, Portugal
Salvatore Orlando University of Venice, Italy
Sanjay Jain National University of Singapore, Singapore
See-Kiong Ng Institute for Infocomm Research, A*STAR, Singapore
Shafiq Alam University of Auckland, New Zealand
Sheng-Jun Huang Nanjing University of Aeronautics and Astronautics, China
Shoji Hirano Shimane University, Japan
Shou-De Lin National Taiwan University, Taiwan
Shuai Ma Beihang University, China
Shu-Ching Chen Florida International University, USA
Shuigeng Zhou Fudan University, China
Silvia Chiusano Politecnico di Torino, Italy
Songcan Chen Nanjing University of Aeronautics and Astronautics, China
Tadashi Nomoto National Institute of Japanese Literature, Japan
Takehisa Yairi University of Tokyo, Japan
Tetsuya Yoshida Nara Women’s University, Japan
Toshihiro Kamishima National Institute of Advanced Industrial Science and Technology, Japan
Tuyen N. Huynh John von Neumann Institute, Vietnam
Tzung-Pei Hong National University of Kaohsiung, Taiwan
Van-Nam Huynh Japan Advanced Institute of Science and Technology, Japan
Vincenzo Piuri Università degli Studi di Milano, Italy
Wai Lam The Chinese University of Hong Kong, Hong Kong
Walter Kosters Universiteit Leiden, The Netherlands
Wang-Chien Lee Pennsylvania State University, USA
Wei Ding University of Massachusetts Boston, USA
Wenjie Zhang University of New South Wales, Australia
Wenjun Zhou University of Tennessee, Knoxville, USA
Wilfred Ng Hong Kong University of Science and Technology, Hong Kong
Wu-Jun Li Nanjing University, China
Wynne Hsu National University of Singapore, Singapore
Xiaofeng Meng Renmin University of China, China
Xiaohui (Daniel) Tao University of Southern Queensland, Australia
Xiaoli Li Institute for Infocomm Research, A*STAR, Singapore
Xiaowei Ying Bank of America, USA
Xin Wang University of Calgary, Canada
Xingquan Zhu Florida Atlantic University, USA
Xintao Wu University of Arkansas, USA
Xuan Vinh Nguyen University of Melbourne, Australia
Xuan-Hieu Phan University of Engineering and Technology–Vietnam National University, Hanoi, Vietnam
Xuelong Li University of London, UK
Xu-Ying Liu Southeast University, China
Yang Yu Nanjing University, China
Yang-Sae Moon Kangwon National University, Korea
Yasuhiko Morimoto Hiroshima University, Japan
Yidong Li Beijing Jiaotong University, China
Yi-Dong Shen Chinese Academy of Sciences, China
Ying Zhang University of New South Wales, Australia
Yi-Ping Phoebe Chen La Trobe University, Australia
Yiu-ming Cheung Hong Kong Baptist University, Hong Kong
Yong Guan Iowa State University, USA
Yonghong Peng University of Bradford, UK
Yue-Shi Lee Ming Chuan University, Taiwan
Zheng Chen Microsoft Research Asia, China
Zhenhui Li Pennsylvania State University, USA
Zhiyuan Chen University of Maryland, Baltimore County, USA
Zhongfei Zhang Binghamton University, USA
Zili Zhang Deakin University, Australia
External Reviewers
Ahsanul Haque University of Texas at Dallas, USA
Ameeta Agrawal York University, Canada
Anh Kim Nguyen Hanoi University of Science and Technology, Vietnam
Arnaud Soulet Université François Rabelais, Tours, France
Bhanukiran Vinzamuri Wayne State University, USA
Bin Fu University of Technology, Sydney, Australia
Bing Tian Dai Singapore Management University, Singapore
Budhaditya Saha Deakin University, Australia
Cam-Tu Nguyen Nanjing University, China
Cheng Long Hong Kong University of Science and Technology, Hong Kong
Chung-Hsien Yu University of Massachusetts Boston, USA
Chunming Liu University of Technology, Sydney, Australia
Dawei Wang University of Massachusetts Boston, USA
Dieu-Thu Le University of Trento, Italy
Dinusha Vatsalan Australian National University, Australia
Doan V. Nguyen Japan Advanced Institute of Science and Technology, Japan
Emmanuel Coquery Université Lyon1, CNRS, France
Ettore Ritacco ICAR-CNR, Italy
Fan Jiang University of Manitoba, Canada
Fang Yuan Institute for Infocomm Research A*STAR, Singapore
Fangfang Li University of Technology, Sydney, Australia
Fernando Gutierrez University of Oregon, USA
Fuzheng Zhang University of Science and Technology of China, China
Gensheng Zhang University of Texas at Arlington, USA
Gianni Costa ICAR-CNR, Italy
Guan-Bin Chen National Cheng Kung University, Taiwan
Hao Wang University of Oregon, USA
Heidar Davoudi York University, Canada
Henry Lo University of Massachusetts Boston, USA
Ikumi Suzuki National Institute of Genetics, Japan
Jan Bazan University of Rzeszów, Poland
Jan Vosecky Hong Kong University of Science and Technology, Hong Kong
Javid Ebrahimi University of Oregon, USA
Jianhua Yin Tsinghua University, China
Jianmin Li Tsinghua University, China
Jianpeng Xu Michigan State University, USA
Jing Ren Singapore Management University, Singapore
Jinpeng Chen Beihang University, China
Jipeng Qiang University of Massachusetts Boston, USA
Joseph Paul Cohen University of Massachusetts Boston, USA
Junfu Yin University of Technology, Sydney, Australia
Justin Sahs University of Texas at Dallas, USA
Kai-Ho Chan Hong Kong University of Science and Technology, Hong Kong
Kazuo Hara National Institute of Genetics, Japan
Ke Deng RMIT University, Australia
Kiki Maulana Adhinugraha Monash University, Australia
Kin-Long Ho Hong Kong University of Science and Technology, Hong Kong
Lan Thi Le Hanoi University of Science and Technology, Vietnam
Lei Zhu Huazhong University of Science and Technology, China
Lin Li Wuhan University of Technology, China
Linh Van Ngo Hanoi University of Science and Technology, Vietnam
Loc Do Singapore Management University, Singapore
Maksim Tkachenko Singapore Management University, Singapore
Marc Plantevit Université de Lyon, France
Marian Scuturici INSA de Lyon, CNRS, France
Marthinus Christoffel du Plessis University of Tokyo, Japan
Md Anisuzzaman Siddique Hiroshima University, Japan
Min Xie Hong Kong University of Science and Technology, Hong Kong
Ming Yang Binghamton University, USA
Minh Nhut Nguyen Institute for Infocomm Research A*STAR, Singapore
Mohit Sharma University of Minnesota, USA
Morteza Zihayat York University, Canada
Mu Li University of Technology, Sydney, Australia
Naeemul Hassan University of Texas at Arlington, USA
NhatHai Phan University of Oregon, USA
Nicola Barbieri Yahoo Labs, Spain
Nicolas Béchet Université de Bretagne Sud, France
Nima Shahbazi York University, Canada
Pakawadee Pengcharoen Hong Kong University of Science and Technology, Hong Kong
Pawel Gora University of Warsaw, Poland
Peiyuan Zhou Hong Kong Polytechnic University, Hong Kong
Peng Peng Hong Kong University of Science and Technology, Hong Kong
Pinghua Gong University of Michigan, USA
Qiong Fang Hong Kong University of Science and Technology, Hong Kong
Quan Xiaojun Institute for Infocomm Research A*STAR, Singapore
Riccardo Ortale ICAR-CNR, Italy
Sabin Kafle University of Oregon, USA
San Phyo Phyo Institute for Infocomm Research A*STAR, Singapore
Sang The Dinh Hanoi University of Science and Technology, Vietnam
Shangpu Jiang University of Oregon, USA
Shenlu Wang University of New South Wales, Australia
Shiyu Yang University of New South Wales, Australia
Show-Jane Yen Ming Chuan University, Taiwan
Shuangfei Zhai Binghamton University, USA
Simone Romano University of Melbourne, Australia
Sujatha Das Gollapalli Institute for Infocomm Research A*STAR, Singapore
Swarup Chandra University of Texas at Dallas, USA
Syed K. Tanbeer University of Manitoba, Canada
Tenindra Abeywickrama Monash University, Australia
Thanh-Son Nguyen Singapore Management University, Singapore
Thin Nguyen Deakin University, Australia
Tiantian He Hong Kong Polytechnic University, Hong Kong
Tianyu Kang University of Massachusetts Boston, USA
Trung Le Deakin University, Australia
Tuan M. V. Le Singapore Management University, Singapore
Xiaochen Chen Google, USA
Xiaolin Hu Tsinghua University, China
Xin Li University of Science and Technology, China
Xuhui Fan University of Technology, Sydney, Australia
Yahui Di University of Massachusetts Boston, USA
Yan Li Wayne State University, USA
Yang Jianbo Institute for Infocomm Research A*STAR, Singapore
Yang Mu University of Massachusetts Boston, USA
Yanhua Li University of Minnesota, USA
Yanhui Gu Nanjing Normal University, China
Yathindu Rangana Hettiarachchige Monash University, Australia
Yi-Yu Hsu National Cheng Kung University, Taiwan
Yingming Li Binghamton University, USA
Yu Zong West Anhui University, China
Zhiyong Chen Singapore Management University, Singapore
Zhou Zhao Hong Kong University of Science and Technology, Hong Kong
Zongda Wu Wenzhou University, China
Social Networks and Social Media
Maximizing Friend-Making Likelihood for Social Activity Organization
Chih-Ya Shen, De-Nian Yang, Wang-Chien Lee, and Ming-Syan Chen
What Is New in Our City? A Framework for Event Extraction Using Social Media Posts
Chaolun Xia, Jun Hu, Yan Zhu, and Mor Naaman
Link Prediction in Aligned Heterogeneous Networks
Fangbing Liu and Shu-Tao Xia
Scale-Adaptive Group Optimization for Social Activity Planning
Hong-Han Shuai, De-Nian Yang, Philip S. Yu, and Ming-Syan Chen
Influence Maximization Across Partially Aligned Heterogenous Social Networks
Qianyi Zhan, Jiawei Zhang, Senzhang Wang, Philip S. Yu, and Junyuan Xie
Multiple Factors-Aware Diffusion in Social Networks
Chung-Kuang Chou and Ming-Syan Chen
Understanding Community Effects on Information Diffusion
Shuyang Lin, Qingbo Hu, Guan Wang, and Philip S. Yu
On Burst Detection and Prediction in Retweeting Sequence
Zhilin Luo, Yue Wang, Xintao Wu, Wandong Cai, and Ting Chen
#FewThingsAboutIdioms: Understanding Idioms and Its Users in the Twitter Online Social Network
Koustav Rudra, Abhijnan Chakraborty, Manav Sethi, Shreyasi Das, Niloy Ganguly, and Saptarshi Ghosh
Retweeting Activity on Twitter: Signs of Deception
Maria Giatsoglou, Despoina Chatzakou, Neil Shah, Christos Faloutsos, and Athena Vakali
Resampling-Based Gap Analysis for Detecting Nodes with High Centrality on Large Social Network
Kouzou Ohara, Kazumi Saito, Masahiro Kimura, and Hiroshi Motoda
Double Ramp Loss Based Reject Option Classifier
Naresh Manwani, Kalpit Desai, Sanand Sasidharan, and Ramasubramanian Sundararajan
Efficient Methods for Multi-label Classification
Chonglin Sun, Chunting Zhou, Bo Jin, and Francis C.M. Lau
A Coupled k-Nearest Neighbor Algorithm for Multi-label Classification
Chunming Liu and Longbing Cao
Learning Topic-Oriented Word Embedding for Query Classification
Hebin Yang, Qinmin Hu, and Liang He
Reliable Early Classification on Multivariate Time Series with Numerical and Categorical Attributes
Yu-Feng Lin, Hsuan-Hsu Chen, Vincent S. Tseng, and Jian Pei
Distributed Document Representation for Document Classification
Rumeng Li and Hiroyuki Shindo
Prediciton of Emergency Events: A Multi-Task Multi-Label Learning Approach
Budhaditya Saha, Sunil Kumar Gupta, and Svetha Venkatesh
Nearest Neighbor Method Based on Local Distribution for Classification
Chengsheng Mao, Bin Hu, Philip Moore, Yun Su, and Manman Wang
Immune Centroids Over-Sampling Method for Multi-Class Classification
Xusheng Ai, Jian Wu, Victor S. Sheng, Pengpeng Zhao, Yufeng Yao, and Zhiming Cui
Optimizing Classifiers for Hypothetical Scenarios
Reid A. Johnson, Troy Raeder, and Nitesh V. Chawla
Repulsive-SVDD Classification
Phuoc Nguyen and Dat Tran
Centroid-Means-Embedding: an Approach to Infusing Word Embeddings into Features for Text Classification
Mohammad Golam Sohrab, Makoto Miwa, and Yutaka Sasaki
Machine Learning
Collaborating Differently on Different Topics: A Multi-Relational Approach to Multi-Task Learning
Sunil Kumar Gupta, Santu Rana, Dinh Phung, and Svetha Venkatesh
Multi-Task Metric Learning on Network Data
Chen Fang and Daniel N. Rockmore
A Bayesian Nonparametric Approach to Multilevel Regression
Vu Nguyen, Dinh Phung, Svetha Venkatesh, and Hung H. Bui
Learning Conditional Latent Structures from Multiple Data Sources
Viet Huynh, Dinh Phung, Long Nguyen, Svetha Venkatesh, and Hung H. Bui
Collaborative Multi-view Learning with Active Discriminative Prior for Recommendation
Qing Zhang and Houfeng Wang
Online and Stochastic Universal Gradient Methods for Minimizing Regularized Hölder Continuous Finite Sums in Machine Learning
Ziqiang Shi and Rujie Liu
Context-Aware Detection of Sneaky Vandalism on Wikipedia Across Multiple Languages
Khoi-Nguyen Tran, Peter Christen, Scott Sanner, and Lexing Xie
Uncovering the Latent Structures of Crowd Labeling
Tian Tian and Jun Zhu
Use Correlation Coefficients in Gaussian Process to Train Stable ELM Models
Yulin He, Joshua Zhexue Huang, Xizhao Wang, and Rana Aamir Raza
Local Adaptive and Incremental Gaussian Mixture for Online Density Estimation
Tianyu Qiu, Furao Shen, and Jinxi Zhao
Latent Space Tracking from Heterogeneous Data with an Application for Anomaly Detection
Jiaji Huang and Xia Ning
A Learning-Rate Schedule for Stochastic Gradient Methods to Matrix Factorization
Wei-Sheng Chin, Yong Zhuang, Yu-Chin Juan, and Chih-Jen Lin
Predicting Smartphone Adoption in Social Networks
Le Wu, Yin Zhu, Nicholas Jing Yuan, Enhong Chen, Xing Xie, and Yong Rui
Discovering the Impact of Urban Traffic Interventions Using Contrast Mining on Vehicle Trajectory Data
Xiaoting Wang, Christopher Leckie, Hairuo Xie, and Tharshan Vaithianathan
Locating Self-collection Points for Last-mile Logistics using Public Transport Data
Huayu Wu, Dongxu Shao, and Wee Siong Ng
A Stochastic Framework for Solar Irradiance Forecasting Using Condition Random Field
Jin Xu, Shinjae Yoo, Dantong Yu, Hao Huang, Dong Huang, John Heiser, and Paul Kalb
Online Prediction of Chess Match Result
Mohammad M. Masud, Ameera Al-Shehhi, Eiman Al-Shamsi, Shamma Al-Hassani, Asmaa Al-Hamoudi, and Latifur Khan
Learning of Performance Measures from Crowd-Sourced Data with Application to Ranking of Investments
Greg Harris, Anand Panangadan, and Viktor K. Prasanna
Hierarchical Dirichlet Process for Tracking Complex Topical Structure Evolution and its Application to Autism Research Literature
Adham Beykikhoshk, Ognjen Arandjelović, Svetha Venkatesh, and Dinh Phung
Automated Detection for Probable Homologous Foodborne Disease Outbreaks
Xiao Xiao, Yong Ge, Yunchang Guo, Danhuai Guo, Yi Shen, Yuanchun Zhou, and Jianhui Li
Identifying Hesitant and Interested Customers for Targeted Social Marketing
Guowei Ma, Qi Liu, Le Wu, and Enhong Chen
Activity-Partner Recommendation
Wenting Tu, David W. Cheung, Nikos Mamoulis, Min Yang, and Ziyu Lu
Iterative Use of Weighted Voronoi Diagrams to Improve Scalability in Recommender Systems
Joydeep Das, Subhashis Majumder, Debarshi Dutta, and Prosenjit Gupta

Novel Methods and Algorithms
Principal Sensitivity Analysis
    Sotetsu Koyamada, Masanori Koyama, Ken Nakae, and Shin Ishii
SocNL: Bayesian Label Propagation with Confidence
    Yuto Yamaguchi, Christos Faloutsos, and Hiroyuki Kitagawa
An Incremental Local Distribution Network for Unsupervised Learning
    Youlu Xing, Tongyi Cao, Ke Zhou, Furao Shen, and Jinxi Zhao
Trend-Based Citation Count Prediction for Research Articles
    Cheng-Te Li, Yu-Jen Lin, Rui Yan, and Mi-Yen Yeh
Mining Text Enriched Heterogeneous Citation Networks
    Jan Kralj, Anita Valmarska, Marko Robnik-Šikonja, and Nada Lavrač
Boosting via Approaching Optimal Margin Distribution
    Chuan Liu and Shizhong Liao
o-HETM: An Online Hierarchical Entity Topic Model for News Streams
    Linmei Hu, Juanzi Li, Jing Zhang, and Chao Shao
Modeling User Interest and Community Interest in Microbloggings: An Integrated Approach
    Tuan-Anh Hoang
Minimal Jumping Emerging Patterns: Computation and Practical Assessment
    Bamba Kane, Bertrand Cuissart, and Bruno Crémilleux
Rank Matrix Factorisation
    Thanh Le Van, Matthijs van Leeuwen, Siegfried Nijssen, and Luc De Raedt
An Empirical Study of Personal Factors and Social Effects on Rating Prediction
    Zhijin Wang, Yan Yang, Qinmin Hu, and Liang He

Author Index

Opinion Mining and Sentiment Analysis
Emotion Cause Detection for Chinese Micro-Blogs Based on ECOCC Model
    Kai Gao, Hua Xu, and Jiushuo Wang
Parallel Recursive Deep Model for Sentiment Analysis
    Changliang Li, Bo Xu, Gaowei Wu, Saike He, Guanhua Tian, and Yujun Zhou
Sentiment Analysis in Transcribed Utterances
    Nir Ofek, Gilad Katz, Bracha Shapira, and Yedidya Bar-Zev
Rating Entities and Aspects Using a Hierarchical Model
    Xun Wang, Katsuhito Sudoh, and Masaaki Nagata
Sentiment Analysis on Microblogging by Integrating Text and Image Features
    Yaowen Zhang, Lin Shang, and Xiuyi Jia
TSum4act: A Framework for Retrieving and Summarizing Actionable Tweets during a Disaster for Reaction
    Minh-Tien Nguyen, Asanobu Kitamoto, and Tri-Thanh Nguyen

Clustering
Evolving Chinese Restaurant Processes for Modeling Evolutionary Traces in Temporal Data
    Peng Wang, Chuan Zhou, Peng Zhang, Weiwei Feng, Li Guo, and Binxing Fang
Small-Variance Asymptotics for Bayesian Nonparametric Models with Constraints
    Cheng Li, Santu Rana, Dinh Phung, and Svetha Venkatesh
Spectral Clustering for Large-Scale Social Networks via a Pre-Coarsening Sampling Based Nyström Method
    Ying Kang, Bo Yu, Weiping Wang, and Dan Meng
pcStream: A Stream Clustering Algorithm for Dynamically Detecting and Managing Temporal Contexts
    Yisroel Mirsky, Bracha Shapira, Lior Rokach, and Yuval Elovici
Clustering Over Data Streams Based on Growing Neural Gas
    Mohammed Ghesmoune, Mustapha Lebbah, and Hanene Azzag
Computing and Mining ClustCube Cubes Efficiently
    Alfredo Cuzzocrea

Outlier and Anomaly Detection
Contextual Anomaly Detection Using Log-Linear Tensor Factorization
    Alpa Jayesh Shah, Christian Desrosiers, and Robert Sabourin
A Semi-Supervised Framework for Social Spammer Detection
    Zhaoxing Li, Xianchao Zhang, Hua Shen, Wenxin Liang, Christos Faloutsos, and Athena Vakali
An Embedding Scheme for Detecting Anomalous Block Structured Graphs
    Lida Rashidi, Sutharshan Rajasegarar, and Christopher Leckie
A Core-Attach Based Method for Identifying Protein Complexes in Dynamic PPI Networks
    Jiawei Luo, Chengchen Liu, and Hoang Tu Nguyen

Mining Uncertain and Imprecise Data
Mining Uncertain Sequential Patterns in Iterative MapReduce
    Jiaqi Ge, Yuni Xia, and Jian Wang
Quality Control for Crowdsourced POI Collection
    Shunsuke Kajimura, Yukino Baba, Hiroshi Kajino, and Hisashi Kashima
Towards Efficient Sequential Pattern Mining in Temporal Uncertain Databases
    Jiaqi Ge, Yuni Xia, and Jian Wang
Preference-Based Top-k Representative Skyline Queries on Uncertain Databases
    Ha Thanh Huynh Nguyen and Jinli Cao
Cluster Sequence Mining: Causal Inference with Time and Space Proximity under Uncertainty
    Yoshiyuki Okada, Ken-ichi Fukui, Koichi Moriyama, and Masayuki Numao
Achieving Accuracy Guarantee for Answering Batch Queries with Differential Privacy
    Dong Huang, Shuguo Han, and Xiaoli Li

Mining Temporal and Spatial Data
Automated Classification of Passing in Football
    Michael Horton, Joachim Gudmundsson, Sanjay Chawla, and Joël Estephan
Stabilizing Sparse Cox Model Using Statistic and Semantic Structures in Electronic Medical Records
    Shivapratap Gopakumar, Tu Dinh Nguyen, Truyen Tran, Dinh Phung, and Svetha Venkatesh
Predicting Next Locations with Object Clustering and Trajectory Clustering
    Meng Chen, Yang Liu, and Xiaohui Yu
A Plane Moving Average Algorithm for Short-Term Traffic Flow Prediction
    Lei Lv, Meng Chen, Yang Liu, and Xiaohui Yu
Recommending Profitable Taxi Travel Routes Based on Big Taxi Trajectories Data
    Wenxin Yang, Xin Wang, Seyyed Mohammadreza Rahimi, and Jun Luo
Semi Supervised Adaptive Framework for Classifying Evolving Data Stream
    Ahsanul Haque, Latifur Khan, and Michael Baron

Feature Extraction and Selection
Cost-Sensitive Feature Selection on Heterogeneous Data
    Wenbin Qian, Wenhao Shu, Jun Yang, and Yinglong Wang
A Feature Extraction Method for Multivariate Time Series Classification Using Temporal Patterns
    Pei-Yuan Zhou and Keith C.C. Chan
Scalable Outlying-Inlying Aspects Discovery via Feature Ranking
    Nguyen Xuan Vinh, Jeffrey Chan, James Bailey, Christopher Leckie, Kotagiri Ramamohanarao, and Jian Pei
A DC Programming Approach for Sparse Optimal Scoring
    Hoai An Le Thi and Duy Nhat Phan
Graph Based Relational Features for Collective Classification
    Immanuel Bayer, Uwe Nagel, and Steffen Rendle
A New Feature Sampling Method in Random Forests for Predicting High-Dimensional Data
    Thanh-Tung Nguyen, He Zhao, Joshua Zhexue Huang, Thuy Thi Nguyen, and Mark Junjie Li

Mining Heterogeneous, High Dimensional, and Sequential Data
Seamlessly Integrating Effective Links with Attributes for Networked Data Classification
    Yangyang Zhao, Zhengya Sun, Changsheng Xu, and Hongwei Hao
Clustering on Multi-source Incomplete Data via Tensor Modeling and Factorization
    Weixiang Shao, Lifang He, and Philip S. Yu
Locally Optimized Hashing for Nearest Neighbor Search
    Seiya Tokui, Issei Sato, and Hiroshi Nakagawa
Do-Rank: DCG Optimization for Learning-to-Rank in Tag-Based Item Recommendation Systems
    Noor Ifada and Richi Nayak
Efficient Discovery of Recurrent Routine Behaviours in Smart Meter Time Series by Growing Subsequences
    Jin Wang, Rachel Cardell-Oliver, and Wei Liu
Convolutional Nonlinear Neighbourhood Components Analysis for Time Series Classification
    Yi Zheng, Qi Liu, Enhong Chen, J. Leon Zhao, Liang He, and Guangyi Lv

Entity Resolution and Topic Modelling
Clustering-Based Scalable Indexing for Multi-party Privacy-Preserving Record Linkage
    Thilina Ranbaduge, Dinusha Vatsalan, and Peter Christen
Efficient Interactive Training Selection for Large-Scale Entity Resolution
    Qing Wang, Dinusha Vatsalan, and Peter Christen
Unsupervised Blocking Key Selection for Real-Time Entity Resolution
    Banda Ramadan and Peter Christen
Incorporating Probabilistic Knowledge into Topic Models
    Liang Yao, Yin Zhang, Baogang Wei, Hongze Qian, and Yibing Wang
Learning Focused Hierarchical Topic Models with Semi-Supervision in Microblogs
    Anton Slutsky, Xiaohua Hu, and Yuan An
Predicting Future Links Between Disjoint Research Areas Using Heterogeneous Bibliographic Information Network
    Yakub Sebastian, Eu-Gene Siew, and Sylvester Olubolu Orimaye

Itemset and High Performance Data Mining
CPT+: Decreasing the Time/Space Complexity of the Compact Prediction Tree
    Ted Gueniche, Philippe Fournier-Viger, Rajeev Raman, and Vincent S. Tseng
Mining Association Rules in Graphs Based on Frequent Cohesive Itemsets
    Tayena Hendrickx, Boris Cule, Pieter Meysman, Stefan Naulaerts, Kris Laukens, and Bart Goethals
Mining High Utility Itemsets in Big Data
    Ying Chun Lin, Cheng-Wei Wu, and Vincent S. Tseng
Decomposition Based SAT Encodings for Itemset Mining Problems
    Said Jabbour, Lakhdar Sais, and Yakoub Salhi
A Comparative Study on Parallel LDA Algorithms in MapReduce Framework
    Yang Gao, Zhenlong Sun, Yi Wang, Xiaosheng Liu, Jianfeng Yan, and Jia Zeng
Distributed Newton Methods for Regularized Logistic Regression
    Yong Zhuang, Wei-Sheng Chin, Yu-Chin Juan, and Chih-Jen Lin

Recommendation
Coupled Matrix Factorization Within Non-IID Context
    Fangfang Li, Guandong Xu, and Longbing Cao
Complementary Usage of Tips and Reviews for Location Recommendation in Yelp
    Saurabh Gupta, Sayan Pathak, and Bivas Mitra
Coupling Multiple Views of Relations for Recommendation
    Bin Fu, Guandong Xu, Longbing Cao, Zhihai Wang, and Zhiang Wu
Pairwise One Class Recommendation Algorithm
    Huimin Qiu, Chunhong Zhang, and Jiansong Miao
RIT: Enhancing Recommendation with Inferred Trust
    Guo Yan, Yuan Yao, Feng Xu, and Jian Lu

Author Index

Social Networks and Social Media
for Social Activity Organization
Chih-Ya Shen1(B), De-Nian Yang2, Wang-Chien Lee3, and Ming-Syan Chen1,2
1 Research Center for Information Technology Innovation,
Academia Sinica, Taipei, Taiwan
{chihya,mschen}@citi.sinica.edu.tw
2 Institute of Information Science, Academia Sinica, Taipei, Taiwan
dnyang@iis.sinica.edu.tw
3 Department of Computer Science and Engineering,
The Pennsylvania State University, University Park, USA
wlee@cse.psu.edu
Abstract. The social presence theory in social psychology suggests that computer-mediated online interactions are inferior to face-to-face, in-person interactions. In this paper, we consider the scenarios of organizing in-person friend-making social activities via online social networks (OSNs) and formulate a new research problem, namely, Hop-bounded Maximum Group Friending (HMGF), by modeling both existing friendships and the likelihood of new friend making. To find a set of attendees for socialization activities, HMGF is unique and challenging due to the interplay of the group size, the constraint on existing friendships, and the objective function on the likelihood of friend making. We prove that HMGF is NP-Hard, and no approximation algorithm exists unless P = NP. We then propose an error-bounded approximation algorithm to efficiently obtain solutions very close to the optimal solutions. We conduct a user study to validate our problem formulation and perform extensive experiments on real datasets to demonstrate the efficiency and effectiveness of our proposed algorithm.
of these activities shows that OSNs have been widely used as a convenient means for initiating real-life activities among friends.
1 http://www.skout.com/
2 http://newsroom.fb.com/products/
3 http://www.meetup.com/about/
© Springer International Publishing Switzerland 2015
T. Cao et al. (Eds.): PAKDD 2015, Part I, LNAI 9077, pp. 3–15, 2015.
On the other hand, to help users expand their circles of friends in cyberspace, friend recommendation services have been provided in OSNs to suggest candidates to users who may likely become mutual friends in the future. Many friend recommendation services employ link prediction algorithms, e.g., [10,11], to analyze the features, similarity, or interaction patterns of users in order to derive potential future friendships between users. By leveraging the abundant information in OSNs, link prediction algorithms show high accuracy for recommending online friends in OSNs.
As social presence theory [16] in social psychology suggests, computer-mediated online interactions are inferior to face-to-face, in-person interactions, so off-line friend-making activities may be favorable to their on-line counterparts in cyberspace. Therefore, in this paper, we consider the scenarios of organizing face-to-face friend-making activities via OSN services. Notice that finding socially cohesive groups of participants is essential for maintaining a good atmosphere for the activity. Moreover, the function of making new friends is also an important factor for the success of social activities, e.g., assigning excursion groups in conferences, inviting attendees to housewarming parties, etc. Thus, for organizing friend-making social activities, both activity organization and friend recommendation services are fundamental. However, there is a gap between existing activity organization and friend recommendation services in OSNs for the scenarios under consideration. Existing activity organization approaches focus on extracting socially cohesive groups from OSNs based on certain cohesive measures, e.g., density and diameter, of social networks or other constraints, e.g., time, spatial distance, and interests, of participants [5–8]. On the other hand, friend recommendation services consider only the existing friendships to recommend potential new friends for an individual (rather than finding a group of people for engaging in friend-making). We argue that in addition to themes of common interests, it is desirable to organize friend-making activities by mixing the "potential friends", who may be interested in knowing each other (as indicated by a link prediction algorithm), with existing friends (as lubricators). To the best knowledge of the authors, the following two important factors, 1) the existing friendship among attendees, and 2) the potential friendship among attendees, have not been considered simultaneously in existing activity organization services. To bridge the gap, it is desirable to propose a new activity organization service that carefully addresses these two factors at the same time.
In this paper, we aim to investigate the problem of selecting a set of candidate attendees from the OSN by considering both the existing and potential friendships among the attendees. To capture the two factors for activity organization, we propose to include the likelihood of making new friends in the social network. As such, we formulate a new research problem to find groups with tight social relationships among existing friends and potential friends (i.e., who are not friends yet). Specifically, we model the social network in the OSN as a heterogeneous social graph G = (V, E, R) with edge weight w : R → (0, 1], where V is the set of individuals, E is the set of friend edges, and R is the set of potential friend edges (or potential edges for short). Here a friend edge (u, v) denotes that individuals u and v are mutual friends, while a potential edge [u, v] indicates that individuals u and v are likely to become friends (the edge weight w[u, v] quantifies the likelihood). The potential edges and the corresponding edge weights can be obtained by employing a link prediction algorithm in friend recommendation.

Fig. 1. Illustrative example: (a) input graph G; (b) H1; (c) H2; (d) H3
Given a heterogeneous social graph G = (V, E, R) as described above, we formulate a new problem, namely, Hop-bounded Maximum Group Friending (HMGF), to find a group that 1) maximizes the likelihood of making new friends among the group, i.e., the group has the highest ratio of total potential edge weight to group size, 2) ensures social tightness, i.e., the hop count on friend edges in G between each pair of individuals is small, and 3) is a sufficiently large group, since too small a group may not work well for socialization activities.
Figure 1 illustrates the social graph and the interplay of the above factors. Figure 1(a) shows a social graph, where a dashed line, e.g., [a, b] with weight 0.6, is a potential edge and a solid line, e.g., (c, d), is a friend edge. Figure 1(b) shows a group H1: {a, e, f, g}, which has many potential edges and thus a high total weight. However, not all the members of this group have common friends as social lubricators. Figure 1(c) shows a group H2: {c, d, f, g} tightly connected by friend edges. While H2 may be a good choice for a gathering of close friends, the goal of friend-making in socialization activities is missed. Finally, Figure 1(d) shows H3: {d, e, f, g}, which is a better choice than H1 and H2 for socialization activities because each member of H3 is within 2 hops of another member via friend edges in G. Moreover, the average potential edge weight among them is high, indicating that members are likely to make some new friends.
Processing HMGF to find the best solution is very challenging because there are many important factors to consider, including the hop constraint, the group size, and the total weight of potential edges in a group. Indeed, we prove that HMGF is an NP-Hard problem with no approximation algorithm. Nevertheless, we prove that if the hop constraint can be slightly relaxed to allow a small error, there exists a 3-approximation algorithm for HMGF. Theoretical analysis and empirical results show that our algorithm can obtain good solutions efficiently.
The contributions made in this study are summarized as follows.
– For socialization activity organization, we propose to model the existing friendship and the potential friendship in a heterogeneous social graph and formulate a new problem, namely, Hop-bounded Maximum Group Friending (HMGF), for finding suitable attendees. To our best knowledge, HMGF is the first problem that considers these two important relationships between attendees for activity organization.
– We prove that HMGF is NP-Hard and there exists no approximation algorithm for HMGF unless P = NP. We then propose an approximation algorithm, called MaxGF, with a guaranteed error bound for solving HMGF efficiently.
– We conduct a user study on 50 users to validate our argument for considering both existing and potential friendships in activity organization. We also perform extensive experiments on real datasets to evaluate the proposed algorithm. Experimental results manifest that HMGF can obtain solutions very close to the optimal ones, very efficiently.

The rest of this paper is organized as follows. Section 2 formulates HMGF and proves it NP-Hard with no approximation algorithm. Section 3 reviews the related works, and Section 4 details the algorithm design. Section 5 reports a user study and experimental results. Section 6 concludes this paper.
Based on the heterogeneous social graph described earlier, here we formulate the Hop-bounded Maximum Group Friending (HMGF) problem tackled in this paper. Given two individuals u and v, let $d_E(u, v)$ be the length, in hops, of the shortest path between u and v via friend edges in G. Moreover, given H ⊆ G, let w(H) denote the total weight of potential edges in H, and let the average weight $\sigma(H) = \frac{w(H)}{|H|}$ denote the average weight of potential edges connected to each individual in H.⁴ HMGF is formulated as follows.

Problem: Hop-bounded Maximum Group Friending (HMGF).
Given: Social network G = (V, E, R), hop constraint h, and size constraint p.
Objective: Find an induced subgraph H ⊆ G with the maximum $\sigma(H)$, where $|H| \ge p$ and $d_E(u, v) \le h$, ∀u, v ∈ H.
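
The formulation above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the authors' code: the dictionary-based graph representation, the function names, and the four-vertex toy graph are all hypothetical. It evaluates σ(H) for a candidate group and checks the size and hop constraints, with hop distances measured over friend edges of the whole graph G.

```python
from collections import deque
from itertools import combinations


def hop_distance(friends, src, dst):
    """Hop count of the shortest friend-edge path from src to dst in G (BFS)."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in friends.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float("inf")  # not reachable via friend edges


def average_weight(potential, group):
    """sigma(H) = w(H) / |H|, with sigma = 0 for the empty group (footnote 4)."""
    group = set(group)
    if not group:
        return 0.0
    total = sum(w for (u, v), w in potential.items() if u in group and v in group)
    return total / len(group)


def is_feasible(friends, group, h, p):
    """Check |H| >= p and d_E(u, v) <= h for every pair u, v in H."""
    group = list(group)
    if len(group) < p:
        return False
    return all(hop_distance(friends, u, v) <= h for u, v in combinations(group, 2))


# Hypothetical toy graph: the friend edges form the path a-b-c-d.
friends = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
# Potential edges with weights in (0, 1], keyed by vertex pairs.
potential = {("a", "c"): 0.5, ("b", "d"): 0.8, ("a", "d"): 0.9}

sigma_abc = average_weight(potential, {"a", "b", "c"})
feasible_abc = is_feasible(friends, {"a", "b", "c"}, h=2, p=3)
feasible_all = is_feasible(friends, {"a", "b", "c", "d"}, h=2, p=3)
```

In this toy instance the full group has the higher average weight, but a and d are three hops apart, so it violates the hop constraint h = 2.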
Efficient processing of HMGF is very challenging due to the following reasons. 1) The interplay of the total weight w(H) and the size of H. To maximize σ(H), finding a small H may not be a good choice because the number of edges in a small graph tends to be small as well. On the other hand, finding a large H (which usually has a high w(H)) may not lead to an acceptable σ(H), either. Therefore, the key is to strike a good balance between the graph size |H| and the total weight w(H). 2) HMGF includes a hop constraint (say h = 2) on friend edges to ensure that every pair of individuals is not too distant socially from each other. However, selecting a potential edge [u, v] with a large weight w[u, v] may not necessarily satisfy the hop constraint, i.e., it may be that $d_E(u, v) > h$, which is defined based on existing friend edges. In this case, it may not always be a good strategy to prioritize large-weight edges in order to maximize σ(H), especially when u and v do not share a common friend nearby via the friend edges.
4 Note that σ(H) = 0 if H = ∅.
In the following, we prove that HMGF is NP-Hard and not approximable within any factor. In other words, there exists no approximation algorithm for HMGF.

Theorem 1. HMGF is NP-Hard, and there is no approximation algorithm for HMGF unless P = NP.

Proof. Due to space constraints, we prove this theorem in the full version of this paper (available online [1]).

The above theorem manifests that HMGF has no approximation algorithm. Nevertheless, we show that HMGF becomes approximable if a small error h is allowed in the hop constraint. More specifically, in Section 4, we first propose an error-bounded approximation algorithm for HMGF, which returns a solution with guaranteed σ(H), while $d_E(u, v)$ for any two vertices u and v in H may exceed h but is always bounded by 2h. Afterward, we present a post-processing procedure to tailor the solution for satisfying the hop constraint.
Extracting dense subgraphs or socially cohesive groups from social networks is a natural way of selecting a set of close friends for a gathering. Various social cohesive measures have been proposed for finding dense social subgraphs, e.g., diameter [2], density [3], clique and its variations [4]. Although these social cohesive measures cover a wide range of application scenarios, they focus on deriving groups based only on existing friendship in the social network. In contrast, the HMGF studied in this paper aims to extract groups by considering both the existing and potential friendships for socialization activities. Therefore, the existing works mentioned above cannot be directly applied to the HMGF tackled in this paper.

Research on finding a set of attendees for activities based on the social tightness among existing friends [5–9] has been reported in the literature. Social-Temporal Group Query [5] checks the available times of attendees to find the socially cohesive group with the most suitable activity time. Geo-Social Group Query [6,7] extracts socially tight groups while considering certain spatial properties. The willingness optimization for social group problem in [8] selects a set of attendees for an activity while maximizing their willingness to participate. Finally, [9] finds a set of compatible members with tight social relationships in the collaboration network. Although these works find suitable attendees for activities based on existing friendship among the attendees, they ignore the likelihood of making new friends among the attendees. Therefore, these works may not be suitable for the socialization activities discussed in this paper.

Link prediction analyzes the features, similarity, or interaction patterns among individuals in order to recommend possible friends to the users [10–14]. Link prediction algorithms employ different approaches including graph-topological features, classification models, hierarchical probabilistic models, and linear algebraic methods. These works show good prediction accuracy for friend recommendation in social networks. In this paper, to estimate the likelihood that individuals may become friends in the future, we employ link prediction algorithms for deriving the potential edges among the individuals.

To the best knowledge of the authors, there exists no algorithm for activity organization that considers both the existing friendship and the likelihood of making new friends when selecting activity attendees. The HMGF studied in this paper examines the social tightness among existing friends and the likelihood of becoming friends for non-friend attendees. We envisage that our research result can be employed in various social network applications for activity organization.
To tackle HMGF, a naive approach is to enumerate all possible combinations of vertices and extract the subgraph H with the maximum σ(H) that follows the hop and group size constraints. However, this approach is computationally expensive and thus not applicable to a large-scale social network. To efficiently answer HMGF, we propose an algorithm, called MaxGF, which is a 3-approximation algorithm with a guaranteed error bound h. MaxGF limits the search space of candidate solutions by dividing the graph into different hop-bounded subgraphs such that their sizes are much smaller than |V|. Then, it employs a greedy approach on the hop-bounded subgraphs to iteratively remove the vertices that are inclined to generate a small σ(H). Specifically, we define the incident weight of a vertex v in an induced subgraph H ⊆ G as $\tau_H(v) = \sum_{u \in H} w[v, u]$, i.e., the incident weight of v is the total weight of the potential edges incident to v in H. By carefully examining the incident weights of the vertices, we can remove from the hop-bounded subgraph those vertices that contribute no gain in the objective function. Moreover, we propose an effective pruning strategy for trimming redundant search. Finally, a post-processing procedure is proposed to ensure that the returned solution follows the hop constraint.
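
To illustrate the incident weight defined above, the following Python sketch computes $\tau_H(v)$ for every vertex of a group H. The function name, the pair-keyed dictionary of potential edges, and the toy weights are assumptions made for this example only; they do not come from the paper.

```python
def incident_weights(potential, group):
    """Return tau_H(v) for every v in H: the total weight of the
    potential edges inside H that are incident to v."""
    group = set(group)
    tau = {v: 0.0 for v in group}
    for (a, b), w in potential.items():
        if a in group and b in group:
            # A potential edge inside H contributes to both endpoints.
            tau[a] += w
            tau[b] += w
    return tau


# Hypothetical potential edges with weights in (0, 1].
potential = {("a", "c"): 0.9, ("b", "d"): 0.8, ("c", "e"): 0.7, ("a", "e"): 0.95}

tau = incident_weights(potential, {"a", "b", "c"})
# Each potential edge is counted at both endpoints, so the incident
# weights sum to twice the total potential weight w(H).
total_weight = sum(tau.values()) / 2
```

Here only the edge [a, c] lies inside the group, so b has incident weight zero and would be the first vertex a greedy removal step drops.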
The pseudocode of MaxGF is presented in Algorithm 1. Basically, to obtain the hop-bounded subgraphs, MaxGF sorts the vertices in terms of their incident weights and iteratively selects a vertex v with the maximum incident weight from G as a reference vertex. A hop-bounded subgraph $H_v$ is constructed from v by including every vertex u within at most h hops from v on the friend edges, i.e., $H_v = \{u \mid d_E(u, v) \le h\}$. Moreover, if $|H_v| < p$, it is no longer necessary to examine $H_v$ because no subgraph of $H_v$ can be a feasible solution due to the size constraint. Therefore, redundant search space is effectively pruned.

In addition, another pruning condition is proposed to further prune the resulting subgraph $H_v$. Let $S^{APX}$ denote the best solution obtained so far. If half of the maximum incident weight among the vertices u in $H_v$, i.e., $\frac{1}{2} \max_{u \in H_v} \tau_{H_v}(u)$, does not exceed $\sigma(S^{APX})$, there will never be any solution in $H_v$ better than $S^{APX}$. That is, if $\frac{1}{2} \max_{u \in H_v} \tau_{H_v}(u) \le \sigma(S^{APX})$ holds, there exists no subgraph in $H_v$ with the average weight larger than $\sigma(S^{APX})$, and $H_v$ can be pruned.

Algorithm 1. MaxGF
Input: Social graph G = (V, E, R), hop constraint h, and size constraint p
while U ≠ ∅ do
    v ← arg max_{u ∈ U} τ_G(u), U ← U − {v}
    let H_v be the induced subgraph of G with vertices {u | d_E^G(u, v) ≤ h}
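
The pruning condition can be justified in one line. The derivation below is a reconstruction from the definitions of $\sigma$ and $\tau$ given in this section, not a quotation of the authors' proof; it relies only on the facts that every potential edge of H is counted at both of its endpoints and that all potential edge weights are positive.

```latex
% For any non-empty subgraph H \subseteq H_v:
\begin{align*}
\sum_{u \in H} \tau_H(u) &= 2\, w(H)
  && \text{(each potential edge is counted at both endpoints)} \\
\sigma(H) = \frac{w(H)}{|H|}
  &= \frac{1}{2|H|} \sum_{u \in H} \tau_H(u)
   \le \frac{1}{2} \max_{u \in H} \tau_H(u)
  && \text{(an average never exceeds the maximum)} \\
  &\le \frac{1}{2} \max_{u \in H_v} \tau_{H_v}(u)
  && \text{(since } \tau_H(u) \le \tau_{H_v}(u) \text{ for } H \subseteq H_v\text{)}
\end{align*}
```

Hence, whenever $\frac{1}{2} \max_{u \in H_v} \tau_{H_v}(u) \le \sigma(S^{APX})$, no subgraph of $H_v$ can improve on the incumbent solution.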
Next, MaxGF starts to find the solution in $H_v$ with the maximized average weight, which includes $|H_v|$ steps. Let $S_{i+1}$ denote the subgraph after removing a vertex $\hat{v}_i$ from $S_i$ in step i. That is, we set $S_1 = H_v$ initially, and at each step i afterwards, $S_{i+1}$ is the subgraph $S_i - \{\hat{v}_i\}$. During each step i, $\hat{v}_i$ is selected as the vertex that has the lowest incident weight in $S_i$, i.e., $\hat{v}_i = \arg\min_{u \in S_i} \tau_{S_i}(u)$. This is based on the intuition that excluding vertices with low incident weights is more inclined to increase the average weight of the remaining subgraph. Then, $\hat{v}_i$ and its incident potential edges are removed from $S_i$, and the remaining graph is $S_{i+1}$. Then, $S_{i+1}$ is processed in the next step i + 1. The above procedure ends when $S_i$ is empty.

To maximize the objective function $\sigma(H) = \frac{w(H)}{|H|}$, after a hop-bounded subgraph $H_v$ is processed, $S^*$ is extracted as the subgraph $S_i$ with the maximum $\sigma(S_i)$ in $H_v$ where $|S_i| \ge p$. If $\sigma(S^*) > \sigma(S^{APX})$, we replace $S^{APX}$ with $S^*$. Then, we continue to extract the next vertex v for examining the corresponding hop-bounded subgraph $H_v$ until all vertices have been examined. Afterward, a post-processing procedure (detailed in Section 4.3) is employed on the best solution obtained in the algorithm, i.e., $S^{APX}$, to ensure that the hop constraint is satisfied and to further maximize $\sigma(S^{APX})$. Finally, $S^{APX}$ is output as the solution.
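
The steps described above can be sketched in Python as follows. This is a minimal reading of the prose under stated assumptions, not the authors' implementation: the graph representation, the names `hop_ball` and `max_gf`, and the five-vertex toy instance are hypothetical, the post-processing procedure is omitted, and the peeling loop stops as soon as the group drops below size p (equivalent to peeling until empty while only recording groups with at least p members).

```python
from collections import deque


def hop_ball(friends, v, h):
    """Vertices within h friend-edge hops of v in G, including v (BFS)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        node = queue.popleft()
        if dist[node] == h:
            continue  # do not expand beyond h hops
        for nxt in friends.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist)


def max_gf(vertices, friends, potential, h, p):
    """Greedy peeling over hop-bounded subgraphs; returns (group, sigma).

    Returns (set(), 0.0) if no group with positive average weight is found.
    The returned group may exceed the hop constraint h (by at most a factor
    of 2 according to the paper's analysis); post-processing is not included.
    """
    # Adjacency view of the potential edges: adj[u][x] = w[u, x].
    adj = {u: {} for u in vertices}
    for (a, b), w in potential.items():
        adj[a][b] = w
        adj[b][a] = w
    tau_g = {u: sum(adj[u].values()) for u in vertices}

    best, best_sigma = set(), 0.0
    # Visit reference vertices in decreasing order of incident weight in G.
    for v in sorted(vertices, key=lambda u: tau_g[u], reverse=True):
        ball = hop_ball(friends, v, h)
        if len(ball) < p:
            continue  # size pruning: no subgraph of H_v has p vertices
        tau = {u: sum(w for x, w in adj[u].items() if x in ball) for u in ball}
        if 0.5 * max(tau.values()) <= best_sigma:
            continue  # weight pruning: H_v cannot beat the incumbent
        current = set(ball)
        weight = sum(tau.values()) / 2.0  # w(S_1)
        while current and len(current) >= p:
            sigma = weight / len(current)
            if sigma > best_sigma:
                best, best_sigma = set(current), sigma
            # Remove the vertex with the lowest incident weight in S_i.
            worst = min(current, key=lambda u: tau[u])
            current.remove(worst)
            weight -= tau[worst]
            for x, w in adj[worst].items():
                if x in current:
                    tau[x] -= w
    return best, best_sigma


# Hypothetical toy instance: friend edges form the path a-b-c-d-e.
vertices = ["a", "b", "c", "d", "e"]
friends = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
           "d": ["c", "e"], "e": ["d"]}
potential = {("a", "c"): 0.9, ("b", "d"): 0.8, ("c", "e"): 0.7, ("a", "e"): 0.95}

group, sigma = max_gf(vertices, friends, potential, h=2, p=3)
```

On this toy instance the sketch returns the group {a, c, e} with average weight 0.85. Vertices a and e are four friend-edge hops apart, which exceeds h = 2 but stays within 2h, so the example also shows why a post-processing step is needed to restore the hop constraint.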
4.2 Theoretical Bound
In the following, given the hop-bounded subgraph $H_v$, we first prove that there exists a subgraph $F \subseteq H_v$ such that $3 \cdot w(F)$ is an upper bound of the total potential edge weight of the optimal solution to the HMGF instance on $H_v$. Then, we prove that for each $H_v$, the average weight of $S^*$ obtained in the algorithm, i.e., $\sigma(S^*)$, is at least $\frac{1}{3}$ of the average weight of the optimal solution of HMGF on $H_v$. Finally, based on the properties of the hop-bounded subgraph and $S^{APX}$, we prove that the proposed algorithm is a 3-approximation algorithm with guaranteed error bound to HMGF.
Let $S_v^{OPT}$ denote the optimal solution of the HMGF instance on $H_v$ with $\sigma(S_v^{OPT}) > 0$. We first prove that the largest subgraph F in $H_v$, where $\tau_F(u) \ge \frac{2}{3}\sigma(S_v^{OPT})$, ∀u ∈ F, is not an empty graph.

Lemma 1. Given $H_v$, the largest subgraph $F \subseteq H_v$ with $\tau_F(u) \ge \frac{2}{3}\sigma(S_v^{OPT})$, ∀u ∈ F, is not an empty graph.

Proof. The proof is presented in the online version [1].
With the existence of F proven above, we now derive an upper bound of the total potential edge weight of $S_v^{OPT}$, i.e., $w(S_v^{OPT})$, according to $w(F)$.
Proof The proof is presented in the online version [1]
Then, with the properties derived above, we turn our attention to analyzing MaxGF proposed in this section. In MaxGF, given $H_v$, when we are iteratively extracting $\hat{v}_i$, which has the minimum incident weight in $S_i$, if $\hat{v}_i$ is the first extracted vertex such that $\hat{v}_i \in F$ (i.e., step i is the earliest step such that $\hat{v}_i \in F$), then we have the following lemma.

Lemma 3. If $\hat{v}_i$ extracted from $S_i$ is the first extracted vertex that is in F, then $\tau_{S_i}(u) \ge \frac{2}{3}\sigma(S_v^{OPT})$, ∀u ∈ $S_i$. Moreover, $F = S_i$.

Proof. The proof is presented in the online version [1].
We combine the results obtained above and derive the bound on $\sigma(S^*)$, where $S^*$ is the group $S_i$ that has the maximum $\sigma(S_i)$ among all $S_i$ with $|S_i| \ge p$ obtained by MaxGF in $H_v$. Please note that Lemma 3 proves that during the steps of extracting $\hat{v}_i$ from $S_i$, there exists $\hat{v}_i$ with $\tau_{S_i}(\hat{v}_i) \ge \frac{2}{3}\sigma(S_v^{OPT})$.
Proof The proof is presented in the online version [1]
Finally, let $S^{OPT}$ denote the optimal solution of HMGF on G. The following theorem proves that the solution obtained by MaxGF, i.e., $S^{APX}$, has $\sigma(S^{APX})$ at least $\frac{1}{3} \cdot \sigma(S^{OPT})$, and the error is bounded by h.

Theorem 3. MaxGF returns the solution $S^{APX}$ with $\sigma(S^{APX}) \ge \frac{\sigma(S^{OPT})}{3}$ and $d_E(u, v) \le 2 \cdot h$, ∀u, v ∈ $S^{APX}$.
Proof The proof is presented in the online version [1]
A post-processing procedure is designed to tailor $S^{APX}$ for meeting the hop constraint and further maximizing the average weight. More specifically, given $S^{APX}$ obtained in the algorithm, we first define the notion of boundary vertices. A vertex u in $S^{APX}$ is a boundary vertex if there exists at least one other vertex v in $S^{APX}$ such that the shortest path from u to v via friend edges contains more than h edges. Let B denote the set of boundary vertices. MaxGF includes the following adjustment steps in the post-processing procedure. 1) Expand: a vertex $v \in (V \setminus S^{APX})$ can be added into $S^{APX}$ if adding v does not increase |B| and increases $\sigma(S^{APX})$. We give priority to the v that maximizes $\sigma(S^{APX} \cup \{v\})$. 2) Shrink: given a boundary vertex u ∈ B, u can be safely removed if, after removing u from $S^{APX}$, |B| decreases but $\sigma(S^{APX})$ does not. We give priority to the u that maximizes $\sigma(S^{APX} - \{u\})$. Please note that the above post-processing procedure minimizes $\max_{u, v \in S^{APX}} d_E(u, v)$ while increasing $\sigma(S^{APX})$. Therefore, after post-processing, the performance and error bounds in Theorem 3 still hold.
The detailed analysis is presented in the online version of this paper [1]
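
As an illustration of the boundary-vertex definition used by the post-processing procedure, the Python sketch below computes the set B for a given group. It is a hedged example only: the function names and the toy path graph are assumptions, hop distances are measured over friend edges of the whole graph G, and the Expand and Shrink steps themselves are not implemented.

```python
from collections import deque


def hops_from(friends, src):
    """Hop counts from src to every vertex reachable via friend edges in G."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in friends.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def boundary_vertices(friends, group, h):
    """Vertices u in the group that are more than h hops away from
    at least one other member of the group."""
    group = set(group)
    boundary = set()
    for u in group:
        dist = hops_from(friends, u)
        # Unreachable members count as infinitely far away.
        if any(dist.get(v, float("inf")) > h for v in group if v != u):
            boundary.add(u)
    return boundary


# Hypothetical toy graph: friend edges form the path a-b-c-d-e.
friends = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
           "d": ["c", "e"], "e": ["d"]}

B = boundary_vertices(friends, {"a", "c", "e"}, h=2)
```

For the group {a, c, e} with h = 2, vertices a and e are four hops apart, so both are boundary vertices, while c is within two hops of every other member.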
We implement HMGF in Facebook and invite 50 users to participate in our user study. Each user, given 12 test cases of HMGF using her friends in Facebook as the input graph, is asked to solve the HMGF cases and compare her results with the solutions obtained by MaxGF. In addition to the user study, we evaluate the performance of MaxGF on two real social network datasets, i.e., FB [15] and the MS dataset from KDD Cup 2013.⁵ The FB dataset is extracted from Facebook with 90K vertices, and MS is a co-author network with 1.7M vertices. We extract the friend edges from these datasets and identify the potential edges with a link prediction algorithm [11]. The weight of a potential edge ranges within (0, 1]. Moreover, we compare MaxGF with two algorithms, namely, Baseline and DkS [3]. Baseline finds the optimal solution of HMGF by enumerating all the subgraphs satisfying the constraints, while DkS is an $O(|V|^{1/3})$-approximation algorithm for finding a p-vertex subgraph H ⊆ G with the maximum density on E ∪ R without considering the potential edges and the hop constraint. The algorithms are implemented on an IBM 3650 server with quad-core Intel X5450 3.0 GHz CPUs. We measure 30 samples in each scenario. In the following, FeaRatio and ObjRatio respectively denote the ratio of feasibility (i.e., the portion
5 https://www.kaggle.com/c/kdd-cup-2013-author-paper-identification-challenge/data