Handbook of Research on Advanced Data Mining Techniques and Applications for Business Intelligence
Shrawan Kumar Trivedi
BML Munjal University, India
Shubhamoy Dey
Indian Institute of Management Indore, India
Anil Kumar
BML Munjal University, India
Tapan Kumar Panda
Jindal Global Business School, India
A volume in the Advances in Business
Information Systems and Analytics (ABISA)
Book Series
Published in the United States of America by
Web site: http://www.igi-global.com
Copyright © 2017 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher. Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.
Library of Congress Cataloging-in-Publication Data
British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.
All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the authors, but not necessarily of the publisher.
For electronic access to this publication, please contact: eresources@igi-global.com
CIP Data Pending
ISBN: 978-1-5225-2031-3
eISBN: 978-1-5225-2032-0
This book is published in the IGI Global book series Advances in Business Information Systems and Analytics (ABISA) (ISSN: 2327-3275; eISSN: 2327-3283)
The Advances in Business Information Systems and Analytics (ABISA) Book Series (ISSN 2327-3275) is published by IGI Global, 701 E. Chocolate Avenue, Hershey, PA 17033-1240, USA, www.igi-global.com. This series is composed of titles available for purchase individually; each title is edited to be contextually exclusive from any other title within the series. For pricing and ordering information please visit http://www.igi-global.com/book-series/advances-business-information-systems-analytics/37155. Postmaster: Send all address changes to above address. Copyright © 2017 IGI Global. All rights, including translation in other languages reserved by the publisher. No part of this series may be reproduced or used in any form or by any means – graphics, electronic, or mechanical, including photocopying, recording, taping, or information and retrieval systems – without written permission from the publisher, except for non commercial, educational use, including classroom teaching purposes. The views expressed in this series are those of the authors, but not necessarily of IGI Global.
IGI Global is currently accepting manuscripts for publication within this series. To submit a proposal for a volume in this series, please contact our Acquisition Editors at Acquisitions@igi-global.com or visit: http://www.igi-global.com/publish/
Advances in Business Information Systems and Analytics (ABISA) Book Series
Madjid Tavana
La Salle University, USA
ISSN:2327-3275 EISSN:2327-3283
Mission
The successful development and management of information systems and business analytics is crucial to the success of an organization. New technological developments and methods for data analysis have allowed organizations to not only improve their processes and allow for greater productivity, but have also provided businesses with a venue through which to cut costs, plan for the future, and maintain competitive advantage in the information age.
The Advances in Business Information Systems and Analytics (ABISA) Book Series aims to present diverse and timely research in the development, deployment, and management of business information systems and business analytics for continued organizational development and improved business value.
Coverage
• Decision Support Systems
• Legal information systems
• Business Intelligence
• Data Analytics
• Business Process Management
• Business Information Security
• Management information systems
• Data Management
• Strategic Information Systems
• Statistics
Titles in this Series
For a list of additional titles in this series, please visit: www.igi-global.com
Business Analytics and Cyber Security Management in Organizations
Rajagopal (EGADE Business School, Tecnologico de Monterrey, Mexico City, Mexico & Boston University, USA) and Ramesh Behl (International Management Institute, Bhubaneswar, India)
Business Science Reference • copyright 2017 • 346pp • H/C (ISBN: 9781522509028) • US $215.00 (our price)
Handbook of Research on Intelligent Techniques and Modeling Applications in Marketing Analytics
Anil Kumar (BML Munjal University, India) Manoj Kumar Dash (ABV-Indian Institute of Information Technology and Management, India) Shrawan Kumar Trivedi (BML Munjal University, India) and Tapan Kumar Panda (BML Munjal University, India)
Business Science Reference • copyright 2017 • 428pp • H/C (ISBN: 9781522509974) • US $275.00 (our price)
Applied Big Data Analytics in Operations Management
Manish Kumar (Indian Institute of Information Technology, Allahabad, India)
Business Science Reference • copyright 2017 • 251pp • H/C (ISBN: 9781522508861) • US $160.00 (our price)
Eye-Tracking Technology Applications in Educational Research
Christopher Was (Kent State University, USA) Frank Sansosti (Kent State University, USA) and Bradley Morris (Kent State University, USA)
Information Science Reference • copyright 2017 • 370pp • H/C (ISBN: 9781522510055) • US $205.00 (our price)
Strategic IT Governance and Alignment in Business Settings
Steven De Haes (Antwerp Management School, University of Antwerp, Belgium) and Wim Van Grembergen (Antwerp Management School, University of Antwerp, Belgium)
Business Science Reference • copyright 2017 • 298pp • H/C (ISBN: 9781522508618) • US $195.00 (our price)
Organizational Productivity and Performance Measurements Using Predictive Modeling and Analytics
Madjid Tavana (La Salle University, USA) Kathryn Szabat (La Salle University, USA) and Kartikeya Puranam (La Salle University, USA)
Business Science Reference • copyright 2017 • 400pp • H/C (ISBN: 9781522506546) • US $205.00 (our price)
Data Envelopment Analysis and Effective Performance Assessment
Farhad Hossein Zadeh Lotfi (Islamic Azad University, Iran) Seyed Esmaeil Najafi (Islamic Azad University, Iran) and Hamed Nozari (Islamic Azad University, Iran)
Business Science Reference • copyright 2017 • 365pp • H/C (ISBN: 9781522505969) • US $160.00 (our price)
701 E. Chocolate Ave., Hershey, PA 17033
Order online at www.igi-global.com or call 717-533-8845 x100
To place a standing order for titles released in this series, contact: cust@igi-global.com
Mon-Fri 8:00 am - 5:00 pm (est) or fax 24 hours a day 717-533-8661
Editorial Advisory Board
Ankita Tripathi, Amity University Gurgaon – Haryana, India
List of Reviewers
A. Sheik Abdullah, Thiagarajar College of Engineering, India
A. M. Abirami, Thiagarajar College of Engineering, India
M. Afshar Alam, Jamia Hamdard University, India
Tamizh Arasi, VIT University, India
A. Askarunisa, KLN College of Information Technology, India
Balamurugan Balusamy, VIT University, India
Yi Chai, Chongqing University, China
A. A. Chari, Rayalaseema University, India
T. K. Das, VIT University, India
Hirak Dasgupta, Symbiosis International University, India
Sanjiva Shankar Dubey, SSD Consulting, India
G. R. Gangadharan, University of Hyderabad, India
Belay Gebremeskel, Chongqing University, China
Rashik Gupta, BML Munjal University, India
Zhongshi He, Chongqing University, China
Priya Jha, VIT University, India
Ponnuru Ramalinga Karteek, BML Munjal University, India
Kaushik Kumar, Birla Institute of Technology Mesra, India
Raghvendra Kumar, Lakshmi Narain College of Technology Jabalpur, India
C. Mahalakshmi, Thiagarajar College of Engineering, India
Amir Manzoor, Bahria University, Pakistan
Vinod Kumar Mishra, Bipin Tripathi Kumaon Institute of Technology, India
Priyanka Pandey, Lakshmi Narain College of Technology Jabalpur, India
Prasant Kumar Pattnaik, KIIT University, India
S. Rajaram, Thiagarajar College of Engineering, India
Vadlamani Ravi, University of Hyderabad, India
Supriyo Roy, Birla Institute of Technology Mesra, India
Hanna Sawicka, Poznan University of Technology, Poland
S. Selvakumar, G K M College of Engineering and Technology, India
Nita H. Shah, Gujarat University, India
Pourya Shamsolmoali, CMCC, Italy
Arunesh Sharan, AS Consulting, India
G. Sreedhar, Rastriya Sanskrit Vidhya Pheet University, India
Timmaraju Srimanyu, University of Hyderabad, India
R. Suganya, Thiagarajar College of Engineering, India
K. Suneetha, Jawaharlal Nehru Technological University, India
Himanshu Tiruwa, Bipin Tripathi Kumaon Institute of Technology, India
Khadija Ali Vakeel, Indian Institute of Management Indore, India
Malathi Velu, VIT University, India
Masoumeh Zareapoor, Shanghai Jiao Tong University, China
List of Contributors
Abdullah, A. Sheik / Thiagarajar College of Engineering, India 1, 34, 162
Abirami, A. M. / Thiagarajar College of Engineering, India 1, 162
Alam, M. Afshar / Jamia Hamdard University, India 62
Arasi, Tamizh / VIT University, India 259
Askarunisa, A. / KLN College of Information Technology, India 162
Balusamy, Balamurugan / VIT University, India 259
Biswas, Animesh / University of Kalyani, India 353
Chari, A. Anandaraja / Rayalaseema University, India 298
Das, T. K. / VIT University, India 142
Dasgupta, Hirak / Symbiosis Institute of Management Studies, India 15
De, Arnab Kumar / Government College of Engineering and Textile Technology, India 353
Dubey, Sanjiva Shankar / BIMTECH Greater Noida, India 209
Gangadharan, G. R. / Institute for Development and Research in Banking Technology, India 379
Gebremeskel, Gebeyehu Belay / Chongqing University, China 90
Gupta, Rashik / BML Munjal University, India 192
He, Zhongshi / Chongqing University, China 90
Jha, Priya / VIT University, India 259
Kumar, Kaushik / Birla Institute of Technology, India 284
Kumar, Raghvendra / LNCT College, India 52
Mahalakshmi, C. / Thiagarajar College of Engineering, India 162
Manzoor, Amir / Bahria University, Pakistan 128, 225
Mishra, Vinod Kumar / Bipin Tripathi Kumaon Institute of Technology, India 175
Pandey, Priyanka / LNCT College, India 52
Pattnaik, Prasant Kumar / KIIT University, India 52
Ponnuru, Karteek Ramalinga / BML Munjal University, India 192
Rajaram, S. / Thiagarajar College of Engineering, India 34
Ravi, Vadlamani / Institute for Development and Research in Banking Technology, India 379
Roy, Supriyo / Birla Institute of Technology, India 284
Sawicka, Hanna / Poznan University of Technology, Poland 315
Selvakumar, S. / G K M College of Engineering and Technology, India 1, 34, 162
Shah, Nita H. / Gujarat University, India 341
Shamsolmoali, Pourya / CMCC, Italy 62
Sharan, Arunesh / AS Consulting, India 209
Sreedhar, G. / Rashtriya Sanskrit Vidyapeetha (Deemed University), India 298
Suganya, R. / Thiagarajar College of Engineering, India 34
Suneetha, Keerthi / SVEC, India 240
Timmaraju, Srimanyu / Institute for Development and Research in Banking Technology, India 379
Tiruwa, Himanshu / Bipin Tripathi Kumaon Institute of Technology, India 175
Trivedi, Shrawan Kumar / BML Munjal University, India 192
Vakeel, Khadija Ali / Indian Institute of Management Indore, India 250
Velu, Malathi / VIT University, India 259
Yi, Chai / Chongqing University, China 90
Zareapoor, Masoumeh / Shanghai Jiao Tong University, China 62
Table of Contents
Preface xxii
Acknowledgment xxvi
Section 1
Business Intelligence With Data Mining: Process and Applications
Chapter 1
An Introduction to Data Analytics: Its Types and Its Applications 1
A Sheik Abdullah, Thiagarajar College of Engineering, India
S Selvakumar, G K M College of Engineering and Technology, India
A M Abirami, Thiagarajar College of Engineering, India
Chapter 2
Data Mining and Statistics: Tools for Decision Making in the Age of Big Data 15
Hirak Dasgupta, Symbiosis Institute of Management Studies, India
Chapter 3
Data Classification: Its Techniques and Big Data 34
A Sheik Abdullah, Thiagarajar College of Engineering, India
R Suganya, Thiagarajar College of Engineering, India
S Selvakumar, G K M College of Engineering and Technology, India
S Rajaram, Thiagarajar College of Engineering, India
Chapter 4
Secure Data Analysis in Clusters (Iris Database) 52
Raghvendra Kumar, LNCT College, India
Prasant Kumar Pattnaik, KIIT University, India
Priyanka Pandey, LNCT College, India
Chapter 5
Data Mining for Secure Online Payment Transaction 62
Masoumeh Zareapoor, Shanghai Jiao Tong University, China
Pourya Shamsolmoali, CMCC, Italy
M Afshar Alam, Jamia Hamdard University, India
Chapter 6
The Integral of Spatial Data Mining in the Era of Big Data: Algorithms and Applications 90
Gebeyehu Belay Gebremeskel, Chongqing University, China
Chai Yi, Chongqing University, China
Zhongshi He, Chongqing University, China
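Section 2
Social Media Analytics With Sentiment Analysis: Business Applications and Methods
Chapter 7
Social Media as Mirror of Society 128
Amir Manzoor, Bahria University, Pakistan
Chapter 8
Business Intelligence through Opinion Mining 142
T K Das, VIT University, India
Chapter 9
Sentiment Analysis 162
A M Abirami, Thiagarajar College of Engineering, India
A Sheik Abdullah, Thiagarajar College of Engineering, India
A Askarunisa, KLN College of Information Technology, India
S Selvakumar, G K M College of Engineering and Technology, India
C Mahalakshmi, Thiagarajar College of Engineering, India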
Chapter 10
Aspect-Based Sentiment Analysis of Online Product Reviews 175
Vinod Kumar Mishra, Bipin Tripathi Kumaon Institute of Technology, India
Himanshu Tiruwa, Bipin Tripathi Kumaon Institute of Technology, India
Chapter 11
Sentiment Analysis with Social Media Analytics, Methods, Process, and Applications 192
Karteek Ramalinga Ponnuru, BML Munjal University, India
Rashik Gupta, BML Munjal University, India
Shrawan Kumar Trivedi, BML Munjal University, India
Chapter 12
Organizational Issue for BI Success: Critical Success Factors for BI Implementations within the Enterprise 209
Sanjiva Shankar Dubey, BIMTECH Greater Noida, India
Arunesh Sharan, AS Consulting, India
Chapter 13
Ethics of Social Media Research 225
Amir Manzoor, Bahria University, Pakistan
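Section 3
Big Data Analytics: Its Methods and Applications
Chapter 14
Big Data Analytics in Health Care 240
Keerthi Suneetha, SVEC, India
Chapter 15
Mining Big Data for Marketing Intelligence 250
Khadija Ali Vakeel, Indian Institute of Management Indore, India
Chapter 16
Predictive Analysis for Digital Marketing Using Big Data: Big Data for Predictive Analysis 259
Balamurugan Balusamy, VIT University, India
Priya Jha, VIT University, India
Tamizh Arasi, VIT University, India
Malathi Velu, VIT University, India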
Chapter 17
Strategic Best-in-Class Performance for Voice to Customer: Is Big Data in Logistics a Perfect Match? 284
Supriyo Roy, Birla Institute of Technology, India
Kaushik Kumar, Birla Institute of Technology, India
Section 4
Advanced Data Analytics: Decision Models and Business Applications
Chapter 18
First Look on Web Mining Techniques to Improve Business Intelligence of E-Commerce Applications 298
G Sreedhar, Rashtriya Sanskrit Vidyapeetha (Deemed University), India
A Anandaraja Chari, Rayalaseema University, India
Chapter 19
Artificial Intelligence in Stochastic Multiple Criteria Decision Making 315
Hanna Sawicka, Poznan University of Technology, Poland
Chapter 21
On Development of a Fuzzy Stochastic Programming Model with Its Application to Business Management 353
Animesh Biswas, University of Kalyani, India
Arnab Kumar De, Government College of Engineering and Textile Technology, India
Detailed Table of Contents
Preface xxii
Acknowledgment xxvi
Section 1
Business Intelligence With Data Mining: Process and Applications
Chapter 1
An Introduction to Data Analytics: Its Types and Its Applications 1
A Sheik Abdullah, Thiagarajar College of Engineering, India
S Selvakumar, G K M College of Engineering and Technology, India
A M Abirami, Thiagarajar College of Engineering, India
Data analytics mainly deals with the science of examining and investigating raw data to derive useful patterns and inferences. Data analytics has been deployed in many industries to make decisions at the proper levels. It focuses upon the assumption and evaluation of the method with the intention of deriving a conclusion at various levels. Various types of data analytical techniques such as predictive analytics, prescriptive analytics, descriptive analytics, text analytics, and social media analytics are used by industrial organizations, educational institutions and government associations. This chapter mainly focuses on illustrating contextual examples of the various types of analytical techniques and their applications.
Chapter 2
Data Mining and Statistics: Tools for Decision Making in the Age of Big Data 15
Hirak Dasgupta, Symbiosis Institute of Management Studies, India
In the age of information, the world abounds with data. In order to obtain an intelligent appreciation of current developments, we need to absorb and interpret substantial amounts of data. The amount of data collected has grown at a phenomenal rate over the past few years. The computer age has given us both the power to rapidly process, summarize and analyse data and the encouragement to produce and store more data. The aim of data mining is to make sense of large amounts of mostly unsupervised data, in some domain. Data Mining is used to discover the patterns and relationships in data, with an emphasis on large observational databases. This chapter aims to compare the approaches and conclude that Statisticians and Data miners can profit by studying each other’s methods by using the combination of methods judiciously. The chapter also attempts to discuss data cleaning techniques involved in data mining.
Chapter 3
Data Classification: Its Techniques and Big Data 34
A Sheik Abdullah, Thiagarajar College of Engineering, India
R Suganya, Thiagarajar College of Engineering, India
S Selvakumar, G K M College of Engineering and Technology, India
S Rajaram, Thiagarajar College of Engineering, India
Classification is considered to be one of the data analysis techniques that can be used across many applications. A classification model predicts categorical class labels. Clustering mainly deals with grouping of variables based upon similar characteristics. Classification models are tested by comparing the predicted values with the known target values in a set of test data. Data classification has many applications in business modeling, marketing analysis, credit risk analysis, biomedical engineering and drug response modeling. The extension of data analysis and classification gives insight into big data, with an exploration of processing and managing large data sets. This chapter deals with various techniques and methodologies that correspond to the classification problem in the data analysis process and its methodological impact on big data.
Chapter 4
Secure Data Analysis in Clusters (Iris Database) 52
Raghvendra Kumar, LNCT College, India
Prasant Kumar Pattnaik, KIIT University, India
Priyanka Pandey, LNCT College, India
This chapter uses privacy preservation techniques (data modification) to ensure privacy. Privacy preservation is an important issue. Consider a scenario in which a number of clients, each owning a clustered database (Iris Database), wish to run a data mining algorithm on the union of their databases without revealing any unnecessary information, while keeping privileged information private. A number of efficient protocols are required for privacy preservation in data mining. This chapter presents various privacy preserving protocols that are used for security in clustered databases. The Xln(X) protocol and the secure sum protocol are used in mutual computing and can defend privacy efficiently. The chapter focuses on data modification techniques: the distributed database is modified, and the modified data set is then sent to the client admin for secure data communication with zero percentage of data leakage and with reduced communication and computation complexity.
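As a rough illustration of the secure sum idea mentioned in this abstract, the sketch below simulates the textbook ring-based variant in a single process. It is an assumption that this resembles the protocol used in the chapter; the function name `secure_sum` and the default modulus are hypothetical, and the scheme only hides individual values from semi-honest, non-colluding parties when the true total is smaller than the modulus.

```python
import secrets


def secure_sum(private_values, modulus=2**32):
    """Simulate a ring-based secure sum over the parties' private values.

    The initiating party hides its value behind a random mask, every other
    party adds its own value to the running total it receives, and the
    initiator removes the mask at the end. No party sees another party's
    raw value, only a masked running total.
    """
    if not private_values:
        raise ValueError("need at least one party")

    # Initiator: pick a random mask known only to itself (illustrative choice).
    mask = secrets.randbelow(modulus)
    running = (mask + private_values[0]) % modulus

    # Each remaining party adds its private value and passes the total on.
    for value in private_values[1:]:
        running = (running + value) % modulus

    # Back at the initiator: subtract the mask to recover the true sum.
    return (running - mask) % modulus


if __name__ == "__main__":
    print(secure_sum([50, 30, 20]))
```

In a real deployment each step of the loop would run on a different machine and only the masked running total would travel over the network.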
Chapter 5
Data Mining for Secure Online Payment Transaction 62
Masoumeh Zareapoor, Shanghai Jiao Tong University, China
Pourya Shamsolmoali, CMCC, Italy
M Afshar Alam, Jamia Hamdard University, India
The fraud detection method requires a holistic approach where the objective is to correctly classify the transactions as legitimate or fraudulent. The existing methods give importance to detecting all fraudulent transactions, since fraud results in monetary loss. To do so, most of the time they have to compromise on some genuine transactions. Thus, the major issue that credit card fraud detection systems face today is that a significant percentage of transactions labelled as fraudulent are in fact legitimate. These “false alarms” delay the transactions and create inconvenience and dissatisfaction for the customer. Thus, the objective of this research is to develop an intelligent data mining based fraud detection system for a secure online payment transaction system. The performance of the proposed model is evaluated on a real credit card data set, and it is found that the proposed model has a higher fraud detection rate and a lower false alarm rate than other state-of-the-art classifiers.
Chapter 6
The Integral of Spatial Data Mining in the Era of Big Data: Algorithms and Applications 90
Gebeyehu Belay Gebremeskel, Chongqing University, China
Chai Yi, Chongqing University, China
Zhongshi He, Chongqing University, China
Data Mining (DM) is a rapidly expanding field in many disciplines, and it is greatly inspiring for the analysis of massive data types, which include geospatial, image and other forms of data sets. Such fast-growing data, characterized by high volume, velocity, variety, variability, value and other properties, are collected and generated from various sources and are too complex and too big to capture, store and analyze with traditional tools. Spatial data mining (SDM) is, therefore, the process of searching for and discovering valuable information and knowledge in large volumes of spatial data, which draws basic principles from concepts in databases, machine learning, statistics, pattern recognition and ‘soft’ computing. Using DM techniques enables a more efficient use of the data warehouse. It is thus becoming an emerging research field in Geosciences because of the increasing amount of data, which leads to promising new applications. The integral SDM on which we focus in this chapter is inference on geospatial and GIS data.
Section 2
Social Media Analytics With Sentiment Analysis: Business Applications and Methods
Chapter 7
Social Media as Mirror of Society 128
Amir Manzoor, Bahria University, Pakistan
Over the last decade, social media use has gained much attention from scholarly researchers. One specific reason for this interest is the use of social media for communication, a trend that is gaining tremendous popularity. Every social media platform has developed its own set of application programming interfaces (APIs). Through these APIs, the data available on a particular social media platform can be accessed. However, the data available is limited and it is difficult to ascertain the possible conclusions that can be drawn about society on the basis of this data. This chapter explores the ways social researchers and scientists can use social media data to support their research and analysis.
Chapter 8
Business Intelligence through Opinion Mining 142
T K Das, VIT University, India
Business organizations have been adopting different strategies to impress their customers and attract them towards their products and services. On the other hand, the opinions of customers gathered through customer feedback have been a great source of information for companies to evolve business intelligence and rightly place their products and services to meet ever-changing customer requirements. In this work, we present a new approach to integrate customers’ opinions into the traditional data warehouse model. We have taken Twitter as the data source for this experiment. First, we have built a system which can be used for opinion analysis on a product or a service. The second process is to model the opinion table so obtained as a dimensional table and to integrate it with a central data warehouse schema so that reports can be generated on demand. Furthermore, we have shown how business intelligence can be elicited from online product reviews by using a computational intelligence technique such as rough set based data analysis.
Chapter 9
Sentiment Analysis 162
A M Abirami, Thiagarajar College of Engineering, India
A Sheik Abdullah, Thiagarajar College of Engineering, India
A Askarunisa, KLN College of Information Technology, India
S Selvakumar, G K M College of Engineering and Technology, India
C Mahalakshmi, Thiagarajar College of Engineering, India
Processing the billions of daily social conversations across millions of sources requires sophisticated streaming big data processing. Information must be extracted from these data sets, and contextual semantic sentiment modeling is required to capture the intelligence within the complexity of online social discussions. Sentiment analysis is one of the techniques used to capture intelligence from social networks based on user generated content. Research on sentiment classification is steadily growing. Aspect extraction is the core task involved in aspect based sentiment analysis. The proposed model uses the Latent Semantic Analysis technique for aspect extraction and evaluates the senti-scores of the various products under study.
Chapter 10
Aspect-Based Sentiment Analysis of Online Product Reviews 175
Vinod Kumar Mishra, Bipin Tripathi Kumaon Institute of Technology, India
Himanshu Tiruwa, Bipin Tripathi Kumaon Institute of Technology, India
Sentiment analysis is a part of computational linguistics concerned with extracting sentiment and emotion from text. It is also considered a task of natural language processing and data mining. Sentiment analysis mainly concentrates on identifying whether a given text is subjective or objective and, if it is subjective, whether it is negative, positive or neutral. This chapter provides an overview of aspect based sentiment analysis together with current and future research trends in the area. It also provides an aspect based sentiment analysis of online customer reviews of the Nokia 6600. To perform aspect based classification we use a lexical approach on the Eclipse platform, which classifies each review as positive, negative or neutral on the basis of the features of the product. SentiWordNet is used as a lexical resource to calculate the overall sentiment score of each sentence, a POS tagger is used for part of speech tagging, a frequency based method is used for extraction of the aspects/features, and negation handling is used to improve the accuracy of the system.
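The lexicon-plus-negation step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: `TOY_LEXICON` is a hypothetical hand-made word list standing in for SentiWordNet scores, the negation window of two tokens is an arbitrary choice, and POS tagging and aspect extraction are left out entirely.

```python
# Hypothetical polarity scores; a real system would look these up in SentiWordNet.
TOY_LEXICON = {"good": 0.6, "great": 0.8, "poor": -0.6, "bad": -0.7}
NEGATORS = {"not", "no", "never"}


def sentence_score(sentence, window=2):
    """Sum word polarities, flipping the sign of words that follow a negator."""
    tokens = sentence.lower().replace(".", " ").replace(",", " ").split()
    score = 0.0
    negate_left = 0  # how many upcoming tokens are still inside a negation scope
    for tok in tokens:
        if tok in NEGATORS:
            negate_left = window
            continue
        polarity = TOY_LEXICON.get(tok, 0.0)
        if polarity and negate_left > 0:
            polarity = -polarity  # simple negation handling
        score += polarity
        negate_left = max(0, negate_left - 1)
    return score


def classify(sentence):
    """Map the sentence score to a positive / negative / neutral label."""
    score = sentence_score(sentence)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["The battery is good.", "The camera is not good.", "It has a screen."]:
        print(text, "->", classify(text))
```

An aspect based variant would compute such a score per extracted product feature rather than per sentence.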
Chapter 11
Sentiment Analysis with Social Media Analytics, Methods, Process, and Applications 192
Karteek Ramalinga Ponnuru, BML Munjal University, India
Rashik Gupta, BML Munjal University, India
Shrawan Kumar Trivedi, BML Munjal University, India
Firms are turning their eyes towards social media analytics to find out what people are really saying about their firm or their product. With the huge amount of buzz being created online about anything and everything, social media has become ‘the’ platform of the day for understanding what the public as a whole is saying about a particular product, and the process of converting all this talk into valuable information is called Sentiment Analysis. Sentiment Analysis is a process of identifying and categorizing a piece of text as positive or negative so as to understand the sentiment of the users. This chapter takes the reader from basic sentiment techniques such as building word clouds, commonality clouds, dendrograms and comparison clouds to advanced algorithms such as K Nearest Neighbour, Naïve Bayes and Support Vector Machine.
Chapter 12
Organizational Issue for BI Success: Critical Success Factors for BI Implementations within the Enterprise 209
Sanjiva Shankar Dubey, BIMTECH Greater Noida, India
Arunesh Sharan, AS Consulting, India
This chapter will focus on the transformative effect Business Intelligence (BI) brings to an organization’s decision making, enhancing its performance, reducing overall cost of operations and improving its competitive posture. This chapter will enunciate the key principles and practices to bridge the gap between organization requirements and the capabilities of any BI tool(s) by proposing a framework of organizational factors such as users’ roles, their analytical needs, access preferences and technical/analytical literacy. An evaluation methodology to select the best BI tools, properly aligned to the organization’s infrastructure, will also be discussed. Softer issues and organizational change for successful implementation of BI will be further explained.
Chapter 13
Ethics of Social Media Research 225
Amir Manzoor, Bahria University, Pakistan
Over the last decade, social media platforms have become a very popular channel of communication. This popularity has sparked an increasing interest among researchers in investigating social media communication. Many studies have collected publicly available social media communication data to unearth significant patterns. However, one significant concern raised over such practice is the privacy of the individual’s social media communication data. As such, it is important that specific ethical guidelines are in place for future research on social media sites. This chapter explores various ethical issues related to research on social networking sites. The chapter also provides a set of ethical guidelines that future research on social media sites can use to address various ethical issues.
Section 3
Big Data Analytics: Its Methods and Applications
Chapter 14
Big Data Analytics in Health Care 240
Keerthi Suneetha, SVEC, India
With the arrival of technology and the rising amount of data (Big Data), there is a need to implement effective analytical techniques (Big Data Analytics) in the health sector, providing stakeholders with new insights that have the potential to advance personalized care, improve patient outcomes and avoid unnecessary costs. This chapter covers how to evaluate this big volume of data for unknown and useful facts, associations, patterns and trends, which can give rise to new ways of handling diseases and provide high quality healthcare at lower cost to all citizens. The chapter gives a broad introduction to Big Data Analytics (BDA) in the health domain, the processing steps of BDA, and the challenges and future scope of research in healthcare.
Chapter 15
Mining Big Data for Marketing Intelligence 250
Khadija Ali Vakeel, Indian Institute of Management Indore, India
This chapter elaborates on mining techniques useful in big data analysis. Specifically, it elaborates on how to use association rule mining, self organizing maps, word clouds, sentiment extraction, network analysis, classification, and clustering for marketing intelligence. These techniques are applied to decisions related to market segmentation, targeting and positioning, trend analysis, sales, stock markets and word of mouth. The chapter is divided into two sections. The first covers data collection and cleaning, where we elaborate on how Twitter data can be extracted and mined for marketing decision making. The second discusses various techniques that can be used in big data analysis for mining content and interaction networks.
Chapter 16
Predictive Analysis for Digital Marketing Using Big Data: Big Data for Predictive Analysis 259
Balamurugan Balusamy, VIT University, India
Priya Jha, VIT University, India
Tamizh Arasi, VIT University, India
Malathi Velu, VIT University, India
Big data analytics has in recent years produced lightning fast applications that deal with predictive analysis of huge volumes of data in the domains of finance, health, weather, travel, marketing and more. Business analysts make their decisions using statistical analysis of the available data pulled in from social media, user surveys, blogs and internet resources. Customer sentiment has to be taken into account when designing, launching and pricing a product to be introduced into the market, and the emotions of consumers change and are influenced by several tangible and intangible factors. The possibility of using big data analytics to present data in a quickly viewable format, giving different perspectives of the same data, is appreciated in the fields of finance and health, where decision support systems are possible in all aspects of their working. Cognitive computing and artificial intelligence are enabling big data analytical algorithms to think more on their own, leading to big data agents with their own functionalities.
Chapter 17
Strategic Best-in-Class Performance for Voice to Customer: Is Big Data in Logistics a Perfect Match? 284
Supriyo Roy, Birla Institute of Technology, India
Kaushik Kumar, Birla Institute of Technology, India
For any forward-looking perspective, organizational information, which is typically historical, incomplete and most of the time inaccurate, needs to be enriched with external information. However, traditional systems and approaches are slow and inflexible, and cannot handle the new volume and complexity of information. Big data, an evolving term, basically refers to a voluminous amount of structured, semi-structured or unstructured information in the form of data with the potential to be mined for ‘best in class information’. Primarily, big data can be characterized by the 3 V’s: volume, variety and velocity. Recent hype around big data concepts predicts that it will help companies to improve operations and make faster and more intelligent decisions. Considering the complexities in the realm of supply chains, this study attempts to highlight the problems of storing data in any business, especially in the Indian scenario, where the logistics arena is most unstructured and complicated. The conclusions may be significant to any strategic decision maker or manager working with distribution and logistics.
Section 4
Advanced Data Analytics: Decision Models and Business Applications
Chapter 18
First Look on Web Mining Techniques to Improve Business Intelligence of E-Commerce Applications 298
G Sreedhar, Rashtriya Sanskrit Vidyapeetha (Deemed University), India
A Anandaraja Chari, Rayalaseema University, India
Web Data Mining is the application of data mining techniques to extract useful knowledge from web data such as web content, hyperlinks between documents and web usage logs. There is also a strong need for techniques that support business decisions in e-commerce. Web Data Mining can be broadly divided into three categories: Web content mining, Web structure mining and Web usage mining. Web content data is the content made available to users to satisfy their information needs. Web structure data represents the linkage and relationships between web pages. Web usage data involves log data collected by web servers and application servers, which is the main source of data. The growth of the WWW and related technologies has made business functions faster and easier to execute. As a large number of transactions are performed through e-commerce sites and a huge amount of data is stored, valuable knowledge can be obtained by applying Web Mining techniques.
Chapter 19
Artificial Intelligence in Stochastic Multiple Criteria Decision Making
Hanna Sawicka, Poznan University of Technology, Poland
This chapter presents the concept of a stochastic multiple criteria decision making (MCDM) method to solve complex ranking decision problems. This approach is composed of three main areas of research, i.e. classical MCDM, probability theory and classification method. The most important steps of the idea are characterized and specific features of the applied methods are briefly presented. The application of Electre III combined with probability theory, and Promethee II combined with a Bayes classifier, are described in detail. Two case studies of stochastic multiple criteria decision making are presented. The first one shows the distribution system of electrotechnical products, composed of 24 distribution centers (DC), while the core business of the second one is the production and warehousing of pharmaceutical products. Based on the application of the presented stochastic MCDM method, different ways of improving these complex systems are proposed and the final, i.e. the best, paths of changes are recommended.
Chapter 21
On Development of a Fuzzy Stochastic Programming Model with Its Application to Business Management
Animesh Biswas, University of Kalyani, India
Arnab Kumar De, Government College of Engineering and Textile Technology, India
This chapter expresses the efficiency of fuzzy goal programming for multiobjective aggregate production planning in a fuzzy stochastic environment. The parameters of the objectives are taken as normally distributed fuzzy random variables and the chance constraints involve joint Cauchy distributed fuzzy random variables. In the model formulation process, the fuzzy chance constrained programming model is converted into its equivalent fuzzy programming form using probabilistic technique, α-cut of fuzzy numbers and taking expectation of parameters of the objectives. A defuzzification technique of fuzzy numbers is used to find a multiobjective linear programming model. The membership function of each objective is constructed depending on their optimal values. Afterwards, a weighted fuzzy goal programming model is developed to achieve the highest degree of each of the membership goals to the extent possible by minimizing group regrets in a multiobjective decision making context. To explore the potentiality of the proposed approach, production planning of a health drinks manufacturing company has been considered.
Compilation of References
About the Contributors
Index
Preface
The complete work of this book is divided into four sections. The first section, titled “Business Intelligence with Data Mining: Process and Applications”, includes all the chapters related to business analytics with data mining and its applications. The second section, titled “Social Media Analytics with Sentiment Analysis: Business Applications and Methods”, contains all the chapters related to social media analytics techniques and their applications to business intelligence. The third section, titled “Big Data Analytics: Its Methods and Applications”, covers all the chapters related to big data processes and their applications. The last section, titled “Advanced Data Analytics: Decision Models and Business Applications”, includes the chapters related to advanced decision models for business analytics. A brief description of each section follows.
The first section of this book is “Business Intelligence with Data Mining: Process and Applications”, where the chapters related to data mining methods and their applications are discussed. The first chapter of this section, authored by A. Sheik Abdullah, S. Selvakumar, and A. M. Abirami, explains that data analytics mainly deals with the science of examining and investigating raw data to derive useful patterns and inference. Data analytics has been deployed in many industries to make decisions at proper levels. It focuses upon the assumption and evaluation of the method with the intention of deriving a conclusion at various levels. Various types of data analytical techniques, such as predictive analytics, prescriptive analytics, descriptive analytics, text analytics, and social media analytics, are used by industrial organizations, educational institutions and government associations. The chapter mainly focuses on illustrating contextual examples for various types of analytical techniques and their applications. In the second chapter, Hirak Dasgupta aims to compare the approaches and concludes that statisticians and data miners can profit by studying each other’s methods and by using the combination of methods judiciously. The chapter also attempts to discuss data cleaning techniques involved in data mining. The third chapter of this section, authored by A. Sheik Abdullah, R. Suganya, S. Selvakumar, and S. Rajaram, deals with various techniques and methodologies that correspond to the classification problem in the data analysis process and their methodological impact on big data. The fourth chapter, written by Raghvendra Kumar, Prasant Kumar Pattnaik and Priyanka Pandey, presents various privacy preserving protocols that are used for security in clustered databases. The Xln(X) protocol and the secure sum protocol are used in mutual computing, which can defend privacy efficiently. It focuses on data modification techniques, in which the distributed database is modified and the modified data set is then sent to the client admin for secure data communication with zero percentage of data leakage, while also reducing the communication and computation complexity. The fifth chapter of this section, authored by Masoumeh Zareapoor, Pourya Shamsolmoali and M. Afshar Alam, shows the performance of a new credit card fraud detection technique which first balances the transaction records and then applies the proposed algorithm to detect the fraudulent transactions. At the end, the authors conduct a series of experiments to evaluate the effectiveness of the proposed techniques.
Chapter six, authored by Belay Gebremeskel, Yi Chai, and Zhongshi He, incorporates novel ideas and methodologies as the integral of spatial data mining (SDM), which is highly pertinent and serves as a single reference material for researchers, experts, and other users.
The second section of this book is “Social Media Analytics with Sentiment Analysis: Business Applications and Methods”, where the chapters related to social media analytics methods and related applications are discussed. Chapter 7, authored by Amir Manzoor, explores the ways social researchers and scientists can use social media data to support their research and analysis. Chapter 8, written by T. K. Das, presents a new approach to integrate customers’ opinions into the traditional data warehouse model. He has taken Twitter as the data source for this experiment, in which, first, a system that can be used for opinion analysis on a product or a service is built. The second step is to model the opinion table so obtained as a dimensional table and to integrate it with a central data warehouse schema so that reports can be generated on demand. Furthermore, he shows how business intelligence can be elicited from online product reviews by using computational intelligence techniques such as rough set based data analysis. Chapter 9, authored by A. M. Abirami, A. Sheik Abdullah, A. Askarunisa, S. Selvakumar, and C. Mahalakshmi, proposes a modeling technique that uses latent semantic analysis (LSA) for aspect extraction and evaluates senti-scores of various products under study. In Chapter 10, Vinod Kumar Mishra and Himanshu Tiruwa provide an overview of aspect based sentiment analysis with current and future trends of research on aspect based sentiment analysis. This chapter also provides an aspect based sentiment analysis of online customer reviews of the Nokia 6600. To perform aspect based classification, they use a lexical approach on the Eclipse platform which classifies a review as positive, negative or neutral on the basis of features of the product. SentiWordNet is used as a lexical resource to calculate the overall sentiment score of each sentence, a PoS tagger is used for part-of-speech tagging, a frequency based method is used for extraction of the aspects/features, and negation handling is used for improving the accuracy of the system. Chapter 11, written by Ponnuru Ramalinga Karteek, Rashik Gupta, and Shrawan Kumar Trivedi, takes the reader from basic sentiment classifiers like building word clouds, commonality clouds, dendrograms and comparison clouds to advanced algorithms like K Nearest Neighbour, Naïve Bayes and Support Vector Machine. In Chapter 12, Sanjiva Shankar Dubey and Arunesh Sharan enunciate the key principles and practices to bridge the gap between organization requirements and the capabilities of any BI tool(s) by proposing a framework of organizational factors such as users’ roles, their analytical needs, access preferences and technical/analytical literacy. Chapter 13, authored by Amir Manzoor, explores various ethical issues related to research on social networking sites. This chapter also provides a set of ethical guidelines that future research on social media sites can use to address various ethical issues.
The third section of this book is “Big Data Analytics: Its Methods and Applications”, where the chapters related to big data analytics methods and their applications are discussed. In this section, Chapter 14, written by K. Suneetha, covers how to evaluate this big volume of data for unknown and useful facts, associations, patterns and trends, which can give birth to new lines of handling of diseases and provide high quality healthcare at lower cost to all citizens. This chapter gives a wide insight into Big Data Analytics (BDA) in the health domain, the processing steps of BDA, and the challenges and future scope of research in healthcare. Chapter 15, authored by Khadija Ali Vakeel, elaborates on mining techniques useful in big data analysis. Specifically, it elaborates on how to use association rule mining, self-organizing maps, word clouds, sentiment extraction, network analysis, classification, and clustering for marketing intelligence. These are applied to decisions related to market segmentation, targeting and positioning, trend analysis, sales, stock markets and word of mouth. The chapter is divided into two parts. The first, on data collection and cleaning, elaborates on how Twitter data can be extracted and mined for marketing decision making. The second part discusses various techniques that can be used in big data analysis for content and interaction networks. In Chapter 16, Balamurugan Balusamy, Priya Jha, Tamizh Arasi, and Malathi Velu discuss how big data analytics has in recent years developed lightning fast applications that deal with predictive analysis of huge volumes of data in domains of finance, health, weather, travel, marketing and more. Business analysts take their decisions using the statistical analysis of the available data pulled in from social media, user surveys, blogs and internet resources. Customer sentiment has to be taken into account for designing, launching and pricing a product to be inducted into the market, and the emotions of consumers change and are influenced by several tangible and intangible factors. The possibility of using big data analytics to present data in a quickly viewable format, giving different perspectives of the same data, is appreciated in the fields of finance and health, where the advent of decision support systems is possible in all aspects of their working. Cognitive computing and artificial intelligence are making big data analytical algorithms think more on their own, leading to big data agents with their own functionalities. In Chapter 17, Supriyo Roy and Kaushik Kumar explore the usefulness of applying big data concepts in the emerging areas of logistics along different dimensions. The conclusions of this chapter may be significant to any strategic decision maker/manager working in the specific field of distribution and logistics.
The fourth section of this book is “Advanced Data Analytics: Decision Models and Business Applications”, where the chapters related to advanced data analytics techniques and their applications are discussed. Chapter 18, written by G. Sreedhar and A. A. Chari, considers the important element of page load time of a website for assessing the performance of some well-known online business websites through statistical tools. This research work also considers the optimum design aspect of business websites leading to improvement and betterment of the online business process. Chapter 19, written by Hanna Sawicka, presents the concept of a stochastic multiple criteria decision making (MCDM) method to solve complex ranking decision problems. This approach is composed of three main areas of research, i.e. classical MCDM, probability theory and classification method. The most important steps of the idea are characterized and specific features of the applied methods are briefly presented. The application of Electre III combined with probability theory, and Promethee II combined with a Bayes classifier, are described in detail. Two case studies of stochastic multiple criteria decision making are presented. The first one shows the distribution system of electrotechnical products, composed of 24 distribution centers (DC), while the core business of the second one is the production and warehousing of pharmaceutical products. Based on the application of the presented stochastic MCDM method, different ways of improving these complex systems are proposed and the final, i.e. the best, paths of changes are recommended. In Chapter 20, Nita H. Shah analyzes a supply chain comprised of two front-runner retailers and one supplier. The retailers offer customers delay in payments to settle the accounts against the purchases, which is received by the supplier. The market demand of the retailer depends on time, retail price and a credit period offered to the customers with that of the other retailer. The supplier gives items with the same wholesale price and credit period to the retailers. The joint and independent decisions are analyzed and validated numerically. Chapter 21, written by Animesh Biswas and Arnab Kumar De, expresses the efficiency of the fuzzy goal programming technique for multi-objective aggregate production planning in a fuzzy stochastic environment. The parameters of the objectives are taken as normally distributed fuzzy random variables and the chance constraints involve joint Cauchy distributed fuzzy random variables. In the model formulation process, the fuzzy chance constrained programming model is converted into its equivalent fuzzy programming form using the concepts of probabilistic technique, α-cut of fuzzy numbers and taking expectation of parameters of the objectives. A defuzzification technique of fuzzy numbers is used to find a multi-objective linear programming model. The membership function of each objective is constructed depending on their optimal values. Afterwards, a weighted fuzzy goal programming model is developed to achieve the highest degree of each of the membership goals to the extent possible by minimizing group regrets in a multi-objective decision making context. To explore the potentiality of the proposed approach, production planning of a health drinks manufacturing company has been considered. Chapter 22, written by Timmaraju Srimanyu, Vadlamani Ravi, and G. R. Gangadharan, focuses on recommendation of cloud services by ranking them with the help of opinion mining of users’ reviews and multi-attribute decision making models (TOPSIS and FMADM were applied separately) in tandem on both quantitative and qualitative data. Surprisingly, both TOPSIS and FMADM yielded the same rankings for the cloud services.
Acknowledgment
The editors would like to acknowledge the help of all the people involved in this project and, more specifically, the authors and reviewers that took part in the review process. Without their support, this book would not have become a reality.
We would like to thank each one of the authors for their contributions. The editors wish to acknowledge the valuable contributions of the reviewers regarding the improvement of quality, coherence, and content presentation of chapters. Most of the authors also served as referees; we highly appreciate their double task. We are grateful to all members of the IGI publishing house for their assistance and timely motivation in producing this volume.
We hope the readers will share our excitement with this important scientific contribution to the body of knowledge about various applications of the Handbook of Research on Advanced Data Mining Techniques and Applications for Business Intelligence.
Shrawan Kumar Trivedi
BML Munjal University, India
Shubhamoy Dey
Indian Institute of Management Indore, India
Anil Kumar
BML Munjal University, India
Tapan Kumar Panda
Jindal Global Business School, India
a conclusion at various levels. Various types of data analytical techniques such as predictive analytics, prescriptive analytics, descriptive analytics, text analytics, and social media analytics are used by industrial organizations, educational institutions and by government associations. This chapter mainly focuses on the illustration of contextual examples for various types of analytical techniques and their applications.
INTRODUCTION: DATA ANALYTICS
Data analytics is the science of investigating raw data with the intention of deriving a solution for a specified problem. Nowadays analytics is used by many corporations, industries and institutions for making accurate decisions at various levels. It is the mechanism of drawing conclusions from the analysis of large datasets with the intention of determining hidden patterns and their relationships. Analytics differs from mining in the mechanism of determining new patterns, and in its scope, techniques and purpose.
An Introduction to Data Analytics:
Its Types and Its Applications
ANALYTICS PROCESS MODEL
The mechanism of analytics is used in various ways with machine learning, data science and knowledge discovery. The process model starts with the data source, which is in raw form. The data needed for analysis has to be selected in accordance with the problem to be interpreted. The identified data may contain missing fields and irrelevant data items, which have to be resolved and cleaned. Then the data has to be transformed into the format necessary for evaluation; this can be done with data standardization techniques such as min-max normalization, Z-score normalization and normalization by decimal scaling. As an outcome, the final evaluated pattern provides a visualized representation of the data which can be passed on for evaluation and interpretation. The workflow of the process model is depicted in Figure 1.
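The three standardization techniques named above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed implementation: the function names and the `incomes` values are hypothetical, the Z-score uses the population standard deviation, and the inputs are assumed to be non-constant so that no division by zero occurs.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Centre values on the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer with max(|v|) / 10**j < 1."""
    peak = max(abs(v) for v in values)
    j = 0
    while peak / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


# Hypothetical attribute values, used only for illustration.
incomes = [12000, 35000, 58000, 98000]

scaled_min_max = min_max(incomes)
scaled_z = z_score(incomes)
scaled_decimal = decimal_scaling(incomes)
print(scaled_min_max, scaled_z, scaled_decimal)
```

Min-max normalization preserves the relative spacing of the values inside a fixed range, Z-score normalization is less sensitive to the exact minimum and maximum, and decimal scaling only moves the decimal point.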
ANALYTIC REQUIREMENTS
The analytical model should actually solve the problem for which it is developed. In order to solve the problem, the problem itself should be properly defined. The model must have predictive capabilities in order to determine the patterns and interpretations from the observed data. The model should also be interpretable and justifiable in nature. Even though the model is to be interpretable, it should still meet its statistical performance requirements. The efficiency of collecting, processing and analyzing the data also plays a role in the analytical requirements.
Figure 1 Analytics process model
TYPES OF ANALYTICS
Predictive Analytics
Predictive analytics mainly deals with the mechanism of predicting or observing the target value of a measure. The value of the measure signifies the performance of the analytical model which is being developed. Thereby the nature of the developed model can be ascertained from the measured value. Predictive analytics is often said to be supervised learning because the target variable is known in advance in accordance with the definition of the tuple of record (T Hastie, R Tibshirani, & Friedman, 2001). There are various sorts of algorithms used in predicting the nature of data or a real world problem, such as:
6 Ensemble methods such as boosting and bagging
Let us discuss one of the techniques in predictive analytics, linear regression.
Linear Regression
Simple linear regression involves a response variable and a predictor variable. Straight line linear regression involves a single predictor variable, whereas multiple linear regression involves more than one predictor variable (Jiawei Han, Micheline Kamber & Jian Pei, 2011). The straight line regression is represented in Equation 1 as follows:

$$y = W_0 + W_1 x \qquad (1)$$

The coefficients $W_0$ and $W_1$ are referred to as the weights of the predictive function. Let D be the dataset which contains the values for the predictor and response variables X and Y, which is represented in the form $(x_1, y_1), (x_2, y_2), \ldots, (x_{|D|}, y_{|D|})$. The weights can be estimated by the method of least squares:

$$W_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}$$

$$W_0 = \bar{y} - W_1 \bar{x}$$

where
$\bar{x}$ is the mean value of $x_1, x_2, \ldots, x_{|D|}$
$\bar{y}$ is the mean value of $y_1, y_2, \ldots, y_{|D|}$
From the determined value of $W_1$ the value of $W_0$ can be obtained. Thereby, for any given value of the predictor variable, its response value can be determined.
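As a minimal sketch of straight line regression fitted by ordinary least squares, the weights W1 and W0 can be computed directly from the means of the predictor and the response. The function name `fit_line` and the sample values are hypothetical and chosen so that the data lie exactly on y = 1 + 2x.

```python
def fit_line(xs, ys):
    """Return (w0, w1) of the least-squares line y = w0 + w1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    w1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    # Intercept follows from the slope and the two means.
    w0 = y_bar - w1 * x_bar
    return w0, w1


# Hypothetical predictor and response values lying on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

w0, w1 = fit_line(xs, ys)
predicted = w0 + w1 * 6  # response estimated for an unseen predictor value
print(w0, w1, predicted)
```

Once the two weights are known, predicting a response for a new predictor value is a single multiplication and addition.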
Descriptive Analytics
Descriptive analytics mainly deals with describing patterns, for example patterns of customer behavior. In predictive analytics the label (target) is known in advance, but in descriptive analytics there is no such target measure or target variable (Srikant R & Agarwal R, 1995). This technique is also referred to as unsupervised learning because no target variable is available to guide the learning process (Bart Baesens, 2014). Various techniques deal with descriptive analytics, such as:
1 Association rule mining
2 Sequence rule mining
3 Data clustering
Let us discuss one of the techniques in descriptive analytics, sequence rule mining.
Sequence Rule Mining
The mechanism of sequence rule mining is to determine the maximal sequences among the set of all sequences that have been determined from the given transactional data. Each sequence must possess a certain degree of support and confidence. Considering market basket analysis of transactional data, the number of maximal sequences determined for an item set signifies the frequency of that sequence among all the items. Consider the following example of transactional data which contains the sequence of the items purchased, the session time of purchase, and the items, as depicted in Table 1.
Table 1 Sequence rule for a transactional dataset
Table 1 can be represented in the form of sequence rule as follows:
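As a minimal sketch with hypothetical sessions (not the data of Table 1), the support and confidence of a sequence rule could be computed as follows. The session contents and the helper names `contains`, `support` and `confidence` are assumptions made for illustration; a pattern counts as present when its items occur in the session in the same order, not necessarily next to each other.

```python
def contains(sequence, pattern):
    """True if the items of pattern occur in sequence in the same order."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(sessions, pattern):
    """Fraction of sessions that contain the ordered pattern."""
    return sum(contains(s, pattern) for s in sessions) / len(sessions)


def confidence(sessions, antecedent, consequent):
    """Support of antecedent followed by consequent, relative to the antecedent."""
    return support(sessions, antecedent + consequent) / support(sessions, antecedent)


# Hypothetical purchase sequences, one list per session.
sessions = [
    ["A", "B", "C"],
    ["B", "C"],
    ["A", "C", "D"],
    ["A", "B", "D"],
]

support_ab = support(sessions, ["A", "B"])
confidence_ab = confidence(sessions, ["A"], ["B"])
print(support_ab, confidence_ab)
```

A rule such as A → B would be retained only if both values reach the minimum support and confidence thresholds chosen by the analyst.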
Text Analytics
Text analysis is also known as text mining. Text analysis involves extracting structured information from a collection of documents. Generally, it is performed using Natural Language Processing (NLP) techniques. It includes sub-processes like pre-processing, Part-of-Speech (PoS) tagging, feature extraction, classification and so on.
Text pre-processing includes removal of stop words, stemming, etc. Usually words like ‘a’, ‘an’ and ‘the’ do not convey any meaning. Words such as articles, prepositions, and pronouns are removed. Stemming is the process of reducing derived words to their word stem, base or root form. For example, the words ‘construction’ and ‘constructing’ are reduced to ‘construct’. There are many stemming algorithms available. The Porter stemming algorithm (or ‘Porter stemmer’) is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalization process that is usually done when setting up Information Retrieval systems.
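The pre-processing steps can be illustrated with a small sketch. The stop word list and the two-suffix stripper below are deliberately simplistic assumptions made for illustration; they are not the Porter algorithm, which applies a much larger set of ordered rules.

```python
import re

# Hypothetical, very small stop word list (articles, a preposition, a conjunction).
STOP_WORDS = {"a", "an", "the", "of", "and"}


def naive_stem(word):
    """Strip a few common endings; a toy stand-in, not the Porter stemmer."""
    for suffix in ("ing", "ion"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem each remaining token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess("The construction of a bridge and the constructing of roads")
print(tokens)
```

In practice an established stemmer would replace `naive_stem`, since simple suffix stripping over-stems or under-stems many English words.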
In PoS tagging, the sentences in the data set collection are tokenized using the PoS tagger. During this process, a part of speech such as noun, verb, adverb, adjective, conjunction, negation and the like is assigned to every word in the sentences. For example, “Place is very good” is tagged as shown in Table 2. There are many free PoS taggers available; Python libraries and the Stanford NLP tools are readily available to do PoS tagging.
Table 2 PoS tagging of sentence
Latent Semantic Indexing (LSI)
Latent semantic indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI captures the patterns among words: it considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This makes document classification behave more like human judgment.
Latent Dirichlet Allocation
Latent Dirichlet allocation is a way of automatically discovering topics from sentences. It is used for topic modeling or feature extraction in text documents, and is a widely used technique for document classification.
Social Media Analytics
Writing in social media to express one’s views has become common nowadays. The enormous growth of social media through online reviews, discussions, blogs, micro-blogs, Twitter, etc. on the Web enables individuals and organizations to make decisions using this content. In the same way, there has been an increase in attention on social media as a source of research data in areas such as decision making, recommender systems, etc. The power of social media is described by (Jagadeesh Kumar, 2014), and it can be effectively used to influence public opinion or research behavior. Social media analysis can be defined as the analysis of user generated content written in common or public discussion forums, blogs and other social networking sites to make or improve business decisions.
Social media has its footprint in almost all types of industries like tourism, healthcare, and so on. (Ting, 2014) analyzed blogs and showed the necessity of travel blogs for sharing experience among people. (Zeng, 2014) suggested and demonstrated the impact of social media analytics on the economic contribution of the tourism industry and thereby on the country. His work also showed that user generated content on social media web sites is perceived as recommendations from like-minded friends, mostly by the younger generations of this century. (Jacobsen, 2014) made a study on Mallorca (Spain) tourism using destination-specific surveys and showed how visual content and types of content creators make differences in holiday decision-making.
Social media analysis adds value to health care through patient engagement by increasing access to information and the ability to receive information in real time. It provides opportunities for patients to share their experiences with others. It enables the health care industry to improve its service quality.
Much social media text research has been undertaken in the marketing and retail sector to improve customer satisfaction, recommend new products and so on. Social media, as a data source, contains valuable consumer insights and enables business intelligence. Social media text analysis helps businesses take marketing decisions based on the most discussed topics in social media. It also enables them to drill into the data to see what is causing dissatisfaction among users. It provides multi-dimensional insight into a brand and its features, promotions, shoppers, consumers, and influencers. It delivers trend analysis, behavior tracking, and overall understanding.
The sentiment analysis framework identifies the key discussion concepts from user reviews or comments. Feature based or aspect level sentiment analysis classifies sentiment with respect to specific aspects of entities. For example, the sentence “The camera’s picture quality is good, but its battery life is short” evaluates two aspects, picture quality and battery life, of the camera (entity). The sentiment on the camera’s picture quality is positive, but the sentiment on its battery life is negative. The picture quality and battery life of the camera are the opinion targets. User generated data provides a higher degree of accuracy about what exactly the user feels about a particular place. Feature based analysis indicates whether a particular feature of a product is viewed positively or negatively. This information can be used to improve business outcomes and ensure a very high level of user satisfaction.
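The camera example can be turned into a small lexicon-based sketch. The aspect list, the opinion lexicon and the clause-splitting rule are all hypothetical simplifications: the sentence is split at commas and at the word "but", each clause is scored by summing the polarity of its opinion words, and the score is assigned to any known aspect mentioned in that clause. Treating "short" as negative is an assumption that only holds for aspects such as battery life.

```python
import re

# Hypothetical aspect list and opinion lexicon for a camera review domain.
ASPECTS = ["picture quality", "battery life"]
LEXICON = {"good": 1, "great": 1, "short": -1, "poor": -1}


def aspect_sentiment(sentence):
    """Map each aspect mentioned in the sentence to positive/negative/neutral."""
    results = {}
    for clause in re.split(r",|\bbut\b", sentence.lower()):
        words = re.findall(r"[a-z]+", clause)
        score = sum(LEXICON.get(w, 0) for w in words)
        for aspect in ASPECTS:
            if aspect in clause:
                if score > 0:
                    results[aspect] = "positive"
                elif score < 0:
                    results[aspect] = "negative"
                else:
                    results[aspect] = "neutral"
    return results


review = "The camera's picture quality is good, but its battery life is short"
sentiments = aspect_sentiment(review)
print(sentiments)
```

A production system would extract aspects automatically and handle negation; this sketch only shows how one sentence can carry opposite sentiments for two opinion targets.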
Survival Analytics
Survival analytics is a category of statistics which mainly deals with the occurrence of an event with respect to time. It provides a justification of the reliability of the event in accordance with time, for example in predicting the behavior in market based analysis, web catalog visits by customers and so on. The techniques that deal with survival analysis measurements are:
to that of the results in rapid miner tool
In the weather dataset with numeric attributes shown in Table 3, the attributes temperature and humidity are numerical. Therefore, an efficient classification algorithm must be able to deal with numerical data as well as categorical data. The decision tree classification algorithm resolves this problem by making a binary split over the range of the attribute values. Let us consider the attribute humidity; the values of the attribute are sorted in ascending order as in Table 4.
Discretization of the numeric attribute values involves partitioning the values at breakpoints, i.e. halfway between the data values on either side, ensuring that the split places the majority of one class on one side and the remaining values on the other. Applying this to the above values, we get nine candidate breakpoints between them, as in Table 5.
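The halfway rule for candidate breakpoints can be sketched as below. The humidity values are an assumption: they are taken from the commonly distributed numeric weather dataset and may differ from the values in Table 3.

```python
# Assumed humidity values of the 14 tuples (commonly used numeric weather data).
humidity = [85, 90, 86, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 91]

# Sort the distinct values and take the midpoint of each adjacent pair.
distinct = sorted(set(humidity))
breakpoints = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
print(breakpoints)
```

Each midpoint is only a candidate; whether a split is kept depends on how the class labels are distributed on either side of it.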
The values obtained by considering the halfway split among the values are 67.5, 72.5, 82.5, 85.5, 88, 90.5, and 95.5. While considering the halfway split, if instances with the same value fall into different class labels, then a split at those points cannot be considered, as in Table 6.
In this partition, if adjacent partitions contain the same class value, there is no problem in merging them, as in Table 7.
If adjacent partitions have the same majority class label, they can be merged together without affecting the rule. Therefore, the final discretization is as in Table 8.
The split values for the attribute Humidity are:
Humidity: ≤ 82.5 (Yes)
> 82.5 and ≤ 95.5 (No)
> 95.5 (Yes)
Table 3 Dataset description
Table 5 Ordering level 2
The split at the point 95.5 places most of the labels in one partition and only one yes tuple in the other, so it is not considered as the binary split when adopting the halfway binary split among the class labels. Hence the split value 82.5 is taken as the breakpoint between the yes and no tuples.
The attribute temperature is also numeric; adopting the same procedure, we get the result in Table 9.
Choosing breakpoints by halfway binary split, the values are found to be 64.5, 66.5, 70.5, 72, 77.5, 80.5 and 84. In this partition, if the first and the second splits are removed, the majority class label is still Yes, so the splits at those points can be removed. Accordingly, where adjacent values share the same label, the splits at those points can be removed without introducing errors. The resultant partition is shown in Table 10.
If adjacent partitions in the split have the same majority class label, they can be merged together. Hence the resultant partition is shown in Table 11.
At this point, the split value for the attribute temperature is found to be 77.5, i.e.
≤ 77.5 (Yes)
> 77.5 (No)
Table 6 Ordering level 3
Table 7 Ordering level 4
Table 8 Ordering level 5
Table 9 Ordering level 6
Table 10 Ordering level 7
Table 11 Ordering level 8
The results produced by the splitting criterion with the numerical dataset are similar to those for categorical attribute values. Hence, for any sort of real world problem, the decision tree classification algorithm handles both categorical and numerical attributes in a similar way when generating decision trees.
An Example: Weather Dataset
A dataset can be imported into RapidMiner in a variety of formats such as csv, excel, xml, access, database, arff, xrff, spss, sparse, Dasylab, URL, etc. Here the dataset is imported in csv format, as shown in Figure 2. The dataset contains four attributes and a class label, play. The class label is binominal and contains nine yes tuples and five no tuples.
Open a new project in RapidMiner as depicted in Figure 3 and then import the Read CSV operator as in Figure 4. Each process that we create contains one operator of this type, hence referred to as the root operator. This operator provides a set of parameters that are of global relevance to the process, like initialization of parameters.
Load the csv file as in Figure 5, i.e. the weather dataset in csv format from the dataset folder. Select the type for each attribute chosen for classification, and select the class label option for the attribute that has to be fixed as the label, as in Figure 6. After all these steps are complete, click Finish to end the import configuration wizard.
After the import wizard is complete, click and drag the Decision Tree operator as in Figure 7. This operator learns decision trees from both numerical and categorical data. The decision tree classification method is considered to be one of the classification techniques that is most easily understood. Each node in the decision tree is labeled with an attribute, and each outcome of the attribute is mapped to the next attribute with maximal gain value. The evaluation of the tree stops when a leaf node has been reached.
Figure 2 Dataset in csv format
From the tree in Figure 8 we can observe that outlook is selected as the root node because it has the maximal gain value among all the attributes in the weather dataset. From the distributions produced by the attribute outlook the prediction is made, the attribute with the next maximal gain is chosen for the next level of classification, and the final destination is reached when the maximal depth is reached.
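The choice of outlook as the root can be checked with a short information gain calculation. This is a sketch under stated assumptions: the outlook, windy and play columns are taken from the commonly distributed weather dataset (nine yes and five no tuples), which may not match the chapter's file row for row, and plain information gain is used, whereas the tool may be configured with a different criterion such as gain ratio.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Assumed columns of the 14-tuple weather dataset.
outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
           "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"]
windy = [False, True, False, False, False, True, True,
         False, False, False, True, True, False, True]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

gains = {
    "outlook": information_gain(outlook, play),
    "windy": information_gain(windy, play),
}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

The same function can be applied to a numeric attribute after it has been discretized at a chosen breakpoint, which is how the numeric and categorical attributes compete for the same node.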
Figure 3 Creating a new project
Figure 4 Importing csv operator
FUTURE SCOPE AND DIRECTIONS
This chapter mainly focuses on the methods available for analytics, with an example illustration. Each type of analytics has applications in various fields, mapping to real world scenarios. Future scope and enhancement can be pursued according to the applicability of each type of analytical requirement to use cases across various domains. Meanwhile, the nature of the environment and the type of data play a significant role in the way analytical procedures are processed. However, each type of analytics has broad variance in analysis and determination of results. A proper and suitable analytical technique has to be selected and adhered to for the type of data and the environment that has been chosen.
Figure 5 Loading data into the operator
Figure 6 Selecting the label attribute