
Data Analytics for Intelligent Healthcare Management, Academic Press (2019)


DOCUMENT INFORMATION

Title: Data Analytics for Intelligent Healthcare Management
Authors: Nilanjan Dey, Amira S. Ashour, Simon James Fong, Himansu Das, Bighnaraj Naik, Himansu Sekhar Behera
Institution: Techno India College of Technology, Rajarhat
Field: Healthcare Management
Type: book
Year of publication: 2019
City: London
Pages: 298
Size: 16.9 MB


Big Data Analytics for Intelligent Healthcare Management

Volume Three

Series Editors: Nilanjan Dey, Amira S. Ashour, Simon James Fong

Volume Editors: Nilanjan Dey, Techno India College of Technology, Rajarhat, India; Himansu Das, KIIT, Bhubaneswar, India; Bighnaraj Naik, VSSUT, Burla, India; Himansu Sekhar Behera, VSSUT, Burla, India


The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

© 2019 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions. This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data

A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

ISBN: 978-0-12-818146-1

For information on all Academic Press publications

visit our website at https://www.elsevier.com/books-and-journals

Publisher: Mara Conner

Acquisition Editor: Chris Katsaropoulos

Editorial Project Manager: Ana Claudia A Garcia

Production Project Manager: Punithavathy Govindaradjane

Cover Designer: Christian Bilbow

Typeset by SPi Global, India

CONTRIBUTORS

Satyabrata Aich

Department of Computer Engineering, Inje University, Gimhae, South Korea

Navneet Arora

Indian Institute of Technology, Roorkee, India

Rabindra Kumar Barik

KIIT, Bhubaneswar, India

Akalabya Bissoyi

Department of Biomedical Engineering, National Institute of Technology, Raipur, India

Dibya Jyoti Bora

School of Computing Sciences, Kaziranga University, Jorhat, India

KIIT, Bhubaneswar, India

Satya Ranjan Dash

School of Computer Applications, Kalinga Institute of Industrial Technology, Bhubaneswar, India

Pandit Byomakesha Dash

Department of Computer Application, Veer Surendra Sai University of Technology, Burla, India

Sukhpal Singh Gill

Cloud Computing and Distributed Systems (CLOUDS) Laboratory, School of Computing andInformation Systems, The University of Melbourne, Parkville, VIC, Australia


Hee-Cheol Kim

Institute of Digital Anti-Aging Healthcare, Inje University, Gimhae, South Korea

Pradeep Kumar Maharana

Department of Physics, Silicon Institute of Technology, Bhubaneswar, India

Sitikantha Mallik

KIIT, Bhubaneswar, India

Sushma Rani Martha

Orissa University of Agriculture and Technology, Bhubaneswar, India

Bhabani Shankar Prasad Mishra

KIIT, Bhubaneswar, India

Md Nuruddin Qaisar Bhuiyan

Department of Computer Science and Engineering, East West University, Dhaka, Bangladesh

Md Mehedi Hassan Onik

Department of Computer Engineering, Inje University, Gimhae, South Korea


Farhin Haque Proma

Department of Computer Science and Engineering, East West University, Dhaka, Bangladesh

Rohit Rastogi

ABES Engineering College, Ghaziabad, India

Shamim H Ripon

Department of Computer Science and Engineering, East West University, Dhaka, Bangladesh

Abhaya Kumar Sahoo

KIIT, Bhubaneswar, India

Satya Narayan Sahu

Orissa University of Agriculture and Technology, Bhubaneswar, India

PREFACE

Nowadays, the biggest technological challenge in big data is to provide a mechanism for the storage, manipulation, and retrieval of large amounts of information. In this context, the healthcare industry is also challenged with difficulties in capturing, storing, analyzing, and visualizing data. Due to the rapid growth in the volume of information generated daily, existing infrastructure has become impracticable for handling this issue. So, it is essential to develop better intelligent techniques, skills, and tools to automatically deal with patient data and its inherent insights. Intelligent healthcare management technologies can play an effective role in tackling this challenge and improving our lives. Therefore, there is increasing interest in exploring and unlocking the value of the massive data available within the healthcare domain. Healthcare organizations also need to continuously discover useful and actionable knowledge and gain insight from raw data for purposes such as saving lives, reducing medical errors, increasing efficiency, reducing costs, and improving patient outcomes. Thus, data analytics poses a great challenge and plays an important role in intelligent healthcare management systems.

In the last decade, huge growth in the scale of data produced by smart devices has led to the development of various intelligent technologies. These smart devices continuously produce very large amounts of structured and unstructured healthcare data, which are difficult to manage in real-life scenarios. Big data analytics generally uses statistical and machine learning techniques to analyze huge amounts of data. Such high-dimensional data with multiobjective problems in healthcare is an open issue in big data. Healthcare data is rapidly growing in volume and dimensionality. Heterogeneous healthcare data in various forms, such as text, images, and video, must be effectively stored, processed, and analyzed to avoid increasing healthcare costs and medical errors. This rapid expansion of data makes the development of intelligent healthcare management systems for analysis urgent.

The main objective of this edited book is to cover both the theory and applications of hardware platforms and architectures, the development of software methods, techniques, and tools, applications and governance, and adoption strategies for the use of big data in healthcare and clinical research. It aims to provide an intellectual forum for researchers in academia, scientists, and engineers from a wide range of applications to present their latest research findings in this area and to identify future challenges in this fledgling research area.

To achieve these objectives, this book includes eleven chapters contributed by promising authors.

In Chapter 1, Gill et al. present a broad, methodical literature analysis of bio-inspired algorithms for big data analytics. This chapter will also help in choosing the most appropriate bio-inspired algorithm for big data analytics for a specific type of data, along with promising directions for future research. In Chapter 2, the authors' objective is to examine the potential impact of immense data challenges, open research issues, and distinctive instrument identification in big data analytics. In Chapter 3, the author covers the terminology related to big data, healthcare data, and the architectural context for big data analytics; different tools and platforms are discussed in detail.

Trang 9

Chapter 4 addresses a machine learning model to automate the classification of benign and malignant tissue images. In Chapter 5, the author describes the use of multimedia and IoT to detect tension-type headache (TTH) and to analyze its chronicity. It also includes the concept of big data for storing and processing the data generated while analyzing TTH stress through the Internet of Things (IoT). Chapter 6 discusses how to train an fMRI dataset with different machine learning algorithms, such as logistic regression and support vector machines, toward enhancing the precision of classification. In Chapter 7, the authors develop a prototype model for healthcare monitoring systems using the IoT and cloud computing. These technologies allow for monitoring and analyzing various health parameters in real time. In Chapter 8, Onik et al. include an overview, architecture, existing issues, and future scope of blockchain technology for successfully handling the privacy and management of current and future medical records. In Chapter 9, Sahoo et al. describe the intelligent health recommendation system (HRS), providing an insight into the use of big data analytics for implementing an effective health recommendation engine and showing how to transform the healthcare industry from the traditional scenario to a more personalized paradigm in a tele-health environment. Chapter 10 discusses the interactions between drugs and proteins, studied by means of a molecular docking process. Chapter 11 integrates the kidney-inspired optimization and fuzzy c-means algorithms to solve nonlinear problems of data mining.

Topics presented in each chapter of this book are unique and are based on the unpublished work of the contributing authors. In editing this book, we attempted to bring into the discussion all the new trends and experiments in intelligent healthcare management systems using big data analytics. We believe this book is ready to serve as a reference for a larger audience, such as system architects, practitioners, developers, and researchers.

Nilanjan Dey, Techno India College of Technology, Rajarhat, India
Himansu Das, KIIT, Bhubaneswar, India
Bighnaraj Naik, VSSUT, Burla, India
Himansu Sekhar Behera, VSSUT, Burla, India

ACKNOWLEDGMENTS

Completing this edited book successfully was similar to a journey that we had undertaken for several months. We would like to take the opportunity to express our gratitude to the following people. First of all, we wish to express our heartfelt gratitude to our families, friends, colleagues, and well-wishers for their constant support and cooperation throughout this journey. We also express our gratitude to all the chapter contributors, who allowed us to quote their work in this book. In particular, we would like to acknowledge the hard work of the authors and their cooperation during the revisions of their chapters. We are indebted to and grateful for the valuable comments of the reviewers, which enabled us to select these chapters out of the many submitted and also to improve their quality. We are grateful for the help extended by the Elsevier publishing team and for their continuous support throughout the entire process of publication.

1 BIO-INSPIRED ALGORITHMS FOR BIG DATA ANALYTICS: A SURVEY, TAXONOMY, AND OPEN CHALLENGES

Sukhpal Singh Gill, Rajkumar Buyya

Cloud Computing and Distributed Systems (CLOUDS) Laboratory, School of Computing and Information Systems,

The University of Melbourne, Parkville, VIC, Australia

1.1 INTRODUCTION

Cloud computing is now the spine of the modern economy, offering on-demand services to cloud customers through the Internet. To improve the performance and effectiveness of cloud computing systems, new technologies such as Internet of Things (IoT) applications (healthcare services, smart cities, etc.) and big data are emerging, which further require effective data processing [1]. However, there are two problems in existing big data processing approaches that degrade the performance of computing systems, namely large response time and delay, because data is transferred twice [2]: (1) from computing systems to the cloud and (2) from the cloud to IoT applications. Presently, IoT devices collect data of huge volume (big data) and variety, and these systems are growing at a velocity of 500 MB/s or more [3].

For IoT-based smart cities, the transferred data is used to make effective decisions through big data analytics. Data is stored and processed on cloud servers after collection and aggregation from smart devices on IoT networks. Further, to process the large volume of data, there is a need for automatic, highly scalable cloud technology, which can further improve the performance of these systems [4]. The literature reports that existing cloud-based data processing systems are not able to satisfy the performance requirements of IoT applications when a low response time and latency are needed. Other reasons for a large response time and latency are the geographical distribution of data and communication failures during data transfer [5]. Cloud computing systems become bottlenecked by continually receiving raw data from IoT devices [6]. Therefore, bio-inspired algorithm-based big data analytics is an alternative paradigm that provides a platform between computing systems and IoT devices to process user data in an efficient manner [7].

Trang 12

1.1.1 DIMENSIONS OF DATA MANAGEMENT

As identified from the existing literature [1–6], there are five dimensions of data that are required for effective management. Fig. 1.1 shows the dimensions of data management for big data analytics: (1) volume, (2) variety, (3) velocity, (4) veracity, and (5) variability.

Volume represents the magnitude of data in terms of data size (terabytes or petabytes). For example, Facebook processes a large amount of data, such as millions of photographs and videos. Variety refers to heterogeneity in a dataset, which can contain different types of data. Fig. 1.2 shows the variety of data, which can be text, audio, video, social, transactional, operational, cloud service, or machine-to-machine (M2M) data.

Velocity refers to the rate of production of data and of analysis for processing a huge amount of data. For example, velocity can be 250 MB/minute or more [3]. Veracity refers to abnormality, noise, and biases in data, while variability refers to change in the rate of flow of data for generation and analysis.

The rest of the chapter is organized as follows. In Section 1.2, we present the big data analytical model. In Section 1.3, we propose the taxonomy of bio-inspired algorithms for big data analytics. In Section 1.4, we analyze research gaps and present some promising directions for future research in this area. Finally, we summarize the findings and conclude the chapter in Section 1.5.

1.2 BIG DATA ANALYTICAL MODEL

FIG 1.1 Dimensions of data management [figure]

Big data analytics is a term that combines "big data" and "deep analysis," as shown in Fig. 1.3. Every minute, a large amount of user data is transferred from one device to another, which requires high processing power to perform data mining for the extraction of useful information from the database. Fig. 1.3 shows the model for big data analytics: an OLTP (online transaction processing) system creates data (txn data), and a data cube represents the big data from which the required information can be extracted using data mining. Initially, different types of data come from different users or devices, and data cleansing is performed to remove irrelevant data and store the clean data in the database [8]. Further, data aggregation is performed to store the data in an efficient manner, because incoming data contains a variety of data, and a report is generated for easy use in the future. The aggregated data is then stored in data cubes using large storage devices. For deep analysis, feature extraction is performed using data sampling, which generates the required type of data. The deep analysis includes data visualization, model learning (e.g., K-nearest-neighbor, linear regression), and model evaluation [9].
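To make this flow concrete, here is a minimal sketch of the clean-aggregate-sample-learn-evaluate pipeline in Python with pandas and scikit-learn. The table layout (patient_id, reading, label columns) and the cleaning rule are hypothetical illustrations, not taken from the chapter.

```python
# Minimal sketch of the Fig. 1.3 flow: clean -> aggregate -> sample -> learn -> evaluate.
# The hypothetical transactional table has 'patient_id', 'reading', and 'label' columns.
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def analyze(txn: pd.DataFrame) -> float:
    # Data cleansing: drop irrelevant/incomplete records.
    clean = txn.dropna(subset=["patient_id", "reading", "label"])
    # Data aggregation: one row per patient, summarizing the raw readings.
    agg = clean.groupby("patient_id").agg(
        mean_reading=("reading", "mean"),
        n_readings=("reading", "count"),
        label=("label", "max"),
    )
    # Data sampling / feature extraction, then model learning and evaluation.
    X, y = agg[["mean_reading", "n_readings"]], agg["label"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)  # K-nearest-neighbor
    return accuracy_score(y_test, model.predict(X_test))
```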

Fig. 1.4 shows the big data process, which has two main components: data management and analytics. There are five stages in processing big data: (1) acquisition and recording (storing data), (2) extraction and cleaning (cleansing of data), (3) integration and aggregation (compiling the required data), (4) modeling and analysis (study of data), and (5) data interpretation (representing data in the required form).

FIG 1.3 Big data analytical model: txn data and report generation, data clustering and dimension aggregation into clean data, storage in data cubes, data sampling and feature extraction, data partition into training and testing data, then model generation, validation, and the final model [figure]

FIG 1.4 Big data process: data management (acquisition and recording; extraction and cleaning; integration and aggregation) and analytics (modeling and analysis; data interpretation) [figure]

1.3 TAXONOMY OF BIO-INSPIRED ALGORITHMS FOR BIG DATA ANALYTICS

This section presents the existing literature on bio-inspired algorithms for big data analytics. The bio-inspired algorithms for big data analytics are categorized into three categories: ecological, swarm-based, and evolutionary. Fig. 1.5 shows the taxonomy of bio-inspired algorithms for big data analytics along with the focus of study (FoS).

1.3.1 EVOLUTIONARY ALGORITHMS

Kune et al. [10] proposed a genetic algorithm (GA) based data-aware group scheduling approach for the analytics of big data, which focuses on bandwidth utilization, computational resources, and data dependencies. Moreover, the GA algorithm decouples data and computational services, which are provided as cloud services. The results demonstrate that the GA algorithm gives effective results in terms of turnaround time because it processes data using parallel processing. Gandomi et al. [11] proposed a multiobjective genetic programming (GP) algorithm-based approach for big data mining, which is used to develop a concrete creep model that provides unbiased and accurate predictions; the GP model works with both high and normal strength. Elsayed and Sarker [12] proposed a differential evolution (DE) algorithm-based big data analytics approach, which uses local search to increase the exploitation capability of the DE algorithm. This approach optimizes the Big Data 2015 benchmark problems with both multi- and single-objective problems, but it exhibits large computational time. Kashan et al. [13] proposed an evolutionary strategy (ES) algorithm-based big data analytics technique, which processes data efficiently and accurately using parallel scheduling of cloud resources. Further, the ES algorithm minimizes the execution time by partitioning a group of jobs into disjoint sets, in which the same resources execute all the jobs in the same set.
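To illustrate the evolutionary family, the following is a minimal genetic-algorithm sketch for the kind of group-scheduling objective Kune et al. [10] target: assigning tasks to machines so as to reduce makespan. This is a generic GA under assumed task costs and parameters, not the authors' data-aware algorithm.

```python
# Minimal GA sketch: assign n_tasks to n_machines, minimizing makespan.
import random

n_tasks, n_machines = 20, 4
task_cost = [random.uniform(1, 10) for _ in range(n_tasks)]  # hypothetical task costs

def makespan(assign):
    loads = [0.0] * n_machines
    for task, machine in enumerate(assign):
        loads[machine] += task_cost[task]
    return max(loads)

def evolve(pop_size=30, generations=100, mutation_rate=0.1):
    pop = [[random.randrange(n_machines) for _ in range(n_tasks)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=makespan)              # selection: keep the fittest half
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_tasks)         # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n_tasks):                   # mutation
                if random.random() < mutation_rate:
                    child[i] = random.randrange(n_machines)
            children.append(child)
        pop = parents + children
    return min(pop, key=makespan)

best = evolve()
print(f"best makespan: {makespan(best):.2f}")
```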

Mafarja and Mirjalili [14] proposed a simulated annealing (SA) algorithm-based big data optimization technique, which uses the whale optimization algorithm (WOA) to architect various feature selection approaches, probing the most promising regions of the search space. The proposed approach helps to improve classification accuracy and selects the most useful features for categorization tasks. Further, Barbu et al. [15] proposed an SA algorithm-based feature selection (SAFS) technique for big data learning and computer vision. Based on a criterion, the SAFS algorithm removes variables and tightens a sparsity constraint, which gradually reduces the problem size during the iterations and makes it especially fit for big data learning. Tayal and Singh [16] proposed big data analytics based on a firefly swarm optimization and SA hybrid (FSOSAH) technique for a stochastic dynamic facility layout-based multiobjective problem, to manage data effectively. Saida et al. [17] proposed a cuckoo search optimization (CO) algorithm-based big data analytics approach for clustering data. Different datasets from the UCI Machine Learning Repository were used to validate the CO algorithm through experimental results, and it performs well in terms of computational efficiency and convergence stability.
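Below is a minimal sketch of simulated annealing over a boolean feature mask, in the spirit of SA-based selection like SAFS [15] but not the published algorithm; the objective, cooling schedule, and parameters are assumptions.

```python
# Minimal simulated-annealing sketch for feature selection over a boolean mask.
# score() is a placeholder objective; a real one would be cross-validated accuracy.
import math, random

def sa_select(n_features, score, steps=500, t0=1.0, cooling=0.995):
    mask = [random.random() < 0.5 for _ in range(n_features)]
    best, best_val = mask[:], score(mask)
    cur, cur_val, t = mask[:], best_val, t0
    for _ in range(steps):
        nbr = cur[:]
        i = random.randrange(n_features)
        nbr[i] = not nbr[i]                    # flip one feature in/out
        nbr_val = score(nbr)
        # Accept improvements always; accept worse moves with probability exp(delta/t).
        if nbr_val > cur_val or random.random() < math.exp((nbr_val - cur_val) / t):
            cur, cur_val = nbr, nbr_val
            if cur_val > best_val:
                best, best_val = cur[:], cur_val
        t *= cooling
    return best, best_val
```

A toy objective such as `score = lambda m: sum(m[:3]) - 0.1 * sum(m)` rewards keeping the first three features while penalizing larger masks.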

FIG 1.5 Taxonomy of bio-inspired algorithms for big data analytics (evolutionary, swarm-based, ecological), with the focus of study (FoS) of representative works: genetic algorithm (GA), FoS: group scheduling; genetic programming (GP), FoS: concrete creep model; differential evolution (DE), FoS: local search; evolutionary strategy (ES), FoS: cloud resources; simulated annealing (SA), FoS: feature selection; cuckoo search optimization (CO), FoS: convergence stability; particle swarm optimization (PSO), FoS: group scheduling; firefly swarm optimization (FSO), FoS: social network; group searcher optimization (GSO), FoS: data clustering [figure]

1.3.2 SWARM-BASED ALGORITHMS

Ilango et al. [9] proposed an artificial bee colony (ABC) algorithm-based clustering technique for the management of big data, which identifies the best cluster and performs optimization for different dataset sizes. The ABC approach minimizes execution time and improves accuracy. A MapReduce-based Hadoop environment is used for implementation, and results demonstrate that the ABC algorithm delivers a more effective outcome than differential evolution and particle swarm optimization (PSO) in terms of execution time. Raj and Babu [18] proposed a firefly swarm optimization (FSO) algorithm for big data analytics for establishing novel connections in social networks, to calculate the possibility of sustaining a social network. In this technique, a mathematical model is introduced to test the stability of the social network, which reduces the cost of big data management. Wang et al. [19] proposed an FSO algorithm-based hybrid (FSOH) approach for big data optimization focused on six multiobjective problems. It reduces execution costs but has high computational time complexity.

Wang et al. [20] proposed a PSO algorithm-based big data optimization approach to improve online dictionary learning and introduced a dictionary-learning model using an atom-updating stage. The PSO algorithm reduces the heavy computational burden and improves accuracy. Hossain et al. [21] proposed a parallel clustered PSO (PCPSO) algorithm-based approach for big data-driven service composition. The PCPSO algorithm handles huge amounts of heterogeneous data and processes data in parallel with MapReduce on the Hadoop platform. Lin et al. [22] proposed a cat swarm optimization (CSO) algorithm-based approach for big data classification, to choose characteristics during the classification of text for big data analytics. The CSO algorithm uses term frequency-inverse document occurrence to improve the accuracy of feature selection.
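The PSO variants above all share the canonical velocity-and-position update. Here is a minimal PSO sketch minimizing a continuous objective (the sphere function as a stand-in), not the specific dictionary-learning or clustered variants of [20, 21]; the inertia and acceleration coefficients are conventional assumed values.

```python
# Minimal particle swarm optimization sketch minimizing a continuous objective.
import random

def pso(objective, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # swarm's best position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Example: minimize the sphere function; the optimum is the origin.
best, val = pso(lambda x: sum(v * v for v in x), dim=5)
```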

Cheng et al. [23] proposed a swarm intelligence (SI) algorithm-based big data analytics approach for the economic load dispatch problem; the SI algorithm handles high-dimensional data, which improves the accuracy of data processing. Banerjee and Badr [24] proposed an ant colony optimization (ACO) algorithm-based approach for mobile big data using rough sets. The ACO algorithm helps to select an optimal feature for resolved decisions, which aids in effectively managing big data from social networks (tweets and posts). Pan [25] proposed an improved ACO algorithm (IACO)-based big data analytical approach for the management of medical data, such as patient data and operation data, which helps doctors retrieve the required data quickly.

Hu et al. [26] proposed a shuffled frog leaping (SFL) approach to perform feature selection for improved high-dimensional biomedical data. The SFL algorithm maximizes predictive accuracy by exploring the space of probable subsets to obtain the group of characteristics and reduce irrelevant features. Manikandan and Kalpana [27] proposed a fish swarm optimization (FSW) algorithm for feature selection in big data. The FSW algorithm reduces combinatorial problems by employing fish swarming behavior and is effective for diverse applications; social interactions among big data are modeled on the movement of fish in their search for food. This algorithm provides effective output in terms of fault tolerance and data accuracy. Elsherbiny et al. [28] proposed an intelligent water drops (IWD) algorithm for workflow scheduling to effectively manage big data. A workflow simulation toolkit is used to test the effectiveness of the IWD-based approach, and results show that it performs effectively in terms of cost and makespan when compared to FCFS, Round Robin, and the PSO algorithm.

Neeba and Koteeswaran [29] proposed a bacterial foraging optimization (BFO) algorithm to classify informative and affective content from medical weblogs. MAYO Clinic data is used as a medical data source to evaluate the accuracy of retrieving the relevant information. Ahmad et al. [30] proposed a BFO algorithm for network traffic (BFON) to detect and prevent intrusions during the transfer of big data; further, it controls intrusions using a resistance mechanism. Schmidt et al. [31] proposed an artificial immune system (AIS) algorithm-based big data optimization technique to manage and classify flow-based Internet traffic data. To improve classification performance, the AIS algorithm uses Euclidean distance, and the results demonstrate that this technique produces more accurate results than the Naïve Bayes classifier. George and Parthiban [32] proposed a group search optimization (GSO) algorithm-based big data analytics technique using FSO to perform data clustering on high-dimensional datasets. This technique replaces the worst fitness values in every iteration of the GSO with improved values from FSO, to test the performance of clustering data.

Pouya et al. [33] proposed an invasive weed optimization (IWO) algorithm-based big data optimization technique to resolve the multiobjective portfolio optimization task. Further, uniform design and a fuzzy normalization method are used to transform the multiobjective portfolio selection model into a single-objective programming model. The IWO algorithm manages big data more quickly than PSO.

Pu et al. [34] proposed a hybrid biogeography-based optimization (BBO) algorithm for multilayer perceptron training under the challenge of analyzing and processing big data. Experimental results show that BBO is effective in training multilayer perceptrons and converges better than the GA and PSO algorithms. Fong et al. [35] proposed a multispecies optimizer (PS2O) algorithm-based approach for data stream mining of big data to select features. An incremental classification algorithm is used in the PS2O algorithm to classify the collected data streams pertaining to big data, which enhances analytical accuracy within a reasonable processing time. Fig. 1.6 shows the evolution of bio-inspired algorithms for big data analytics based on the existing literature discussed above.

Fig. 1.7 shows the number of papers published for each category of bio-inspired algorithm per year. This helps to recognize the important types of bio-inspired algorithms [11–23, 25–29, 35] that were highlighted from 2014 to 2018.

The literature reports that there are five types of analytics for big data management using bio-inspired algorithms: predictive analytics, social media analytics, video analytics, audio analytics, and text analytics, as shown in Fig. 1.8.

Text analytics is a method of performing text mining for the extraction of required data from a database, covering sources such as news, corporate documents, survey responses, online forums, blogs, emails, and social network feeds. There are four methods for text analytics: (1) sentiment analysis, (2) question answering, (3) text summarization, and (4) information extraction. The information extraction technique extracts structured data from unstructured data, for example, extracting the tablet name, type, and expiry date from a patient's medical data. The text summarization method extracts a concise summary of various documents related to a specific topic. The question answering method uses natural language processing to find answers to questions. The sentiment analysis method examines the viewpoint of people regarding events or products.
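The information-extraction example just given, pulling a tablet name and expiry date out of free text, can be sketched with simple patterns; the note format and regular expressions below are illustrative assumptions.

```python
# Minimal information-extraction sketch: structured fields from an unstructured note.
import re

note = "Prescribed Paracetamol 500mg tablet, expiry 2020-06-30, twice daily."

matches = {
    "drug": re.search(r"Prescribed\s+(\w+)", note),
    "dose": re.search(r"(\d+\s?mg)", note),
    "expiry": re.search(r"expiry\s+(\d{4}-\d{2}-\d{2})", note),
}
structured = {k: (m.group(1) if m else None) for k, m in matches.items()}
print(structured)  # {'drug': 'Paracetamol', 'dose': '500mg', 'expiry': '2020-06-30'}
```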

Audio analytics, or speech analytics, is a process of extracting structured data from unstructured audio data; examples of audio analytics sources are healthcare and call center recordings. Audio analytics has two types: large-vocabulary continuous speech recognition (LVCSR) and the phonetic-based technique. LVCSR performs indexing (to transliterate the speech content of the audio) followed by searching (to find an index term). The phonetic-based technique deals with phonemes or sounds and performs phonetic indexing and searching.

FIG 1.7 Time count of bio-inspired algorithms for big data analytics [figure]

FIG 1.8 Types of analytics for bio-inspired algorithms: audio, text, video (server-based or edge-based architecture), social media (content-based or structure-based), and predictive (heterogeneity, noise accumulation, spurious correlation, incidental endogeneity) [figure]

Video analytics visualizes, examines, and extracts meaningful information from video streams such as CCTV footage and live streaming of sports matches. Video analytics can be performed at end devices (edge) or at centralized systems (server).

Social media analytics examines the unstructured or structured data of social media websites (platforms that enable an exchange of information among users) such as Facebook and Twitter. There are two kinds of social media analytics: (1) content-based (data posted by users) and (2) structure-based (synthesizing the structural attributes). Predictive analytics is a method that uses historical and current data to predict future outcomes; it must contend with heterogeneity (data coming from different sources), noise accumulation (estimation error during the interpretation of data), spurious correlation (uncorrelated variables appearing correlated due to the huge size of the dataset), and incidental endogeneity (predictors or explanatory variables that are not independent of the residual term).

Fig. 1.9 shows the different parameters that are considered in different bio-inspired algorithms for big data analytics.

FIG 1.9 Parameters of different bio-inspired algorithms for big data analytics: storage (NoSQL servers: Hbase, Cassandra, MongoDB, Couchbase, Neo4J), analytical technique (classification, prediction, clustering, association), mechanism (reactive, proactive), fault tolerance, agility, virtualization, cost, ease of use, scalability, and data management [figure]

There are four types of data mining techniques, as studied from the literature: classification, prediction, clustering, and association. In classification, model attributes are used to arrange the data into a different set of categories. The prediction technique is used to find unknown values. Clustering is an unsupervised technique, which clusters the data based on related attributes. The association technique is used to establish relationships among different datasets. There are five types of NoSQL database management systems (DBMS) used in existing techniques: Hbase, Cassandra, MongoDB, Couchbase, and Neo4J. There are two types of mechanisms for making decisions in bio-inspired algorithms for big data analytics: proactive (forward-looking decisions, which require forecasting or text mining) and reactive (decisions based on the requirements of the data analytics). Scalability refers to the ability of a computing system to scale its nodes up or down based on the amount of data transferred for analytics. Big data analytics techniques use a large amount of storage space to store the information on which the different types of analytics are performed to extract the required information. Fault tolerance is the ability of a system to process the user data within the required time frame. The type of data requiring analytics is continually changing, so agility-based big data analytical models are needed to process user data in the required format. Virtualization is a technique required by cloud-based systems to create virtual machines for processing user data in a distributed manner. Execution cost is the amount of effort required to perform big data analytics. Ease of use describes how easily the system can be used to perform big data analytics. Data management is discussed in Section 1.2.
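As a concrete instance of the clustering technique just described, here is a minimal unsupervised sketch using scikit-learn's k-means on synthetic data; the dataset and the choice of k=3 are arbitrary.

```python
# Minimal clustering sketch: group records by related attributes, unsupervised.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(model.cluster_centers_)   # one centroid per discovered cluster
print(model.labels_[:10])       # cluster assignment for the first ten records
```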

Table 1.1 shows the comparison of bio-inspired algorithms for big data analytics based on these parameters, which helps the reader choose the most appropriate bio-inspired algorithm. In the current scenario, cloud computing has emerged as the fifth utility of computing and has captured significant attention from industry and academia for big data analytics. Virtualization technology is progressing continuously, and new models, mechanisms, and approaches are emerging for the effective management of big data using cloud infrastructure.

Fog computing uses network switches and routers, gateways, and mobile base stations to provide cloud service with the minimum possible network latency and response time. Therefore, fog or edge devices can also perform big data analytics at the edge device instead of at a centralized database or server.

Bio-inspired algorithm-based big data analytics has several challenges that need to be addressed, such as resource management, usability, data processing, elasticity, resilience, heterogeneity in interconnected clouds, sustainability, energy efficiency, data security, privacy protection, edge computing, and networking.

Cloud resource management is the ability of a computing system to schedule available resources to process user data over the Internet. The cloud uses virtual resources for big data analytics to process user data quickly and cheaply. Virtualization technology provides effective management of cloud resources using bio-inspired algorithms to improve user satisfaction and resource utilization. There is a need to optimize the provisioning of cloud resources in existing bio-inspired algorithms for big data analytics. To solve this challenge, a quality of service (QoS)-aware bio-inspired resource management approach is required for the efficient management of big data and the optimization of QoS parameters.

Table 1.1 Comparison of bio-inspired algorithms for big data analytics: techniques such as ABC [9], PCPSO [21], and ACO [24] are compared on analytics type (audio, predictive, ...), NoSQL DBMS (Cassandra, MongoDB, ...), decision mechanism (reactive or proactive), data type, data mining technique (classification, prediction, clustering, association), and data dimensions (volume, variety, velocity, veracity, variability) [table]

There is a challenge of data synchronization in bio-inspired algorithms because data processing takes place across geographic locations, which increases the overprovisioning and underprovisioning of cloud resources. There is a need to identify overloaded resources using rapid elasticity, which can handle the data received from different IoT devices. To improve the recoverability of data, a data backup technique for big data analytics is needed that can provide service during server downtime.

Cloud providers such as Microsoft, Amazon, Facebook, and Google deliver reliable and efficient cloud services by utilizing various cloud resources, such as disk drives, storage devices, network cards, and processors, for big data analytics. The complexity of computing systems increases with the increasing size of cloud data centers (CDCs), which increases resource failures during big data analytics. A resource failure can mean premature termination of execution, data corruption, or a service level agreement (SLA) violation. There is a need to gather more information about such failures to make the system more reliable, and a need for replication of cloud services to analyze big data in an efficient and reliable manner.

To reduce energy consumption, there is a need to migrate user data to more reliable servers for efficient execution on cloud resources. Moreover, introducing the concept of resource consolidation can increase the sustainability and energy efficiency of a cloud service by consolidating multiple independent instances of IoT applications.

To improve the reliability of distributed cloud services, there is a need to integrate security protocols into the process of big data analytics, and to incorporate authentication modules at different levels of data management.

A large number of edge devices participate in the IoT-based fog environment to improve computation and reduce latency and response time, which can further increase energy consumption, and fog devices cannot offer large resource capacity in spite of their additional computation and storage power. There is a need to process user data at the edge device instead of at the server, which can reduce execution time and cost.


1.5 EMERGING RESEARCH AREAS IN BIO-INSPIRED ALGORITHM-BASED BIG DATA ANALYTICS

In addition to future research directions, there are various hotspot research areas in bio-inspired algorithm-based big data analytics that need to be addressed in the future, such as containers, serverless computing, blockchain, software-defined clouds, bitcoin, deep learning, and quantum computing. In this section, we discuss these hotspot research areas in the context of bio-inspired algorithm-based big data analytics.

Docker is a container-based virtualization technology that can be used for bio-inspired algorithm-based big data analytics in multiple clouds using a lightweight web server, that is, the HUE (Hadoop User Experience) web interface. The HUE-based Docker container provides a robust and lightweight container-as-a-service (CaaS) data processing facility using a virtual multicloud environment.

Serverless computing can be used for bio-inspired algorithm-based big data analytics without managing the cloud infrastructure, and it is effective in processing user data without configuration of the network and resource provisioning. Serverless computing as a service (SCaaS) has two different services: backend-as-a-service (BaaS) and function-as-a-service (FaaS), which can improve the efficiency, robustness, and scalability of big data processing systems and analyze the data quickly.

Blockchain is a distributed database system that can manage a huge amount of data at low cost, and it provides instant, risk-free transactions. Blockchain as a service (BaaS) dramatically decreases the time to process a transaction and increases the security, quality, and integrity of data in bio-inspired algorithm-based big data analytics.

A huge amount of data originates from different IoT devices, and it is necessary to transfer data from source to destination without any data loss. Software-defined cloud as a service (SCaaS) is a new paradigm that provides an effective network architecture to move data from IoT devices to a cloud datacenter in a reliable and efficient manner by making intelligent infrastructure decisions. Further, SCaaS offers other advantages for bio-inspired algorithm-based big data analytics, such as failure recovery, optimization of network resources, and fast computing power.

Deep learning is a new paradigm for bio-inspired algorithm-based big data analytics to process user data with high accuracy and efficiency in real time using hybrid learning and training mechanisms. Deep learning as a service (DLaaS) uses a hierarchical learning process to derive high-level, complex abstractions as representations of data for the analysis and learning of huge chunks of unsupervised data.


1.5.6 BITCOIN AS A SERVICE (BIaaS)

Cryptocurrencies are a very popular technology used to provide secure and reliable service for a huge number of financial transactions. Bitcoin as a service (BiaaS) performs real-time data extraction from the blockchain ledger and stores the big data in an efficient manner for bio-inspired algorithm-based big data analytics. BiaaS-based big data analytics provides interesting benefits such as trend prediction, theft prevention, and identification of malicious users.

The new trend of quantum computing helps bio-inspired algorithm-based big data analytics solve complex problems by handling massive digital datasets in an efficient and quick manner. Quantum computing as a service (QCaaS) allows for quick detection, analysis, integration, and diagnosis from large scattered datasets. Further, QCaaS can search extensive, unsorted datasets to quickly uncover patterns.

This chapter presents a review of bio-inspired algorithms for big data analytics. The comparison of bio-inspired algorithms has been presented based on taxonomy, focus of study, and identified demerits. Bio-inspired algorithms are categorized into three different categories, and we investigated the existing literature on big data analytics to find the open issues. Further, promising research directions are proposed for future research.

GLOSSARY

Big data: the set of huge datasets containing different types of data, such as video, audio, text, social, etc.

Data management: there are five dimensions of data management for big data analytics: volume, variety, velocity, veracity, and variability.

Big data analytics: the process of extracting required data from unstructured data. There are five types of analytics for big data management using bio-inspired algorithms: text analytics, audio analytics, video analytics, social media analytics, and predictive analytics.

Bio-inspired optimization: the bio-inspired algorithms used for big data analytics, which can be ecological, swarm-based, or evolutionary.

Cloud computing: cloud computing offers three main service models: software, platform, and infrastructure. At the software level, the cloud user can utilize applications running on cloud datacenters in a flexible manner. At the platform level, the cloud user can access infrastructure to develop and deploy cloud applications. Infrastructure as a service offers access to computing resources such as processors, networking, and storage, and enables virtualization-based computing.

ACKNOWLEDGMENTS

One of the authors, Dr. Sukhpal Singh Gill [Postdoctoral Research Fellow], gratefully acknowledges the Cloud Computing and Distributed Systems (CLOUDS) Laboratory, School of Computing and Information Systems, The University of Melbourne, Australia, for awarding him the Fellowship to carry out this research work. This research work is supported by the Discovery Project of the Australian Research Council (ARC), Grant/Award Number: DP160102414.

REFERENCES

[8] I. Singh, K.V. Singh, S. Singh, Big data analytics based recommender system for value added services (VAS), in: Proceedings of Sixth International Conference on Soft Computing for Problem Solving, Springer, Singapore, 2017, pp. 142–150.

[9] S.S. Ilango, S. Vimal, M. Kaliappan, P. Subbulakshmi, Optimization using artificial bee colony based clustering approach for big data, Clust. Comput. (2018) 1–9, https://doi.org/10.1007/s10586-017-1571-3.

[10] R. Kune, P.K. Konugurthi, A. Agarwal, R.R. Chillarige, R. Buyya, Genetic algorithm based data-aware group scheduling for big data clouds, in: Big Data Computing (BDC), 2014 IEEE/ACM International Symposium, IEEE, 2014, pp. 96–104.

[11] A.H. Gandomi, S. Sajedi, B. Kiani, Q. Huang, Genetic programming for experimental big data mining: a case study on concrete creep formulation, Autom. Constr. 70 (2016) 89–97.

[12] S. Elsayed, R. Sarker, Differential evolution framework for big data optimization, Memetic Comput. 8 (1) (2016) 17–33.

[13] A.H. Kashan, M. Keshmiry, J.H. Dahooie, A. Abbasi-Pooya, A simple yet effective grouping evolutionary strategy (GES) algorithm for scheduling parallel machines, Neural Comput. & Applic. 30 (6) (2018) 1925–1938.

[14] M.M. Mafarja, S. Mirjalili, Hybrid whale optimization algorithm with simulated annealing for feature selection, Neurocomputing 260 (2017) 302–312.

[15] A. Barbu, Y. She, L. Ding, G. Gramajo, Feature selection with annealing for computer vision and big data learning, IEEE Trans. Pattern Anal. Mach. Intell. 39 (2) (2017) 272–286.

[16] A. Tayal, S.P. Singh, Integrating big data analytic and hybrid firefly-chaotic simulated annealing approach for facility layout problem, Ann. Oper. Res. 270 (1–2) (2018) 489–514.

[17] I.B. Saida, K. Nadjet, B. Omar, A new algorithm for data clustering based on cuckoo search optimization, in: Genetic and Evolutionary Computing, Springer, Cham, 2014, pp. 55–64.

[18] E.D. Raj, L.D. Babu, A firefly swarm approach for establishing new connections in social networks based on big data analytics, Int. J. Commun. Netw. Distrib. Syst. 15 (2–3) (2015) 130–148.

[19] H. Wang, W. Wang, L. Cui, H. Sun, J. Zhao, Y. Wang, Y. Xue, A hybrid multi-objective firefly algorithm for big data optimization, Appl. Soft Comput. 69 (2018) 806–815.

[20] L. Wang, H. Geng, P. Liu, K. Lu, J. Kolodziej, R. Ranjan, A.Y. Zomaya, Particle swarm optimization based dictionary learning for remote sensing big data, Knowl.-Based Syst. 79 (2015) 43–50.

[21] M.S. Hossain, M. Moniruzzaman, G. Muhammad, A. Ghoneim, A. Alamri, Big data-driven service composition using parallel clustered particle swarm optimization in mobile environment, IEEE Trans. Serv. Comput. 9 (5) (2016) 806–817.

[22] K.C. Lin, K.Y. Zhang, Y.H. Huang, J.C. Hung, N. Yen, Feature selection based on an improved cat swarm optimization algorithm for big data classification, J. Supercomput. 72 (8) (2016) 3210–3221.

[23] S. Cheng, Q. Zhang, Q. Qin, Big data analytics with swarm intelligence, Ind. Manag. Data Syst. 116 (4) (2016) 646–666.

[24] S. Banerjee, Y. Badr, Evaluating decision analytics from mobile big data using rough set based ant colony, in: Mobile Big Data, Springer, Cham, 2018, pp. 217–231.

[25] X. Pan, Application of improved ant colony algorithm in intelligent medical system: from the perspective of big data, Chem. Eng. 51 (2016) 523–528.

[26] B. Hu, Y. Dai, Y. Su, P. Moore, X. Zhang, C. Mao, J. Chen, L. Xu, Feature selection for optimized high-dimensional biomedical data using the improved shuffled frog leaping algorithm, IEEE/ACM Trans. Comput. Biol. Bioinform. 15 (2016) 1765–1773.

[27] R.P.S. Manikandan, A.M. Kalpana, Feature selection using fish swarm optimization in big data, Clust. Comput. (2017) 1–13, https://doi.org/10.1007/s10586-017-1182-z.

[28] S. Elsherbiny, E. Eldaydamony, M. Alrahmawy, A.E. Reyad, An extended intelligent water drops algorithm for workflow scheduling in cloud computing environment, Egypt. Inform. J. 19 (1) (2018) 33–55.

[29] E.A. Neeba, S. Koteeswaran, Bacterial foraging information swarm optimizer for detecting affective and informative content in medical blogs, Clust. Comput. (2017) 1–14, https://doi.org/10.1007/s10586-017-1169-9.

[30] K. Ahmad, G. Kumar, A. Wahid, M.M. Kirmani, Intrusion detection and prevention on flow of Big Data using bacterial foraging, in: Handbook of Research on Securing Cloud-Based Databases With Biometric Applications, IGI Global, 2014, p. 386.

[31] B. Schmidt, A. Al-Fuqaha, A. Gupta, D. Kountanis, Optimizing an artificial immune system algorithm in support of flow-based internet traffic classification, Appl. Soft Comput. 54 (2017) 1–22.

[32] G. George, L. Parthiban, Multi objective hybridized firefly algorithm with group search optimization for data clustering, in: Research in Computational Intelligence and Communication Networks (ICRCICN), 2015 IEEE International Conference, IEEE, 2015, pp. 125–130.

[33] A.R. Pouya, M. Solimanpur, M.J. Rezaee, Solving multi-objective portfolio optimization problem using invasive weed optimization, Swarm Evol. Comput. 28 (2016) 42–57.

[34] X. Pu, S. Chen, X. Yu, L. Zhang, Developing a novel hybrid biogeography-based optimization algorithm for multilayer perceptron training under big data challenge, Sci. Program. 2018 (2018) 1–7.

[35] S. Fong, R. Wong, A.V. Vasilakos, Accelerated PSO swarm search feature selection for data stream mining big data, IEEE Trans. Serv. Comput. 9 (1) (2016) 33–45.

FURTHER READING

S.S. Gill, I. Chana, R. Buyya, IoT based agriculture as a cloud and big data service: the beginning of digital India, JOEUC 29 (4) (2017) 1–23.


BIG DATA ANALYTICS

The first step is to run analytics on diverse data to meet the business need of reading data from many sources, such as relational databases, Excel files, Twitter, and Facebook. The data arrive in different formats, structured, semi-structured, and unstructured, and are distributed across various data stores: relational databases, NoSQL databases, and file systems; the need is to put them into a format that can be processed by the data-mining algorithms [1]. Most of the existing libraries use an extract-transform-load operation to extract the records from the original stores and to transform their layout into an appropriate schema [2]. This approach is time-consuming and requires that all data be acquired in advance. Fig. 2.1 shows how blood pressure monitoring, intelligent pillbox, and blood sugar monitoring services use the Internet of Things, relying on servers in hospitals, shopping malls, bus stops, and restaurants for ad hoc service in cases of emergency.

The multidisciplinary nature of such investigations makes it troublesome for organizations to find the specialized skills needed to undertake big data analysis. Consumable analytics provides essential capabilities for managing this task and for overcoming the inaccessibility of analytical capacities [3] by making analysis less difficult to apply [4]. Consumable analytics refers to developing capabilities that are effective and current in a business enterprise by creating tools that make an investigation easier to build, see, and consume [5]. The consumable inquiry is a public interface or a programming language for checking healthcare information, such as blood pressure, weight, and sugar level, using the interface shown in Fig. 2.2.


2.1.2 DISTRIBUTED DATA MINING ALGORITHMS

Most of the existing data mining libraries, such as R, WEKA, and RapidMiner, only support sequential single-machine execution of the data mining algorithms. This makes these libraries unsuitable for coping with the massive volumes of big data [6]. Scalable distributed data mining libraries, such as Apache Mahout, Cloudera Oryx, 0xdata H2O, MLlib, and Deeplearning4j, rewrite the data mining algorithms to run in a distributed fashion on Hadoop and Spark. Those libraries are developed by examining the algorithms for components that can be performed in parallel and rewriting them. This procedure is complicated and time-consuming, and the quality of the modified algorithm depends entirely on the participants' knowledge [7]. This makes these libraries tough to develop, maintain, and extend, while expertise in big data is especially vital: it is important that the data one relies on are well analyzed. The additional need for IT experts is a challenge for big data, according to McKinsey's study of big data as "the next frontier for innovation." This is evidence that for a business enterprise to take on a big data initiative, it has either to hire experts or to educate existing personnel in the brand-new discipline [8].

This affects the system of storing data and makes it harder to work with; it can create a permanent connection between the devices that send records to the system. The "sender" must make certain that the "receiver" has no gaps in the information that should be saved. This loop should work as long as the system receiving data tells the sending machine to stop once the data are saved. So, can a simple verification system save you from losing data? Such a method can also be sluggish across the whole process [9]. To avoid this, for any content that is transmitted, the sender should generate a "key." For clarity, this solution is comparable to an MD5 hash generated over compressed content; in this case, however, the keys are compared automatically. Losing records is not only a hardware problem: software can just as easily malfunction and cause irreparable and riskier information loss. If one hard drive fails, there is usually another one to back it up, so there is no damage to the information; however, when software fails because of a programming "bug" or a flaw in the design, data are lost for good. To overcome this problem, programmers developed a series of tools to lessen the impact of a software breakdown [10]. An easy example is Microsoft Word, which occasionally saves the work that a user creates to protect against its loss in case of hardware or software failure; saving prevents complete data loss.
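The sender/receiver "key" comparison described above can be sketched as a checksum over compressed content, in the spirit of the MD5 comparison the text mentions; the payload and transport below are placeholders.

```python
# Minimal sketch: the sender derives a key (MD5 over compressed content); the
# receiver recomputes it and compares, detecting gaps or corruption in transit.
import hashlib, zlib

def make_key(payload: bytes) -> str:
    return hashlib.md5(zlib.compress(payload)).hexdigest()

sent = b"patient=42;bp=120/80;glucose=5.4"
key = make_key(sent)                 # travels alongside the payload

received = sent                      # in reality, read from the network
assert make_key(received) == key, "data lost or corrupted in transit"
```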

Currently, the approach most generally used for moving huge portions of data is to copy the data to a massive storage drive and then deliver it to the destination. Even so, big data research initiatives usually involve more than one organization, distinct geographic locations, and large numbers of researchers. This is inefficient and creates a barrier to data exchange among the groups using these techniques [11]. Other means involve the application of networks to transfer the files; however, shifting vast amounts of information into or out of a data repository (e.g., a data warehouse) is a large networking venture.

2.1.5 DATA PRESERVATION CHALLENGES

Because health data comprise extensive collections of datasets, it is difficult to effectively store and preserve the records on a single drive using traditional data management systems such as relational databases [12]. Additionally, this imposes a heavy cost and time burden on the IT function of a small organization or lab.

This stage entails integrating and transforming information into the right layout for subsequent data analysis. Combining unstructured data is the primary challenge for big data analytics (BDA). Even for structured electronic health record (EHR) data integration, there are numerous troubles [13], as shown in Fig. 2.3.

A problem arises when fixed health information saved in an Oracle database on machine X is transferred to a MySQL database on machine Y: the Oracle database and the MySQL database use different data structures to keep their records [14]. Also, machine X might use the "NUMBER" data type to store patients' sex data, whereas system Y may use the "CHAR" data type. Metadata on a record describe the characteristics of a resource [15]. Within the relational database model, the column names are used as metadata to describe the characteristics of the stored data.

FIG 2.3 EHR data integration challenges [figure]


There are two significant problems in metadata integration. First, different database systems use different metadata to describe content. For example, one system might use "sex" while another uses "gender" when describing a patient. A computer does not know that "sex" and "gender" can be semantically equivalent. Second, there are issues when mapping simple metadata to composite metadata. For example, a computer cannot automatically map the simple metadata "PatientName" in one system onto the composite metadata "FirstName" + "LastName" in the other system. Also, different systems using different coding schemes to encode information can cause code-mapping problems. For instance, the SNOMED CT and ICD-10 codes for the disease "abscess of foot" differ, and cross-mapping between the coding systems is not fully developed [16].
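
A minimal sketch of both problems follows; the synonym table and the naive name split are illustrative assumptions, not a production terminology service:

```python
# Assumed synonym table reconciling column names across systems.
COLUMN_SYNONYMS = {"sex": "gender"}

def canonical_column(name):
    """Map a column name to a canonical form, if a synonym is known."""
    return COLUMN_SYNONYMS.get(name.lower(), name.lower())

def split_patient_name(patient_name):
    """Map composite 'PatientName' onto 'FirstName' + 'LastName' (naive split)."""
    first, _, last = patient_name.partition(" ")
    return {"FirstName": first, "LastName": last}

print(canonical_column("Sex"))             # 'gender'
print(split_patient_name("Ada Lovelace"))  # {'FirstName': 'Ada', 'LastName': 'Lovelace'}
```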

The essential expectation for health BDA is to use computational models to predict complex human phenomena from diverse and massively scaled datasets. The difficulty of choosing or building predictive models grows with the complexity of the analysis problem. If the analysis problem is simply "What is the average age of patients with diabetes worldwide?", a straightforward aggregate counting algorithm can obtain the answer in time linear in the size of the data. Data sources such as medical records, social data, video, and survey data are shown in Fig. 2.4; a sketch of such an aggregate query follows the figure.

FIG 2.4

Big data analysis data sources
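
For a simple aggregate question like the one above, one linear pass over the data suffices. Below is a minimal sketch using Python's built-in sqlite3 module; the database, table, and column names are assumed for illustration:

```python
import sqlite3

conn = sqlite3.connect("ehr.db")  # hypothetical database with a 'patients' table
# One aggregate pass over the table answers the question in time linear in the data.
(avg_age,) = conn.execute(
    "SELECT AVG(age) FROM patients WHERE diagnosis = 'diabetes'"
).fetchone()
print(f"Average age of diabetic patients: {avg_age}")
```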


If the research question is NP-hard, the computing time might be superexponential [17]. For example, the Bayesian network is a popular model for representing knowledge in computational biology and bioinformatics. In algorithms for learning the structure of a Bayesian network, the computing time for finding a useful network increases exponentially as the volume of data increases.

For some complex analyses, such as "list every diabetic patient with congestive heart failure who is younger than the average diabetic patient worldwide," it is difficult to answer the query quickly when the table contains seven billion rows without indexing; it would take at least 15 days to obtain the result using the same PC. A sketch of such a query appears below.
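
The sketch below again uses assumed table and column names; the composite index is what lets the aggregate subquery and the outer filter avoid full scans of a very large table:

```python
import sqlite3

conn = sqlite3.connect("ehr.db")  # hypothetical database with a 'patients' table
conn.execute("CREATE INDEX IF NOT EXISTS idx_diag_age ON patients (diagnosis, age)")
rows = conn.execute("""
    SELECT patient_id, age FROM patients
    WHERE diagnosis = 'diabetes'
      AND comorbidity = 'congestive heart failure'
      AND age < (SELECT AVG(age) FROM patients WHERE diagnosis = 'diabetes')
""").fetchall()
print(f"{len(rows)} matching patients")
```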

Also, many people assume that more data always provide better information for drawing conclusions. However, the tools of big data technology and science do not shield us from skews, gaps, and faulty assumptions [18]. Other work shows that with large datasets, sizable costs are normal when the goal is to make information as transparent as it appears. As worldwide day-to-day data grow (Fig. 2.5) and social media data keep expanding, data analysis remains a challenge [19, 20].

FIG 2.5

Data archive and analysis (diagram labels: world day-to-day data; social media and health data; data analysis; data archive)


2.3 SOLUTIONS TO THE CHALLENGES

Unstructured data are extremely difficult to integrate and organize adequately while in raw format. Therefore, information extraction techniques are applied to extract essential, domain-relevant information from the raw data. Several solutions have been proposed for unstructured data integration [21]. The issue with these approaches is that most of them are domain-specific; that is, each technique is most effectively applied to one particular kind of dataset. Few generic frameworks exist for integrating unstructured records. Solutions for structured data integration fall into the following basic strategies.

Automated schema-mapping (over a portion of the metadata) or data-instance-mapping algorithms continually produce errors. Those errors can be corrected by a few domain experts. However, this method cannot work for big data integration, as it involves too much metadata to check the errors manually; many researchers therefore advocate using crowd feedback to improve the integration. An example of this is a decision-based approach that addresses complex data-integration problems [22]. A computer system is first supplied with hundreds of manual health-care mappings, and it then identifies the most plausible match for tables and their related fields within the schema by using matching rules. Those rules handle the semantic complexity found in health-care databases. The main gain of this model is that user interventions dramatically boost schema-mapping accuracy.

The probabilistic integration framework assigns probabilities to candidate correspondences among sets of schema elements. After the probabilities are estimated, a threshold is used to pick matching and nonmatching items. In this manner, the uncertainty produced during the merge process is eliminated. The probabilistic technique attempts to build a mediated schema automatically from the established data sources and the apparent semantic mappings between the sources and the mediated schema [23]. It avoids the problems of human intervention.

Likewise, how should models be validated before they are deployed? There are various methodologies for validating models: (1) use statistical validity to determine whether there are problems in the data or in the model; (2) split the data into training and testing sets to test the accuracy of the models; (3) ask domain experts to check whether the observed patterns make sense in the target setting. To manage the privacy-demanding situations, we can use privacy-preserving data-mining algorithms for knowledge discovery that guarantee privacy. Governments can likewise build comprehensive policies to protect data security [24]. Further challenges relate to providing assurances to the public. Currently, big data are expected to produce potential benefits for society, even though their use can additionally create challenges, for example, governance issues, data-quality concerns, and privacy issues, all of which can arise from the use of big data by the general public within a region.
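
As an illustration of validation approach (2), here is a minimal sketch assuming scikit-learn is available, with synthetic features standing in for real patient records:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real clinical dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out 25% of the data to measure accuracy on records the model never saw.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```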


2.3.3 DEFINING AND DETECTING ANOMALIES IN HUMAN ECOSYSTEMS

An overarching challenge, when attempting to measure change or to discover anomalies in human ecosystems, is the characterization of abnormality. The definition and detection of what may constitute socioeconomic "anomalies," and how they differ, may be much less clear-cut than for anomalies in the realm of detecting disease outbreaks or monitoring malfunctions in other types of dynamic systems, such as car engines [25]. Data size is increasing exponentially every day and is creating a large amount of health data, as shown in Fig. 2.6.

The data required for analytical and computational purposes are strongly heterogeneous, which includes typical integration problems of both data and schema, and the data are additionally affected by the advent of new architectures for analytics [26].

FIG 2.6

Human body as a source of data (the original chart plots frequency of visit: daily, twice a week, weekly, or monthly)


A massive challenge in big data is the transformation of unstructured data into suitable, structured layouts that yield meaningful analytics [27]. Data scaling is a problematic issue, as data volume is growing faster than computing resources while CPU speeds are static [28]. The design of a system that handles this scale effectively is also required, resulting in systems that can process data within a given time period more quickly [29]. The integration of big data is multidimensional and multidisciplinary and requires a multi-technology approach, which poses a broad challenge.

Big data have many implications for patients, providers, researchers, payers, and other health-care stakeholders. They will change how those players interact with the health-care ecosystem, especially where external data, regionalization, globalization, mobility, and social networking are concerned. In the older model, health-care centers and other providers were incentivized to keep patients in treatment; that is, more inpatient days translated into more revenue [30]. The trend with new models, and now with accountable care organizations, is to change incentives and to compensate providers for keeping patients healthy. Equally, patients are increasingly demanding information about their health-care options so that they can understand their choices and can participate in decisions about their care. Patients also play a vital role in keeping health-care costs down and improving outcomes: when patients are supplied with accurate and current information and guidance, these facts help them make better decisions and adhere better to treatment programs [31].

Routine records are convenient for gathering demographic and clinical data; another data source is information that patients disclose about themselves. When combined with outcomes, patient-provided information can become a valuable source of data for researchers and others seeking to reduce costs, boost positive outcomes, and enhance treatment. Several challenges exist with self-reported data:

• Accuracy: People tend to understate their weight and their engagement in unhealthy behaviors such as smoking; meanwhile, they tend to overstate virtuous behaviors such as working out [32].

• Privacy concerns: People are usually reluctant to reveal information about themselves because of privacy and other concerns. Creative approaches are needed to gather information and to encourage patients to share it without adverse consequences for their records [19, 20].

• Consistency: Standards are required for representation and linkage to provide consistency in self-reported records across health-care channels, to eliminate errors, and to increase the usefulness of the data in rules and guidelines [33].

• Facility: Mechanisms such as e-health and m-health, which are up-and-coming mobile and social channels, will in the future need to be used imaginatively to facilitate contributors' capacity for accurate self-reporting. Supplying up-to-date anonymized statistics can concurrently enhance levels of self-reporting as a community develops among members [34].


2.5.1 EXISTING SOLUTIONS FOR THE VOLUME CHALLENGE

2.5.1.2 Hadoop distributed file system

The Hadoop Distributed File System (HDFS) is the storage component of Hadoop; it is designed to store very large datasets reliably on clusters and to stream that data at high throughput to client applications. HDFS stores file-system metadata and application data separately. By default, it stores three independent copies of each data block (replication) to ensure reliability, availability, and performance.

Hadoop MapReduce is a parallel programming framework for distributed processing, implemented over HDFS. The Hadoop MapReduce engine contains a JobTracker and a number of TaskTrackers. When a MapReduce job is executed, the JobTracker splits it into smaller tasks (map and reduce) managed by the TaskTrackers. In the Map step, the master node takes the input, partitions it into smaller subproblems, and passes them to worker nodes. Each worker node processes a subproblem and emits its results as key/value pairs. In the Reduce step, the values with the same key are gathered and processed by the same machine to form the final output.
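
As a sketch of this key/value flow, here is a Hadoop Streaming-style pair of Python scripts that counts patient records per diagnosis; the script names and input format (one diagnosis per line) are assumptions for illustration:

```python
# mapper.py -- reads records from stdin, emits "diagnosis<TAB>1" pairs.
import sys

for line in sys.stdin:
    diagnosis = line.strip()
    if diagnosis:
        print(f"{diagnosis}\t1")
```

```python
# reducer.py -- Hadoop sorts by key, so equal keys arrive consecutively.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{count}")  # emit the finished key
        current_key, count = key, 0
    count += int(value)
if current_key is not None:
    print(f"{current_key}\t{count}")
```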

Apache Spark is an open-source in-memory data analytics and cluster computing framework, created in the AMPLab at UC Berkeley. As a MapReduce-like cluster computing engine, Spark also has notable traits such as scalability and fault tolerance, as MapReduce does [35]. The essential abstraction in Spark is the Resilient Distributed Dataset (RDD), which makes Spark a general-purpose framework well qualified to process iterative jobs, including PageRank computation, K-means clustering, and so forth. RDDs are unique to Spark and, as such, distinguish Spark from standard MapReduce engines. Additionally, with RDDs, applications on Spark can keep data in memory across queries and reconstruct data lost during failures. An RDD is a read-only data collection, which can be either a record set stored in an external storage system, for instance HDFS, or a derived dataset created from other RDDs. RDDs store metadata such as their partitions and a set of dependencies on parent RDDs called lineage; with the help of the lineage, Spark recovers lost data quickly and effectively. Spark shows excellent performance on iterative computations, since it can reuse intermediate results and keep data in memory across multiple parallel tasks [36].
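
A minimal PySpark sketch of the RDD idea follows; the HDFS path is hypothetical, and cache() is what keeps the filtered dataset in memory so both subsequent actions reuse it instead of recomputing it from the lineage:

```python
from pyspark import SparkContext

sc = SparkContext(appName="rdd-sketch")

# Base RDD backed by an external storage system (hypothetical HDFS path).
records = sc.textFile("hdfs:///data/ehr/visits.csv")

# Derived RDD; its lineage (textFile -> filter) lets Spark rebuild lost partitions.
diabetic = records.filter(lambda line: "diabetes" in line).cache()

print(diabetic.count())  # first action materializes and caches the RDD
print(diabetic.take(3))  # second action reuses the in-memory copy

sc.stop()
```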


2.5.1.5 Grid computing

Grid computing consists of a number of servers interconnected by a high-speed network; each of the servers performs one or many roles. The two main benefits of grid computing are high storage capability and processing power, corresponding to data grids and computational grids [37].

2.5.1.6 Spark framework

The use of Spark, plus in-memory computing, creates significant performance gains for high-volume and diverse data. All these methods permit companies and groups to explore massive volumes of data and to gain business insights from them. There are viable approaches to address the volume problem: we can either reduce the data or invest in appropriate infrastructure, and, based on our budget and requirements, we can pick among the technologies and methods described earlier [38]. If we have staff with expertise in Hadoop, we can of course use them.

2.5.1.7 Possible solutions for the data-variety problem

OLAP tools (online analytical processing tools): Data processing can be achieved using OLAP tools, which establish connections among data and assemble information into a logical format that allows easy access. With OLAP tools, professionals can achieve high speed and low lag time when processing high-volume records. However, OLAP techniques process all of the records provided to them, regardless of whether they are relevant, and this is one of the drawbacks of OLAP tools [39].

Apache Hadoop is open-source software whose most fundamental purpose is to manage vast quantities of data quickly and with ease. Hadoop can divide data among multiple systems in an infrastructure that can process it. A map of the content is created in Hadoop so that it can be accessed and queried without difficulty. SAP HANA is an in-memory data platform that is deployable as an on-premise appliance or in the cloud. It is a platform well suited for performing real-time analytics and for developing and deploying real-time applications. New database and indexing architectures make sense of disparate data assets swiftly.

Different medical image segmentation approaches have been proposed, and many significant enhancements have been achieved. Nonetheless, because of shortcomings in health-care imaging systems, medical images can contain different kinds of artifacts. These artifacts can affect the object information and confound the pathology. Magnetic resonance imaging technology can mitigate some artifacts, and some require subsequent processing. In clinical research, the natural phenomena encountered are noise, intensity inhomogeneity, and partial volume effects, which are considered open issues in medical image segmentation [40]. There are various procedures to partition an image into regions that are homogeneous. Not every method is suitable for medical image analysis, given the complexity and errors involved [41]. No standard image segmentation technique can produce satisfactory results for all imaging applications, such as brain MRI, brain tumor analysis, and so forth.


The optimal selection of features, tissues, and brain and nonbrain elements is considered the principal limitation for brain-image segmentation. Accurate segmentation across the full field of view is another obstacle. Operator guidance and manual thresholding are, for the most part, further barriers to segmenting the brain image. During the segmentation procedure, validation of the results is another source of difficulty [19, 20].

Image segmentation is the problem of separating foreground objects from the background in an image. It is among the most fundamental problems in computer vision, and it has intrigued many researchers over the years. As the everyday use of computers increases over time, dependable image segmentation is required in more applications, in the industrial, medical, and personal fields. Fully automated segmentation remains an open issue because of the wide variety of possible object combinations, so the use of human "clues" is unavoidable. Interactive image segmentation is consequently increasingly popular among researchers these days [42]. The objective of interactive segmentation is to separate object(s) from the background in an exact manner, using user input in a way that requires minimal interaction and minimal response time. This work begins by describing general procedures for classifying segmentation approaches, proceeds with an intensive study of existing shape-based image-segmentation methods, and finishes by presenting a new combined editing and segmentation tool. Image segmentation is the pivotal issue in image analysis and image understanding. It is also a fundamental problem in computer vision and pattern recognition [43]. Active contour models (ACMs) are among the best procedures for image segmentation; the critical idea of an ACM is to evolve a curve, constrained by some specific conditions, to extract the required object. These active contour models, classified as edge-based and region-based, are two types that have their own advantages and drawbacks, and the characteristics of the images govern the choice between them in applications. The edge-based model forms a stopping function using image-edge information, which can drive the contour onto the object boundaries. However, the edge-based measure, built from the image gradient, may fail to locate the correct boundaries for images with strong noise or weak edges.

On the other hand, a region-based model uses statistical data to build a region-stopping function that can stop the contour evolution between distinct regions [44]. Compared with the edge-based model, this model performs better for images with blurred edges. The region-based model is not sensitive to the initialization of the level-set function and can detect the object's boundaries. Region-based models are favored for image segmentation, since they improve over the edge-based model in several respects, but they have limitations. The general region-based models proposed for binary images, with the assumption that each image region is homogeneous, do not work perfectly for images with intensity inhomogeneity; they are sensitive to the initial contour, and the evolving curve may get caught in local minima. In addition, the Chan-Vese (CV) technique is not suitable for fast processing because, in every iteration, the average intensities inside and outside the contour must be recomputed, as shown in Fig. 2.7. Various segmentation methods give different results for the same data, which increases the computation time [45].
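
For reference, the CV model is available off the shelf; a minimal sketch with scikit-image follows, using a bundled test image as a stand-in for an MR slice:

```python
from skimage import data, img_as_float
from skimage.segmentation import chan_vese

image = img_as_float(data.camera())  # stand-in for an MR slice

# Each iteration re-estimates the average intensities inside and outside the
# evolving contour, which is what makes CV slow on large images.
segmentation = chan_vese(image, mu=0.25, lambda1=1.0, lambda2=1.0)
print(segmentation.shape, segmentation.dtype)  # boolean foreground mask
```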

The local binary fitting model, by embedding local image data, can segment images with intensity inhomogeneity considerably more accurately than earlier systems. The essential idea is to introduce a Gaussian kernel; although it partitions well the images with intensity inhomogeneity, it has high computational time and complexity. As a result, the segmentation process takes considerable time compared with older segmentation systems. Zhang proposed an active contour method driven by local image-fitting energy, which gives the same segmentation result with less time complexity compared with local binary fitting.


Reinitializing the level-set function to a signed distance function throughout the evolution was used to keep the evolution stable and to obtain desirable results. In practice, however, the reinitialization process is often somewhat convoluted and expensive. The region-based level-set method with an adjusted signed distance function, which eliminates the need for reinitialization together with a regularization function, worked admirably under the high-intensity inhomogeneity problem and had better outcomes in comparison with other techniques. It uses both edge and region information to segment an image into nonoverlapping regions and controls the evolving curve through the membership degree of the current pixel inside or outside the active contour. It is implemented through a signed pressure function that uses the local information in the image. The proposed strategy can segment images with intensity inhomogeneity and was applied to MR images to demonstrate the consistency, adequacy, and robustness of the algorithm.

Flash memory is needed for caching records, in particular in dynamic solutions that can classify data as either hot (frequently accessed data) or cold (rarely accessed data).
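
A toy sketch of the hot/cold split follows, with the access log and the access-count threshold chosen arbitrarily for illustration:

```python
from collections import Counter

access_log = ["rec7", "rec7", "rec3", "rec7", "rec9", "rec3"]  # hypothetical accesses
counts = Counter(access_log)

HOT_THRESHOLD = 2  # assumed cutoff: records accessed this often stay in flash
hot = {rec for rec, n in counts.items() if n >= HOT_THRESHOLD}  # cache in flash
cold = set(counts) - hot                                        # leave on disk
print("hot:", hot, "cold:", cold)
```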

A transactional database is a database management system that has the ability to roll back, undo, or abort a database transaction or operation if it is not completed correctly. Such systems can be equipped with real-time analysis and a quicker response for decision making.

FIG 2.7

Comparative study of methods giving different segmentation results
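
A minimal sketch of the roll-back behavior just described, using Python's built-in sqlite3 module (the table and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (patient_id INTEGER NOT NULL, ward TEXT)")

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("INSERT INTO visits VALUES (1, 'cardiology')")
        conn.execute("INSERT INTO visits VALUES (NULL, 'oncology')")  # violates NOT NULL
except sqlite3.IntegrityError:
    pass  # the incomplete transaction was undone

print(conn.execute("SELECT COUNT(*) FROM visits").fetchone())  # (0,): both rows rolled back
```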
