Doctoral dissertation: Improving data transmission efficiency in wireless sensor networks based on data correlation


MINISTRY OF EDUCATION AND TRAINING

HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

Nguyen Thi Thanh Nga

EFFICIENT DATA COMMUNICATION FOR WIRELESS SENSOR NETWORK

BASED ON DATA CORRELATION

Major: Computer Engineering Code No.: 9480106

COMPUTER ENGINEERING DISSERTATION

SUPERVISORS:

1. Dr. Nguyen Kim Khanh

2. Assoc. Prof. Ngo Hong Son

Hanoi - 2018


COMMITMENT

I assure that this is my own research. All the data and results in the thesis are completely true and were agreed for use in this thesis by my co-authors. This research has not been published by any author other than me.

Hanoi, 17th December 2018

Dr Nguyen Kim Khanh Nguyen Thi Thanh Nga

Assoc Prof Ngo Hong Son


ACKNOWLEDGMENTS

This Ph.D. thesis has been carried out at the Department of Computer Engineering, School of Information and Communication Technology, Hanoi University of Science and Technology. The research has been completed under the supervision of Dr. Nguyen Kim Khanh and Associate Prof. Dr. Ngo Hong Son.

Firstly, I would like to express my sincere gratitude to my advisors, Dr. Nguyen Kim Khanh and Associate Prof. Dr. Ngo Hong Son, for their continuous support of my Ph.D. study and related research, and for their patience, motivation, and immense knowledge. Their valuable guidance, unceasing encouragement, and support have helped me throughout the research and the writing of this thesis.

Besides my advisors, I would like to thank all my colleagues in the Department of Computer Engineering for their insightful comments and encouragement, and for the hard questions that motivated me to widen my research from various perspectives. I would like to express my appreciation to Prof. Dr. Trinh Van Loan for his time and patience in helping me to correct the whole thesis, as well as for his valuable comments during the process of pursuing my doctorate degree.

I want to thank all my colleagues in the School of Information and Communication Technology for their support and help in my work.

I gratefully acknowledge the receipt of grants from the 911 project of the Ministry of Education and Training, which enabled me to carry out this research.

Finally, I would like to thank my family, my sisters, my father and mother, my husband, and my two children for their endless love, encouragement, and unconditional support throughout the writing of this thesis.

Nguyen Thi Thanh Nga


TABLE OF CONTENTS

COMMITMENT
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
PREFACE

1 INTRODUCTION
1.1 Overviews
1.2 Energy conservation in WSNs
1.2.1 Radio optimization
1.2.2 Sleep/wake-up schemes
1.2.3 Energy efficient routing
1.2.4 Data reduction
1.2.5 Charging solution
1.3 Data correlation and energy conservation in WSNs
1.4 Problem statements and contributions

2 CORRELATION IN WIRELESS SENSOR NETWORK
2.1 Correlation model survey
2.2 Information entropy theory
2.2.1 Overview
2.2.2 Entropy concept
2.2.3 Joint entropy
2.3 Correlation and entropy
2.3.1 Correlation of two variables
2.3.1.1 Mutual information
2.3.1.2 Entropy correlation coefficient
2.3.2 Correlation of more than two variables
2.4 Conclusions

3 ENTROPY-BASED CORRELATION CLUSTERING
3.1 Joint entropy estimation
3.1.1 Determining the upper bound of joint entropy
3.1.2 Determining the lower bound of joint entropy
3.1.3 Validating entropy estimation
3.2 Correlation region and correlation clustering algorithm
3.2.1 Estimated joint entropy and correlation
3.2.2 Correlation region definition
3.2.3 Correlation clustering algorithm
3.2.4 Validation
3.3 Conclusions

4 ENTROPY CORRELATION BASED DATA AGGREGATIONS
4.1 Compression aggregation
4.1.1 Comparison of compression schemes
4.1.2 Compression based routing scheme in a correlated region
4.1.2.1 1-D analysis
4.1.2.2 2-D analysis
4.1.2.3 General topology model analysis
4.1.3 Optimal routing scheme in correlation networks
4.2 Representative aggregation
4.2.1 Distortion function
4.2.2 Number of representative nodes
4.2.3 Representative node selection
4.2.4 Practical validation
4.3 Conclusions

5 ENTROPY CORRELATION BASED DATA AGGREGATION PROTOCOL (ECODA)
5.1 Network model
5.2 Radio model
5.3 Outline of ECODA
5.3.1 Set-up phase
5.3.2 Steady-state phase
5.4 Performance evaluation
5.4.1 Simulation models
5.4.1.1 Simulation parameters
5.4.1.2 Simulation setups
5.4.1.3 Dissipated energy calculation
5.4.2 Simulation results and discussions
5.4.2.1 Compression aggregation-based routing protocol
5.4.2.2 Representative aggregation-based routing protocol
5.4.3 Evaluations and comparison
5.4.3.1 The case of ECODA with compression aggregation
5.4.3.2 The case of ECODA with representative aggregation
5.5 Conclusions

6 CONCLUSIONS AND FUTURE STUDY
6.1 Summary of Contributions
6.2 Limitations
6.3 Future work

PUBLICATION LIST
REFERENCES
APPENDIX

LIST OF ABBREVIATIONS

CSMA Carrier Sense Multiple Access
ECODA Entropy COrrelation clustering for Data Aggregation
LEACH-C Low Energy Adaptive Clustering Hierarchy-Centralized
MEMS Micro Electro Mechanical Systems
RSSI Received Signal Strength Indication
TDMA Time Division Multiple Access
VLSI Very Large-Scale Integration
WSN(s) Wireless Sensor Network(s)


LIST OF FIGURES

Figure 1.1 Wireless Sensor Network
Figure 1.2 Wireless Sensor Network Applications
Figure 2.1 The layout of sensor nodes in an environment with two different conditions area
Figure 2.2 The relations between entropies, joint entropy, and mutual information
Figure 2.3 Relation between correlation and joint entropy
Figure 3.1 Joint entropy calculation principle
Figure 3.2 Sensor layout in Intel Berkeley Research Lab
Figure 3.3 Practical, upper bound and lower bound joint entropy (JE) of subsets of the dataset 1
Figure 3.4 Estimated joint entropy with different values of entropy correlation coefficients using upper bound function (with Hmax = 2 [bits])
Figure 3.5 Estimated joint entropy (by upper bound) and practical joint entropy of dataset 1
Figure 3.6 Correlation-based clustering algorithm
Figure 3.7 Temperature data measured at 11 nodes in the dataset 1
Figure 3.8 Derivative of estimated joint entropy and calculated the joint entropy of the selected group
Figure 4.1 Routing paths for three schemes: (a) DSC, (b) RDC, and (c) CDR [122]
Figure 4.2 Energy consumptions for the DSC, RDC and CDR schemes respectively to entropy correlation coefficients
Figure 4.3 Routing pattern of 1-D network
Figure 4.4 Total bit-hop cost Es that corresponds to cluster size with different values of entropy correlation coefficient in the case of 1-D with compression along SPT to the cluster head
Figure 4.5 Total bit-hop cost Es that corresponds to cluster size with different values of entropy correlation coefficient in the case of 1-D with compression at the cluster head only
Figure 4.6 Routing pattern of the 2-D network [122]
Figure 4.7 Total bit-hop cost Es that corresponds to cluster size with different values of entropy correlation coefficient in the case of 2-D with compression along SPT to the cluster head
Figure 4.8 Total bit-hop cost Es that corresponds to cluster size with different values of entropy correlation coefficient in the case of 2-D with compression at the cluster head only
Figure 4.9 Illustration of clustering for a general topology model
Figure 4.10 Total transmission cost that corresponds to cluster size with different values of entropy correlation coefficient with compression along SPT to the cluster head
Figure 4.11 Total transmission cost respectively to cluster size with different values of entropy correlation coefficient with compression at the cluster head only
Figure 4.12 The relation between distortion and the number of representative nodes with N = 10
Figure 4.13 The relation between distortion and the number of representative nodes with N = 15
Figure 4.14 The relation between distortion and the number of representative nodes with N = 20
Figure 4.15 Maximizing obtained information based representative node selection algorithm
Figure 5.1 Radio energy dissipation model
Figure 5.2 Time scheduling for one round
Figure 5.3 Sensor node distribution in the 200 m x 200 m sensing area
Figure 5.4 Routing path of compression-based routing protocol
Figure 5.5 Total energy in each round in case of compression along SPT to the CH
Figure 5.6 Number of alive nodes in each round in case of compression along SPT to the CH
Figure 5.7 Total energy in each round in case of compression at the CH only
Figure 5.8 Number of alive nodes in each round in case of compression at the CH only
Figure 5.9 Total energy in each round in case of representative aggregation with compression with 16 correlation clusters
Figure 5.10 Number of alive nodes in each round in case of representative aggregation with compression with 16 correlation clusters
Figure 5.11 Total energy in each round in the case of representative aggregation without compression with 16 correlation clusters
Figure 5.12 Number of alive nodes in each round in the case of representative aggregation without compression with 16 correlation clusters
Figure 5.13 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 16 correlation clusters
Figure 5.14 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 16 correlation clusters
Figure 5.15 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 8 correlation clusters
Figure 5.16 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 8 correlation clusters
Figure 5.17 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 4 correlation clusters
Figure 5.18 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 4 correlation clusters
Figure 5.19 Total energy comparison between distance-based protocol and ECODA with representative aggregation in the case of 16 correlation clusters
Figure 5.20 Number of alive nodes comparison between distance-based protocol and ECODA with representative aggregation in the case of 16 correlation clusters


LIST OF TABLES

Table 3.1 Node’s entropy of the dataset 1
Table 3.2 Entropy correlation coefficient of each pair from the dataset 1
Table 3.3 Practical, upper bound and lower bound joint entropy (JE) of subsets of the dataset 1
Table 3.4 Clustering results of 48 nodes
Table 4.1 Number of representative nodes with distortion D = 0.05
Table 4.2 Number of representative nodes with distortion D = 0.1
Table 4.3 Number of representative nodes with distortion D = 0.15
Table 4.4 Selection of representative nodes and the actual distortion based on theoretical calculation (dataset 1 with N = 11 nodes)
Table 4.5 Selection of representative nodes and the actual distortion based on practical calculation (dataset 1 with N = 11 nodes)
Table 4.6 Entropy values of 10 nodes in the correlation region (dataset 2 with N = 10 nodes)
Table 4.7 Selection of representative nodes and the actual distortion based on theoretical calculation (dataset 2 with N = 10 nodes)
Table 4.8 Selection of representative nodes and the actual distortion based on practical calculation (dataset 2 with N = 10 nodes)
Table 5.1 Simulation parameters
Table 5.2 Simulation results in case of compression along SPT to the CH
Table 5.3 Simulation results in case of compression at the CH only
Table 5.4 Simulation results in the case of representative aggregation with compression at the CH
Table 5.5 Simulation results in the case of representative aggregation without compression at the CH
Table 5.6 Comparison between distance-based protocol and ECODA with compression aggregation in the case of 16 correlation clusters
Table 5.7 Comparison between distance-based protocol and ECODA with compression aggregation in the case of 8 correlation clusters
Table 5.8 Comparison between distance-based protocol and ECODA with compression aggregation in the case of 4 correlation clusters
Table 5.9 Comparison between distance-based protocol and ECODA with representative aggregation in the case of 16 correlation clusters


PREFACE

A Wireless Sensor Network (WSN) is a collection of sensor nodes that cooperatively monitor surrounding phenomena over large physical areas. Advances in the integration of micro-electro-mechanical systems and digital electronics, together with the development of wireless communications, have enabled the wide deployment of WSNs. Sensor nodes in WSNs are equipped with various sensing capabilities in space and time, and their higher processing capacities can satisfy requests from various modern applications. Because sensor nodes are low-cost, small, and powered by non-replaceable batteries, energy conservation is commonly recognized as the key challenge in designing and operating these networks.

In typical WSN applications, sensors must be deployed densely in space to achieve satisfactory coverage. As a result, multiple sensors record information about a single event in the sensing field, i.e., the sensed data are correlated with each other. This correlation can bring significant advantages for the development of efficient communication protocols well suited to the WSN paradigm. For example, data in a highly correlated region can be compressed with a high ratio, which reduces the amount of data sent and therefore the energy dissipated. With high enough correlation, it may not even be necessary for every sensor node in a correlation group to transmit its data to the base station. Instead, a smaller number of sensor measurements (representatives) might be adequate to communicate the event features to the base station within a certain reliability/fidelity level.

From this point of view, many studies have focused on discovering and exploiting the correlation of sensed data in WSNs. Early studies used traditional probability and statistics theory to describe the correlation among data. Nevertheless, these approaches limit correlation to a linear relation, which may not be appropriate for the general, nonlinear cases found in practice. Therefore, the information entropy approach has been considered to obtain generality. However, most studies, whether they use traditional probability and statistics theory or information entropy theory, treat correlation as a distance-dependent feature. In general, the correlation of data may be independent of external factors such as sensor location and environmental conditions, so it is better to concentrate on the information contained in the data itself rather than considering only attribute metadata such as location and time.

This thesis concentrates on discovering and exploiting the general correlation in WSNs, using information entropy theory to look at the sensed data itself. First, a novel distance-independent, entropy-based correlation model for describing correlation characteristics in a wireless sensor network is proposed. From this entropy correlation model, an energy-efficient routing protocol with correlation-based data aggregation is developed.

To discover the correlation property, an estimation of joint entropy for a data group is first established. From this estimation, a definition of the correlation group is proposed, and then the correlation model used to calculate the joint entropy of the correlated data group is developed. To exploit the correlation characteristic, two main data aggregation schemes are analyzed and evaluated using the proposed correlation model. Finally, these schemes are used to develop data aggregation routing protocols. With the proposed routing protocols, the amount of data transferred in the network is reduced, so the dissipated energy decreases.

The thesis structure is as follows:

Chapter 1: Introduction

This chapter introduces WSNs, energy conservation schemes, and data correlation problems. The main contributions of the thesis are also briefly presented in this chapter.

Chapter 2: Correlation in Wireless Sensor Network

This chapter surveys correlation models in WSNs and examines correlation from the point of view of information entropy. Then, the idea for establishing a new correlation model is described.

Chapter 3: Entropy-based Correlation Clustering

Based on the factors analyzed in Chapter 2, we propose an approximate estimation of joint entropy. From this approximation method, we define the correlation region and propose the correlation clustering scheme. We also validate the proposed estimation and correlation clustering scheme in this chapter.

Chapter 4: Entropy-based Data Aggregations

In this chapter, we exploit data correlation through data aggregation based on entropy correlation, including entropy-based representative aggregation and entropy-based data compression.

In entropy-based representative aggregation, the distortion of data in the group while some nodes are put into sleep state is evaluated using the proposed correlation


Chapter 5: Entropy Correlation based Data Aggregation Protocol (ECODA)

In this chapter, we outline an Entropy COrrelation-based Data Aggregation protocol (ECODA) that uses the clustering scheme proposed in Chapter 3 and the data aggregation schemes in Chapter 4. Simulations are also carried out to validate the effectiveness of the proposed clustering and aggregation schemes.

Chapter 6: Conclusions and Future study

This chapter summarizes the results of the thesis with careful evaluations and points out the remaining problems as future work.


1 INTRODUCTION

1.1 Overviews

People always want to know more about the physical world around them so that they can better understand their surrounding environment. Therefore, they try to collect information about the environment in as much detail as possible. Sensor nodes link the physical world to the digital world by capturing and revealing real-world phenomena and converting them into a form that can be processed, stored, and acted upon. Integrating sensors into numerous devices, machines, and environments can provide tremendous societal benefits, such as avoiding catastrophic infrastructure failures, conserving precious natural resources, increasing productivity, enhancing security, and enabling new applications such as context-aware systems and smart technologies. Advances in technologies such as very large-scale integration (VLSI), microelectromechanical systems (MEMS), and wireless communications, which make sensors smaller, lower-power, and less expensive, further contribute to the widespread use of distributed sensor systems such as wireless sensor networks.

Figure 1.1 Wireless Sensor Network

A Wireless Sensor Network (WSN) is a collection of sensor nodes that cooperatively monitor surrounding phenomena over large physical areas [1]–[4]. These sensor nodes can sense, observe or measure, gather information from the environment, and transmit the sensed data to the user based on some local decision process. A typical sensor node is composed of a sensing unit equipped with one or more sensors, a processing unit, a power unit, and a transceiver unit. The sensing unit can include various sensors, such as thermal, biological, chemical, optical, and magnetic sensors, to measure properties of the environment. A sensor node acquires data through the sensing unit, processes the sensed data in the processing unit, and finally transmits the processed data using the transceiver unit. Because their memory is limited, sensor nodes use wireless communication to transfer data to a base station, which allows them to disseminate their sensor data to remote processing, visualization, analysis, and storage systems.

Figure 1.2 Wireless Sensor Network Applications (source: https://www.researchgate.net/publication/220505150_Energy_Saving_Mechanisms_for_MAC_Protocols_in_Wireless_Sensor_Networks/figures?lo=1)

There are five types of WSNs: terrestrial WSNs, underground WSNs, underwater WSNs, multi-media WSNs, and mobile WSNs [3]. In terrestrial WSNs [1], hundreds to thousands of inexpensive wireless sensor nodes are deployed in a given area, either in an ad hoc or in a pre-planned manner. Reliable communication in a dense environment is very important in this type of WSN. Battery power in terrestrial sensor nodes is limited and may not be rechargeable; however, the nodes can be equipped with a secondary power source such as solar cells. In a terrestrial WSN, energy can be conserved with multi-hop optimal routing, short transmission range, in-network data aggregation, eliminating data redundancy, minimizing delays, and using low duty-cycle operations.

In underground WSNs [5], sensor nodes are buried underground or placed in a cave or mine to monitor underground conditions. An underground WSN is more expensive than a terrestrial WSN in terms of equipment, deployment, and maintenance. In addition, wireless communication is more difficult in the underground environment because of signal losses and high levels of attenuation.

In contrast to the dense deployment of a terrestrial WSN, underwater WSNs [6] consist of sensor nodes and vehicles deployed underwater. Because of their special working environment, underwater sensor nodes are more expensive, and fewer of them are deployed than in terrestrial WSNs. Autonomous underwater vehicles are used for exploration or for gathering data from sensor nodes. Underwater wireless communication is typically established through the transmission of acoustic waves, which suffers from limited bandwidth, long propagation delay, and signal fading. In addition, underwater sensor nodes must be able to self-configure and adapt to the harsh ocean environment.

Multi-media WSNs [7] have been developed to enable the monitoring and tracking of events using multimedia such as video, audio, and imaging. Multi-media WSNs consist of various low-cost sensor nodes equipped with cameras and microphones. They are usually deployed in a pre-planned manner to guarantee coverage. Multi-media sensor nodes interconnect with each other over a wireless connection for data retrieval, processing, correlation, and compression. Because of the high data transmission volume, challenges in multi-media WSNs include high bandwidth demand, high energy consumption, quality of service (QoS) provisioning, data processing and compression techniques, and cross-layer design.

Mobile WSNs [8] [9] consist of a collection of sensor nodes that can move on their own and interact with the physical environment. As in static WSNs, nodes


mobile nodes can reposition and organize themselves in the network. This mobility characteristic requires dynamic routing in a mobile WSN. Challenges in mobile WSNs include deployment, localization, self-organization, navigation and control, coverage, energy, maintenance, and data processing.

The features of WSNs described above give them great potential for many applications [10]–[14]. The development of WSNs was motivated by military applications [15]–[19], and WSNs were then widely used in various fields such as industrial monitoring [20]–[25], environment monitoring [26]–[33], agriculture [34]–[37], forest fire detection [38]–[40], animal tracking [41] [42], healthcare [43]–[50], security [51]–[53], home automation [54] [55], power utility distribution [56], logistics [57], and intelligent traffic systems [58].

In Vietnam, WSNs have been studied over the last two decades. The most popular topics are energy saving and load balancing in WSNs, considered with respect to base station position [59], delay constraints [60], 3D WSNs [61], WSNs with holes [62], and k-means clustering [63]. Applications of WSNs have also been widely considered, such as landslide monitoring [64], smart grids [65], target tracking [66], logistics [57], and healthcare monitoring [67].

1.2 Energy conservation in WSNs

In most cases, energy for activities in WSNs comes from a limited battery supply. In many applications, it is very hard or impossible to recharge the batteries because the nodes are deployed in difficult or hostile terrain, or because a large number of nodes are deployed in the environment [68] [69]. For these reasons, and because individual sensor nodes are expected to be low-cost, small, and powered by a non-replaceable battery, energy conservation is commonly recognized as the key challenge in designing and operating WSNs.

In recent years, numerous energy-saving approaches have been proposed [70] [71]. They can mainly be classified into five categories: radio optimization, data reduction, sleep/wake-up schemes, energy-efficient routing, and charging solutions. The following subsections present these five categories of energy-saving approaches.

1.2.1 Radio optimization

In radio optimization, radio parameters such as coding and modulation schemes, transmission power, and antenna direction are optimized to save energy. Radio optimization approaches can be further divided into four schemes: modulation optimization, cooperative communication, transmission power control, and directional antennas.

Modulation optimization tries to find the modulation parameters that result in minimum radio energy consumption. A good trade-off among the constellation size, the information rate, the transmission time, the distance between nodes, and the noise is sought [72] [73].

Cooperative communication schemes try to improve the quality of the received signal by having several single-antenna devices collaborate to create a virtual multi-antenna transmitter [74] [75].

Transmission power control schemes enhance energy efficiency at the physical layer by adjusting the radio transmission power. The idea is that a shorter communication range between nodes requires less radio power [76] [77]. Another idea is that a node with higher remaining energy may increase its transmission power, which enables other nodes to decrease theirs [78].

Directional antenna schemes allow the signal to be sent and received in one direction at a time, which improves transmission range and throughput [79] [80]. To take advantage of directional antennas, new MAC protocols have been proposed in [81] [82]. In addition, some specific problems that have to be considered are discussed in [83].

1.2.2 Sleep/wake-up schemes

Sleep/wake-up schemes adapt node activity to save energy by putting the radio in sleep mode. The main idea of this approach is duty cycling. A duty cycling scheme schedules the node's radio state according to network activity to minimize idle listening and favor the sleep mode. Such schemes are the most energy-efficient but suffer from sleep latency. In some cases, a node cannot broadcast information to all its neighbors because they are not active at the same time. In addition, parameters such as the listening/sleeping period, preamble length, and slot time must be fixed carefully because they strongly affect system performance. A detailed survey of duty cycling can be found in [84].
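As a rough illustration of why duty cycling saves energy, the sketch below computes the average radio power for a given duty cycle. The function name and the power figures (60 mW when active, 0.03 mW when asleep) are hypothetical values chosen for illustration; they are not taken from this thesis or from any particular radio.

```python
def average_power_mw(duty_cycle, active_mw, sleep_mw):
    """Average radio power when the radio is active for a fraction
    `duty_cycle` of the time and asleep for the rest."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_mw + (1.0 - duty_cycle) * sleep_mw


# Hypothetical radio: 60 mW while listening/transmitting, 0.03 mW asleep.
always_on = average_power_mw(1.0, 60.0, 0.03)
one_percent = average_power_mw(0.01, 60.0, 0.03)
saving_factor = always_on / one_percent
```

With these assumed figures, a 1% duty cycle lowers the average radio power by a factor of roughly 95, at the cost of the sleep latency mentioned above.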

1.2.3 Energy efficient routing

Routing is also a burden that can seriously drain energy reserves. In general, there are various routing paradigms. In this research area, some main paradigms are considered, such as cluster architecture, energy as a routing metric, multipath routing,


Energy can also be considered as a metric in the path setup phase to extend the lifetime of sensor networks. In this case, routing algorithms do not focus only on the shortest paths but can also select the next hop based on its residual energy [88].
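One simple way to picture energy as a routing metric is a next-hop score that weighs residual energy against hop count. The sketch below is only an illustration under assumed names (`pick_next_hop`, `alpha`, and the neighbor fields are hypothetical); it is not the algorithm of [88] or of this thesis.

```python
def pick_next_hop(neighbors, alpha=0.5):
    """Choose the neighbor with the best trade-off between residual
    energy (higher is better) and hop count to the sink (lower is better).

    alpha = 0 reduces to pure shortest-path selection;
    alpha = 1 considers residual energy only.
    """
    max_hops = max(n["hops_to_sink"] for n in neighbors)

    def score(n):
        # Hop count is normalised to [0, 1] so both terms are comparable.
        return (alpha * n["residual_energy"]
                - (1.0 - alpha) * n["hops_to_sink"] / max_hops)

    return max(neighbors, key=score)


# Hypothetical neighbor table: residual energy as a fraction of a full battery.
neighbors = [
    {"id": "A", "hops_to_sink": 2, "residual_energy": 0.1},
    {"id": "B", "hops_to_sink": 3, "residual_energy": 0.9},
]
shortest = pick_next_hop(neighbors, alpha=0.0)
balanced = pick_next_hop(neighbors, alpha=0.5)
```

Here pure shortest-path selection picks the nearly depleted node A, while the energy-aware score prefers node B, which is one hop farther but has far more energy left.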

Multipath routing is, in general, more complex than single-path routing. However, single-path routing can rapidly drain the energy of the nodes on the selected path. Multipath routing can balance energy among nodes by alternating the forwarding nodes [89] [90]. Further surveys of multipath routing protocols can be found in [91].

The premature depletion of nodes in a region can create energy holes or partition the network. This situation can be avoided by optimizing node placement or adding relay nodes with enhanced capabilities. This helps to improve energy balance, avoid sensor hot-spots, and ensure coverage [92]–[94].

1.2.4 Data reduction

Energy consumption depends on data transmission. Thus, reducing the amount of data to be delivered can save energy. Data reduction approaches can be divided into three types: data aggregation, adaptive sampling, and network coding.

Data aggregation techniques route data packets so that they can be combined, exploiting the extracted features and statistics of the datasets coming from different sensor nodes. There are several aggregation techniques, with different aggregation functions and for different application requirements. The first type of aggregation function extracts the maximum, minimum, or average value of the aggregated data [95] [96]. This reduces the amount of data communicated in the network and therefore the power consumption. However, this technique can lose much of the original structure of the data.
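The trade-off of this first aggregation type can be seen in a few lines. The sketch below summarises a set of readings into minimum, maximum, and mean; the readings and the function name are made up for illustration.

```python
from statistics import mean


def aggregate(readings):
    """Summarise raw readings as (minimum, maximum, mean)."""
    return min(readings), max(readings), mean(readings)


# Hypothetical temperatures reported by five nodes in one cluster.
readings = [21.5, 22.0, 21.8, 35.0, 22.1]
summary = aggregate(readings)

values_sent_without_aggregation = len(readings)  # one value per node
values_sent_with_aggregation = len(summary)      # three values per cluster
```

The cluster forwards three numbers instead of five, but the summary no longer says which node reported the outlier of 35.0 or how the other readings were distributed, which is the loss of structure noted above.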

The second type of aggregation technique is data compression. Data compression techniques are further divided into distributed data compression [97] [98] and local data compression [99] [100]. Distributed data compression achieves the best compression, but it is much more complicated than local data compression, which achieves a lower compression rate. A detailed survey of data compression in WSNs can be found in [101]. It is important to note that data compression techniques are only effective with correlated data. Therefore, correlation is usually required when using these techniques.
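The link between correlation and compressibility can be illustrated with Shannon entropy: the joint entropy of two correlated sources is smaller than the sum of their individual entropies, and that gap is what joint compression can exploit. The sketch below uses made-up, quantised temperature readings and empirical frequencies; the helper names are illustrative, and this is not the joint entropy estimation method developed later in this thesis.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Empirical Shannon entropy, in bits, of a sequence of symbols."""
    n = len(symbols)
    return -sum(c / n * log2(c / n) for c in Counter(symbols).values())


def joint_entropy(xs, ys):
    """Empirical joint entropy of two equally long sequences."""
    return entropy(list(zip(xs, ys)))


# Hypothetical quantised temperatures from three nodes.
node_a = [20, 20, 21, 21, 22, 22, 23, 23]
node_b = list(node_a)                        # perfectly correlated with node_a
node_c = [20, 21, 22, 23, 20, 21, 22, 23]    # only partly related to node_a

separate_bits = entropy(node_a) + entropy(node_b)   # cost of coding each alone
joint_ab = joint_entropy(node_a, node_b)            # equals entropy(node_a)
joint_ac = joint_entropy(node_a, node_c)            # larger: less redundancy
```

In this toy case each node alone carries 2 bits per sample. Coding `node_a` and `node_b` together still needs only 2 bits, while `node_a` and `node_c` together need 3 bits, so the weaker the correlation, the less there is to gain from compression.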

The third type of aggregation technique is the representative type [102], in which some nodes are chosen to represent a group of nodes. The other nodes in the group can be put to sleep to save energy. The number of sleeping nodes, which affects the power consumption, is determined by a specified distortion. Like data compression, these techniques require correlated data.

Adaptive sampling techniques adjust the sampling rate at each sensor while ensuring that application needs are met in terms of coverage or information precision by exploiting spatio-temporal correlations between data. By reducing the number of samples, the amount of transmitted data is reduced, which saves node energy. Temporal analysis of sensed data is used in [103], and spatial correlation is used in [104]. More details about adaptive sampling can be found in [105].

Network coding is used to reduce the traffic in broadcast scenarios by sending a linear combination of several data items instead of a copy of each item. At the destination nodes, the data can be decoded by solving the linear equations [84] [106]. Network coding exploits the trade-off between computation and communication, since communication is slower than computation and consumes more energy.
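As a minimal sketch of the idea, the simplest linear combination is a bitwise XOR of two equal-length packets (addition over GF(2)). The function name `xor_bytes` and the packet contents are illustrative assumptions; the schemes in [84] [106] may use other finite fields and coefficient choices.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Combine two equal-length packets with bitwise XOR (addition over GF(2))."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Two hypothetical packets that a relay would otherwise forward separately.
packet_a = b"\x17\x2a"
packet_b = b"\x05\xff"

# The relay broadcasts one coded packet instead of two copies.
coded = xor_bytes(packet_a, packet_b)

# A destination that already holds packet_a solves the linear equation for packet_b.
recovered_b = xor_bytes(coded, packet_a)
```

A node that already knows one of the two packets recovers the other with a single XOR, so one transmission replaces two at the cost of a small computation.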

1.2.5 Charging solution

Several recent studies address energy harvesting and wireless charging techniques for WSNs as promising solutions, because they allow nodes to recharge without human intervention.

Energy harvesting techniques have been developed to enable sensors to harvest energy from their surrounding environment, such as solar, wind, or kinetic energy [107]. Energy harvesting schemes often require energy prediction to manage the available power efficiently. It is important to note that, because the energy remaining between two harvesting opportunities is limited, energy-saving mechanisms still need to be implemented.

The breakthrough in wireless power transfer is expected to enable wireless charging capability for WSNs. Wireless charging can be done in two ways: electromagnetic radiation and magnetic resonance coupling. Omnidirectional electromagnetic radiation technology is only applicable to nodes with ultra-low power requirements and low sensing activity [108]. The reason is that electromagnetic waves suffer from a rapid drop in power efficiency over distance, and active radiation technology may pose safety concerns to humans. In contrast, magnetic resonance coupling appears to be the most promising technique, with higher efficiency and better safety. However, the charging range is still a big concern [108].

Data correlation and energy conservation in WSNs

In typical WSN applications, sensors must be densely deployed to achieve satisfactory coverage [1]. Consequently, multiple sensors record information about a single event in the sensing field, i.e., their sensed data strongly depend on each other. For example, temperature sensors in the same room record the same temperature information, and several cameras that monitor the same area record many frames with similar information. In other words, they are correlated with each other. The existence of correlation brings significant potential advantages for the development of efficient communication protocols well suited to the WSN paradigm. For example, data in a highly correlated region can be compressed with a high ratio, so the amount of sent data is reduced [109]. With high enough correlation, it may not even be necessary for every sensor node in a correlation group to transmit its data to the base station; instead, a smaller number of sensor measurements might be adequate to communicate the event features to the base station within a certain reliability/fidelity level [110].

In addition, in WSNs, the power breakdown heavily depends on the specific node. However, the following remarks generally hold [109] [111]:

• The radio energy consumption is of the same order of magnitude in the reception, transmission, and idle states, while the power consumption drops by at least one order of magnitude in the sleep state. Therefore, the radio should be put to sleep (or turned off) whenever possible.

• Communication consumes much more energy than computation. It has been shown that transmitting one bit may consume as much energy as executing a few thousand instructions [112]. Therefore, communication should be traded for computation.

Data correlation allows us to reduce the amount of transferred data, or even to put some sensor nodes to sleep. Thus, it can help WSNs conserve energy significantly.


Problem statements and contributions

The main problems in this research are “How to recognize the correlation in a dataset by looking at the data itself, and how to exploit the correlation characteristic for energy conservation in WSNs”. In this research, we focus on WSNs working in a high correlation environment. A high correlation environment can be divided into groups called correlation regions, in which measured data strongly depend on each other. By clustering sensor nodes into correlation regions, data aggregation can be done to conserve energy in WSNs. In this thesis, we focus on two data aggregation schemes: data compression and representative aggregation. The main contributions of the thesis are:

Developing an entropy correlation clustering algorithm and entropy correlation model to describe the correlation characteristics of a correlation cluster

This algorithm can divide a correlation environment into several correlation regions using the entropy values of the measured data and the entropy correlation coefficients of measured data pairs in the environment. This algorithm uses only the data itself and does not depend on distance information. The correlation model describes the relationship between the joint entropy of a dataset and the number of data series in the dataset, taking the entropy correlation coefficient of the data into account.

Analyzing and evaluating the impact of the correlation characteristic to data aggregation schemes

With the proposed correlation clustering and model, it is necessary to evaluate their impact on data aggregation schemes. With data compression aggregation, several compression schemes and network structures are considered to find the most appropriate compression routing for WSNs. With representative aggregation, a distortion function that measures the required ratio of data loss is used. The number of representative nodes is then evaluated, and a representative node selection algorithm is proposed.

Developing an entropy correlation-based data aggregation protocol for WSNs to exploit the correlation characteristic of the sensed environment

The developed protocol includes two phases: one phase for data collection, to identify the correlation characteristic, and one phase for implementing data aggregation. This protocol uses the proposed clustering algorithm and data aggregation schemes. In addition, the design of the protocol is proposed.


2 CORRELATION IN WIRELESS SENSOR NETWORK

As mentioned in chapter 1, the correlation characteristic has significant potential advantages for the development of energy-efficient communication protocols for WSNs. To evaluate and exploit the correlation characteristic, it is necessary to build a correlation model. This chapter surveys the existing correlation models. Based on the advantages and limitations of the previous correlation models, the methodology for developing a new correlation model is then pointed out.

Correlation model survey

Correlation represents the relationship between quantitative or categorical variables. In other words, it is a measure of how things are related. Data correlation is a measure of how data items are related to each other.

To exploit the correlation in WSNs, it is necessary to recognize the correlation among data in the network by establishing correlation models. There have been many research efforts to study correlation models in WSNs. In [111], correlated nodes are assumed to observe the same source 𝑆, and the observed data 𝑋𝑖(𝑡) at the 𝑖-th node is the sum of a correlated version of the source, 𝑆𝑖(𝑡), and observation noise 𝑁𝑖(𝑡):

𝑋𝑖(𝑡) = 𝑆𝑖(𝑡) + 𝑁𝑖(𝑡) (2.1)

The correlation model is the covariance function 𝐾𝜗 (correlation coefficient 𝜌), which is chosen to be distance-dependent and can be classified into four groups, including:

Spherical:

𝐾𝜗 = {1 −

32


as described in [112]–[114]

Some papers also build correlation models in which the correlation coefficient is a function of the distance between nodes [115] [116]. In [115], the compression rate is calculated based on the correlation coefficient 𝜌 between two nodes, and the correlation coefficient is defined to be inversely proportional to their Euclidean distance 𝑑.

Magnitude 𝑚-dissimilarity: Two time series 𝑋 = {𝑥1, 𝑥2, … , 𝑥𝑞} and 𝑌 = {𝑦1, 𝑦2, … , 𝑦𝑞} are magnitude 𝑚-dissimilar if there is an 𝑖 (1 ≤ 𝑖 ≤ 𝑞) such that


$$d(v_i, v_j) = \sum_{k=1}^{q} |x_k - y_k|. \qquad (2.9)$$

The smaller the Manhattan distance is, the more similar the two vectors are. Manhattan distance is also used to define dissimilarity in [119].
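The Manhattan distance in (2.9) can be sketched in a few lines. The function name and the sample readings below are hypothetical and serve only as an illustration.

```python
def manhattan_distance(x, y):
    """Sum of absolute differences between two equal-length reading vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of readings")
    return sum(abs(a - b) for a, b in zip(x, y))


# Hypothetical temperature readings from two nodes over q = 3 time steps.
node_i = [20.1, 20.4, 20.9]
node_j = [20.0, 20.6, 21.0]

distance = manhattan_distance(node_i, node_j)  # roughly 0.4
```

A small distance such as this one would mark the two nodes as similar under a distance threshold chosen by the application.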

Some research efforts define the correlation model in different ways, such as a linear predictive model [120], node weight [121], or data density correlation degree [102]. In [120], a set of sensor nodes is a correlation set if a reading at a node can be predicted using a linear combination of readings from the other nodes. Let 𝑆 = {𝑠1, 𝑠2, … , 𝑠𝐿} be a set of sensor nodes. Then, the predicted value of a node 𝑠, 𝑠′[𝑘], can be presented as a linear combination of 𝑠1[𝑘], 𝑠2[𝑘], … , 𝑠𝐿[𝑘] for all 𝑘:


$$E = \sum_{k=1}^{K} \left(s[k] - s'[k]\right)^2. \qquad (2.11)$$

Weighting coefficients are determined such that 𝐸 is minimized.

In [121], the correlation of a node with its neighbors is evaluated using a correlated weight. The definition of the Spatial Correlated Weight considers the average spatial distance deviation between each node and its neighbors within a predefined communication range. For any node 𝑖, 𝑁(𝑖) is the set of its neighbor nodes. Let 𝑗 be a neighbor node of node 𝑖, 𝑗 ∈ 𝑁(𝑖). The correlated weight 𝑤𝑖 of node 𝑖 is defined as:

In [102], the data density correlation degree of a node is used to evaluate the correlation. Let sensor node 𝑣 have 𝑛 neighboring sensor nodes within the circle defined by the communication radius of 𝑣. Those neighboring sensor nodes are called 𝑣1, 𝑣2, … , 𝑣𝑛. The data object of 𝑣 is 𝐷, and the data objects of its neighboring sensor nodes are 𝐷1, 𝐷2, … , 𝐷𝑛, respectively. Among these 𝑛 data objects, there are 𝑁 data objects whose distances to 𝐷 are less than 𝜀, and 𝑚𝑖𝑛𝑃𝑡𝑠 ≤ 𝑁 ≤ 𝑛. Then the data density correlation degree of sensor node 𝑣 to the sensor nodes whose data objects are in the 𝜀-neighborhood of 𝐷 is calculated as follows:

in which 𝑚𝑖𝑛𝑃𝑡𝑠 is the amount threshold; 𝜀 is the data threshold; 𝑑∆ is the distance between 𝐷 and the data center of the data objects which are in the 𝜀-neighborhood of


$$H_n = H_{n-1} + \left[1 - \frac{1}{\frac{d_i}{c} + 1}\right] H_1, \qquad (2.14)$$

in which 𝑐 is a constant that characterizes the extent of spatial correlation in the data.

In the simple case when all nodes are located on a line equally spaced by a distance 𝑑, the joint entropy 𝐻𝑛 of a set of 𝑛 nodes {𝑆1, 𝑆2, … , 𝑆𝑛} is calculated as:

$$H_n = H_1 + (n - 1)\left[1 - \frac{1}{\frac{d}{c} + 1}\right] H_1.$$
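The recursion (2.14) and the equal-spacing formula can be compared with a short sketch. The function names are hypothetical, and treating 𝑑𝑖 as the distance associated with each newly added node is an assumption, since its definition is not shown in this excerpt.

```python
def entropy_increment(h1: float, d: float, c: float) -> float:
    """Extra entropy contributed by one added node at distance d (c > 0)."""
    return (1.0 - 1.0 / (d / c + 1.0)) * h1


def joint_entropy_recursive(h1: float, distances, c: float) -> float:
    """Apply the recursion once per added node; distances holds one d_i per added node."""
    h = h1
    for d in distances:
        h += entropy_increment(h1, d, c)
    return h


def joint_entropy_equal_spacing(h1: float, n: int, d: float, c: float) -> float:
    """Closed form for n nodes on a line, equally spaced by d."""
    return h1 + (n - 1) * entropy_increment(h1, d, c)


# Illustrative values: single-node entropy of 2 bits, spacing d = 1, constant c = 1.
h_closed = joint_entropy_equal_spacing(h1=2.0, n=4, d=1.0, c=1.0)
h_recursive = joint_entropy_recursive(h1=2.0, distances=[1.0, 1.0, 1.0], c=1.0)
```

With these values each added node contributes one extra bit, so both forms give 5 bits. As 𝑑 approaches zero the increment vanishes (fully redundant nodes), and for large 𝑑 it approaches 𝐻1 (nearly independent nodes).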

An entropy-based correlation model is also used in [125]. There, the entropy correlation coefficient is chosen to be the Pearson linear correlation coefficient, which reduces the computational complexity but also reduces the generality of using entropy.


It can be seen that the correlation models in the above works are all based on the distance between nodes: the smaller the distance between nodes, the higher their correlation. However, this assumption may not always be true because of physical barriers between nodes. For example, in Figure 2.1, some sensor nodes are placed in two rooms next to each other, in which room 1 is equipped with an air conditioner while room 2 is not. Nodes A and B are placed close to each other, but they are in different rooms with independent conditions, so their sensed data may be independent of each other. The sensed data at node A is correlated with that at node C because they are placed in the same room with the same conditions, even though the distance between them is larger than the distance between nodes A and B. Therefore, it is necessary to establish a correlation model that is independent of the distance between sensor nodes. In addition, when observing the readings over time of 54 sensors deployed at Intel Berkeley Lab [127] [128], it is found that the correlation of data may be independent of external factors such as sensor location and environmental conditions. Therefore, it is better to look at the information contained in the data itself rather than considering only attribute meta-data such as location and time [126].

Figure 2.1 The layout of sensor nodes in an environment with two areas under different conditions

To address the above problem, in [129], entropy is calculated from real data, and then the joint entropy 𝐻𝑛 of a set of 𝑛 nodes {𝑆1, 𝑆2, … , 𝑆𝑛} is approximated by a function in the set as follows:

in which 𝛼 and 𝛽 are constants determined from real data. The advantage of this model is that it is distance-independent, but the disadvantage is that the model can only be obtained once the correlation set has been established. The calculation for determining the correlation group using this model is very complicated with huge


Most of the correlation models are distance-based, an assumption that may not hold in some cases, such as the examples shown in [126]. In that paper, the authors found that sensors in similar environmental conditions that are not necessarily spatially correlated can report correlated data, and that the correlation of data may be independent of external factors such as sensor location and environmental conditions. Therefore, it is necessary to develop a model that is distance-independent and practically applicable to Wireless Sensor Networks.

A correlation model can be established mainly by using traditional probability and statistics theory or by using information entropy theory. Correlation from the point of view of information entropy is more general, but also more challenging. With the purpose of finding a novel and general correlation model, this thesis uses information entropy theory to discover the correlation characteristic by looking at the data itself.

Information entropy theory

2.2.1 Overview

Information entropy, or Shannon’s entropy, is a foundational concept of information theory. Information entropy quantifies the amount of information in a variable, thus providing the foundation for a theory around the notion of information.

At a conceptual level, information entropy is simply the "amount of information" in a variable. More intuitively, it corresponds to the amount of storage (e.g., number of bits) required to store the variable, which can be understood to correspond to the amount of information in that variable. However, the calculation of this number of bits, and therefore of the amount of information in a variable, is more involved than might appear at first sight. It is not simply the number of bits required to represent all the different values that a variable might take on, which is just the raw data. For example, a variable may take on any of 8 different values. In digital storage, 3 bits are enough to uniquely represent the 8 different values, and thus the variable can be stored in 3 bits.

However, this is an upper limit on the required storage; it is the amount of storage required to store the raw “data” of the variable, not the “information” in that data. Less storage might be enough to store the information, depending on the process by which the variable takes on different values. For example, suppose a coin is completely biased and always comes up heads when tossed. Then the random variable representing the outcome of the coin toss has probability 1 of coming up heads (in other words, it is a constant). It is not necessary to store that variable, as it can be trivially guessed at any time. Thus, the amount of information in that variable is zero. On the other hand, if we have a fair coin with equal chances of coming up heads or tails, then we can guess the outcome of a toss with only 50% accuracy (probability 0.5), so it is necessary to store the actual outcome of the coin toss to know its value with better than 50% accuracy. The amount of information in this second random variable is much higher than in the first case.

With a more sophisticated representation of the variable, if a variable is easier to guess, we can use that fact to reduce the number of bits needed to store it. If the value of the variable is easier to guess, the variable is less “surprising” and contains less information. Thus, an alternative way of considering entropy is as a measure of the “compressibility” of the data, i.e., a compression metric that expresses how much the raw data of a variable can be compressed without losing the information in the variable.
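The coin and 8-value examples above can be checked numerically with the standard Shannon entropy, computed in bits as the negative sum of 𝑝 log2 𝑝 over all outcomes. This is a minimal sketch; the function name `shannon_entropy` is illustrative.

```python
import math


def shannon_entropy(probabilities) -> float:
    """Entropy in bits of a discrete distribution; zero-probability outcomes add nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


biased_coin = shannon_entropy([1.0, 0.0])   # always heads: nothing to store
fair_coin = shannon_entropy([0.5, 0.5])     # one full bit per toss
uniform_eight = shannon_entropy([1 / 8] * 8)  # reaches the 3-bit upper limit
skewed_eight = shannon_entropy([0.93] + [0.01] * 7)  # same 8 values, far less information
```

The last case illustrates the compressibility view: the variable still takes 8 values, but because one of them dominates, its entropy is well below 3 bits.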

2.2.2 Entropy concept

In information theory, the entropy of a random variable is a function that characterizes the “unpredictability” or “uncertainty” of the random variable [130]. In other words, the more uncertain or unpredictable an event is, the more information it contains and the larger its entropy is.

If a random variable 𝑋 takes on values in a set 𝑋 = {𝑥1, 𝑥2, … , 𝑥𝑛}, and is defined by a probability distribution 𝑃(𝑋), then the entropy 𝐻(𝑋) of the discrete random variable 𝑋 is written as:


$$H(X, Y) = -\sum_{x \in X} \sum_{y \in Y} P(x, y) \log_2 P(x, y),$$

in which 𝑥 and 𝑦 are particular values of 𝑋 and 𝑌.
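For sampled sensor data, the joint distribution is usually unknown and has to be estimated. The sketch below assumes a plug-in estimate that replaces 𝑃(𝑥, 𝑦) with relative frequencies of discrete (already quantized) paired readings; this estimator and the function name are assumptions for illustration, not necessarily the procedure used in the thesis.

```python
import math
from collections import Counter


def joint_entropy(xs, ys) -> float:
    """Empirical joint entropy (bits) of two equal-length discrete series."""
    if len(xs) != len(ys):
        raise ValueError("series must have the same length")
    total = len(xs)
    pair_counts = Counter(zip(xs, ys))  # frequency of each (x, y) pair
    return sum(-(c / total) * math.log2(c / total) for c in pair_counts.values())


# Unrelated series: all four (x, y) pairs are equally frequent.
h_independent = joint_entropy([0, 0, 1, 1], [0, 1, 0, 1])
# Identical series: the second adds no new information.
h_identical = joint_entropy([0, 0, 1, 1], [0, 0, 1, 1])
```

Here each series alone has 1 bit of entropy; the unrelated pair reaches 2 bits, while the identical pair stays at 1 bit.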

For more than two random variables 𝑋1, 𝑋2, … , 𝑋𝑛, the joint entropy expands to:

Correlation and entropy

2.3.1 Correlation of two variables

Correlation is a measure of the relation/dependence between variables. In entropy theory [130], the relation between two variables can be described by the concepts of mutual information and correlation coefficient.

2.3.1.1 Mutual information

Figure 2.2 The relations between entropies, joint entropy, and mutual information

Figure 2.2 describes the relationship between the entropies of the random variables, their joint entropy, and their mutual information. The relation between entropy and joint entropy is shown by inequality (2.22), with equality if 𝑋 and 𝑌 are independent:

$$H(X, Y) \le H(X) + H(Y). \qquad (2.22)$$


The above inequality shows that when 𝑋 and 𝑌 are independent, the joint entropy of the two random variables equals the sum of the entropies of both variables. Otherwise, the joint entropy of the two variables is always smaller than the sum of their entropies. Knowing the joint entropy of random variables tells us how much knowing some variables reduces uncertainty about the others. For given individual entropies, the smaller the joint entropy is, the higher the correlation of the random variables is.

Another metric used for measuring the mutual dependence between two variables is mutual information. While the entropy of a random variable measures the information about the event itself, mutual information is a quantity that measures the relationship between two random variables that are sampled simultaneously.

For example, let 𝑋 represent the weather and 𝑌 represent the humidity on a given day in a specific city. The value of 𝑋 tells us something about the value of 𝑌 and vice versa; for instance, if the weather is rainy, then the humidity is very likely to be high. That is, these variables share mutual information. On the other hand, if 𝑋 represents the weather on one day and 𝑍 represents the humidity on a different day, then 𝑋 and 𝑍 share no mutual information, because the weather of one day does not contain any information about the humidity of the other day.

In general, mutual information measures how much information one random variable conveys, on average, about another. The mutual information between two variables is 0 if and only if the two variables are statistically independent. The formal definition of the mutual information of two random variables 𝑋 and 𝑌, whose joint distribution is 𝑃(𝑋, 𝑌), is given by:

$$I(X, Y) = \sum_{x \in X} \sum_{y \in Y} P(x, y) \log_2 \frac{P(x, y)}{P(x)P(y)}.$$
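A minimal sketch of the standard mutual information definition on sampled data follows. It assumes discrete readings and estimates all probabilities by relative frequencies; the function name and the toy series are illustrative.

```python
import math
from collections import Counter


def mutual_information(xs, ys) -> float:
    """Empirical mutual information (bits) of two equal-length discrete series."""
    if len(xs) != len(ys):
        raise ValueError("series must have the same length")
    n = len(xs)
    count_xy = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    return sum(
        (c / n) * math.log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in count_xy.items()
    )


# Statistically independent series share no information.
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
# Identical series share everything: the value equals the entropy of either one.
mi_identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
```

The first case gives 0 bits and the second gives 1 bit, matching the statement that mutual information is zero exactly when the variables are independent.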

2.3.1.2 Entropy correlation coefficient

Mutual information can be used to measure the correlation between two sets of data: the larger the mutual information of two variables, the stronger the correlation between them. However, it is difficult to compare the correlation level of two different pairs of random variables using mutual information or joint entropy, because their values depend on the entropy of each individual variable in the pair. To overcome this problem, a normalized measure of mutual information called the entropy correlation coefficient, introduced in [124] and [131], is used to evaluate the correlation.

The relation between mutual information and entropy is given by:

𝐼(𝑋, 𝑌) = 𝐻(𝑋) + 𝐻(𝑌) − 𝐻(𝑋, 𝑌), (2.24)

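or can be written as:

$$\frac{I(X, Y)}{H(X) + H(Y)} = 1 - \frac{H(X, Y)}{H(X) + H(Y)}. \qquad (2.25)$$

If two random variables are independent of each other:

$$H(X, Y) = H(X) + H(Y). \qquad (2.26)$$

Substituting (2.26) into (2.25), we get:

$$\frac{I(X, Y)}{H(X) + H(Y)} = 1 - \frac{H(X, Y)}{H(X) + H(Y)} = 1 - 1 = 0. \qquad (2.27)$$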

If two random variables are completely dependent on each other:

$$H(X, Y) = H(X) = H(Y) = \frac{1}{2}\bigl(H(X) + H(Y)\bigr). \qquad (2.28)$$

Substituting (2.28) into (2.25), we get:

$$\frac{I(X, Y)}{H(X) + H(Y)} = 1 - \frac{H(X, Y)}{H(X) + H(Y)} = 1 - \frac{1}{2} = \frac{1}{2}. \qquad (2.29)$$

From the above calculations, we get the inequality:

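$$0 \le \frac{I(X, Y)}{H(X) + H(Y)} \le \frac{1}{2}.$$

Scaling this ratio to the range from 0 to 1 gives:

$$\rho(X, Y) = 2\,\frac{I(X, Y)}{H(X) + H(Y)} = 2 - 2\,\frac{H(X, Y)}{H(X) + H(Y)}.$$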

𝜌 takes a value in the range from 0 to 1 and specifies the correlation level between two random variables 𝑋 and 𝑌. When 𝜌 = 0, the two random variables are independent of each other. On the other hand, when 𝜌 = 1, the two random variables are completely dependent on each other.

The coefficient 𝜌(𝑋, 𝑌) is called the entropy correlation coefficient of the two random variables 𝑋 and 𝑌, and it is related to the mutual information 𝐼(𝑋, 𝑌) or the joint entropy 𝐻(𝑋, 𝑌). The entropy correlation coefficient expresses the relative relationship within a pair of data series, independent of the values of the individual entropies, and therefore it can be used to compare the correlation levels of two pairs of data.

The entropy correlation coefficient 𝜌 varies from 0 to 1, depending on the correlation between the two nodes. The larger the value of 𝜌 is, the higher the correlation is. If 𝜌 = 1 (in case 𝐻(𝑋) = 𝐻(𝑌) = 𝐻(𝑋, 𝑌)), the two sets of data totally depend on each other. If 𝜌 = 0 (in case 𝐻(𝑋, 𝑌) = 𝐻(𝑋) + 𝐻(𝑌)), they are independent.
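The two limiting cases can be reproduced with a short sketch. It assumes the normalized form 𝜌 = 2 − 2𝐻(𝑋, 𝑌)/(𝐻(𝑋) + 𝐻(𝑌)), which is consistent with both cases stated above, and it estimates entropies from relative frequencies of discrete readings; the function names are illustrative.

```python
import math
from collections import Counter


def entropy(values) -> float:
    """Empirical entropy (bits) of a sequence of hashable symbols."""
    n = len(values)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(values).values())


def entropy_correlation_coefficient(xs, ys) -> float:
    """rho = 2 - 2 * H(X, Y) / (H(X) + H(Y)), a value between 0 and 1."""
    h_x = entropy(xs)
    h_y = entropy(ys)
    h_xy = entropy(list(zip(xs, ys)))  # joint entropy via paired symbols
    if h_x + h_y == 0:
        raise ValueError("coefficient is undefined when both series are constant")
    return 2.0 - 2.0 * h_xy / (h_x + h_y)


rho_identical = entropy_correlation_coefficient([0, 0, 1, 1], [0, 0, 1, 1])
rho_independent = entropy_correlation_coefficient([0, 0, 1, 1], [0, 1, 0, 1])
```

Identical series give 𝜌 = 1 and unrelated series give 𝜌 = 0. Because the value is normalized by the individual entropies, coefficients of different pairs can be compared directly.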

2.3.2 Correlation of more than two variables

Information entropy theory directly provides correlation measures only for two variables. For more than two variables, measures can be extended from the mutual information or entropy correlation coefficient concepts, but this is inefficient and of little practical use.

When working with correlation, two requirements must be addressed. The first requirement is how to recognize the correlation between variables, and the second is how to evaluate the correlation level. For the correlation of two variables, the entropy correlation coefficient can be used for both recognition and evaluation of correlation. For the correlation of more than two variables, there has not been any efficient way of using entropy theory directly to solve these two


in the group. Joint entropy is never larger than the total entropy of the individual variables. The stronger the correlation among variables in the group, the larger the difference between the joint entropy and the total entropy of the individual variables. However, it is difficult to use the comparison between the joint entropy and the total entropy directly. Instead, the increase in the joint entropy of a group when one variable is added to the group is considered. If the added variable is highly correlated with the variables in the group, i.e., it strongly depends on them, the increase in the joint entropy of the group is small.

Figure 2.3 Relation between correlation and joint entropy

In other words, only a small amount of additional information is needed to specify the added variable. Therefore, if we consider the joint entropy as a function of the number of variables in a group, we find that the growth of the joint entropy gradually slows and approaches zero. That is, the joint entropy approaches a “saturation” state as the number of considered variables increases. Nodes with higher correlation approach the saturation state faster. This phenomenon is illustrated in Figure 2.3 and was discovered by the authors of [129]. The speed of approaching the “saturation” state can therefore be used to characterize the correlation level.
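The saturation behaviour can be illustrated on toy data. The sketch below assumes discrete readings and a frequency-based joint entropy estimate; the function names and the two small datasets are hypothetical and are not taken from [129].

```python
import math
from collections import Counter


def joint_entropy(series) -> float:
    """Empirical joint entropy (bits) of several equal-length discrete series."""
    rows = list(zip(*series))  # one tuple of simultaneous readings per time step
    n = len(rows)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(rows).values())


def entropy_growth(series):
    """Joint entropy of the first 1, 2, ..., n series, in the given order."""
    return [joint_entropy(series[:k]) for k in range(1, len(series) + 1)]


base = [0, 1, 2, 3, 0, 1, 2, 3]
# Every added series is a function of the first one (strong correlation).
correlated = [base, [v + 10 for v in base], [v % 2 for v in base]]
# Three series that vary independently of each other.
independent = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]

growth_correlated = entropy_growth(correlated)    # flat after the first series
growth_independent = entropy_growth(independent)  # grows by one bit per series
```

In the correlated group the curve is already saturated after the first series, while in the independent group every added series contributes its full entropy.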

However, it is difficult to use this joint entropy characteristic to recognize a correlation group. To find a correlation subgroup within a considered group, the relation between joint entropy and the number of variables must be established and checked for all possible subgroups. Therefore, we need to find an efficient way to use the joint entropy characteristic of a correlation group to recognize the correlation and evaluate the correlation level.

Conclusions

In this chapter, existing correlation models in WSNs, covering several model types, have been surveyed. The survey shows that models based on traditional probability and statistics theory only describe linear correlation, whereas information entropy-based correlation models can capture general correlation. Most of the correlation models are distance-dependent. However, it is necessary to investigate the correlation characteristic by concentrating on the data itself instead of using distance information. When working with the data itself, one can use the relationship between joint entropy and the number of considered variables to recognize the correlation characteristic. However, among the models discussed above, there is no efficient way to employ this relationship. This problem will be solved in the next chapter.


3 ENTROPY-BASED CORRELATION CLUSTERING

In chapter 2, we have shown that correlation is reflected in the relation between joint entropy and the number of considered variables. To determine whether a group of nodes is correlated or not, it is necessary to know the entropy of each node and the joint entropy of all subgroups of the considered group. However, calculating the joint entropy for a group of more than two nodes is time-consuming and requires huge computational resources. As a result, it is necessary to find a simple method to estimate joint entropy. To solve this problem, in this chapter, we estimate the joint entropy of a node group from the entropies of the individual nodes in the group and the entropy correlation coefficients of all pairs in the group. From this estimation, the correlation characteristic can be recognized, and correlation clustering can be established.

Joint entropy estimation

Suppose there is a set of 𝑁 data values {𝑋1, 𝑋2, … , 𝑋𝑁} with the entropy of each data, 𝐻(𝑋𝑖), and the entropy correlation coefficient, 𝜌𝑖𝑗 = 𝜌(𝑋𝑖, 𝑋𝑗), in which any 1 ≤ 𝑖 ≠ 𝑗 ≤ 𝑁 satisfies the following conditions:

in which 𝐻𝑚𝑖𝑛 and 𝐻𝑚𝑎𝑥 are the lower bound and upper bound of the entropy of the data in the dataset, respectively; 𝜌𝑚𝑖𝑛 and 𝜌𝑚𝑎𝑥 are the lower bound and upper bound of the entropy correlation coefficient in the set, respectively. The joint entropy is estimated based on the idea of hierarchical clustering [132], as described in the following sections.

3.1.1 Determining the upper bound of joint entropy

With a group that has only one node, we have the entropy of the node defined


The correlation coefficient between one cluster and another can be considered as a parameter that measures the “correlation distance” between the two clusters. Thus, according to [124] [132], it can be obtained as the greatest, smallest, or average correlation coefficient from any member of one cluster to any member of the other cluster. Therefore:

𝜌(𝑋𝑖𝑗, 𝑋𝑘) = min{𝜌(𝑋𝑖, 𝑋𝑘), 𝜌(𝑋𝑗, 𝑋𝑘)} (3.10)

In addition, since 𝜌(𝑋𝑖, 𝑋𝑗) ≥ 𝜌𝑚𝑖𝑛 ∀𝑖 ≠ 𝑗, 1 ≤ 𝑖, 𝑗 ≤ 𝑁, then 𝜌(𝑋𝑖𝑗, 𝑋𝑘) ≥ 𝜌𝑚𝑖𝑛.
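The smallest-coefficient rule in (3.10) can be sketched as follows. The function name, the `linkage` parameter, and the coefficient matrix are hypothetical values chosen for illustration.

```python
def cluster_correlation(rho, cluster_a, cluster_b, linkage=min):
    """Correlation between two clusters of node indices.

    rho[i][j] is the pairwise entropy correlation coefficient; linkage=min
    follows (3.10), while linkage=max would give the greatest-coefficient variant.
    """
    return linkage(rho[i][j] for i in cluster_a for j in cluster_b)


# Hypothetical symmetric matrix of pairwise coefficients for three nodes.
rho = [
    [1.0, 0.9, 0.7],
    [0.9, 1.0, 0.8],
    [0.7, 0.8, 1.0],
]

# Nodes 0 and 1 have been merged; compare the merged cluster with node 2.
merged_vs_node = cluster_correlation(rho, [0, 1], [2])  # min(0.7, 0.8)
```

Because the result is always one of the pairwise coefficients, it can never fall below the smallest pairwise coefficient in the set, which is the bound stated above.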

We have

Date posted: 03/01/2019, 15:39
