MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
Nguyen Thi Thanh Nga
EFFICIENT DATA COMMUNICATION FOR WIRELESS SENSOR NETWORK
BASED ON DATA CORRELATION
Major: Computer Engineering Code No.: 9480106
COMPUTER ENGINEERING DISSERTATION
SUPERVISORS:
1. Dr. Nguyen Kim Khanh
2. Assoc. Prof. Ngo Hong Son
Hanoi - 2018
COMMITMENT
I hereby declare that this is my own research. All the data and results in this thesis are completely true and were agreed by the co-authors to be used in this thesis. This research has not been published by any authors other than me.
Hanoi, 17th December 2018
Dr. Nguyen Kim Khanh
Assoc. Prof. Ngo Hong Son
Nguyen Thi Thanh Nga
ACKNOWLEDGMENTS
This Ph.D. thesis has been carried out at the Department of Computer Engineering, School of Information and Communication Technology, Hanoi University of Science and Technology. The research has been completed under the supervision of Dr. Nguyen Kim Khanh and Associate Prof. Dr. Ngo Hong Son.
First, I would like to express my sincere gratitude to my advisors, Dr. Nguyen Kim Khanh and Associate Prof. Dr. Ngo Hong Son, for their continuous support of my Ph.D. study and related research, and for their patience, motivation, and immense knowledge. Their valuable guidance, unceasing encouragement, and support have helped me throughout the research and the writing of this thesis.
Besides my advisors, I would like to thank all my colleagues in the Department of Computer Engineering for their insightful comments and encouragement, and for the hard questions that motivated me to widen my research from various perspectives. I would like to express my appreciation to Prof. Dr. Trinh Van Loan for his time and patient help in correcting the whole thesis, as well as for his valuable comments during the process of pursuing my doctoral degree.
I want to thank all my colleagues in the School of Information and Communication Technology for their support and help in my work.
I gratefully acknowledge the receipt of grants from the 911 Project of the Ministry of Education and Training, which enabled me to carry out this research.
Finally, I would like to thank my family, my sisters, my father and mother, my husband, and my two children for their endless love, encouragement, and unconditional support throughout the writing of this thesis.
Nguyen Thi Thanh Nga
TABLE OF CONTENTS
COMMITMENT
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
PREFACE
1 INTRODUCTION
1.1 Overview
1.2 Energy conservation in WSNs
1.2.1 Radio optimization
1.2.2 Sleep/wake-up schemes
1.2.3 Energy efficient routing
1.2.4 Data reduction
1.2.5 Charging solution
1.3 Data correlation and energy conservation in WSNs
1.4 Problem statements and contributions
2 CORRELATION IN WIRELESS SENSOR NETWORK
2.1 Correlation model survey
2.2 Information entropy theory
2.2.1 Overview
2.2.2 Entropy concept
2.2.3 Joint entropy
2.3 Correlation and entropy
2.3.1 Correlation of two variables
2.3.1.1 Mutual information
2.3.1.2 Entropy correlation coefficient
2.3.2 Correlation of more than two variables
2.4 Conclusions
3 ENTROPY-BASED CORRELATION CLUSTERING
3.1 Joint entropy estimation
3.1.1 Determining the upper bound of joint entropy
3.1.2 Determining the lower bound of joint entropy
3.1.3 Validating entropy estimation
3.2 Correlation region and correlation clustering algorithm
3.2.1 Estimated joint entropy and correlation
3.2.2 Correlation region definition
3.2.3 Correlation clustering algorithm
3.2.4 Validation
3.3 Conclusions
4 ENTROPY CORRELATION BASED DATA AGGREGATIONS
4.1 Compression aggregation
4.1.1 Comparison of compression schemes
4.1.2 Compression based routing scheme in a correlated region
4.1.2.1 1-D analysis
4.1.2.2 2-D analysis
4.1.2.3 General topology model analysis
4.1.3 Optimal routing scheme in correlation networks
4.2 Representative aggregation
4.2.1 Distortion function
4.2.2 Number of representative nodes
4.2.3 Representative node selection
4.2.4 Practical validation
4.3 Conclusions
5 ENTROPY CORRELATION BASED DATA AGGREGATION PROTOCOL (ECODA)
5.1 Network model
5.2 Radio model
5.3 Outline of ECODA
5.3.1 Set-up phase
5.3.2 Steady-state phase
5.4 Performance evaluation
5.4.1 Simulation models
5.4.1.1 Simulation parameters
5.4.1.2 Simulation setups
5.4.1.3 Dissipated energy calculation
5.4.2 Simulation results and discussions
5.4.2.1 Compression aggregation-based routing protocol
5.4.2.2 Representative aggregation-based routing protocol
5.4.3 Evaluations and comparison
5.4.3.1 The case of ECODA with compression aggregation
5.4.3.2 The case of ECODA with representative aggregation
5.5 Conclusions
6 CONCLUSIONS AND FUTURE STUDY
6.1 Summary of Contributions
6.2 Limitations
6.3 Future work
PUBLICATION LIST
REFERENCES
APPENDIX
LIST OF ABBREVIATIONS
CSMA Carrier Sense Multiple Access
ECODA Entropy COrrelation clustering for Data Aggregation
LEACH-C Low Energy Adaptive Clustering Hierarchy- Centralized
MEMS Micro Electro Mechanical Systems
RSSI Received Signal Strength Indication
TDMA Time Division Multiple Access
VLSI Very Large-Scale Integration
WSN(s) Wireless Sensor Network(s)
LIST OF FIGURES
Figure 1.1 Wireless Sensor Network
Figure 1.2 Wireless Sensor Network Applications
Figure 2.1 The layout of sensor nodes in an environment with two areas of different conditions
Figure 2.2 The relations between entropies, joint entropy, and mutual information
Figure 2.3 Relation between correlation and joint entropy
Figure 3.1 Joint entropy calculation principle
Figure 3.2 Sensor layout in Intel Berkeley Research Lab
Figure 3.3 Practical, upper bound and lower bound joint entropy (JE) of subsets of the dataset 1
Figure 3.4 Estimated joint entropy with different values of entropy correlation coefficients using upper bound function (with Hmax = 2 [bits])
Figure 3.5 Estimated joint entropy (by upper bound) and practical joint entropy of dataset 1
Figure 3.6 Correlation-based clustering algorithm
Figure 3.7 Temperature data measured at 11 nodes in the dataset 1
Figure 3.8 Derivative of estimated joint entropy and calculated joint entropy of the selected group
Figure 4.1 Routing paths for three schemes: (a) DSC, (b) RDC, and (c) CDR [122]
Figure 4.2 Energy consumption of the DSC, RDC and CDR schemes with respect to entropy correlation coefficients
Figure 4.3 Routing pattern of 1-D network
Figure 4.4 Total bit-hop cost Es that corresponds to cluster size with different values of entropy correlation coefficient in the case of 1-D with compression along SPT to the cluster head
Figure 4.5 Total bit-hop cost Es that corresponds to cluster size with different values of entropy correlation coefficient in the case of 1-D with compression at the cluster head only
Figure 4.6 Routing pattern of the 2-D network [122]
Figure 4.7 Total bit-hop cost Es that corresponds to cluster size with different values of entropy correlation coefficient in the case of 2-D with compression along SPT to the cluster head
Figure 4.8 Total bit-hop cost Es that corresponds to cluster size with different values of entropy correlation coefficient in the case of 2-D with compression at the cluster head only
Figure 4.9 Illustration of clustering for a general topology model
Figure 4.10 Total transmission cost that corresponds to cluster size with different values of entropy correlation coefficient with compression along SPT to the cluster head
Figure 4.11 Total transmission cost that corresponds to cluster size with different values of entropy correlation coefficient with compression at the cluster head only
Figure 4.12 The relation between distortion and the number of representative nodes with N = 10
Figure 4.13 The relation between distortion and the number of representative nodes with N = 15
Figure 4.14 The relation between distortion and the number of representative nodes with N = 20
Figure 4.15 Maximizing obtained information based representative node selection algorithm
Figure 5.1 Radio energy dissipation model
Figure 5.2 Time scheduling for one round
Figure 5.3 Sensor node distribution in the 200 m x 200 m sensing area
Figure 5.4 Routing path of compression-based routing protocol
Figure 5.5 Total energy in each round in case of compression along SPT to the CH
Figure 5.6 Number of alive nodes in each round in case of compression along SPT to the CH
Figure 5.7 Total energy in each round in case of compression at the CH only
Figure 5.8 Number of alive nodes in each round in case of compression at the CH only
Figure 5.9 Total energy in each round in case of representative aggregation with compression with 16 correlation clusters
Figure 5.10 Number of alive nodes in each round in case of representative aggregation with compression with 16 correlation clusters
Figure 5.11 Total energy in each round in the case of representative aggregation without compression with 16 correlation clusters
Figure 5.12 Number of alive nodes in each round in the case of representative aggregation without compression with 16 correlation clusters
Figure 5.13 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 16 correlation clusters
Figure 5.14 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 16 correlation clusters
Figure 5.15 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 8 correlation clusters
Figure 5.16 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 8 correlation clusters
Figure 5.17 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 4 correlation clusters
Figure 5.18 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 4 correlation clusters
Figure 5.19 Total energy comparison between distance-based protocol and ECODA with representative aggregation in the case of 16 correlation clusters
Figure 5.20 Number of alive nodes comparison between distance-based protocol and ECODA with representative aggregation in the case of 16 correlation clusters
LIST OF TABLES
Table 3.1 Node’s entropy of the dataset 1
Table 3.2 Entropy correlation coefficient of each pair from the dataset 1
Table 3.3 Practical, upper bound and lower bound joint entropy (JE) of subsets of the dataset 1
Table 3.4 Clustering results of 48 nodes
Table 4.1 Number of representative nodes with distortion D = 0.05
Table 4.2 Number of representative nodes with distortion D = 0.1
Table 4.3 Number of representative nodes with distortion D = 0.15
Table 4.4 Selection of representative nodes and the actual distortion based on theoretical calculation (dataset 1 with N = 11 nodes)
Table 4.5 Selection of representative nodes and the actual distortion based on practical calculation (dataset 1 with N = 11 nodes)
Table 4.6 Entropy values of 10 nodes in the correlation region (dataset 2 with N = 10 nodes)
Table 4.7 Selection of representative nodes and the actual distortion based on theoretical calculation (dataset 2 with N = 10 nodes)
Table 4.8 Selection of representative nodes and the actual distortion based on practical calculation (dataset 2 with N = 10 nodes)
Table 5.1 Simulation parameters
Table 5.2 Simulation results in case of compression along SPT to the CH
Table 5.3 Simulation results in case of compression at the CH only
Table 5.4 Simulation results in the case of representative aggregation with compression at the CH
Table 5.5 Simulation results in the case of representative aggregation without compression at the CH
Table 5.6 Comparison between distance-based protocol and ECODA with compression aggregation in the case of 16 correlation clusters
Table 5.7 Comparison between distance-based protocol and ECODA with compression aggregation in the case of 8 correlation clusters
Table 5.8 Comparison between distance-based protocol and ECODA with compression aggregation in the case of 4 correlation clusters
Table 5.9 Comparison between distance-based protocol and ECODA with representative aggregation in the case of 16 correlation clusters
PREFACE
A Wireless Sensor Network (WSN) is a collection of sensor nodes that cooperatively monitor surrounding phenomena over large physical areas. Advances in the integration of micro-electro-mechanical systems and digital electronics, together with the development of wireless communications, have enabled the wide deployment of WSNs. Sensor nodes in WSNs are equipped with various sensing capabilities in space and time, and their higher processing capacity can satisfy the requirements of various modern applications. Because sensor nodes are low-cost, small, and powered by non-replaceable batteries, energy conservation is commonly recognized as the key challenge in designing and operating these networks.
In typical WSN applications, sensors must be densely deployed in space to achieve satisfactory coverage. As a result, multiple sensors record information about a single event in the sensing field, i.e., the sensed data are correlated with each other. This correlation can bring many significant potential advantages for the development of efficient communication protocols well suited to the WSN paradigm. For example, depending on the degree of correlation, data in a correlated region can be compressed at a high ratio, reducing the amount of transmitted data and thus the dissipated energy. With sufficiently high correlation, it may not even be necessary for every sensor node in a correlation group to transmit its data to the base station. Instead, a smaller number of sensor measurements (a representation) might be adequate to communicate the event features to the base station within a certain reliability/fidelity level.
From this point of view, many studies have focused on discovering and exploiting the correlation of sensed data in WSNs. Early studies used traditional probability and statistics theory to describe the correlation among data. Nevertheless, these approaches restricted correlation to a linear relation, which may not be appropriate for the general, nonlinear cases found in practice. Therefore, the information entropy approach has been considered to achieve generality. However, most of these approaches, whether based on traditional probability and statistics or on information entropy, treat correlation as a distance-dependent feature. In general, the correlation of data may be independent of external factors such as sensor location and environmental conditions, so it is better to concentrate on the information contained in the data itself rather than considering only attribute metadata such as location and time.
This thesis focuses on discovering and exploiting the general correlation in WSNs, using information entropy theory to examine the sensed data itself. First, a novel distance-independent, entropy-based correlation model for describing correlation characteristics in a wireless sensor network is proposed. From this entropy correlation model, an energy-efficient routing protocol with correlation-based data aggregation is developed.
To discover the correlation property, an estimation of joint entropy for a data group is first established. From this estimation, a definition of the correlation group is proposed, and then the correlation model used to calculate the joint entropy of a correlated data group is developed. To exploit the correlation characteristic, two main data aggregation schemes are analyzed and evaluated using the proposed correlation model. Finally, these schemes are used to develop data aggregation routing protocols. With the proposed routing protocols, the amount of data transferred in the network is reduced, so the dissipated energy decreases.
The thesis structure is as follows:
Chapter 1: Introduction
This chapter introduces WSNs, energy conservation schemes, and data correlation problems. The main contributions of the thesis are also briefly presented in this chapter.
Chapter 2: Correlation in Wireless Sensor Network
This chapter presents a survey of correlation models in WSNs and examines correlation from the point of view of information entropy. Then, the idea of establishing a new correlation model is described.
Chapter 3: Entropy-based Correlation Clustering
Based on the factors analyzed in Chapter 2, we propose an approximate estimation of joint entropy. From this approximation, we define the correlation region and propose the correlation clustering scheme. We also validate the proposed estimation and correlation clustering scheme in this chapter.
Chapter 4: Entropy-based Data Aggregations
In this chapter, we exploit the advantages of data correlation through data aggregation based on entropy correlation, including entropy-based representative aggregation and entropy-based data compression.
In entropy-based representative aggregation, the distortion of data in the group when some nodes are put into the sleep state is evaluated using the proposed correlation model.
Chapter 5: Entropy Correlation based Data Aggregation Protocol (ECODA)
In this chapter, we outline an Entropy COrrelation-based Data Aggregation protocol (ECODA) using the clustering scheme proposed in Chapter 3 and the data aggregation schemes of Chapter 4. Simulations are also carried out to validate the effectiveness of the proposed clustering and aggregation schemes.
Chapter 6: Conclusions and Future Study
This chapter summarizes the results of the thesis with careful evaluations and points out the remaining problems to be addressed in future work.
1 INTRODUCTION
1.1 Overview
People always want to know more about the physical world around them so that they can better understand their surrounding environment. Therefore, they try to collect information about the environment in as much detail as possible. Sensor nodes link the physical world to the digital world by capturing and revealing real-world phenomena and converting them into a form that can be processed, stored, and acted upon. Integrating sensors into numerous devices, machines, and environments can provide tremendous societal benefits, such as avoiding catastrophic infrastructure failures, conserving precious natural resources, increasing productivity, enhancing security, and enabling new applications such as context-aware systems and smart technologies. Advances in technologies such as very large-scale integration (VLSI), microelectromechanical systems (MEMS), and wireless communications, which make sensors smaller, lower-power, and less expensive, further contribute to the widespread use of distributed sensor systems such as wireless sensor networks.
Figure 1.1 Wireless Sensor Network¹
A Wireless Sensor Network (WSN) is a collection of sensor nodes that cooperatively monitor surrounding phenomena over large physical areas [1]–[4]. These sensor nodes sense, observe, or measure the environment, gather information from it, and transmit the sensed data to the user based on some local decision process. A typical sensor node is composed of a sensing unit equipped with one or more sensors, a processing unit, a power unit, and a transceiver unit. The sensing unit may carry various sensors, such as thermal, biological, chemical, optical, and magnetic sensors, to measure properties of the environment. A sensor node acquires data through the sensing unit, processes the sensed data in the processing unit, and finally transmits the processed data using the transceiver unit. Because of their limited memory capacity, sensor nodes use wireless communication to transfer data to a base station, which allows them to disseminate their sensor data to remote processing, visualization, analysis, and storage systems.
Figure 1.2 Wireless Sensor Network Applications²
² https://www.researchgate.net/publication/220505150_Energy_Saving_Mechanisms_for_MAC_Protocols_in_Wireless_Sensor_Networks/figures?lo=1
There are five types of WSNs: terrestrial WSNs, underground WSNs, underwater WSNs, multi-media WSNs, and mobile WSNs [3]. In terrestrial WSNs [1], hundreds to thousands of inexpensive wireless sensor nodes are deployed in a given area, either in an ad hoc or in a pre-planned manner. Reliable communication in a dense environment is very important in this type of WSN. Battery power is limited and may not be rechargeable in terrestrial sensor nodes; however, they can be equipped with a secondary power source such as solar cells. In a terrestrial WSN, energy can be conserved with optimal multi-hop routing, short transmission range, in-network data aggregation, elimination of data redundancy, minimized delays, and low duty-cycle operation.
In underground WSNs [5], sensor nodes are buried underground or placed in a cave or mine to monitor underground conditions. An underground WSN is more expensive than a terrestrial WSN in terms of equipment, deployment, and maintenance. In addition, wireless communication is more difficult in the underground environment due to signal losses and high levels of attenuation.
In contrast to the dense deployment of sensor nodes in a terrestrial WSN, underwater WSNs [6] consist of sensor nodes and vehicles deployed underwater. Because of their special working environment, underwater sensor nodes are more expensive, and fewer of them are deployed in comparison with terrestrial WSNs. Autonomous underwater vehicles are used for exploration or for gathering data from sensor nodes. Underwater wireless communication is typically established through the transmission of acoustic waves, which suffers from limited bandwidth, long propagation delay, and signal fading. In addition, underwater sensor nodes must be able to self-configure and adapt to the harsh ocean environment.
Multi-media WSNs [7] have been developed to enable the monitoring and tracking of events using multimedia such as video, audio, and images. Multi-media WSNs consist of various low-cost sensor nodes equipped with cameras and microphones. They are usually deployed in a pre-planned manner to guarantee coverage. Multi-media sensor nodes interconnect with each other over wireless links for data retrieval, processing, correlation, and compression. Because of the large volume of transmitted data, challenges in multi-media WSNs include high bandwidth demand, high energy consumption, quality of service (QoS) provisioning, data processing and compression techniques, and cross-layer design.
Mobile WSNs [8] [9] consist of a collection of sensor nodes that can move on their own and interact with the physical environment. Same as in static WSNs, nodes [...] mobile nodes can reposition and organize themselves in the network. This mobility requires dynamic routing in a mobile WSN. Challenges in mobile WSNs include deployment, localization, self-organization, navigation and control, coverage, energy, maintenance, and data processing.
The features of WSNs described above give them great potential for many applications [10]–[14]. The development of WSNs was motivated by military applications [15]–[19], and WSNs were then widely used in various fields such as industrial monitoring [20]–[25], environmental monitoring [26]–[33], agriculture [34]–[37], forest fire detection [38]–[40], animal tracking [41] [42], healthcare [43]–[50], security [51]–[53], home automation [54] [55], power utility distribution [56], logistics [57], intelligent traffic systems [58], etc.
In Vietnam, WSNs have been studied over the last two decades. The most popular topics are energy saving and load balancing in WSNs, considering base station position [59], delay constraints [60], 3D WSNs [61], WSNs with holes [62], and k-means clustering [63]. Applications of WSNs have also been widely considered, such as landslide monitoring [64], smart grids [65], target tracking [66], logistics [57], and healthcare monitoring [67].
1.2 Energy conservation in WSNs
In most cases, the energy for activities in WSNs comes from a limited battery supply. However, in many applications it is very hard or impossible to recharge the batteries, either because the nodes are deployed in difficult or hostile terrain or because a large number of nodes are deployed in the environment [68] [69]. For these reasons, energy conservation is commonly recognized as the key challenge in designing and operating WSNs, because individual sensor nodes are expected to be low-cost, small, and powered by non-replaceable batteries.
In recent years, numerous energy-saving approaches have been proposed [70] [71]. They can mainly be classified into five categories: radio optimization, data reduction, sleep/wake-up schemes, energy-efficient routing, and charging solutions. The following subsections present these five categories of energy-saving approaches.
1.2.1 Radio optimization
In radio optimization, radio parameters such as coding and modulation schemes, transmission power, and antenna direction are optimized to save energy. Radio optimization approaches can be further divided into four schemes: modulation optimization, cooperative communication, transmission power control, and directional antennas.
Modulation optimization tries to find the modulation parameters that result in minimum radio energy consumption. A good trade-off among the constellation size, the information rate, the transmission time, the distance between nodes, and the noise is considered [72] [73].
Cooperative communication schemes try to improve the quality of the received signal by having several single-antenna devices collaborate to create a virtual multi-antenna transmitter [74] [75].
Transmission power control schemes enhance energy efficiency at the physical layer by adjusting the radio transmission power. The idea is that a shorter communication range between nodes requires less radio power [76] [77]. Another idea is that a node with higher remaining energy may increase its transmission power, which enables other nodes to decrease their transmission power [78].
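The dependence of transmission energy on range can be illustrated numerically. The sketch below assumes the widely used first-order radio model with a free-space (d²) amplifier term; the constants E_ELEC = 50 nJ/bit and EPS_AMP = 100 pJ/bit/m² are typical literature values adopted here only as assumptions, not parameters taken from this section.

```python
# Assumed first-order radio model constants (typical literature values, not from this section)
E_ELEC = 50e-9     # J/bit consumed by the transmitter electronics
EPS_AMP = 100e-12  # J/bit/m^2 consumed by the transmit amplifier (free-space d^2 loss)


def tx_energy(bits: int, distance_m: float) -> float:
    """Energy in joules to send `bits` bits over `distance_m` metres."""
    return E_ELEC * bits + EPS_AMP * bits * distance_m ** 2


# A 4000-bit packet: the amplifier term grows with the square of the range
for d in (10, 50, 100):
    print(f"{d:>3} m: {tx_energy(4000, d) * 1e6:.1f} uJ")
# ->  10 m: 240.0 uJ
# ->  50 m: 1200.0 uJ
# -> 100 m: 4200.0 uJ
```

Under these assumptions, shrinking the range from 100 m to 10 m cuts the per-packet transmit energy by more than an order of magnitude, which is the effect transmission power control tries to exploit.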
Directional antenna schemes allow signals to be sent and received in one direction at a time, which improves transmission range and throughput [79] [80]. To take advantage of directional antennas, new MAC protocols have been proposed in [81] [82]. In addition, some specific problems that also have to be considered are discussed in [83].
1.2.2 Sleep/wake-up schemes
Sleep/wake-up schemes adapt node activity to save energy by putting the radio into sleep mode. The main idea of this approach is duty cycling. A duty cycling scheme schedules the node's radio state according to network activity to minimize idle listening and favor the sleep mode. Such schemes are the most energy-efficient but suffer from sleep latency. In some cases, a node cannot broadcast information to all its neighbors because they are not active simultaneously. In addition, the choice of fixed parameters such as the listening/sleeping period, preamble length, and slot time is critical because it affects system performance. A detailed survey of duty cycling can be found in [84].
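The energy benefit of duty cycling follows from a simple weighted average of the active and sleep power. The sketch below assumes a hypothetical radio drawing 60 mW when active or listening and 0.03 mW when asleep; these figures are illustrative assumptions, not measurements from any specific transceiver.

```python
def average_power_mw(duty_cycle: float, p_active_mw: float, p_sleep_mw: float) -> float:
    """Average radio power when the radio is active for a `duty_cycle` fraction of time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty cycle must lie in [0, 1]")
    return duty_cycle * p_active_mw + (1.0 - duty_cycle) * p_sleep_mw


# Hypothetical radio: 60 mW active/listening, 0.03 mW asleep
for dc in (1.0, 0.1, 0.01):
    print(f"duty cycle {dc:>4}: {average_power_mw(dc, 60.0, 0.03):.3f} mW")
```

The saving grows almost linearly as the duty cycle shrinks, while the sleep latency mentioned above grows because a sender must wait longer for the receiver's next active period.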
1.2.3 Energy efficient routing
Routing is also a burden that can seriously drain energy reserves. In general, there are various routing paradigms. In this research area, the main paradigms considered include cluster architecture, energy as a routing metric, multipath routing, [...]
Energy can also be used as a metric in the path setup phase to extend the lifetime of sensor networks. In this case, routing algorithms do not only focus on the shortest paths but can also select the next hop based on its residual energy [88].
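Residual-energy-aware next-hop selection can be sketched as follows. The selection rule here (among neighbours closer to the sink, prefer the one with the most remaining energy) and the `Node` type are hypothetical illustrations, not the specific algorithm of [88].

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    name: str
    x: float
    y: float
    residual_j: float  # remaining battery energy in joules


def distance(a: Node, b: Node) -> float:
    return math.hypot(a.x - b.x, a.y - b.y)


def next_hop(current: Node, neighbours: Sequence[Node], sink: Node) -> Optional[Node]:
    """Among neighbours that make progress toward the sink, pick the one with the
    most residual energy; ties go to the neighbour nearer the sink."""
    d_current = distance(current, sink)
    candidates = [n for n in neighbours if distance(n, sink) < d_current]
    if not candidates:
        return None  # no neighbour brings the packet closer to the sink
    return max(candidates, key=lambda n: (n.residual_j, -distance(n, sink)))


sink = Node("BS", 0.0, 0.0, float("inf"))
src = Node("S", 10.0, 0.0, 1.0)
a = Node("A", 5.0, 0.0, 0.2)   # nearest to the sink but almost depleted
b = Node("B", 6.0, 1.0, 0.9)
c = Node("C", 12.0, 0.0, 1.5)  # farther from the sink than S, so never chosen
print(next_hop(src, [a, b, c], sink).name)  # B
```

A pure shortest-path rule would pick A every time and exhaust it; weighting by residual energy shifts traffic to B.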
Multipath routing is, in general, more complex than single-path routing. However, single-path routing can rapidly drain the energy of the nodes on the selected path. Multipath routing can balance energy consumption among nodes by alternating forwarding nodes [89] [90]. More surveys of multipath routing protocols can be found in [91].
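The load-balancing effect of alternating forwarding nodes can be shown by counting how many packets each relay forwards. This sketch assumes three hypothetical node-disjoint paths and a simple round-robin assignment of packets to paths; real multipath protocols such as those in [89] [90] use their own path selection rules.

```python
from collections import Counter
from itertools import cycle


def relay_load(paths, n_packets):
    """Count packets forwarded by each intermediate relay when packets are
    assigned to `paths` in round-robin order."""
    load = Counter()
    selector = cycle(paths)
    for _ in range(n_packets):
        path = next(selector)
        for relay in path[1:-1]:  # skip the source and the base station
            load[relay] += 1
    return load


# Hypothetical node-disjoint paths from source S to base station BS
paths = [["S", "A", "D", "BS"], ["S", "B", "E", "BS"], ["S", "C", "F", "BS"]]
print(relay_load(paths[:1], 30))  # single path: A and D forward all 30 packets
print(relay_load(paths, 30))      # three paths: every relay forwards 10 packets
```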
The premature depletion of nodes in a region can create energy holes or partition the network. This situation can be avoided by optimizing node placement or by adding relay nodes with enhanced capabilities. This helps to improve energy balance, avoid sensor hot spots, and ensure coverage [92]–[94].
1.2.4 Data reduction
Energy consumption depends on data transmission. Thus, reducing the amount of data to be delivered can save energy. Data reduction approaches can be divided into three types: data aggregation, adaptive sampling, and network coding.
Data aggregation techniques route data packets so that they can be combined, exploiting the extracted features and statistics of datasets coming from different sensor nodes. There are several aggregation techniques, with different aggregation functions suited to different application requirements. The first type of aggregation function extracts the maximum, minimum, or average value of the aggregated data [95] [96]. In this way, it reduces the amount of data communicated in the network, which lowers power consumption. However, this technique can lose much of the original structure of the data.
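The first type of aggregation can be sketched as a cluster head collapsing its members' readings into one value. The readings and the `aggregate` helper below are hypothetical; the point is that four values become one packet, and the individual readings cannot be recovered afterwards.

```python
from statistics import fmean

# Aggregation functions a cluster head might apply to its members' readings
AGGREGATORS = {"max": max, "min": min, "avg": fmean}


def aggregate(readings, kind):
    """Collapse a list of readings into a single value using `kind`."""
    if not readings:
        raise ValueError("no readings to aggregate")
    return AGGREGATORS[kind](readings)


readings = [21.5, 22.0, 21.8, 22.1]  # hypothetical temperatures (°C) from four members
for kind in ("max", "min", "avg"):
    print(kind, round(aggregate(readings, kind), 2))
```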
The second type of aggregation technique is data compression. Data compression techniques are further divided into distributed data compression [97] [98] and local data compression [99] [100]. Distributed data compression techniques achieve the best compression; however, they are much more complicated than local data compression, which has a lower compression ratio. A detailed survey of data compression in WSNs can be found in [101]. It is important to note that data compression techniques are only effective with correlated data. Therefore, correlation is usually required when using these techniques.
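Why compression needs correlated data can be checked with a general-purpose compressor. The sketch below uses Python's standard `zlib` module as a stand-in for a WSN compression scheme (an assumption; zlib is not one of the schemes cited above) and compares a hypothetical slowly varying, correlated series of one-byte readings with uniformly random, uncorrelated bytes.

```python
import random
import zlib


def compressed_size(values):
    """Size in bytes of the zlib-compressed byte sequence (values must be 0..255)."""
    return len(zlib.compress(bytes(values), 9))


# Hypothetical 1-byte readings: a slowly varying, highly correlated series ...
correlated = [20 + (i // 25) % 5 for i in range(1000)]
# ... versus uniformly random readings with no correlation (fixed seed for repeatability)
rng = random.Random(42)
uncorrelated = [rng.randrange(256) for _ in range(1000)]

print("correlated:  ", compressed_size(correlated), "bytes")
print("uncorrelated:", compressed_size(uncorrelated), "bytes")
```

The correlated series shrinks to a small fraction of its 1000 bytes, while the random series stays at roughly its original size, because it contains no redundancy to remove.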
The third type of aggregation technique is the representative type [102], in which some nodes are chosen to represent a group of nodes. The other nodes in the group can be put to sleep to save energy. The number of sleeping nodes, which affects power consumption, is determined by a specified distortion. As with data compression, these techniques require correlated data.
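The role of the specified distortion can be illustrated with a toy rule. The sketch below is hypothetical: it measures distortion as the relative error of the group mean estimated from the first k representatives only, which is not the entropy-based distortion function developed in Chapter 4, and it takes representatives in a fixed order rather than selecting them.

```python
from statistics import fmean


def min_representatives(readings, max_distortion):
    """Smallest k such that the mean of the first k readings estimates the group
    mean within `max_distortion` (relative error); the rest could sleep."""
    true_mean = fmean(readings)
    if true_mean == 0:
        raise ValueError("relative error is undefined for a zero mean")
    for k in range(1, len(readings) + 1):
        estimate = fmean(readings[:k])
        if abs(estimate - true_mean) / abs(true_mean) <= max_distortion:
            return k
    return len(readings)


readings = [20.0, 24.0, 21.0, 23.0, 22.0, 22.0]  # hypothetical correlated group
print(min_representatives(readings, 0.10))  # 1: five nodes could sleep
print(min_representatives(readings, 0.05))  # 2: a tighter bound needs more representatives
```

A tighter distortion bound keeps more nodes awake, which is the trade-off between fidelity and power consumption described above.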
Adaptive sampling techniques adjust the sampling rate at each sensor while ensuring that application needs are met in terms of coverage and information precision, by exploiting spatial-temporal correlations between data. By reducing the number of samples, the amount of transmitted data is reduced, thus saving node energy. Temporal analysis of sensed data is used in [103], and spatial correlation is used in [104]. More details about adaptive sampling can be found in [105].
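A temporal adaptive-sampling rule can be sketched as follows. The back-off rule (double the interval while readings are stable, return to the fastest rate when the change exceeds a threshold), the interval limits, and the temperature trace are all hypothetical and are not the methods of [103] or [104].

```python
def next_interval(prev_value, new_value, interval_s, threshold,
                  min_interval_s=1.0, max_interval_s=64.0):
    """Hypothetical rule: back off while readings are stable, and return to the
    fastest rate as soon as the change exceeds `threshold`."""
    if abs(new_value - prev_value) > threshold:
        return min_interval_s
    return min(interval_s * 2, max_interval_s)


# Replay a hypothetical trace (one value per second, step change at t = 50 s)
trace = [20.0] * 50 + [23.0] * 50
t, interval, prev, samples = 0.0, 1.0, trace[0], 1
while t + interval < len(trace):
    t += interval
    value = trace[int(t)]
    samples += 1
    interval = next_interval(prev, value, interval, threshold=0.5)
    prev = value
print(samples, "samples instead of", len(trace))
```

The step at t = 50 s is only noticed at the next scheduled sample, which shows the precision cost that must be weighed against the reduced number of samples.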
Network coding is used to reduce traffic in broadcast scenarios by sending a linear combination of several data packets instead of a copy of each packet. At the destination nodes, the data can be decoded by solving linear equations [84] [106]. Network coding exploits the trade-off between computation and communication, since communication is slow and consumes more energy than computation.
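The simplest instance of network coding is the two-way relay: nodes A and B exchange packets through a relay that broadcasts their XOR once instead of forwarding each packet separately. XOR is a linear combination over GF(2), and each endpoint decodes by XOR-ing with the packet it already holds. The packet contents below are hypothetical.

```python
def xor_bytes(p: bytes, q: bytes) -> bytes:
    """Bitwise XOR of two equal-length packets (a linear combination over GF(2))."""
    if len(p) != len(q):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(p, q))


packet_a = b"temp=21.5"  # sent by node A to the relay
packet_b = b"temp=22.0"  # sent by node B to the relay

coded = xor_bytes(packet_a, packet_b)  # relay broadcasts once instead of twice
print(xor_bytes(coded, packet_a))      # node A decodes B's packet: b'temp=22.0'
print(xor_bytes(coded, packet_b))      # node B decodes A's packet: b'temp=21.5'
```

The relay saves one transmission per exchange at the cost of a cheap XOR at three nodes, which is the computation-for-communication trade-off described above.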
1.2.5 Charging solution
Several recent studies address energy harvesting and wireless charging techniques for WSNs as promising solutions because they allow recharging without human intervention.
Energy harvesting techniques have been developed to enable sensors to harvest energy from their surrounding environment, such as solar, wind, or kinetic energy [107]. Energy harvesting schemes often require energy prediction to manage the available power efficiently. It is important to note that, because the energy available between two harvesting opportunities is limited, energy-saving mechanisms are still necessary.
Breakthroughs in wireless power transfer are expected to enable wireless charging for WSNs. Wireless charging can be done in two ways:
omnidirectional electromagnetic radiation technology is only applicable to ultra-low power requirements and low sensing activity [108]. The reason is that electromagnetic waves suffer from a rapid drop in power efficiency over distance, and active radiation technology may pose safety concerns to humans. In contrast, magnetic resonance coupling appears to be the most promising technique, since it is more efficient and safer. However, the charging range is still a major concern [108].
Data correlation and energy conservation in WSNs
In typical WSN applications, sensors must be deployed densely to achieve satisfactory coverage [1]. Consequently, multiple sensors record information about a single event in the sensing field, i.e., their sensed data strongly depend on each other. For example, temperature sensors in the same room record the same temperature information, and several cameras monitoring the same area record many frames with similar information. In other words, the data are correlated with each other. This correlation can bring significant potential advantages for the development of efficient communication protocols well suited to the WSN paradigm. For example, depending on the degree of correlation, data in a correlated region can be compressed with a high ratio, which reduces the amount of data sent [109]. With sufficiently high correlation, it may not even be necessary for every sensor node in a correlation group to transmit its data to the base station; instead, a smaller number of sensor measurements might be adequate to communicate the event features to the base station within a certain reliability/fidelity level [110].
In addition, in WSNs, the power breakdown depends heavily on the specific node. However, the following remarks generally hold [109] [111]:
• The radio energy consumption is of the same order of magnitude in the reception, transmission, and idle states, while the power consumption drops by at least one order of magnitude in the sleep state. Therefore, the radio should be put to sleep (or turned off) whenever possible.
• Communication consumes much more energy than computation. It has been shown that transmitting one bit may consume as much energy as executing a few thousand instructions [112]. Therefore, communication should be traded for computation.
Data correlation allows us to reduce the amount of transferred data, or even to put some sensor nodes to sleep. Thus, it can help WSNs conserve energy significantly.
Problem statements and contributions
The main problems in this research are “How to recognize the correlation among a dataset by looking at the data itself, and how to exploit the correlation characteristic for energy conservation in WSNs”. In this research, we focus on WSNs working in highly correlated environments. A highly correlated environment can be divided into groups called correlation regions, in which measured data strongly depend on each other. By clustering sensor nodes into correlation regions, data aggregation can be performed to conserve energy in WSNs. In this thesis, we focus on two data aggregation schemes: data compression and representative aggregation. The main contributions of the thesis are:
Developing an entropy correlation clustering algorithm and an entropy correlation model to describe the correlation characteristics of a correlation cluster
This algorithm can divide a correlated environment into several correlation regions using the entropy values of measured data and the entropy correlation coefficients of measured data pairs in the environment. At the same time, this algorithm uses only the data itself and does not depend on distance information. The correlation model describes the relationship between the joint entropy of a dataset and the number of data series in the dataset, taking into account the entropy correlation coefficients of the data.
Analyzing and evaluating the impact of the correlation characteristic on data aggregation schemes
With the proposed correlation clustering and model, it is necessary to evaluate their impact on data aggregation schemes. For data compression aggregation, several compression schemes and network structures are considered to find the most appropriate compression routing for WSNs. For representative aggregation, a distortion function that measures the tolerated ratio of data loss is used. The number of representative nodes is then evaluated, and a representative node selection algorithm is proposed.
Developing an entropy correlation-based data aggregation protocol for WSNs to exploit the correlation characteristic of the sensed environment
The developed protocol includes two phases: one phase is data collection to identify the correlation characteristic, and the other phase is the implementation of data aggregation. This protocol uses the proposed clustering algorithm and data aggregation schemes. In addition, the design of the protocol is proposed. The
2 CORRELATION IN WIRELESS SENSOR NETWORK
As mentioned in Chapter 1, the correlation characteristic has many significant potential advantages for the development of energy-efficient communication protocols for WSNs. To evaluate and exploit the correlation characteristic, it is necessary to build a correlation model. This chapter surveys the existing correlation models. Based on the advantages and limitations of the previous correlation models, an approach for developing a new correlation model is then outlined.
Correlation model survey
Correlation represents the relationship between quantitative or categorical variables. In other words, it is a measure of how things are related. Data correlation is a measure of how data are related to each other.
To exploit the correlation in WSNs, it is necessary to recognize the correlation among data in the network by establishing correlation models. There have been many research efforts on correlation models in WSNs. In [111], correlated nodes are assumed to observe the same source 𝑆, and the observed data 𝑋𝑖(𝑡) at the i-th node is the sum of a correlated version of the source, 𝑆𝑖(𝑡), and the observation noise 𝑁𝑖(𝑡):
𝑋𝑖(𝑡) = 𝑆𝑖(𝑡) + 𝑁𝑖(𝑡) (2.1)
The correlation model is the covariance function 𝐾𝜗 (correlation coefficient 𝜌), which is chosen to be distance-dependent and can be classified into four groups:
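Spherical:
$$ K_\vartheta(d) = \begin{cases} 1 - \frac{3}{2}\frac{d}{\theta_1} + \frac{1}{2}\left(\frac{d}{\theta_1}\right)^3, & 0 \le d \le \theta_1 \\ 0, & d > \theta_1 \end{cases} $$
as described in [112]–[114].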
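For illustration, the spherical model is commonly written as K(d) = 1 − 1.5(d/θ1) + 0.5(d/θ1)^3 for 0 ≤ d ≤ θ1 and K(d) = 0 for d > θ1, where d is the inter-node distance and θ1 is a range parameter. The Python sketch below evaluates this standard form; the names `spherical_covariance` and `theta1` are our own, and the parameter values are arbitrary.

```python
def spherical_covariance(d: float, theta1: float) -> float:
    """Spherical covariance model: correlation between two nodes at distance d.

    theta1 is the range parameter; nodes farther apart than theta1 are
    treated as uncorrelated.
    """
    if d < 0 or theta1 <= 0:
        raise ValueError("distance must be >= 0 and theta1 > 0")
    if d > theta1:
        return 0.0
    r = d / theta1
    return 1.0 - 1.5 * r + 0.5 * r ** 3


if __name__ == "__main__":
    for dist in (0.0, 2.5, 5.0, 10.0, 15.0):
        print(dist, round(spherical_covariance(dist, theta1=10.0), 4))
```

The model depends only on d, which is exactly the distance assumption questioned later in this chapter.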
Some papers also build correlation models in which the correlation coefficient is a function of the distance between nodes [115] [116]. In [115], the compression rate is calculated based on the correlation coefficient 𝜌 between two nodes, and the correlation coefficient is defined to be inversely proportional to their Euclidean distance 𝑑.
Magnitude 𝑚-dissimilarity: Two time series 𝑋 = {𝑥1, 𝑥2, … , 𝑥𝑞} and 𝑌 = {𝑦1, 𝑦2, … , 𝑦𝑞} are magnitude 𝑚-dissimilar if there is an 𝑖 (1 ≤ 𝑖 ≤ 𝑞) such that
$$ d(v_i, v_j) = \sum_{k=1}^{q} |x_k - y_k| \tag{2.9} $$
The smaller the Manhattan distance, the more similar the two vectors are. Manhattan distance is also used to define dissimilarity in [119].
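The Manhattan distance of (2.9) is straightforward to compute for two equal-length series. The sketch below is illustrative; the function name and the sample readings are hypothetical.

```python
from typing import Sequence


def manhattan_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of absolute element-wise differences between two series of length q."""
    if len(x) != len(y):
        raise ValueError("series must have the same length q")
    return sum(abs(a - b) for a, b in zip(x, y))


if __name__ == "__main__":
    node_a = [20.1, 20.3, 20.2]
    node_b = [20.0, 20.4, 20.2]   # similar readings
    node_c = [25.0, 26.1, 25.7]   # dissimilar readings
    print(manhattan_distance(node_a, node_b))  # small distance
    print(manhattan_distance(node_a, node_c))  # large distance
```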
Some research efforts define the correlation model in different ways, such as a linear predictive model [120], node weight [121], or data density correlation degree [102]. In [120], a set of sensor nodes is a correlation set if the reading at a node can be predicted using a linear combination of readings from the other nodes. Let 𝑆 = {𝑠1, 𝑠2, … , 𝑠𝐿} be a set of sensor nodes. Then, the predicted value of a node 𝑠, 𝑠′[𝑘], can be expressed as a linear combination of 𝑠1[𝑘], 𝑠2[𝑘], … , 𝑠𝐿[𝑘] for all 𝑘:
$$ E = \sum_{k=1}^{K} \left(s[k] - s'[k]\right)^2 \tag{2.11} $$
Weighting coefficients are determined such that 𝐸 is minimized
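Equation (2.10) is not reproduced above, so the sketch below assumes that the predictor has the form s′[k] = Σ_l w_l s_l[k] with no intercept term, and obtains the weights that minimize E in (2.11) by ordinary least squares with `numpy.linalg.lstsq`. The helper names and the toy data are hypothetical.

```python
import numpy as np


def fit_linear_predictor(neighbors: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Weights w minimizing E = sum_k (s[k] - s'[k])^2, with s'[k] = sum_l w[l] * s_l[k].

    neighbors: shape (K, L); column l holds the readings s_l[k].
    target:    shape (K,); readings s[k] of the node to be predicted.
    """
    weights, *_ = np.linalg.lstsq(neighbors, target, rcond=None)
    return weights


def prediction_error(neighbors: np.ndarray, target: np.ndarray,
                     weights: np.ndarray) -> float:
    """Squared prediction error E of (2.11) for the given weights."""
    residual = target - neighbors @ weights
    return float(residual @ residual)


if __name__ == "__main__":
    s_neighbors = np.array([[1.0, 0.0],
                            [2.0, 1.0],
                            [3.0, 0.0],
                            [4.0, 1.0]])
    s_target = s_neighbors @ np.array([2.0, 3.0])   # exactly predictable node
    w = fit_linear_predictor(s_neighbors, s_target)
    print(w, prediction_error(s_neighbors, s_target, w))
```

A small E indicates that the node's readings are largely redundant given its neighbors, which is how [120] characterizes a correlation set.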
In [121], the correlation of a node with its neighbors is evaluated using a correlated weight. The definition of the Spatial Correlated Weight considers the average spatial distance deviation between each node and its neighbors within a predefined communication range. For any node 𝑖, let 𝑁(𝑖) be the set of its neighbors, and let 𝑗 ∈ 𝑁(𝑖) be a neighbor of node 𝑖. The correlated weight 𝑤𝑖 of node 𝑖 is defined as:
In [102], the data density correlation degree of a node is used to evaluate the correlation. Let sensor node 𝑣 have 𝑛 neighboring sensor nodes within its communication radius, denoted 𝑣1, 𝑣2, … , 𝑣𝑛. The data object of 𝑣 is 𝐷, and the data objects of its neighboring sensor nodes are 𝐷1, 𝐷2, … , 𝐷𝑛, respectively. Among these 𝑛 data objects, there are 𝑁 data objects whose distances to 𝐷 are less than 𝜀, with 𝑚𝑖𝑛𝑃𝑡𝑠 ≤ 𝑁 ≤ 𝑛. Then the data density correlation degree of sensor node 𝑣 to the sensor nodes whose data objects are in the 𝜀-neighborhood of 𝐷 is calculated as follows:
in which 𝑚𝑖𝑛𝑃𝑡𝑠 is the amount threshold; 𝜀 is the data threshold; 𝑑∆ is the distance between 𝐷 and the data center of the data objects which are in the 𝜀-neighborhood of
$$ H_n = H_{n-1} + \left[1 - \frac{1}{\frac{d_i}{c} + 1}\right] H_1, \tag{2.14} $$
in which 𝑐 is a constant that characterizes the extent of spatial correlation in the data
In the simple case when all nodes are located on a line equally spaced by a distance 𝑑, the joint entropy 𝐻𝑛 of a set of 𝑛 nodes {𝑆1, 𝑆2, … , 𝑆𝑛} is calculated as:
$$ H_n = H_1 + (n-1)\left[1 - \frac{1}{\frac{d}{c} + 1}\right] H_1 $$
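The line-topology formula can be evaluated directly. The minimal Python sketch below (function name hypothetical) makes the role of 𝑐 visible: the bracketed term equals d/(d + c), so when d is large compared with c each extra node adds almost a full H_1, and when c is large compared with d the joint entropy stays close to H_1.

```python
def joint_entropy_line(n: int, h1: float, d: float, c: float) -> float:
    """Joint entropy of n nodes equally spaced by d on a line:
    H_n = H_1 + (n - 1) * [1 - 1 / (d / c + 1)] * H_1.
    """
    if n < 1 or d < 0 or c <= 0:
        raise ValueError("need n >= 1, d >= 0 and c > 0")
    new_info = 1.0 - 1.0 / (d / c + 1.0)   # fraction of H_1 added by each extra node
    return h1 + (n - 1) * new_info * h1


if __name__ == "__main__":
    for c in (0.1, 1.0, 10.0, 100.0):
        print(c, joint_entropy_line(n=10, h1=1.0, d=1.0, c=c))
```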
An entropy-based correlation model is also used in [125]. There, the entropy correlation coefficient is chosen to be the Pearson linear correlation coefficient, which reduces the computational complexity but also reduces the generality of using entropy.
It can be seen that the correlation models in the above works are all based on the distance between nodes: the smaller the distance between nodes, the higher their correlation. However, this assumption may not always hold, for example because of physical barriers between nodes. In Figure 2.1, some sensor nodes are placed in two adjacent rooms; room 1 is equipped with an air conditioner while room 2 is not. Nodes A and B are placed close to each other, but they are in different rooms with independent conditions, so their sensed data may be independent of each other. The sensed data of node A are correlated with those of node C because the two nodes are placed in the same room under the same conditions, even though the distance between them is larger than the distance between nodes A and B. Therefore, it is necessary to establish a correlation model that is independent of the positions of sensor nodes. In addition, observing the readings over time of 54 sensors deployed at the Intel Berkeley Lab [127] [128] shows that data correlation may be independent of external factors such as sensor location and environmental conditions. Therefore, it is better to look at the information contained in the data itself rather than considering only attribute metadata such as location and time [126].
Figure 2.1 The layout of sensor nodes in an environment with two different conditions area
To address this problem, in [129], entropy is calculated from real data, and the joint entropy 𝐻𝑛 of a set of 𝑛 nodes {𝑆1, 𝑆2, … , 𝑆𝑛} is then approximated as a function of the number of nodes in the set as follows:
in which 𝛼 and 𝛽 are constants determined from real data. The advantage of this model is that it is distance-independent, but its disadvantage is that it can only be obtained after the correlation set has been established. The calculation for determining the correlation group using this model is very complicated with huge
Most existing correlation models are distance-based, an assumption that may not hold in some cases, such as the examples shown in [126]. In that paper, the authors found that sensors in similar environmental conditions that are not necessarily spatially correlated can report correlated data, and that data correlation may be independent of external factors such as sensor location and environmental conditions. Therefore, it is necessary to develop a model that is distance-independent and practically applicable to Wireless Sensor Networks.
Correlation models can be established mainly by using traditional probability and statistics theory or information entropy theory. Correlation from the point of view of information entropy is more general, but also more challenging. With the purpose of finding a novel and general correlation model, this thesis uses information entropy theory to discover the correlation characteristic by looking at the data itself.
Information entropy theory
2.2.1 Overview
Information entropy, or Shannon entropy, is a foundational concept of information theory. Information entropy quantifies the amount of information in a variable, thus providing the foundation for a theory around the notion of information.
At a conceptual level, information entropy is simply the "amount of information" in a variable. More intuitively, it corresponds to the amount of storage (e.g., the number of bits) required to store the variable, which can be understood as the amount of information in that variable. However, calculating this number of bits, and therefore the amount of information in a variable, is more involved than it might appear at first sight. It is not simply the number of bits required to represent all the different values that a variable might take on, which is just the raw data. For example, suppose a variable may take on any of 8 different values. In digital storage, 3 bits would be enough to uniquely represent the 8 different values, and thus the variable can be stored in 3 bits.
However, this is an upper limit on the required storage; it is the amount of storage required to store the raw “data” of the variable, not the “information” in that data. Less storage might be enough to store the information, depending on the process by which the variable takes on different values. For example, suppose a coin is completely biased and always comes up heads when tossed. Then the random variable representing the outcome of a toss has probability 1 of coming up heads (in other words, it is a constant). It is not necessary to store that variable, as it can be trivially guessed at any time. Thus, the amount of information in that variable is zero. On the other hand, with a fair coin that has equal chances of coming up heads or tails, we can guess the outcome of a toss with only 50% accuracy (probability 0.5), so it is necessary to store the actual outcome to know its value with better than 50% accuracy. The amount of information in this second random variable is much higher than in the first case.
In a more sophisticated representation, if a variable is easier to guess, we can use that fact to reduce the number of bits needed to store its information. If the value of a variable is easier to guess, the variable is less “surprising” and contains less information. Thus, an alternative way of viewing entropy is as a measure of the “compressibility” of the data, i.e., a metric that expresses how much the raw data of a variable can be compressed without losing the information in the variable.
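The coin and 8-value examples above can be checked numerically. The Python sketch below estimates the Shannon entropy (in bits) from the empirical frequencies of already-quantized readings; treating sensor readings as discrete symbols, and the function name `entropy`, are assumptions made here for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(samples: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy in bits: H = sum_x p(x) * log2(1 / p(x))."""
    n = len(samples)
    counts = Counter(samples)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    print(entropy(["H"] * 8))         # completely biased coin: 0 bits
    print(entropy(["H", "T"] * 4))    # fair coin: 1 bit
    print(entropy(list(range(8))))    # 8 equally likely values: 3 bits
```

The biased coin needs no storage at all, while 8 equally likely values reach the 3-bit upper limit discussed above.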
2.2.2 Entropy concept
In information theory, the entropy of a random variable is a function that characterizes the “unpredictability” or “uncertainty” of the variable [130]. In other words, the more uncertain or unpredictable an event is, the more information it contains and the larger its entropy.
If a random variable 𝑋 takes on values in a set 𝑋 = {𝑥1, 𝑥2, … , 𝑥𝑛}, and is defined by a probability distribution 𝑃(𝑋), then the entropy 𝐻(𝑋) of the discrete random variable 𝑋 is written as:
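$$ H(X) = -\sum_{x \in X} P(x) \log_2 P(x) $$
For two discrete random variables 𝑋 and 𝑌 with joint distribution 𝑃(𝑋, 𝑌), the joint entropy is: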
$$ H(X,Y) = -\sum_{y \in Y}\sum_{x \in X} P(x,y) \log_2 P(x,y) $$
in which 𝑥 and 𝑦 are particular values of 𝑋 and 𝑌
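The joint entropy definition extends naturally to a plug-in estimate from paired samples, where each time step contributes one tuple of values. The sketch below (hypothetical helper name) accepts any number of equal-length series, so it also covers the case of more than two variables.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def joint_entropy(*series: Sequence[Hashable]) -> float:
    """Empirical joint entropy (bits) of one or more equal-length series.

    Sample k of the joint variable is the tuple (series_1[k], ..., series_m[k]).
    """
    if not series or len({len(s) for s in series}) != 1:
        raise ValueError("need at least one series, all of the same length")
    tuples = list(zip(*series))
    n = len(tuples)
    counts = Counter(tuples)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    x = [0, 1, 0, 1]
    print(joint_entropy(x, x))             # identical series: H(X,Y) = H(X) = 1 bit
    print(joint_entropy(x, [0, 0, 1, 1]))  # independent series: H(X) + H(Y) = 2 bits
```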
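For more than two random variables 𝑋1, 𝑋2, … , 𝑋𝑛, the joint entropy expands to:
$$ H(X_1, \ldots, X_n) = -\sum_{x_1} \cdots \sum_{x_n} P(x_1, \ldots, x_n) \log_2 P(x_1, \ldots, x_n) $$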
Correlation and entropy
2.3.1 Correlation of two variables
Correlation is a measure of the relationship (dependence) between variables. In entropy theory [130], the relationship between two variables can be described by the concepts of mutual information and correlation coefficient.
2.3.1.1 Mutual information
Figure 2.2 The relations between entropies, joint entropy, and mutual information
In Figure 2.2, the relationships between the entropies of random variables, their joint entropy, and their mutual information are described. The relation between entropy and joint entropy is given by inequality (2.22), with equality if and only if 𝑋 and 𝑌 are independent:
$$ H(X,Y) \le H(X) + H(Y) \tag{2.22} $$
The above inequality shows that when 𝑋 and 𝑌 share no information (i.e., they are independent), the joint entropy of the two random variables exactly equals the sum of their entropies. Otherwise, the joint entropy of the two variables is smaller than the sum of their entropies. Knowing the joint entropy of random variables tells us how much knowing some of the variables reduces the uncertainty about the others. The smaller the joint entropy is relative to the sum of the individual entropies, the higher the correlation of the random variables.
Another metric used to measure the mutual dependence between two variables is mutual information. Whereas the entropy of a random variable measures the information about the variable itself, mutual information measures the relationship between two random variables that are sampled simultaneously.
For example, let 𝑋 represent the weather and 𝑌 the humidity on a given day in a specific city. The value of 𝑋 tells us something about the value of 𝑌 and vice versa; for instance, if the weather is rainy, the humidity is very likely to be high. That is, these variables share mutual information. On the other hand, if 𝑋 represents the weather of one day and 𝑍 represents the humidity of another day, then 𝑋 and 𝑍 share no mutual information, because the weather of one day does not contain any information about the humidity of another day.
In general, mutual information measures how much information is communicated, on average, in one random variable about another The mutual information between two variables is 0 if and only if the two variables are statistically independent The formal definition of the mutual information of two random variables 𝑋 and 𝑌, whose joint distribution is defined by 𝑃(𝑋, 𝑌) is given by:
$$ I(X,Y) = \sum_{y \in Y}\sum_{x \in X} P(x,y) \log_2 \frac{P(x,y)}{P(x)P(y)} \tag{2.23} $$
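Equation (2.23) can be estimated from paired samples by replacing probabilities with relative frequencies. In the sketch below, the ratio P(x,y)/(P(x)P(y)) is computed from raw counts as c·n/(c_x·c_y); the function name and the toy series are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Empirical I(X,Y) in bits from paired samples x[k], y[k]."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired samples of equal length")
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # (c/n) / ((c_x/n) * (c_y/n)) simplifies to c * n / (c_x * c_y)
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


if __name__ == "__main__":
    print(mutual_information([0, 1, 0, 1], [0, 0, 1, 1]))  # independent: 0 bits
    print(mutual_information([0, 1, 0, 1], [0, 1, 0, 1]))  # identical: 1 bit
```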
2.3.1.2 Entropy correlation coefficient
Mutual information can be used to measure the correlation between two sets of data: the larger the mutual information of two variables, the stronger their correlation. However, it is difficult to compare the correlation levels of two pairs of random variables using mutual information or joint entropy, because their values depend on the entropy of each individual variable in the pair. To overcome this problem, a normalized measure of mutual information called the entropy correlation coefficient, introduced in [124] and [131], is used to evaluate the correlation.
The relation between mutual information and entropy is given by:
𝐼(𝑋, 𝑌) = 𝐻(𝑋) + 𝐻(𝑌) − 𝐻(𝑋, 𝑌), (2.24)
or can be written as:
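$$ \frac{I(X,Y)}{H(X)+H(Y)} = 1 - \frac{H(X,Y)}{H(X)+H(Y)} \tag{2.25} $$
If two random variables are independent of each other:
$$ H(X,Y) = H(X) + H(Y) \tag{2.26} $$
Substituting (2.26) into (2.25), we get:
$$ \frac{I(X,Y)}{H(X)+H(Y)} = 1 - \frac{H(X,Y)}{H(X)+H(Y)} = 1 - 1 = 0 \tag{2.27} $$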
If two random variables are completely dependent on each other:
$$ H(X,Y) = H(X) = H(Y) = \frac{1}{2}\left(H(X) + H(Y)\right) \tag{2.28} $$
Substituting (2.28) into (2.25), we get:
$$ \frac{I(X,Y)}{H(X)+H(Y)} = 1 - \frac{H(X,Y)}{H(X)+H(Y)} = 1 - \frac{1}{2} = \frac{1}{2} \tag{2.29} $$
From the above calculations, we get the inequality:
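$$ 0 \le \frac{I(X,Y)}{H(X)+H(Y)} \le \frac{1}{2} $$
Multiplying this ratio by 2 normalizes it to the range [0, 1] and gives:
$$ \rho(X,Y) = 2 - \frac{2H(X,Y)}{H(X)+H(Y)} $$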
𝜌 takes values in the range from 0 to 1 and specifies the correlation level between two random variables 𝑋 and 𝑌. When 𝜌 = 0, the two random variables are independent of each other. On the other hand, when 𝜌 = 1, the two random variables are completely dependent on each other.
The coefficient 𝜌(𝑋, 𝑌) is called the entropy correlation coefficient of the two random variables 𝑋 and 𝑌; it is related to the mutual information 𝐼(𝑋, 𝑌) and the joint entropy 𝐻(𝑋, 𝑌). The entropy correlation coefficient expresses the relative dependence within a pair of data series, independently of the values of their individual entropies, and therefore it can be used to compare the correlation levels of two pairs of data.
The entropy correlation coefficient 𝜌 varies from 0 to 1, depending on the correlation between the two nodes The larger the value of 𝜌 is, the higher the correlation is If 𝜌 = 1 (in case 𝐻(𝑋) = 𝐻(𝑌) = 𝐻(𝑋, 𝑌)), two sets of data totally depend on each other If 𝜌 = 0 (in case 𝐻(𝑋, 𝑌) = 𝐻(𝑋) + 𝐻(𝑌) ), they are independent
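A minimal sketch of the entropy correlation coefficient, assuming the definition ρ(X,Y) = 2 − 2H(X,Y)/(H(X) + H(Y)) (equivalently 2I(X,Y)/(H(X) + H(Y)) by (2.24)), with entropies estimated from the empirical frequencies of discrete samples. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def _entropy(samples: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy in bits."""
    n = len(samples)
    return sum((c / n) * math.log2(n / c) for c in Counter(samples).values())


def entropy_correlation_coefficient(x: Sequence[Hashable],
                                    y: Sequence[Hashable]) -> float:
    """rho(X,Y) = 2 - 2 H(X,Y) / (H(X) + H(Y)), a value in [0, 1]."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired samples of equal length")
    hx, hy = _entropy(list(x)), _entropy(list(y))
    if hx + hy == 0:
        raise ValueError("rho is undefined when both series are constant")
    hxy = _entropy(list(zip(x, y)))
    return 2.0 - 2.0 * hxy / (hx + hy)


if __name__ == "__main__":
    print(entropy_correlation_coefficient([0, 1, 0, 1], [0, 1, 0, 1]))  # 1.0
    print(entropy_correlation_coefficient([0, 1, 0, 1], [0, 0, 1, 1]))  # 0.0
    x = [0, 0, 1, 1, 2, 2, 3, 3]
    print(entropy_correlation_coefficient(x, [v // 2 for v in x]))      # about 0.667
```

In the last case 𝑦 is a deterministic function of 𝑥, yet ρ is only 2/3, because ρ reaches 1 only when H(X) = H(Y) = H(X,Y), as stated above.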
2.3.2 Correlation of more than two variables
Information entropy theory only provides a correlation measure for two variables. The concepts of mutual information and entropy correlation coefficient can be extended to more than two variables, but such extensions are inefficient and of little practical use.
When working with correlation, two requirements must be addressed. The first is how to recognize the correlation between variables, and the second is how to evaluate the correlation level. For two variables, the entropy correlation coefficient can be used for both recognizing and evaluating correlation. For more than two variables, there has not been any efficient way of using entropy theory directly to solve these two
in the group. The joint entropy never exceeds the total entropy of the individual variables, and the more correlated the variables in the group are, the larger the difference between the joint entropy and the total entropy of the individual variables. However, it is difficult to use this comparison directly. Instead, we consider the increase in the joint entropy of a group when one variable is added to it. If the added variable is highly correlated with the variables in the group, i.e., it strongly depends on them, adding it increases the joint entropy of the group only slightly.
Figure 2.3 Relation between correlation and joint entropy
In other words, only a small amount of additional information is needed to specify the added variable. Therefore, if we consider the joint entropy as a function of the number of variables in a group, its growth rate gradually decreases and approaches zero; that is, the joint entropy approaches a “saturation” state as the number of considered variables increases. Nodes with higher correlation approach the saturation state faster. This phenomenon, illustrated in Figure 2.3, was discovered by the authors of [129]. The speed of approaching the “saturation” state can thus be used to specify the correlation level.
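The two extremes of this behaviour can be reproduced with synthetic data. In the Python sketch below (data and names are illustrative assumptions), a fully correlated group, in which every node reports the same 4-level reading, keeps its joint entropy at 2 bits no matter how many nodes are added, whereas a group of mutually independent nodes gains 2 bits per added node. Real sensor groups are expected to fall between these two extremes.

```python
import math
from collections import Counter
from typing import List, Sequence


def joint_entropy(series_list: Sequence[Sequence[int]]) -> float:
    """Empirical joint entropy (bits) of the given equal-length series."""
    tuples = list(zip(*series_list))
    n = len(tuples)
    return sum((c / n) * math.log2(n / c) for c in Counter(tuples).values())


def joint_entropy_curve(series_list: Sequence[Sequence[int]]) -> List[float]:
    """H_1, H_2, ..., H_m for the first 1, 2, ..., m series of the group."""
    return [joint_entropy(series_list[:m]) for m in range(1, len(series_list) + 1)]


if __name__ == "__main__":
    samples = range(64)
    # Fully correlated group: every node reports the same 4-level reading.
    correlated = [[k % 4 for k in samples] for _ in range(3)]
    # Independent group: node m reports base-4 digit m of the sample index.
    independent = [[(k // 4 ** m) % 4 for k in samples] for m in range(3)]
    print(joint_entropy_curve(correlated))    # flat: saturated immediately
    print(joint_entropy_curve(independent))   # grows by 2 bits per node
```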
However, it is difficult to use this joint entropy characteristic to recognize a correlation group: to find a correlated subgroup within a considered group, the relationship between the joint entropy and the number of variables must be established and checked for all possible subgroups. Therefore, we need an efficient way to use the joint entropy characteristic of correlation groups to recognize the correlation and evaluate the correlation level.
Conclusions
In this chapter, correlation models in WSNs have been surveyed, covering several types of models. The survey shows that models based on traditional probability and statistics theory only describe linear correlation, whereas information entropy-based correlation models can describe general correlation. Most correlation models are distance-dependent. However, it is necessary to investigate the correlation characteristic by concentrating on the data itself instead of using distance information. When working with the data itself, one can use the relationship between the joint entropy and the number of considered variables to recognize the correlation characteristic. However, none of the models discussed above provides an efficient way to employ this relationship. This problem is addressed in the next chapter.
3 ENTROPY-BASED CORRELATION CLUSTERING
In Chapter 2, we have shown that correlation is related to the relationship between the joint entropy and the number of considered variables. To determine whether a group of nodes is correlated, it is necessary to know the entropy of each node and the joint entropy of all subgroups of the considered group. However, calculating the joint entropy of a group of more than two nodes is time-consuming and requires huge computational resources. As a result, a simple method to estimate the joint entropy is needed. To solve this problem, in this chapter, we estimate the joint entropy of a node group from the entropies of the individual nodes in the group and the entropy correlation coefficients of all pairs in the group. From this estimation, the correlation characteristic can be recognized, and correlation clustering can be established.
Joint entropy estimation
Suppose there is a set of 𝑁 data series {𝑋1, 𝑋2, … , 𝑋𝑁} with entropy 𝐻(𝑋𝑖) for each series and entropy correlation coefficient 𝜌𝑖𝑗 = 𝜌(𝑋𝑖, 𝑋𝑗) for each pair, in which any 1 ≤ 𝑖 ≠ 𝑗 ≤ 𝑁 satisfies the following conditions:
$$ H_{min} \le H(X_i) \le H_{max} $$
$$ \rho_{min} \le \rho_{ij} \le \rho_{max} $$
in which 𝐻𝑚𝑖𝑛 and 𝐻𝑚𝑎𝑥 are the lower and upper bounds of the data entropy in the dataset, respectively, and 𝜌𝑚𝑖𝑛 and 𝜌𝑚𝑎𝑥 are the lower and upper bounds of the entropy correlation coefficient in the set, respectively. The joint entropy is estimated based on the idea of hierarchical clustering [132], as described in the following sections.
3.1.1 Determining the upper bound of joint entropy
With a group that has only one node, we have the entropy of the node defined
The correlation coefficient between one cluster and another can be considered as a measure of the “correlation distance” between the two clusters. Thus, according to [124] [132], it can be taken as the greatest, smallest, or average correlation coefficient between any member of one cluster and any member of the other cluster. Here, the smallest value is used:
$$ \rho(X_{ij}, X_k) = \min\{\rho(X_i, X_k), \rho(X_j, X_k)\} \tag{3.10} $$
In addition, since 𝜌(𝑋𝑖, 𝑋𝑗) ≥ 𝜌𝑚𝑖𝑛 for all 𝑖 ≠ 𝑗, 1 ≤ 𝑖, 𝑗 ≤ 𝑁, it follows that 𝜌(𝑋𝑖𝑗, 𝑋𝑘) ≥ 𝜌𝑚𝑖𝑛.
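Equation (3.10) corresponds to the complete-linkage rule of hierarchical clustering, expressed with a similarity (ρ) instead of a distance. A minimal sketch, assuming the pairwise coefficients are stored in a symmetric matrix indexed by node number (the function name and matrix values are hypothetical):

```python
from typing import Sequence


def cluster_correlation(rho: Sequence[Sequence[float]],
                        cluster_a: Sequence[int],
                        cluster_b: Sequence[int]) -> float:
    """Correlation between two clusters, taken as the smallest pairwise
    entropy correlation coefficient between their members (the min rule of (3.10))."""
    return min(rho[i][j] for i in cluster_a for j in cluster_b)


if __name__ == "__main__":
    rho = [[1.0, 0.9, 0.6],
           [0.9, 1.0, 0.8],
           [0.6, 0.8, 1.0]]
    # Merging nodes 0 and 1 gives cluster X_01; its correlation with node 2:
    print(cluster_correlation(rho, [0, 1], [2]))   # min(0.6, 0.8) = 0.6
```

Because the result is always one of the pairwise coefficients, it can never fall below 𝜌𝑚𝑖𝑛, which is the property stated above.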
We have