
Data Mining and Knowledge Discovery Handbook, 2nd Edition (part 71)


\[
\delta(\lambda) =
\begin{cases}
1 & \text{if } \lambda \notin Y_i \\
0 & \text{otherwise}
\end{cases}
\]

Coverage evaluates how far we need, on average, to go down the ranked list of labels in order to cover all the relevant labels of the example:

\[
\text{Cov} = \frac{1}{m} \sum_{i=1}^{m} \max_{\lambda \in Y_i} r_i(\lambda) - 1
\]
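As a rough illustration of the coverage formula, the following Python sketch evaluates it on a toy ranking. The representation (one dict per example mapping each label to its rank, with rank 1 at the top of the list) and the labels `a`, `b`, `c` are assumptions made for this sketch, not something the chapter prescribes.

```python
def coverage(ranks, relevant):
    """Average depth needed to cover all relevant labels, minus one.

    ranks: one dict per example, label -> rank r_i(label), 1 = top of the list
    relevant: one set per example holding the relevant labels Y_i
    """
    m = len(ranks)
    return sum(max(r[label] for label in y) for r, y in zip(ranks, relevant)) / m - 1


# Hypothetical toy data: two examples over the labels a, b, c.
ranks = [{"a": 1, "b": 2, "c": 3}, {"a": 3, "b": 1, "c": 2}]
relevant = [{"a", "c"}, {"b"}]

# Example 1 needs depth 3, example 2 needs depth 1: (3 + 1) / 2 - 1
print(coverage(ranks, relevant))
```

A value of 0 would mean that a single step down the list always suffices, which is only possible when every example has exactly one relevant label ranked first.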

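Ranking loss expresses the number of times that irrelevant labels are ranked higher than relevant labels:

\[
\text{R-Loss} = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{|Y_i|\,|\overline{Y_i}|} \left| \{ (\lambda_a, \lambda_b) : r_i(\lambda_a) > r_i(\lambda_b),\ (\lambda_a, \lambda_b) \in Y_i \times \overline{Y_i} \} \right|
\]

where \overline{Y_i} is the complementary set of Y_i with respect to L.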
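The ranking loss can be sketched in a few lines of Python. As before, the dict-based rank representation and the toy labels are assumptions of this sketch; it also assumes that every example has at least one relevant and at least one irrelevant label, since otherwise the normalising factor would be zero.

```python
def ranking_loss(ranks, relevant, labels):
    """Average fraction of (relevant, irrelevant) label pairs that are mis-ordered.

    ranks: one dict per example, label -> rank (1 = top of the list)
    relevant: one set per example holding the relevant labels Y_i
    labels: the full label set L
    """
    total = 0.0
    for r, y in zip(ranks, relevant):
        y_bar = labels - y  # complement of Y_i with respect to L
        # A pair is mis-ordered when the relevant label has the larger (worse) rank.
        misordered = sum(1 for a in y for b in y_bar if r[a] > r[b])
        total += misordered / (len(y) * len(y_bar))
    return total / len(ranks)


# Hypothetical toy data: two examples over the labels a, b, c.
labels = {"a", "b", "c"}
ranks = [{"a": 1, "b": 2, "c": 3}, {"a": 3, "b": 1, "c": 2}]
relevant = [{"a", "c"}, {"b"}]

# Example 1: one of two pairs is mis-ordered (c below b); example 2: none.
print(ranking_loss(ranks, relevant, labels))
```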

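Average precision evaluates the average fraction of labels ranked above a particular label λ ∈ Y_i which actually are in Y_i:

\[
\text{AvgPrec} = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{|Y_i|} \sum_{\lambda \in Y_i} \frac{\left| \{ \lambda' \in Y_i : r_i(\lambda') \le r_i(\lambda) \} \right|}{r_i(\lambda)}
\]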
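A minimal Python sketch of average precision follows, under the same assumptions as for the other ranking measures: ranks are stored in one dict per example with rank 1 at the top, each example has at least one relevant label, and the toy labels are hypothetical.

```python
def average_precision(ranks, relevant):
    """Mean over examples of the per-label precision at each relevant label's rank.

    ranks: one dict per example, label -> rank (1 = top of the list)
    relevant: one set per example holding the relevant labels Y_i
    """
    total = 0.0
    for r, y in zip(ranks, relevant):
        per_label = 0.0
        for label in y:
            # Relevant labels ranked at or above this label, divided by its rank.
            hits = sum(1 for other in y if r[other] <= r[label])
            per_label += hits / r[label]
        total += per_label / len(y)
    return total / len(ranks)


# Hypothetical toy data: two examples over the labels a, b, c.
ranks = [{"a": 1, "b": 2, "c": 3}, {"a": 3, "b": 1, "c": 2}]
relevant = [{"a", "c"}, {"b"}]

# Example 1: (1/1 + 2/3) / 2 = 5/6; example 2: 1/1; mean = 11/12.
print(average_precision(ranks, relevant))
```

Unlike coverage and ranking loss, larger values are better here, and a value of 1 means that all relevant labels are ranked above all irrelevant ones in every example.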

34.7.3 Hierarchical

The hierarchical loss (Cesa-Bianchi et al., 2006b) is a modified version of the Hamming loss that takes into account an existing hierarchical structure of the labels. It examines the predicted labels in a top-down manner according to the hierarchy, and whenever the prediction for a label is wrong, the subtree rooted at that node is not considered further in the calculation of the loss. Let anc(λ) be the set of all the ancestor nodes of λ. The hierarchical loss is defined as follows:

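\[
\text{H-Loss} = \frac{1}{m} \sum_{i=1}^{m} \left| \{ \lambda : \lambda \in Y_i \,\Delta\, Z_i,\ \mathrm{anc}(\lambda) \cap (Y_i \,\Delta\, Z_i) = \emptyset \} \right|
\]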
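The following Python sketch shows one way to compute the hierarchical loss. It reads the set of wrongly predicted labels as the symmetric difference between the true label set Y_i and the predicted label set Z_i, as in the Hamming loss; that reading, the child-to-parent dict used to encode the hierarchy, and the toy labels are all assumptions of this sketch.

```python
def h_loss(true_sets, pred_sets, parent):
    """Count wrong labels that have no wrong ancestor, averaged over examples.

    true_sets, pred_sets: one set of labels per example (Y_i and Z_i)
    parent: dict mapping each non-root label to its parent label
    """
    def ancestors(label):
        found = set()
        while label in parent:
            label = parent[label]
            found.add(label)
        return found

    total = 0
    for y, z in zip(true_sets, pred_sets):
        wrong = y ^ z  # labels on which truth and prediction disagree
        # A mistake counts only if no ancestor is already a mistake.
        total += sum(1 for label in wrong if not (ancestors(label) & wrong))
    return total / len(true_sets)


# Hypothetical two-level hierarchy: a -> {a1, a2}, b -> {b1}.
parent = {"a1": "a", "a2": "a", "b1": "b"}
true_sets = [{"a", "a1"}, {"a", "a1"}]
pred_sets = [{"b", "b1"}, {"a", "a1"}]

# Example 1: a and b are wrong at the top level, so a1 and b1 are not counted.
print(h_loss(true_sets, pred_sets, parent))
```

On the first example the plain Hamming count would be 4 wrong labels, whereas the hierarchical loss charges only the 2 top-level mistakes.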

Several other measures for hierarchical (multi-label) classification are examined in (Moskovitch et al., 2006, Sun & Lim, 2001).

34.8 Related Tasks

One of the most popular supervised learning tasks is multi-class classification, which involves a set of labels L, where |L| > 2. The critical difference with respect to multi-label classification is that each instance is associated with only one element of L, instead of a subset of L. Jin and Ghahramani (2002) use the term multiple-label problems for the semi-supervised classification problems in which each example is associated with more than one class, but only one of those classes is the true class of the example. This task is not as common in real-world applications as the one we are studying.

Multiple-instance or multi-instance learning is a variation of supervised learning, where labels are assigned to bags of instances (Maron & Lozano-Pérez, 1998). In certain applications, the training data can be considered as both multi-instance and multi-label (Zhou, 2007). In image classification, for example, the different regions of an image can be considered as multiple instances, each of which can be labeled with a different concept, such as sunset and sea. Several methods have recently been proposed for addressing such data (Zhou & Zhang, 2006, Zha et al., 2008).

In multitask learning (Caruana, 1997) we try to solve many similar tasks in parallel, usually using a shared representation. By taking advantage of the common characteristics of these tasks, better generalization can be achieved. A typical example is learning to identify handwritten text for different writers in parallel: training data from one writer can aid the construction of better predictive models for other writers.

34.9 Multi-Label Data Mining Software

There exist a number of implementations of specific algorithms for mining multi-label data, most of which have been discussed in Section 34.2.2. The BoosTexter system⁶ implements the boosting-based approaches proposed in (Schapire & Singer, 2000). There also exist Matlab implementations for MLkNN⁷ and BPMLL⁸.

There is also more general-purpose software that handles multi-label data as part of its functionality. LibSVM (Chang & Lin, 2001) is a library for support vector machines that can learn from multi-label data using the binary relevance transformation. Clus⁹ is a predictive clustering system that is based on decision tree learning. Its capabilities include (hierarchical) multi-label classification.
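To make the binary relevance transformation mentioned above concrete, here is a small Python sketch of the data transformation only. It does not use LibSVM or any other library; the function name and the toy label sets are hypothetical, and any binary classifier could then be trained on each resulting target vector.

```python
def binary_relevance(label_sets, labels):
    """Turn multi-label targets into one 0/1 target vector per label.

    label_sets: one set of labels per training example
    labels: the full label set L
    """
    return {
        label: [1 if label in y else 0 for y in label_sets]
        for label in sorted(labels)
    }


# Hypothetical label sets for four training examples.
label_sets = [{"sunset", "sea"}, {"sea"}, set(), {"sunset"}]
targets = binary_relevance(label_sets, {"sunset", "sea"})

for label, column in targets.items():
    print(label, column)
```

Each label gets its own independent binary problem over the same feature vectors, which is why a single-label learner such as a support vector machine can be reused without modification.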

Finally, Mulan¹⁰ is open-source software devoted to multi-label data mining. It includes implementations of a large number of learning algorithms, basic capabilities for dimensionality reduction and hierarchical multi-label classification, and an extensive evaluation framework.

References

Barutcuoglu, Z., Schapire, R. E. & Troyanskaya, O. G. (2006). Bioinformatics 22, 830–836.

Blockeel, H., Schietgat, L., Struyf, J., Džeroski, S. & Clare, A. (2006). Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 4213 LNAI, 18–29.

Boleda, G., im Walde, S. S. & Badia, T. (2007). In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 171–180, Prague.

Boutell, M., Luo, J., Shen, X. & Brown, C. (2004). Pattern Recognition 37, 1757–1771.

Brinker, K., Fürnkranz, J. & Hüllermeier, E. (2006). In Proceedings of the 17th European Conference on Artificial Intelligence (ECAI ’06), pp. 489–493, Riva del Garda, Italy.

Brinker, K. & Hüllermeier, E. (2007). In Proceedings of the 20th International Conference on Artificial Intelligence (IJCAI ’07), pp. 702–707, Hyderabad, India.

Caruana, R. (1997). Machine Learning 28, 41–75.

⁶ http://www.cs.princeton.edu/~schapire/boostexter.html

⁷ http://lamda.nju.edu.cn/datacode/MLkNN.htm

⁸ http://lamda.nju.edu.cn/datacode/BPMLL.htm

⁹ http://www.cs.kuleuven.be/~dtai/clus/

¹⁰ http://sourceforge.net/projects/mulan/

Cesa-Bianchi, N., Gentile, C. & Zaniboni, L. (2006a). In ICML ’06: Proceedings of the 23rd international conference on Machine learning, pp. 177–184.

Cesa-Bianchi, N., Gentile, C. & Zaniboni, L. (2006b). Journal of Machine Learning Research 7, 31–54.

Chang, C.-C. & Lin, C.-J. (2001). LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

Chawla, N. V., Japkowicz, N. & Kotcz, A. (2004). SIGKDD Explorations 6, 1–6.

Chen, W., Yan, J., Zhang, B., Chen, Z. & Yang, Q. (2007). In Proc. 7th IEEE International Conference on Data Mining, pp. 451–456, IEEE Computer Society, Los Alamitos, CA, USA.

Clare, A. & King, R. (2001). In Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2001), pp. 42–53, Freiburg, Germany.

Crammer, K. & Singer, Y. (2003). Journal of Machine Learning Research 3, 1025–1058.

de Comite, F., Gilleron, R. & Tommasi, M. (2003). In Proceedings of the 3rd International Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM 2003), pp. 35–49, Leipzig, Germany.

Diplaris, S., Tsoumakas, G., Mitkas, P. & Vlahavas, I. (2005). In Proceedings of the 10th Panhellenic Conference on Informatics (PCI 2005), pp. 448–456, Volos, Greece.

Elisseeff, A. & Weston, J. (2002). In Advances in Neural Information Processing Systems 14.

Esuli, A., Fagni, T. & Sebastiani, F. (2008). Information Retrieval 11, 287–313.

Fürnkranz, J., Hüllermeier, E., Mencia, E. L. & Brinker, K. (2008). Machine Learning.

Gao, S., Wu, W., Lee, C.-H. & Chua, T.-S. (2004). In Proceedings of the 21st international conference on Machine learning (ICML ’04), p. 42, Banff, Alberta, Canada.

Ghamrawi, N. & McCallum, A. (2005). In Proceedings of the 2005 ACM Conference on Information and Knowledge Management (CIKM ’05), pp. 195–200, Bremen, Germany.

Godbole, S. & Sarawagi, S. (2004). In Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2004), pp. 22–30.

Harris, M. A., Clark, J., Ireland, A., Lomax, J., Ashburner, M., Foulger, R., Eilbeck, K., Lewis, S., Marshall, B., Mungall, C., Richter, J., Rubin, G. M., Blake, J. A., Bult, C., Dolan, M., Drabkin, H., Eppig, J. T., Hill, D. P., Ni, L., Ringwald, M., Balakrishnan, R., Cherry, J. M., Christie, K. R., Costanzo, M. C., Dwight, S. S., Engel, S., Fisk, D. G., Hirschman, J. E., Hong, E. L., Nash, R. S., Sethuraman, A., Theesfeld, C. L., Botstein, D., Dolinski, K., Feierbach, B., Berardini, T., Mundodi, S., Rhee, S. Y., Apweiler, R., Barrell, D., Camon, E., Dimmer, E., Lee, V., Chisholm, R., Gaudet, P., Kibbe, W., Kishore, R., Schwarz, E. M., Sternberg, P., Gwinn, M., Hannick, L., Wortman, J., Berriman, M., Wood, V., de La, Tonellato, P., Jaiswal, P., Seigfried, T. & White, R. (2004). Nucleic Acids Res. 32.

Hüllermeier, E., Fürnkranz, J., Cheng, W. & Brinker, K. (2008). Artificial Intelligence 172, 1897–1916.

Ji, S., Tang, L., Yu, S. & Ye, J. (2008). In Proceedings of the 14th SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, USA.

Jin, R. & Ghahramani, Z. (2002). In Proceedings of Neural Information Processing Systems 2002 (NIPS 2002), Vancouver, Canada.

Katakis, I., Tsoumakas, G. & Vlahavas, I. (2008). In Proceedings of the ECML/PKDD 2008 Discovery Challenge, Antwerp, Belgium.

Kohavi, R. & John, G. H. (1997). Artificial Intelligence 97, 273–324.

Lewis, D. D., Yang, Y., Rose, T. G. & Li, F. (2004). J. Mach. Learn. Res. 5, 361–397.

Li, T. & Ogihara, M. (2003). In Proceedings of the International Symposium on Music Information Retrieval, pp. 239–240, Washington D.C., USA.

Li, T. & Ogihara, M. (2006). IEEE Transactions on Multimedia 8, 564–574.

Loza Mencia, E. & Fürnkranz, J. (2008a). In 2008 IEEE International Joint Conference on Neural Networks (IJCNN-08), pp. 2900–2907, Hong Kong.

Loza Mencia, E. & Fürnkranz, J. (2008b). In 12th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2008, pp. 50–65, Antwerp, Belgium.

Luo, X. & Zincir-Heywood, A. (2005). In Proceedings of the 15th International Symposium on Methodologies for Intelligent Systems, pp. 161–169.

Maron, O. & Lozano-Pérez, T. (1998). In Advances in Neural Information Processing Systems 10, pp. 570–576, MIT Press.

McCallum, A. (1999). In Proceedings of the AAAI ’99 Workshop on Text Learning.

Mencia, E. L. & Fürnkranz, J. (2008). In 12th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2008, Antwerp, Belgium.

Moskovitch, R., Cohenkashi, S., Dror, U., Levy, I., Maimon, A. & Shahar, Y. (2006). Artificial Intelligence in Medicine 37, 177–190.

Park, C. H. & Lee, M. (2008). Pattern Recogn. Lett. 29, 878–887.

Pestian, J. P., Brew, C., Matykiewicz, P., Hovermale, D. J., Johnson, N., Cohen, K. B. & Duch, W. (2007). In BioNLP ’07: Proceedings of the Workshop on BioNLP 2007, pp. 97–104, Association for Computational Linguistics, Morristown, NJ, USA.

Qi, G.-J., Hua, X.-S., Rui, Y., Tang, J., Mei, T. & Zhang, H.-J. (2007). In MULTIMEDIA ’07: Proceedings of the 15th international conference on Multimedia, pp. 17–26, ACM, New York, NY, USA.

Read, J. (2008). In Proc. 2008 New Zealand Computer Science Research Student Conference (NZCSRS 2008), pp. 143–150.

Rokach, L., Genetic algorithm-based feature set partitioning for classification problems, Pattern Recognition, 41(5):1676–1700, 2008.

Rokach, L., Mining manufacturing data using genetic algorithm-based feature set decomposition, Int. J. Intelligent Systems Technologies and Applications, 4(1):57–78, 2008.

Rokach, L., Maimon, O. and Lavi, I., Space Decomposition In Data Mining: A Clustering Approach, Proceedings of the 14th International Symposium On Methodologies For Intelligent Systems, Maebashi, Japan, Lecture Notes in Computer Science, Springer-Verlag, 2003, pp. 24–31.

Rousu, J., Saunders, C., Szedmak, S. & Shawe-Taylor, J. (2006). Journal of Machine Learning Research 7, 1601–1626.

Ruepp, A., Zollner, A., Maier, D., Albermann, K., Hani, J., Mokrejs, M., Tetko, I., Güldener, U., Mannhaupt, G., Münsterkötter, M. & Mewes, H. W. (2004). Nucleic Acids Res. 32, 5539–5545.

Schapire, R. E. & Singer, Y. (2000). Machine Learning 39, 135–168.

Snoek, C. G. M., Worring, M., van Gemert, J. C., Geusebroek, J.-M. & Smeulders, A. W. M. (2006). In MULTIMEDIA ’06: Proceedings of the 14th annual ACM international conference on Multimedia, pp. 421–430, ACM, New York, NY, USA.

Spyromitros, E., Tsoumakas, G. & Vlahavas, I. (2008). In Proc. 5th Hellenic Conference on Artificial Intelligence (SETN 2008).

Srivastava, A. & Zane-Ulman, B. (2005). In IEEE Aerospace Conference.

Streich, A. P. & Buhmann, J. M. (2008). In 12th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2008, Antwerp, Belgium.

Sun, A. & Lim, E.-P. (2001). In ICDM ’01: Proceedings of the 2001 IEEE International Conference on Data Mining, pp. 521–528, IEEE Computer Society, Washington, DC, USA.

Sun, L., Ji, S. & Ye, J. (2008). In Proceedings of the 14th SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, USA.

Thabtah, F., Cowling, P. & Peng, Y. (2004). In Proceedings of the 4th IEEE International Conference on Data Mining, ICDM ’04, pp. 217–224.

Trohidis, K., Tsoumakas, G., Kalliris, G. & Vlahavas, I. (2008). In Proc. 9th International Conference on Music Information Retrieval (ISMIR 2008), Philadelphia, PA, USA.

Tsoumakas, G. & Katakis, I. (2007). International Journal of Data Warehousing and Mining 3, 1–13.

Tsoumakas, G., Katakis, I. & Vlahavas, I. (2008). In Proc. ECML/PKDD 2008 Workshop on Mining Multidimensional Data (MMD’08), pp. 30–44.

Tsoumakas, G. & Vlahavas, I. (2007). In Proceedings of the 18th European Conference on Machine Learning (ECML 2007), pp. 406–417, Warsaw, Poland.

Ueda, N. & Saito, K. (2003). Advances in Neural Information Processing Systems 15, 721–728.

Veloso, A., Wagner, M. J., Goncalves, M. & Zaki, M. (2007). In Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2007), vol. LNAI 4702, pp. 605–612, Springer, Warsaw, Poland.

Vembu, S. & Gärtner, T. (2009). In Preference Learning (Fürnkranz, J. & Hüllermeier, E., eds), Springer.

Vens, C., Struyf, J., Schietgat, L., Džeroski, S. & Blockeel, H. (2008). Machine Learning 73, 185–214.

Wieczorkowska, A., Synak, P. & Ras, Z. (2006). In Proceedings of the 2006 International Conference on Intelligent Information Processing and Web Mining (IIPWM’06), pp. 307–315.

Wolpert, D. (1992). Neural Networks 5, 241–259.

Yang, S., Kim, S.-K. & Ro, Y. M. (2007). Circuits and Systems for Video Technology, IEEE Transactions on 17, 324–335.

Yang, Y. (1999). Journal of Information Retrieval 1, 67–88.

Yang, Y. & Pedersen, J. O. (1997). In Proceedings of ICML-97, 14th International Conference on Machine Learning (Fisher, D. H., ed.), pp. 412–420, Morgan Kaufmann Publishers, San Francisco, US, Nashville, US.

Yu, K., Yu, S. & Tresp, V. (2005). In SIGIR ’05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 258–265, ACM Press, Salvador, Brazil.

Zha, Z.-J., Hua, X.-S., Mei, T., Wang, J., Qi, G.-J. & Wang, Z. (2008). In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pp. 1–8.

Zhang, M.-L. & Zhou, Z.-H. (2006). IEEE Transactions on Knowledge and Data Engineering 18, 1338–1351.

Zhang, M.-L. & Zhou, Z.-H. (2007a). Pattern Recognition 40, 2038–2048.

Zhang, M.-L. & Zhou, Z.-H. (2007b). In Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, pp. 669–674, AAAI Press, Vancouver, British Columbia, Canada.

Zhang, Y., Burer, S. & Street, W. N. (2006). Journal of Machine Learning Research 7, 1315–1338.

Zhang, Y. & Zhou, Z.-H. (2008). In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, AAAI 2008, pp. 1503–1505, AAAI Press, Chicago, Illinois, USA.

Zhou, Z.-H. (2007). In Proceedings of the 3rd International Conference on Advanced Data Mining and Applications (ADMA’07), p. 1, Springer.

Zhou, Z. H. & Zhang, M. L. (2006). In NIPS (Schölkopf, B., Platt, J. C. & Hoffman, T., eds), pp. 1609–1616, MIT Press.

Zhu, S., Ji, X., Xu, W. & Gong, Y. (2005). In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in Information Retrieval, pp. 274–281.

Privacy in Data Mining

Vicenç Torra

IIIA - CSIC, Campus UAB s/n, 08193 Bellaterra Catalonia, Spain

vtorra@iiia.csic.es

Summary. In this chapter we describe the main tools for privacy in data mining. We present an overview of the tools for protecting data, and then we focus on protection procedures. Information loss and disclosure risk measures are also described.

35.1 Introduction

Data is nowadays gathered in large amounts by companies and national offices. This data is often analyzed using either statistical or data mining methods. When such methods are applied within the walls of the company that has gathered the data, the danger of disclosure of sensitive information might be limited. In contrast, when the analysis has to be performed by third parties, privacy becomes a much more relevant issue.

To make matters worse, it is not uncommon for an analysis to require data not from a single data source but from several. This is the case for banks working on fraud detection and for hospitals analyzing diseases and treatments. In the first case, data from several banks might help with fraud detection. Similarly, data from different hospitals might help in finding the causes of a bad response to a given treatment, or the causes of a given disease.

Privacy-Preserving Data Mining (PPDM) (Aggarwal and Yu, 2008) and Statistical Disclosure Control (SDC) (Willenborg, 2001, Domingo-Ferrer and Torra, 2001a) are two related fields with a similar interest in ensuring data privacy. Their goal is to avoid the disclosure of sensitive or proprietary information to third parties.

Within these fields, several methods have been proposed for processing and analysing data without compromising privacy, and for releasing data while ensuring some level of data privacy. Measures and indices have been defined for evaluating disclosure risk (that is, to what extent data satisfy the privacy constraints), and data utility or information loss (that is, to what extent the protected data is still useful for applications). In addition, tools have been proposed to visualize and compare different approaches for data protection.

In this chapter we will review some of the existing methods and give an overview of the measures. The structure of the chapter is as follows. In Section 35.2, we present a classification of protection procedures. In Section 35.3, we review different interpretations of risk and give an overview of disclosure risk measures. In Section 35.4, we present major protection procedures. Also in this section we review k-anonymity. Then, Section 35.5 focuses on how to measure data utility and information loss. A few information loss measures are reviewed there. The chapter finishes in Section 35.6 by presenting different approaches for visualizing the trade-off between risk and utility, or risk and information loss. Some conclusions close the chapter.

O. Maimon, L. Rokach (eds.), Data Mining and Knowledge Discovery Handbook, 2nd ed., DOI 10.1007/978-0-387-09823-4_35, © Springer Science+Business Media, LLC 2010

35.2 On the Classification of Protection Procedures

The literature on Privacy Preserving Data Mining (PPDM) and on Statistical Disclosure Control (SDC) is vast, and a large number of procedures for ensuring privacy have been proposed. We classify them in two categories according to the prior knowledge the data owner has about the usage of the data.

Data-driven or general purpose protection procedures. In this case, no specific analysis or usage is foreseen for the data. The data owner does not know what kind of analysis will be performed by the third party.

This is the case when data is released for public use, as there is no way to know what kind of study a potential user will perform. This situation is common in National Statistical Offices, where data obtained from censuses and questionnaires can be, e.g., downloaded from the internet (census.gov). A similar case can occur for other public offices that regularly publish data obtained from questionnaires. Another case is when data are transferred to, e.g., researchers so that they can analyse them. Hospitals and other healthcare institutions can also be the target of such protection procedures, as they can be interested in protection procedures that permit different researchers to apply different data analysis tools (e.g., regression, clustering, association rules).

Within data-driven procedures, subcategories can be distinguished according to the type of data used. The main distinction about data types is between original data files (e.g., individuals described in terms of attributes) and aggregates of the data (e.g., contingency tables). In the statistical disclosure control community, the former type corresponds to microdata and the latter to tabular data.

With respect to the type or structure of the original files, most of the research has focused on standard files with numerical or categorical data (ordinal or nominal categorical data). Nevertheless, other more complex types of data have also been considered in the literature, e.g., multirelational databases, logs, and social networks. Another aspect to be considered in relation to the structure of the files is the constraints that the protected data needs to satisfy (e.g., when there is a linear combination of some variables). Data protection methods need to consider such constraints so that the protected data also satisfies them (see, e.g., (Torra, 2008) for details on a classification of the constraints and a study of microaggregation under this light).

Computation-driven or specific purpose protection procedures. In this case it is known beforehand which type of analysis has to be applied to the data. As the data uses are known, protection procedures are defined according to the intended subsequent computation. Thus, protection procedures are tailored to a specific purpose.

This will be the case of a retailer with a commercial database with information on customers having a fidelity card, when such data has to be transferred to a third party for market basket analysis. For example, there exist tailored procedures for data protection for association rules. They can be applied in this context of market basket analysis.

Results-driven protection procedures. In this case, privacy concerns the result of applying a particular data mining method to some particular data (Atallah et al., 1999, Atzori et al., 2008). For example, the association rules obtained from a commercial database should not permit the disclosure of sensitive information about particular customers.

Although this class of procedures can be seen as computation-driven, they are important enough to deserve their own class. This class of methods is also known as anonymity preserving pattern discovery (Atzori et al., 2008), result privacy (Bertino et al., 2008), and output secrecy (Haritsa, 2008).
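The distinction drawn above between microdata and tabular data can be illustrated with a short Python sketch that aggregates a microdata file into a contingency table. The records, attribute names, and values are entirely hypothetical and serve only to show the two data types side by side.

```python
from collections import Counter

# Hypothetical microdata: one record per individual, described by attributes.
microdata = [
    {"age_group": "18-39", "city": "Barcelona"},
    {"age_group": "18-39", "city": "Barcelona"},
    {"age_group": "40-64", "city": "Barcelona"},
    {"age_group": "18-39", "city": "Girona"},
]


def contingency_table(records, row_attr, col_attr):
    """Tabular data: the number of records in each (row, column) cell."""
    return Counter((rec[row_attr], rec[col_attr]) for rec in records)


table = contingency_table(microdata, "age_group", "city")
for cell, count in sorted(table.items()):
    print(cell, count)
```

Cells with very small counts, such as the single record for Girona here, are the kind of aggregate that can still point to an individual, which is one reason tabular data needs protection as well.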

Other dimensions have been considered in the literature for classifying protection procedures. One of them concerns the number of data sources.

Single data source. The data analysis only requires data from a single source.

Multiple data sources. Data from different sources have to be combined in order to compute a certain analysis.

The analysis of data protection procedures for multiple data sources usually falls within the computation-driven approach. A typical scenario in this setting is when a few companies collaborate in a certain analysis, each one providing its own database. In the typical scenario within data privacy, data owners want to compute such analysis without disclosing their own data to the other data owners. So, the goal is that at the end of the analysis the only additional information obtained by each of the data owners is the result of the analysis itself. That is, no extra knowledge should be acquired while computing the analysis.

A trivial approach for solving this problem is to consider a trusted third party (TTP) that computes the analysis. This is the centralized approach. In this case, data is just transferred using a completely secure channel (i.e., using cryptographic protocols). In contrast, in distributed privacy preserving data mining, data owners compute the analysis in a collaborative manner. In this way, the trusted third party is not needed. For such computation, cryptographic tools are also used.

Multiple data sources for data-driven protection procedures are of limited interest. Each data owner can publish its own data protected using general purpose protection procedures, and then data can be linked (using, e.g., record linkage algorithms) and finally analysed. So, this roughly corresponds to multidatabase mining.
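The chapter does not name a specific protocol for the distributed setting described above. As one textbook-style illustration of computing a joint result without a trusted third party, the sketch below simulates a secure sum based on additive secret sharing. It is a single-process simulation rather than a networked protocol; it assumes parties that follow the protocol and do not collude; and the modulus, function names, and input values are choices made for this sketch.

```python
import secrets

# Assumed public modulus, larger than any total the parties could produce.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split value into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate the protocol: no party ever sees another party's raw value."""
    n = len(private_values)
    # Row i holds the shares party i hands out, one per party (itself included).
    sent = [make_shares(v, n) for v in private_values]
    # Party j adds the shares it received and publishes only that partial sum.
    partial = [sum(sent[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # The published partial sums reveal the total and nothing else.
    return sum(partial) % MODULUS


# Hypothetical private counts held by three data owners.
print(secure_sum([120, 45, 300]))
```

Each individual share is uniformly random, so a party learns nothing from what it receives, yet the shares of every input still add up to that input. This matches the goal stated above that the only additional information obtained is the result of the analysis itself.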

The literature often classifies protection procedures using another dimension concerning the type of tools used. That is, methods are classified as following either the perturbative or the cryptographic approach. Our classification given above encompasses these two approaches. General purpose protection procedures follow the so-called perturbative approach, while computation-driven protection procedures mainly follow the cryptographic approach. Note, however, that there are some papers on perturbative approaches, e.g., noise addition, for specific uses such as association rules (see (Atallah et al., 1999)). Nevertheless, such methods are general enough to be used in other applications. So, they are general purpose protection procedures.
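Noise addition, mentioned above as an example of the perturbative approach, can be sketched as follows. This is a minimal illustration and not the method of any cited paper: it adds zero-mean Gaussian noise whose standard deviation is a fraction of the attribute's own standard deviation. The function name, the `relative_sd` parameter, and the income values are all hypothetical.

```python
import random


def add_noise(values, relative_sd=0.1, seed=None):
    """Return a masked copy of a numerical attribute.

    Each value receives independent Gaussian noise with mean 0 and standard
    deviation relative_sd times the standard deviation of the attribute.
    """
    rng = random.Random(seed)
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [v + rng.gauss(0.0, relative_sd * sd) for v in values]


# Hypothetical income attribute of a small microdata file.
incomes = [21000, 34000, 29500, 87000, 41000]
masked = add_noise(incomes, relative_sd=0.1, seed=7)

for original, protected in zip(incomes, masked):
    print(original, round(protected, 1))
```

A larger `relative_sd` hides the original values better but distorts statistics computed on the released data more, which is the trade-off between disclosure risk and information loss that the chapter goes on to discuss.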

In addition, it is important to underline that, in this chapter, we will not use the term perturbative approach with the interpretation above. Instead, we will use the term perturbative methods/approaches in a more restricted way (see Section 35.4), as is usual in the statistical disclosure control community.

In the rest of this section we further discuss both computation-driven and data-driven procedures.

