
The Computational Complexity of Machine Learning - Michael J. Kearns


DOCUMENT INFORMATION

Basic information

Title: The Computational Complexity of Machine Learning
Author: Michael J. Kearns
Institution: The Massachusetts Institute of Technology
Field: Machine Learning
Type: Thesis
City: Cambridge
Pages: 176
Size: 776.56 KB


Content

This book is a detailed investigation of the computational complexity of machine learning from examples in the distribution-free model introduced by L. G. Valiant. Our results include general tools ...

The Computational Complexity of Machine Learning

For their love and courage

2.1 Representing subsets of a domain
2.2 Distribution-free learning
2.3 An example of efficient learning
2.4 Other definitions and notation
2.5 Some representation classes

3.1 Efficient learning algorithms and hardness results
3.2 Characterizations of learnable classes
3.3 Results in related models

4.1 Introduction
4.2 Composing learning algorithms to obtain new algorithms

5 Learning in the Presence of Errors

5.1 Introduction
5.2 Definitions and notation for learning with errors
5.3 Absolute limits on learning with errors
5.4 Efficient error-tolerant learning
5.5 Limits on efficient learning with errors

6.1 Introduction
6.2 Lower bounds on the number of examples needed for positive-only and negative-only learning
6.3 A general lower bound on the number of examples needed for learning
6.3.1 Applications of the general lower bound
6.4 Expected sample complexity

7.1 Introduction
7.2 Background from cryptography
7.3 Hard learning problems based on cryptographic functions
7.3.1 A learning problem based on RSA
7.3.2 A learning problem based on quadratic residues
7.3.3 A learning problem based on factoring Blum integers
7.4 Learning small Boolean formulae, finite automata and threshold circuits is hard
7.5 A generalized construction based on any trapdoor function
7.6 Application: hardness results for approximation algorithms

8.1 Introduction
8.2 A polynomial-time weak learning algorithm for all monotone Boolean functions under uniform distributions
8.3 A polynomial-time learning algorithm for DNF under uniform distributions

9.1 Introduction
9.2 The equivalence


This book is a revision of my doctoral dissertation, which was completed in May 1989 at Harvard University. While the changes to the theorems and proofs are primarily clarifications of or corrections to my original thesis, I have added a significant amount of expository and explanatory material, in an effort to make the work at least partially accessible to an audience wider than the "mainstream" theoretical computer science community. Thus, there are more examples and more informal intuition behind the formal mathematical results. My hope is that those lacking the background for the formal proofs will nevertheless be able to read selectively, and gain some useful understanding of the goals, successes and shortcomings of computational learning theory.

Computational learning theory can be broadly and imprecisely defined as the mathematical study of efficient learning by machines or computational systems. The demand for efficiency is one of the primary characteristics distinguishing computational learning theory from the older but still active areas of inductive inference and statistical pattern recognition. Thus, computational learning theory encompasses a wide variety of interesting learning environments and formal models, too numerous to detail in any single volume. Our approach is to begin by briefly summarizing work in various learning models and then carefully scrutinizing a single model that is reasonably natural and realistic, and has enjoyed great popularity in its infancy.

This book is a detailed investigation of the computational complexity of machine learning from examples in the distribution-free model introduced by L. G. Valiant [93] (also known as the probably approximately correct model of learning). In the distribution-free model, a learning algorithm receives positive and negative examples of an unknown target set (or concept) that is chosen from some known class of sets (or concept class). These examples are generated randomly according to a fixed but unknown probability distribution representing Nature, and the goal of the learning algorithm is to infer a hypothesis concept that closely approximates the target concept with respect to the unknown distribution. This book is concerned with proving theorems about learning in this formal mathematical model.
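To make the model concrete, here is a minimal simulation sketch, not taken from the book: the target concept is a monotone conjunction over Boolean attributes, Nature is taken to be the uniform distribution (any fixed distribution would do), and the learner is the classic elimination algorithm; all names are illustrative.

```python
import random

def target(x):
    # The unknown target concept: the conjunction x1 AND x3.
    return bool(x[0] and x[2])

def draw_example(n):
    # Nature: a fixed but unknown distribution over {0,1}^n (uniform here).
    x = tuple(random.randint(0, 1) for _ in range(n))
    return x, target(x)

def learn_conjunction(sample, n):
    # Elimination algorithm: start with every variable in the hypothesis
    # conjunction and delete any variable that is 0 in a positive example.
    variables = set(range(n))
    for x, label in sample:
        if label:
            variables -= {i for i in variables if x[i] == 0}
    return variables

def estimated_error(hypothesis, n, trials=10_000):
    # Error of the hypothesis measured against the same distribution.
    mistakes = 0
    for _ in range(trials):
        x, label = draw_example(n)
        prediction = all(x[i] == 1 for i in hypothesis)
        mistakes += (prediction != label)
    return mistakes / trials

n = 10
sample = [draw_example(n) for _ in range(200)]
h = learn_conjunction(sample, n)
print("hypothesis:", sorted(h), "estimated error:", estimated_error(h, n))
```

Because a variable is eliminated only when a positive example contradicts it, the hypothesis always contains the target's variables and so never errs on negative instances; the distribution-free guarantee then bounds the remaining one-sided error.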

As we have mentioned, we are primarily interested in the phenomenon of efficient learning in the distribution-free model, in the standard polynomial-time sense. Our results include general tools for determining the polynomial-time learnability of a concept class, an extensive study of efficient learning when errors are present in the examples, and lower bounds on the number of examples required for learning in our model. A centerpiece of the book is a series of results demonstrating the computational difficulty of learning a number of well-studied concept classes. These results are obtained by reducing some apparently hard number-theoretic problems from public-key cryptography to the learning problems. The hard-to-learn concept classes include the sets represented by Boolean formulae, deterministic finite automata and a simpli...

... finding with high probability hypotheses in H consistent with an input sample generated by a representation in C. Subsequent papers by Board and Pitt [26] and Schapire [90] consider the stronger and philosophically interesting converse of Theorem 3.1, in which one actually uses a polynomial-time learning algorithm not just for finding a consistent hypothesis, but for performing data compression on a sample. Recently, generalizations of Occam's Razor to models more complicated than concept learning have been given by Kearns and Schapire [63].
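As an illustration of the consistency theme (a hedged sketch with made-up names, not the book's Theorem 3.1): for intervals over a discrete domain such as the integers, the tightest interval containing the positive examples is a consistent hypothesis representable in a small number of bits, which is exactly the situation where Occam-style arguments apply.

```python
def tightest_interval(sample):
    # Occam-style consistent-hypothesis finder for integer intervals:
    # return the smallest interval [lo, hi] containing every positive
    # example. Two integers suffice to write it down, so the hypothesis
    # has a short bit-string representation.
    positives = [x for x, label in sample if label]
    if not positives:
        return None  # empty hypothesis: classify everything as negative
    lo, hi = min(positives), max(positives)
    # On a noise-free sample drawn from a target interval, the tightest
    # interval is automatically consistent with the negatives as well.
    assert all(not (lo <= x <= hi) for x, label in sample if not label)
    return lo, hi

print(tightest_interval([(2, False), (4, True), (7, True), (9, False)]))  # (4, 7)
```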

One drawback of Theorem 3.1 is that the hypothesis output by the learning algorithm must have a polynomial-size representation as a string of bits for the result to apply. Thus, it is most appropriate for discrete domains, where instances are specified as finite strings of bits, and does not apply well to representation classes over real-valued domains, where the specification of a single instance may not have any finite representation as a bit string. This led Blumer et al. to seek a general characterization of the sample complexity of learning any representation class. They show that the Vapnik-Chervonenkis dimension essentially characterizes this sample complexity: Ω(vcd(C)) examples are required for the distribution-free learning of C, and O(vcd(C)) are sufficient for learning (ignoring for the moment the dependence on the accuracy parameter ε and the confidence parameter δ), with any algorithm finding a consistent hypothesis in C being a learning algorithm. Thus, the classes that are learnable in any amount of time in the distribution-free model are exactly those classes with finite Vapnik-Chervonenkis dimension. These results will be discussed in greater detail in Chapter 6, where we improve the lower bound on sample complexity given by Blumer et al. [25]. Recently, many of the ideas contained in the work of Blumer et al. have been greatly generalized by Haussler [50], who applies uniform convergence techniques developed by many authors to determine sample size bounds for relatively unrestricted models of learning.
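For concreteness, the sufficiency direction is often quoted in a form like the following, in the style of Blumer et al.; the constants below are one common choice and should be read as illustrative rather than as the book's exact statement.

```python
from math import ceil, log2

def sufficient_sample_size(vcd: int, epsilon: float, delta: float) -> int:
    # One standard form of the Blumer et al. upper bound: with this many
    # random examples, any hypothesis in C consistent with the sample has
    # error at most epsilon with probability at least 1 - delta.
    # Constants vary across sources; illustrative only.
    return ceil(max(
        (4 / epsilon) * log2(2 / delta),
        (8 * vcd / epsilon) * log2(13 / epsilon),
    ))

# Example: intervals on the line have Vapnik-Chervonenkis dimension 2.
print(sufficient_sample_size(vcd=2, epsilon=0.1, delta=0.05))
```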

As we have mentioned, the results of Blumer et al. apply primarily to the sample complexity of learning. A step towards characterizing what is polynomially learnable was taken by Pitt and Warmuth [79]. They define a natural notion of polynomial-time reducibility between learning problems, analogous to the notion of reducibility in complexity theory and generalizing simple reductions given here in Section 4.3 and by Littlestone [73]. Pitt and Warmuth are able to give partial characterizations of the complexity of learning various representation classes by finding "learning-complete" problems for these representation classes. For example, they prove that if deterministic finite automata are polynomially learnable, then the class of all languages accepted by log-space Turing machines is polynomially learnable.
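Schematically, such a reduction maps instances of one learning problem into instances of another and pulls the learned hypothesis back. A minimal sketch of this general shape only, with illustrative names; it is not Pitt and Warmuth's formal definition:

```python
def reduce_learning(instance_map, learn_target):
    # To learn class C, map each instance x to g(x) in the domain of C',
    # train a learner for C' on the mapped sample, and pull the resulting
    # hypothesis back through g. If g is computable in polynomial time and
    # preserves concepts, polynomial learnability of C' transfers to C.
    def learn(sample):
        mapped = [(instance_map(x), label) for x, label in sample]
        h = learn_target(mapped)
        return lambda x: h(instance_map(x))
    return learn
```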

3.3 Results in related models

A number of restrictions and extensions of the basic model of Valiant have been considered. These modifications are usually proposed either in an attempt to make the model more realistic (e.g., adding noise to the sample data) or to make the learning task easier in cases where distribution-free learning appears difficult. One may also modify the model in order to more closely examine the resources required for learning, such as space complexity.
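The first modification mentioned, adding noise to the sample data, is typically modeled by corrupting the examples oracle itself. A minimal sketch, assuming the common random-classification-noise model in which each label is flipped independently at some rate eta < 1/2 (names illustrative):

```python
import random

def noisy_oracle(draw_example, eta):
    # Wrap a noise-free examples oracle so that each returned label is
    # flipped independently with probability eta (classification noise).
    def draw():
        x, label = draw_example()
        if random.random() < eta:
            label = not label
        return x, label
    return draw

def clean():
    # An illustrative noise-free oracle: uniform over {0,1}^2,
    # target concept x1 AND x2.
    x = tuple(random.randint(0, 1) for _ in range(2))
    return x, bool(x[0] and x[1])

noisy = noisy_oracle(clean, eta=0.1)
print([noisy() for _ in range(3)])
```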

For instance, for classes for which learning is known to be intractable ...

... Valiant, in the Proceedings of the 19th A.C.M. Symposium on the Theory of Computing [60]. The results of Chapter ... initially appeared in the paper "Learning in the presence of malicious errors", ...
