One approach to distributing public keys is the so-called Merkle channel (see Simmons [1144, p.387]). Merkle proposed that public keys be distributed over so many independent public channels (newspaper, radio, television, etc.) that it would be improbable for an adversary to compromise all of them.
In 1979, Kohnfelder [702] suggested the idea of using public-key certificates to facilitate the distribution of public keys over unsecured channels, such that their authenticity can be verified. Essentially the same idea, but based on on-line requests, was proposed by Needham and Schroeder (see Wilkes [1244]).
A provably secure key agreement protocol has been proposed whose security is based on the Heisenberg uncertainty principle of quantum physics. The security of so-called quantum cryptography does not rely upon any complexity-theoretic assumptions. For further details on quantum cryptography, consult Chapter 6 of Brassard [192], and Bennett, Brassard, and Ekert [115].
§1.12
For an introduction to and detailed treatment of many pseudorandom sequence generators, see Knuth [692]. Knuth cites an example of a complex scheme to generate random numbers which, on closer analysis, is shown to produce numbers which are far from random, and concludes: random numbers should not be generated with a method chosen at random.
Mathematical Background
Contents in Brief
2.1 Probability theory
2.2 Information theory
2.3 Complexity theory
2.4 Number theory
2.5 Abstract algebra
2.6 Finite fields
2.7 Notes and further references
This chapter is a collection of basic material on probability theory, information theory, complexity theory, number theory, abstract algebra, and finite fields that will be used throughout this book. Further background and proofs of the facts presented here can be found in the references given in §2.7. The following standard notation will be used throughout:

1. Z denotes the set of integers; that is, the set {…, −2, −1, 0, 1, 2, …}.
2. Q denotes the set of rational numbers; that is, the set {a/b | a, b ∈ Z, b ≠ 0}.
3. R denotes the set of real numbers.
4. π is the mathematical constant; π ≈ 3.14159.
5. e is the base of the natural logarithm; e ≈ 2.71828.
6. [a, b] denotes the integers x satisfying a ≤ x ≤ b.
7. ⌊x⌋ is the largest integer less than or equal to x. For example, ⌊5.2⌋ = 5 and ⌊−5.2⌋ = −6.
8. ⌈x⌉ is the smallest integer greater than or equal to x. For example, ⌈5.2⌉ = 6 and ⌈−5.2⌉ = −5.
9. If A is a finite set, then |A| denotes the number of elements in A, called the cardinality of A.
10. a ∈ A means that element a is a member of the set A.
11. A ⊆ B means that A is a subset of B.
12. A ⊂ B means that A is a proper subset of B; that is, A ⊆ B and A ≠ B.
13. The intersection of sets A and B is the set A ∩ B = {x | x ∈ A and x ∈ B}.
14. The union of sets A and B is the set A ∪ B = {x | x ∈ A or x ∈ B}.
15. The difference of sets A and B is the set A − B = {x | x ∈ A and x ∉ B}.
16. The Cartesian product of sets A and B is the set A × B = {(a, b) | a ∈ A and b ∈ B}. For example, {a_1, a_2} × {b_1, b_2, b_3} = {(a_1, b_1), (a_1, b_2), (a_1, b_3), (a_2, b_1), (a_2, b_2), (a_2, b_3)}.
17. A function or mapping f : A → B is a rule which assigns to each element a in A precisely one element b in B. If a ∈ A is mapped to b ∈ B, then b is called the image of a, a is called a preimage of b, and this is written f(a) = b. The set A is called the domain of f, and the set B is called the codomain of f.
18. A function f : A → B is 1−1 (one-to-one) or injective if each element in B is the image of at most one element in A. Hence f(a_1) = f(a_2) implies a_1 = a_2.
19. A function f : A → B is onto or surjective if each b ∈ B is the image of at least one a ∈ A.
20. A function f : A → B is a bijection if it is both one-to-one and onto. If f is a bijection between finite sets A and B, then |A| = |B|. If f is a bijection between a set A and itself, then f is called a permutation on A.
21. ln x is the natural logarithm of x; that is, the logarithm of x to the base e.
22. lg x is the logarithm of x to the base 2.
23. exp(x) is the exponential function e^x.
24. ∑_{i=1}^{n} a_i denotes the sum a_1 + a_2 + ··· + a_n.
25. ∏_{i=1}^{n} a_i denotes the product a_1 · a_2 ··· a_n.
26. For a positive integer n, the factorial function is n! = n(n−1)(n−2) ··· 1. By convention, 0! = 1.
2.1 Probability theory
2.1.1 Basic definitions
2.1 Definition An experiment is a procedure that yields one of a given set of outcomes. The individual possible outcomes are called simple events. The set of all possible outcomes is called the sample space.
This chapter only considers discrete sample spaces; that is, sample spaces with only finitely many possible outcomes. Let the simple events of a sample space S be labeled s_1, s_2, …, s_n.

2.2 Definition A probability distribution P on S is a sequence of numbers p_1, p_2, …, p_n that are all non-negative and sum to 1. The number p_i is interpreted as the probability of s_i being the outcome of the experiment.
2.3 Definition An event E is a subset of the sample space S. The probability that event E occurs, denoted P(E), is the sum of the probabilities p_i of all simple events s_i which belong to E. If s_i ∈ S, P({s_i}) is simply denoted by P(s_i).

2.4 Definition If E is an event, the complementary event is the set of simple events not belonging to E, denoted Ē.
2.5 Fact Let E ⊆ S be an event.
(i) 0 ≤ P(E) ≤ 1. Furthermore, P(S) = 1 and P(∅) = 0. (∅ is the empty set.)
(ii) P(Ē) = 1 − P(E).
(iii) If the outcomes in S are equally likely, then P(E) = |E|/|S|.
2.6 Definition Two events E_1 and E_2 are called mutually exclusive if P(E_1 ∩ E_2) = 0. That is, the occurrence of one of the two events excludes the possibility that the other occurs.
2.7 Fact Let E_1 and E_2 be two events.
(i) If E_1 ⊆ E_2, then P(E_1) ≤ P(E_2).
(ii) P(E_1 ∪ E_2) + P(E_1 ∩ E_2) = P(E_1) + P(E_2). Hence, if E_1 and E_2 are mutually exclusive, then P(E_1 ∪ E_2) = P(E_1) + P(E_2).
2.1.2 Conditional probability
2.8 Definition Let E_1 and E_2 be two events with P(E_2) > 0. The conditional probability of E_1 given E_2, denoted P(E_1|E_2), is
P(E_1|E_2) = P(E_1 ∩ E_2) / P(E_2).
P(E_1|E_2) measures the probability of event E_1 occurring, given that E_2 has occurred.
2.9 Definition Events E_1 and E_2 are said to be independent if P(E_1 ∩ E_2) = P(E_1)P(E_2).
Observe that if E_1 and E_2 are independent, then P(E_1|E_2) = P(E_1) and P(E_2|E_1) = P(E_2). That is, the occurrence of one event does not influence the likelihood of occurrence of the other.

2.10 Fact (Bayes' theorem) If E_1 and E_2 are events with P(E_2) > 0, then
P(E_1|E_2) = P(E_1) P(E_2|E_1) / P(E_2).

2.1.3 Random variables

Let S be a sample space with probability distribution P.
2.11 Definition A random variable X is a function from the sample space S to the set of real numbers; to each simple event s_i ∈ S, X assigns a real number X(s_i).
Since S is assumed to be finite, X can only take on a finite number of values.

2.12 Definition Let X be a random variable on S. The expected value or mean of X is E(X) = ∑_{i=1}^{n} X(s_i) p_i.
2.14 Definition The variance of a random variable X of mean µ is the non-negative number defined by Var(X) = E((X − µ)^2). The standard deviation of X is the non-negative square root of Var(X).
If a random variable has small variance, then large deviations from the mean are unlikely to be observed. This statement is made more precise below.

2.16 Fact (Chebyshev's inequality) Let X be a random variable with mean µ = E(X) and variance σ^2 = Var(X). Then for any t > 0,
P(|X − µ| ≥ t) ≤ σ^2 / t^2.
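As a quick numerical illustration of Chebyshev's inequality, the following Python sketch (ours, not from the text; the fair-die example and all names are illustrative) compares the empirical probability of a large deviation with the bound σ^2/t^2.

    import random

    def chebyshev_check(trials=100000, t=2.0):
        # X = outcome of one roll of a fair die: mu = 3.5, Var(X) = 35/12.
        mu, var = 3.5, 35.0 / 12.0
        outcomes = [random.randint(1, 6) for _ in range(trials)]
        # Empirical probability of a deviation of at least t from the mean.
        p_dev = sum(1 for x in outcomes if abs(x - mu) >= t) / trials
        bound = var / (t * t)  # Chebyshev bound sigma^2 / t^2 (Fact 2.16)
        print(f"P(|X - mu| >= {t}) ~= {p_dev:.3f} <= {bound:.3f}")

    chebyshev_check()  # observed ~0.333, bound ~0.729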
2.1.4 Binomial distribution

2.17 Definition Let n and k be non-negative integers with k ≤ n. The binomial coefficient \binom{n}{k} is the number of different ways of choosing k distinct objects from a set of n distinct objects, where the order of choice is not important.

2.18 Fact (properties of binomial coefficients) Let n and k be non-negative integers.
(i) \binom{n}{k} = n! / (k!(n−k)!).
(ii) \binom{n}{k} = \binom{n}{n−k}.
(iii) \binom{n+1}{k+1} = \binom{n}{k} + \binom{n}{k+1}.
2.19 Fact (binomial theorem) For any real numbers a, b, and non-negative integer n,
(a + b)^n = ∑_{k=0}^{n} \binom{n}{k} a^k b^{n−k}.
2.20 Definition A Bernoulli trial is an experiment with exactly two possible outcomes, called success and failure.

2.21 Fact Suppose that the probability of success on a particular Bernoulli trial is p. Then the probability of exactly k successes in a sequence of n such independent trials is
\binom{n}{k} p^k (1 − p)^{n−k}, for each 0 ≤ k ≤ n.    (2.1)
2.22 Definition The probability distribution (2.1) is called the binomial distribution.
2.23 Fact The expected number of successes in a sequence of n independent Bernoulli trials, with probability p of success in each trial, is np. The variance of the number of successes is np(1 − p).
2.24 Fact (law of large numbers) Let X be the random variable denoting the fraction of successes in n independent Bernoulli trials, with probability p of success in each trial. Then for any ε > 0,
P(|X − p| > ε) → 0, as n → ∞.
In other words, as n gets larger, the proportion of successes should be close to p, the probability of success in each trial.
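The binomial probabilities of Fact 2.21 and the law of large numbers are easy to experiment with; the sketch below (ours, not from the text) evaluates distribution (2.1) exactly and simulates the fraction of successes for growing n.

    import math, random

    def binomial_pmf(n, k, p):
        # Probability of exactly k successes in n Bernoulli(p) trials (2.1).
        return math.comb(n, k) * p**k * (1 - p)**(n - k)

    p = 0.3
    print(sum(binomial_pmf(10, k, p) for k in range(11)))  # ~1.0: the probabilities sum to 1
    for n in (100, 10000, 1000000):
        successes = sum(random.random() < p for _ in range(n))
        print(n, successes / n)  # fraction of successes tends to p = 0.3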
2.1.5 Birthday problems

2.25 Definition
(i) For positive integers m, n with m ≥ n, the number m^{(n)} is defined to be m(m−1)(m−2) ··· (m−n+1).
(ii) The Stirling number of the second kind, denoted S(m, n), is
S(m, n) = (1/n!) ∑_{k=0}^{n} (−1)^{n−k} \binom{n}{k} k^m,
with the exception that S(0, 0) = 1. It counts the number of ways of partitioning a set of m objects into n non-empty subsets.
2.26 Fact (classical occupancy problem) An urn has m balls numbered 1 to m. Suppose that n balls are drawn from the urn one at a time, with replacement, and their numbers are listed. The probability that exactly t different balls have been drawn is
P_1(m, n, t) = S(n, t) · m^{(t)} / m^n, for 1 ≤ t ≤ n.
The birthday problem is a special case of the classical occupancy problem.
2.27 Fact (birthday problem) An urn has m balls numbered 1 to m. Suppose that n balls are drawn from the urn one at a time, with replacement, and their numbers are listed.
(i) The probability of at least one coincidence (i.e., a ball drawn at least twice) is
P_2(m, n) = 1 − P_1(m, n, n) = 1 − m^{(n)} / m^n.    (2.2)
If n = O(√m) and m → ∞, then
P_2(m, n) → 1 − exp(−(n^2/2m)(1 + O(1/√m))) ≈ 1 − exp(−n^2/(2m)).
(ii) As m → ∞, the expected number of draws before a coincidence is √(πm/2).
The following explains why probability distribution (2.2) is referred to as the birthday surprise or birthday paradox. The probability that at least 2 people in a room of 23 people have the same birthday is P_2(365, 23) ≈ 0.507, which is surprisingly large. The quantity P_2(365, n) also increases rapidly as n increases; for example, P_2(365, 30) ≈ 0.706.
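The probabilities quoted above are easy to reproduce. The following sketch (ours, not from the text) evaluates P_2(m, n) = 1 − m^{(n)}/m^n exactly and compares it with the approximation 1 − exp(−n^2/(2m)) from Fact 2.27.

    import math

    def p2(m, n):
        # Exact probability of at least one coincidence: 1 - m^(n)/m^n,
        # where m^(n) = m(m-1)...(m-n+1) (Definition 2.25).
        no_coincidence = 1.0
        for i in range(n):
            no_coincidence *= (m - i) / m
        return 1.0 - no_coincidence

    def p2_approx(m, n):
        return 1.0 - math.exp(-n * n / (2.0 * m))

    print(p2(365, 23), p2_approx(365, 23))  # ~0.507 vs ~0.516
    print(p2(365, 30), p2_approx(365, 30))  # ~0.706 vs ~0.709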
A different kind of problem is considered in Facts 2.28, 2.29, and 2.30 below. Suppose that there are two urns, one containing m white balls numbered 1 to m, and the other containing m red balls numbered 1 to m. First, n_1 balls are selected from the first urn and their numbers listed. Then n_2 balls are selected from the second urn and their numbers listed. Finally, the number of coincidences between the two lists is counted.
2.28 Fact (model A) If the balls from both urns are drawn one at a time, with replacement, then the probability of at least one coincidence is
P_3(m, n_1, n_2) = 1 − (1/m^{n_1+n_2}) ∑ S(n_1, t_1) S(n_2, t_2) m^{(t_1)} (m − t_1)^{(t_2)},
where the summation is over all 0 ≤ t_1 ≤ n_1, 0 ≤ t_2 ≤ n_2. If n = n_1 = n_2, n = O(√m), and m → ∞, then
P_3(m, n_1, n_2) → 1 − exp(−(n^2/m)(1 + O(1/√m))) ≈ 1 − exp(−n^2/m).
2.29 Fact (model B) If the balls from both urns are drawn without replacement, then the probability of at least one coincidence is
P_4(m, n_1, n_2) = 1 − m^{(n_1+n_2)} / (m^{(n_1)} m^{(n_2)}).
If n_1 = O(√m), n_2 = O(√m), and m → ∞, then
P_4(m, n_1, n_2) → 1 − exp(−(n_1 n_2 / m)(1 + (n_1 + n_2 − 1)/(2m))(1 + O(1/√m))) ≈ 1 − exp(−n_1 n_2 / m).
2.1.6 Random mappings
2.31 Definition Let F_n denote the collection of all functions (mappings) from a finite domain of size n to a finite codomain of size n.

Models where random elements of F_n are considered are called random mappings models. In this section the only random mappings model considered is where every function from F_n is equally likely to be chosen; such models arise frequently in cryptography and algorithmic number theory. Note that |F_n| = n^n, whence the probability that a particular function from F_n is chosen is 1/n^n.
2.32 Definition Let f be a function in F_n with domain and codomain equal to {1, 2, …, n}. The functional graph of f is a directed graph whose points (or vertices) are the elements {1, 2, …, n} and whose edges are the ordered pairs (x, f(x)) for all x ∈ {1, 2, …, n}.

2.33 Example (functional graph) Consider the function f : {1, 2, …, 13} → {1, 2, …, 13} defined by f(1) = 4, f(2) = 11, f(3) = 1, f(4) = 6, f(5) = 3, f(6) = 9, f(7) = 3, f(8) = 11, f(9) = 1, f(10) = 2, f(11) = 10, f(12) = 4, f(13) = 7. The functional graph of f is shown in Figure 2.1.

As Figure 2.1 illustrates, a functional graph may have several components (maximal connected subgraphs), each component consisting of a directed cycle and some directed trees attached to the cycle.
2.34 Fact As n tends to infinity, the following statements regarding the functional digraph of a random function f from F_n are true:
(i) The expected number of components is (1/2) ln n.
(ii) The expected number of points which are on the cycles is √(πn/2).

Figure 2.1: A functional graph (see Example 2.33).

2.35 Definition Let u be a point in the functional graph of f.
(i) The tail length λ(u) of u is the number of edges in the path from u to the cycle of the component that contains u.
(ii) The cycle length µ(u) of u is the number of edges in that cycle.
(iii) The rho-length of u is ρ(u) = λ(u) + µ(u).
(iv) The tree size of u is the number of edges in the maximal tree rooted on a cycle in the component that contains u.
(v) The component size of u is the number of edges in the component that contains u.
(vi) The predecessors size of u is the number of iterated preimages of u.

2.36 Example The functional graph in Figure 2.1 has 2 components and 4 terminal points (points with no preimages). The point u = 3 has parameters λ(u) = 1, µ(u) = 4, ρ(u) = 5. The tree, component, and predecessors sizes of u = 3 are 4, 9, and 3, respectively.
2.37 Fact As n tends to infinity, the following are the expectations of some parameters associated with a random point in {1, 2, …, n} and a random function from F_n:
(i) tail length: √(πn/8);
(ii) cycle length: √(πn/8);
(iii) rho-length: √(πn/2);
(iv) tree size: n/3;
(v) component size: 2n/3;
(vi) predecessors size: √(πn/8).

2.38 Fact As n tends to infinity, the expectations of the maximum tail, cycle, and rho lengths in a random function from F_n are c_1√n, c_2√n, and c_3√n, respectively, for certain positive constants c_1, c_2, c_3.
Facts 2.37 and 2.38 indicate that when a random function is iterated from a random starting point, a cycle of length about √n arises after following a path of about √n edges.
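These asymptotics are simple to observe experimentally. The sketch below (our code, not from the text) samples a random function from F_n, iterates it from random starting points, and compares the average rho-length with √(πn/2) from Fact 2.37(iii).

    import math, random

    def rho_length(f, u):
        # Iterate u, f(u), f(f(u)), ... until a value repeats; the number of
        # distinct points visited is lambda(u) + mu(u), the rho-length of u.
        seen = set()
        x = u
        while x not in seen:
            seen.add(x)
            x = f[x]
        return len(seen)

    n = 10**5
    f = [random.randrange(n) for _ in range(n)]  # one random element of F_n
    avg = sum(rho_length(f, random.randrange(n)) for _ in range(50)) / 50
    print(avg, math.sqrt(math.pi * n / 2))  # both on the order of sqrt(pi*n/2)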
2.2 Information theory

2.2.1 Entropy

Let X be a random variable which takes on a finite set of values x_1, x_2, …, x_n, with probability P(X = x_i) = p_i, where 0 ≤ p_i ≤ 1 for each i, 1 ≤ i ≤ n, and where ∑_{i=1}^{n} p_i = 1. Also, let Y and Z be random variables which take on finite sets of values.
The entropy of X is a mathematical measure of the amount of information provided by an observation of X. Equivalently, it is the uncertainty about the outcome before an observation of X. Entropy is also useful for approximating the average number of bits required to encode the elements of X.
2.39 Definition The entropy or uncertainty of X is defined to be
H(X) = −∑_{i=1}^{n} p_i lg p_i = ∑_{i=1}^{n} p_i lg(1/p_i),
where, by convention, p_i · lg p_i = p_i · lg(1/p_i) = 0 if p_i = 0.

2.40 Fact (properties of entropy) Let X be a random variable which takes on n values.
(i) 0 ≤ H(X) ≤ lg n.
(ii) H(X) = 0 if and only if p_i = 1 for some i, and p_j = 0 for all j ≠ i (that is, there is no uncertainty about the outcome).
(iii) H(X) = lg n if and only if p_i = 1/n for each i, 1 ≤ i ≤ n (that is, all outcomes are equally likely).
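Definition 2.39 and Fact 2.40 translate directly into code; the following minimal Python sketch (ours, not from the text) applies the 0 · lg 0 = 0 convention.

    import math

    def entropy(probs):
        # H(X) = -sum p_i lg p_i, with the convention 0 * lg 0 = 0.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))    # 1.0 bit
    print(entropy([1.0, 0.0]))    # 0.0: no uncertainty (Fact 2.40(ii))
    print(entropy([0.25] * 4))    # lg 4 = 2.0: equally likely (Fact 2.40(iii))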
2.41 Definition The joint entropy of X and Y is defined to be
H(X, Y) = −∑_{x,y} P(X = x, Y = y) lg(P(X = x, Y = y)),
where the summation indices x and y range over all values of X and Y, respectively.

2.42 Fact If X and Y are random variables, then H(X, Y) ≤ H(X) + H(Y), with equality if and only if X and Y are independent.

2.43 Definition If X and Y are random variables, the conditional entropy of X given Y = y is
H(X|Y = y) = −∑_x P(X = x|Y = y) lg(P(X = x|Y = y)),
where the summation index x ranges over all values of X. The conditional entropy of X given Y, also called the equivocation of Y about X, is
H(X|Y) = ∑_y P(Y = y) H(X|Y = y),
where the summation index y ranges over all values of Y.
2.44 Fact (properties of conditional entropy) Let X and Y be random variables.
(i) The quantity H(X|Y) measures the amount of uncertainty remaining about X after Y has been observed.
(ii) H(X|Y) ≥ 0 and H(X|X) = 0.
(iii) H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y).
(iv) H(X|Y) ≤ H(X), with equality if and only if X and Y are independent.
2.2.2 Mutual information
2.45 Definition The mutual information or transinformation of random variables X and Y is I(X; Y) = H(X) − H(X|Y). Similarly, the transinformation of X and the pair Y, Z is defined to be I(X; Y, Z) = H(X) − H(X|Y, Z).

2.46 Fact (properties of mutual transinformation)
(i) The quantity I(X; Y) can be thought of as the amount of information that Y reveals about X. Similarly, the quantity I(X; Y, Z) can be thought of as the amount of information that Y and Z together reveal about X.
(ii) I(X; Y) ≥ 0.
(iii) I(X; Y) = 0 if and only if X and Y are independent (that is, Y contributes no information about X).
(iv) I(X; Y) = I(Y; X).
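Combining Definition 2.45 with the chain rule of Fact 2.44(iii) gives I(X; Y) = H(X) + H(Y) − H(X, Y), which is convenient for computation. The sketch below (ours; the joint table is made up for illustration) evaluates it for a small joint distribution.

    import math

    def H(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Joint distribution P(X = x, Y = y): rows indexed by x, columns by y.
    joint = [[0.25, 0.25],
             [0.40, 0.10]]
    px = [sum(row) for row in joint]            # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]      # marginal distribution of Y
    hxy = H([p for row in joint for p in row])  # joint entropy H(X, Y)
    print(H(px) + H(py) - hxy)  # I(X; Y) > 0, so X and Y are dependent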
2.47 Definition The conditional transinformation of the pair X, Y given Z is defined to be
I_Z(X; Y) = H(X|Z) − H(X|Y, Z).

2.48 Fact (properties of conditional transinformation)
(i) The quantity I_Z(X; Y) can be interpreted as the amount of information that Y provides about X, given that Z has already been observed.
(ii) I(X; Y, Z) = I(X; Y) + I_Y(X; Z).
2.3 Complexity theory

2.3.1 Basic definitions

2.49 Definition An algorithm is a well-defined computational procedure that takes a variable input and halts with an output.

Of course, the term "well-defined computational procedure" is not mathematically precise. It can be made so by using formal computational models such as Turing machines, random-access machines, or boolean circuits. Rather than get involved with the technical intricacies of these models, it is simpler to think of an algorithm as a computer program written in some specific programming language for a specific computer that takes a variable input and halts with an output.
It is usually of interest to find the most efficient (i.e., fastest) algorithm for solving a given computational problem. The time that an algorithm takes to halt depends on the "size" of the problem instance. Also, the unit of time used should be made precise, especially when comparing the performance of two algorithms.
2.50 Definition The size of the input is the total number of bits needed to represent the input in ordinary binary notation using an appropriate encoding scheme. Occasionally, the size of the input will be the number of items in the input.
2.51 Example (sizes of some objects)
(i) The number of bits in the binary representation of a positive integer n is 1 + ⌊lg n⌋. For simplicity, the size of n will be approximated by lg n.
(ii) If f is a polynomial of degree at most k, each coefficient being a non-negative integer at most n, then the size of f is (k + 1) lg n bits.
(iii) If A is a matrix with r rows, s columns, and with non-negative integer entries each at most n, then the size of A is rs lg n bits.
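For integers, the size in Example 2.51(i) is exactly what Python's int.bit_length() returns; a tiny check (ours, not from the text):

    import math

    for n in (1, 5, 255, 256):
        # Both expressions give the bit size 1 + floor(lg n).
        print(n, n.bit_length(), 1 + math.floor(math.log2(n)))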
2.52 Definition The running time of an algorithm on a particular input is the number of primitive operations or "steps" executed.

Often a step is taken to mean a bit operation. For some algorithms it will be more convenient to take step to mean something else such as a comparison, a machine instruction, a machine clock cycle, a modular multiplication, etc.
con-2.53 Definition The worst-case running time of an algorithm is an upper bound on the running
time for any input, expressed as a function of the input size
2.54 Definition The average-case running time of an algorithm is the average running time
over all inputs of a fixed size, expressed as a function of the input size
2.3.2 Asymptotic notation
It is often difficult to derive the exact running time of an algorithm. In such situations one is forced to settle for approximations of the running time, and usually may only derive the asymptotic running time. That is, one studies how the running time of the algorithm increases as the size of the input increases without bound.
In what follows, the only functions considered are those which are defined on the positive integers and take on real values that are always positive from some point onwards. Let f and g be two such functions.
2.55 Definition (order notation)
(i) (asymptotic upper bound) f(n) = O(g(n)) if there exists a positive constant c and a positive integer n_0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n_0.
(ii) (asymptotic lower bound) f(n) = Ω(g(n)) if there exists a positive constant c and a positive integer n_0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n_0.
(iii) (asymptotic tight bound) f(n) = Θ(g(n)) if there exist positive constants c_1 and c_2, and a positive integer n_0, such that c_1 g(n) ≤ f(n) ≤ c_2 g(n) for all n ≥ n_0.
(iv) (o-notation) f(n) = o(g(n)) if for any positive constant c > 0 there exists a constant n_0 > 0 such that 0 ≤ f(n) < cg(n) for all n ≥ n_0.
Intuitively, f(n) = O(g(n)) means that f grows no faster asymptotically than g(n) to within a constant multiple, while f(n) = Ω(g(n)) means that f(n) grows at least as fast asymptotically as g(n) to within a constant multiple. f(n) = o(g(n)) means that g(n) is an upper bound for f(n) that is not asymptotically tight, or in other words, the function f(n) becomes insignificant relative to g(n) as n gets larger. The expression o(1) is often used to signify a function f(n) whose limit as n approaches ∞ is 0.
2.56 Fact (properties of order notation) For any functions f(n), g(n), h(n), and l(n), the following are true.
(i) f(n) = O(g(n)) if and only if g(n) = Ω(f(n)).
(ii) f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)).
(iii) If f(n) = O(h(n)) and g(n) = O(h(n)), then (f + g)(n) = O(h(n)).
(iv) If f(n) = O(h(n)) and g(n) = O(l(n)), then (f · g)(n) = O(h(n)l(n)).
(v) (reflexivity) f(n) = O(f(n)).
(vi) (transitivity) If f(n) = O(g(n)) and g(n) = O(h(n)), then f(n) = O(h(n)).
2.57 Fact (approximations of some commonly occurring functions)
(i) (polynomial function) If f(n) is a polynomial of degree k with positive leading term, then f(n) = Θ(n^k).
(ii) For any constant c > 0, log_c n = Θ(lg n).
(iii) (Stirling's formula) For all integers n ≥ 1,
√(2πn) (n/e)^n ≤ n! ≤ √(2πn) (n/e)^n e^{1/(12n)}.
Thus n! = √(2πn) (n/e)^n (1 + Θ(1/n)). Also, n! = o(n^n) and n! = Ω(2^n).
(iv) lg(n!) = Θ(n lg n).
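The bounds in Fact 2.57(iii) can be verified numerically; the sketch below (ours, not from the text) checks them for small n and shows the ratio n!/(√(2πn)(n/e)^n) approaching 1.

    import math

    def stirling_bounds(n):
        main = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
        return main, main * math.exp(1.0 / (12 * n))  # lower and upper bounds

    for n in (1, 5, 10, 20):
        lo, hi = stirling_bounds(n)
        exact = math.factorial(n)
        print(n, lo <= exact <= hi, exact / lo)  # True, ratio tends to 1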
2.58 Example (comparative growth rates of some functions) Let ε and c be arbitrary constants with 0 < ε < 1 < c. The following functions are listed in increasing order of their asymptotic growth rates:
1 < ln ln n < ln n < exp(√(ln n · ln ln n)) < n^ε < n^c < n^{ln n} < c^n < n^n < c^{c^n}.
2.3.3 Complexity classes
2.59 Definition A polynomial-time algorithm is an algorithm whose worst-case running time function is of the form O(n^k), where n is the input size and k is a constant. Any algorithm whose running time cannot be so bounded is called an exponential-time algorithm.

Roughly speaking, polynomial-time algorithms can be equated with good or efficient algorithms, while exponential-time algorithms are considered inefficient. There are, however, some practical situations when this distinction is not appropriate. When considering polynomial-time complexity, the degree of the polynomial is significant. For example, even though an algorithm with a running time of O(n^{ln ln n}), n being the input size, is asymptotically slower than an algorithm with a running time of O(n^{100}), the former algorithm may be faster in practice for smaller values of n, especially if the constants hidden by the big-O notation are smaller. Furthermore, in cryptography, average-case complexity is more important than worst-case complexity: a necessary condition for an encryption scheme to be considered secure is that the corresponding cryptanalysis problem is difficult on average (or more precisely, almost always difficult), and not just for some isolated cases.
2.60 Definition A subexponential-time algorithm is an algorithm whose worst-case running time function is of the form e^{o(n)}, where n is the input size.

A subexponential-time algorithm is asymptotically faster than an algorithm whose running time is fully exponential in the input size, while it is asymptotically slower than a polynomial-time algorithm.

2.61 Example (subexponential running time) Let A be an algorithm whose inputs are either elements of a finite field F_q (see §2.6), or an integer q. If the expected running time of A is of the form
L_q[α, c] = O(exp((c + o(1))(ln q)^α (ln ln q)^{1−α})),
where c is a positive constant and α is a constant satisfying 0 < α < 1, then A is a subexponential-time algorithm.
For simplicity, the theory of computational complexity restricts its attention to decision problems, i.e., problems which have either YES or NO as an answer. This is not too restrictive in practice, as all the computational problems that will be encountered here can be phrased as decision problems in such a way that an efficient algorithm for the decision problem yields an efficient algorithm for the computational problem, and vice versa.
2.62 Definition The complexity class P is the set of all decision problems that are solvable in polynomial time.

2.63 Definition The complexity class NP is the set of all decision problems for which a YES answer can be verified in polynomial time given some extra information, called a certificate.

2.64 Definition The complexity class co-NP is the set of all decision problems for which a NO answer can be verified in polynomial time using an appropriate certificate.

It must be emphasized that if a decision problem is in NP, it may not be the case that the certificate of a YES answer can be easily obtained; what is asserted is that such a certificate does exist, and, if known, can be used to efficiently verify the YES answer. The same is true of the NO answers for problems in co-NP.
2.65 Example (problem in NP) Consider the following decision problem:

COMPOSITES
INSTANCE: A positive integer n.
QUESTION: Is n composite? That is, are there integers a, b > 1 such that n = ab?

COMPOSITES belongs to NP because if an integer n is composite, then this fact can be verified in polynomial time if one is given a divisor a of n, where 1 < a < n (the certificate in this case consists of the divisor a). It is in fact also the case that COMPOSITES belongs to co-NP. It is still unknown whether or not COMPOSITES belongs to P.
2.66 Fact P ⊆ NP and P ⊆ co-NP.

The following are among the outstanding unresolved questions in the subject of complexity theory:
1. Is P = NP?
2. Is NP = co-NP?
3. Is P = NP ∩ co-NP?

2.67 Definition Let L_1 and L_2 be two decision problems. L_1 is said to polytime reduce to L_2, written L_1 ≤_P L_2, if there is an algorithm that solves L_1 which uses, as a subroutine, an algorithm for solving L_2, and which runs in polynomial time if the algorithm for L_2 does.
Informally, if L_1 ≤_P L_2, then L_2 is at least as difficult as L_1; equivalently, L_1 is no harder than L_2.
2.68 Definition Let L_1 and L_2 be two decision problems. If L_1 ≤_P L_2 and L_2 ≤_P L_1, then L_1 and L_2 are said to be computationally equivalent.
2.69 Fact Let L_1, L_2, and L_3 be three decision problems.
(i) (transitivity) If L_1 ≤_P L_2 and L_2 ≤_P L_3, then L_1 ≤_P L_3.
(ii) If L_1 ≤_P L_2 and L_2 ∈ P, then L_1 ∈ P.
2.70 Definition A decision problem L is said to be NP-complete if
(i) L ∈ NP, and
(ii) L_1 ≤_P L for every L_1 ∈ NP.
The class of all NP-complete problems is denoted by NPC.

NP-complete problems are the hardest problems in NP in the sense that they are at least as difficult as every other problem in NP. There are thousands of problems drawn from diverse fields such as combinatorics, number theory, and logic, that are known to be NP-complete.
2.71 Example (subset sum problem) The subset sum problem is the following: given a set of positive integers {a_1, a_2, …, a_n} and a positive integer s, determine whether or not there is a subset of the a_i that sums to s. The subset sum problem is NP-complete.
2.72 Fact Let L_1 and L_2 be two decision problems.
(i) If L_1 is NP-complete and L_1 ∈ P, then P = NP.
(ii) If L_1 ∈ NP, L_2 is NP-complete, and L_2 ≤_P L_1, then L_1 is also NP-complete.
(iii) If L_1 is NP-complete and L_1 ∈ co-NP, then NP = co-NP.
By Fact 2.72(i), if a polynomial-time algorithm is found for any single NP-complete problem, then it is the case that P = NP, a result that would be extremely surprising. Hence, a proof that a problem is NP-complete provides strong evidence for its intractability. Figure 2.2 illustrates what is widely believed to be the relationship between the complexity classes P, NP, co-NP, and NPC.

Figure 2.2: Conjectured relationship between the complexity classes P, NP, co-NP, and NPC.

Fact 2.72(ii) suggests the following procedure for proving that a decision problem L_1 is NP-complete:
1. Prove that L_1 ∈ NP.
2. Select a problem L_2 that is known to be NP-complete.
3. Prove that L_2 ≤_P L_1.
2.3.4 Randomized algorithms
The algorithms studied so far in this section have been deterministic; such algorithms follow the same execution path (sequence of operations) each time they execute with the same input. By contrast, a randomized algorithm makes random decisions at certain points in the execution; hence its execution path may differ each time it is invoked with the same input. The random decisions are based upon the outcome of a random number generator. Remarkably, there are many problems for which randomized algorithms are known that are more efficient, both in terms of time and space, than the best known deterministic algorithms.
Randomized algorithms for decision problems can be classified according to the probability that they return the correct answer.

2.75 Definition Let A be a randomized algorithm for a decision problem L, and let I denote an arbitrary instance of L.
(i) A has 0-sided error if P(A outputs YES | I's answer is YES) = 1, and P(A outputs YES | I's answer is NO) = 0.
(ii) A has 1-sided error if P(A outputs YES | I's answer is YES) ≥ 1/2, and P(A outputs YES | I's answer is NO) = 0.
(iii) A has 2-sided error if P(A outputs YES | I's answer is YES) ≥ 2/3, and P(A outputs YES | I's answer is NO) ≤ 1/3.

The number 1/2 in the definition of 1-sided error is somewhat arbitrary and can be replaced by any positive constant. Similarly, the numbers 2/3 and 1/3 in the definition of 2-sided error can be replaced by 1/2 + ε and 1/2 − ε, respectively, for any constant ε, 0 < ε < 1/2.
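The effect of repeating such an algorithm is easy to simulate. The sketch below (ours, not from the text, with a toy stand-in for a 1-sided-error algorithm) answers YES if any of k independent runs answers YES; on YES instances the error probability drops to at most (1/2)^k, while NO instances are never misclassified.

    import random

    def toy_one_sided(answer_is_yes):
        # Stand-in algorithm: on a YES instance it outputs YES with
        # probability 1/2; on a NO instance it never outputs YES.
        return answer_is_yes and random.random() < 0.5

    def amplified(answer_is_yes, k=20):
        return any(toy_one_sided(answer_is_yes) for _ in range(k))

    trials = 10000
    errors = sum(not amplified(True) for _ in range(trials))
    print(errors / trials)  # ~ (1/2)^20, i.e. essentially 0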
2.76 Definition The expected running time of a randomized algorithm is an upper bound on the expected running time for each input (the expectation being over all outputs of the random number generator used by the algorithm), expressed as a function of the input size.

The important randomized complexity classes are defined next.

2.77 Definition (randomized complexity classes)
(i) The complexity class ZPP ("zero-sided probabilistic polynomial time") is the set of all decision problems for which there is a randomized algorithm with 0-sided error which runs in expected polynomial time.
(ii) The complexity class RP ("randomized polynomial time") is the set of all decision problems for which there is a randomized algorithm with 1-sided error which runs in (worst-case) polynomial time.
(iii) The complexity class BPP ("bounded error probabilistic polynomial time") is the set of all decision problems for which there is a randomized algorithm with 2-sided error which runs in (worst-case) polynomial time.
2.78 Fact P ⊆ ZPP ⊆ RP ⊆ BPP and RP ⊆ NP.
2.4 Number theory
2.4.1 The integers
The set of integers {…, −3, −2, −1, 0, 1, 2, 3, …} is denoted by the symbol Z.

2.79 Definition Let a, b be integers. Then a divides b (equivalently: a is a divisor of b, or a is a factor of b) if there exists an integer c such that b = ac. If a divides b, then this is denoted by a|b.
2.80 Example (i) −3|18, since 18 = (−3)(−6). (ii) 173|0, since 0 = (173)(0).
The following are some elementary properties of divisibility.

2.81 Fact (properties of divisibility) For all a, b, c ∈ Z, the following are true:
(i) a|a.
(ii) If a|b and b|c, then a|c.
(iii) If a|b and a|c, then a|(bx + cy) for all x, y ∈ Z.
(iv) If a|b and b|a, then a = ±b.
2.82 Definition (division algorithm for integers) If a and b are integers with b ≥ 1, then ordinary long division of a by b yields integers q (the quotient) and r (the remainder) such that
a = qb + r, where 0 ≤ r < b.
Moreover, q and r are unique. The remainder of the division is denoted a mod b, and the quotient is denoted a div b.

2.83 Fact Let a, b ∈ Z with b ≠ 0. Then a div b = ⌊a/b⌋ and a mod b = a − b⌊a/b⌋.

2.84 Example If a = 73 and b = 17, then q = 4 and r = 5. Hence 73 mod 17 = 5 and 73 div 17 = 4.
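Python's built-in divmod follows Definition 2.82 for b ≥ 1 (floored quotient, remainder in [0, b)); a one-line check of Fact 2.83 (ours, not from the text):

    a, b = 73, 17
    q, r = divmod(a, b)
    print(q, r)                                # 4 5
    print(q == a // b, r == a - b * (a // b))  # True True (Fact 2.83)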
2.85 Definition An integer c is a common divisor of a and b if c|a and c|b.

2.86 Definition A non-negative integer d is the greatest common divisor of integers a and b, denoted d = gcd(a, b), if
(i) d is a common divisor of a and b; and
(ii) whenever c|a and c|b, then c|d.
Equivalently, gcd(a, b) is the largest positive integer that divides both a and b, with the exception that gcd(0, 0) = 0.

2.87 Example The common divisors of 12 and 18 are {±1, ±2, ±3, ±6}, and gcd(12, 18) = 6.
2.88 Definition A non-negative integer d is the least common multiple of integers a and b, denoted d = lcm(a, b), if
(i) a|d and b|d; and
(ii) whenever a|c and b|c, then d|c.
Equivalently, lcm(a, b) is the smallest non-negative integer divisible by both a and b.

2.89 Fact If a and b are positive integers, then lcm(a, b) = a · b / gcd(a, b).

2.90 Example Since gcd(12, 18) = 6, it follows that lcm(12, 18) = 12 · 18/6 = 36.
2.92 Definition An integer p≥ 2 is said to be prime if its only positive divisors are 1 and p Otherwise, p is called composite.
The following are some well known facts about prime numbers
2.93 Fact If p is prime and p|ab, then either p|a or p|b (or both)
2.94 Fact There are an infinite number of prime numbers
2.95 Fact (prime number theorem) Let π(x) denote the number of prime numbers≤ x Then
limx→∞
π(x)x/ ln x = 1.
This means that for large values of x, π(x) is closely approximated by the expression x/ln x. For instance, when x = 10^10, π(x) = 455,052,511, whereas ⌊x/ln x⌋ = 434,294,481. A more explicit estimate for π(x) is given below.

2.96 Fact Let π(x) denote the number of primes ≤ x. Then
π(x) > x / ln x for x ≥ 17,
and
π(x) < 1.25506 (x / ln x) for x > 1.

2.97 Fact (fundamental theorem of arithmetic) Every integer n ≥ 2 has a factorization n = p_1^{e_1} p_2^{e_2} ··· p_k^{e_k} as a product of powers of distinct primes p_1, p_2, …, p_k with positive integer exponents e_i; the factorization is unique up to the order of the factors.

2.98 Fact If a = p_1^{e_1} p_2^{e_2} ··· p_k^{e_k} and b = p_1^{f_1} p_2^{f_2} ··· p_k^{f_k}, where each e_i ≥ 0 and f_i ≥ 0, then
gcd(a, b) = p_1^{min(e_1,f_1)} p_2^{min(e_2,f_2)} ··· p_k^{min(e_k,f_k)}
and
lcm(a, b) = p_1^{max(e_1,f_1)} p_2^{max(e_2,f_2)} ··· p_k^{max(e_k,f_k)}.

2.99 Example Let a = 4864 = 2^8 · 19 and b = 3458 = 2 · 7 · 13 · 19. Then gcd(4864, 3458) = 2 · 19 = 38 and lcm(4864, 3458) = 2^8 · 7 · 13 · 19 = 442624.
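Fact 2.98 is mechanical to apply once the factorizations are known. The sketch below (ours, not from the text) represents a factorization as a {prime: exponent} dictionary and recomputes Example 2.99.

    def from_factors(factors):
        # Multiply out a {prime: exponent} factorization.
        result = 1
        for p, e in factors.items():
            result *= p ** e
        return result

    def gcd_lcm(fa, fb):
        # gcd takes min exponents, lcm takes max exponents (Fact 2.98).
        primes = set(fa) | set(fb)
        g = {p: min(fa.get(p, 0), fb.get(p, 0)) for p in primes}
        l = {p: max(fa.get(p, 0), fb.get(p, 0)) for p in primes}
        return from_factors(g), from_factors(l)

    a = {2: 8, 19: 1}               # 4864 = 2^8 * 19
    b = {2: 1, 7: 1, 13: 1, 19: 1}  # 3458 = 2 * 7 * 13 * 19
    print(gcd_lcm(a, b))            # (38, 442624), as in Example 2.99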
2.100 Definition For n ≥ 1, let φ(n) denote the number of integers in the interval [1, n] which are relatively prime to n. The function φ is called the Euler phi function (or the Euler totient function).

2.101 Fact (properties of Euler phi function)
(i) If p is a prime, then φ(p) = p − 1.
(ii) The Euler phi function is multiplicative. That is, if gcd(m, n) = 1, then φ(mn) = φ(m) · φ(n).
(iii) If n = p_1^{e_1} p_2^{e_2} ··· p_k^{e_k} is the prime factorization of n, then
φ(n) = n (1 − 1/p_1)(1 − 1/p_2) ··· (1 − 1/p_k).

2.102 Fact For all integers n ≥ 5,
φ(n) > n / (6 ln ln n).
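Fact 2.101(iii) yields a simple algorithm for φ(n) once n is factored; the sketch below (ours, not from the text) factors by trial division, which is fine for small n but, as §2.4.2 notes in the context of gcd, not an efficient approach in general.

    def phi(n):
        # Compute Euler's phi via phi(n) = n * prod(1 - 1/p) over primes p | n.
        result, m, p = n, n, 2
        while p * p <= m:
            if m % p == 0:
                result -= result // p   # multiply result by (1 - 1/p)
                while m % p == 0:
                    m //= p
            p += 1
        if m > 1:                       # remaining factor is prime
            result -= result // m
        return result

    print(phi(7), phi(12), phi(4864))   # 6 4 2304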
2.4.2 Algorithms in Z

Let a and b be non-negative integers, each less than or equal to n. Recall (Example 2.51) that the number of bits in the binary representation of n is ⌊lg n⌋ + 1, and this number is approximated by lg n. The number of bit operations for the four basic integer operations of addition, subtraction, multiplication, and division using the classical algorithms is summarized in Table 2.1. These algorithms are studied in more detail in §14.2. More sophisticated techniques for multiplication and division have smaller complexities.

    Operation        Expression    Bit complexity
    Addition         a + b         O(lg a + lg b) = O(lg n)
    Subtraction      a − b         O(lg a + lg b) = O(lg n)
    Multiplication   a · b         O((lg a)(lg b)) = O((lg n)^2)
    Division         a = qb + r    O((lg q)(lg b)) = O((lg n)^2)

Table 2.1: Bit complexity of basic operations in Z.
The greatest common divisor of two integers a and b can be computed via Fact 2.98. However, computing a gcd by first obtaining prime-power factorizations does not result in an efficient algorithm, as the problem of factoring integers appears to be relatively difficult. The Euclidean algorithm (Algorithm 2.104) is an efficient algorithm for computing the greatest common divisor of two integers that does not require the factorization of the integers. It is based on the following simple fact.

2.103 Fact If a and b are positive integers with a > b, then gcd(a, b) = gcd(b, a mod b).

2.104 Algorithm Euclidean algorithm for computing the greatest common divisor of two integers

INPUT: two non-negative integers a and b with a ≥ b.
OUTPUT: the greatest common divisor of a and b.
1. While b ≠ 0 do the following:
   1.1 Set r ← a mod b, a ← b, b ← r.
2. Return(a).
2.105 Fact Algorithm 2.104 has a running time of O((lg n)^2) bit operations.
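Algorithm 2.104 transcribes directly into Python (a sketch of ours; the standard library's math.gcd performs the same computation):

    def euclid_gcd(a, b):
        # Requires a >= b >= 0. Each pass applies Fact 2.103:
        # gcd(a, b) = gcd(b, a mod b).
        while b != 0:
            a, b = b, a % b
        return a

    print(euclid_gcd(4864, 3458))  # 38, matching Example 2.99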
2.106 Example (Euclidean algorithm) The following are the division steps of Algorithm 2.104 for computing gcd(4864, 3458) = 38: