6 (a) $P(x) = a_0 + x^5(a_5 + x^5(a_{10} + x^5 a_{15}))$. The three multiplications $x^2 = x \cdot x$, $x^4 = x^2 \cdot x^2$, $x^5 = x^4 \cdot x$ are needed, together with 3 multiplications and 3 additions from the nested multiplication. Total of 6 multiplications and 3 additions.
6 (b) $P(x) = x^7(a_7 + x^5(a_{12} + x^5(a_{17} + x^5(a_{22} + x^5 a_{27}))))$. The four multiplications $x^2 = x \cdot x$, $x^4 = x^2 \cdot x^2$, $x^5 = x^4 \cdot x$, $x^7 = x^5 \cdot x^2$ are needed, together with 5 multiplications and 4 additions from the nested multiplication. Total of 9 multiplications and 4 additions.
7 The degree $n$ polynomial with base points is $P(x) = c_1 + (x - r_1)(c_2 + (x - r_2)(c_3 + (x - r_3)(c_4 + \cdots + (x - r_n)c_{n+1})))$. The operations needed are $n$ multiplications and $2n$ additions.
COMPUTER PROBLEMS 0.1
1 The MATLAB command nest(50,ones(51,1),1.00001) gives 51.01275208274999, differing from $(x^{51} - 1)/(x - 1)$ with $x = 1.00001$ by $4.76 \times 10^{-12}$.
2 The command nest(99,(-1).^(0:99),1.00001) gives −0.00050024507964763. The equivalent expression $(1 - x^{100})/(1 + x)$ for $x = 1.00001$ differs by $1.713 \times 10^{-16}$.
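The two computer problems above can be reproduced outside MATLAB. The following is a minimal Python sketch of nested multiplication; the function name `nest` and its two-argument signature are assumptions made for illustration and are not the textbook's nest.m, which also takes the degree as an argument.

```python
def nest(coeffs, x):
    """Evaluate c[0] + c[1]*x + ... + c[d]*x**d by nested multiplication.

    coeffs lists the coefficients with the constant term first
    (an assumption of this sketch).
    """
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


x = 1.00001

# Problem 1: all 51 coefficients equal to 1, compared with (x^51 - 1)/(x - 1).
p1 = nest([1.0] * 51, x)
closed1 = (x**51 - 1) / (x - 1)

# Problem 2: alternating coefficients, compared with (1 - x^100)/(1 + x).
p2 = nest([(-1.0) ** k for k in range(100)], x)
closed2 = (1 - x**100) / (1 + x)

print(p1, abs(p1 - closed1))
print(p2, abs(p2 - closed2))
```

The printed differences show how far each closed-form expression is from the nested evaluation in double precision.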
EXERCISES 0.2 Binary Numbers
2 (a) $(1/8)_{10} = (2^{-3})_{10} = (0.001)_2$
2 (b) $(7/8)_{10} = (2^{-1} + 2^{-2} + 2^{-3})_{10} = (0.111)_2$
2 (c) $(35/16)_{10} = (2 + 3/16)_{10} = (2 + 1/8 + 1/16)_{10} = (10.0011)_2$
$2/3 \times 2 = 1/3 + 1$
$1/3 \times 2 = 2/3 + 0$
$3/7 \times 2 = 6/7 + 0$
$6/7 \times 2 = 5/7 + 1$
$5/7 \times 2 = 3/7 + 1$
$3/7 \times 2 = 6/7 + 0$
Therefore $(5/7)_{10} = (0.\overline{101})_2$.
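The repeated doubling used above is mechanical enough to script. The following Python sketch uses exact rational arithmetic so that no rounding interferes; the helper name `fraction_bits` is hypothetical.

```python
from fractions import Fraction


def fraction_bits(value, count):
    """Return the first `count` binary digits after the point of 0 <= value < 1."""
    frac = Fraction(value)
    bits = []
    for _ in range(count):
        frac *= 2
        bit = int(frac)      # integer part is the next binary digit
        bits.append(str(bit))
        frac -= bit          # keep only the fractional part
    return "".join(bits)


print(fraction_bits(Fraction(5, 7), 9))   # 101101101
print(fraction_bits(Fraction(2, 3), 8))   # 10101010
```

The repeating block 101 for 5/7 is visible directly in the digit string.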
4 (b)
$2/3 \times 2 = 1/3 + 1$
$1/3 \times 2 = 2/3 + 0$
$2/3 \times 2 = 1/3 + 1$
$1/5 \times 2 = 2/5 + 0$
$2/5 \times 2 = 4/5 + 0$
$4/5 \times 2 = 3/5 + 1$
$3/5 \times 2 = 1/5 + 1$
8 (d) $(1010.\overline{01})_2 = (2^3 + 2^1)_{10} + (0.\overline{01})_2$. Set $x = (0.\overline{01})_2$. Then $2^2 x - x = (01)_2$ implies $x = 1/3$.
$(15 + 15/56)_{10}$.
EXERCISES 0.3 Floating Point Representation of Real Numbers
3) = +1.0101010101010101010101010101010101010101010101010101 $\times 2^{-1}$
1 (d) $(0.9)_{10} = (0.1\overline{1100})_2$ =
+1.1100110011001100110011001100110011001100110011001100 1100… $\times 2^{-1}$. The Rounding to Nearest Rule says to round up since the 53rd bit is nonzero, and further bits are nonzero.
= +1.1001001001001001001001001001001001001001001001001001 $\times 2^{2}$
3 Note that $\mathrm{fl}(5) = 1.01 \times 2^2$. Adding 1 as bit 3, 4, …, 52 of the mantissa will not incur rounding error. These correspond to $2^{-k}$ for $k = 1, 2, \ldots, 50$.
4 Note that $\mathrm{fl}(19) = 1.0011 \times 2^4$. Adding 1 to bit 52 of the mantissa, corresponding to $19 + 2^{-48}$, will not be rounded away, and so 48 is the largest such $k$.
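Both answers can be checked by direct experiment in double precision. The following Python sketch searches for the largest $k$ such that $x + 2^{-k}$ is still distinguishable from $x$; the helper name `largest_k` is hypothetical, and the sketch assumes IEEE double precision with round to nearest, as Python floats provide on common platforms.

```python
def largest_k(x):
    """Largest k >= 0 for which x + 2**-k differs from x in double precision."""
    k = 0
    while x + 2.0 ** -(k + 1) != x:
        k += 1
    return k


print(largest_k(5.0))    # 50
print(largest_k(19.0))   # 48
```

At $k = 51$ for $x = 5$ (and $k = 49$ for $x = 19$) the sum lies exactly halfway between two doubles and is rounded back to $x$, whose last mantissa bit is 0.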
5 (a) $1 + (2^{-51} + 2^{-53})$ =
+1.0000000000000000000000000000000000000000000000000010 1 $\times 2^0$.
$\mathrm{fl}(1 + (2^{-51} + 2^{-53}))$ =
+1.0000000000000000000000000000000000000000000000000010 $\times 2^0$, using the Rounding to Nearest Rule. Therefore $\mathrm{fl}((1 + (2^{-51} + 2^{-53})) - 1)$ =
0.0000000000000000000000000000000000000000000000000010
= 1.0000000000000000000000000000000000000000000000000000 $\times 2^{-51} = 2^{-51}$
5 (b) $1 + (2^{-51} + 2^{-52} + 2^{-53})$ =
+1.0000000000000000000000000000000000000000000000000011 1 $\times 2^0$.
$\mathrm{fl}(1 + (2^{-51} + 2^{-52} + 2^{-53}))$ =
+1.0000000000000000000000000000000000000000000000000100 $\times 2^0$, using the Rounding to Nearest Rule. Therefore $\mathrm{fl}((1 + (2^{-51} + 2^{-52} + 2^{-53})) - 1)$ =
0.0000000000000000000000000000000000000000000000000100
= 1.0000000000000000000000000000000000000000000000000000 $\times 2^{-50} = 2^{-50}$
6 (a) $1 + (2^{-51} + 2^{-52} + 2^{-54})$
= +1.0000000000000000000000000000000000000000000000000011 01 $\times 2^0$.
$\mathrm{fl}(1 + (2^{-51} + 2^{-52} + 2^{-54}))$ =
+1.0000000000000000000000000000000000000000000000000011 $\times 2^0$, using the Rounding to Nearest Rule. Therefore $\mathrm{fl}((1 + (2^{-51} + 2^{-52} + 2^{-54})) - 1) = 2^{-51} + 2^{-52} = 3\epsilon_{\mathrm{mach}}$.
6 (b) $\mathrm{fl}(1 + (2^{-51} + 2^{-52} + 2^{-60}))$ =
+1.0000000000000000000000000000000000000000000000000011 $\times 2^0$, using the Rounding to Nearest Rule. Therefore $\mathrm{fl}((1 + (2^{-51} + 2^{-52} + 2^{-60})) - 1)$ =
0.0000000000000000000000000000000000000000000000000011 =
1.1000000000000000000000000000000000000000000000000000 $\times 2^{-51}$
$= 2^{-51} + 2^{-52} = 3\epsilon_{\mathrm{mach}}$
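The four results of Exercises 5 and 6 can be confirmed by evaluating the expressions directly. This is a short Python sketch, assuming IEEE double precision with round to nearest (ties to even); the variable names are illustrative only.

```python
import sys

eps = sys.float_info.epsilon  # machine epsilon, 2**-52 for IEEE doubles

# Exercise 5: the discarded bit is exactly a tie, so ties-to-even decides.
r5a = (1.0 + (2.0**-51 + 2.0**-53)) - 1.0
r5b = (1.0 + (2.0**-51 + 2.0**-52 + 2.0**-53)) - 1.0

# Exercise 6: the discarded bits are below half a unit, so they round away.
r6a = (1.0 + (2.0**-51 + 2.0**-52 + 2.0**-54)) - 1.0
r6b = (1.0 + (2.0**-51 + 2.0**-52 + 2.0**-60)) - 1.0

print(r5a == 2.0**-51, r5b == 2.0**-50)
print(r6a == 3 * eps, r6b == 3 * eps)
```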
7 (a) $(8)_{10} = (1000.)_2 = 1.0 \times 2^3$. The biased exponent is $3 + 1023 = 1026$, which is $2^{10} + 2$. The sign is 0 (positive), so the sign/exponent is represented by the binary string 0100 0000 0010. The mantissa is 52 zeros, so the machine representation is the 64 bits
0100 0000 0010 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
or 4020000000000000 in hex format.
7 (b) $(21)_{10} = (10101.)_2 = 1.0101 \times 2^4$. The biased exponent is $4 + 1023 = 1027 = 2^{10} + 3$, represented by 100 0000 0011. The machine representation is
0100 0000 0011 0101 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
or 4035000000000000 in hex format.
7 (c) $(1/8)_{10} = 1.0 \times 2^{-3}$. The biased exponent is $-3 + 1023 = 1020 = 2^{10} - 4$, represented by 011 1111 1100. The machine representation is
0011 1111 1100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
or 3fc0000000000000 in hex format.
7 (d) $(1/3)_{10} = 1.\overline{01} \times 2^{-2}$, and after rounding down, $\mathrm{fl}(1/3) = 1.0101 \ldots 0101 \times 2^{-2}$. The biased exponent is $-2 + 1023 = 1021 = 2^{10} - 3$, represented by 011 1111 1101. The machine representation is
0011 1111 1101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101
or 3fd5555555555555 in hex format.
7 (e) $(2/3)_{10} = 1.\overline{01} \times 2^{-1}$, and after rounding down, $\mathrm{fl}(2/3) = 1.0101 \ldots 0101 \times 2^{-1}$. The biased exponent is $-1 + 1023 = 1022 = 2^{10} - 2$, represented by 011 1111 1110. The machine representation is
0011 1111 1110 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101
or 3fe5555555555555 in hex format.
7 (f) $(0.1)_{10} = 1.\overline{1001} \times 2^{-4}$, and after rounding up, $\mathrm{fl}(0.1) = 1.1001 \ldots 1001\ 1010 \times 2^{-4}$. The biased exponent is $-4 + 1023 = 1019 = 2^{10} - 5$, represented by 011 1111 1011. The machine representation is
0011 1111 1011 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1010
or 3fb999999999999a in hex format.
7 (g) $(-0.1)_{10} = -1.\overline{1001} \times 2^{-4}$, and after rounding, $\mathrm{fl}(-0.1) = -1.1001 \ldots 1001\ 1010 \times 2^{-4}$. The biased exponent is $-4 + 1023 = 1019 = 2^{10} - 5$, represented by 011 1111 1011. The machine representation is
1011 1111 1011 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1010
or bfb999999999999a in hex format.
7 (h) $(-0.2)_{10} = -1.\overline{1001} \times 2^{-3}$, and after rounding, $\mathrm{fl}(-0.2) = -1.1001 \ldots 1001\ 1010 \times 2^{-3}$. The biased exponent is $-3 + 1023 = 1020 = 2^{10} - 4$, represented by 011 1111 1100. The machine representation is
1011 1111 1100 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1010
or bfc999999999999a in hex format.
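The hex strings in Exercise 7 can be cross-checked by packing each number as a big-endian IEEE 754 double. This is a minimal Python sketch using the standard `struct` module; the helper name `double_hex` is hypothetical.

```python
import struct


def double_hex(x):
    """Return the 64-bit IEEE 754 double pattern of x as 16 hex digits."""
    return struct.pack(">d", x).hex()


for value in (8.0, 21.0, 1 / 8, 1 / 3, 2 / 3, 0.1, -0.1, -0.2):
    print(value, double_hex(value))
```

The first three hex digits carry the sign and biased exponent, and the remaining thirteen carry the 52-bit mantissa, matching the layout used in the answers above.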
8 Yes. Yes. No; under chopping, $1/3 + 2/3 = 1 - \epsilon_{\mathrm{mach}}$.
9 (a) $(7/3)_{10} = 1.00\overline{10} \times 2^1$, and after rounding, $\mathrm{fl}(7/3) = 1.0010\ 1010 \ldots 1011 \times 2^1$. $(4/3)_{10} = 1.\overline{01} \times 2^0$, and after rounding, $\mathrm{fl}(4/3) = 1.01\ 0101 \ldots 0101 \times 2^0$. Subtracting gives
  1.0010101010101010101010101010101010101010101010101011 0 $\times 2^1$
− 0.1010101010101010101010101010101010101010101010101010 1 $\times 2^1$
= 0.1000000000000000000000000000000000000000000000000000 1 $\times 2^1$
that is normalized to
= 1.0000000000000000000000000000000000000000000000000001 $\times 2^0$,
which is $1 + \epsilon_{\mathrm{mach}}$. After subtracting 1, the result is that the double precision floating point version of $(7/3 - 4/3) - 1$ is $\epsilon_{\mathrm{mach}}$.
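The conclusion of 9 (a) is easy to confirm numerically. The sketch below, in Python and assuming IEEE double precision, evaluates the expression and compares it with machine epsilon.

```python
import sys

eps = sys.float_info.epsilon      # 2**-52 for IEEE doubles

# (7/3 - 4/3) - 1 is exactly 0 in real arithmetic, but not in floating point.
result = (7.0 / 3.0 - 4.0 / 3.0) - 1.0

print(result, result == eps)
```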
9 (b) $(4/3)_{10} = 1.\overline{01} \times 2^0$, and after rounding, $\mathrm{fl}(4/3) = 1.01\ 0101 \ldots 0101 \times 2^0$. $(1/3)_{10} = 1.\overline{01} \times 2^{-2}$, and after rounding, $\mathrm{fl}(1/3) = 1.01\ 0101 \ldots 0101 \times 2^{-2}$. Subtracting gives
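11 The associative law of addition fails for floating point addition with the Rounding to Nearest Rule, for example, because $1 + (\epsilon_{\mathrm{mach}}/2 + \epsilon_{\mathrm{mach}}/2) = 1 + \epsilon_{\mathrm{mach}} > 1$, while $(1 + \epsilon_{\mathrm{mach}}/2) + \epsilon_{\mathrm{mach}}/2 = 1$, because $1 + \epsilon_{\mathrm{mach}}/2 = 1$.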
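12 (a) $\mathrm{fl}(1/3) = 1.0101 \ldots 01 \times 2^{-2}$, with relative rounding error of $2^{-54} < \epsilon_{\mathrm{mach}}/2 = 2^{-53}$.
12 (b) $\mathrm{fl}(3.3) = 1.1010\ 0110\ 0110 \ldots 0110 \times 2^1$, $3.3 - \mathrm{fl}(3.3) = 0.4 \times 2^{-51}$, with relative rounding error of $8\epsilon_{\mathrm{mach}}/33$.
12 (c) $\mathrm{fl}(9/7) = 1.010010 \ldots 0100101 \times 2^0$, $\mathrm{fl}(9/7) - 9/7 = 3\epsilon_{\mathrm{mach}}/7$, with relative rounding error of $\epsilon_{\mathrm{mach}}/3$.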
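The relative rounding errors in Exercise 12, and the associativity failure in Exercise 11, can be verified exactly, because every double is a rational number. The following Python sketch relies on `fractions.Fraction` converting a float to its exact value; the helper name `relative_rounding_error` is hypothetical.

```python
import sys
from fractions import Fraction

eps = sys.float_info.epsilon
EPS = Fraction(1, 2**52)          # machine epsilon as an exact rational


def relative_rounding_error(exact, stored):
    """Exact value of |fl(x) - x| / |x| for the double `stored`."""
    exact = Fraction(exact)
    return abs(Fraction(stored) - exact) / abs(exact)


# Exercise 12, reported as multiples of machine epsilon.
err_a = relative_rounding_error(Fraction(1, 3), 1 / 3) / EPS
err_b = relative_rounding_error(Fraction(33, 10), 3.3) / EPS
err_c = relative_rounding_error(Fraction(9, 7), 9 / 7) / EPS
print(err_a, err_b, err_c)        # 1/4 8/33 1/3

# Exercise 11: the two groupings of the same sum differ.
left = 1.0 + (eps / 2 + eps / 2)
right = (1.0 + eps / 2) + eps / 2
print(left > 1.0, right == 1.0)
```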
13 (a) 2, represented by 010…0 (b) $2^{-511}$, represented by 0010…0 (c) 0, represented by 10…0. When bit 4 through 12 is the nonzero bit, the floating point number is positive but less than $2^{-511}$. When bit 13 through 64 is the nonzero bit, the number is positive and subnormal, so less than $2^{-511}$.
14 (a) 0 (b) $2^{-51}$ (c) $2^{-51}$
15 (a) $(8.3)_{10} = 1.0000\overline{1001} \times 2^3$, and rounded, $\mathrm{fl}(8.3) = 1.0000\ 1001\ 1001 \ldots 1001\ 1010 \times 2^3$. $(7.3)_{10} = 1.1101\overline{0011} \times 2^2$, and rounded, $\mathrm{fl}(7.3) = 1.1101\ 0011\ 0011 \ldots 0011\ 0011 \times 2^2$. Subtracting gives
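16 (a) $\mathrm{fl}(11/4) = 1.011 \times 2^1$, with rounding error of 0.
16 (b) $\mathrm{fl}(2.7) = 1.0101\ 1001\ 1001 \ldots 1001\ 1010 \times 2^1$, $\mathrm{fl}(2.7) - 2.7 = 4\epsilon_{\mathrm{mach}}/5$, with relative rounding error of $8\epsilon_{\mathrm{mach}}/27$.
16 (c) $\mathrm{fl}(10/3) = 1.1010 \ldots 1011 \times 2^1$, $\mathrm{fl}(10/3) - 10/3 = 2\epsilon_{\mathrm{mach}}/3$, with relative rounding error of $\epsilon_{\mathrm{mach}}/5$.
EXERCISES 0.4 Loss of Significance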
1 (a) For $x$ near $2\pi n$ for integer $n$, $\sec x \approx 1$, and the numerator exhibits subtraction of nearly equal numbers. An algebraically equivalent expression avoids the difficulty:
eliminates the loss of significance.
1 (c) For $x$ near 0, there is subtraction of nearly equal numbers. Using common denominators eliminates the problem:
2 (a) p = 8
2 (b) p = 5
3 Since a is large and negative, the expression represents subtraction of nearly equal numbers.
Multiply numerator and denominator by the conjugate:
5 Set $x = 3344556600$ and $y = 1.2222222$. The difference between the lengths of the hypotenuse and the longer leg is
where we have rewritten the expression to eliminate the subtraction of nearly equal numbers. Although calculating the leftmost expression in double precision yields no correct significant digits, the rightmost expression gives the correct answer $2.23322 \times 10^{-10}$.
EXERCISES 0.5 Review of Calculus
1 (a) Since $f(0)f(1) = (1)(-2) < 0$, there exists $c$ between 0 and 1 such that $f(c) = 0$ by the Intermediate Value Theorem.
1 (b) Since $f(0)f(1) = (1)(-9) < 0$, $f(c) = 0$ for some $c$ between 0 and 1 as in (a).
1 (c) Since $f(0)f(1/2) = (1)(-1/2) < 0$, $f(c) = 0$ for some $c$ between 0 and 1/2 by the Intermediate Value Theorem, thus $0 \le c \le 1$.
$1/2$. Since $f(x) = x^2$, this implies $c^2 = 1/2$, or $c = 1/\sqrt{2}$.
3 (c) According to the Mean Value Theorem for Integrals, there exists c between 0 and 1
4 (a) $P(x) = 1 + x^2$
4 (b) $P(x) = 1 - \frac{25}{2}x^2$
4 (c) $P(x) = 1 - x + x^2$
5 (a) The derivatives evaluated at $x = 0$ are $f(0) = 1$, $f'(0) = 0$, $f''(0) = 2$, $f'''(0) = 0$, $f^{(iv)}(0) = 12$, and $f^{(v)}(0) = 0$. Then the degree 5 Taylor polynomial is $P(x) = 1 + x^2 + \frac{1}{2}x^4$.
5 (b) The derivatives evaluated at $x = 0$ are $f(0) = 1$, $f'(0) = 0$, $f''(0) = -4$, $f'''(0) = 0$, $f^{(iv)}(0) = 16$, and $f^{(v)}(0) = 0$. The degree 5 Taylor polynomial is $P(x) = 1 - 2x^2 + \frac{2}{3}x^4$.
5 (c) The derivatives at $x = 0$ are $f(0) = 0$, $f'(0) = 1$, $f''(0) = -1$, $f'''(0) = 2$, $f^{(iv)}(0) = -6$, and $f^{(v)}(0) = 24$. The degree 5 Taylor polynomial is $P(x) = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + \frac{1}{5}x^5$.
5 (d) The derivatives at $x = 0$ are $f(0) = 0$, $f'(0) = 0$, $f''(0) = 2$, $f'''(0) = 0$, $f^{(iv)}(0) = -8$, and $f^{(v)}(0) = 0$. The degree 5 Taylor polynomial is $P(x) = x^2 - \frac{1}{3}x^4$.
7 (a) The derivatives at $x = 1$ are $f(1) = 0$, $f'(1) = 1$, $f''(1) = -1$, $f'''(1) = 2$, and $f^{(iv)}(1) = -6$. The degree 4 Taylor polynomial is $P(x) = (x - 1) - \frac{1}{2}(x - 1)^2 + \frac{1}{3}(x - 1)^3 - \frac{1}{4}(x - 1)^4$.
7 (b) $f(0.9)$ can be approximated by $P(0.9) = -0.1053583$. Likewise, $f(1.1) \approx P(1.1) = 0.0953083$.
7 (c) The remainder term is $(x - 1)^5/(5c^5)$, where $c$ lies between $x$ and 1. At $x = 0.9$, the error is $(0.1)^5/(5c^5) \le (0.1)^5/(5(0.9)^5) \approx 0.000003387$, where the upper bound results from evaluating $c$ at the worst case $c = 0.9$. At $x = 1.1$, the error is $(0.1)^5/(5c^5) \le (0.1)^5/(5(1.0)^5) \approx 0.000002$. On the basis of the remainder, we predict smaller error at $x = 1.1$.
7 (d) The error at $x = 0.9$ is $|f(0.9) - P(0.9)| = 0.00000218$, and the error at $x = 1.1$ is $|f(1.1) - P(1.1)| = 0.00000185$.
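The numbers in Exercise 7 can be reproduced with a few lines of Python. The sketch below assumes, as the listed derivatives indicate, that $f(x) = \ln x$; the function names `taylor_ln` and `remainder_bound` are hypothetical.

```python
import math


def taylor_ln(x):
    """Degree 4 Taylor polynomial of ln(x) about x = 1, as in 7 (a)."""
    h = x - 1.0
    return h - h**2 / 2 + h**3 / 3 - h**4 / 4


def remainder_bound(x):
    """Bound |x - 1|^5 / (5 c^5) with c at its worst case, min(x, 1)."""
    c = min(x, 1.0)
    return abs(x - 1.0) ** 5 / (5 * c**5)


for x in (0.9, 1.1):
    approx = taylor_ln(x)
    error = abs(math.log(x) - approx)
    print(x, approx, error, remainder_bound(x))
```

In both cases the actual error falls below the remainder bound, and the error at $x = 1.1$ is the smaller of the two, as predicted in 7 (c).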
8 (a) $P(x) = 1 - x^2/2 + x^4/24$
8 (b) 0.000326
9 The degree one Taylor polynomial is $P(x) = 1 + \frac{1}{2}x$, with Taylor remainder $E = x^2/(8(1 + c)^{3/2})$ for $c$ between $x$ and 0. Setting $x = 0.02$, $E \le (0.02)^2/(8(1)^{3/2}) = 0.00005$. The actual values are $\sqrt{1.02} \approx 1.0099505$ and $1 + \frac{1}{2}(0.02) = 1.01$, which is a difference of 0.0000495, slightly less than the upper bound $E$.
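A quick numerical check of Exercise 9, written as a Python sketch with illustrative variable names, compares the linear approximation of $\sqrt{1 + x}$ at $x = 0.02$ with the true value and with the remainder bound.

```python
import math

x = 0.02
linear = 1 + x / 2                 # degree one Taylor polynomial at x
actual = math.sqrt(1 + x)
bound = x**2 / 8                   # remainder bound with c = 0

difference = abs(linear - actual)
print(actual, linear, difference, bound)
```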
CHAPTER 1
Solving Equations
EXERCISES 1.1 The Bisection Method
1 (a) Check that $f(x) = x^3 - 9$ satisfies $f(2) = -1$ and $f(3) = 27 - 9 = 18$. By the Intermediate Value Theorem, $f(2)f(3) < 0$ implies the existence of a root between $x = 2$ and $x = 3$.
1 (b) Define $f(x) = 3x^3 + x^2 - x - 5$. Check that $f(1) = -2$ and $f(2) = 21$, so there is a root
$f(5/2) = 53/8 > 0$, which implies the new interval is $[2, 5/2]$. The second step is to evaluate $f(9/4) = 729/64 - 9 > 0$, giving the interval $[2, 9/4]$. The best estimate is the midpoint $x_c = 17/8$.
3 (b) Start with $f(x) = 3x^3 + x^2 - x - 5$ on $[1, 2]$, where $f(1) < 0$ and $f(2) > 0$. Since $f(3/2) > 0$, the second interval is $[1, 3/2]$. Since $f(5/4) > 0$, the third interval is $[1, 5/4]$. The best estimate is the midpoint $x_c = 9/8$.
3 (c) Start with $f(x) = \cos^2 x + 6 - x$ on $[6, 7]$, where $f(6) > 0$ and $f(7) < 0$. Since $f(6.5) > 0$, the second interval is $[6.5, 7]$. Since $f(6.75) > 0$, the third interval is $[6.75, 7]$. The best estimate is the midpoint $x_c = 6.875$.
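The two hand steps in Exercise 3 can be replayed in code. The following Python sketch runs a fixed number of bisection steps and returns the midpoint of the final interval; the function name `bisect_steps` is hypothetical and is not the textbook's bisect.m.

```python
import math


def bisect_steps(f, a, b, steps):
    """Run `steps` bisection steps on [a, b] and return the final midpoint."""
    fa = f(a)
    for _ in range(steps):
        c = (a + b) / 2
        fc = f(c)
        if fa * fc < 0:
            b = c              # sign change lies in [a, c]
        else:
            a, fa = c, fc      # sign change lies in [c, b]
    return (a + b) / 2


print(bisect_steps(lambda x: x**3 - 9, 2, 3, 2))                     # 2.125
print(bisect_steps(lambda x: 3 * x**3 + x**2 - x - 5, 1, 2, 2))      # 1.125
print(bisect_steps(lambda x: math.cos(x) ** 2 + 6 - x, 6, 7, 2))     # 6.875
```

The three printed midpoints are 17/8, 9/8, and 6.875, the best estimates after two steps.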
5 (b) According to (1.1), the error after $n$ steps is less than $(3 - 2)/2^{n+1}$. Ensuring that the error is less than $10^{-10}$ requires $\left(\frac{1}{2}\right)^{n+1} < 10^{-10}$, or $2^{n+1} > 10^{10}$, which yields $n > 10/\log_{10}(2) - 1 \approx 32.2$. Therefore 33 steps are required.
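The step count in 5 (b) can be confirmed by searching for the smallest $n$ that satisfies the error bound. This is a small Python sketch; the helper name `steps_needed` is hypothetical.

```python
import math


def steps_needed(a, b, tol):
    """Smallest n with (b - a) / 2**(n + 1) < tol."""
    n = 0
    while (b - a) / 2 ** (n + 1) >= tol:
        n += 1
    return n


print(steps_needed(2, 3, 1e-10))       # 33
print(10 / math.log10(2) - 1)          # about 32.2
```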
6 Bisection Method converges to 0, but 0 is not a root.
COMPUTER PROBLEMS 1.1
1 (a) There is a root in [2, 3] (see Exercise 1.1.1). In MATLAB, use the textbook's Program 1.1, bisect.m. Six correct decimal places correspond to an error tolerance of $5 \times 10^{-7}$, according to Def. 1.3. The calling sequence
>> f=@(x) x^3-9;
>> xc=bisect(f,2,3,5e-7)
returns the approximate root 2.080083.
1 (b) Similar to (a), on interval [1, 2]. The command
>> xc=bisect(@(x) 3*x^3+x^2-x-5,1,2,5e-7)
returns the approximate root 1.169726.
1 (c) Similar to (a), on interval [6, 7]. The command
(b) There are roots in [−2, −1], [−0.5, 0.5], and [0.5, 1.5]. Using bisect as in part (a) yields the approximate roots −1.023482, 0.163823, and 0.788942.
(c) There are roots in [−1.7, −0.7], [−0.7, 0.3], and [0.3, 1.3]. Using bisect as in part (a) yields the approximate roots −0.818094, 0, and 0.506308.
yields the approximate cube root 1.25992105 in 27 steps.
5 (b) There is a root in the interval [1, 2]. Using bisect as in (a) gives the approximate cube root 1.44224957 in 27 steps.
5 (c) There is a root in the interval [1, 2]. Using bisect as in (a) gives the approximate cube root 1.70997595 in 27 steps.
6 0.785398
7 Trial and error, or a plot of $f(x) = \det(A) - 1000$, shows that $f(-18)f(-17) < 0$ and $f(9)f(10) < 0$. Applying bisect to $f(x)$ yields the roots −17.188498 and 9.708299. The backward errors of the roots are $|f(-17.188498)| = 0.0018$ and $|f(9.708299)| = 0.00014$.
8 2.948011
9 The desired height is the root of the function $f(H) = \pi H^2(1 - \frac{1}{3}H) - 1$. Using
>> bisect(@(H) pi*H^2*(1-H/3)-1,0,1,0.001)
gives the solution 636 mm.
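For readers without MATLAB, the bisect calls in these computer problems can be imitated in Python. The sketch below is not the textbook's Program 1.1; it is a minimal bisection with the same argument order (function, left endpoint, right endpoint, tolerance), and that matching signature is an assumption made for illustration.

```python
import math


def bisect(f, a, b, tol):
    """Bisection on [a, b]; requires f(a) and f(b) to have opposite signs.

    Stops when half the interval length is at most tol and returns the midpoint.
    """
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while (b - a) / 2 > tol:
        c = (a + b) / 2
        fc = f(c)
        if fc == 0:
            return c
        if fa * fc < 0:
            b, fb = c, fc
        else:
            a, fa = c, fc
    return (a + b) / 2


# Computer Problem 1 (a) and (b): six correct decimal places.
root_a = bisect(lambda x: x**3 - 9, 2, 3, 5e-7)
root_b = bisect(lambda x: 3 * x**3 + x**2 - x - 5, 1, 2, 5e-7)

# Computer Problem 9: height H in meters, tolerance of one millimeter.
height = bisect(lambda H: math.pi * H**2 * (1 - H / 3) - 1, 0, 1, 0.001)

print(root_a, root_b, height)
```

The returned values agree with the roots quoted above to within the requested tolerances.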
EXERCISES 1.2 Fixed-Point Iteration