The art of computer programming volume 3 sorting and searching (second edition 2011) part 2



Let’s look at the record.

— AL SMITH (1928)

This chapter might have been given the more pretentious title “Storage and Retrieval of Information”; on the other hand, it might simply have been called “Table Look-Up.” We are concerned with the process of collecting information

recovered as quickly as possible. Sometimes we are confronted with more data than we can really use, and it may be wisest to forget and to destroy most of it; but at other times it is important to retain and organize the given facts in such a way that fast retrieval is possible.

Most of this chapter is devoted to the study of a very simple search problem: how to find the data that has been stored with a given identification. For example, in a numerical application we might want to find $f(x)$, given $x$ and a table of the values of $f$; in a nonnumerical application, we might want to find the English translation of a given Russian word.

In general, we shall suppose that a set of N records has been stored, and

is especially appropriate, because many people spend a great deal of time every day searching for their keys. We generally require the N keys to be distinct, so that each key uniquely identifies its record. The collection of all records is called a table or file, where the word “table” is usually used to indicate a small file, and “file” is usually used to indicate a large table. A large file or a group of files is frequently called a database.

Algorithms for searching are presented with a so-called argument, K, and the

two possibilities can arise: Either the search was successful, having located the unique record containing K; or it was unsuccessful, having determined that K

enter a new record, containing K, into the table; a method that does this is called a search-and-insertion algorithm. Some hardware devices known as associative

the functioning of a human brain; but we shall study techniques for searching on a conventional general-purpose digital computer.

associated with K, the algorithms in this chapter generally ignore everything but


6 SEARCHING 393

the keys themselves. In practice we can find the associated data once we have located K; for example, if K appears in location TABLE + i, the associated data (or a pointer to it) might be in location TABLE + i + 1, or in DATA + i, etc. It is therefore convenient to gloss over the details of what should be done after K has been successfully found.

Searching is the most time-consuming part of many programs, and the substitution of a good search method for a bad one often leads to a substantial increase in speed. In fact we can often arrange the data or the data structure so that searching is eliminated entirely, by ensuring that we always know just

achieve this; for example, a doubly linked list makes it unnecessary to search for the predecessor or successor of a given item. Another way to avoid searching occurs if we are allowed to choose the keys freely, since we might as well let

be placed in location TABLE + K. Both of these techniques were used to eliminate searching from the topological sorting algorithm discussed in Section 2.2.3.

sorting algorithm had been given symbolic names instead of numbers. Efficient algorithms for searching turn out to be quite important in practice.

Search methods can be classified in several ways. We might divide them into internal versus external searching, just as we divided the sorting algorithms

contents of the table are essentially unchanging (so that it is important to minimize the search time without regard for the time required to set up the table),

also deletions. A third possible scheme is to classify search methods according to

the keys, analogous to the distinction between sorting by comparison and sorting by distribution. Finally we might divide searching into those methods that use the actual keys and those that work with transformed keys.

The organization of this chapter is essentially a combination of the latter two

search, then Section 6.2 discusses the improvements that can be made based on comparisons between keys, using alphabetic or numeric order to govern the decisions. Section 6.3 treats digital searching, and Section 6.4 discusses an important

of the actual keys. Each of these sections treats both internal and external searching, in both the static and the dynamic case; and each section points out

Searching and sorting are often closely related to each other. For example, consider the following problem: Given two sets of numbers, $A = \{a_1, a_2, \ldots, a_m\}$ and $B = \{b_1, b_2, \ldots, b_n\}$, determine whether or not $A \subseteq B$. Three solutions


2. Sort the a’s and b’s, then make one sequential pass through both files, checking the appropriate condition.

3. Enter the b’s in a table, then search for each of the $a_i$.
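The approaches to testing whether A is a subset of B can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, solution 1 is taken to be a direct scan of the b's for each a (which matches a cost proportional to mn), and Python's built-in `set` stands in for the table of solution 3.

```python
def subset_by_scanning(a, b):
    """Assumed form of solution 1: scan all of b for every element of a."""
    return all(any(x == y for y in b) for x in a)


def subset_by_sorting(a, b):
    """Solution 2: sort both inputs, then make one sequential pass."""
    sa, sb = sorted(a), sorted(b)
    j = 0
    for x in sa:
        # Advance through sb until we reach a value that is not below x.
        while j < len(sb) and sb[j] < x:
            j += 1
        if j == len(sb) or sb[j] != x:
            return False
    return True


def subset_by_table(a, b):
    """Solution 3: enter the b's in a table, then look up each a."""
    table = set(b)  # a hash-based table, used here as a stand-in
    return all(x in table for x in a)
```

All three functions return the same answer; they differ only in how the running time grows with m and n.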

Each of these solutions is attractive for a different range of values of m and n. Solution 1 will take roughly $c_1 mn$ units of time, for some constant $c_1$, and solution 2 will take about $c_2(m \lg m + n \lg n)$ units, for some (larger) constant $c_2$. With a suitable hashing method, solution 3 will take roughly $c_3 m + c_4 n$ units of time, for some (still larger) constants $c_3$ and $c_4$. It follows that solution 1 is good for very small m and n, but solution 2 soon becomes better as m and n grow

memory size; then solution 2 is usually again superior until n gets much larger still. Thus we have a situation where sorting is sometimes a good substitute for searching, and searching is sometimes a good substitute for sorting.

More complicated search problems can often be reduced to the simpler case considered here. For example, suppose that the keys are words that might be

lexicographic order and another in which they are ordered from right to left (as

agree up to half or more of its length with an entry in one of these two files. The search methods of Sections 6.2 and 6.3 can therefore be adapted to find the key that was probably intended.

A related problem has received considerable attention in connection with

when there is a good chance that the name will be misspelled due to poor handwriting or voice transmission. The goal is to transform the argument into some code that tends to bring together all variants of the same name. The following contemporary form of the “Soundex” method, a technique that was originally developed by Margaret K. Odell and Robert C. Russell [see U.S. Patents 1261167 (1918), 1435663 (1922)], has often been used for encoding surnames:

1. Retain the first letter of the name, and drop all occurrences of a, e, h, i, o,

3. If two or more letters with the same code were adjacent in the original name (before step 1), or adjacent except for intervening h’s and w’s, omit all but the first.

4. Convert to the form “letter, digit, digit, digit” by adding trailing zeros (if there are less than three digits), or by dropping rightmost digits (if there


For example, the names Euler, Gauss, Hilbert, Knuth, Lloyd, Lukasiewicz, and

Of course this system will bring together names that are somewhat different, as well as names that are similar; the same seven codes would be obtained for

hand a few related names like Rogers and Rodgers, or Sinclair and St. Clair, or Tchebysheff and Chebyshev, remain separate. But by and large the Soundex code greatly increases the chance of finding a name in one of its disguises. [For further information, see C. P. Bourne and D. F. Ford, JACM 8 (1961), 538-552; Leon Davidson, CACM 5 (1962), 169-171; Federal Population Censuses

When using a scheme like Soundex, we need not give up the assumption that all keys are distinct; we can make lists of all records with equivalent codes, treating each list as a unit.
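A Soundex encoder along the lines of the steps listed earlier might look like the following Python sketch. Two things here are assumptions rather than quotations from the text: the letter-to-digit table is the standard Soundex table (the step that assigns the digits does not survive in the list above), and the function name `soundex` is hypothetical.

```python
# Standard Soundex digit classes (assumed; the digit-assignment step
# is not present in the list above).
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the four-character code 'letter, digit, digit, digit'."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first)  # code of the previous letter, if it has one
    for ch in letters[1:]:
        if ch in "HW":
            continue          # h and w do not separate equal codes
        code = _CODES.get(ch)  # None for vowels and y
        if code is not None and code != prev:
            digits.append(code)
        prev = code            # a vowel resets the run of equal codes
    return (first + "".join(digits) + "000")[:4]
```

With this sketch, Rogers and Rodgers receive different codes, in line with the remark that such pairs remain separate, while spelling variants that differ only in vowels or doubled consonants collapse to one code.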

people often want to consider many different fields of each record as potential keys, with the ability to locate items when only part of the key information is

talent and a French accent; given a large file of baseball statistics, a sportswriter may wish to determine the total number of runs scored by the Chicago White Sox in 1964, during the seventh inning of night games, against left-handed pitchers. Given a large file of data about anything, people like to ask arbitrarily complicated questions. Indeed, we might consider an entire library as a database, and a searcher may want to find everything that has been published about information retrieval. An introduction to the techniques for such secondary key (multi-attribute) retrieval problems appears below in Section 6.5.

Before entering into a detailed study of searching, it may be helpful to put things in historical perspective. During the pre-computer era, many books of logarithm tables, trigonometry tables, etc., were compiled, so that mathematical calculations could be replaced by searching. Eventually these tables were transferred to punched cards, and used for scientific problems in connection with

for searching. With small internal memories, and with nothing but sequential

almost impossible

the 1950s eventually led to the recognition that searching was an interesting

of space in the early machines, programmers were suddenly confronted with


The first surveys of the searching problem were published by A. I. Dumey,

J. Research & Development 1 (1957), 130-146; A. D. Booth, Information and Control 1 (1958), 159-164; A. S. Douglas, Comp. J. 2 (1959), 1-9. More extensive treatments were given later by Kenneth E. Iverson, A Programming

on tree structures were introduced, as we shall see; and research about searching is still actively continuing at the present time.

6.1. SEQUENTIAL SEARCHING

This sequential procedure is the obvious way to search, and it makes a useful starting point for our discussion of searching because many of the more intricate algorithms are based on it. We shall see that sequential searching involves some very interesting ideas, in spite of its simplicity.

The algorithm might be formulated more precisely as follows:

Algorithm S (Sequential search). Given a table of records $R_1, R_2, \ldots, R_N$,


Program S (Sequential search). Assume that $K_i$ appears in location KEY + i, and that the remainder of record $R_i$ appears in location INFO + i. The following

At location SUCCESS, the instruction “LDA INFO+N,1” will now bring the desired information into rA. |

The analysis of this program is straightforward; it shows that the running time of Algorithm S depends on two things,

$K = K_i$, we have $C = i$, $S = 1$; hence the total time is $(5i + 1)u$. On the other hand if the search is unsuccessful, we have $C = N$, $S = 0$, for a total time of $(5N + 3)u$. If every input key occurs with equal probability, the average value of C in a successful search will be

straightforward change makes the algorithm faster, unless the list of records is quite short:

Algorithm Q (Quick sequential search). This algorithm is the same as Algorithm S, except that it assumes the presence of a dummy record $R_{N+1}$ at the end of the file.

Q1. [Initialize.] Set $i \leftarrow 1$, and set $K_{N+1} \leftarrow K$.

Q2. [Compare.] If $K = K_i$, go to Q4.

Q3. [Advance.] Increase i by 1 and return to Q2.

terminates unsuccessfully ($i = N + 1$). |
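The idea of Algorithm Q, planting the argument as a dummy key at the end so that the loop needs only one test, can be sketched in Python as follows. The function name and the 0-based return convention are choices made for this illustration, not part of the algorithm as stated.

```python
def quick_sequential_search(keys, k):
    """Return the 0-based index of k in keys, or None if k is absent."""
    n = len(keys)
    keys.append(k)          # Q1: dummy record holding the argument itself
    try:
        i = 0
        while keys[i] != k:  # Q2/Q3: a single test per iteration
            i += 1
    finally:
        keys.pop()          # restore the caller's table
    # Q4: stopping on the dummy position means the search failed.
    return i if i < n else None
```

The loop cannot run off the end of the list, because the sentinel guarantees a match at position N at the latest.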

Program Q (Quick sequential search). rA ≡ K, rI1 ≡ i − N.

Q1. Initialize.


In terms of the quantities C and S in the analysis of Program S, the running time has decreased to $(4C - 4S + 10)u$; this is an improvement whenever $C \ge 6$ in a successful search, and whenever $N \ge 8$ in an unsuccessful search.

The transition from Algorithm S to Algorithm Q makes use of an important speed-up principle: When an inner loop of a program tests two or more conditions, we should try to reduce the testing to just one condition.

Program Q′ (Quicker sequential search). rA ≡ K, rI1 ≡ i − N.

The inner loop has been duplicated; this avoids about half of the “$i \leftarrow i + 1$” instructions, so it reduces the running time to

$3.5C - 3.5S + 10 + \frac{(C - S) \bmod 2}{2}$

tables are being searched; many existing programs can be improved in this way

D. E. Knuth, Computing Surveys 6 (1974), 266-269.]

are in increasing order:

Algorithm T (Sequential search in ordered table). Given a table of records

this algorithm searches for a given argument K. For convenience and speed, the algorithm assumes that there is a dummy record $R_{N+1}$ whose key value is $K_{N+1} = \infty > K$.

T1. [Initialize.] Set $i \leftarrow 1$.

T2. [Compare.] If $K \le K_i$, go to T4.


T4. [Equality?] If $K = K_i$, the algorithm terminates successfully. Otherwise it terminates unsuccessfully. |
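A sequential search in an ordered table, with an infinite dummy key at the end as Algorithm T assumes, might be sketched like this in Python. The function name is hypothetical, the keys are assumed to be finite numbers in increasing order, and the table is copied only to keep the example free of side effects (a real table would store the dummy key permanently).

```python
import math


def ordered_sequential_search(keys, k):
    """Return the 0-based index of k in the increasing list keys, or None."""
    table = list(keys) + [math.inf]  # dummy key larger than any argument
    i = 0                            # T1
    while k > table[i]:              # T2: stop as soon as k <= table[i]
        i += 1                       # advance to the next key
    # T4: the loop stopped at the first key that is >= k.
    return i if table[i] == k else None
```

An unsuccessful search stops as soon as a key larger than the argument is seen, instead of running to the end of the table.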

If all input keys are equally likely, this algorithm takes essentially the same average time as Algorithm Q, for a successful search. But unsuccessful searches

more quickly.

Each of the algorithms above uses subscripts to denote the table entries. It is convenient to describe the methods in terms of these subscripts, but the same search procedures can be used for tables that have a linked representation, since the data is being traversed sequentially. (See exercises 2, 3, and 4.)

Frequency of access. So far we have been assuming that every argument occurs as often as every other. This is not always a realistic assumption; in a general situation, key $K_j$ will occur with probability $p_j$, where $p_1 + p_2 + \cdots + p_N = 1$. The time required to do a successful search is essentially proportional to the

$C_N = p_1 + 2p_2 + \cdots + Np_N.$ (3)

If we have the option of putting the records into the table in any desired order, this quantity $C_N$ is smallest when

$p_1 \ge p_2 \ge \cdots \ge p_N,$ (4)

that is, when the most frequently used records appear near the beginning.

Let’s look at several probability distributions, in order to see how much of a saving is possible when the records are arranged in the optimal manner specified

, by exercise 7; the average number of comparisons is less than two, for this distribution, if the records appear in the proper order within the table.

$p_1 = Nc,\quad p_2 = (N-1)c,\quad \ldots,\quad p_N = c,\quad \text{where } c = \frac{2}{N(N+1)},$ (6)

as (5). In this case we find


Of course the probability distributions in (5) and (6) are rather artificial, and they may never be a very good approximation to reality. A more typical sequence of probabilities, called “Zipf’s law,” has

(8)

This distribution was popularized by G. K. Zipf, who observed that the nth most common word in natural language text seems to occur with a frequency approximately proportional to 1/n. [The Psycho-Biology of Language (Boston, Mass.:

(Reading, Mass.: Addison-Wesley, 1949).] He observed the same phenomenon in census tables, when metropolitan areas are ranked in order of decreasing population. If Zipf’s law governs the frequency of the keys in a table, we have immediately

$C_N = N/H_N;$ (9)

searching such a file is about $\frac{1}{2}\ln N$ times faster than searching the same file

Cleave, Mechanical Resolution of Linguistic Problems (New York: Academic Press, 1958), 79.]
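The value $N/H_N$ in (9) can be checked numerically with a short Python sketch. It assumes, as the description of Zipf's law suggests, that the kth key has probability proportional to 1/k with the constant chosen so that the probabilities sum to 1; the function names are hypothetical, and exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def zipf_probabilities(n):
    """p_k proportional to 1/k, normalized to sum to 1 (assumed form)."""
    h = harmonic(n)
    return [Fraction(1, k) / h for k in range(1, n + 1)]


def mean_comparisons(p):
    """Average successful-search cost p_1 + 2 p_2 + ... + N p_N."""
    return sum(j * pj for j, pj in enumerate(p, start=1))


n = 100
p = zipf_probabilities(n)
print(float(mean_comparisons(p)), float(Fraction(n) / harmonic(n)))
```

Each term $k \cdot p_k$ equals $1/H_N$, so the sum of N such terms is exactly $N/H_N$; the two printed numbers agree.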

that has commonly been observed in commercial applications [see, for example, W. P. Heising, IBM Systems J. 2 (1963), 114-115]. This rule states that 80 percent of the transactions deal with the most active 20 percent of a file; and the same rule applies in fractal fashion to the top 20 percent, so that 64 percent of the transactions deal with the most active 4 percent, etc. In other words,

One distribution that satisfies this rule exactly whenever n is a multiple of 5 is

since $p_1 + p_2 + \cdots + p_n = cn^{\theta}$ for all n in this case. It is not especially easy to work with the probabilities in (11); we have, however, $n^{\theta} - (n-1)^{\theta} = \theta n^{\theta-1}(1 + O(1/n))$, so there is a simpler distribution that approximately fulfills the 80-20 rule, namely


vary from a uniform distribution to a Zipfian one. Applying (3) to

A study of word frequencies carried out by E. S. Schwartz [see the interesting

slightly negative value of $\theta$ gives a better fit to the data than Zipf’s law (8). In this case the mean value

(-)

is substantially smaller than (9) as $N \to \infty$.

Distributions like (11) and (13) were first studied by Vilfredo Pareto in connection with disparities of personal income and wealth [Cours d’Economie Politique 2 (Lausanne: Rouge, 1897), 304-312]. If $p_k$ is proportional to the wealth of the kth richest individual, the probability that a person’s wealth exceeds or equals x times the wealth of the poorest individual is k/N when

, the stated probability is $x^{-1/(1-\theta)}$; this is now called a Pareto distribution with parameter $1/(1-\theta)$.

Curiously, Pareto didn’t understand his own distribution; he believed that a value of $\theta$ near 0 would correspond to a more egalitarian society than a value near 1! His error was corrected by Corrado Gini [Atti della III Riunione della Societa Italiana per il Progresso delle Scienze (1910), reprinted in his

person to formulate and explain the significance of ratios like the 80-20 law (10). People still tend to misunderstand such distributions; they often speak about a “75-25 law” or a “90-10 law” as if an a-b law makes sense only when a + b = 100, while (12

G. Udny Yule when he studied the increase in biological species as a function of time, assuming various models of evolution [Philos. Trans. B213 (1924), 21-87]. Yule’s distribution applies when $\theta < 2$:

The limiting value $c = 1/H_N$ or $c = 1/N$ is used when $\theta = 0$ or $\theta = 1$.

A “self-organizing” file. These calculations with probabilities are very nice, but in most cases we don’t know what the probabilities are. We could keep a count in each record of how often it has been accessed, reallocating the records on the basis of those counts; the formulas derived above suggest that this procedure

(N-0\>

(16)


so much memory space to the count fields, since we can make better use of that memory by using one of the nonsequential search techniques that are explained later in this chapter.

A simple scheme, which has been in use for many years although its origin

auxiliary count fields: Whenever a record has been successfully located, it

will tend to be located fairly near the beginning of the table, when we need them

with each search being completely independent of previous searches, it can be

self-organizing file tends to the limiting value

$1 + 2\sum_{j=1}^{N}(j-1)p_j = 2C_N - 1$. In fact, $\bar{C}_N$ is always less than $\pi/2$ times the optimal value $C_N$ [Chung, Hajela, and Seymour, J. Comp. Syst. Sci. 36 (1988), 148-157]; this ratio is the best possible constant in general, since it is approached when $p_j$ is proportional to $1/j^2$.

Let us see how well the self-organizing procedure works when the key probabilities obey Zipf’s law (8). We have

by Eqs. 1.2.7-(8) and 1.2.7-(3). This is substantially better than $\frac{1}{2}N$, when N is reasonably large, and it is only about $\ln 4 \approx 1.386$ times as many comparisons

that the self-organizing method works even better than our formulas predict, because successive searches are not independent (small groups of keys tend to occur in bunches).
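The self-organizing idea, in which a record that has just been located is moved to the front of the table, can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, a Python list stands in for the table, and the return value (the number of comparisons made) is a choice made for the example.

```python
def move_to_front_search(table, k):
    """Search table sequentially for k.

    On success, move k to the front and return the number of
    comparisons used; return None if k is not present.
    """
    for i, key in enumerate(table):
        if key == k:
            table.insert(0, table.pop(i))  # relocate the found key
            return i + 1
    return None


table = ["a", "b", "c", "d"]
costs = [move_to_front_search(table, k) for k in "ccdc"]
print(costs, table)
```

Repeated requests for the same few keys quickly become cheap, which is the bunching effect mentioned above.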

This self-organizing scheme was first analyzed by John McCabe [Operations


another interesting scheme, under which each successfully located key that is not already at the beginning of the table is simply interchanged with the preceding key, instead of being moved all the way to the front. He conjectured that the limiting average search time for this method, assuming independent searches, never exceeds (17). Several years later, Ronald L. Rivest proved in fact that the transposition method uses strictly fewer comparisons than the move-to-front

probabilities are equal [CACM 19 (1976), 63-67]. However, convergence to the asymptotic limit is much slower than for the move-to-front heuristic, so move-to-front is better unless the process is prolonged [J. R. Bitner, SICOMP 8 (1979), 82-110]. Moreover, J. L. Bentley, C. C. McGeoch, D. D. Sleator, and R. E. Tarjan have proved that the move-to-front method never makes more than four times the total number of memory accesses made by any algorithm on linear lists, given any sequence of accesses whatever to the data—even if the algorithm

this property [CACM 28 (1985), 202-208, 404-411]. See SODA 8 (1997), 53-62, for an interesting empirical study of more than 40 heuristics for self-organizing

still another twist: Suppose the table we are searching is stored on tape, and the individual records have varying lengths. For example, in an old-fashioned operating system, the “system library tape” was such a file; standard system

were the records on this tape, and most user jobs would start by searching

our previous analysis of Algorithm S inapplicable, since step S3 takes a variable

not the only criterion of interest.

Let $L_i$ be the length of record $R_i$, and let $p_i$ be the probability that this record will be sought. The average running time of the search method will now be approximately proportional to

When $L_1 = L_2 = \cdots = L_N = 1$, this reduces to (3), the case already studied.

of the tape; but this is sometimes a bad idea! For example, assume that the tape contains just two programs, A and B, where A is needed twice as often as B but is four times as long. Thus,

$N = 2,\quad p_A = \tfrac{2}{3},\quad L_A = 4,\quad p_B = \tfrac{1}{3},\quad L_B = 1.$

If we place A first on tape, according to the “logical” principle stated above, the average running time is $\tfrac{2}{3}\cdot 4 + \tfrac{1}{3}\cdot 5 = \tfrac{13}{3}$; but if we use an “illogical” idea, placing B first, the average running time is reduced to $\tfrac{1}{3}\cdot 1 + \tfrac{2}{3}\cdot 5 = \tfrac{11}{3}$.
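The two-program example can be reproduced with a small Python sketch. The cost model is the one used in the arithmetic above: finding a record costs the total length of everything up to and including it. The function name and the dictionary representation are assumptions made for illustration; the brute-force search over all orders is only practical for tiny examples.

```python
from fractions import Fraction
from itertools import permutations


def average_time(order, p, length):
    """Expected tape length read when records are stored in this order."""
    total = Fraction(0)
    elapsed = 0
    for name in order:
        elapsed += length[name]      # tape passed after reading this record
        total += p[name] * elapsed
    return total


p = {"A": Fraction(2, 3), "B": Fraction(1, 3)}
length = {"A": 4, "B": 1}

print(average_time("AB", p, length))   # A first
print(average_time("BA", p, length))   # B first

# Exhaustive search, compared with ordering by decreasing p/L.
best = min(permutations(p), key=lambda o: average_time(o, p, length))
by_ratio = sorted(p, key=lambda r: p[r] / length[r], reverse=True)
print(best, by_ratio)
```

Ordering by decreasing ratio of probability to length places B first, which agrees with the exhaustive search for this example.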


Theorem S. Let $L_i$ and $p_i$ be as defined above. The arrangement of records in the table is optimal if and only if

$\cdots + p_{i+1}(L_1 + \cdots + L_{i-1} + L_{i+1}) + p_i(L_1 + \cdots + L_{i+1}) + \cdots,$

a net change of $p_i L_{i+1} - p_{i+1} L_i$. Therefore if $p_i/L_i < p_{i+1}/L_{i+1}$, such an interchange will improve the average running time, and the given arrangement is not optimal. It follows that (20) holds in any optimal arrangement.

Conversely, assume that (20) holds; we need to prove that the arrangement is optimal. The argument just given shows that the arrangement is “locally optimal” in the sense that adjacent interchanges make no improvement; but there may conceivably be a long, complicated sequence of interchanges that leads to a better “global optimum.” We shall consider two proofs, one that uses computer science and one that uses a mathematical trick.

First proof. Assume that (20) holds. We know that any permutation of the records can be sorted into the order $R_1 R_2 \ldots R_N$ by using a sequence of interchanges of adjacent records. Each of these interchanges replaces $\ldots R_j R_i \ldots$ by $\ldots R_i R_j \ldots$ for some $i < j$, so it decreases the search time by the nonnegative

search time

$p_i(\epsilon) = p_i + \epsilon^i - (\epsilon^1 + \epsilon^2 + \cdots + \epsilon^N)/N,$ (21)

will never have $x_1 p_1(\epsilon) + \cdots + x_N p_N(\epsilon) = y_1 p_1(\epsilon) + \cdots + y_N p_N(\epsilon)$ unless $x_1 = y_1, \ldots, x_N = y_N$; in particular, equality will not hold in (20). Consider now the N! permutations of the records; at least one of them is optimum, and we know that it satisfies (20). But only one permutation satisfies (20) because there are

of records in the table for the probabilities $p_i(\epsilon)$, whenever $\epsilon$ is sufficiently small. By continuity, the same arrangement must also be optimum when $\epsilon$ is set equal


(1956), 59-66. The exercises below contain further results about optimum file arrangements.

EXERCISES

1. [M20] When all the search keys are equally probable, what is the standard deviation of the number of comparisons made in a successful sequential search through a table of N records?

2. [15] Restate the steps of Algorithm S, using linked-memory notation instead of subscript notation. (If P points to a record in the table, assume that KEY(P) is the key, INFO(P) is the associated information, and LINK(P) is a pointer to the next record

3. [16] Write a MIX program for the algorithm of exercise 2. What is the running time of your program, in terms of the quantities C and S in (1)?

4. [17] Does the idea of Algorithm Q carry over from subscript notation to

But are there any small values of C and S for which Program Q′ actually takes more time than Program Q?

6. [20] Add three more instructions to Program Q′, reducing its running time to about $(3.33C + \text{constant})u$.

7. [M20] Evaluate the average number of comparisons, (3), using the “binary” probability distribution (5).

8. [HM22] Find an asymptotic series for as $n \to \infty$, when $r \ne 1$.

9. [HM28] The text observes that the probability distributions given by (11), (13), and (16) are roughly equivalent when $0 < \theta < 1$, and that the mean number of comparisons using (13) is $\frac{\theta}{\theta+1}N + O(N^{1-\theta})$.

) also when the probabilities of (11) are used?

b) What about (16)?

c) How do (11) and (16) compare to (13) when $\theta < 0$?

10. [M20] The best arrangement of records in a sequential table is specified by (4); what is the worst arrangement? Show that the average number of comparisons in the worst arrangement has a simple relation to the average number of comparisons in the

1+j


with probability $p_i$. After the system has been running a long time, show that

Riwillbe themthitemfrom thefrontwithlimiting probabilityPiP(N-i)(m-i),

where thesetof variablesXis(px , ,pi_x,Pi+i,.

Pnn+Pn(n-l)+‘

+PnO=Provethat,consequently,

thefront ofthelist;then evaluateCN=J2iLiPi d

12. [M23] Use (17) to evaluate the average number of comparisons needed to search the self-organizing file when the search keys have the binary probability distribution (5).

13. [M27] Use (17) to evaluate $\bar{C}_N$ for the wedge-shaped probability distribution (6).

14. [M21] Given two sequences $(x_1, x_2, \ldots, x_n)$ and $(y_1, y_2, \ldots, y_n)$ of real numbers, what permutation $a_1 a_2 \ldots a_n$ of the subscripts will make $\sum_i x_i y_{a_i}$ a maximum? What permutation will make it a minimum?

15. [M22] The text shows how to arrange programs optimally on a system library tape, when only one program is being sought. But another set of assumptions is more appropriate for a subroutine library tape, from which we may wish to load various subroutines called for in a user’s program.

For this case let us suppose that subroutine j is desired with probability $P_j$, independently of whether or not other subroutines are desired. Then, for example, the probability that no subroutines at all are needed is $(1-P_1)(1-P_2)\ldots(1-P_N)$; and the probability that the search will end just after loading the jth subroutine is $P_j(1-P_{j+1})\ldots(1-P_N)$. If $L_j$ is the length of subroutine j, the average search time will therefore be essentially proportional to

$L_1 P_1(1-P_2)\ldots(1-P_N) + (L_1+L_2)P_2(1-P_3)\ldots(1-P_N) + \cdots + (L_1+\cdots+L_N)P_N.$

assumptions?

16. [M22] (H. Riesel.) We often need to test whether or not n given conditions are all simultaneously true. (For example, we may want to test whether both $x > 0$ and $y < z^2$, and it is not immediately clear which condition should be tested first.) Suppose that the testing of condition j costs $T_j$ units of time, and that the condition will be true with probability $p_j$, independent of the outcomes of all the other conditions. In


Fig. 2. An “organ-pipe arrangement” of probabilities minimizes the average seek time in a catenated search.

17. [M23] (J. R. Jackson.) Suppose you have to do n jobs; the jth job takes $T_j$ units of time, and it has a deadline $D_j$. In other words, the jth job is supposed to be finished after at most $D_j$ units of time have elapsed. What schedule $a_1 a_2 \ldots a_n$ for processing the jobs will minimize the maximum tardiness, namely

18. [M30] (Catenated search.) Suppose that N records are located in a linear array $R_1 \ldots R_N$, with probability $p_j$ that record $R_j$ will be sought. A search process is called “catenated” if each search begins where the last one left off. If consecutive searches are independent, the average time required will be $\sum_{1\le i,j\le N} p_i p_j d(i,j)$, where $d(i,j)$ represents the amount of time to do a search that starts at position i and ends at position j. This model can be applied, for example, to disk file seek time, if $d(i,j)$ is the time needed to travel from cylinder i to cylinder j.

The object of this exercise is to characterize the optimum placement of records for catenated searches, whenever $d(i,j)$ is an increasing function of $|i-j|$, that is, whenever we have $d(i,j) = d_{|i-j|}$ for $d_1 < d_2 < \cdots < d_{N-1}$. (The value of $d_0$ is irrelevant.) Prove that in this case the records are optimally placed, among all N! permutations, if and only if either $p_1 \le p_N \le p_2 \le p_{N-1} \le \cdots \le p_{\lfloor N/2\rfloor+1}$ or $p_N \le p_1 \le p_{N-1} \le p_2 \le \cdots \le p_{\lceil N/2\rceil}$. (Thus, an “organ-pipe arrangement” of probabilities is best, as shown in Fig. 2.) Hint: Consider any arrangement where the respective probabilities are $q_1 q_2 \ldots q_k\, s\, r_k \ldots r_2 r_1\, t_1 \ldots t_m$, for some $m \ge 0$ and $k > 0$; $N = 2k + m + 1$. Show that the rearrangement $q'_1 q'_2 \ldots q'_k\, s\, r'_k \ldots r'_2 r'_1\, t_1 \ldots t_m$ is better, where $q'_i = \min(q_i, r_i)$ and $r'_i = \max(q_i, r_i)$, except when $q'_i = q_i$ and $r'_i = r_i$ for all i or when $q'_i = r_i$ and $r'_i = q_i$ and $t_j = 0$ for all i and j. The same holds true when s is not present and $N = 2k + m$.

19. [M20] Continuing exercise 18, what are the optimal arrangements for catenated searches when the function $d(i,j)$ has the property that $d(i,j) + d(j,i) = c$ for all $i \ne j$? [This situation occurs, for example, on tapes without read-backwards capability,

$d(i,j) = a + b(L_{i+1} + \cdots + L_j)$ and $d(j,i) = a + b(L_{j+1} + \cdots + L_N) + r + b(L_1 + \cdots + L_i)$, where r is the rewind time.]

20. [M28] Continuing exercise 18, what are the optimal arrangements for catenated searches when the function $d(i,j)$ is $\min(d_{|i-j|}, d_{n-|i-j|})$, for $d_1 < d_2 < \cdots$? [This situation occurs, for example, in a two-way linked circular list, or in a two-way shift-


21. [M28] Consider an n-dimensional cube whose vertices have coordinates $(d_1, \ldots, d_n)$ with $d_j = 0$ or 1; two vertices are called adjacent if they differ in exactly one coordinate. Suppose that a set of $2^n$ numbers $x_0 < x_1 < \cdots < x_{2^n-1}$ is to be assigned to the $2^n$ vertices in such a way that $\sum_{i,j} |x_i - x_j|$ is minimized, where the sum is over all i and j such that $x_i$ and $x_j$ have been assigned to adjacent vertices. Prove that this minimum will be achieved if, for all j, $x_j$ is assigned to the vertex whose coordinates are the binary representation of j.

22. [20] Suppose you want to search a large file, not for equality but to find the 1000 records that are closest to a given key, in the sense that these 1000 records have the smallest values of $d(K_j, K)$ for some given distance function d. What data structure is most appropriate for such a sequential search?

Nothing's so hard, but search will find it out.


6.2. SEARCHING BY COMPARISON OF KEYS

In this section we shall discuss search methods that are based on a linear ordering of the keys, such as alphabetic order or numeric order. After comparing the given argument K to a key $K_i$ in the table, the search continues in three

, or $K > K_i$. The sequential search methods of Section 6.1 were essentially limited to a two-way decision ($K = K_i$ versus $K \ne K_i$), but if we free ourselves from the restriction of sequential access we are able to make effective use of an order relation.

6.2.1. Searching an Ordered Table

told you to find the name of the person whose number is 795-6841? There is no better way to tackle this problem than to use the sequential methods of Section 6.1. (Well, you might try to dial the number and talk to the person who answers; or you might know how to obtain a special directory that is sorted by

by the party’s name, instead of by number, although the telephone directory contains all the information necessary in both cases. When a large file must be searched, sequential scanning is almost out of the question, but an ordering relation simplifies the job enormously.

With so many sorting methods at our disposal (Chapter 5), we will have little difficulty rearranging a file into order so that it may be searched conveniently. Of course, if we need to search the table only once, a sequential search would be faster than to do a complete sort of the file; but if we need to make repeated searches in the same file, we are better off having it in order. Therefore in this section we shall concentrate on methods that are appropriate for searching a table whose keys satisfy

$K_1 < K_2 < \cdots < K_N,$

K to $K_i$ in such a table, we have either

• $K < K_i$ [$R_i, R_{i+1}, \ldots, R_N$ are eliminated from consideration];

or • $K = K_i$ [the search is done];

In each of these three cases, substantial progress has been made, unless i is near one of the ends of the table; this is why the ordering leads to an efficient algorithm.

half of the table should be searched next, and the same procedure can be used again, comparing K to the middle key of the selected half, etc. After at most


Fig. 3. Binary search.

that it is not present. This procedure is sometimes known as “logarithmic search” or “bisection,” but it is most commonly called binary search.

the algorithm makes use of two pointers, l and u, that indicate the current lower

Algorithm B (Binary search). Given a table of records $R_1, R_2, \ldots, R_N$ whose keys are in increasing order $K_1 < K_2 < \cdots < K_N$, this algorithm searches for a given argument K.

B1. [Initialize.] Set $l \leftarrow 1$, $u \leftarrow N$.

B2. [Get midpoint.] (At this point we know that if K is in the table, it satisfies $K_l \le K \le K_u$. A more precise statement of the situation appears in exercise 1 below.) If $u < l$, the algorithm terminates unsuccessfully. Otherwise, set $i \leftarrow \lfloor (l+u)/2 \rfloor$, the approximate midpoint of the relevant table area.

B3. [Compare.] If $K < K_i$, go to B4; if $K > K_i$, go to B5; and if $K = K_i$, the algorithm terminates successfully.

B4. [Adjust u.] Set $u \leftarrow i - 1$ and return to B2.

B5. [Adjust l.] Set $l \leftarrow i + 1$ and return to B2. |
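Steps B1 through B5 translate almost line for line into Python. The sketch below uses 0-based indices instead of the 1-based subscripts of the algorithm, and the function name is a choice made for this illustration; the sample keys are the sixteen values that appear in the binary search example of this section.

```python
def binary_search(keys, k):
    """Return the 0-based index of k in the increasing list keys, or None."""
    lo, hi = 0, len(keys) - 1      # B1: the pointers l and u
    while lo <= hi:                # B2: fail when u < l
        i = (lo + hi) // 2         # B2: approximate midpoint
        if k < keys[i]:            # B3: three-way comparison
            hi = i - 1             # B4: adjust u
        elif k > keys[i]:
            lo = i + 1             # B5: adjust l
        else:
            return i
    return None


keys = [61, 87, 154, 170, 275, 426, 503, 509,
        512, 612, 653, 677, 703, 765, 897, 908]
print(binary_search(keys, 653), binary_search(keys, 400))
```

Each pass through the loop discards at least half of the remaining interval, so the number of probes grows only logarithmically with the table size.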

Figure 4 illustrates two cases of this binary search algorithm: first to search


061 087 154 170 [275 426 503] 509 512 612 653 677 703 765 897 908

061 087 154 170 [275] 426 503 509 512 612 653 677 703 765 897 908

061 087 154 170 275] [426 503 509 512 612 653 677 703 765 897 908

Fig. 4. Examples of binary search.

Program B (Binary search). As in the programs of Section 6.1, we assume here that $K_i$ is a full-word key appearing in location KEY + i. The following code uses rI1 ≡ l, rI2 ≡ u, rI3 ≡ i.

right binary 1,” which is legitimate only on binary versions of MIX; for general byte size, this instruction should be replaced by “MUL =1//2+1=”, increasing the running time to $(26C - 18S + 20)u$.


Fig. 5. A comparison tree that corresponds to binary search when N = 16.

represented by the root node (8) in the figure. Then if $K < K_8$, the algorithm follows the left subtree, comparing K to $K_4$; similarly if $K > K_8$, the right subtree is used. An unsuccessful search will lead to one of the external square nodes numbered [0] through [16]; for example, we reach node [6] if and only if

In an analogous fashion, any algorithm for searching an ordered table of length N by means of comparisons can be represented as an N-node binary tree

valid method for searching an ordered table; we simply label the nodes

0 0 0 © 0 B ® ® (i)

If the search argument input to Algorithm B is $K_{10}$, the algorithm makes the comparisons $K > K_8$, $K < K_{12}$, $K = K_{10}$. This corresponds to the path from the root to (10) in Fig. 5. Similarly, the behavior of Algorithm B on other keys corresponds to the other paths leading from the root of the tree. The method of constructing the binary trees corresponding to Algorithm B therefore makes it easy to prove the following result by induction on N:

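Theorem B. If $2^{k-1} \le N < 2^k$, a successful search using Algorithm B requires (min 1, max k) comparisons. If $N = 2^k - 1$, an unsuccessful search requires k comparisons; and if $2^{k-1} \le N < 2^k - 1$, an unsuccessful search requires either $k - 1$ or k comparisons. |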

equally likely argument; and let C'_N be the average number of comparisons in an unsuccessful search, assuming that each of the N + 1 intervals between and outside the extreme values of the keys is equally likely. Then we have

\[ C_N = 1 + \frac{\text{internal path length of tree}}{N}, \qquad C'_N = \frac{\text{external path length of tree}}{N+1}. \]

This formula, which is due to T. N. Hibbard [JACM 9 (1962), 16-17], holds for all search methods that correspond to binary trees; in other words, it holds

successful-search comparisons can also be expressed in terms of the corresponding variance for unsuccessful searches (see exercise 25).

From the formulas above we can see that the “best” way to search by

trees with N internal nodes. Fortunately it can be proved that Algorithm B is

binary tree has minimum path length if and only if its external nodes all occur

tree corresponding to Algorithm B is

\[ (N+1)(\lfloor \lg N \rfloor + 2) - 2^{\lfloor \lg N \rfloor + 1}. \]

(See Eq. 5.3.1-(34).) From this formula and (2) we can compute the exact average number of comparisons, assuming that all search arguments are equally probable.
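The path-length formula and Theorem B can be checked numerically. The sketch below counts the three-way comparisons that the binary search of Algorithm B makes on the keys 2, 4, ..., 2N (an arbitrary choice of test keys, assumed here so that the odd numbers represent the N + 1 gaps), sums them over all unsuccessful searches to obtain the external path length, and compares the result with the closed form. The function names are hypothetical.

```python
def comparisons(n, k):
    """Three-way comparisons made by Algorithm B when searching for k
    among the keys 2, 4, ..., 2n (so key number i has the value 2*i)."""
    l, u, count = 1, n, 0
    while u >= l:
        i = (l + u) // 2
        count += 1
        if k < 2 * i:
            u = i - 1
        elif k > 2 * i:
            l = i + 1
        else:
            break
    return count


def external_path_length(n):
    """Sum of comparison counts over the n + 1 unsuccessful outcomes."""
    return sum(comparisons(n, 2 * j + 1) for j in range(n + 1))


def closed_form(n):
    """(N + 1)(floor(lg N) + 2) - 2^(floor(lg N) + 1)."""
    lg = n.bit_length() - 1              # floor(lg n) for n >= 1
    return (n + 1) * (lg + 2) - 2 ** (lg + 1)
```

For N = 16 both computations give 70, so an unsuccessful search averages 70/17 comparisons under the equal-likelihood assumption.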


414 SEARCHING 6.2.1

search method based on comparisons can do better than this. The average running time of Program B is approximately

(18 lg N − 16)u for a successful search,
(18 lg N + 12)u for an unsuccessful search,    (5)

search, it is tempting to use only two, namely the current position i and its rate of change, δ; after each unequal comparison, we could then set i ← i ± δ and δ ← δ/2 (approximately). It is possible to do this, but only if extreme care is paid to the details, as in the following algorithm. Simpler approaches are

Algorithm U (Uniform binary search). Given a table of records R_1, R_2, ..., R_N whose keys are in increasing order K_1 < K_2 < ... < K_N, this algorithm searches for a given argument K. If N is even, the algorithm will sometimes refer to a dummy key K_0 that should be set to −∞ (or any value less than K). We assume that N ≥ 1.

U1. [Initialize.] Set i ← ⌈N/2⌉, m ← ⌊N/2⌋.

U2. [Compare.] If K < K_i, go to U3; if K > K_i, go to U4; and if K = K_i, the algorithm terminates successfully.

U3. [Decrease i.] (We have pinpointed the search to an interval that contains either m or m − 1 records; i points just to the right of this interval.) If m = 0, the algorithm terminates unsuccessfully. Otherwise set i ← i − ⌈m/2⌉; then set m ← ⌊m/2⌋ and return to U2.

U4. [Increase i.] (We have pinpointed the search to an interval that contains either m or m − 1 records; i points just to the left of this interval.) If m = 0, the algorithm terminates unsuccessfully. Otherwise set i ← i + ⌈m/2⌉; then set m ← ⌊m/2⌋ and return to U2. |
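Steps U1 through U4 translate almost line for line into the following Python sketch. It assumes the same 1-based layout as before, supplies the dummy key K_0 = −∞ internally, and returns 0 on failure; those conventions and the function name are assumptions of the sketch rather than part of the algorithm.

```python
import math


def uniform_binary_search(keys, k):
    """Algorithm U on keys[1..N] (ascending); keys[0] is ignored.

    Returns i with keys[i] == k, or 0 if k is absent (a convention
    chosen for this sketch). N >= 1 is assumed.
    """
    n = len(keys) - 1
    table = [-math.inf] + list(keys[1:])   # K_0 = -infinity, as the text asks
    i, m = (n + 1) // 2, n // 2            # U1: ceil(N/2), floor(N/2)
    while True:
        if k == table[i]:                  # U2: successful termination
            return i
        if m == 0:                         # U3/U4: nothing left to examine
            return 0
        step = (m + 1) // 2                # ceil(m/2)
        i = i - step if k < table[i] else i + step
        m //= 2                            # floor(m/2)
```

For N = 10 the probes start at position 5 and then move by 3, 1, 1, which matches the increments that the comparison tree for this case would use.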

Figure 6 shows the corresponding binary tree for the search, when N = 10. In an unsuccessful search, the algorithm may make a redundant comparison just before termination; those nodes are shaded in the figure. We may call the search process uniform because the difference between the number of a node on level l

on level l.

The theory underlying Algorithm U can be understood as follows: Suppose that we have an interval of length n − 1 to search; a comparison with the middle element (for n even) or with one of the two middle elements (for n odd) leaves us with two intervals of lengths ⌊n/2⌋ − 1 and ⌈n/2⌉ − 1. After repeating this process k times, we obtain 2^k intervals, of which the smallest has length ⌊n/2^k⌋ − 1 and


6.2.1 SEARCHING AN ORDERED TABLE 415

Fig. 6. The comparison tree for a “uniform” binary search, when N = 10.

“middle” element, without keeping track of the exact lengths.

The principal advantage of Algorithm U is that we need not maintain the value of m at all; we need only refer to a short table of the various δ to use at

Algorithm C (Uniform binary search). This algorithm is just like Algorithm U, but it uses an auxiliary table in place of the calculations involving m. The table entries are

C1. [Initialize.] Set i ← DELTA[1], j ← 2.

C2. [Compare.] If K < K_i, go to C3; if K > K_i, go to C4; and if K = K_i, the algorithm terminates successfully.

C3. [Decrease i.] If DELTA[j] = 0, the algorithm terminates unsuccessfully. Otherwise, set i ← i − DELTA[j], j ← j + 1, and go to C2.

C4. [Increase i.] If DELTA[j] = 0, the algorithm terminates unsuccessfully. Otherwise, set i ← i + DELTA[j], j ← j + 1, and go to C2. |

Exercise 8 proves that this algorithm refers to the artificial key K_0 = −∞ only when N is even.
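A table-driven version along the lines of steps C1 through C4 might look as follows in Python. The definition of the table entries is not reproduced in the text above, so this sketch assumes DELTA[j] = ⌊(N + 2^{j−1})/2^j⌋ for 1 ≤ j ≤ ⌊lg N⌋ + 2, which yields the same increments that Algorithm U computes from m; the function names, the 1-based layout, and the 0 return value are likewise assumptions.

```python
import math


def make_delta(n):
    """Assumed table: DELTA[j] = floor((n + 2^(j-1)) / 2^j) for
    1 <= j <= floor(lg n) + 2; DELTA[0] is an unused placeholder."""
    last = n.bit_length() + 1              # floor(lg n) + 2 for n >= 1
    return [None] + [(n + (1 << (j - 1))) >> j for j in range(1, last + 1)]


def uniform_search_with_table(keys, k, delta):
    """Steps C1-C4 on keys[1..N]; returns the index found, or 0."""
    table = [-math.inf] + list(keys[1:])   # artificial key K_0 = -infinity
    i, j = delta[1], 2                     # C1
    while True:
        if k == table[i]:                  # C2
            return i
        if delta[j] == 0:                  # C3/C4: unsuccessful termination
            return 0
        i = i - delta[j] if k < table[i] else i + delta[j]
        j += 1
```

Under the assumed definition, `make_delta(10)` yields the entries 5, 3, 1, 1, 0, and the table can be built once and reused for every search of the same file.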

Program C (Uniform binary search). This program does the same job as


In a successful search, this algorithm corresponds to a binary tree with the same internal path length as the tree of Algorithm B, so the average number of comparisons C_N is the same as before. In an unsuccessful search, Algorithm C always makes exactly ⌊lg N⌋ + 1 comparisons. The total running time of Program C is not quite symmetrical between left and right branches, since C1 is weighted more heavily than C2, but exercise 11 shows that we have K < K_i roughly as often as K > K_i; hence Program C takes approximately

(8.5⌊lg N⌋ + 12)u for an unsuccessful search.    (7)

be still faster on some computers, because it is uniform after the first step, and it requires no table. The first step is to compare K with K_i, where i = 2^k, k = ⌊lg N⌋. If K < K_i, we use a uniform search with the δ's equal to 2^{k−1}, 2^{k−2}, ..., 1, 0. On the other hand, if K > K_i we reset i to i' = N + 1 − 2^l,

K > K_{i'}, using a uniform search with the δ's equal to 2^{l−1}, 2^{l−2},

algorithms, it never makes more than ⌊lg N⌋ + 1 comparisons; hence it makes

in spite of the fact that it occasionally goes through several redundant steps in


Still another modification of binary search, which increases the speed of all

also exercise 24, for a method that is faster yet.

occurs in searching, where Fibonacci numbers provide us with an alternative to binary search. The resulting method is preferable on some computers, because it involves only addition and subtraction, not division by 2. The procedure we are

called “Fibonacci search,” which is used to locate the maximum of a unimodal function [see Fibonacci Quarterly 4 (1966), 265-269]; the similarity of names has led to some confusion.

The Fibonaccian search technique looks very mysterious at first glance, if we simply take the program and try to explain what is happening; it seems to

tree is displayed. Therefore we shall begin our study of the method by looking at Fibonacci trees.

Figure 8 shows the Fibonacci tree of order 6. It looks somewhat more like a real-life shrub than the other trees we have been considering, perhaps because many natural processes satisfy a Fibonacci law. In general, the Fibonacci tree of order k has F_{k+1} − 1 internal (circular) nodes and F_{k+1} external (square) nodes, and it is constructed as follows:

If k = 0 or k = 1, the tree is simply [0].

If k ≥ 2, the root is (F_k); the left subtree is the Fibonacci tree of order k − 1; and the right subtree is the Fibonacci tree of order k − 2 with all numbers increased by F_k.


is a Fibonacci number. For example, 5 = 8 − F_4 and 11 = 8 + F_4 in Fig. 8. When the difference is F_j, the corresponding Fibonacci difference for the next

3 = 5 − F_3 while 10 = 11 − F_2.

recognizing the external nodes, we arrive at the following method:

Algorithm F (Fibonaccian search). Given a table of records R_1, R_2, ..., R_N whose keys are in increasing order K_1 < K_2 < ... < K_N, this algorithm searches for a given argument K.

For convenience in description, we assume that N + 1 is a perfect Fibonacci number, F_{k+1}.

suitable initialization is provided (see exercise 14).

F1. [Initialize.] Set i ← F_k, p ← F_{k−1}, q ← F_{k−2}. (Throughout the algorithm, p and q will be consecutive Fibonacci numbers.)

F2. [Compare.] If K < K_i, go to step F3; if K > K_i, go to F4; and if K = K_i, the algorithm terminates successfully.

F3. [Decrease i.] If q = 0, the algorithm terminates unsuccessfully. Otherwise set i ← i − q, and set (p, q) ← (q, p − q); then return to F2.

F4. [Increase i.] If p = 1, the algorithm terminates unsuccessfully. Otherwise set i ← i + q, p ← p − q, then q ← q − p, and return to F2. |
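Steps F1 through F4 use only addition and subtraction, as the following Python sketch shows. It is restricted to the case treated in the description, namely N + 1 = F_{k+1}, and raises an error otherwise; the function name, the 1-based layout, and the 0 return value are assumptions made for illustration.

```python
def fibonaccian_search(keys, k):
    """Algorithm F on keys[1..N] (ascending, keys[0] unused), N >= 1.

    Only handles the case where N + 1 is a Fibonacci number, as in the
    description above. Returns the index found, or 0 if k is absent.
    """
    n = len(keys) - 1
    fib = [0, 1]                               # F_0, F_1, ...
    while fib[-1] < n + 1:
        fib.append(fib[-1] + fib[-2])
    if fib[-1] != n + 1:
        raise ValueError("this sketch needs N + 1 to be a Fibonacci number")
    i, p, q = fib[-2], fib[-3], fib[-4]        # F1: F_k, F_(k-1), F_(k-2)
    while True:
        if k == keys[i]:                       # F2
            return i
        if k < keys[i]:                        # F3: decrease i
            if q == 0:
                return 0
            i, p, q = i - q, q, p - q
        else:                                  # F4: increase i
            if p == 1:
                return 0
            i, p = i + q, p - q
            q = q - p                          # uses the new value of p
```

For N = 12 the first probe is at position F_6 = 8, and the next is at 5 or 11, in agreement with the differences 5 = 8 − F_4 and 11 = 8 + F_4 noted for Fig. 8.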

The following MIX implementation gains speed by making two copies of the inner loop, one in which p is in rI2 and q in rI3, and one in which the registers are reversed; this simplifies step F3. In fact, the program actually keeps p − 1 and q − 1 in the registers, instead of p and q, in order to simplify the test “p = 1?” in step F4.

Program F (Fibonaccian search). We follow the previous conventions, with rA ≡ K, rI1 ≡ i, (rI2 or rI3) ≡ p − 1, (rI3 or rI2) ≡ q − 1.

14  F3A  DEC1  1,3    C1    F3. Decrease i. i ← i − q.

15       DEC2  1,3    C1    p ← p − q.


(Lines 18-29 are parallel to 06-17.)

The running time of this program is analyzed in exercise 18. Figure 8 shows, and the analysis proves, that a left branch is taken somewhat more often than a right branch. Let C, C1, and (C2 − S) be the respective number of times steps F2, F3, and F4 are performed. Then we have

\[ C = (\text{ave } \phi k/\sqrt{5} + O(1),\ \max\ k-1), \]

\[ C2 - S = (\text{ave } \phi^{-1} k/\sqrt{5} + O(1),\ \max\ \lfloor k/2 \rfloor). \]

interval into two parts, with the left part about φ times as large as the right). The total average running time of Program F therefore comes to approximately

\[ \tfrac{1}{5}\bigl((18+4\phi)k + 31 - 26\phi\bigr)u \approx (7.050 \lg N + 1.08)u \tag{9} \]

for a successful search, plus (9 − 3φ)u ≈ 4.15u for an unsuccessful search. This is

is slightly slower.

people actually carry out a search. Sometimes everyday life provides us with clues that lead to good algorithms.

begin by looking first at the middle page, then looking at the 1/4 or 3/4 point, etc., as a binary search. It’s even less likely that you use a Fibonaccian search!

front of the dictionary. In fact, many dictionaries have thumb indexes that show the starting page or the middle page for the words beginning with a fixed letter.

speed up the search; such algorithms are explored in Section 6.3.

Yet even after the initial point of search has been found, your actions still

word is alphabetically much greater than the words on the page being examined,


This is quite different from the algorithms above, which make no distinction

Such considerations suggest an algorithm that might be called interpolation search: When we know that K lies between K_l and K_u, we can choose the next

that the keys are numeric and that they increase in a roughly constant manner throughout the interval.

Interpolation search is asymptotically superior to binary search. One step of binary search essentially reduces the amount of uncertainty from n to ½n, while one step of interpolation search essentially reduces it to √n, when the keys in the table are randomly distributed. Hence interpolation search takes about lg lg N steps, on the average, to reduce the uncertainty from N to 2. (See exercise 22.)
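One way to realize the idea is sketched below in Python. The rule for choosing the next probe is not spelled out in the text above, so this sketch assumes plain linear interpolation: the probe is placed a fraction (K − K_l)/(K_u − K_l) of the way from l to u. Integer keys, the function name, the 1-based layout, and the 0 return value are all assumptions of the sketch.

```python
def interpolation_search(keys, k):
    """Search ascending integer keys[1..N] (keys[0] unused) for k.

    The probe position is chosen by linear interpolation between the
    current end keys, which is an assumption of this sketch.
    Returns the index found, or 0 if k is absent.
    """
    l, u = 1, len(keys) - 1
    while l <= u and keys[l] <= k <= keys[u]:
        if keys[u] == keys[l]:
            i = l                              # avoid division by zero
        else:
            i = l + (k - keys[l]) * (u - l) // (keys[u] - keys[l])
        if k < keys[i]:
            u = i - 1
        elif k > keys[i]:
            l = i + 1
        else:
            return i
    return 0


# The sixteen keys shown in Fig. 4.
FIG4 = [None, 61, 87, 154, 170, 275, 426, 503, 509,
        512, 612, 653, 677, 703, 765, 897, 908]
```

The guard `keys[l] <= k <= keys[u]` keeps the interpolated position inside the current range, so every iteration shrinks the interval and the loop terminates.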

does not decrease the number of comparisons enough to compensate for the extra computing time involved, unless the table is rather large. Typical files aren’t sufficiently random, and the difference between lg lg N and lg N is not substantial unless N exceeds, say, 2^16 = 65,536. Interpolation is most successful in the early stages of searching a large possibly external file; after the range has

dictionary lookup by hand is essentially an external, not an internal, search. We shall discuss external searching later.)

that was sorted into order to facilitate searching is the remarkable Babylonian reciprocal table of Inakibit-Anu, dating from about 200 B.C. This clay tablet contains more than 100 pairs of values, which appear to be the beginning of a list of approximately 500 multiple-precision sexagesimal numbers and their reciprocals, sorted into lexicographic order. For example, the list included the following sequence of entries:

puter Science (Cambridge Univ. Press, 1996), Chapter 11, for further details.]

It is fairly natural to sort numerical values into order, but an order relation

sequence for individual letters was present already in the most ancient

alphabetic sequence, the first verse starting with aleph, the second with beth,

was used by Semitic and Greek peoples to denote numerals; for example, α, β, γ


The use of alphabetic order for entire words seems to be a much later invention; it is something we might think is obvious, yet it has to be taught to children, and at some point in history it was necessary to teach it to adults. Several lists from about 300 B.C. have been found on the Aegean Islands, giving

but only by the first letter, thus representing only the first pass of a left-to-right radix sort. Some Greek papyri from the years A.D. 134-135 contain fragments of ledgers that show the names of taxpayers alphabetized by the first two letters. Apollonius Sophista used alphabetic order on the first two letters, and often on subsequent letters, in his lengthy concordance of Homer’s poetry

notably Galen’s Hippocratic Glosses (c. 200), but they are very rare. Words were arranged by their first letter only in the Etymologiarum of St. Isidorus (c. 630,

word. The latter two works were perhaps the largest nonnumerical files of data to be compiled during the Middle Ages.

description of true alphabetical order. In his preface, Giovanni explained that

amo precedes bibo

abeo precedes adeo

polisintheton precedes polissenus

(thereby giving examples of situations in which the ordering is determined by the

effort was required to devise these rules. “I beg of you, therefore, good reader, do not scorn this great labor of mine and this order as something worthless.”

A detailed study of the development of alphabetic order, up to the time printing was invented, has been made by Lloyd W. Daly [Collection Latomus 90 (1967), 100 pages]. He found some interesting old manuscripts that were evidently used as worksheets while sorting words by their first letters (see pages

(London, 1604), contains the following instructions:

Nowe if the word, which thou art desirous to finde, beginne with (a) then looke in the beginning of this Table, but if with (v) looke towards the end. Againe, if thy word beginne with (ca) looke in the beginning of the letter (c) but if with (cu) then looke toward the end of that letter. And so of all the rest. &c.

his dictionary; numerous misplaced words appear on the first few pages, but the


Techniques for the Design of Electronic Digital Computers, edited by G. W. Patterson, 1 (1946), 9.7-9.8; 3 (1946), 22.8-22.9]. The method became well known

176 (1955), 565; A. I. Dumey, Computers and Automation 5 (December 1956), 7,

(February 1958), 1-3.]

D. H. Lehmer [Proc. Symp. Appl. Math. 10 (1960), 180-181] was apparently

step was taken by H. Bottenbruch [JACM 9 (1962), 214], who presented an interesting variation of Algorithm B that avoids a separate test for equality until the very end: Using

i ← ⌈(l + u)/2⌉

instead of i ← ⌊(l + u)/2⌋ in step B2, he set l ← i whenever K ≥ K_i; then u − l decreases at every step. Eventually, when l = u, we have K_l ≤ K < K_{l+1},

comparison. (He assumed that K ≥ K_1 initially.) This idea speeds up the inner

of the algorithms we have discussed in this section; but a successful search will require about one more iteration, on the average, because of (2). Since the inner

and a faster loop does not save time unless n is extremely large. (See exercise 23.) On the other hand Bottenbruch’s algorithm will find the rightmost occurrence of a given key when the table contains duplicates, and this property is occasionally important.
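The variation just described can be sketched in Python as follows. The sketch assumes that u ← i − 1 is used when K < K_i (the counterpart of step B4), and it handles the case K < K_1 with an explicit test instead of assuming it away; the function name, the 1-based layout, and the 0 return value are illustrative choices.

```python
def rightmost_search(keys, k):
    """Two-way-branching binary search on keys[1..N] (nondecreasing,
    keys[0] unused), with a single equality test at the very end.

    Returns the largest i with keys[i] == k, or 0 if k is absent.
    """
    n = len(keys) - 1
    if n == 0 or k < keys[1]:            # the text assumes K >= K_1 initially
        return 0
    l, u = 1, n
    while l < u:
        i = (l + u + 1) // 2             # ceil((l + u) / 2)
        if k >= keys[i]:
            l = i                        # keep K_l <= K
        else:
            u = i - 1                    # assumed adjustment, as in step B4
    return l if keys[l] == k else 0      # the one final equality comparison
```

On a table with duplicates such as 1, 3, 3, 3, 5, 7, a search for 3 ends at position 4, the rightmost occurrence.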

K. E. Iverson [A Programming Language (Wiley, 1962), 141] gave the procedure of Algorithm B, but without considering the possibility of an unsuccessful search. D. E. Knuth [CACM 6 (1963), 556-558] presented Algorithm B as

search, Algorithm C, was suggested to the author by A. K. Chandra of Stanford University in 1971.

Fibonaccian searching was invented by David E. Ferguson [CACM 3 (1960),

A Fibonacci tree without labels was also exhibited as a curiosity in the first edition of Hugo Steinhaus’s popular book Mathematical Snapshots (New York: Stechert, 1938), page 28; he drew it upside down and made it look like a real

Interpolation searching was suggested by W. W. Peterson [IBM J. Res. & Devel. 1 (1957), 131-132]. A correct analysis of its average behavior was not


1. [21] Prove that if u < l in step B2 of the binary search, we have u = l − 1 and

these artificial keys are never really used by the algorithm so they need not be present in the actual table.)

(a) changed step B5 to “l ← i” instead of “l ← i + 1”? (b) changed step B4 to “u ← i” instead of “u ← i − 1”? (c) made both of these changes?

3. [15] What searching method corresponds to the tree

What is the average number of comparisons made in a successful search? unsuccessful search?

has length less than some judiciously chosen value. Write an efficient MIX program for such a search and determine the best changeover value.

7.

a) both i and m are set equal to ⌊N/2⌋?

b) both i and m are set equal to ⌈N/2⌉?

[Hint: Suppose the first step were “Set i ← 0, m ← N (or N + 1), go to U4.”]

8. [M20] Let δ_j = DELTA[j] be the jth increment in Algorithm C, as defined in (6).

b) What are the minimum and maximum values of i that can occur in step C2?

9. [20] Is there any value of N > 1 for which Algorithms B and C are exactly equivalent, in the sense that they will both perform the same sequence of comparisons for all search arguments?

10. [21] Explain how to write a MIX program for Algorithm C containing approximately 7 lg N instructions and having a running time of about 4.5 lg N units.

11. [M26] Find exact formulas for the average values of C1, C2, and A in the frequency analysis of Program C, as a function of N and S.

12. [20] Draw the binary search tree corresponding to Shar’s method when N = 12.

13. [M24] Tabulate the average number of comparisons made by Shar’s method, for 1 ≤ N ≤ 16, considering both successful and unsuccessful searches.

14. [21] Explain how to extend Algorithm F so that it will apply for all N ≥ 1.

15. [M19] For what values of k does the Fibonacci tree of order k define an optimal


16. [21] Figure 9 shows the lineal chart of the rabbits in Fibonacci’s original rabbit problem (see Section 1.2.8). Is there a simple relationship between this and the Fibonacci tree discussed in the text?

where r ≥ 1, a_j ≥ a_{j+1} + 2 for 1 ≤ j < r, and a_r ≥ 2. Prove that in the Fibonacci tree of order k, the path from the root to node (n) has length k + 1 − r − a_r.

18. [M30] Find exact formulas for the average values of C1, C2, and A in the frequency analysis of Program F, as a function of k, F_k, F_{k+1}, and S.

19. [M42] Carry out a detailed analysis of the average running time of the algorithm suggested in exercise 14.

20. [M22] The number of comparisons required in a binary search is approximately log_2 N, and in the Fibonaccian search it is roughly (φ/√5) log_φ N. The purpose of this exercise is to show that these formulas are special cases of a more general result.

Let p and q be positive numbers with p + q = 1. Consider a search algorithm that, given a table of N numbers in increasing order, starts by comparing the argument with the (pN)th key, and iterates this procedure on the smaller blocks. (The binary search has p = q = 1/2; the Fibonaccian search has p = 1/φ, q = 1/φ^2.)

fact that pN and qN aren’t exactly integers.

a) Show that C(N) = log_b N satisfies these relations exactly, for a certain choice of b. For binary and Fibonaccian search, this value of b agrees with the formulas derived earlier.

b) Consider the following argument: “With probability p, the size of the interval being scanned in this algorithm is divided by 1/p; with probability q, the interval size is divided by 1/q. Therefore the interval is divided by p·(1/p) + q·(1/q) = 2 on the average, so the algorithm is exactly as good as the binary search, regardless


21. [20] Draw the binary tree corresponding to interpolation search when N = 10.

22. [M41] (A. C. Yao and F. F. Yao.) Show that an appropriate formulation of interpolation search requires asymptotically lg lg N comparisons, on the average, when applied to N independent uniform random keys that have been sorted. Furthermore all search algorithms on such tables must make asymptotically lg lg N comparisons, on the average.

23. [25] The binary search algorithm of H. Bottenbruch, mentioned at the close of this section, avoids testing for equality until the very end of the search. (During the algorithm we know that K_l ≤ K < K_{u+1}, and the case of equality is not examined until l = u.) Such a trick would make Program B run a little bit faster for large N, since the “JE” instruction could be removed from the inner loop. (However, the idea wouldn’t really be practical since lg N is always rather small; we would need N > 2^66 in order to compensate for the extra work necessary on a successful search, because the running time (18 lg N − 16)u of (5) is “decreased” to (17.5 lg N + 17)u!)

Show that every search algorithm corresponding to a binary tree can be adapted to a search algorithm that uses two-way branching (< versus ≥) at the internal nodes of the tree, in place of the three-way branching (<, =, or >) used in the text’s discussion. In particular, show how to modify Algorithm C in this way.

24. [23] We have seen in Sections 2.3.4.5 and 5.2.3 that the complete binary tree is a convenient way to represent a minimum-path-length tree in consecutive locations. Devise an efficient search method based on this representation. [Hint: Is it possible to use multiplication by 2 instead of division by 2 in a binary search?]

25. [M25] Suppose that a binary tree has a_k internal nodes and b_k external nodes

(a_0, a_1, ..., a_5) = (1, 2, 4, 4, 1, 0) and (b_0, b_1, ..., b_5) = (0, 0, 0, 4, 7, 2).

a) Show that a simple algebraic relationship holds between the generating functions A(z) = Σ_k a_k z^k and B(z) = Σ_k b_k z^k.

b) The probability distribution for a successful search in a binary tree has the generating function g(z) = zA(z)/N, and for an unsuccessful search the generating function is h(z) = B(z)/(N + 1). (Thus in the text’s notation we have C_N = mean(g), C'_N = mean(h), and Eq. (2) gives a relation between these quantities.) Find a relation between var(g) and var(h).

26. [22] Show that Fibonacci trees are related to polyphase merge sorting on three

Prove that such a process must always take at least approximately log_{k+1} N steps on the average, as N → ∞, assuming that each key of the table is equally likely as a search argument. (Hence the potential increase in speed over 1-processor binary search is only a factor of lg(k + 1), not the factor of k we might expect. In this sense it is more efficient to assign each processor to a different, independent search problem, instead of


28. [M23] Define Thue trees T_n by means of algebraic expressions in a binary operator * as follows: T_0(x) = x * x, T_1(x) = x, T_{n+2}(x) = T_{n+1}(x) * T_n(x).

out in full. Express this number in terms of Fibonacci numbers.

b) Prove that if the binary operator * satisfies the axiom

((x * x) * x) * ((x * x) * x) = x,

then T_m(T_n(x)) = T_{m+n−1}(x) for all m ≥ 0 and n ≥ 1.

29. [22] (Paul Feldman, 1985.) Instead of assuming that K_1 < K_2 < ... < K_N, assume only that K_{p(1)} < K_{p(2)} < ... < K_{p(N)} where the permutation p(1) p(2) ... p(N) is an involution, and p(j) = j for all even values of j. Show that we can locate any given key K, or determine that K is not present, by making at most 2⌊lg N⌋ + 1 comparisons.

30. [27] (Involution coding.) Using the idea of the previous exercise, find a way to arrange N distinct keys in such a way that their relative order implicitly encodes an arbitrarily given array of t-bit numbers x_1, x_2, ..., x_m, when m < N/4 + 1 − 2^t. With your arrangement it should be possible to determine the leading k bits of x_j by

≤ 2⌊lg N⌋ + 1 comparisons. (This result is used in theoretical studies of data structures that are asymptotically efficient in both time and space.)

6.2.2. Binary Tree Searching

In the preceding section, we learned that an implicit binary tree structure makes the behavior of binary search and Fibonaccian search easier to understand. For a given value of N, the tree corresponding to binary search achieves the theoretical minimum number of comparisons that are necessary to search a table by means of key comparisons. But the methods of the preceding section are appropriate

insertions and deletions rather expensive. If the table is changing dynamically,

The use of an explicit binary tree structure makes it possible to insert and delete records quickly, as well as to search the table efficiently. As a result, we essentially have a method that is useful both for searching and for sorting. This gain in flexibility is achieved by adding two link fields to each record of the table.

Techniques for searching a growing table are often called symbol table algorithms, because assemblers and compilers and other system routines generally

each record within a compiler might be a symbolic identifier denoting a variable in some FORTRAN or C program, and the rest of the record might contain information about the type of that variable and its storage allocation. Or the key

equivalent of that symbol. The tree search and insertion routines to be described in this section are quite efficient for use as symbol table algorithms, especially in applications where it is desirable to print out a list of the symbols in alphabetic order. Other symbol table algorithms are described in Sections 6.3 and 6.4.

Figure 10 shows a binary search tree containing the names of eleven signs of


Fig. 10. A binary search tree.

root or apex of the tree, we find it is greater than CAPRICORN, so we move to the right; it is greater than PISCES, so we move right again; it is less than TAURUS, so we move left; and it is less than SCORPIO, so we arrive at external node [8].

VIRGO, LIBRA, SCORPIO, in this order.

All of the keys in the left subtree of the root in Fig. 10 are alphabetically

A similar statement holds for the left and right subtrees of every node. It follows that the keys appear in strict alphabetic sequence from left to right, if we traverse the tree in symmetric order (see Section 2.3.1), since symmetric order is based on traversing the left subtree of each node just before that node, then traversing the right subtree.

The following algorithm spells out the searching and insertion processes in detail.

Algorithm T (Tree search and insertion). Given a table of records that form a binary tree as described above, this algorithm searches for a given argument K. If K is not in the table, a new node containing K is inserted into the tree in the


The nodes of the tree are assumed to contain at least the following fields:

Null subtrees (the external nodes in Fig. 10) are represented by the null pointer Λ.

The variable ROOT points to the root of the tree. For convenience, we assume that the tree is not empty (that is, ROOT ≠ Λ), since the necessary operations are trivial when ROOT = Λ.

T1. [Initialize.] Set P ← ROOT. (The pointer variable P will move down the tree.)

T2. [Compare.] If K < KEY(P), go to T3; if K > KEY(P), go to T4; and if K = KEY(P), the search terminates successfully.

Otherwise go to T5.

should also be initialized.) If K was less than KEY(P), set LLINK(P) ← Q, otherwise set RLINK(P) ← Q. (At this point we could set P ← Q and terminate the algorithm successfully.) |
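The search-and-insert loop can be sketched in Python as follows. Steps T3 and T4 are not reproduced in full above, so the sketch assumes the usual behavior: follow LLINK or RLINK when it is non-null, and otherwise attach a new node there. The `Node` class, the function name, the returned triple, and the insertion order used for the zodiac example (calendar order starting with CAPRICORN) are assumptions made for illustration.

```python
class Node:
    """A tree node with KEY, LLINK and RLINK fields; None plays the role of Λ."""

    def __init__(self, key):
        self.key = key
        self.llink = None
        self.rlink = None


def tree_search_insert(root, k):
    """Search the nonempty tree at root for k, inserting k if it is absent.

    Returns (node, found, comparisons), where comparisons counts the
    three-way key comparisons made in step T2.
    """
    p = root                                   # T1
    comparisons = 0
    while True:
        comparisons += 1                       # T2
        if k == p.key:
            return p, True, comparisons
        nxt = p.llink if k < p.key else p.rlink    # T3 / T4 (assumed form)
        if nxt is None:                        # T5: insert into the tree
            q = Node(k)
            if k < p.key:
                p.llink = q
            else:
                p.rlink = q
            return q, False, comparisons
        p = nxt


# Assumed insertion order for the eleven signs of Fig. 10.
SIGNS = ["CAPRICORN", "AQUARIUS", "PISCES", "ARIES", "TAURUS", "GEMINI",
         "CANCER", "LEO", "VIRGO", "LIBRA", "SCORPIO"]


def build(keys):
    """Build a tree by inserting keys one after another into a one-node tree."""
    root = Node(keys[0])
    for key in keys[1:]:
        tree_search_insert(root, key)
    return root
```

With the tree built by `build(SIGNS)`, a search for SAGITTARIUS compares against CAPRICORN, PISCES, TAURUS, and SCORPIO before the new node is attached, which is the path described for Fig. 10.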

Fig. 11. Tree search and insertion.

This algorithm lends itself to a convenient machine language

KEY

followed perhaps by additional words of INFO. Using an AVAIL list for the free


Program T (Tree search and insertion). rA ≡ K, rI1 ≡ P, rI2 ≡ Q.

12       LD2  0,1(LLINK)   C1 − S       T3. Move left. Q ← LLINK(P).

23  1H   ST2  0,1(LLINK)   1 − S − A    LLINK(P) ← Q.

where

S = [search is successful]

On the average we have C1 = ½(C + S), since C1 + C2 = C and C1 − S has the same probability distribution as C2; so the running time is about (7.5C −

an implicit tree (see Program 6.2.1C). By duplicating the code as in Program 6.2.1F we could effectively eliminate line 08 of Program T, reducing the running time to (6.5C − 2.5S + 5)u. If the search is unsuccessful, the insertion phase of

variable-length records. For example, if we allocate the available space sequentially, in a last-in-first-out manner, we can easily create nodes of varying size; the first word of (1) could indicate the size. Since this is an efficient use of storage,


But what about the worst case? Programmers are often skeptical of Algorithm T when they first see it. If the keys of Fig. 10 had been entered into the tree in alphabetic order AQUARIUS, ..., VIRGO instead of the calendar order

essentially specifies a sequential search. All LLINKs would be null. Similarly, if the keys come in the uncommon order

we obtain a “zigzag” tree that is just as bad. (Try it!)

On the other hand, the particular tree in Fig. 10 requires only 3 2/11 comparisons, on the average, for a successful search; this is just a little higher than the minimum possible average number of comparisons, 3, achievable in the best possible binary tree.

When we have a fairly balanced tree, the search time is roughly proportional to log N, but when we have a degenerate tree, the search time is roughly proportional to N. Exercise 2.3.4.5-5 proves that the average search time would be roughly proportional to √N if we considered each N-node binary tree to be equally likely. What behavior can we really expect from Algorithm T?

Fortunately, it turns out that tree search will require only about 2 ln N ≈ 1.386 lg N comparisons, if the keys are inserted into the tree in random order; well-balanced trees are common, and degenerate trees are very rare.

the N! possible orderings of the N keys is an equally likely sequence of insertions for building the tree. The number of comparisons needed to find a key is exactly

entered into the tree. Therefore if C_N is the average number of comparisons involved in a successful search and C'_N is the average number in an unsuccessful search, we have

But the relation between internal and external path length tells us that

this is Eq. 6.2.1-(2). Putting (3) together with (2


Since C'_0 = 0, this means that

\[ C'_N = 2H_{N+1} - 2. \tag{5} \]
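For small N the formula C'_N = 2H_{N+1} − 2 can be confirmed by brute force. The Python sketch below builds the tree for every one of the N! insertion orders, converts the internal path length to the external path length by adding 2N, and averages over the N + 1 unsuccessful outcomes using exact rational arithmetic. The function names and the dictionary representation of the tree are choices made for this sketch.

```python
from fractions import Fraction
from itertools import permutations


def internal_path_length(order):
    """Internal path length of the tree built by inserting the keys of
    `order` one at a time, as in tree search and insertion."""
    root = order[0]
    links = {root: [None, None]}           # key -> [left child, right child]
    total = 0
    for k in order[1:]:
        p, depth = root, 0
        while True:
            side = 0 if k < p else 1
            depth += 1
            if links[p][side] is None:     # attach the new node here
                links[p][side] = k
                links[k] = [None, None]
                break
            p = links[p][side]
        total += depth
    return total


def average_unsuccessful(n):
    """Exact average number of comparisons in an unsuccessful search,
    over all n! insertion orders and all n + 1 gaps."""
    perms = list(permutations(range(1, n + 1)))
    external = sum(internal_path_length(p) + 2 * n for p in perms)
    return Fraction(external, len(perms) * (n + 1))


def harmonic(n):
    """The harmonic number H_n as an exact fraction."""
    return sum(Fraction(1, j) for j in range(1, n + 1))
```

For example, `average_unsuccessful(2)` and `2 * harmonic(3) - 2` both equal 5/3. The enumeration grows as N!, so it is only practical for very small N.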

Exercises 6, 7, and 8 below give more detailed information; it is possible to

values.

Tree insertion sorting. Algorithm T was developed for searching, but it can also be used as the basis of an internal sorting algorithm; in fact, we can view it as a natural generalization of list insertion, Algorithm 5.2.1L. When properly

best algorithms we discussed in Chapter 5. After the tree has been constructed for all keys, a symmetric tree traversal (Algorithm 2.3.1T) will visit the records in sorted order.

A few precautions are necessary, however. Something different needs to be

solution is to treat K = KEY(P) exactly as if K > KEY(P); this leads to a stable sorting method. (Equal keys will not necessarily be adjacent in the tree; they will only be adjacent in symmetric order.) But if many duplicate keys are present,

the same key; this requires another link field, but it will make the sorting faster when a lot of equal keys occur.
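Tree insertion sorting with the first of these remedies, treating K = KEY(P) as if K > KEY(P), can be sketched in Python as follows. Each node is a small list [record, left, right], and the symmetric-order traversal uses an explicit stack; the function name, the `key` parameter, and this node representation are assumptions of the sketch, not the data layout of Program T.

```python
def tree_insertion_sort(records, key=lambda r: r):
    """Sort records by inserting them into a binary search tree and then
    visiting the nodes in symmetric order. Equal keys go to the right,
    which keeps the sort stable."""
    if not records:
        return []
    root = [records[0], None, None]        # [record, left link, right link]
    for rec in records[1:]:
        p = root
        while True:
            side = 1 if key(rec) < key(p[0]) else 2   # equal keys: move right
            if p[side] is None:
                p[side] = [rec, None, None]
                break
            p = p[side]
    out, stack, p = [], [], root
    while stack or p is not None:          # symmetric-order traversal
        while p is not None:
            stack.append(p)
            p = p[1]
        p = stack.pop()
        out.append(p[0])
        p = p[2]
    return out
```

Sorting the pairs (3, a), (1, b), (3, c), (2, d), (1, e) by their first component returns (1, b), (1, e), (2, d), (3, a), (3, c), so records with equal keys keep their original relative order.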

Thus if we are interested only in sorting, not in searching, Algorithm T isn’t the best, but it isn’t bad. And if we have an application that combines searching

It is interesting to note that there is a strong relation between the analysis of tree insertion sorting and the analysis of quicksort, although the methods are superficially dissimilar. If we successively insert N keys into an initially

every key gets compared with K_1, and then every key less than K_1 gets compared with the first key less than K_1, etc.; in quicksort, every key gets compared to

to a particular element less than K, etc. The average number of comparisons

a few more comparisons, in order to speed up the inner loops.)

entries it knows. We can easily delete a node in which either LLINK or RLINK = Λ;
