Let’s look at the record.
— AL SMITH(1928)
This chapter might have been given the more pretentious title “Storage and
Retrieval of Information”; on the other hand, it might simply have been called “Table Look-Up.” We are concerned with the process of collecting information
in a computer's memory, in such a way that the information can subsequently be
recovered as quickly as possible. Sometimes we are confronted with more data
than we can really use, and it may be wisest to forget and to destroy most of it; but at other times it is important to retain and organize the given facts in such
a way that fast retrieval is possible.
Most of this chapter is devoted to the study of a very simple search problem:
how to find the data that has been stored with a given identification. For example, in a numerical application we might want to find f(x), given x and
a table of the values of f; in a nonnumerical application, we might want to find
the English translation of a given Russian word.
In general, we shall suppose that a set of N records has been stored, and
the problem is to locate the appropriate one. We assume that each record includes a special field called its key; this terminology
is especially appropriate, because many people spend a great deal of time every
day searching for their keys. We generally require the N keys to be distinct, so that each key uniquely identifies its record. The collection of all records is called
a table or file, where the word “table” is usually used to indicate a small file,
and “file” is usually used to indicate a large table. A large file or a group of files
is frequently called a database.
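Algorithms for searching are presented with a so-called argument, K, and the
problem is to find which record has K as its key. After the search is complete,
two possibilities can arise: Either the search was successful, having located the unique record containing K; or it was unsuccessful, having determined that K
is nowhere to be found. After an unsuccessful search it is sometimes desirable to
enter a new record, containing K, into the table; a method that does this is called
a search-and-insertion algorithm. Some hardware devices known as associative
memories solve the search problem automatically, in a way that might resemble
the functioning of a human brain; but we shall study techniques for searching
on a conventional general-purpose digital computer.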
Although the goal of searching is to find the information stored in the record
associated with K, the algorithms in this chapter generally ignore everything but
the keys themselves. In practice we can find the associated data once we have located K; for example, if K appears in location TABLE + i, the associated data
(or a pointer to it) might be in location TABLE + i + 1, or in DATA + i, etc. It is
therefore convenient to gloss over the details of what should be done after K has
been successfully found.
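Searching is the most time-consuming part of many programs, and the substitution of a good search method for a bad one often leads to a substantial increase in speed. In fact we can often arrange the data or the data structure
so that searching is eliminated entirely, by ensuring that we always know just
where to find the information we need. Linked memory is a common way to
achieve this; for example, a doubly linked list makes it unnecessary to search for
the predecessor or successor of a given item. Another way to avoid searching occurs if we are allowed to choose the keys freely, since we might as well let
them be the numbers {1, 2, ..., N}; then the record containing K can simply
be placed in location TABLE + K. Both of these techniques were used to
eliminate searching from the topological sorting algorithm discussed in Section 2.2.3.
However, searches would have been necessary if the objects in the topological
sorting algorithm had been given symbolic names instead of numbers. Efficient
algorithms for searching turn out to be quite important in practice.

Search methods can be classified in several ways. We might divide them
into internal versus external searching, just as we divided the sorting algorithms
of Chapter 5 into internal versus external sorting. Or we might divide search
methods into static versus dynamic searching, where “static” means that the
contents of the table are essentially unchanging (so that it is important to minimize the search time without regard for the time required to set up the table),
and “dynamic” means that the table is subject to frequent insertions and perhaps
also deletions. A third possible scheme is to classify search methods according to
whether they are based on comparisons between keys or on digital properties of
the keys, analogous to the distinction between sorting by comparison and sorting
by distribution. Finally we might divide searching into those methods that use the actual keys and those that work with transformed keys.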
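The organization of this chapter is essentially a combination of the latter two
modes of classification. Section 6.1 considers “brute force” sequential methods of
search, then Section 6.2 discusses the improvements that can be made based on
comparisons between keys, using alphabetic or numeric order to govern the decisions. Section 6.3 treats digital searching, and Section 6.4 discusses an important
class of methods called hashing techniques, based on arithmetic transformations
of the actual keys. Each of these sections treats both internal and external searching, in both the static and the dynamic case; and each section points out
the relative advantages and disadvantages of the various algorithms.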
Searching and sorting are often closely related to each other. For example, consider the following problem: Given two sets of numbers, A = {a_1, a_2, ...,
a_m} and B = {b_1, b_2, ..., b_n}, determine whether or not A ⊆ B. Three solutions
suggest themselves:
1. Compare each a_i sequentially with the b_j's until finding a match.
2. Sort the a's and b's, then make one sequential pass through both files, checking the appropriate condition.
3. Enter the b_j's in a table, then search for each of the a_i.
Each of these solutions is attractive for a different range of values of m and n.
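The three solutions can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, solution 1 is taken to be the brute-force comparison of every a_i against the b_j's, and Python's built-in `sorted` and `set` stand in for "some sorting method" and "a suitable table" respectively.

```python
def subset_by_comparison(a, b):
    """Solution 1 (assumed brute force): compare each a_i with the b_j's."""
    return all(any(x == y for y in b) for x in a)


def subset_by_sorting(a, b):
    """Solution 2: sort both files, then make one sequential pass."""
    a, b = sorted(a), sorted(b)
    j = 0
    for x in a:
        # Advance through b until reaching an element that is not below x.
        while j < len(b) and b[j] < x:
            j += 1
        if j == len(b) or b[j] != x:
            return False
    return True


def subset_by_table(a, b):
    """Solution 3: enter the b_j's in a table, then search for each a_i.

    A Python set (a hash table) plays the role of the table here.
    """
    table = set(b)
    return all(x in table for x in a)
```

All three functions return the same answer; they differ only in how the running time grows with m and n, which is the point of the comparison in the text.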
Solution 1 will take roughly c_1 mn units of time, for some constant c_1, and
solution 2 will take about c_2(m lg m + n lg n) units, for some (larger) constant c_2.
With a suitable hashing method, solution 3 will take roughly c_3 m + c_4 n units of time, for some (still larger) constants c_3 and c_4. It follows that solution 1 is good for very small m and n, but solution 2 soon becomes better as m and n grow
larger. Eventually solution 3 becomes preferable, until n exceeds the internal
memory size; then solution 2 is usually again superior until n gets much larger still. Thus we have a situation where sorting is sometimes a good substitute for
searching, and searching is sometimes a good substitute for sorting.

More complicated search problems can often be reduced to the simpler case considered here. For example, suppose that the keys are words that might be
slightly misspelled; we might want to find the correct record in spite of this error.
If we make two copies of the file, one in which the keys are in normal
lexicographic order and another in which they are ordered from right to left (as
if the words were spelled backwards), a misspelled search argument will probably
agree up to half or more of its length with an entry in one of these two files. The
search methods of Sections 6.2 and 6.3 can therefore be adapted to find the key that was probably intended.
A related problem has received considerable attention in connection with
applications involving people's names,
when there is a good chance that the name will be misspelled due to poor
handwriting or voice transmission. The goal is to transform the argument into
some code that tends to bring together all variants of the same name. The
following contemporary form of the “Soundex” method, a technique that was
originally developed by Margaret K. Odell and Robert C. Russell [see U.S. Patents 1261167 (1918), 1435663 (1922)], has often been used for encoding surnames:
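1. Retain the first letter of the name, and drop all occurrences of a, e, h, i, o,
u, w, y in other positions.
2. Assign the following numbers to the remaining letters after the first:
b, f, p, v → 1; c, g, j, k, q, s, x, z → 2; d, t → 3; l → 4; m, n → 5; r → 6.
3. If two or more letters with the same code were adjacent in the original name
(before step 1), or adjacent except for intervening h's and w's, omit all but the first.
4. Convert to the form “letter, digit, digit, digit” by adding trailing zeros (if there are less than three digits), or by dropping rightmost digits (if there
are more than three).
For example, the names Euler, Gauss, Hilbert, Knuth, Lloyd, Lukasiewicz, and
Wachs have the respective codes E460, G200, H416, K530, L300, L222, W200.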
Of course this system will bring together names that are somewhat different,
as well as names that are similar; the same seven codes would be obtained for
quite unrelated names. On the other
hand a few related names like Rogers and Rodgers, or Sinclair and St. Clair, or Tchebysheff and Chebyshev, remain separate. But by and large the Soundex
code greatly increases the chance of finding a name in one of its disguises. [For further information, see C. P. Bourne and D. F. Ford, JACM 8 (1961), 538-
552; Leon Davidson, CACM 5 (1962), 169-171; Federal Population Censuses
When using a scheme like Soundex, we need not give up the assumption
that all keys are distinct; we can make lists of all records with equivalent codes, treating each list as a unit.
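The encoding steps, and the idea of treating each list of equivalent records as a unit, can be sketched in Python. This is a minimal sketch under assumptions: the letter-to-digit table is the standard Soundex assignment (b, f, p, v → 1; c, g, j, k, q, s, x, z → 2; d, t → 3; l → 4; m, n → 5; r → 6), names are assumed to be plain ASCII, and the names `soundex` and `group_by_code` are hypothetical.

```python
from collections import defaultdict

# Standard Soundex digit assignment (an assumption wherever the copy of
# step 2 at hand is incomplete).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the letter-digit-digit-digit code of a surname."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first)        # code of the previous "adjacent" letter
    for ch in letters[1:]:
        if ch in "hw":
            continue               # h and w do not separate equal codes
        code = CODES.get(ch)
        if code is None:           # vowel or y: dropped, but breaks adjacency
            prev = None
            continue
        if code != prev:
            digits.append(code)
        prev = code
    # Pad with zeros or truncate to exactly three digits.
    return (first.upper() + "".join(digits) + "000")[:4]


def group_by_code(names):
    """Collect names with equal codes into lists, one list per code."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    return dict(groups)
```

A search for a possibly misspelled name would then encode the argument and inspect only the one list stored under that code.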
Large databases tend to make the retrieval process more complex, since
people often want to consider many different fields of each record as potential keys, with the ability to locate items when only part of the key information is
specified. For example, given a large file about stage performers, a producer
might wish to find all performers with a particular combination of attributes, such as dancing
talent and a French accent; given a large file of baseball statistics, a sportswriter
may wish to determine the total number of runs scored by the Chicago White Sox in 1964, during the seventh inning of night games, against left-handed pitchers. Given a large file of data about anything, people like to ask arbitrarily complicated questions. Indeed, we might consider an entire library as a database,
and a searcher may want to find everything that has been published about
information retrieval. An introduction to the techniques for such secondary key (multi-attribute) retrieval problems appears below in Section 6.5.
Before entering into a detailed study of searching, it may be helpful to put things in historical perspective. During the pre-computer era, many books of logarithm tables, trigonometry tables, etc., were compiled, so that mathematical
calculations could be replaced by searching. Eventually these tables were transferred to punched cards, and used for scientific problems in connection with
for searching. With small internal memories, and with nothing but sequential
almost impossible.
the 1950s eventually led to the recognition that searching was an interesting
of space in the early machines, programmers were suddenly confronted with
The first surveys of the searching problem were published by A. I. Dumey,
J. Research & Development 1 (1957), 130-146; A. D. Booth, Information and
Control 1 (1958), 159-164; A. S. Douglas, Comp. J. 2 (1959), 1-9. More
extensive treatments were given later by Kenneth E. Iverson, A Programming
on tree structures were introduced, as we shall see; and research about searching
is still actively continuing at the present time.
6.1. SEQUENTIAL SEARCHING
This sequential procedure is the obvious way to search, and it makes a useful starting point for our discussion of searching because many of the more intricate algorithms are based on it. We shall see that sequential searching involves some
very interesting ideas, in spite of its simplicity.
The algorithm might be formulated more precisely as follows:

Algorithm S (Sequential search). Given a table of records R_1, R_2, ..., R_N,

Program S (Sequential search). Assume that K_i appears in location KEY + i,
and that the remainder of record R_i appears in location INFO + i. The following

At location SUCCESS, the instruction “LDA INFO+N,1” will now bring the desired information into rA. |
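The analysis of this program is straightforward; it shows that the running time of Algorithm S depends on two things,
C = the number of key comparisons and S = 1 if the search is successful, 0 if unsuccessful, (1)
with Program S taking (5C − 2S + 3)u units of time. If the search successfully finds
K = K_i, we have C = i, S = 1; hence the total time is (5i + 1)u. On the other
hand if the search is unsuccessful, we have C = N, S = 0, for a total time of
(5N + 3)u. If every input key occurs with equal probability, the average value
of C in a successful search will be
(1 + 2 + ... + N)/N = (N + 1)/2. (2)
straightforward change makes the algorithm faster, unless the list of records is quite short: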
Algorithm Q (Quick sequential search). This algorithm is the same as Algorithm S, except that it assumes the presence of a dummy record R_{N+1} at the
end of the file.
Q1. [Initialize.] Set i ← 1, and set K_{N+1} ← K.
Q2. [Compare.] If K = K_i, go to Q4.
Q3. [Advance.] Increase i by 1 and return to Q2.
Q4. [End of file?] If i ≤ N, the algorithm terminates successfully; otherwise it
terminates unsuccessfully (i = N + 1). |
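The steps of Algorithm Q translate almost directly into Python. The following is a sketch rather than a rendering of the MIX program: it uses 0-based indexing, the function name is hypothetical, and it builds the dummy record by copying the list (which itself costs time proportional to N; a real implementation would reserve the extra slot once, in advance).

```python
def quick_sequential_search(keys, k):
    """Return the 0-based index of k in keys, or None if k is absent."""
    table = list(keys) + [k]   # dummy record R_{N+1} whose key is K  (step Q1)
    i = 0
    while table[i] != k:       # a single test per iteration  (steps Q2, Q3)
        i += 1
    # Step Q4: finding only the dummy means the search was unsuccessful.
    return i if i < len(keys) else None
```

Because the dummy key guarantees that the loop stops, the inner loop needs no separate end-of-table test.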
Program Q (Quick sequential search). rA ≡ K, rI1 ≡ i − N.
Q1. Initialize.

In terms of the quantities C and S in the analysis of Program S, the running time has decreased to (4C − 4S + 10)u; this is an improvement whenever C ≥ 6
in a successful search, and whenever N ≥ 8 in an unsuccessful search.

The transition from Algorithm S to Algorithm Q makes use of an important speed-up principle: When an inner loop of a program tests two or more
conditions, we should try to reduce the testing to just one condition.
Program Q' (Quicker sequential search). rA ≡ K, rI1 ≡ i − N.

The inner loop has been duplicated; this avoids about half of the “i ← i + 1”
instructions, so it reduces the running time to
3.5C − 3.5S + 10 + ((C − S) mod 2)/2
tables are being searched; many existing programs can be improved in this way.
D. E. Knuth, Computing Surveys 6 (1974), 266-269.]
are in increasing order:

Algorithm T (Sequential search in ordered table). Given a table of records
R_1, R_2, ..., R_N whose keys are in increasing order K_1 < K_2 < ... < K_N,
this algorithm searches for a given argument K. For convenience and speed, the algorithm assumes that there is a dummy record R_{N+1} whose key value is
K_{N+1} = ∞ > K.
T1. [Initialize.] Set i ← 1.
T2. [Compare.] If K ≤ K_i, go to T4.
T3. [Advance.] Increase i by 1 and return to T2.
T4. [Equality?] If K = K_i, the algorithm terminates successfully. Otherwise it terminates unsuccessfully. |
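Algorithm T can be sketched in Python in the same spirit. The sketch assumes finite numeric keys in increasing order, uses `math.inf` as the dummy key ∞, and works with 0-based indices; the function name is hypothetical.

```python
import math


def ordered_sequential_search(keys, k):
    """Search an increasing list of finite numbers; return an index or None."""
    table = list(keys) + [math.inf]   # dummy key K_{N+1} = infinity
    i = 0
    while k > table[i]:               # steps T2 and T3: advance while K > K_i
        i += 1
    # Step T4: the scan stopped at the first key >= K; test for equality.
    if i < len(keys) and table[i] == k:
        return i
    return None
```

An unsuccessful search stops as soon as a key larger than K is seen, instead of running to the end of the table.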
If all input keys are equally likely, this algorithm takes essentially the same
average time as Algorithm Q, for a successful search. But unsuccessful searches
are performed about twice as fast, since the absence of a record can be established
more quickly.
Each of the algorithms above uses subscripts to denote the table entries. It
is convenient to describe the methods in terms of these subscripts, but the same
search procedures can be used for tables that have a linked representation, since the data is being traversed sequentially. (See exercises 2, 3, and 4.)

Frequency of access. So far we have been assuming that every argument occurs
as often as every other. This is not always a realistic assumption; in a general situation, key K_j will occur with probability p_j, where p_1 + p_2 + ... + p_N = 1.
The time required to do a successful search is essentially proportional to the
number of comparisons, C, which now has the average value
C_N = p_1 + 2p_2 + ... + N p_N. (3)
If we have the option of putting the records into the table in any desired order,
this quantity C_N is smallest when
p_1 ≥ p_2 ≥ ... ≥ p_N, (4)
that is, when the most frequently used records appear near the beginning.

Let's look at several probability distributions, in order to see how much of a saving is possible when the records are arranged in the optimal manner specified
in (4). By exercise 7, the average number of comparisons is
less than two for the “binary” distribution (5), if the records appear in the proper order within the table.
p_1 = Nc, p_2 = (N − 1)c, ..., p_N = c, where c = 2/(N(N + 1)). (6)
as (5). In this case we find
Of course the probability distributions in (5) and (6) are rather artificial, and they may never be a very good approximation to reality. A more typical sequence of probabilities, called “Zipf's law,” has
p_1 = c/1, p_2 = c/2, ..., p_N = c/N, where c = 1/H_N. (8)
This distribution was popularized by G. K. Zipf, who observed that the nth most
common word in natural language text seems to occur with a frequency approximately proportional to 1/n. [The Psycho-Biology of Language (Boston, Mass.:
(Reading, Mass.: Addison-Wesley, 1949).] He observed the same phenomenon
in census tables, when metropolitan areas are ranked in order of decreasing population. If Zipf's law governs the frequency of the keys in a table, we have
immediately
C_N = N/H_N; (9)
searching such a file is about ½ ln N times faster than searching the same file
Cleave, Mechanical Resolution of Linguistic Problems (New York: Academic
Press, 1958), 79.]
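Formula (3) and the Zipf result (9) are easy to check numerically. The sketch below is only an illustration: the function names are hypothetical, and Python's `fractions.Fraction` is used so that the comparison with N/H_N is exact rather than approximate.

```python
from fractions import Fraction


def average_comparisons(probs):
    """Formula (3): C_N = p_1 + 2 p_2 + ... + N p_N for the given order."""
    return sum(j * p for j, p in enumerate(probs, start=1))


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def zipf(n):
    """Zipf probabilities p_k = c/k with c = 1/H_n, in decreasing order."""
    c = 1 / harmonic(n)
    return [c / k for k in range(1, n + 1)]


def uniform(n):
    """Equal probabilities 1/n, for comparison."""
    return [Fraction(1, n)] * n
```

For instance, `average_comparisons(zipf(1000))` equals `1000 / harmonic(1000)` exactly, while `average_comparisons(uniform(1000))` is (N + 1)/2 = 1001/2, so the ratio of the two shows the saving that the text estimates as about ½ ln N.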
that has commonly been observed in commercial applications [see, for example,
W. P. Heising, IBM Systems J. 2 (1963), 114-115]. This rule states that 80 percent of the transactions deal with the most active 20 percent of a file; and the
same rule applies in fractal fashion to the top 20 percent, so that 64 percent of the transactions deal with the most active 4 percent, etc. In other words,
One distribution that satisfies this rule exactly whenever n is a multiple of 5 is
since p_1 + p_2 + ... + p_n = c n^θ for all n in this case. It is not especially easy
to work with the probabilities in (11); we have, however, n^θ − (n − 1)^θ =
θ n^{θ−1}(1 + O(1/n)), so there is a simpler distribution that approximately fulfills
the 80-20 rule, namely
vary from a uniform distribution to a Zipfian one. Applying (3) to
A study of word frequencies carried out by E. S. Schwartz [see the interesting
slightly negative value of θ gives a better fit to the data than Zipf's law (8). In
this case the mean value
is substantially smaller than (9) as N → ∞.
Distributions like (11) and (13) were first studied by Vilfredo Pareto in
connection with disparities of personal income and wealth [Cours d'Économie
Politique 2 (Lausanne: Rouge, 1897), 304-312]. If p_k is proportional to the wealth of the kth richest individual, the probability that a person's wealth exceeds or equals x times the wealth of the poorest individual is k/N when
, the stated probability
is x^{−1/(1−θ)}; this is now called a Pareto distribution with parameter 1/(1 − θ).

Curiously, Pareto didn't understand his own distribution; he believed that
a value of θ near 0 would correspond to a more egalitarian society than a value near 1! His error was corrected by Corrado Gini [Atti della III Riunione
della Società Italiana per il Progresso delle Scienze (1910), reprinted in his
person to formulate and explain the significance of ratios like the 80-20 law (10). People still tend to misunderstand such distributions; they often speak about a
“75-25 law” or a “90-10 law” as if an a-b law makes sense only when a + b = 100, while (12
G. Udny Yule when he studied the increase in biological species as a function of time, assuming various models of evolution [Philos. Trans. B213 (1924), 21-87]. Yule's distribution applies when θ < 2:
The limiting value c = 1/H_N or c = 1/N is used when θ = 0 or θ = 1.
A “self-organizing” file. These calculations with probabilities are very nice,
but in most cases we don't know what the probabilities are. We could keep a count in each record of how often it has been accessed, reallocating the records on
the basis of those counts; the formulas derived above suggest that this procedure
would often lead to a worthwhile saving. But we probably don't want to devote
so much memory space to the count fields, since we can make better use of that
memory by using one of the nonsequential search techniques that are explained
later in this chapter.
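A simple scheme, which has been in use for many years although its origin
is unknown, can be used to keep the records in a pretty good order without
auxiliary count fields: Whenever a record has been successfully located, it
is moved to the front of the table. The frequently used items
will tend to be located fairly near the beginning of the table, when we need them.
If the N keys occur with respective probabilities p_1, p_2, ..., p_N,
with each search being completely independent of previous searches, it can be
shown that the average number of comparisons needed to find an item in such a
self-organizing file tends to the limiting value
C̄_N = 1 + 2 Σ_{1≤i<j≤N} p_i p_j/(p_i + p_j), (17)
which never exceeds
1 + 2 Σ_{j=1}^{N} (j − 1)p_j = 2C_N − 1. In fact, C̄_N is always less than π/2 times the optimal value C_N [Chung, Hajela, and Seymour, J. Comp. Syst. Sci. 36 (1988), 148-157]; this ratio is the best possible constant in general, since it is approached when p_j is proportional to 1/j².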
Let us see how well the self-organizing procedure works when the key
probabilities obey Zipf's law (8). We have
by Eqs. 1.2.7-(8) and 1.2.7-(3). This is substantially better than ½N, when N
is reasonably large, and it is only about ln 4 ≈ 1.386 times as many comparisons
that the self-organizing method works even better than our formulas predict, because successive searches are not independent (small groups of keys tend to occur in bunches).
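The self-organizing idea discussed above, in its move-to-front form, can be sketched in a few lines of Python. This is an illustration only: a Python list stands in for the table, the function name is hypothetical, and moving an item within a Python list costs time proportional to its position (a linked list would make the move itself constant-time).

```python
def move_to_front_search(table, k):
    """Sequentially search table for k; on success move it to the front.

    Returns the number of comparisons made (the 1-based position at which
    k was found), or None if k is not present. The table is modified in place.
    """
    for position, key in enumerate(table):
        if key == k:
            table.insert(0, table.pop(position))   # relocate to the front
            return position + 1
    return None
```

When the same few keys are requested in bunches, they stay near the front and each repeated lookup costs only one or two comparisons, which is the effect described in the preceding paragraph.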
This self-organizing scheme was first analyzed by John McCabe [Operations
another interesting scheme, under which each successfully located key that is not already at the beginning of the table is simply interchanged with the preceding
key, instead of being moved all the way to the front. He conjectured that the limiting average search time for this method, assuming independent searches,
never exceeds (17). Several years later, Ronald L. Rivest proved in fact that the transposition method uses strictly fewer comparisons than the move-to-front
probabilities are equal [CACM 19 (1976), 63-67]. However, convergence to the asymptotic limit is much slower than for the move-to-front heuristic, so move-to-front is better unless the process is prolonged [J. R. Bitner, SICOMP 8 (1979), 82-110]. Moreover, J. L. Bentley, C. C. McGeoch, D. D. Sleator, and R. E. Tarjan have proved that the move-to-front method never makes more than four times the total number of memory accesses made by any algorithm on linear lists, given any sequence of accesses whatever to the data, even if the algorithm
this property [CACM 28 (1985), 202-208, 404-411]. See SODA 8 (1997), 53-62,
for an interesting empirical study of more than 40 heuristics for self-organizing
still another twist: Suppose the table we are searching is stored on tape, and
the individual records have varying lengths. For example, in an old-fashioned operating system, the “system library tape” was such a file; standard system
were the records on this tape, and most user jobs would start by searching
our previous analysis of Algorithm S inapplicable, since step S3 takes a variable
not the only criterion of interest.

Let L_i be the length of record R_i, and let p_i be the probability that this
record will be sought. The average running time of the search method will now
be approximately proportional to
p_1 L_1 + p_2(L_1 + L_2) + ... + p_N(L_1 + L_2 + ... + L_N). (19)
When L_1 = L_2 = ... = L_N = 1, this reduces to (3), the case already studied.
It seems logical to put the most frequently needed records at the beginning
of the tape; but this is sometimes a bad idea! For example, assume that the tape contains just two programs, A and B, where A is needed twice as often as B but
it is four times as long. Thus,
N = 2, p_A = 2/3, L_A = 4, p_B = 1/3, L_B = 1.
If we place A first on tape, according to the “logical” principle stated above, the average running time is
(2/3)·4 + (1/3)·5 = 13/3; but if we use an “illogical” idea, placing
B first, the average running time is reduced to (1/3)·1 + (2/3)·5 = 11/3.
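The two-program example can be checked mechanically, and the same few lines make it easy to experiment with other arrangements. The sketch below is an illustration under assumptions: each record is represented as a hypothetical (probability, length) pair, the average time is the sum of p_i times the total length read up to and including record i, and exact fractions are used to avoid rounding. Ordering by decreasing ratio p/L is the arrangement suggested by the adjacent-interchange argument that accompanies Theorem S in the text.

```python
from fractions import Fraction
from itertools import permutations


def average_time(records):
    """Sum of p_i * (L_1 + ... + L_i) over records given as (p, L) pairs."""
    total = Fraction(0)
    tape_read = 0
    for p, length in records:
        tape_read += length        # tape passed over to reach this record
        total += p * tape_read
    return total


def best_by_brute_force(records):
    """Try every arrangement (feasible only for a handful of records)."""
    return min(permutations(records), key=average_time)


def by_decreasing_ratio(records):
    """Arrange records so that p/L is nonincreasing."""
    return sorted(records, key=lambda r: r[0] / r[1], reverse=True)


A = (Fraction(2, 3), 4)   # needed twice as often as B, four times as long
B = (Fraction(1, 3), 1)
```

Here `average_time([A, B])` is 13/3 and `average_time([B, A])` is 11/3, matching the figures in the text.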
Theorem S. Let L_i and p_i be as defined above. The arrangement of records
in the table is optimal if and only if
p_1/L_1 ≥ p_2/L_2 ≥ ... ≥ p_N/L_N. (20)
Proof. Suppose that R_i and R_{i+1} are interchanged on the tape; the cost (19) changes from
... + p_i(L_1 + ... + L_{i−1} + L_i) + p_{i+1}(L_1 + ... + L_{i+1}) + ...
to
... + p_{i+1}(L_1 + ... + L_{i−1} + L_{i+1}) + p_i(L_1 + ... + L_{i+1}) + ...,
a net change of p_i L_{i+1} − p_{i+1} L_i. Therefore if p_i/L_i < p_{i+1}/L_{i+1}, such an
interchange will improve the average running time, and the given arrangement
is not optimal. It follows that (20) holds in any optimal arrangement.

Conversely, assume that (20) holds; we need to prove that the arrangement
is optimal. The argument just given shows that the arrangement is “locally
optimal” in the sense that adjacent interchanges make no improvement; but there
may conceivably be a long, complicated sequence of interchanges that leads to a better “global optimum.” We shall consider two proofs, one that uses computer
science and one that uses a mathematical trick.
First proof. Assume that (20) holds. We know that any permutation of the records can be sorted into the order R_1 R_2 ... R_N by using a sequence of
interchanges of adjacent records. Each of these interchanges replaces ... R_j R_i ... by ... R_i R_j ... for some i < j, so it decreases the search time by the nonnegative
search time.
p_i(ε) = p_i + ε^i − (ε^1 + ε^2 + ... + ε^N)/N, (21)
will never have x_1 p_1(ε) + ... + x_N p_N(ε) = y_1 p_1(ε) + ... + y_N p_N(ε) unless x_1 = y_1, ..., x_N = y_N; in particular, equality will not hold in (20). Consider now the
N! permutations of the records; at least one of them is optimum, and we know
that it satisfies (20). But only one permutation satisfies (20) because there are
of records in the table for the probabilities p_i(ε), whenever ε is sufficiently small.
By continuity, the same arrangement must also be optimum when ε is set equal
(1956), 59-66. The exercises below contain further results about optimum file arrangements.
EXERCISES
1. [M20] When all the search keys are equally probable, what is the standard deviation of the number of comparisons made in a successful sequential search through a
table of N records?
2. [15] Restate the steps of Algorithm S, using linked-memory notation instead of
subscript notation. (If P points to a record in the table, assume that KEY(P) is the key,
INFO(P) is the associated information, and LINK(P) is a pointer to the next record.
3. [16] Write a MIX program for the algorithm of exercise 2. What is the running time of your program, in terms of the quantities C and S in (1)?
4. [17] Does the idea of Algorithm Q carry over from subscript notation to
linked-memory notation?
But are there any small values of C and S for which Program Q' actually takes more
time than Program Q?
6. [20] Add three more instructions to Program Q', reducing its running time to
about (3.33C + constant)u.
7. [M20] Evaluate the average number of comparisons, (3), using the “binary”
probability distribution (5).
8.[HM22] Find an asymptoticseries for asnA00,whenr/1.
9.[HM28] Thetextobserves that theprobability distributionsgivenby(11), (13),
and(16)areroughly equivalentwhen0<6<1,and that themean numberof
comparisons using(13)is^_AT+ 0(Nl~e
).
)alsowhenthe
probabilities of (11)areused?
b)Whatabout(16)?
c)Howdo(11)and(16)compareto (13)when0<0?
10. [M20] The best arrangement of records in a sequential table is specified by (4); what is the worst arrangement? Show that the average number of comparisons in the worst arrangement has a simple relation to the average number of comparisons in the
with probability p_i. After the system has been running a long time, show that
Riwillbe themthitemfrom thefrontwithlimiting probabilityPiP(N-i)(m-i),
where thesetof variablesXis(px , ,pi_x,Pi+i,.
Pnn+Pn(n-l)+‘
+PnO=Provethat,consequently,
thefront ofthelist;then evaluateCN=J2iLiPi d
12. [M23] Use (17) to evaluate the average number of comparisons needed to search the self-organizing file when the search keys have the binary probability distribution
(5).
13. [M27] Use (17) to evaluate C̄_N for the wedge-shaped probability distribution
(6).
14. [M21] Given two sequences (x_1, x_2, ..., x_n) and (y_1, y_2, ..., y_n) of real numbers,
what permutation a_1 a_2 ... a_n of the subscripts will make Σ_i x_i y_{a_i} a maximum? What
permutation will make it a minimum?
15. [M22] The text shows how to arrange programs optimally on a system library tape, when only one program is being sought. But another set of assumptions is more
appropriate for a subroutine library tape, from which we may wish to load various subroutines called for in a user's program.
For this case let us suppose that subroutine j is desired with probability P_j,
independently of whether or not other subroutines are desired. Then, for example, the probability that no subroutines at all are needed is (1 − P_1)(1 − P_2)...(1 − P_N); and the probability that the search will end just after loading the jth subroutine is
P_j(1 − P_{j+1})...(1 − P_N). If L_j is the length of subroutine j, the average search time
will therefore be essentially proportional to
L_1 P_1(1 − P_2)...(1 − P_N) + (L_1 + L_2)P_2(1 − P_3)...(1 − P_N) + ... + (L_1 + ... + L_N)P_N.
assumptions?
16. [M22] (H. Riesel.) We often need to test whether or not n given conditions are
all simultaneously true. (For example, we may want to test whether both x > 0 and
y < z², and it is not immediately clear which condition should be tested first.) Suppose that the testing of condition j costs T_j units of time, and that the condition will be
true with probability p_j, independent of the outcomes of all the other conditions. In

Fig. 2. An “organ-pipe arrangement” of probabilities minimizes the average seek time
in a catenated search.
17. [M23] (J. R. Jackson.) Suppose you have to do n jobs; the jth job takes T_j units
of time, and it has a deadline D_j. In other words, the jth job is supposed to be finished after at most D_j units of time have elapsed. What schedule a_1 a_2 ... a_n for processing the jobs will minimize the maximum tardiness, namely
18. [M30] (Catenated search.) Suppose that N records are located in a linear array
R_1 ... R_N, with probability p_j that record R_j will be sought. A search process is called
“catenated” if each search begins where the last one left off. If consecutive searches
are independent, the average time required will be Σ_{1≤i,j≤N} p_i p_j d(i, j), where d(i, j) represents the amount of time to do a search that starts at position i and ends at position j. This model can be applied, for example, to disk file seek time, if d(i, j) is the time needed to travel from cylinder i to cylinder j.
The object of this exercise is to characterize the optimum placement of records for
catenated searches, whenever d(i, j) is an increasing function of |i − j|, that is, whenever
we have d(i, j) = d_{|i−j|} for d_1 < d_2 < ... < d_{N−1}. (The value of d_0 is irrelevant.) Prove that in this case the records are optimally placed, among all N! permutations, if and only if either p_1 ≤ p_N ≤ p_2 ≤ p_{N−1} ≤ ... ≤ p_{⌊N/2⌋+1} or p_N ≤ p_1 ≤ p_{N−1} ≤ p_2 ≤
... ≤ p_{⌈N/2⌉}. (Thus, an “organ-pipe arrangement” of probabilities is best, as shown
in Fig. 2.) Hint: Consider any arrangement where the respective probabilities are
qi 92•-qksrk -r2nti tm,forsomem >0 and k>0;N =2fc+ m+1.Showthattherearrangementq[q'2 q'ksr'k r' 2r[ tmisbetter,whereq[=min(qitr,)andr'= max[q, , r,),exceptwhenq[=qtandr\=r,forall iorwhenq[=r\andr'=qt
andtj=0forall iandj.The sameholds truewhensisnot presentandN =2 +m
19.[M20]Continuingexercise 18,what are the optimal arrangementsforcatenatedsearcheswhenthe functiond(i,j)has theproperty thatd(i,j)+d(j,i)=c forall
i 7^ j ? [This situation occurs, forexample,on tapes without read-backwardscapability,
d(i,j)=a+b(Li+i- bLj)andd(j,i)=a+b(Lj+H bTjv)+r+fe(LiH bLt ),whereristherewindtime.]
[M28]Continuingexercise 18,what are the optimal arrangementsforcatenatedsearcheswhenthe functiond(i,j)ismin(d|,_j|,d„_|1_:J |),fordj<d2<•••?[Thissituation occurs, forexample,inatwo-waylinked circularlist,or inatwo-wayshift-
21. [M28] Consider an n-dimensional cube whose vertices have coordinates (d_1, ..., d_n) with d_j = 0 or 1; two vertices are called adjacent if they differ in exactly one coordinate. Suppose that a set of 2^n numbers x_0 < x_1 < ... < x_{2^n−1} is to be assigned to the 2^n
vertices in such a way that Σ_{i,j} |x_i − x_j| is minimized, where the sum is over all i and j
such that x_i and x_j have been assigned to adjacent vertices. Prove that this minimum will be achieved if, for all j, x_j is assigned to the vertex whose coordinates are the binary representation of j.
22. [20] Suppose you want to search a large file, not for equality but to find the 1000 records that are closest to a given key, in the sense that these 1000 records have the
smallest values of d(K_j, K) for some given distance function d. What data structure is most appropriate for such a sequential search?

Nothing's so hard, but search will find it out.
6.2. SEARCHING BY COMPARISON OF KEYS
In this section we shall discuss search methods that are based on a linear ordering of the keys, such as alphabetic order or numeric order. After comparing
the given argument K to a key K_i in the table, the search continues in three
different ways, depending on whether K < K_i, K = K_i, or K > K_i. The
sequential search methods of Section 6.1 were essentially limited to a two-way
decision (K = K_i versus K ≠ K_i), but if we free ourselves from the restriction
of sequential access we are able to make effective use of an order relation.

6.2.1. Searching an Ordered Table
What would you do if someone handed you a large telephone directory and
told you to find the name of the person whose number is 795-6841? There is
no better way to tackle this problem than to use the sequential methods of Section 6.1. (Well, you might try to dial the number and talk to the person who
answers; or you might know how to obtain a special directory that is sorted by
number instead of by name.) The point is that it is much easier to find an entry
by the party's name, instead of by number, although the telephone directory contains all the information necessary in both cases. When a large file must
be searched, sequential scanning is almost out of the question, but an ordering relation simplifies the job enormously.

With so many sorting methods at our disposal (Chapter 5), we will have little difficulty rearranging a file into order so that it may be searched conveniently.
Of course, if we need to search the table only once, a sequential search would
be faster than to do a complete sort of the file; but if we need to make repeated searches in the same file, we are better off having it in order. Therefore in this
section we shall concentrate on methods that are appropriate for searching a table whose keys satisfy
K_1 < K_2 < ... < K_N.
After comparing
K to K_i in such a table, we have either
• K < K_i [R_i, R_{i+1}, ..., R_N are eliminated from consideration];
or • K = K_i [the search is done];
or • K > K_i [R_1, R_2, ..., R_i are eliminated from consideration].
In each of these three cases, substantial progress has been made, unless i is near one of the ends of the table; this is why the ordering leads to an efficient
algorithm.

Binary search. A natural first method is to start by comparing K to the middle key in the table; the result of this probe tells which
half of the table should be searched next, and the same procedure can be used again, comparing K to the middle key of the selected half, etc. After at most
Fig. 3. Binary search.
that it is not present. This procedure is sometimes known as “logarithmic search”
or “bisection,” but it is most commonly called binary search.
the algorithm makes use of two pointers, l and u, that indicate the current lower
Algorithm B (Binary search). Given a table of records R_1, R_2, ..., R_N whose
keys are in increasing order K_1 < K_2 < ... < K_N, this algorithm searches for a given argument K.
B1. [Initialize.] Set l ← 1, u ← N.
B2. [Get midpoint.] (At this point we know that if K is in the table, it satisfies
K_l ≤ K ≤ K_u. A more precise statement of the situation appears in
exercise 1 below.) If u < l, the algorithm terminates unsuccessfully. Otherwise,
set i ← ⌊(l + u)/2⌋, the approximate midpoint of the relevant table area.
B3. [Compare.] If K < K_i, go to B4; if K > K_i, go to B5; and if K = K_i, the algorithm terminates successfully.
B4. [Adjust u.] Set u ← i − 1 and return to B2.
B5. [Adjust l.] Set l ← i + 1 and return to B2. |
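Steps B1 through B5 can be written in Python as the following sketch. It is an illustration, not the MIX program of the text: it uses 0-based indices, so l and u start at 0 and N − 1, and the function name and variable names are hypothetical. The sample keys are the sixteen values that appear in the rows of Fig. 4.

```python
def binary_search(keys, k):
    """Search an increasing list; return the 0-based index of k, or None."""
    low, high = 0, len(keys) - 1          # step B1
    while low <= high:                    # step B2: stop when u < l
        mid = (low + high) // 2           # floor of the midpoint
        if k < keys[mid]:                 # step B3, then B4
            high = mid - 1
        elif k > keys[mid]:               # step B3, then B5
            low = mid + 1
        else:
            return mid                    # K = K_i: success
    return None


FIG4_KEYS = [61, 87, 154, 170, 275, 426, 503, 509,
             512, 612, 653, 677, 703, 765, 897, 908]
```

Searching `FIG4_KEYS` for 400 narrows the range exactly as the bracketed rows of Fig. 4 show (first to 275 426 503, then to 275 alone, then to an empty range) before reporting failure.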
Figure 4 illustrates two cases of this binary search algorithm: first to search

061 087 154 170 [275 426 503] 509 512 612 653 677 703 765 897 908
061 087 154 170 [275] 426 503 509 512 612 653 677 703 765 897 908
061 087 154 170 275] [426 503 509 512 612 653 677 703 765 897 908
Fig. 4. Examples of binary search.
Program B (Binary search). As in the programs of Section 6.1, we assume
here that K_i is a full-word key appearing in location KEY + i. The following code uses rI1 ≡ l, rI2 ≡ u, rI3 ≡ i.
right binary 1,” which is legitimate only on binary versions of MIX; for general byte size, this instruction should be replaced by “MUL =1//2+1=”, increasing the running time to (26C − 18S + 20)u.
Fig. 5. A comparison tree that corresponds to binary search when N = 16.
represented by the root node (8) in the figure. Then if K < K_8, the algorithm
follows the left subtree, comparing K to K_4; similarly if K > K_8, the right subtree is used. An unsuccessful search will lead to one of the external square nodes numbered [0] through [16]; for example, we reach node [6] if and only if
K_6 < K < K_7.

In an analogous fashion, any algorithm for searching an ordered table of length N by means of comparisons can be represented as an N-node binary tree
valid method for searching an ordered table; we simply label the nodes
If the search argument input to Algorithm B is K_10, the algorithm makes the comparisons K > K_8, K < K_12, K = K_10. This corresponds to the path from
the root to (10) in Fig. 5. Similarly, the behavior of Algorithm B on other keys corresponds to the other paths leading from the root of the tree. The method of constructing the binary trees corresponding to Algorithm B therefore makes it easy to prove the following result by induction on N:
Theorem B. If 2^{k−1} ≤ N < 2^k, a successful search using Algorithm B requires
(min 1, max k) comparisons. If N = 2^k − 1, an unsuccessful search requires
k comparisons; and if 2^{k−1} ≤ N < 2^k − 1, an unsuccessful search requires either
k − 1 or k comparisons. |
equally likely argument; and let C'_N be the average number of comparisons in
an unsuccessful search, assuming that each of the N + 1 intervals between and
outside the extreme values of the keys is equally likely. Then we have
C_N = 1 + (internal path length of tree)/N, C'_N = (external path length of tree)/(N + 1).
Since the external path length of a binary tree is always 2N more than the internal path length, it follows that
C_N = (1 + 1/N) C'_N − 1. (2)
This formula, which is due to T. N. Hibbard [JACM 9 (1962), 16-17], holds
for all search methods that correspond to binary trees; in other words, it holds
successful-search comparisons can also be expressed in terms of the corresponding variance for unsuccessful searches (see exercise 25).
From the formulas above we can see that the “best” way to search by
trees with N internal nodes. Fortunately it can be proved that Algorithm B is
binary tree has minimum path length if and only if its external nodes all occur
tree corresponding to Algorithm B is
$$(N+1)(\lfloor \lg N \rfloor + 2) - 2^{\lfloor \lg N \rfloor + 1}.$$
(See Eq. 5.3.1–(34).) From this formula and (2) we can compute the exact average number of comparisons, assuming that all search arguments are equally probable.
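The path-length formula can be checked numerically. The following Python sketch simulates a binary search that probes position ⌊(l + u)/2⌋ on the table of keys 1, 2, ..., N, counts comparisons exactly, and compares the averages with the closed form. The function names are hypothetical, chosen only for this illustration, and the relation between the two averages is assumed to be $C_N = (1 + 1/N)C'_N - 1$.

```python
from fractions import Fraction

def comparisons(N, target):
    """Count key comparisons made by binary search on the table 1..N.

    `target` may be an integer key (successful search) or a value such
    as Fraction(5, 2) that falls between two keys (unsuccessful search).
    """
    l, u, count = 1, N, 0
    while l <= u:
        i = (l + u) // 2          # midpoint probe
        count += 1
        if target < i:
            u = i - 1
        elif target > i:
            l = i + 1
        else:
            return count
    return count

def averages(N):
    """Return (C_N, C'_N) as exact fractions."""
    succ = sum(comparisons(N, k) for k in range(1, N + 1))
    fail = sum(comparisons(N, k + Fraction(1, 2)) for k in range(0, N + 1))
    return Fraction(succ, N), Fraction(fail, N + 1)

def external_path_length(N):
    """Closed form (N+1)(floor(lg N) + 2) - 2^(floor(lg N) + 1)."""
    lg = N.bit_length() - 1       # floor(lg N) for N >= 1
    return (N + 1) * (lg + 2) - 2 ** (lg + 1)
```

For N = 16 this gives an external path length of 70, so $C'_{16} = 70/17$ and $C_{16} = 27/8$.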
search method based on comparisons can do better than this. The average running time of Program B is approximately
(18 lg N − 16)u for a successful search,
(18 lg N + 12)u for an unsuccessful search,    (5)
search, it is tempting to use only two, namely the current position i and its rate
of change, δ; after each unequal comparison, we could then set i ← i ± δ and
δ ← δ/2 (approximately). It is possible to do this, but only if extreme care
is paid to the details, as in the following algorithm. Simpler approaches are
Algorithm U (Uniform binary search). Given a table of records $R_1, R_2, \ldots, R_N$ whose keys are in increasing order $K_1 < K_2 < \cdots < K_N$, this algorithm searches
for a given argument K. If N is even, the algorithm will sometimes refer to a
dummy key $K_0$ that should be set to −∞ (or any value less than K). We assume
that N ≥ 1.
U1. [Initialize.] Set $i \leftarrow \lceil N/2 \rceil$, $m \leftarrow \lfloor N/2 \rfloor$.
U2. [Compare.] If $K < K_i$, go to U3; if $K > K_i$, go to U4; and if $K = K_i$, the algorithm terminates successfully.
U3. [Decrease i.] (We have pinpointed the search to an interval that contains either m or m − 1 records; i points just to the right of this interval.) If m = 0,
the algorithm terminates unsuccessfully. Otherwise set $i \leftarrow i - \lceil m/2 \rceil$; then
set $m \leftarrow \lfloor m/2 \rfloor$ and return to U2.
U4. [Increase i.] (We have pinpointed the search to an interval that contains either m or m − 1 records; i points just to the left of this interval.) If m = 0,
the algorithm terminates unsuccessfully. Otherwise set $i \leftarrow i + \lceil m/2 \rceil$; then
set $m \leftarrow \lfloor m/2 \rfloor$ and return to U2. |
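Steps U1 through U4 translate almost line for line into a high-level language. The following Python sketch is one possible rendering, not the MIX program of the text. It assumes a sorted list of distinct keys stored 0-based, returns 1-based positions to match the algorithm, and treats position 0 as the dummy key $K_0 = -\infty$ by an explicit index test instead of storing a sentinel.

```python
def uniform_binary_search(keys, K):
    """Algorithm U on a sorted list of distinct keys.

    Returns the 1-based position of K, or None if K is absent.
    """
    N = len(keys)
    if N == 0:
        return None
    i = (N + 1) // 2              # U1: i <- ceil(N/2)
    m = N // 2                    #     m <- floor(N/2)
    while True:
        # U2: compare K with K_i; K_0 is smaller than every argument.
        if i > 0 and K == keys[i - 1]:
            return i
        go_left = i > 0 and K < keys[i - 1]
        if m == 0:                # U3/U4: nothing left to examine
            return None
        step = (m + 1) // 2       # ceil(m/2)
        i = i - step if go_left else i + step
        m //= 2                   # m <- floor(m/2)
```

The two branches U3 and U4 differ only in the sign of the step, which is why a single loop body suffices here.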
Figure 6 shows the corresponding binary tree for the search, when N = 10.
In an unsuccessful search, the algorithm may make a redundant comparison just before termination; those nodes are shaded in the figure. We may call the search process uniform because the difference between the number of a node on level l and the number of its ancestor on level l − 1 has a constant value δ for all nodes
on level l.
The theory underlying Algorithm U can be understood as follows: Suppose
that we have an interval of length n − 1 to search; a comparison with the middle element (for n even) or with one of the two middle elements (for n odd) leaves us with two intervals of lengths $\lfloor n/2 \rfloor - 1$ and $\lceil n/2 \rceil - 1$. After repeating this process
k times, we obtain $2^k$ intervals, of which the smallest has length $\lfloor n/2^k \rfloor - 1$ and
Fig. 6. The comparison tree for a “uniform” binary search, when N = 10.
“middle” element, without keeping track of the exact lengths.
The principal advantage of Algorithm U is that we need not maintain the value of m at all; we need only refer to a short table of the various δ to use at
Algorithm C (Uniform binary search). This algorithm is just like Algorithm U, but it uses an auxiliary table in place of the calculations involving m. The table entries are
C1. [Initialize.] Set i ← DELTA[1], j ← 2.
C2. [Compare.] If $K < K_i$, go to C3; if $K > K_i$, go to C4; and if $K = K_i$, the algorithm terminates successfully.
C3. [Decrease i.] If DELTA[j] = 0, the algorithm terminates unsuccessfully. Otherwise, set i ← i − DELTA[j], j ← j + 1, and go to C2.
C4. [Increase i.] If DELTA[j] = 0, the algorithm terminates unsuccessfully. Otherwise, set i ← i + DELTA[j], j ← j + 1, and go to C2. |
Exercise 8 proves that this algorithm refers to the artificial key $K_0 = -\infty$ only when N is even.
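A Python sketch of Algorithm C follows. The table definition is an assumption on our part: we take DELTA[j] = ⌊(N + 2^(j−1)) / 2^j⌋ for j = 1, 2, ..., stopping at the first zero entry, which reproduces the successive steps ⌈m/2⌉ of Algorithm U. The names `make_delta` and `search_c` are hypothetical.

```python
def make_delta(N):
    """Build the auxiliary table; index 0 is an unused placeholder.

    Assumed definition: DELTA[j] = floor((N + 2**(j-1)) / 2**j),
    computed until the first zero entry.
    """
    delta = [0]
    j = 1
    while True:
        d = (N + (1 << (j - 1))) >> j
        delta.append(d)
        if d == 0:
            return delta
        j += 1

def search_c(keys, K, delta):
    """Algorithm C: 1-based position of K in sorted `keys`, or None.

    Requires len(keys) >= 1 and delta == make_delta(len(keys)).
    """
    i, j = delta[1], 2                       # C1
    while True:
        # C2: position 0 stands for the artificial key K_0 = -infinity.
        if i > 0 and K == keys[i - 1]:
            return i
        go_left = i > 0 and K < keys[i - 1]
        if delta[j] == 0:                    # C3/C4: table exhausted
            return None
        i = i - delta[j] if go_left else i + delta[j]
        j += 1
```

For N = 10 the table is 5, 3, 1, 1, 0. The table depends only on N, so it can be built once and reused for every search of the same file.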
Program C (Uniform binary search). This program does the same job as
In a successful search, this algorithm corresponds to a binary tree with the
same internal path length as the tree of Algorithm B, so the average number of comparisons $C_N$ is the same as before. In an unsuccessful search, Algorithm C
always makes exactly $\lfloor \lg N \rfloor + 1$ comparisons. The total running time of
Program C is not quite symmetrical between left and right branches, since C1 is weighted more heavily than C2, but exercise 11 shows that we have $K < K_i$
roughly as often as $K > K_i$; hence Program C takes approximately
$(8.5\lfloor \lg N \rfloor + 12)u$ for an unsuccessful search.
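be still faster on some computers, because it is uniform after the first step, and
it requires no table. The first step is to compare K with $K_i$, where $i = 2^k$, $k = \lfloor \lg N \rfloor$. If $K < K_i$, we use a uniform search with the δ's equal to $2^{k-1}, 2^{k-2}, \ldots, 1, 0$. On the other hand, if $K > K_i$ we reset i to $i' = N + 1 - 2^l$, where $l = \lceil \lg(N - 2^k + 1) \rceil$, and pretend that the first comparison was actually
$K > K_{i'}$, using a uniform search with the δ's equal to $2^{l-1}, 2^{l-2}, \ldots, 1, 0$.
algorithms, it never makes more than $\lfloor \lg N \rfloor + 1$ comparisons; hence it makes
in spite of the fact that it occasionally goes through several redundant steps in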
Still another modification of binary search, which increases the speed of all
also exercise 24, for a method that is faster yet.
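The table-free method described earlier, which starts with a comparison against position $2^k$ and then proceeds uniformly with power-of-two steps, can be sketched in Python as follows. This is an illustrative rendering under our own conventions (0-based list, 1-based result, hypothetical function name), and the value of l used after a rightward first step is taken to be ⌈lg(N − 2^k + 1)⌉.

```python
def shar_search(keys, K):
    """Search sorted distinct `keys` using power-of-two steps only.

    Returns the 1-based position of K, or None if K is absent.
    """
    N = len(keys)
    if N == 0:
        return None
    k = N.bit_length() - 1            # floor(lg N)
    i = 1 << k                        # first probe at position 2^k
    if K == keys[i - 1]:
        return i
    if K < keys[i - 1]:
        delta, go_right = i >> 1, False
    else:
        rest = N - i + 1              # N - 2^k + 1
        l = (rest - 1).bit_length()   # ceil(lg rest)
        i = N + 1 - (1 << l)          # pretend we compared with K_i'
        delta, go_right = (1 << l) >> 1, True
    while delta >= 1:
        i = i + delta if go_right else i - delta
        if K == keys[i - 1]:
            return i
        go_right = K > keys[i - 1]
        delta >>= 1                   # halve the step exactly
    return None
```

Because every step is a power of two, the halving is an exact shift and no rounding bookkeeping is needed.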
occurs in searching, where Fibonacci numbers provide us with an alternative to binary search. The resulting method is preferable on some computers, because it involves only addition and subtraction, not division by 2. The procedure we are
called “Fibonacci search,” which is used to locate the maximum of a unimodal
function [see Fibonacci Quarterly 4 (1966), 265–269]; the similarity of names
has led to some confusion.
The Fibonaccian search technique looks very mysterious at first glance, if
we simply take the program and try to explain what is happening; it seems to
tree is displayed. Therefore we shall begin our study of the method by looking
at Fibonacci trees.
Figure 8 shows the Fibonacci tree of order 6. It looks somewhat more like
a real-life shrub than the other trees we have been considering, perhaps because
many natural processes satisfy a Fibonacci law. In general, the Fibonacci tree of order k has $F_{k+1} - 1$ internal (circular) nodes and $F_{k+1}$ external (square) nodes,
and it is constructed as follows:
If k = 0 or k = 1, the tree is simply [0].
If k ≥ 2, the root is $(F_k)$; the left subtree is the Fibonacci tree of order k − 1;
and the right subtree is the Fibonacci tree of order k − 2 with all numbers
increased by $F_k$.
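The recursive construction just given is easy to carry out mechanically. The Python sketch below builds the tree as nested tuples and counts its nodes, so the node counts stated above can be checked for small k. The tuple representation and the names `fib`, `fibonacci_tree`, and `count_nodes` are our own choices for illustration.

```python
def fib(n):
    """Return F_n, with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fibonacci_tree(k, offset=0):
    """Return the Fibonacci tree of order k, all labels raised by `offset`.

    An internal node is (label, left, right); an external node is the
    one-element tuple (label,).
    """
    if k <= 1:
        return (offset,)
    root = fib(k) + offset
    left = fibonacci_tree(k - 1, offset)
    right = fibonacci_tree(k - 2, root)   # numbers increased by F_k
    return (root, left, right)

def count_nodes(tree):
    """Return (number of internal nodes, number of external nodes)."""
    if len(tree) == 1:
        return (0, 1)
    li, le = count_nodes(tree[1])
    ri, re = count_nodes(tree[2])
    return (1 + li + ri, le + re)
```

For order 6 the root is 8 and its two children are 5 and 11.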
is a Fibonacci number. For example, $5 = 8 - F_4$ and $11 = 8 + F_4$ in Fig. 8. When the difference is $F_j$, the corresponding Fibonacci difference for the next
$3 = 5 - F_3$ while $10 = 11 - F_2$.
recognizing the external nodes, we arrive at the following method:
Algorithm F (Fibonaccian search). Given a table of records $R_1, R_2, \ldots, R_N$ whose keys are in increasing order $K_1 < K_2 < \cdots < K_N$, this algorithm searches
for a given argument K.
For convenience in description, we assume that N + 1 is a perfect Fibonacci number, $F_{k+1}$.
suitable initialization is provided (see exercise 14).
F1. [Initialize.] Set $i \leftarrow F_k$, $p \leftarrow F_{k-1}$, $q \leftarrow F_{k-2}$. (Throughout the algorithm, p and q will be consecutive Fibonacci numbers.)
F2. [Compare.] If $K < K_i$, go to step F3; if $K > K_i$, go to F4; and if $K = K_i$, the algorithm terminates successfully.
F3. [Decrease i.] If q = 0, the algorithm terminates unsuccessfully. Otherwise
set i ← i − q, and set (p, q) ← (q, p − q); then return to F2.
F4. [Increase i.] If p = 1, the algorithm terminates unsuccessfully. Otherwise
set i ← i + q, p ← p − q, then q ← q − p, and return to F2. |
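The steps of Algorithm F can be sketched in Python as follows. The sketch assumes, as the algorithm does, that N + 1 is a Fibonacci number $F_{k+1}$; it raises an error otherwise, since the initialization for general N is left to exercise 14. The function name and the 1-based return convention are our own.

```python
def fibonaccian_search(keys, K):
    """Algorithm F on sorted distinct keys with len(keys) + 1 = F_{k+1}.

    Returns the 1-based position of K, or None if K is absent.
    """
    N = len(keys)
    # Find k >= 2 with F_{k+1} = N + 1, tracking F_{k-2}, F_{k-1}, F_k.
    q, p, i, nxt = 0, 1, 1, 2
    while nxt < N + 1:
        q, p, i, nxt = p, i, nxt, i + nxt
    if nxt != N + 1:
        raise ValueError("N + 1 must be a Fibonacci number")
    while True:                      # F2: compare
        if K == keys[i - 1]:
            return i
        if K < keys[i - 1]:          # F3: decrease i
            if q == 0:
                return None
            i -= q
            p, q = q, p - q
        else:                        # F4: increase i
            if p == 1:
                return None
            i += q
            p -= q
            q -= p
```

Only additions and subtractions appear in the loop, which is the property that makes the method attractive on machines without a fast shift or divide.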
The following MIX implementation gains speed by making two copies of the inner loop, one in which p is in rI2 and q in rI3, and one in which the registers are reversed; this simplifies step F3. In fact, the program actually keeps p − 1 and
q − 1 in the registers, instead of p and q, in order to simplify the test “p = 1?”
in step F4.
Program F (Fibonaccian search). We follow the previous conventions, with
rA ≡ K, rI1 ≡ i, (rI2 or rI3) ≡ p − 1, (rI3 or rI2) ≡ q − 1.
14  F3A  DEC1  1,3    C1    F3. Decrease i. i ← i − q.
15       DEC2  1,3    C1    p ← p − q.
(Lines 18–29 are parallel to 06–17.)
The running time of this program is analyzed in exercise 18. Figure 8 shows,
and the analysis proves, that a left branch is taken somewhat more often than a right branch. Let C, C1, and (C2 − S) be the respective number of times steps F2, F3, and F4 are performed. Then we have
$C = (\text{ave } \phi k/\sqrt{5} + O(1),\ \max k - 1)$,
$C2 - S = (\text{ave } \phi^{-1} k/\sqrt{5} + O(1),\ \max \lfloor k/2 \rfloor)$.
interval into two parts, with the left part about φ times as large as the right). The total average running time of Program F therefore comes to approximately
$$\tfrac{1}{5}\bigl((18 + 4\phi)k + 31 - 26\phi\bigr)u \approx (7.050 \lg N + 1.08)u \qquad (9)$$
for a successful search, plus $(9 - 3\phi)u \approx 4.15u$ for an unsuccessful search. This is
is slightly slower
people actually carry out a search. Sometimes everyday life provides us with clues that lead to good algorithms.
begin by looking first at the middle page, then looking at the 1/4 or 3/4 point,
etc., as a binary search. It's even less likely that you use a Fibonaccian search!
front of the dictionary. In fact, many dictionaries have thumb indexes that show
the starting page or the middle page for the words beginning with a fixed letter
speed up the search; such algorithms are explored in Section 6.3.
Yet even after the initial point of search has been found, your actions still
word is alphabetically much greater than the words on the page being examined,
This is quite different from the algorithms above, which make no distinction
Such considerations suggest an algorithm that might be called interpolation search: When we know that K lies between $K_l$ and $K_u$, we can choose the next
that the keys are numeric and that they increase in a roughly constant manner
throughout the interval.
Interpolation search is asymptotically superior to binary search. One step of binary search essentially reduces the amount of uncertainty from n to
$\frac{1}{2}n$, while one step of interpolation search essentially reduces it to $\sqrt{n}$, when the keys in the table are randomly distributed. Hence interpolation search takes about lg lg N steps, on the average, to reduce the uncertainty from N to 2. (See exercise 22.)
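A minimal Python sketch of interpolation search follows. The probe rule is an assumption on our part, the usual linear interpolation between the two ends of the current interval; the text's own formula for the next probe is not reproduced here. Integer keys and 0-based positions are assumed, and the function name is hypothetical.

```python
def interpolation_search(keys, K):
    """Search sorted distinct integer keys by linear interpolation.

    Returns the 0-based position of K, or None if K is absent.
    """
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= K <= keys[hi]:
        if keys[hi] == keys[lo]:              # single remaining key
            return lo if keys[lo] == K else None
        # Assumed probe rule: estimate where K would sit if the keys
        # grew at a constant rate between positions lo and hi.
        pos = lo + (K - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == K:
            return pos
        if keys[pos] < K:
            lo = pos + 1
        else:
            hi = pos - 1
    return None
```

When the keys grow at a nearly constant rate the first probe lands very close to the target, but on badly skewed data this rule can degrade to examining the keys one at a time.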
does not decrease the number of comparisons enough to compensate for the extra computing time involved, unless the table is rather large. Typical files
aren't sufficiently random, and the difference between lg lg N and lg N is not substantial unless N exceeds, say, $2^{16} = 65{,}536$. Interpolation is most successful
in the early stages of searching a large possibly external file; after the range has
dictionary lookup by hand is essentially an external, not an internal, search. We
shall discuss external searching later.)
that was sorted into order to facilitate searching is the remarkable Babylonian
reciprocal table of Inakibit-Anu, dating from about 200 B.C. This clay tablet contains more than 100 pairs of values, which appear to be the beginning of
a list of approximately 500 multiple-precision sexagesimal numbers and their reciprocals, sorted into lexicographic order. For example, the list included the following sequence of entries:
puter Science (Cambridge Univ. Press, 1996), Chapter 11, for further details.]
It is fairly natural to sort numerical values into order, but an order relation
sequence for individual letters was present already in the most ancient
alphabetic sequence, the first verse starting with aleph, the second with beth,
was used by Semitic and Greek peoples to denote numerals; for example, a, 7
The use of alphabetic order for entire words seems to be a much later
invention; it is something we might think is obvious, yet it has to be taught
to children, and at some point in history it was necessary to teach it to adults. Several lists from about 300 B.C. have been found on the Aegean Islands, giving
but only by the first letter, thus representing only the first pass of a
left-to-right radix sort. Some Greek papyri from the years A.D. 134–135 contain fragments of ledgers that show the names of taxpayers alphabetized by the first two letters. Apollonius Sophista used alphabetic order on the first two letters, and often on subsequent letters, in his lengthy concordance of Homer's poetry
notably Galen's Hippocratic Glosses (c. 200), but they are very rare. Words were arranged by their first letter only in the Etymologiarum of St. Isidorus (c. 630,
word. The latter two works were perhaps the largest nonnumerical files of data
to be compiled during the Middle Ages.
description of true alphabetical order. In his preface, Giovanni explained that
amo precedes bibo
abeo precedes adeo
polisintheton precedes polissenus
(thereby giving examples of situations in which the ordering is determined by the
effort was required to devise these rules. “I beg of you, therefore, good reader,
do not scorn this great labor of mine and this order as something worthless.”
A detailed study of the development of alphabetic order, up to the time printing was invented, has been made by Lloyd W. Daly
[Collection Latomus
90 (1967), 100 pages]. He found some interesting old manuscripts that were evidently used as worksheets while sorting words by their first letters (see pages
(London, 1604), contains the following instructions:
Nowe if the word, which thou art desirous to finde, beginne with (a) then looke in the beginning of this Table, but if with (v) looke towards the end. Againe, if thy word beginne with (ca) looke in the beginning of the letter (c) but if with (cu) then looke toward the end of that letter. And so of all
the rest. &c.
his dictionary; numerous misplaced words appear on the first few pages, but the
Techniques for the Design of Electronic Digital Computers, edited by G. W. Patterson, 1 (1946), 9.7–9.8; 3 (1946), 22.8–22.9]. The method became well known
176 (1955), 565; A. I. Dumey, Computers and Automation 5 (December 1956), 7,
(February 1958), 1–3.]
D. H. Lehmer [Proc. Symp. Appl. Math. 10 (1960), 180–181] was apparently
step was taken by H. Bottenbruch [JACM 9 (1962), 214], who presented an
interesting variation of Algorithm B that avoids a separate test for equality until
the very end: Using
$i \leftarrow \lceil (l + u)/2 \rceil$
instead of $i \leftarrow \lfloor (l + u)/2 \rfloor$ in step B2, he set l ← i whenever $K \ge K_i$; then
u − l decreases at every step. Eventually, when l = u, we have $K_l \le K < K_{l+1}$,
comparison. (He assumed that $K \ge K_1$ initially.) This idea speeds up the inner
of the algorithms we have discussed in this section; but a successful search will
require about one more iteration, on the average, because of (2). Since the inner
and a faster loop does not save time unless n is extremely large. (See exercise 23.)
On the other hand Bottenbruch's algorithm will find the rightmost occurrence of
a given key when the table contains duplicates, and this property is occasionally important.
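Bottenbruch's variation is short enough to sketch directly. The Python version below follows the description above: it probes at ⌈(l + u)/2⌉, moves l up whenever $K \ge K_i$, and tests for equality only once, after the loop. The function name and the explicit check that K is not below the first key are our own additions.

```python
def bottenbruch_search(keys, K):
    """Binary search with a two-way branch and one final equality test.

    `keys` is sorted in nondecreasing order (duplicates allowed).
    Returns the 1-based position of the rightmost occurrence of K,
    or None if K is absent.
    """
    N = len(keys)
    if N == 0 or K < keys[0]:         # the method assumes K >= K_1
        return None
    l, u = 1, N
    while l < u:
        i = (l + u + 1) // 2          # ceil((l + u) / 2)
        if K >= keys[i - 1]:
            l = i                     # K_l <= K stays true
        else:
            u = i - 1                 # K < K_{u+1} stays true
    return l if keys[l - 1] == K else None
```

With the table 1, 3, 3, 3, 7 a search for 3 returns position 4, the rightmost of the three equal keys.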
K. E. Iverson [A Programming Language (Wiley, 1962), 141] gave the
procedure of Algorithm B, but without considering the possibility of an unsuccessful search. D. E. Knuth [CACM 6 (1963), 556–558] presented Algorithm B as
search, Algorithm C, was suggested to the author by A. K. Chandra of Stanford University in 1971.
Fibonaccian searching was invented by David E. Ferguson [CACM 3 (1960),
A Fibonacci tree without labels was also exhibited as a curiosity in the first
edition of Hugo Steinhaus's popular book Mathematical Snapshots (New York: Stechert, 1938), page 28; he drew it upside down and made it look like a real
Interpolation searching was suggested by W. W. Peterson [IBM J. Res. &
Devel. 1 (1957), 131–132]. A correct analysis of its average behavior was not
1. [21] Prove that if u < l in step B2 of the binary search, we have u = l − 1 and
these artificial keys are never really used by the algorithm so they need not be present
in the actual table.)
(a) changed step B5 to “l ← i” instead of “l ← i + 1”? (b) changed step B4 to “u ← i” instead of “u ← i − 1”? (c) made both of these changes?
3. [15] What searching method corresponds to the tree
What is the average number of comparisons made in a successful search? in an unsuccessful search?
has length less than some judiciously chosen value. Write an efficient MIX program for
such a search and determine the best changeover value.
7
a) both i and m are set equal to $\lfloor N/2 \rfloor$?
b) both i and m are set equal to $\lceil N/2 \rceil$?
[Hint: Suppose the first step were “Set i ← 0, m ← N (or N + 1), go to U4.”]
8. [M20] Let $\delta_j$ = DELTA[j] be the jth increment in Algorithm C, as defined in (6).
b) What are the minimum and maximum values of i that can occur in step C2?
9. [20] Is there any value of N > 1 for which Algorithm B and C are exactly
equivalent, in the sense that they will both perform the same sequence of comparisons
for all search arguments?
10. [21] Explain how to write a MIX program for Algorithm C containing approximately 7 lg N instructions and having a running time of about 4.5 lg N units.
11. [M26] Find exact formulas for the average values of C1, C2, and A in the
frequency analysis of Program C, as a function of N and S.
12. [20] Draw the binary search tree corresponding to Shar's method when N = 12.
13. [M24] Tabulate the average number of comparisons made by Shar's method, for
1 ≤ N ≤ 16, considering both successful and unsuccessful searches.
14. [21] Explain how to extend Algorithm F so that it will apply for all N ≥ 1.
15. [M19] For what values of k does the Fibonacci tree of order k define an optimal
16. [21] Figure 9 shows the lineal chart of the rabbits in Fibonacci's original rabbit problem (see Section 1.2.8). Is there a simple relationship between this and the Fibonacci tree discussed in the text?
where r ≥ 1, $a_j \ge a_{j+1} + 2$ for 1 ≤ j < r, and $a_r \ge 2$. Prove that in the Fibonacci tree
of order k, the path from the root to node (n) has length $k + 1 - r - a_r$.
18. [M30] Find exact formulas for the average values of C1, C2, and A in the frequency analysis of Program F, as a function of k, $F_k$, $F_{k+1}$, and S.
19. [M42] Carry out a detailed analysis of the average running time of the algorithm suggested in exercise 14.
20. [M22] The number of comparisons required in a binary search is approximately
$\log_2 N$, and in the Fibonaccian search it is roughly $(\phi/\sqrt{5})\log_\phi N$. The purpose of this exercise is to show that these formulas are special cases of a more general result.
Let p and q be positive numbers with p + q = 1. Consider a search algorithm that,
given a table of N numbers in increasing order, starts by comparing the argument with the (pN)th key, and iterates this procedure on the smaller blocks. (The binary search has p = q = 1/2; the Fibonaccian search has $p = 1/\phi$, $q = 1/\phi^2$.)
fact that pN and qN aren't exactly integers.
a) Show that $C(N) = \log_b N$ satisfies these relations exactly, for a certain choice of b. For binary and Fibonaccian search, this value of b agrees with the formulas derived
earlier.
b) Consider the following argument: “With probability p, the size of the interval
being scanned in this algorithm is divided by 1/p; with probability q, the interval size is divided by 1/q. Therefore the interval is divided by p·(1/p) + q·(1/q) = 2
on the average, so the algorithm is exactly as good as the binary search, regardless
21. [20] Draw the binary tree corresponding to interpolation search when N = 10.
22. [M41] (A. C. Yao and F. F. Yao.) Show that an appropriate formulation of interpolation search requires asymptotically lg lg N comparisons, on the average, when
applied to N independent uniform random keys that have been sorted. Furthermore all search algorithms on such tables must make asymptotically lg lg N comparisons, on the average.
23. [25] The binary search algorithm of H. Bottenbruch, mentioned at the close of this section, avoids testing for equality until the very end of the search. (During the algorithm we know that $K_l \le K < K_{u+1}$, and the case of equality is not examined
until l = u.) Such a trick would make Program B run a little bit faster for large N,
since the “JE” instruction could be removed from the inner loop. (However, the idea wouldn't really be practical since lg N is always rather small; we would need $N > 2^{66}$
in order to compensate for the extra work necessary on a successful search, because the running time (18 lg N − 16)u of (5) is “decreased” to (17.5 lg N + 17)u!)
Show that every search algorithm corresponding to a binary tree can be adapted to
a search algorithm that uses two-way branching (< versus ≥) at the internal nodes of
the tree, in place of the three-way branching (<, =, or >) used in the text's discussion.
In particular, show how to modify Algorithm C in this way.
24. [23] We have seen in Sections 2.3.4.5 and 5.2.3 that the complete binary tree is
a convenient way to represent a minimum-path-length tree in consecutive locations.
Devise an efficient search method based on this representation. [Hint: Is it possible to
use multiplication by 2 instead of division by 2 in a binary search?]
25. [M25] Suppose that a binary tree has $a_k$ internal nodes and $b_k$ external nodes
$(a_0, a_1, \ldots, a_5) = (1, 2, 4, 4, 1, 0)$ and $(b_0, b_1, \ldots, b_5) = (0, 0, 0, 4, 7, 2)$.
a) Show that a simple algebraic relationship holds between the generating functions $A(z) = \sum_k a_k z^k$ and $B(z) = \sum_k b_k z^k$.
b) The probability distribution for a successful search in a binary tree has the generating function g(z) = zA(z)/N, and for an unsuccessful search the generating function is h(z) = B(z)/(N + 1). (Thus in the text's notation we have $C_N$ =
mean(g), $C'_N$ = mean(h), and Eq. (2) gives a relation between these quantities.)
Find a relation between var(g) and var(h).
26. [22] Show that Fibonacci trees are related to polyphase merge sorting on three
Prove that such a process must always take at least approximately $\log_{k+1} N$ steps
on the average, as N → ∞, assuming that each key of the table is equally likely as a search argument. (Hence the potential increase in speed over 1-processor binary search
is only a factor of lg(k + 1), not the factor of k we might expect. In this sense it is more efficient to assign each processor to a different, independent search problem, instead of
28. [M23] Define Thue trees $T_n$ by means of algebraic expressions in a binary
operator * as follows: $T_0(x) = x * x$, $T_1(x) = x$, $T_{n+2}(x) = T_{n+1}(x) * T_n(x)$.
out in full. Express this number in terms of Fibonacci numbers.
b) Prove that if the binary operator * satisfies the axiom
((x * x) * x) * ((x * x) * x) = x,
then $T_m(T_n(x)) = T_{m+n-1}(x)$ for all m ≥ 0 and n ≥ 1.
29. [22] (Paul Feldman, 1985.) Instead of assuming that $K_1 < K_2 < \cdots < K_N$,
assume only that $K_{p(1)} < K_{p(2)} < \cdots < K_{p(N)}$ where the permutation p(1) p(2) ... p(N)
is an involution, and p(j) = j for all even values of j. Show that we can locate any given key K, or determine that K is not present, by making at most $2\lfloor \lg N \rfloor + 1$ comparisons.
30. [27] (Involution coding.) Using the idea of the previous exercise, find a way to
arrange N distinct keys in such a way that their relative order implicitly encodes an
arbitrarily given array of t-bit numbers $x_1, x_2, \ldots, x_m$, when $m \le N/4 + 1 - 2^t$.
With your arrangement it should be possible to determine the leading k bits of $x_j$ by
$\le 2\lfloor \lg N \rfloor + 1$ comparisons. (This result is used in theoretical studies of data structures
that are asymptotically efficient in both time and space.)
6.2.2. Binary Tree Searching
In the preceding section, we learned that an implicit binary tree structure makes
the behavior of binary search and Fibonaccian search easier to understand. For a given value of N, the tree corresponding to binary search achieves the theoretical
minimum number of comparisons that are necessary to search a table by means
of key comparisons. But the methods of the preceding section are appropriate
insertions and deletions rather expensive. If the table is changing dynamically,
The use of an explicit binary tree structure makes it possible to insert and
delete records quickly, as well as to search the table efficiently. As a result, we
essentially have a method that is useful both for searching and for sorting. This gain in flexibility is achieved by adding two link fields to each record of the table.
Techniques for searching a growing table are often called symbol table
algorithms, because assemblers and compilers and other system routines generally
each record within a compiler might be a symbolic identifier denoting a variable
in some FORTRAN or C program, and the rest of the record might contain information about the type of that variable and its storage allocation. Or the key
equivalent of that symbol. The tree search and insertion routines to be described
in this section are quite efficient for use as symbol table algorithms, especially in
applications where it is desirable to print out a list of the symbols in alphabetic order. Other symbol table algorithms are described in Sections 6.3 and 6.4.
Figure 10 shows a binary search tree containing the names of eleven signs of
Fig. 10. A binary search tree.
root or apex of the tree, we find it is greater than CAPRICORN, so we move to the
right; it is greater than PISCES, so we move right again; it is less than TAURUS, so
we move left; and it is less than SCORPIO, so we arrive at external node [8].
VIRGO, LIBRA, SCORPIO, in this order.
All of the keys in the left subtree of the root in Fig. 10 are alphabetically
A similar statement holds for the left and right subtrees of every node. It follows that the keys appear in strict alphabetic sequence from left to right,
if we traverse the tree in symmetric order (see Section 2.3.1), since symmetric
order is based on traversing the left subtree of each node just before that node, then traversing the right subtree.
The following algorithm spells out the searching and insertion processes in detail.
Algorithm T (Tree search and insertion). Given a table of records that form a binary tree as described above, this algorithm searches for a given argument K.
If K is not in the table, a new node containing K is inserted into the tree in the
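The nodes of the tree are assumed to contain at least the following fields:
KEY(P) = key stored in NODE(P);
LLINK(P) = pointer to left subtree of NODE(P);
RLINK(P) = pointer to right subtree of NODE(P).
Null subtrees (the external nodes in Fig. 10) are represented by the null pointer Λ.
The variable ROOT points to the root of the tree. For convenience, we assume
that the tree is not empty (that is, ROOT ≠ Λ), since the necessary operations are trivial when ROOT = Λ.
T1. [Initialize.] Set P ← ROOT. (The pointer variable P will move down the tree.)
T2. [Compare.] If K < KEY(P), go to T3; if K > KEY(P), go to T4; and if
K = KEY(P), the search terminates successfully.
T3. [Move left.] If LLINK(P) ≠ Λ, set P ← LLINK(P) and go back to T2.
Otherwise go to T5.
T4. [Move right.] If RLINK(P) ≠ Λ, set P ← RLINK(P) and go back to T2.
Otherwise go to T5.
T5. [Insert into tree.] (The search is unsuccessful; we will now put K into the tree.) Set Q ⇐ AVAIL, the address of a new node. Set KEY(Q) ← K, LLINK(Q) ← RLINK(Q) ← Λ. (In practice, other fields of the new node
should also be initialized.) If K was less than KEY(P), set LLINK(P) ← Q, otherwise set RLINK(P) ← Q. (At this point we could set P ← Q and
terminate the algorithm successfully.) |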
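Algorithm T can be sketched in Python with a small node class whose attributes mirror the KEY, LLINK, and RLINK fields; Python's `None` plays the role of the null pointer. The class name, the function name, and the returned comparison count are our own additions for illustration, and the standard move-left, move-right, and insert steps are assumed for T3 through T5.

```python
class Node:
    """One tree node with the three fields the algorithm requires."""
    def __init__(self, key):
        self.key = key
        self.llink = None     # None stands for the null pointer
        self.rlink = None

def tree_search_insert(root, K):
    """Search the non-empty tree at `root` for K; insert K if absent.

    Returns (node, found, comparisons), where `node` holds K.
    """
    p = root                                  # T1: initialize
    comparisons = 0
    while True:
        comparisons += 1                      # T2: compare
        if K == p.key:
            return p, True, comparisons
        if K < p.key:                         # T3: move left
            if p.llink is None:
                p.llink = q = Node(K)         # T5: insert new node
                return q, False, comparisons
            p = p.llink
        else:                                 # T4: move right
            if p.rlink is None:
                p.rlink = q = Node(K)         # T5: insert new node
                return q, False, comparisons
            p = p.rlink
```

Building a tree by calling this function once per key, starting from `Node(first_key)`, reproduces the insertion process that the later analysis studies.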
Fig. 11. Tree search and insertion.
This algorithm lends itself to a convenient machine language
KEY
followed perhaps by additional words of INFO. Using an AVAIL list for the free
Program T (Tree search and insertion). rA ≡ K, rI1 ≡ P, rI2 ≡ Q.
12       LD2  0,1(LLINK)    C1 − S       T3. Move left. Q ← LLINK(P).
23  1H   ST2  0,1(LLINK)    1 − S − A    LLINK(P) ← Q.
where
S = [search is successful]
On the average we have $C1 = \frac{1}{2}(C + S)$, since C1 + C2 = C and C1 − S has the same probability distribution as C2; so the running time is about (7.5C −
an implicit tree (see Program 6.2.1C). By duplicating the code as in Program 6.2.1F we could effectively eliminate line 08 of Program T, reducing the running time to (6.5C − 2.5S + 5)u. If the search is unsuccessful, the insertion phase of
variable-length records. For example, if we allocate the available space sequentially,
in a last-in-first-out manner, we can easily create nodes of varying size; the first word of (1) could indicate the size. Since this is an efficient use of storage,
But what about the worst case? Programmers are often skeptical of Algorithm T when they first see it. If the keys of Fig. 10 had been entered into the tree in alphabetic order AQUARIUS, ..., VIRGO instead of the calendar order
essentially specifies a sequential search. All LLINKs would be null. Similarly, if the keys come in the uncommon order
we obtain a “zigzag” tree that is just as bad. (Try it!)
On the other hand, the particular tree in Fig. 10 requires only $3\frac{2}{11}$
comparisons, on the average, for a successful search; this is just a little higher than
the minimum possible average number of comparisons, 3, achievable in the best possible binary tree.
When we have a fairly balanced tree, the search time is roughly proportional to log N, but when we have a degenerate tree, the search time is roughly proportional to N. Exercise 2.3.4.5–5 proves that the average search time would
be roughly proportional to $\sqrt{N}$ if we considered each N-node binary tree to be
equally likely. What behavior can we really expect from Algorithm T?
Fortunately, it turns out that tree search will require only about $2 \ln N \approx
1.386 \lg N$ comparisons, if the keys are inserted into the tree in random order; well-balanced trees are common, and degenerate trees are very rare.
the N! possible orderings of the N keys is an equally likely sequence of insertions
for building the tree. The number of comparisons needed to find a key is exactly
entered into the tree. Therefore if $C_N$ is the average number of comparisons involved in a successful search and $C'_N$ is the average number in an unsuccessful search, we have
But the relation between internal and external path length tells us that
this is Eq. 6.2.1–(2). Putting (3) together with (2
Since $C'_0 = 0$, this means that
$$C'_N = 2H_{N+1} - 2. \qquad (5)$$
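The closed form for $C'_N$ can be confirmed by brute force for small N. The following Python sketch builds the tree for every one of the N! insertion orders, averages the external path length divided by N + 1, and compares the result with $2H_{N+1} - 2$ in exact arithmetic. The helper names are hypothetical, and the check relies on the fact that external path length exceeds internal path length by 2N.

```python
from fractions import Fraction
from itertools import permutations

def internal_path_length(order):
    """Insert keys in the given order; return the sum of node depths."""
    left, right = {}, {}
    root, total = order[0], 0
    for key in order[1:]:
        p, depth = root, 0
        while True:
            child = left if key < p else right
            depth += 1
            if p not in child:
                child[p] = key        # attach the new node below p
                total += depth
                break
            p = child[p]
    return total

def average_unsuccessful(N):
    """Exact C'_N, averaged over all N! insertion orders (N >= 1)."""
    perms = list(permutations(range(N)))
    ext = sum(internal_path_length(p) + 2 * N for p in perms)
    return Fraction(ext, len(perms) * (N + 1))

def harmonic(n):
    """Return H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, j) for j in range(1, n + 1))
```

For N = 3 both sides equal 13/6. The enumeration grows as N!, so it is practical only for small N.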
Exercises 6, 7, and 8 below give more detailed information; it is possible to
values
Tree insertion sorting. Algorithm T was developed for searching, but it can also be used as the basis of an internal sorting algorithm; in fact, we can view
it as a natural generalization of list insertion, Algorithm 5.2.1L. When properly
best algorithms we discussed in Chapter 5. After the tree has been constructed
for all keys, a symmetric tree traversal (Algorithm 2.3.1T) will visit the records
in sorted order.
A few precautions are necessary, however. Something different needs to be
solution is to treat K = KEY(P) exactly as if K > KEY(P); this leads to a stable sorting method. (Equal keys will not necessarily be adjacent in the tree; they will
only be adjacent in symmetric order.) But if many duplicate keys are present,
the same key; this requires another link field, but it will make the sorting faster when a lot of equal keys occur.
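Tree insertion sorting with the simple rule for equal keys can be sketched as follows. The Python function below inserts each record into a binary search tree, sending a record to the right whenever its key is not less than the key at the current node, and then reads the records off in symmetric order. The list-based node layout and the `key` parameter are our own conventions, not part of the text.

```python
def tree_insertion_sort(records, key=lambda r: r):
    """Sort `records` by tree insertion; equal keys keep their order.

    Each node is a list [record, left, right]; None is the null link.
    """
    if not records:
        return []
    root = [records[0], None, None]
    for rec in records[1:]:
        p = root
        while True:
            # Equal keys are treated as "greater", so they go right.
            side = 1 if key(rec) < key(p[0]) else 2
            if p[side] is None:
                p[side] = [rec, None, None]
                break
            p = p[side]
    # Symmetric-order traversal with an explicit stack.
    out, stack, p = [], [], root
    while stack or p is not None:
        while p is not None:
            stack.append(p)
            p = p[1]
        p = stack.pop()
        out.append(p[0])
        p = p[2]
    return out
```

The explicit stack avoids deep recursion when the input is already in order and the tree degenerates into a chain.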
Thus if we are interested only in sorting, not in searching, Algorithm T isn't
the best, but it isn't bad. And if we have an application that combines searching
It is interesting to note that there is a strong relation between the analysis
of tree insertion sorting and the analysis of quicksort, although the methods
are superficially dissimilar. If we successively insert N keys into an initially
every key gets compared with $K_1$, and then every key less than $K_1$ gets compared
with the first key less than $K_1$, etc.; in quicksort, every key gets compared to
to a particular element less than K, etc. The average number of comparisons
a few more comparisons, in order to speed up the inner loops.)
entries it knows. We can easily delete a node in which either LLINK or RLINK = Λ;