Volume 1 / Fundamental Algorithms
Third Edition (0-201-89683-4)
This first volume begins with basic programming concepts and techniques, then focuses on information structures: the representation of information inside a computer, the structural relationships between data elements and how to deal with them efficiently. Elementary applications are given to simulation, numerical methods, symbolic computing, software and system design.
Volume 2 / Seminumerical Algorithms
Third Edition (0-201-89684-2)
The second volume offers a complete introduction to the field of seminumerical algorithms, with separate chapters on random numbers and arithmetic. The book summarizes the major paradigms and basic theory of such algorithms, thereby providing a comprehensive interface between computer programming and numerical analysis.
Volume 3 / Sorting and Searching
Second Edition (0-201-89685-0)
The third volume comprises the most comprehensive survey of classical computer techniques for sorting and searching. It extends the treatment of data structures in Volume 1 to consider both large and small databases and internal and external memories.
Volume 4A / Combinatorial Algorithms, Part 1 (0-201-03804-8)
This volume introduces techniques that allow computers to deal efficiently with gigantic problems. Its coverage begins with Boolean functions and bitwise tricks and techniques, then treats in depth the generation of all tuples and permutations, all combinations
THE ART OF
COMPUTER PROGRAMMING
SECOND EDITION
Upper Saddle River, NJ • Boston • Indianapolis • San Francisco
New York • Toronto • Montreal • London • Munich • Paris • Madrid
TeX is a trademark of the American Mathematical Society
METAFONT is a trademark of Addison-Wesley
The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.
The publisher offers excellent discounts on this book when ordered in quantity for bulk purposes or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact:
U.S. Corporate and Government Sales (800) 382-3419
corpsales@pearsontechgroup.com
For sales outside the U.S., please contact:
International Sales international@pearsoned.com
Visit us on the Web: informit.com/aw
Library of Congress Cataloging-in-Publication Data
Knuth, Donald Ervin, 1938-
The art of computer programming / Donald Ervin Knuth.
xiv, 782 p. 24 cm.
Includes bibliographical references and index.
Contents: v. 1. Fundamental algorithms -- v. 2. Seminumerical algorithms -- v. 3. Sorting and searching -- v. 4a. Combinatorial algorithms.
Copyright © 1998 by Addison-Wesley
All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to:
Pearson Education, Inc.
Rights and Contracts Department
501 Boylston Street, Suite 900
a noble science; cooks are gentlemen.
— TITUS LIVIUS, Ab Urbe Condita XXXIX.vi (Robert Burton, Anatomy of Melancholy 1.2.2.2)
This book forms a natural sequel to the material on information structures in Chapter 2 of Volume 1, because it adds the concept of linearly ordered data to the other basic structural ideas.
The title “Sorting and Searching” may sound as if this book is only for those systems programmers who are concerned with the preparation of general-purpose sorting routines or applications to information retrieval. But in fact the area of sorting and searching provides an ideal framework for discussing a wide variety of important general issues:
• How are good algorithms discovered?
• How can given algorithms and programs be improved?
• How can the efficiency of algorithms be analyzed mathematically?
• How can a person choose rationally between different algorithms for the same task?
• In what senses can algorithms be proved “best possible”?
• How does the theory of computing interact with practical considerations?
• How can external memories like tapes, drums, or disks be used efficiently with large databases?
Indeed, I believe that virtually every important aspect of programming arises somewhere in the context of sorting or searching!
This volume comprises Chapters 5 and 6 of the complete series. Chapter 5 is concerned with sorting into order; this is a large subject that has been divided chiefly into two parts, internal sorting and external sorting. There also are supplementary sections, which develop auxiliary theories about permutations (Section 5.1) and about optimum techniques for sorting (Section 5.3). Chapter 6 deals with the problem of searching for specified items in tables or files; this is subdivided into methods that search sequentially, or by comparison of keys, or by digital properties, or by hashing, and then the more difficult problem of [...] between both chapters, with strong analogies tying the topics together. Two important varieties of information structures are also discussed, in addition to those considered in Chapter 2, namely priority queues (Section 5.2.3) and linear lists represented as balanced trees (Section 6.2.3).
Like Volumes 1 and 2, this book includes a lot of material that does not appear in other publications. Many people have kindly written to me about their ideas, or spoken to me about them, and I hope that I have not distorted the material too badly when I have presented it in my own words.
I have not had time to search the patent literature systematically; indeed, I decry the current tendency to seek patents on algorithms (see Section 5.4.5). If somebody sends me a copy of a relevant patent not presently cited in this book, I will dutifully refer to it in future editions. However, I want to encourage people to continue the centuries-old mathematical tradition of putting newly discovered algorithms into the public domain. There are better ways to earn a living than to prevent other people from making use of one’s contributions to computer science.
Before I retired from teaching, I used this book as a text for a student’s second course in data structures, at the junior-to-graduate level, omitting most of the mathematical material. I also used the mathematical portions of this book as the basis for graduate-level courses in the analysis of algorithms, emphasizing especially Sections 5.1, 5.2.2, 6.3, and 6.4. A graduate-level course on concrete computational complexity could also be based on Sections 5.3 and 5.4.4, together with Sections 4.3.3, 4.6.3, and 4.6.4 of Volume 2.
For the most part this book is self-contained, except for occasional discussions relating to the MIX computer explained in Volume 1. Appendix B contains a summary of the mathematical notations used, some of which are a little different from those found in traditional mathematics books.
Preface to the Second Edition
This new edition matches the third editions of Volumes 1 and 2, in which I have been able to celebrate the completion of TeX and METAFONT by applying those systems to the publications they were designed for.
The conversion to electronic format has given me the opportunity to go over every word of the text and every punctuation mark. I’ve tried to retain the youthful exuberance of my original sentences while perhaps adding some more mature judgment. Dozens of new exercises have been added; dozens of old exercises have been given new and improved answers. Changes appear everywhere, but most significantly in Sections 5.1.4 (about permutations and tableaux), 5.3 (about optimum sorting), 5.4.9 (about disk sorting), 6.2.2 (about entropy), 6.4 (about universal hashing), and 6.5 (about multidimensional trees [...]
The Art of Computer Programming is, however, still a work in progress. Research on sorting and searching continues to grow at a phenomenal rate. Therefore some parts of this book are headed by an “under construction” icon, to apologize for the fact that the material is not up-to-date. For example, if I were teaching an undergraduate class on data structures today, I would surely discuss randomized structures such as treaps at some length; but at present, I am only able to cite the principal papers on the subject, and to announce plans for a future Section 6.2.5 (see page 478). My files are bursting with important material that I plan to include in the final, glorious, third edition of Volume 3, perhaps 17 years from now. But I must finish Volumes 4 and 5 first, and I do not want to delay their publication any more than absolutely necessary.
I am enormously grateful to the many hundreds of people who have helped me to gather and refine this material during the past 35 years. Most of the hard work of preparing the new edition was accomplished by Phyllis Winkler (who put the text of the first edition into TeX form), by Silvio Levy (who edited it extensively and helped to prepare several dozen illustrations), and by Jeffrey Oldham (who converted more than 250 of the original illustrations to METAPOST format). The production staff at Addison-Wesley has also been extremely helpful, as usual.
I have corrected every error that alert readers detected in the first edition, as well as some mistakes that, alas, nobody noticed, and I have tried to avoid introducing new errors in the new material. However, I suppose some defects still remain, and I want to fix them as soon as possible. Therefore I will cheerfully award $2.56 to the first finder of each technical, typographical, or historical error. The webpage cited on page iv contains a current listing of all corrections that have been reported to me.
Stanford, California        D. E. K.
February 1998
There are certain common Privileges of a Writer, the Benefit whereof, I hope, there will be no Reason to doubt; Particularly, that where I am not understood, it shall be concluded, that something very useful and profound is coucht underneath.
— JONATHAN SWIFT, Tale of a Tub, Preface (1704)
The exercises in this set of books have been designed for self-study as well as for classroom study. It is difficult, if not impossible, for anyone to learn a subject purely by reading about it, without applying the information to specific problems and thereby being encouraged to think about what has been read. Furthermore, we all learn best the things that we have discovered for ourselves. Therefore the exercises form a major part of this work; a definite attempt has been made to keep them as informative as possible and to select problems that are enjoyable as well as instructive.
In many books, easy exercises are found mixed randomly among extremely difficult ones. A motley mixture is, however, often unfortunate because readers like to know in advance how long a problem ought to take; otherwise they may just skip over all the problems. A classic example of such a situation is the book Dynamic Programming by Richard Bellman; this is an important, pioneering work in which a group of problems is collected together at the end of some chapters under the heading “Exercises and Research Problems,” with extremely trivial questions appearing in the midst of deep, unsolved problems. It is rumored that someone once asked Dr. Bellman how to tell the exercises apart from the research problems, and he replied, “If you can solve it, it is an exercise; otherwise it’s a research problem.”
Good arguments can be made for including both research problems and very easy exercises in a book of this kind; therefore, to save the reader from the possible dilemma of determining which are which, rating numbers have been provided to indicate the level of difficulty. These numbers have the following general significance:
Rating  Interpretation
00  An extremely easy exercise that can be answered immediately if the material of the text has been understood; such an exercise can almost always be worked “in your head.”
10  A simple problem that makes you think over the material just read, but is by no means difficult. You should be able to do this in one minute at most; pencil and paper may be useful in obtaining the solution.
20  An average problem that tests basic understanding of the text material, but you may need about fifteen or twenty minutes to answer it completely.
30  A problem of moderate difficulty and/or complexity; this one may involve more than two hours’ work to solve satisfactorily, or even more if the TV is on.
40  Quite a difficult or lengthy problem that would be suitable for a term project in classroom situations. A student should be able to solve the problem in a reasonable amount of time, but the solution is not trivial.
50  A research problem that has not yet been solved satisfactorily, as far as the author knew at the time of writing, although many people have tried. If you have found an answer to such a problem, you ought to write it up for publication; furthermore, the author of this book would appreciate hearing about the solution as soon as possible (provided that it is correct).
By interpolation in this “logarithmic” scale, the significance of other rating numbers becomes clear. For example, a rating of 17 would indicate an exercise that is a bit simpler than average. Problems with a rating of 50 that are subsequently solved by some reader may appear with a 45 rating in later editions of the book, and in the errata posted on the Internet (see page iv).
The remainder of the rating number divided by 5 indicates the amount of detailed work required. Thus, an exercise rated 24 may take longer to solve than an exercise that is rated 25, but the latter will require more creativity.
The author has tried earnestly to assign accurate rating numbers, but it is difficult for the person who makes up a problem to know just how formidable it will be for someone else to find a solution; and everyone has more aptitude for certain types of problems than for others. It is hoped that the rating numbers represent a good guess at the level of difficulty, but they should be taken as general guidelines, not as absolute indicators.
This book has been written for readers with varying degrees of mathematical training and sophistication; as a result, some of the exercises are intended only for the use of more mathematically inclined readers. The rating is preceded by an M if the exercise involves mathematical concepts or motivation to a greater extent than necessary for someone who is primarily interested only in programming the algorithms themselves. An exercise is marked with the letters “HM” if its solution necessarily involves a knowledge of calculus or other higher mathematics not developed in this book. An “HM” designation does not necessarily imply difficulty.
Some exercises are preceded by an arrowhead; this designates problems that are especially instructive and especially recommended. Of course, no reader/student is expected to work all of the exercises, so those that seem to be the most valuable have been singled out. (This distinction is not meant to detract from the other exercises!) Each reader should at least make an attempt to solve all of the problems whose rating is 10 or less; and the arrows may help to indicate which of the problems with a higher rating should be given priority.
Solutions to most of the exercises appear in the answer section. Please use [...] solve the problem by yourself, or unless you absolutely do not have time to work this particular problem. After getting your own solution or giving the problem a decent try, you may find the answer instructive and helpful. The solution given will often be quite short, and it will sketch the details under the assumption that you have earnestly tried to solve it by your own means first. Sometimes the solution gives less information than was asked; often it gives more. It is quite possible that you may have a better answer than the one published here, or you may have found an error in the published solution; in such a case, the author will be pleased to know the details. Later printings of this book will give the improved solutions together with the solver’s name where appropriate.
When working an exercise you may generally use the answers to previous exercises, unless specifically forbidden from doing so. The rating numbers have been assigned with this in mind; thus it is possible for exercise n + 1 to have a lower rating than exercise n, even though it includes the result of exercise n as a special case.
Summary of codes:
▶   Recommended
M   Mathematically oriented
HM  Requiring “higher math”
00  Immediate
10  Simple (one minute)
20  Medium (quarter hour)
30  Moderately hard
40  Term project
50  Research problem
EXERCISES
1. [00] What does the rating “M20” mean?
2. [10] Of what value can the exercises in a textbook be to the reader?
3. [HM45] Prove that when n is an integer, n > 2, the equation $x^n + y^n = z^n$ has no solution in positive integers x, y, z.
Two hours' daily exercise will be enough
to keep a hack fit for his work.
— M. H. MAHON, The Handy Horse Book (1865)
5.2.3. Sorting by Selection
5.2.4. Sorting by Merging
5.2.5. Sorting by Distribution
5.3. Optimum Sorting
5.3.1. Minimum-Comparison Sorting
*5.3.2. Minimum-Comparison Merging
*5.3.3. Minimum-Comparison Selection
*5.3.4. Networks for Sorting
5.4. External Sorting
5.4.1. Multiway Merging and Replacement Selection
*5.4.2. The Polyphase Merge
*5.4.3. The Cascade Merge
*5.4.4. Reading Tape Backwards
*5.4.5. The Oscillating Sort
*5.4.6. Practical Considerations for Tape Merging
*5.4.7. External Radix Sorting
*5.4.8. Two-Tape Sorting
*5.4.9. Disks and Drums
5.5. Summary, History, and Bibliography
Chapter 6 — Searching
6.1. Sequential Searching
6.2. Searching by Comparison of Keys
6.2.1. Searching an Ordered Table
6.2.2. Binary Tree Searching
6.2.3. Balanced Trees
6.2.4. Multiway Trees
6.3. Digital Searching
6.5. Retrieval on Secondary Keys
Answers to Exercises
Appendix A — Tables of Numerical Quantities
1. Fundamental Constants (decimal)
2. Fundamental Constants (octal)
3. Harmonic Numbers, Bernoulli Numbers, Fibonacci Numbers
Appendix B — Index to Notations
Appendix C — Index to Algorithms and Theorems
Index and Glossary
There is nothing more difficult to take in hand,
more perilous to conduct, or more uncertain in its success,
than to take the lead in the introduction of
a new order of things.
— NICCOLO MACHIAVELLI, The Prince (1513)
"But you can't look up all those license numbers in time," Drake objected.
"We don't have to, Paul. We merely arrange a list
and look for duplications."
— PERRY MASON, in The Case of the Angry Mourner (1951)
"Treesort" Computer—With this new 'computer-approach'
to nature study you can quickly identify over 260 different trees of U.S., Alaska, and Canada, even palms, desert trees, and other exotics.
To sort, you simply insert the needle.
— EDMUND SCIENTIFIC COMPANY, Catalog (1964)
In this chapter we shall study a topic that arises frequently in programming: the rearrangement of items into ascending or descending order. Imagine how hard it would be to use a dictionary if its words were not alphabetized! We will see that, in a similar way, the order in which items are stored in computer memory often has a profound influence on the speed and simplicity of algorithms that manipulate those items.
Although dictionaries of the English language define “sorting” as the process of separating or arranging things according to class or kind, computer programmers traditionally use the word in the much more special sense of marshaling things into ascending or descending order. The process should perhaps be called ordering, not sorting; but anyone who tries to call it “ordering” is soon led into confusion because of the many different meanings attached to that word. Consider the following sentence, for example: “Since only two of our tape drives were in working order, I was ordered to order more tape units in short order, in order to order the data several orders of magnitude faster.” Mathematical terminology abounds with still more senses of order (the order of a group, the order of a permutation, the order of a branch point, relations of order, etc., etc.). Thus we find that the word “order” can lead to chaos.
Some people have suggested that “sequencing” would be the best name for the process of sorting into order; but this word often seems to lack the right connotation, especially when equal elements are present, and it occasionally conflicts with other terminology. It is quite true that “sorting” is itself an overused word (“I was sort of out of sorts after sorting that sort of data”), but it has become firmly established in computing parlance. Therefore we shall use the word “sorting” chiefly in the strict sense of sorting into order, without further apologies.
Some of the most important applications of sorting are:
a) Solving the “togetherness” problem, in which all items with the same identification are brought together. Suppose that we have 10000 items in arbitrary order, many of which have equal values; and suppose that we want to rearrange the data so that all items with equal values appear in consecutive positions. This is essentially the problem of sorting in the older sense of the word; and it can be solved easily by sorting the file in the new sense of the word, so that the values are in ascending order, $v_1 \le v_2 \le \cdots \le v_{10000}$. The efficiency achievable in this procedure explains why the original meaning of “sorting” has changed.
b) Matching items in two or more files. If several files have been sorted into the same order, it is possible to find all of the matching entries in one sequential pass through them, without backing up. This is the principle that Perry Mason used to help solve a murder case (see the quotation at the beginning of this chapter). We can usually process a list of information most quickly by traversing it in sequence from beginning to end, instead of skipping around at random in the list, unless the entire list is small enough to fit in a high-speed random-access memory. Sorting makes it possible to use sequential accessing on large files, as a feasible substitute for direct addressing.
c) Searching for information by key values. Sorting is also an aid to searching, as we shall see in Chapter 6, hence it helps us make computer output more suitable for human consumption. In fact, a listing that has been sorted into alphabetic order often looks quite authoritative even when the associated numerical information has been incorrectly computed.
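The one-pass matching described in application (b) can be sketched in Python roughly as follows. This is an illustration only, under stated assumptions: the function name `matching_entries` and the sample license numbers are hypothetical, and both inputs are assumed to be already sorted into ascending order with no repeated values inside either one.

```python
def matching_entries(sorted_a, sorted_b):
    """Return the values present in both inputs, scanning each input once.

    Assumption (hypothetical setup): both sequences are already in
    ascending order and neither contains repeated values.
    """
    matches = []
    i = j = 0
    while i < len(sorted_a) and j < len(sorted_b):
        if sorted_a[i] < sorted_b[j]:
            i += 1                        # advance the file with the smaller key
        elif sorted_b[j] < sorted_a[i]:
            j += 1
        else:
            matches.append(sorted_a[i])   # equal keys: a matching entry
            i += 1
            j += 1
    return matches


# Hypothetical data: license numbers recorded at two locations.
seen_at_scene = sorted(["4D2190", "1A7731", "9K0042", "3C5518"])
seen_at_house = sorted(["7F1003", "3C5518", "1A7731"])
print(matching_entries(seen_at_scene, seen_at_house))
```

Neither index ever moves backward, which is what makes the method usable on media that only support sequential access.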
Although sorting has traditionally been used mostly for business data processing, it is actually a basic tool that every programmer should keep in mind for use in a wide variety of situations. We have discussed its use for simplifying algebraic formulas, in exercise 2.3.2-17. The exercises below illustrate the diversity of typical applications.
One of the first large-scale software systems to demonstrate the versatility of sorting was the LARC Scientific Compiler developed by J. Erdwinn, D. E. Ferguson, and their associates at Computer Sciences Corporation in 1960. This optimizing compiler for an extended FORTRAN language made heavy use of sorting so that the various compilation algorithms were presented with relevant parts of the source program in a convenient sequence. The first pass was a lexical scan that divided the FORTRAN source code into individual tokens, each representing an identifier or a constant or an operator, etc. Each token was assigned several sequence numbers; when sorted on the name and an appropriate [...] “defining entries” by which a user would specify whether an identifier stood for a function name, a parameter, or a dimensioned variable were given low sequence numbers, so that they would appear first among the tokens having a given identifier; this made it easy to check for conflicting usage and to allocate storage with respect to EQUIVALENCE declarations. The information thus gathered about each identifier was now attached to each token; in this way no “symbol table” of identifiers needed to be maintained in the high-speed memory. The updated tokens were then sorted on another sequence number, which essentially brought the source program back into its original order except that the numbering scheme was cleverly designed to put arithmetic expressions into a more convenient “Polish prefix” form. Sorting was also used in later phases of compilation, to facilitate loop optimization, to merge error messages into the listing, etc. In short, the compiler was designed so that virtually all the processing could be done sequentially from files that were stored in an auxiliary drum memory, since appropriate sequence numbers were attached to the data in such a way that it could be sorted into various convenient arrangements.
Computer manufacturers of the 1960s estimated that more than 25 percent of the running time on their computers was spent on sorting, when all their customers were taken into account. In fact, there were many installations in which the task of sorting was responsible for more than half of the computing time. From these statistics we may conclude that either (i) there are many important applications of sorting, or (ii) many people sort when they shouldn’t, or (iii) inefficient sorting algorithms have been in common use. The real truth probably involves all three of these possibilities, but in any event we can see that sorting is worthy of serious study, as a practical matter.
Even if sorting were almost useless, there would be plenty of rewarding reasons for studying it anyway! The ingenious algorithms that have been discovered show that sorting is an extremely interesting topic to explore in its own right. Many fascinating unsolved problems remain in this area, as well as quite a few solved ones.
From a broader perspective we will find also that sorting algorithms make a valuable case study of how to attack computer programming problems in general. Many important principles of data structure manipulation will be illustrated in this chapter. We will be examining the evolution of various sorting techniques in an attempt to indicate how the ideas were discovered in the first place. By extrapolating this case study we can learn a good deal about strategies that help us design good algorithms for other computer problems.
Sorting techniques also provide excellent illustrations of the general ideas involved in the analysis of algorithms, the ideas used to determine performance characteristics of algorithms so that an intelligent choice can be made between competing methods. Readers who are mathematically inclined will find quite a few instructive techniques in this chapter for estimating the speed of computer algorithms and for solving complicated recurrence relations. On the other hand, the material has been arranged so that readers without a mathematical bent can skip over these calculations.
Before going on, we ought to define our problem a little more clearly, and introduce some terminology. We are given N items
$$R_1, R_2, \ldots, R_N$$
to be sorted; we shall call them records, and the entire collection of N records will be called a file. Each record $R_j$ has a key, $K_j$, which governs the sorting process. Additional data, besides the key, is usually also present; this extra “satellite information” has no effect on sorting except that it must be carried along as part of each record.
An ordering relation “<” is specified on the keys so that the following conditions are satisfied for any key values a, b, c:
i) Exactly one of the possibilities a < b, a = b, b < a is true. (This is called the law of trichotomy.)
ii) If a < b and b < c, then a < c. (This is the familiar law of transitivity.)
Properties (i) and (ii) characterize the mathematical concept of linear ordering, also called total ordering. Any relationship “<” satisfying these two properties can be sorted by most of the methods to be mentioned in this chapter, although some sorting techniques are designed to work only with numerical or alphabetic keys that have the usual ordering.
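For a small finite set of keys, properties (i) and (ii) can be checked by brute force. The sketch below is an illustration, not part of the original text: the function name `is_linear_ordering` and the rock-paper-scissors relation are hypothetical examples chosen to show one relation that passes and one that obeys trichotomy but not transitivity.

```python
from itertools import product


def is_linear_ordering(keys, less):
    """Check laws (i) and (ii) for the relation `less` on a finite key set."""
    keys = list(keys)
    # (i) Trichotomy: exactly one of a < b, a = b, b < a holds.
    for a, b in product(keys, repeat=2):
        if (less(a, b), a == b, less(b, a)).count(True) != 1:
            return False
    # (ii) Transitivity: a < b and b < c together imply a < c.
    for a, b, c in product(keys, repeat=3):
        if less(a, b) and less(b, c) and not less(a, c):
            return False
    return True


# The usual order on integers is a linear ordering.
print(is_linear_ordering([3, 1, 2], lambda a, b: a < b))

# Hypothetical cyclic relation: satisfies trichotomy, violates transitivity.
beats = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}
print(is_linear_ordering(["rock", "paper", "scissors"],
                         lambda a, b: (a, b) in beats))
```

The check takes on the order of n³ steps for n keys, so it is only a way to test a comparison function on small samples, not something to run on a real file.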
The goal of sorting is to determine a permutation p(1) p(2) ... p(N) of the indices {1, 2, ..., N} that will put the keys into nondecreasing order:
$$K_{p(1)} \le K_{p(2)} \le \cdots \le K_{p(N)}. \tag{1}$$
The sorting is called stable if we make the further requirement that records with equal keys should retain their original relative order. In other words, stable sorting has the additional property that
$$p(i) < p(j) \quad\text{whenever}\quad K_{p(i)} = K_{p(j)} \text{ and } i < j. \tag{2}$$
In some cases we will want the records to be physically rearranged in storage so that their keys are in order. But in other cases it will be sufficient merely to have an auxiliary table that specifies the permutation in some way, so that the records can be accessed in order of their keys.
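As a small illustration of conditions (1) and (2) and of the auxiliary-table idea, the following Python sketch computes the permutation p without moving the records. The function names and the sample keys are hypothetical; the sketch relies on the documented fact that Python's built-in `sorted` is stable.

```python
def sorting_permutation(keys):
    """Return p as a list of 1-based indices with K[p(1)] <= ... <= K[p(N)].

    Python's built-in sorted() is stable, so records with equal keys keep
    their original relative order, which is requirement (2).
    """
    n = len(keys)
    return sorted(range(1, n + 1), key=lambda j: keys[j - 1])


def satisfies_1_and_2(keys, p):
    """Check conditions (1) and (2) for a candidate permutation p."""
    k = [keys[j - 1] for j in p]          # keys listed in the proposed order
    nondecreasing = all(k[i] <= k[i + 1] for i in range(len(k) - 1))
    # Equal keys are adjacent once (1) holds, so adjacent pairs suffice.
    stable = all(p[i] < p[i + 1]
                 for i in range(len(k) - 1) if k[i] == k[i + 1])
    return nondecreasing and stable


keys = [503, 87, 512, 87, 61, 503]        # hypothetical keys, with ties
p = sorting_permutation(keys)
print(p)                                  # the auxiliary table
print(satisfies_1_and_2(keys, p))
```

Here the list `p` plays the role of the auxiliary table: the records stay where they are, and `keys[p[0] - 1]`, `keys[p[1] - 1]`, and so on visit them in order.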
A few of the sorting methods in this chapter assume the existence of either or both of the values “∞” and “−∞”, which are defined to be greater than or less than all keys, respectively:
$$-\infty < K_j < \infty, \qquad \text{for } 1 \le j \le N. \tag{3}$$
Such extreme values are occasionally used as artificial keys or as sentinel indicators. The case of equality is excluded in (3); if equality can occur, the algorithms can be modified so that they will still work, but usually at the expense of some elegance and efficiency.
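One way to see why an artificial key is convenient is the following Python sketch of straight insertion with a −∞ sentinel. It is offered only as an illustration under assumptions: the function name is hypothetical, the keys are assumed to be finite numbers, and the sketch is not claimed to be any particular algorithm from this chapter.

```python
import math


def insertion_sort_with_sentinel(keys):
    """Sort numeric keys using an artificial key of minus infinity.

    The sentinel occupies position 0, so the inner loop needs no separate
    test for running off the left end of the list.
    """
    a = [-math.inf] + list(keys)      # a[0] is the sentinel, below every key
    for j in range(2, len(a)):
        k = a[j]
        i = j - 1
        while a[i] > k:               # always stops by i = 0, since a[0] < k
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = k
    return a[1:]                      # drop the sentinel before returning


print(insertion_sort_with_sentinel([503, 87, 512, 61, 908, 170]))
```

Without the sentinel, the inner loop would need a second condition such as `i >= 1`, tested on every iteration.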
Sorting can be classified generally into internal sorting, in which the records are kept entirely in the computer’s high-speed random-access memory, and external sorting, when there are more records than can be held comfortably in memory at once. Internal sorting allows more flexibility in the structuring and accessing of the data, while external sorting shows us how to live with rather stringent accessing constraints.
The time required to sort N records, using a decent general-purpose sorting algorithm, is roughly proportional to N log N; we make about log N “passes” over the data. This is the minimum possible time, as we shall see in Section 5.3.1, if the records are in random order and if sorting is done by pairwise comparisons of keys. Thus if we double the number of records, it will take a little more than twice as long to sort them, all other things being equal. (Actually, as N approaches infinity, a better indication of the time needed to sort is $N(\log N)^2$ [...] be accomplished in O(N) steps on the average.
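The remark about doubling the number of records can be checked with one line of arithmetic. Assume, purely for illustration, that the running time is exactly $T(N) = cN\lg N$ for some constant $c$ (the text says only "roughly proportional"). Then

```latex
\[
\frac{T(2N)}{T(N)}
  \;=\; \frac{c\,(2N)\lg(2N)}{c\,N\lg N}
  \;=\; \frac{2\,(\lg N + 1)}{\lg N}
  \;=\; 2 + \frac{2}{\lg N}.
\]
```

For $N = 2^{20}$, about a million records, the ratio is $2.1$; the excess over plain doubling shrinks only slowly as N grows.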
EXERCISES — First Set
1. [M20] Prove, from the laws of trichotomy and transitivity, that the permutation p(1) p(2) ... p(N) is uniquely determined when the sorting is assumed to be stable.
2. [21] Assume that each record $R_j$ in a certain file contains two keys, a “major key” $K_j$ and a “minor key” $k_j$, with a linear ordering < defined on each of the sets of keys. Then we can define lexicographic order between pairs of keys (K, k) in the usual way:
$$(K_i, k_i) < (K_j, k_j) \quad\text{if}\quad K_i < K_j \quad\text{or if}\quad K_i = K_j \text{ and } k_i < k_j.$$
Alice took this file and sorted it first on the major keys, obtaining n groups of records with equal major keys in each group,
$$K_{p(1)} = \cdots = K_{p(i_1)} < K_{p(i_1+1)} = \cdots = K_{p(i_2)} < \cdots < K_{p(i_{n-1}+1)} = \cdots = K_{p(i_n)},$$
where $i_n = N$. Then she sorted each of the n groups $R_{p(i_{j-1}+1)}, \ldots, R_{p(i_j)}$ on their minor keys.
Bill took the same original file and sorted it first on the minor keys; then he took the resulting file, and sorted it on the major keys.
Chris took the same original file and did a single sorting operation on it, using lexicographic order on the major and minor keys $(K_j, k_j)$.
Did everyone obtain the same result?
3. [M25] Let < be a relation on $K_1, \ldots, K_N$ that satisfies the law of trichotomy but not the transitive law. Prove that even without the transitive law it is possible to sort the records in a stable manner, meeting conditions (1) and (2); in fact, there are at least three arrangements that satisfy the conditions!
4. [21] Lexicographers don’t actually use strict lexicographic order in dictionaries, because uppercase and lowercase letters must be interfiled. Thus they want an ordering such as this:
a < A < aa < AA < AAA < Aachen < aah < ... < zzz < ZZZ.
5. [M28] Design a binary code for all nonnegative integers so that if n is encoded as the string p(n) we have m < n if and only if p(m) is lexicographically less than p(n). Moreover, p(m) should not be a prefix of p(n) for any m ≠ n. If possible, the length of p(n) should be at most lg n + O(log log n) for all large n. (Such a code is useful if we want to sort texts that mix words and numbers, or if we want to map arbitrarily large alphabets into binary strings.)
6. [15] Mr. B. C. Dull (a MIX programmer) wanted to know if the number stored in location A is greater than, less than, or equal to the number stored in location B. So he wrote ‘LDA A; SUB B’ and tested whether register A was positive, negative, or zero. What serious mistake did he make, and what should he have done instead?
7. [17] Write a MIX subroutine for multiprecision comparison of keys, having the following specifications:
Calling sequence: JMP COMPARE
Entry conditions: rI1 = n; CONTENTS(A + k) = $a_k$ and CONTENTS(B + k) = $b_k$, for 1 ≤ k ≤ n; assume that n ≥ 1.
Exit conditions: CI = GREATER, if $(a_n, \ldots, a_1) > (b_n, \ldots, b_1)$;
CI = EQUAL, if $(a_n, \ldots, a_1) = (b_n, \ldots, b_1)$;
CI = LESS, if $(a_n, \ldots, a_1) < (b_n, \ldots, b_1)$;
rX and rI1 are possibly affected.
Here the relation $(a_n, \ldots, a_1) < (b_n, \ldots, b_1)$ denotes lexicographic ordering from left to right; that is, there is an index j such that $a_k = b_k$ for n ≥ k > j, but $a_j < b_j$.
8. [30] Locations A and B contain two numbers a and b, respectively. Show that it is possible to write a MIX program that computes and stores min(a, b) in location C, without using any jump operators. (Caution: Since you will not be able to test whether or not arithmetic overflow has occurred, it is wise to guarantee that overflow is impossible regardless of the values of a and b.)
9. [M27] After N independent, uniformly distributed random variables between 0 and 1 have been sorted into nondecreasing order, what is the probability that the rth smallest of these numbers is ≤ x?
EXERCISES — Second Set
Each of the following exercises states a problem that a computer programmer might have had to solve in the old days when computers didn’t have much random-access memory. Suggest a “good” way to solve the problem, assuming that only a few thousand words of internal memory are available, supplemented by about half a dozen tape units (enough tape units for sorting). Algorithms that work well under such limitations also prove to be efficient on modern machines.
10. [15] You are given a tape containing one million words of data. How do you determine how many distinct words are present on the tape?
11. [18] You are the U.S. Internal Revenue Service; you receive millions of “information” forms from organizations telling how much income they have paid to people, and millions of tax forms from people telling how much income they have been paid. How do you catch people who don’t report all of their income?
12. [M25] (Transposing a matrix.) You are given a magnetic tape containing one million words, representing the elements of a 1000 × 1000 matrix stored in order by rows: [...] elements are stored by columns $a_{1,1}\; a_{2,1} \ldots a_{1000,1}\; a_{1,2} \ldots a_{1000,2} \ldots a_{1000,1000}$ instead? (Try to make less than a dozen passes over the data.)
13. [M26] How could you “shuffle” a large file of $N$ words into a random rearrangement?
14. [20] You are working with two computer systems that have different conventions for the “collating sequence” that defines the ordering of alphameric characters. How do you make one computer sort alphameric files in the order used by the other computer?
15. [18] You are given a list of the names of a fairly large number of people born in the U.S.A., together with the name of the state where they were born. How do you count the number of people born in each state? (Assume that nobody appears in the list more than once.)
16. [20] In order to make it easier to make changes to large FORTRAN programs, you want to design a “cross-reference” routine; such a routine takes FORTRAN programs as input and prints them together with an index that shows each use of each identifier (that is, each name) in the program. How should such a routine be designed?
17. [33] (Library card sorting.) Before the days of computerized databases, every library maintained a catalog of cards so that users could find the books they wanted. But the task of putting catalog cards into an order convenient for human use turned out to be quite complicated as library collections grew. The following “alphabetical” listing indicates many of the procedures recommended in the American Library Association Rules for Filing Catalog Cards (Chicago: 1942):

Text of card
R. Accademia nazionale dei Lincei, Rome
1812; ein historischer Roman
Bibliothèque d'histoire révolutionnaire
Bibliothèque des curiosités
Brown, Mrs. J. Crosby
Brown, John
Brown, John, mathematician
Brown, John, of Boston
Le XIXe siècle français
The 1847 issue of U. S. stamps
1812 overture
IBM journal of research and development
ha-I ha-ehad
Ia; a love story
International Business Machines Corporation
al-Khuwārizmī, Muḥammad ibn Mūsā, fl. 813-846
Labour. A magazine for all workers
Labor research association
Labour, see Labor
Uncle Tom's cabin
U. S. bureau of the census
Vandermonde, Alexandre Théophile, 1735-1796
Van Valkenburg, Mac Elwyn, 1921-
Von Neumann, John, 1903-1957
The whole art of legerdemain
Who's afraid of Virginia Woolf?
Wijngaarden, Adriaan van, 1916-

Remarks
Ignore foreign royalty (except British)
Achtzehnhundertzwölf
Treat apostrophe as space in French
Ignore accents on letters
Ignore designation of rank
Names with dates follow those without
. . . and the latter are subarranged by descriptive words
Arrange identical names by birthdate
Works “about” follow works “by”
Sometimes birthdate must be estimated
Ignore designation of rank
Treat hyphen as space
Book titles follow compound names
& in English becomes “and”
Ignore apostrophe in names
Ignore an initial article
. . . provided it's in nominative case
Names precede words
Dix-huit cent douze
Dix-neuvième
Eighteen forty-seven
Eighteen twelve
Initials are like one-letter words
Ignore initial article
Ignore punctuation in titles
Ignore initial “al-” in Arabic names
Respell it “Labor”
Cross-reference card
Ignore apostrophe in English
Mc = Mac
Treat hyphen as space
Ignore designation of rank
“Mrs.” = “Mistress”
Don't ignore British royalty
“St.” = “Saint”, even in German
Treat hyphen as space
Sainte
(a book by Donald Ervin Knuth)
(a book by Harriet Beecher Stowe)
“U.S.” = “United States”
Ignore space after prefix in surnames
Ignore initial article
Ignore apostrophe in English
Surname begins with uppercase letter

(Most of these rules are subject to certain exceptions, and there are many other rules not illustrated here.)
If you were given the job of sorting large quantities of catalog cards by computer, and eventually maintaining a very large file of such cards, and if you had no chance to change these long-standing policies of card filing, how would you arrange the data in such a way that the sorting and merging operations are facilitated?
18. [M25] (E. T. Parker.) Leonhard Euler once conjectured [Nova Acta Acad. Sci. Petropolitanae 13 (1795), 45-63, §3; written in 1778] that there are no solutions to the equation
Infinitely many counterexamples when $n = 4$ were subsequently found by Noam Elkies [Math. Comp. 51 (1988), 825-835]. Can you think of a way in which sorting would help in the search for counterexamples to Euler's conjecture when $n = 6$?
19. [24] Given a file containing a million or so distinct 30-bit binary words $x_1, \ldots, x_N$, what is a good way to find all complementary pairs $\{x_i, x_j\}$ that are present? (Two words are complementary when one has 0 wherever the other has 1, and conversely; thus they are complementary if and only if their sum is $(11\ldots1)_2$, when they are treated as binary numbers.)
20. [25] Given a file containing 1000 30-bit words $x_1, \ldots, x_{1000}$, how would you prepare a list of all pairs $(x_i, x_j)$ such that $x_i = x_j$ except in at most two bit positions?
21. [22] How would you go about looking for five-letter anagrams such as CARET, CARTE, CATER, CRATE, REACT, RECTA, TRACE; CRUEL, LUCRE, ULCER; DOWRY, ROWDY, WORDY? [One might wish to know whether there are any sets of ten or more five-letter English anagrams besides the remarkable set
APERS, ASPER, PARES, PARSE, PEARS, PRASE, PRESA, RAPES, REAPS, SPAER, SPARE, SPEAR,
to which we might add the French word APRÈS.]
22. [M28] Given the specifications of a fairly large number of directed graphs, what approach will be useful for grouping the isomorphic ones together? (Directed graphs are isomorphic if there is a one-to-one correspondence between their vertices and a one-to-one correspondence between their arcs, where the correspondences preserve incidence between vertices and arcs.)
23. [30] In a certain group of 4096 people, everyone has about 100 acquaintances. A file has been prepared listing all pairs of people who are acquaintances. (The relation is symmetric: If $x$ is acquainted with $y$, then $y$ is acquainted with $x$. Therefore the file contains roughly 200,000 entries.) How would you design an algorithm to list all the $k$-person cliques in this group of people, given $k$? (A clique is an instance of mutual acquaintances: Everyone in the clique is acquainted with everyone else.) Assume that there are no cliques of size 25, so the total number of cliques cannot be enormous.
24. [30] Three million men with distinct names were laid end-to-end, reaching from New York to California. Each participant was given a slip of paper on which he wrote down his own name and the name of the person immediately west of him in the line. The man at the extreme western end didn't understand what to do, so he threw his paper away; the remaining 2,999,999 slips of paper were put into a huge basket and taken to the National Archives in Washington, D.C. Here the contents of the basket were shuffled completely and transferred to magnetic tapes.
At this point an information scientist observed that there was enough information on the tapes to reconstruct the list of people in their original order. And a computer scientist discovered a way to do the reconstruction with fewer than 1000 passes through the data tapes, using only sequential accessing of tape files and a small amount of random-access memory. How was that possible?
[In other words, given the pairs $(x_i, x_{i+1})$, for $1 \le i < N$, in random order, where the $x_i$ are distinct, how can the sequence $x_1 x_2 \ldots x_N$ be obtained, restricting all operations to serial techniques suitable for use with magnetic tapes? This is the problem of sorting into order when there is no easy way to tell which of two given keys precedes the other.]
25. [M21] (Discrete logarithms.) You know that $p$ is a (rather large) prime number, and that $a$ is a primitive root modulo $p$. Therefore, for all $b$ in the range $1 \le b < p$, there is a unique $n$ such that $a^n \bmod p = b$, $1 \le n < p$. (This $n$ is called the index of $b$ modulo $p$, with respect to $a$.) Explain how to find $n$, given $b$, without needing $O(n)$ steps. [Hint: Let $m = \lceil\sqrt{p}\,\rceil$ and try to solve $a^{mn_1} \equiv b\,a^{-n_2}$ (modulo $p$) for $0 \le n_1, n_2 < m$.]
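One way to read the hint is as a table-matching search: tabulate the $m$ values $b\,a^{-n_2}$, then look for a value $a^{mn_1}$ that occurs among them, so that $n = mn_1 + n_2$. The Python sketch below follows that reading under stated assumptions: the name `discrete_log` is hypothetical, and a dictionary lookup stands in for the sort-and-merge of the two lists that a tape-oriented solution would use.

```python
from math import isqrt


def discrete_log(a, b, p):
    """Return n with a**n % p == b and 1 <= n < p.

    Assumes p is prime, a is a primitive root modulo p, and 1 <= b < p.
    """
    m = isqrt(p - 1) + 1              # equals ceil(sqrt(p)) for prime p
    a_inv = pow(a, p - 2, p)          # inverse of a modulo the prime p
    # Table of b * a^(-n2) mod p for 0 <= n2 < m.
    table = {}
    value = b % p
    for n2 in range(m):
        table.setdefault(value, n2)
        value = value * a_inv % p
    # Search for a^(m*n1) among the tabulated values.
    step = pow(a, m, p)
    value = 1
    for n1 in range(m):
        if value in table:
            n = (m * n1 + table[value]) % (p - 1)
            return n if n != 0 else p - 1
        value = value * step % p
    raise ValueError("no solution found; is a a primitive root modulo p?")
```

Both loops run about $\sqrt{p}$ times, so the work is roughly proportional to $\sqrt{p}$ table operations rather than to $n$.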
*5.1. COMBINATORIAL PROPERTIES OF PERMUTATIONS
A PERMUTATION of a finite set is an arrangement of its elements into a row. Permutations are of special importance in the study of sorting algorithms, since they represent the unsorted input data. In order to study the efficiency of different sorting methods, we will want to be able to count the number of permutations that cause a certain step of a sorting procedure to be executed a certain number of times.
We have, of course, met permutations frequently in previous chapters. For example, in Section 1.2.5 we discussed two basic theoretical methods of constructing the $n!$ permutations of $n$ objects; in Section 1.3.3 we analyzed some algorithms dealing with the cycle structure and multiplicative properties of permutations; in Section 3.3.2 we studied their “runs up” and “runs down.” The purpose of the present section is to study several other properties of permutations, and to consider the general case where equal elements are allowed to appear. In the course of this study we will learn a good deal about combinatorial mathematics.
The properties of permutations are sufficiently pleasing to be interesting in their own right, and it is convenient to develop them systematically in one place instead of scattering the material throughout this chapter. But readers who are not mathematically inclined and readers who are anxious to dive right into sorting techniques are advised to go on to Section 5.2 immediately, since the present section actually has little direct connection to sorting.
*5.1.1. Inversions
Let $a_1 a_2 \ldots a_n$ be a permutation of the set $\{1, 2, \ldots, n\}$. If $i < j$ and $a_i > a_j$, the pair $(a_i, a_j)$ is called an inversion of the permutation; for example, the permutation 3 1 4 2 has three inversions: (3, 1), (3, 2), and (4, 2). Each inversion is a pair of elements that is out of sort, so the only permutation with no inversions is the sorted permutation $1\,2 \ldots n$. This connection with sorting is the chief reason why we will be so interested in inversions, although we have already used the concept to analyze a dynamic storage allocation algorithm (see exercise 2.2.2-9).
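The definition can be checked directly by examining every pair of positions. The following Python sketch does so in quadratic time; the function name `inversion_pairs` is an assumption of this illustration, and the method is a brute-force check rather than an efficient algorithm.

```python
def inversion_pairs(perm):
    """Return all pairs (a_i, a_j) with i < j and a_i > a_j."""
    n = len(perm)
    return [(perm[i], perm[j])
            for i in range(n)
            for j in range(i + 1, n)
            if perm[i] > perm[j]]


print(inversion_pairs([3, 1, 4, 2]))   # [(3, 1), (3, 2), (4, 2)]
```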
The concept of inversions was introduced by G. Cramer in 1750 [Intr. à l'Analyse des Lignes Courbes Algébriques (Geneva: 1750), 657-659; see Thomas Muir, Theory of Determinants 1 (1906), 11-14], in connection with his famous rule for solving linear equations. In essence, Cramer defined the determinant of an $n \times n$ matrix in the following way:
$$\sum (-1)^{\mathrm{inv}(a_1 a_2 \ldots a_n)}\, x_{1a_1} x_{2a_2} \ldots x_{na_n},$$
summed over all permutations $a_1 a_2 \ldots a_n$ of $\{1, 2, \ldots, n\}$, where $\mathrm{inv}(a_1 a_2 \ldots a_n)$ is the number of inversions of the permutation.
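As a small illustration of this definition, the sum over permutations can be evaluated literally for tiny matrices. The Python sketch below does that; the names `inv` and `determinant` are assumptions of this example, and the $n!$-term sum is of course far too slow for anything but small $n$.

```python
from itertools import permutations


def inv(perm):
    """Number of inversions of a sequence of distinct numbers."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if perm[i] > perm[j])


def determinant(x):
    """Sum of (-1)^inv(a) * x[0][a_0] * ... * x[n-1][a_{n-1}] over all a."""
    n = len(x)
    total = 0
    for a in permutations(range(n)):       # 0-based column indices
        term = (-1) ** inv(a)
        for row in range(n):
            term *= x[row][a[row]]
        total += term
    return total
```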
The inversion table $b_1 b_2 \ldots b_n$ of the permutation $a_1 a_2 \ldots a_n$ is obtained by letting $b_j$ be the number of elements to the left of $j$ that are greater than $j$. In other words, $b_j$ is the number of inversions whose second component is $j$. It follows, for example, that the permutation
$$5\ 9\ 1\ 8\ 2\ 6\ 4\ 7\ 3 \tag{1}$$
has the inversion table
$$2\ 3\ 6\ 4\ 0\ 2\ 2\ 1\ 0, \tag{2}$$
since 5 and 9 are to the left of 1; 5, 9, 8 are to the left of 2; etc. This permutation has 20 inversions in all. By definition the numbers $b_j$ will always satisfy
$$0 \le b_1 \le n-1,\quad 0 \le b_2 \le n-2,\quad \ldots,\quad 0 \le b_{n-1} \le 1,\quad b_n = 0. \tag{3}$$
Perhaps the most important fact about inversions is the simple observation that an inversion table uniquely determines the corresponding permutation. We can go back from any inversion table $b_1 b_2 \ldots b_n$ satisfying (3) to the unique permutation that produces it, by successively determining the relative placement of the elements $n, n-1, \ldots, 1$ (in this order). For example, we can construct the permutation corresponding to (2) as follows: Write down the number 9; then place 8 after 9, since $b_8 = 1$. Similarly, put 7 after both 8 and 9, since $b_7 = 2$. Then 6 must follow two of the numbers already written down, because $b_6 = 2$; the partial result so far is therefore
9 8 6 7.
Continue by placing 5 at the left, since $b_5 = 0$; put 4 after four of the numbers; and put 3 after six numbers (namely at the extreme right), giving
5 9 8 6 4 7 3.
The insertion of 2 and 1 in an analogous way yields (1).
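The two directions of this correspondence can be sketched in a few lines of Python. The function names `inversion_table` and `permutation_from_table` are assumptions of this illustration, and both routines are straightforward quadratic-time transcriptions of the description above, not optimized algorithms.

```python
def inversion_table(perm):
    """b_j = number of elements to the left of j that are greater than j."""
    position = {value: i for i, value in enumerate(perm)}
    return [sum(1 for left in perm[:position[j]] if left > j)
            for j in range(1, len(perm) + 1)]


def permutation_from_table(b):
    """Place n, n-1, ..., 1 in turn; element j goes after b_j larger ones."""
    result = []
    for j in range(len(b), 0, -1):
        # Everything already placed is greater than j, so inserting at
        # offset b_j leaves exactly b_j greater elements to the left of j.
        result.insert(b[j - 1], j)
    return result


table = inversion_table([5, 9, 1, 8, 2, 6, 4, 7, 3])
print(table)                            # [2, 3, 6, 4, 0, 2, 2, 1, 0]
print(permutation_from_table(table))    # [5, 9, 1, 8, 2, 6, 4, 7, 3]
```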
This correspondence is important because we can often translate a problem stated in terms of permutations into an equivalent problem stated in terms of inversion tables, and the latter problem may be easier to solve. For example, consider the simplest question of all: How many permutations of $\{1, 2, \ldots, n\}$ are possible? The answer must be the number of possible inversion tables, and they are easily enumerated since there are $n$ choices for $b_1$, independently $n-1$ choices for $b_2$, ..., 1 choice for $b_n$, making $n(n-1)\ldots1 = n!$ choices in all. Inversions are easy to count, because the $b$'s are completely independent of each other, while the $a$'s must be mutually distinct.
In Section 1.2.10 we analyzed the number of local maxima that occur when a permutation is read from right to left; in other words, we counted how many elements are larger than any of their successors. (The right-to-left maxima in (1), for example, are 3, 7, 8, and 9.) This is the number of $j$ such that $b_j$ has its maximum value, $n - j$. Since $b_1$ will equal $n-1$ with probability $1/n$, and (independently) $b_2$ will be equal to $n-2$ with probability $1/(n-1)$, etc., it is clear that the average number of right-to-left maxima is $\frac{1}{n} + \frac{1}{n-1} + \cdots + 1 = H_n$. The corresponding generating function is also easily derived in a similar way.
Fig. 1. The truncated octahedron, which shows the change in inversions when adjacent elements of a permutation are interchanged.
If we interchange two adjacent elements of a permutation, it is easy to see that the total number of inversions will increase or decrease by unity. Figure 1 shows the 24 permutations of {1, 2, 3, 4}, with lines joining permutations that differ by an interchange of adjacent elements; following any line downward inverts exactly one new pair. Hence the number of inversions of a permutation $\pi$ is the length of a downward path from 1234 to $\pi$ in Fig. 1; all such paths must have the same length.
Incidentally, the diagram in Fig. 1 may be viewed as a three-dimensional solid, the “truncated octahedron,” which has 8 hexagonal faces and 6 square faces. This is one of the classical uniform polyhedra attributed to Archimedes (see exercise 10).
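The reader should not confuse inversions of a permutation with the inverse of a permutation. Recall that we can write a permutation in two-line form
$$\begin{pmatrix} 1 & 2 & 3 & \ldots & n \\ a_1 & a_2 & a_3 & \ldots & a_n \end{pmatrix};$$
the inverse $a'_1 a'_2 a'_3 \ldots a'_n$ of this permutation is the permutation obtained by interchanging the two rows and then sorting the columns into increasing order of the new top row:
$$\begin{pmatrix} a_1 & a_2 & a_3 & \ldots & a_n \\ 1 & 2 & 3 & \ldots & n \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & \ldots & n \\ a'_1 & a'_2 & a'_3 & \ldots & a'_n \end{pmatrix}.$$
H. A. Rothe noticed an interesting connection between inverses and inversions: The inverse of a permutation has exactly as many inversions as the permutation itself. Rothe's proof of this fact was not the simplest possible one, but it is instructive and quite pretty nevertheless. We construct an $n \times n$ chessboard having a dot in column $j$ of row $i$ whenever $a_i = j$. Then we put ×'s in all squares that have dots lying both below (in the same column) and to their right (in the same row). For example, the diagram for 5 9 1 8 2 6 4 7 3 is
[Diagram: $9 \times 9$ chessboard of dots and ×'s for the permutation 5 9 1 8 2 6 4 7 3.]
The number of ×'s is the number of inversions, since it is easy to see that $b_j$ is the number of ×'s in column $j$. Now if we transpose the diagram (interchanging rows and columns) we get the diagram corresponding to the inverse of the original permutation. Hence the number of ×'s (the number of inversions) is the same in both cases. Rothe used this fact to prove that the determinant of a matrix is unchanged when the matrix is transposed.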
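Rothe's construction is easy to reproduce mechanically. The following Python sketch builds the board as a list of strings, using `o` for a dot, `x` for a cross, and `.` for an empty square; these symbols and the function names `rothe_diagram` and `inverse` are assumptions of this illustration.

```python
def rothe_diagram(perm):
    """Rows of the n x n board for perm (a list holding a_1, ..., a_n)."""
    n = len(perm)
    row_of = {value: i for i, value in enumerate(perm)}  # row of each dot
    board = []
    for i in range(n):
        row = []
        for j in range(1, n + 1):
            if perm[i] == j:
                row.append("o")
            elif row_of[j] > i and perm[i] > j:
                # a dot lies below in column j and to the right in row i
                row.append("x")
            else:
                row.append(".")
        board.append("".join(row))
    return board


def inverse(perm):
    """a'_j = k if and only if a_k = j."""
    result = [0] * len(perm)
    for k, value in enumerate(perm, start=1):
        result[value - 1] = k
    return result


for line in rothe_diagram([5, 9, 1, 8, 2, 6, 4, 7, 3]):
    print(line)
```

Counting the `x` marks column by column gives the inversion table, and transposing the board gives the board of the inverse permutation, which is the heart of the argument above.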
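The analysis of several sorting algorithms involves the knowledge of how many permutations of $n$ elements have exactly $k$ inversions. Let us denote that number by $I_n(k)$; Table 1 lists the first few values of this function.
By considering the inversion table $b_1 b_2 \ldots b_n$, it is obvious that $I_n(0) = 1$, $I_n(1) = n - 1$, and there is a symmetry property
$$I_n\Bigl(\tbinom{n}{2} - k\Bigr) = I_n(k).$$
Table 1. PERMUTATIONS WITH $k$ INVERSIONS
Columns: $n$, $I_n(0)$, $I_n(1)$, $I_n(2)$, $I_n(3)$, $I_n(4)$, $I_n(5)$, $I_n(6)$, $I_n(7)$, $I_n(8)$, $I_n(9)$, $I_n(10)$, $I_n(11)$
It is not difficult to see that the generating function
$$G_n(z) = I_n(0) + I_n(1)z + I_n(2)z^2 + \cdots \tag{7}$$
satisfies $G_n(z) = (1 + z + \cdots + z^{n-1})\,G_{n-1}(z)$; hence it has the comparatively simple form noticed by O. Rodrigues [J. de Math. 4 (1839), 236-240]:
$$(1 + z + \cdots + z^{n-1}) \ldots (1 + z)(1) = (1 - z^n) \ldots (1 - z^2)(1 - z)/(1 - z)^n. \tag{8}$$
From this generating function, we can easily extend Table 1, and we can verify that the numbers below the zigzag line in that table satisfy
$$I_n(k) = I_n(k-1) + I_{n-1}(k), \qquad \text{for } k < n. \tag{9}$$
(This relation does not hold above the zigzag line.) A more complicated argument (see exercise 14) shows that, in fact, we have the formulas
$$I_n(k) = \binom{n+k-2}{k} - \binom{n+k-3}{k-2} + \binom{n+k-6}{k-5} + \binom{n+k-8}{k-7} - \cdots + (-1)^j\left(\binom{n+k-u_j-1}{k-u_j} + \binom{n+k-u_j-j-1}{k-u_j-j}\right) + \cdots, \qquad n \ge k,$$
where $u_j = (3j^2 - j)/2$ is a so-called “pentagonal number.”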
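The recurrence $G_n(z) = (1 + z + \cdots + z^{n-1})\,G_{n-1}(z)$ translates directly into a short computation of the numbers $I_n(k)$. The Python sketch below multiplies the polynomials coefficient by coefficient; the function name `inversion_counts` is an assumption of this illustration.

```python
def inversion_counts(n):
    """Return [I_n(0), I_n(1), ...], the coefficients of G_n(z)."""
    coeffs = [1]                               # G_1(z) = 1
    for m in range(2, n + 1):
        product = [0] * (len(coeffs) + m - 1)
        for k, value in enumerate(coeffs):
            for shift in range(m):             # times 1 + z + ... + z^(m-1)
                product[k + shift] += value
        coeffs = product
    return coeffs


if __name__ == "__main__":
    for n in range(1, 7):
        # columns k = 0, ..., 11, as in the heading of Table 1
        print(n, inversion_counts(n)[:12])
```

The output can be used to spot-check the symmetry property and relation (9) for small $n$.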
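If we divide $G_n(z)$ by $n!$ we get the generating function $g_n(z)$ for the probability distribution of the number of inversions in a random permutation of $n$ elements.
A remarkable discovery about the distribution of inversions was made by P. A. MacMahon [Amer. J. Math. 35 (1913), 281-322]. Let us define the index of the permutation $a_1 a_2 \ldots a_n$ as the sum of all subscripts $j$ such that $a_j > a_{j+1}$, $1 \le j < n$. For example, the index of 5 9 1 8 2 6 4 7 3 is 2 + 4 + 6 + 8 = 20. By coincidence the index is the same as the number of inversions in this case. If we list the 24 permutations of {1, 2, 3, 4}, namely

Permutation  Index  Inversions      Permutation  Index  Inversions
1 2 3 4        0        0           3 1 2 4        1        2
1 2 4 3        3        1           3 1 4 2        4        3
1 3 2 4        2        1           3 2 1 4        3        3
1 3 4 2        3        2           3 2 4 1        4        4
1 4 2 3        2        2           3 4 1 2        2        4
1 4 3 2        5        3           3 4 2 1        5        5
2 1 3 4        1        1           4 1 2 3        1        3
2 1 4 3        4        2           4 1 3 2        4        4
2 3 1 4        2        2           4 2 1 3        3        4
2 3 4 1        3        3           4 2 3 1        4        5
2 4 1 3        2        3           4 3 1 2        3        5
2 4 3 1        5        4           4 3 2 1        6        6

we see that the number of permutations having a given index, $k$, is the same as the number having $k$ inversions.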
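The comparison of the two statistics over all 24 permutations is easy to repeat by machine. The Python sketch below tallies how many permutations have each index and each inversion count; the function names `index`, `inversions`, and `distributions` are assumptions of this illustration.

```python
from collections import Counter
from itertools import permutations


def index(perm):
    """Sum of the subscripts j (1-based) with a_j > a_{j+1}."""
    return sum(j for j in range(1, len(perm)) if perm[j - 1] > perm[j])


def inversions(perm):
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if perm[i] > perm[j])


def distributions(n):
    """Tally of index values and of inversion counts over all n! permutations."""
    ind_counts, inv_counts = Counter(), Counter()
    for perm in permutations(range(1, n + 1)):
        ind_counts[index(perm)] += 1
        inv_counts[inversions(perm)] += 1
    return ind_counts, inv_counts


ind_counts, inv_counts = distributions(4)
print(sorted(ind_counts.items()))
print(sorted(inv_counts.items()))
```

The two printed tallies agree, although individual permutations usually have different values for the two statistics.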
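At first this fact might appear to be almost obvious, but further scrutiny makes it very mysterious. MacMahon gave an ingenious indirect proof, as follows: Let $\mathrm{ind}(a_1 a_2 \ldots a_n)$ be the index of the permutation $a_1 a_2 \ldots a_n$, and let
$$H_n(z) = \sum z^{\mathrm{ind}(a_1 a_2 \ldots a_n)}$$
be the corresponding generating function, where the sum is over all permutations of $\{1, 2, \ldots, n\}$. We wish to show that $H_n(z) = G_n(z)$. For this purpose we will define a one-to-one correspondence between arbitrary $n$-tuples $(q_1, q_2, \ldots, q_n)$ of nonnegative integers, on the one hand, and ordered pairs of $n$-tuples
$$\bigl((a_1, a_2, \ldots, a_n),\ (p_1, p_2, \ldots, p_n)\bigr)$$
on the other hand, where $a_1 a_2 \ldots a_n$ is a permutation of the indices $\{1, 2, \ldots, n\}$ and $p_1 \ge p_2 \ge \cdots \ge p_n \ge 0$. This correspondence will satisfy the condition
$$q_1 + q_2 + \cdots + q_n = \mathrm{ind}(a_1 a_2 \ldots a_n) + (p_1 + p_2 + \cdots + p_n). \tag{15}$$
The generating function $\sum z^{q_1 + q_2 + \cdots + q_n}$, summed over all $n$-tuples of nonnegative integers $(q_1, q_2, \ldots, q_n)$, is $Q_n(z) = 1/(1-z)^n$; and the generating function $\sum z^{p_1 + p_2 + \cdots + p_n}$, summed over all $n$-tuples of integers $(p_1, p_2, \ldots, p_n)$ such that $p_1 \ge p_2 \ge \cdots \ge p_n \ge 0$, is
$$P_n(z) = 1/\bigl((1-z)(1-z^2)\ldots(1-z^n)\bigr), \tag{16}$$
as shown in exercise 15. In view of (15), the one-to-one correspondence we are about to establish will prove that $Q_n(z) = H_n(z)P_n(z)$, that is,
$$H_n(z) = Q_n(z)/P_n(z). \tag{17}$$
But $Q_n(z)/P_n(z)$ is $G_n(z)$, by (8).
The desired correspondence is defined by a simple sorting procedure: Any $n$-tuple $(q_1, q_2, \ldots, q_n)$ can be rearranged into nonincreasing order $q_{a_1} \ge q_{a_2} \ge \cdots \ge q_{a_n}$ in a stable manner, where $a_1 a_2 \ldots a_n$ is a permutation such that $q_{a_j} = q_{a_{j+1}}$ implies $a_j < a_{j+1}$. We set $(p_1, p_2, \ldots, p_n) = (q_{a_1}, q_{a_2}, \ldots, q_{a_n})$ and then, for $1 \le j < n$, subtract 1 from each of $p_1, \ldots, p_j$ for each $j$ such that $a_j > a_{j+1}$. We still have $p_1 \ge p_2 \ge \cdots \ge p_n$, because $p_j$ was strictly greater than $p_{j+1}$ whenever $a_j > a_{j+1}$. The resulting pair satisfies (15), because the total reduction of the $p$'s is $\mathrm{ind}(a_1 a_2 \ldots a_n)$.
Conversely, we can easily go back to $(q_1, q_2, \ldots, q_n)$ when $a_1 a_2 \ldots a_n$ and $(p_1, p_2, \ldots, p_n)$ are given. (See exercise 17.) So the desired correspondence has been established, and MacMahon's index theorem has been proved.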
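The sorting procedure in this proof can be sketched in Python as follows. The function names `split` and `join`, and the sample input $(3, 1, 4, 1, 5, 9, 2, 6, 5)$, are assumptions chosen for this illustration; the code relies on the fact that Python's `sorted` is stable.

```python
def index(perm):
    """Sum of the subscripts j (1-based) with a_j > a_{j+1}."""
    return sum(j for j in range(1, len(perm)) if perm[j - 1] > perm[j])


def split(q):
    """Map (q_1, ..., q_n) to the pair (a, p) described in the text."""
    n = len(q)
    # Stable sort into nonincreasing order: ties keep increasing indices.
    a = sorted(range(1, n + 1), key=lambda i: -q[i - 1])
    p = [q[i - 1] for i in a]
    for j in range(1, n):
        if a[j - 1] > a[j]:                # a_j > a_{j+1}
            for i in range(j):             # subtract 1 from p_1, ..., p_j
                p[i] -= 1
    return a, p


def join(a, p):
    """Inverse of split: recover (q_1, ..., q_n) from the pair (a, p)."""
    p = list(p)
    for j in range(1, len(a)):
        if a[j - 1] > a[j]:
            for i in range(j):
                p[i] += 1
    q = [0] * len(a)
    for i, value in zip(a, p):
        q[i - 1] = value
    return q


q = [3, 1, 4, 1, 5, 9, 2, 6, 5]
a, p = split(q)
print(a, p, sum(q) == index(a) + sum(p))
```

For this sample input the permutation is 6 8 5 9 3 1 7 2 4 with index 18, and the $p$'s sum to 18, in agreement with the total $q_1 + \cdots + q_9 = 36$.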
D. Foata and M. P. Schützenberger discovered a surprising extension of MacMahon's theorem, about 65 years after MacMahon's original publication: The number of permutations of $n$ elements that have $k$ inversions and index $l$ is the same as the number that have $l$ inversions and index $k$. In fact, Foata and Schützenberger found a simple one-to-one correspondence between permutations of the first kind and permutations of the second (see exercise 25).
EXERCISES
1. [10] What is the inversion table for the permutation 2 7 1 8 4 5 9 3 6? What permutation has the inversion table 5 0 1 2 1 2 0 0?
2. [M20] In the classical problem of Josephus (exercise 1.3.2-22), $n$ men are initially arranged in a circle; the $m$th man is executed, the circle closes, and every $m$th man is repeatedly eliminated until all are dead. The resulting execution order is a permutation of $\{1, 2, \ldots, n\}$. For example, when $n = 8$ and $m = 4$ the order is 5 4 6 1 3 8 7 2 (man 1 is 5th out, etc.); the inversion table corresponding to this permutation is 3 6 3 1 0 0 1 0. Give a simple recurrence relation for the elements $b_1 b_2 \ldots b_n$ of the inversion table in the general Josephus problem for $n$ men, when every $m$th man is executed.
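The example in this exercise can be reproduced by direct simulation, which may help in experimenting with other values of $n$ and $m$. The Python sketch below only simulates the process and tabulates the result; it does not give the recurrence that the exercise asks for. The function names `josephus_order` and `inversion_table` are assumptions of this illustration.

```python
def josephus_order(n, m):
    """Return the list whose i-th entry says when man i is executed."""
    alive = list(range(1, n + 1))
    order = [0] * n
    pos = 0
    for k in range(1, n + 1):
        pos = (pos + m - 1) % len(alive)   # count off m men around the circle
        order[alive.pop(pos) - 1] = k      # this man is the k-th to go
    return order


def inversion_table(perm):
    """b_j = number of elements to the left of j that are greater than j."""
    position = {value: i for i, value in enumerate(perm)}
    return [sum(1 for left in perm[:position[j]] if left > j)
            for j in range(1, len(perm) + 1)]


order = josephus_order(8, 4)
print(order)                    # [5, 4, 6, 1, 3, 8, 7, 2]
print(inversion_table(order))   # [3, 6, 3, 1, 0, 0, 1, 0]
```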
3. [18] If the permutation $a_1 a_2 \ldots a_n$ corresponds to the inversion table $b_1 b_2 \ldots b_n$, what is the permutation $\bar a_1 \bar a_2 \ldots \bar a_n$ that corresponds to the inversion table
$$(n-1-b_1)(n-2-b_2)\ldots(0-b_n)\,?$$
4. [20] Design an algorithm suitable for computer implementation that constructs the permutation $a_1 a_2 \ldots a_n$ corresponding to a given inversion table $b_1 b_2 \ldots b_n$ satisfying (3). [Hint: Consider a linked-memory technique.]
5. [35] The algorithm of exercise 4 requires an execution time roughly proportional to $n + b_1 + \cdots + b_n$ on typical computers, and this is $\Theta(n^2)$ on the average. Is there an algorithm whose worst-case running time is substantially better than order $n^2$?
6. [26] Design an algorithm that computes the inversion table $b_1 b_2 \ldots b_n$ corresponding to a given permutation $a_1 a_2 \ldots a_n$ of $\{1, 2, \ldots, n\}$, where the running time is essentially proportional to $n \log n$ on typical computers.
7. [20] Several other kinds of inversion tables can be defined, corresponding to a given permutation $a_1 a_2 \ldots a_n$ of $\{1, 2, \ldots, n\}$, besides the particular table $b_1 b_2 \ldots b_n$ defined in the text; in this exercise we will consider three other types of inversion tables that arise in applications.
Let $c_j$ be the number of inversions whose first component is $j$, that is, the number of elements to the right of $j$ that are less than $j$. [Corresponding to (1) we have the table 0 0 0 1 4 2 1 5 7; clearly $0 \le c_j < j$.] Let $B_j = b_{a_j}$ and $C_j = c_{a_j}$.
Show that $0 \le B_j < j$ and $0 \le C_j \le n - j$, for $1 \le j \le n$; furthermore show that the permutation $a_1 a_2 \ldots a_n$ can be determined uniquely when either $c_1 c_2 \ldots c_n$ or $B_1 B_2 \ldots B_n$ or $C_1 C_2 \ldots C_n$ is given.
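To get a feeling for these four tables before attempting the proofs, one can compute them for permutation (1). The Python sketch below does so by brute force; the function name `all_tables` is an assumption of this illustration, and nothing here establishes the bounds or the uniqueness claims that the exercise asks for.

```python
def all_tables(perm):
    """Return the tables (b, c, B, C) of exercise 7 for the list perm."""
    n = len(perm)
    position = {value: i for i, value in enumerate(perm)}
    # b_j: elements to the left of j that are greater than j
    b = [sum(1 for x in perm[:position[j]] if x > j) for j in range(1, n + 1)]
    # c_j: elements to the right of j that are less than j
    c = [sum(1 for x in perm[position[j] + 1:] if x < j) for j in range(1, n + 1)]
    B = [b[value - 1] for value in perm]   # B_j = b_{a_j}
    C = [c[value - 1] for value in perm]   # C_j = c_{a_j}
    return b, c, B, C


b, c, B, C = all_tables([5, 9, 1, 8, 2, 6, 4, 7, 3])
print(c)   # [0, 0, 0, 1, 4, 2, 1, 5, 7]
print(B)
print(C)
```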
8. [M24] Continuing the notation of exercise 7, let $a'_1 a'_2 \ldots a'_n$ be the inverse of the permutation $a_1 a_2 \ldots a_n$, and let the corresponding inversion tables be $b'_1 b'_2 \ldots b'_n$, $c'_1 c'_2 \ldots c'_n$, $B'_1 B'_2 \ldots B'_n$, and $C'_1 C'_2 \ldots C'_n$. Find as many interesting relations as you can between the numbers $a_j$, $b_j$, $c_j$, $B_j$, $C_j$, $a'_j$, $b'_j$, $c'_j$, $B'_j$, $C'_j$.
9. [M21] Prove that, in the notation of exercise 7, the permutation $a_1 a_2 \ldots a_n$ is an involution (that is, its own inverse) if and only if $b_j = C_j$ for $1 \le j \le n$.
10. [HM20] Consider Fig. 1 as a polyhedron in three dimensions. What is the diameter of the truncated octahedron (the distance between vertex 1234 and vertex 4321) if all of its edges have unit length?
b) Conversely, let $E$ be any transitive subset of $T = \{(x, y) \mid 1 \le y < x \le n\}$ whose complement $\bar E = T \setminus E$ is also transitive. Prove that there exists a permutation $\pi$ such that $E(\pi) = E$.
12. [M28] Continuing the notation of the previous exercise, prove that if $\pi_1$ and $\pi_2$ are permutations and if $E$ is the smallest transitive set containing $E(\pi_1) \cup E(\pi_2)$, then $\bar E$ is transitive. [Hence, if we say $\pi_1$ is “above” $\pi_2$ whenever $E(\pi_1) \subseteq E(\pi_2)$, a lattice of permutations is defined; there is a unique “lowest” permutation “above” two given permutations. Figure 1 is the lattice diagram when $n = 4$.]
13. [M23] It is well known that half of the terms in the expansion of a determinant have a plus sign, and half have a minus sign. In other words, there are just as many permutations with an even number of inversions as with an odd number, when $n \ge 2$. Show that, in general, the number of permutations having a number of inversions congruent to $t$ modulo $m$ is $n!/m$, regardless of the integer $t$, whenever $n \ge m$.
14. [M24] (F. Franklin.) A partition of $n$ into $k$ distinct parts is a representation $n = p_1 + p_2 + \cdots + p_k$, where $p_1 > p_2 > \cdots > p_k > 0$. For example, the partitions of 7 into distinct parts are 7, 6+1, 5+2, 4+3, 4+2+1. Let $f_k(n)$ be the number of partitions of $n$ into $k$ distinct parts; prove that $\sum_k (-1)^k f_k(n) = 0$, unless $n$ has the form $(3j^2 \pm j)/2$, for some nonnegative integer $j$; in the latter case the sum is $(-1)^j$. For example, when $n = 7$ the sum is $-1 + 3 - 1 = 1$, and $7 = (3 \cdot 2^2 + 2)/2$. [Hint: Represent a partition as an array of dots, putting $p_i$ dots in the $i$th row, for $1 \le i \le k$. Find the smallest $j$ such that $p_{j+1} < p_j - 1$, and encircle the rightmost dots in the first $j$ rows. If $j < p_k$, these $j$ dots can usually be removed, tilted 45°, and placed as a new $(k+1)$st row. On the other hand if $j \ge p_k$, the $k$th row of dots can usually be removed, tilted 45°, and placed to the right of the circled dots. (See Fig. 2.) This process pairs off partitions having an odd number of rows with partitions having an even number of rows, in most cases, so only unpaired partitions must be considered in the sum.]
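The claimed values of the signed sum are easy to tabulate for small $n$ before attempting the proof. The Python sketch below counts partitions into distinct parts by exhaustive enumeration; the function names `distinct_partition_counts` and `signed_sum` are assumptions of this illustration, and the enumeration is a numerical check, not Franklin's pairing argument.

```python
def distinct_partition_counts(n):
    """Return a dict mapping k to f_k(n), for those k with f_k(n) > 0."""
    counts = {}

    def extend(remaining, largest_allowed, parts):
        if remaining == 0:
            counts[parts] = counts.get(parts, 0) + 1
            return
        # choose the next (strictly smaller) part
        for part in range(min(remaining, largest_allowed), 0, -1):
            extend(remaining - part, part - 1, parts + 1)

    extend(n, n, 0)
    return counts


def signed_sum(n):
    """Sum over k of (-1)^k f_k(n)."""
    return sum((-1) ** k * f for k, f in distinct_partition_counts(n).items())


print([signed_sum(n) for n in range(16)])
```

The nonzero entries of the printed list fall at $n = 0, 1, 2, 5, 7, 12, 15$, the numbers of the form $(3j^2 \pm j)/2$ in that range.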
Fig. 2. Franklin's correspondence between partitions with distinct parts.
Note: As a consequence, we obtain Euler's formula
$$(1-z)(1-z^2)(1-z^3)\ldots = 1 - z - z^2 + z^5 + z^7 - z^{12} - z^{15} + \cdots = \sum_{-\infty<j<\infty} (-1)^j z^{(3j^2+j)/2}.$$
15. [M23] Prove that (16) is the generating function for partitions into at most $n$ parts; that is, prove that the coefficient of $z^m$ in $1/\bigl((1-z)(1-z^2)\ldots(1-z^n)\bigr)$ is the number of ways to write $m = p_1 + p_2 + \cdots + p_n$ with $p_1 \ge p_2 \ge \cdots \ge p_n \ge 0$. [Hint: Drawing dots as in exercise 14, show that there is a one-to-one correspondence between $n$-tuples $(p_1, p_2, \ldots, p_n)$ such that $p_1 \ge p_2 \ge \cdots \ge p_n \ge 0$ and sequences $(P_1, P_2, P_3, \ldots)$ such that $n \ge P_1 \ge P_2 \ge P_3 \ge \cdots \ge 0$, with the property that $p_1 + p_2 + \cdots + p_n = P_1 + P_2 + P_3 + \cdots$. In other words, partitions into at most $n$ parts correspond to partitions into parts not exceeding $n$.]
16. [M25] (L. Euler.) Prove the following identities by interpreting both sides of the equations in terms of partitions:
18. [M30] (T. Hibbard, CACM 6 (1963), 210.) Let $n > 0$, and assume that a sequence of $2^n$ $n$-bit integers $X_0, \ldots, X_{2^n-1}$ has been generated at random, where each bit of each number is independently equal to 1 with probability $p$. Consider the sequence $X_0 \oplus 0$, $X_1 \oplus 1$, ..., $X_{2^n-1} \oplus (2^n - 1)$, where $\oplus$ denotes the “exclusive or” operation on the binary representations. Thus if $p = 0$, the sequence is 0, 1, ...,
19. [M28] (C. Meyer.) When $m$ is relatively prime to $n$, we know that the sequence $(m \bmod n)(2m \bmod n)\ldots((n-1)m \bmod n)$ is a permutation of $\{1, 2, \ldots, n-1\}$. Show that the number of inversions of this permutation can be expressed in terms of Dedekind sums (see Section 3.3.3).
20. [M43] The following famous identity due to Jacobi [Fundamenta Nova Theoriæ Functionum Ellipticarum (1829), §64] is the basis of many remarkable relationships involving elliptic functions:
$$\prod_{k \ge 1} (1 - u^k v^{k-1})(1 - u^{k-1} v^k)(1 - u^k v^k) = \sum_{-\infty<j<\infty} (-1)^j u^{\binom{j}{2}} v^{\binom{j+1}{2}}.$$
For example, if we set $u = z$, $v = z^2$, we obtain Euler's formula of exercise 14. If we set $z = \sqrt{u/v}$, $q = \sqrt{uv}$, we obtain
$$\prod_{k \ge 1} (1 - q^{2k-1}z)(1 - q^{2k-1}z^{-1})(1 - q^{2k}) = \sum_{-\infty<n<\infty} (-1)^n z^n q^{n^2}.$$
Is there a combinatorial proof of Jacobi's identity, analogous to Franklin's proof of the special case in exercise 14? (Thus we want to consider “complex partitions”
$$m + ni = (p_1 + q_1 i) + (p_2 + q_2 i) + \cdots + (p_k + q_k i)$$
21. [M25] (G. D. Knott.) Show that the permutation $a_1 \ldots a_n$ is obtainable with a stack, in the sense of exercise 2.2.1-5 or 2.3.1-6, if and only if $C_j \le C_{j+1} + 1$ for $1 \le j < n$ in the notation of exercise 7.
22. [M26] Given a permutation $a_1 a_2 \ldots a_n$ of $\{1, 2, \ldots, n\}$, let $h_j$ be the number of indices $i < j$ such that $a_i \in \{a_j + 1, a_j + 2, \ldots, a_{j+1}\}$. (If $a_{j+1} < a_j$, the elements of this set “wrap around” from $n$ to 1. When $j = n$ we use the set $\{a_n + 1, a_n + 2, \ldots, n\}$.) For example, the permutation 5 9 1 8 2 6 4 7 3 leads to $h_1 \ldots h_9$ = 0 0 1 2 1 4 2 4 6.
a) Prove that $a_1 a_2 \ldots a_n$ can be reconstructed from the numbers $h_1 h_2 \ldots h_n$.
b) Prove that $h_1 + h_2 + \cdots + h_n$ is the index of $a_1 a_2 \ldots a_n$.
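The definition of $h_j$, including the wrap-around case, is perhaps easiest to absorb by computing it. The Python sketch below evaluates $h_1 \ldots h_n$ literally from the definition; the function names `h_table` and `index` are assumptions of this illustration, and the code only checks part (b) numerically on examples without proving anything.

```python
def h_table(perm):
    """Return [h_1, ..., h_n] for the list perm holding a_1, ..., a_n."""
    n = len(perm)
    h = []
    for j in range(1, n + 1):
        a_j = perm[j - 1]
        if j == n:
            target = set(range(a_j + 1, n + 1))
        elif perm[j] > a_j:
            target = set(range(a_j + 1, perm[j] + 1))
        else:
            # wrap around from n to 1
            target = set(range(a_j + 1, n + 1)) | set(range(1, perm[j] + 1))
        h.append(sum(1 for i in range(j - 1) if perm[i] in target))
    return h


def index(perm):
    """Sum of the subscripts j (1-based) with a_j > a_{j+1}."""
    return sum(j for j in range(1, len(perm)) if perm[j - 1] > perm[j])


perm = [5, 9, 1, 8, 2, 6, 4, 7, 3]
print(h_table(perm), sum(h_table(perm)), index(perm))
```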
23. [M27] (Russian roulette.) A group of $n$ condemned men who prefer probability theory to number theory might choose to commit suicide by sitting in a circle and modifying Josephus's method (exercise 2) as follows: The first prisoner holds a gun and aims it at his head; with probability $p$ he dies and leaves the circle. Then the second man takes the gun and proceeds in the same way. Play continues cyclically, with constant probability $p > 0$, until everyone is dead.
Let $a_j = k$ if man $k$ is the $j$th to die. Prove that the death order $a_1 a_2 \ldots a_n$ occurs with a probability that is a function only of $n$, $p$, and the index of the dual permutation $(n+1-a_n) \ldots (n+1-a_2)(n+1-a_1)$. What death order is least likely?
24. [M26] Given integers $t(1)\, t(2) \ldots t(n)$ with $t(j) \ge j$, the generalized index of a permutation $a_1 a_2 \ldots a_n$ is the sum of all subscripts $j$ such that $a_j > t(a_{j+1})$, plus the total number of inversions such that $i < j$ and $t(a_j) \ge a_i > a_j$. Thus when $t(j) = j$ for all $j$, the generalized index is the same as the index; but when $t(j) \ge n$ for all $j$ it is the number of inversions. Prove that the number of permutations whose generalized index equals $k$ is the same as the number of permutations having $k$ inversions. [Hint: Show that, if we take any permutation $a_1 \ldots a_{n-1}$ of $\{1, \ldots, n-1\}$ and insert the number $n$ in all possible places, we increase the generalized index by the numbers $\{0, 1, \ldots, n-1\}$ in some order.]
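25. [M30] (Foata and Schützenberger.) If $\alpha = a_1 \ldots a_n$ is a permutation, let $\mathrm{ind}(\alpha)$ be its index, and let $\mathrm{inv}(\alpha)$ count its inversions.
a) Define a one-to-one correspondence that takes each permutation $\alpha$ of $\{1, \ldots, n\}$ to a permutation $f(\alpha)$ that has the following two properties: (i) $\mathrm{ind}(f(\alpha)) = \mathrm{inv}(\alpha)$; (ii) for $1 \le j < n$, the number $j$ appears to the left of $j+1$ in $f(\alpha)$ if and only if it appears to the left of $j+1$ in $\alpha$. What permutation does your construction assign to $f(\alpha)$ when $\alpha$ = 1 9 8 2 6 3 7 4 5? For what permutation $\alpha$ is $f(\alpha)$ = 1 9 8 2 6 3 7 4 5? [Hint: If $n > 1$, write $\alpha = x_1 \alpha_1 x_2 \alpha_2 \ldots x_k \alpha_k a_n$, where $x_1, \ldots, x_k$ are all the elements $< a_n$ if $a_1 < a_n$, otherwise $x_1, \ldots, x_k$ are all the elements $> a_n$; the other elements appear in (possibly empty) strings $\alpha_1, \ldots, \alpha_k$. Compare the number of inversions of $h(\alpha) = \alpha_1 x_1 \alpha_2 x_2 \ldots \alpha_k x_k$ to $\mathrm{inv}(\alpha)$; in this construction the number $a_n$ does not appear in $h(\alpha)$.]
b) Use $f$ to define another one-to-one correspondence $g$ having the following two properties: (i) $\mathrm{ind}(g(\alpha)) = \mathrm{inv}(\alpha)$; (ii) $\mathrm{inv}(g(\alpha)) = \mathrm{ind}(\alpha)$. [Hint: Consider inverse permutations.]
26. [M25] What is the statistical correlation coefficient between the number of inversions and the index of a random permutation? (See Eq. 3.3.2-( ).)
27. [M37] Prove that, in addition to (15), there is a simple relationship between $\mathrm{inv}(a_1 a_2 \ldots a_n)$ and the $n$-tuple $(q_1, q_2, \ldots, q_n)$. Use this fact to generalize the derivation of (17), obtaining an algebraic characterization of the bivariate generating function
$$H_n(w, z) = \sum w^{\mathrm{inv}(a_1 a_2 \ldots a_n)} z^{\mathrm{ind}(a_1 a_2 \ldots a_n)},$$
where the sum is over all $n!$ permutations $a_1 a_2 \ldots a_n$.
28. [25] If $a_1 a_2 \ldots a_n$ is a permutation of $\{1, 2, \ldots, n\}$, its total displacement is defined to be $\sum_{j=1}^{n} |a_j - j|$. Find upper and lower bounds for total displacement in terms of the number of inversions.
29. [28] If $\pi = a_1 a_2 \ldots a_n$ and $\pi' = a'_1 a'_2 \ldots a'_n$ are permutations of $\{1, 2, \ldots, n\}$, their product $\pi\pi'$ is $a'_{a_1} a'_{a_2} \ldots a'_{a_n}$. Let $\mathrm{inv}(\pi)$ denote the number of inversions, as in exercise 25. Show that $\mathrm{inv}(\pi\pi') \le \mathrm{inv}(\pi) + \mathrm{inv}(\pi')$, and that equality holds if and only if $\pi\pi'$ is “below” $\pi'$ in the sense of exercise 12.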
*5.1.2. Permutations of a Multiset
So far we have been discussing permutations of a set of elements; this is just a special case of the concept of permutations of a multiset. (A multiset is like a set except that it can have repetitions of identical elements. Some basic properties of multisets have been discussed in exercise 4.6.3-19.)
For example, consider the multiset
$$M = \{a, a, a, b, b, c, d, d, d, d\}, \tag{1}$$
which contains 3 $a$'s, 2 $b$'s, 1 $c$, and 4 $d$'s. We may also indicate the multiplicities of elements in another way, namely
$$M = \{3 \cdot a,\ 2 \cdot b,\ c,\ 4 \cdot d\}.$$
How many permutations of $M$ are possible? If we regarded the elements of $M$ as distinct, by subscripting them $a_1, a_2, a_3, b_1, b_2, c_1, d_1, d_2, d_3, d_4$,