Part of the document: The Art of Computer Programming, Volume 3: Sorting and Searching (second edition, 2011), part 1 (pages 106-110)

88 SORTING 5.2.1

the average number of inversions in a random 2-ordered permutation is

$$\frac{\lfloor n/2\rfloor\,2^{n-2}}{\binom{n}{\lfloor n/2\rfloor}}.$$

By Stirling's approximation this is asymptotically $\sqrt{\pi/128}\,n^{3/2}\approx 0.15\,n^{3/2}$. The maximum number of inversions is easily seen to be

$$\binom{\lfloor n/2\rfloor+1}{2}\approx\frac18 n^2.$$

It is instructive to study the distribution of inversions more carefully, by examining the generating functions

$$h_1(z)=1,\qquad h_2(z)=1+z,$$
$$h_3(z)=1+2z,\qquad h_4(z)=1+3z+z^2+z^3,\quad\ldots,$$

as in exercise 15. In this way we find that the standard deviation is also proportional to $n^{3/2}$, so the distribution is not extremely stable about the mean.
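The closed forms for 2-ordered permutations are easy to check by brute force when n is small. The following is a minimal sketch (the function names are illustrative and do not come from the text): it enumerates every 2-ordered permutation of {1, ..., n}, counts the inversions of each, and returns the coefficients of the generating function $h_n(z)$.

```python
from collections import Counter
from itertools import combinations


def two_ordered_permutations(n):
    """Yield every 2-ordered permutation of 1..n (a[i] < a[i+2] for all i)."""
    values = range(1, n + 1)
    # Choosing which values occupy positions 1, 3, 5, ... determines the
    # permutation, because each interleaved subsequence must be increasing.
    for odd_values in combinations(values, (n + 1) // 2):
        chosen = set(odd_values)
        even_values = [v for v in values if v not in chosen]
        perm = [0] * n
        perm[0::2] = odd_values
        perm[1::2] = even_values
        yield perm


def inversions(perm):
    """Count pairs i < j with perm[i] > perm[j]."""
    return sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )


def inversion_histogram(n):
    """Coefficients of h_n(z): entry k counts permutations with k inversions."""
    counts = Counter(inversions(p) for p in two_ordered_permutations(n))
    return [counts[k] for k in range(max(counts) + 1)]
```

For n = 4 the histogram is [1, 3, 1, 1], that is, $1+3z+z^2+z^3$. The weighted sum of the entries is the total number of inversions, $\lfloor n/2\rfloor 2^{n-2}$, and the highest power of z is the maximum number of inversions.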

Now let us consider the general two-pass case of Algorithm D, when the increments are h and 1:

Theorem H. The average number of inversions in an h-ordered permutation of {1, 2, ..., n} is

$$f(n,h)=\frac{2^{2q-1}\,q!\,q!}{(2q+1)!}\left(\binom{h}{2}q(q+1)+\binom{r}{2}(q+1)-\frac12\binom{h-r}{2}q\right),\tag{4}$$

where $q=\lfloor n/h\rfloor$ and $r=n \bmod h$.

This theorem is due to Douglas H. Hunt [Bachelor's thesis, Princeton University (April 1967)]. Note that when $h\ge n$ the formula correctly gives $f(n,h)=\frac12\binom{n}{2}$.

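Proof. An h-ordered permutation contains r sorted subsequences of length $q+1$, and $h-r$ of length q. Each inversion comes from a pair of distinct subsequences, and a given pair of distinct subsequences in a random h-ordered permutation defines a random 2-ordered permutation. The average number of inversions is therefore the sum of the average number of inversions between each pair of distinct subsequences, namely

$$\binom{r}{2}\frac{A_{2q+2}}{\binom{2q+2}{q+1}}+r(h-r)\frac{A_{2q+1}}{\binom{2q+1}{q}}+\binom{h-r}{2}\frac{A_{2q}}{\binom{2q}{q}}=f(n,h),$$

where $A_n=\lfloor n/2\rfloor\,2^{n-2}$ is the total number of inversions among all 2-ordered permutations of n elements. ∎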
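Theorem H can be checked exactly for small n. The sketch below assumes that the formula reads $f(n,h)=\frac{2^{2q-1}q!\,q!}{(2q+1)!}\bigl(\binom h2 q(q+1)+\binom r2(q+1)-\frac12\binom{h-r}{2}q\bigr)$ with $q=\lfloor n/h\rfloor$ and $r=n \bmod h$, and compares it with a direct average over all h-ordered permutations. The function names are illustrative, and the brute-force routine is only practical for n up to about 8.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def f(n, h):
    """Average inversions in an h-ordered permutation of n elements."""
    q, r = divmod(n, h)
    # 2^(2q-1) q! q! / (2q+1)!, written as 2^(2q) / 2 so that q = 0 stays exact.
    coefficient = Fraction(2 ** (2 * q) * factorial(q) ** 2, 2 * factorial(2 * q + 1))
    bracket = (
        comb(h, 2) * q * (q + 1)
        + comb(r, 2) * (q + 1)
        - Fraction(comb(h - r, 2) * q, 2)
    )
    return coefficient * bracket


def average_inversions_brute_force(n, h):
    """Average the inversion count over all h-ordered permutations of n elements."""
    total = count = 0
    for p in permutations(range(n)):
        if all(p[i] < p[i + h] for i in range(n - h)):
            count += 1
            total += sum(
                1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j]
            )
    return Fraction(total, count)
```

Exact rational arithmetic (Fraction) is used so that the two values can be compared with ==, rather than with a floating-point tolerance.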

Corollary H. If the sequence of increments $h_{t-1},\ldots,h_1,h_0$ satisfies the condition

$$h_{s+1} \bmod h_s = 0,\qquad\text{for } t-1>s\ge 0,\tag{5}$$

then the average number of move operations in Algorithm D is

$$\sum_{t>s\ge 0}\bigl(r_s f(q_s+1,\,h_{s+1}/h_s)+(h_s-r_s)f(q_s,\,h_{s+1}/h_s)\bigr),\tag{6}$$

where $r_s=N \bmod h_s$, $q_s=\lfloor N/h_s\rfloor$, $h_t=Nh_{t-1}$, and f is defined in (4).

Fig. 12. The average number, f(n,h), of inversions in an h-ordered file of n elements, shown for n = 64.

Proof. The process of $h_s$-sorting consists of a straight insertion sort on $r_s$ $(h_{s+1}/h_s)$-ordered subfiles of length $q_s+1$, and on $(h_s-r_s)$ such subfiles of length $q_s$. The divisibility condition implies that each of these subfiles is a random $(h_{s+1}/h_s)$-ordered permutation, in the sense that each $(h_{s+1}/h_s)$-ordered permutation is equally likely, since we are assuming that the original input was a random permutation of distinct elements. ∎

Condition (5) in this corollary is always satisfied for two-pass shellsorts, when the increments are h and 1. If $q=\lfloor N/h\rfloor$ and $r=N \bmod h$, the quantity B in Program D will have an average value of

$$r f(q+1,N)+(h-r)f(q,N)+f(N,h)=\frac r2\binom{q+1}{2}+\frac{h-r}{2}\binom q2+f(N,h).$$
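The two-pass average can be confirmed exhaustively for small N. The following sketch is a plain Python shellsort, not a transcription of Program D. It assumes that B counts one move each time a record is shifted to the right during insertion, and it averages the per-pass move counts over all N! permutations. The function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def shellsort_moves(keys, increments):
    """Sort a copy of keys with the given increments; return moves per pass."""
    a = list(keys)
    moves_per_pass = []
    for h in increments:
        moves = 0
        # Straight insertion on each of the h interleaved subfiles.
        for j in range(h, len(a)):
            key = a[j]
            i = j - h
            while i >= 0 and a[i] > key:
                a[i + h] = a[i]  # one move of a record to the right
                moves += 1
                i -= h
            a[i + h] = key
        moves_per_pass.append(moves)
    assert a == sorted(keys)  # holds whenever the last increment is 1
    return moves_per_pass


def average_moves(n, increments):
    """Exact per-pass averages over all n! permutations of distinct keys."""
    totals = [0] * len(increments)
    count = 0
    for p in permutations(range(n)):
        count += 1
        for s, moves in enumerate(shellsort_moves(p, increments)):
            totals[s] += moves
    return [Fraction(total, count) for total in totals]
```

For N = 6 with increments (2, 1) the averages are 3 and 12/5. Here q = 3 and r = 0, so the first pass contributes $\frac{h-r}{2}\binom q2=3$, and the second contributes f(6, 2) = 12/5.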

To a first approximation, the function f(n,h) equals $(\sqrt\pi/8)\,n^{3/2}h^{1/2}$; we can, for example, compare it to the smooth curve in Fig. 12 when n = 64. Hence the running time for a two-pass Program D is approximately proportional to

$$2N^2/h+\sqrt{\pi N^3h}.$$

The best choice of h is therefore approximately $\sqrt[3]{16N/\pi}\approx1.72\sqrt[3]{N}$; and with this choice of h we get an average running time proportional to $N^{5/3}$.
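The optimum follows from a short calculus argument. The derivation below treats h as a continuous variable, which is an idealization since h must be an integer, and writes T(h) for the approximate cost $2N^2/h+\sqrt{\pi N^3h}$ (the name T is introduced here only for illustration).

```latex
\begin{aligned}
T(h) &= \frac{2N^2}{h} + \sqrt{\pi N^3 h},\\
T'(h) &= -\frac{2N^2}{h^2} + \frac{\sqrt{\pi N^3}}{2\sqrt{h}} = 0
  \;\Longrightarrow\; h^{3/2} = \frac{4N^{1/2}}{\sqrt{\pi}}
  \;\Longrightarrow\; h = \sqrt[3]{16N/\pi} \approx 1.72\,\sqrt[3]{N},\\
T\bigl(\sqrt[3]{16N/\pi}\bigr) &= \Bigl(2\,(\pi/16)^{1/3} + \sqrt{\pi}\,(16/\pi)^{1/6}\Bigr) N^{5/3}
  \approx 3.49\,N^{5/3}.
\end{aligned}
```

At the optimum the second term is exactly twice the first, so neither pass dominates by more than a constant factor.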

Thus we can make a substantial improvement over straight insertion, from $O(N^2)$ to $O(N^{1.667})$, just by using shellsort with two increments. Clearly we can do even better when more increments are used. Exercise 18 discusses the optimum choice of $h_{t-1},\ldots,h_0$ when t is fixed and when the h's are constrained by the divisibility condition; the running time decreases to $O(N^{1.5+\epsilon/2})$, where $\epsilon=1/(2^t-1)$, for large N. We cannot break the $N^{1.5}$ barrier by using the formulas above, since the last pass always contributes

$$f(N,h_1)\approx(\sqrt\pi/8)\,N^{3/2}h_1^{1/2}$$

inversions to the sum.

But our intuition tells us that we can do even better when the increments $h_{t-1},\ldots,h_0$ do not satisfy the divisibility condition (5). For example, 8-sorting followed by 4-sorting followed by 2-sorting does not allow any interaction between keys in even and odd positions; therefore the final 1-sorting pass is inevitably faced with $\Theta(N^{3/2})$ inversions, on the average. By contrast, 7-sorting followed by 5-sorting followed by 3-sorting jumbles things up in such a way that the final 1-sorting pass cannot encounter more than 2N inversions! (See exercise 26.) Indeed, an astonishing phenomenon occurs:

Theorem K. If a k-ordered file is h-sorted, it remains k-ordered.

Thus a file that is first 7-sorted, then 5-sorted, becomes both 7-ordered and 5-ordered. And if we 3-sort it, the result is ordered by 7s, 5s, and 3s. Examples of this remarkable property can be seen in Table 4 on page 85.
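Theorem K is easy to observe experimentally. The sketch below is illustrative only: it h-sorts by applying Python's built-in sorted to each interleaved subfile, which gives the same result as straight insertion on that subfile, and it uses an arbitrary fixed random seed. It then 7-sorts, 5-sorts, and 3-sorts a random file.

```python
import random


def h_sort(a, h):
    """Return a copy of a in which each of the h interleaved subfiles is sorted."""
    result = list(a)
    for start in range(min(h, len(result))):
        result[start::h] = sorted(result[start::h])
    return result


def is_h_ordered(a, h):
    """True if a[i] <= a[i + h] for every valid i."""
    return all(a[i] <= a[i + h] for i in range(len(a) - h))


rng = random.Random(2011)  # fixed seed so the run is reproducible
data = [rng.randrange(1000) for _ in range(200)]
after_7 = h_sort(data, 7)
after_5 = h_sort(after_7, 5)  # Theorem K: still 7-ordered
after_3 = h_sort(after_5, 3)  # Theorem K: still 7-ordered and 5-ordered
```

Calling is_h_ordered(after_3, k) for k = 3, 5, 7 should return True in every case, whatever the seed.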

Proof. Exercise 20 shows that Theorem K is a consequence of the following fact:

Lemma L. Let m, n, r be nonnegative integers, and let $(x_1,\ldots,x_{m+r})$ and $(y_1,\ldots,y_{n+r})$ be any sequences of numbers such that

$$y_1\le x_{m+1},\quad y_2\le x_{m+2},\quad\ldots,\quad y_r\le x_{m+r}.\tag{7}$$

If the x's and y's are sorted independently, so that $x_1\le\cdots\le x_{m+r}$ and $y_1\le\cdots\le y_{n+r}$, the relations (7) will still be valid.

Proof. All but m of the x's are known to dominate (that is, to be greater than or equal to) some y, where distinct x's dominate distinct y's. Let $1\le j\le r$. Since $x_{m+j}$ after sorting dominates $m+j$ of the x's, it dominates at least j of the y's; therefore it dominates the smallest j of the y's; hence $x_{m+j}\ge y_j$ after sorting. ∎ ∎
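Lemma L can also be spot-checked on random data. The sketch below is only a randomized check, not a proof, and its helper names are hypothetical. It builds sequences x and y that satisfy $y_j\le x_{m+j}$ for $1\le j\le r$, sorts both independently, and records whether the relations still hold.

```python
import random


def relations_hold(x, y, m, r):
    """Check y[j] <= x[m + j] for 0 <= j < r (the relations (7), 0-based)."""
    return all(y[j] <= x[m + j] for j in range(r))


def random_instance(rng, m, n, r):
    """Build x (length m+r) and y (length n+r) satisfying (7) before sorting."""
    x = [rng.randrange(100) for _ in range(m + r)]
    y = [x[m + j] - rng.randrange(10) for j in range(r)]  # forces y_j <= x_{m+j}
    y += [rng.randrange(100) for _ in range(n)]
    return x, y


rng = random.Random(7)  # arbitrary fixed seed
results = []
for _ in range(1000):
    m, n, r = rng.randrange(5), rng.randrange(5), rng.randrange(1, 6)
    x, y = random_instance(rng, m, n, r)
    assert relations_hold(x, y, m, r)  # true by construction
    results.append(relations_hold(sorted(x), sorted(y), m, r))
```

Every entry of results should be True, as the lemma predicts.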

Theorem K suggests that it is desirable to sort with relatively prime increments, but it does not lead directly to exact estimates of the number of moves made in Algorithm D. Moreover, the number of permutations of {1, 2, ..., n} that are both h-ordered and k-ordered is not always a divisor of n!, so we can see that Theorem K does not tell the whole story; some k- and h-ordered files are obtained more often than others after k- and h-sorting. Therefore the average-case analysis of Algorithm D for general increments $h_{t-1},\ldots,h_0$ has baffled everyone so far when $t>3$. There is not even an obvious way to find the worst case, when N and $(h_{t-1},\ldots,h_0)$ are given. We can, however, derive several facts about the approximate maximum running time when the increments have certain forms:

Theorem P. The running time of Algorithm D is $O(N^{3/2})$, when $h_s=2^{s+1}-1$ for $0\le s<t=\lfloor\lg N\rfloor$.

Proof. It suffices to bound $B_s$, the number of moves in pass s, in such a way that $B_{t-1}+\cdots+B_0=O(N^{3/2})$. During the first t/2 passes, for $t>s\ge t/2$, we may use the obvious bound $B_s=O(h_s(N/h_s)^2)$; and for subsequent passes we may use the result of exercise 23, $B_s=O(Nh_{s+2}h_{s+1}/h_s)$. Consequently

$$B_{t-1}+\cdots+B_0=O\bigl(N(2+2^2+\cdots+2^{t/2}+2^{t/2}+\cdots+2)\bigr)=O(N^{3/2}).$$ ∎

This theorem is due to A. A. Papernov and G. V. Stasevich, Problemy Peredachi Informatsii 1, 3 (1965), 81-98. It gives an upper bound on the worst-case running time of the algorithm, not merely a bound on the average running time. The result is not trivial since the maximum running time when the h's satisfy the divisibility constraint (5) is of order $N^2$; and exercise 24 shows that the exponent 3/2 cannot be lowered.

An interesting improvement of Theorem P was discovered by Vaughan Pratt in 1969: If the increments are chosen to be the set of all numbers of the form $2^p3^q$ that are less than N, the running time of Algorithm D is of order $N(\log N)^2$. In this case we can also make several important simplifications to the algorithm; see exercises 30 and 31. However, even with these simplifications, Pratt's method requires a substantial overhead because it makes quite a few passes over the data. Therefore his increments don't actually sort faster than those of Theorem P in practice, unless N is astronomically large. The best sequences for real-world N appear to satisfy $h_s\approx\rho^s$, where the ratio $\rho\approx h_{s+1}/h_s$ is roughly independent of s but may depend on N.
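The two families of increments just discussed can be generated in a few lines. The sketch below uses hypothetical function names, and returns the increments largest first, in the order the passes would use them.

```python
def theorem_p_increments(n):
    """Increments h_s = 2^(s+1) - 1 for 0 <= s < t = floor(lg n), largest first."""
    t = n.bit_length() - 1  # floor(lg n) for a positive integer n
    return [2 ** (s + 1) - 1 for s in reversed(range(t))]


def pratt_increments(n):
    """All numbers of the form 2^p 3^q that are less than n, largest first."""
    found = []
    power_of_3 = 1
    while power_of_3 < n:
        value = power_of_3
        while value < n:
            found.append(value)
            value *= 2
        power_of_3 *= 3
    return sorted(found, reverse=True)
```

For N = 20, pratt_increments gives ten increments against four for theorem_p_increments. The number of Pratt increments grows roughly like $(\log N)^2$, which is the source of the many passes mentioned above.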

We have observed that it is unwise to choose increments in such a way that each is a divisor of all its predecessors; but we should not conclude that the best increments are relatively prime to all of their predecessors. Indeed, every element of a file that is gh-sorted and gk-sorted with $h\perp k$ has at most $\frac12(h-1)(k-1)$ inversions when we are g-sorting. (See exercise 21.) Pratt's sequence $\{2^p3^q\}$ wins as $N\to\infty$ by exploiting this fact, but it grows too slowly for practical use.

Janet Incerpi and Robert Sedgewick [J. Comp. Syst. Sci. 31 (1985), 210-224; see also Lecture Notes in Comp. Sci. 1136 (1996), 1-11] have found a way to have the best of both worlds, by showing how to construct a sequence of increments for which $h_s\approx\rho^s$ yet each increment is the gcd of two of its predecessors. Given any number $\rho>1$, they start by defining a base sequence $a_1,a_2,\ldots$, where $a_k$ is the least integer $\ge\rho^k$ such that $a_j\perp a_k$ for $1\le j<k$. If $\rho=2.5$, for example, the base sequence is

$$a_1,a_2,a_3,\ldots=3,\ 7,\ 16,\ 41,\ 101,\ 247,\ 613,\ 1529,\ 3821,\ 9539,\ \ldots.$$

Now they define the increments by setting $h_0=1$ and

$$h_s=h_{s-r}\,a_r\qquad\text{for }\binom r2<s\le\binom{r+1}{2}.\tag{8}$$
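The construction can be sketched directly. The code below assumes that rule (8) reads $h_s=h_{s-r}a_r$ for $\binom r2<s\le\binom{r+1}{2}$ with $h_0=1$; the function names are illustrative. Passing $\rho$ as a Fraction keeps $\rho^k$ exact, and a float also works for moderate k.

```python
from fractions import Fraction
from math import ceil, gcd


def base_sequence(rho, count):
    """a_k = least integer >= rho^k that is relatively prime to a_1..a_{k-1}."""
    seq = []
    for k in range(1, count + 1):
        a = ceil(rho ** k)
        while any(gcd(a, b) != 1 for b in seq):
            a += 1
        seq.append(a)
    return seq


def incerpi_sedgewick_increments(rho, count):
    """First `count` increments h_0, h_1, ..., built as h_s = h_{s-r} * a_r."""
    h = [1]
    r = 1
    while len(h) < count:
        a_r = base_sequence(rho, r)[-1]
        # s runs over C(r, 2) < s <= C(r + 1, 2)
        for s in range(r * (r - 1) // 2 + 1, r * (r + 1) // 2 + 1):
            if len(h) == count:
                break
            h.append(h[s - r] * a_r)
        r += 1
    return h
```

With rho = Fraction(5, 2) the base sequence starts 3, 7, 16, 41, 101, ... and the increments start 1, 3, 7, 21, 48, 112, 336, ...; for instance 3 = gcd(21, 48), so a small increment is the gcd of two larger ones that are used before it.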
