A REALTIME SOFTWARE SOLUTION FOR RESYNCHRONIZING FILTERED MPEG2
TRANSPORT STREAM
Bin Yu, Klara Nahrstedt
Department of Computer Science, University of Illinois at Urbana-Champaign, DCL, 1304 W Springfield, Urbana, IL 61801
binyu, klara@cs.uiuc.edu
ABSTRACT
With the increasing demand and popularity of multimedia streaming applications over the current Internet, manipulating MPEG streams in a real-time software manner is gaining more and more importance. In this work, we studied the synchronization problem that arises when a gateway changes the data content carried in an MPEG2 Transport stream. In short, the distance between original time stamps is changed non-uniformly when the video frames are resized, and decoders fail to reconstruct the encoding clock from the resulting stream. We propose a cheap software real-time approach to solve this problem, which basically reuses the original time stamp packets and adapts their spacing to accommodate the changes in bit rate. Experimental results from a real-time HDTV stream filter show that our approach is correct and efficient.
1 INTRODUCTION
Video streaming is gaining more and more attention from both academia and industry, and three things are primarily behind this popularity: a widely accepted video compression standard, MPEG2 [4]; a widely available Internet with high bandwidth becoming commonplace; and ever-growing user demand for the more easily understood visual presentation of information. Beyond simply sending the video content, people are working on adapting the content at intermediate gateways before it reaches the client, either to tackle heterogeneity in resource availability or to increase client customization and interaction. Example prototype systems include ProxiNet [1], the IBM transcoding proxy [8], UC Berkeley TranSend [5] and Content Services Network [10]. There could be many kinds of video editing services, such as watermarking, frequency-domain low-pass filtering, frame/color dropping, external content embedding [11] and so on.
As we focus on the case of HDTV streaming, the problem of streaming vs. decoding becomes obvious. On the one hand, the Internet is bringing to end hosts video streams above 10 Mbps, thanks to technologies like IP multicast on the MBone, Fast Ethernet [9] and Gigabit Ethernet [3] in office buildings, and xDSL [2] and cable modems [6] at home. On the other hand, PCs and even gateway servers are still not able to decode or do non-trivial video manipulation on high-volume HD streams in real time, for lack of enough computing power and real-time support. For example, with an ordinary 30-frames-per-second HDTV stream at 18 Mbps, a 100 Mbps local area network could afford 4 or 5 high-definition video conference sessions in an office building, but even the most advanced desktop computer could only decode and render two frames per second. Also, the PC monitor could never match the great experience rendered by TV screens and big video walls, and in many situations the high-definition video needs to be shown on large screens for a large audience.
In such a situation, we propose to combine the software video delivery channel with hardware decoding/rendering interfaces by using desktop PCs to receive and process the HD video streams and then feed the resulting streams into a hardware decoding board. For example, in [11] we presented how we implemented software real-time Picture-in-Picture for HDTV streams in this way. However, one key problem we face is that hardware decoding boards rely on the time stamps contained in MPEG2 Transport Layer streams to maintain their hardware clock, while almost all software editing operations would compromise these time stamps. This problem has to be solved before any similar software video manipulations can be applied to HD video streams, and in this paper we present our solutions to it. Our solutions are cheap in the sense that they are simple and easy to implement, and no hardware real-time support is necessary. This way, they can be adopted by desktop PCs or intermediate gateway servers with minimum extra cost.

This paper is organized as follows. In section 2, we briefly introduce how the synchronization between MPEG encoder and decoder works according to the MPEG2 standard, and the problem of re-synchronization that arises after video editing operations. Our solution is then discussed in detail in section 3, and experiment results follow in section 4. Finally, we discuss related work and conclude this paper in sections 5 and 6.

Figure 1: Synchronization between the encoder and decoder
2 THE SYNCHRONIZATION PROBLEM
In this section, we first briefly review how the time stamps encoded in an MPEG2 Transport stream are used by the decoder to reconstruct the encoder's clock, and then we introduce what kind of video editing system we are focusing on and how it affects the synchronization between the encoder and the decoder.
2.1 The MPEG2 Transport Layer Stream Timestamps
Figure 1 shows how MPEG2 Transport streams manage to maintain synchronization between the sender, which encodes the stream, and the receiver, which decodes it. As the elementary streams carrying video and audio content are packetized, their target Decoding Time Stamp (DTS) and Presentation Time Stamp (PTS) are determined based on the current sender clock and inserted into the packet headers. For video streams, the access unit is a frame, and both DTS and PTS are given only for the first bit of each frame, after its picture header. These time stamps are later used by the decoder to control the timing at which it starts to do decoding and presentation. For example, suppose an encoded frame comes to the multiplexing stage at the sending side at time t, and the encoder believes (based on calculation using predefined parameters) that the decoder should begin to decode this frame d seconds after it receives it and output the decoded frame p seconds thereafter. Assuming the decoder can reconstruct the encoder clock, so that the time at which it receives this frame is also t, then the DTS should be set to t + d and the PTS to t + d + p. After that, as all of these packetized elementary stream packets are further multiplexed together, the final stream is time-stamped with the Program Clock Reference (PCR), which is given by periodically sampling the encoder clock. The resulting transport layer stream is then sent over the network to the receiver, or stored in storage devices for the decoder to read in the future. As long as the delay the whole stream experiences remains constant from the receiver's point of view, the receiver should be able to reconstruct the sender's clock that was used when the stream was encoded. The accuracy and stability of this recovered clock is very important, since the decoder will try to match the PTS and DTS against this clock to guide its decoding and displaying activities.
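As a concrete illustration, the timestamp arithmetic above can be sketched as follows. The 90 kHz tick unit is the resolution MPEG-2 uses for PTS/DTS; the function and variable names are ours, not from the standard:

```python
# Sketch of DTS/PTS assignment at the sender. MPEG-2 expresses PTS/DTS in
# 90 kHz clock ticks, so times in seconds are converted to ticks here.

MPEG2_PTS_HZ = 90_000

def assign_timestamps(mux_time_s, decode_delay_s, present_delay_s):
    """Return (DTS, PTS) in 90 kHz ticks for a frame that enters the
    multiplexer at mux_time_s, should be decoded decode_delay_s after
    arrival, and displayed present_delay_s after decoding starts."""
    dts = round((mux_time_s + decode_delay_s) * MPEG2_PTS_HZ)
    pts = round((mux_time_s + decode_delay_s + present_delay_s) * MPEG2_PTS_HZ)
    return dts, pts

# a frame multiplexed at t = 1.0 s, decoded 0.2 s later, shown 0.1 s after that
dts, pts = assign_timestamps(1.0, 0.2, 0.1)
assert (dts, pts) == (108000, 117000)
```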
Figure 2: MPEG2 Transport Stream Syntax

Knowing the general idea in timing, we now introduce how the Transport Layer syntax works, as shown in Figure 2. All sub-streams (video, audio, data and time stamps) are segmented into small packets of constant size (188 bytes), and the Packet ID (PID) field in the 4-byte header of each packet tells which sub-stream that packet belongs to. The PCR packets are placed at constant intervals, and they form a running time line along which all other packets are positioned at their target time points. On this time line, each 188-byte packet occupies one time slot, and the exact time stamp of each packet/slot can be interpolated using neighboring PCR packets.

Figure 3: Layered coding scheme of MPEG-2 Transport Stream

Data packets arrive and are read into the decoder buffer at a constant rate, and this rate can be calculated by dividing the number of bits between any two consecutive PCR packets by the time difference between their time stamps. In other words, if the number of packets between any two PCR packets remains constant, then the difference between their time stamps should also be constant. In an ideal state, packets are read into the decoder at the constant bit rate, and whenever a new PCR packet arrives, its time stamp should match exactly with the receiver clock, which confirms to the decoder that so far it has successfully reconstructed the same clock as the encoder. However, since PCR
packets may have experienced jitter in network transmission or storage device access before they arrive at the receiver, we cannot simply set the receiver's local clock to the time stamp carried by the next incoming PCR, no matter when it comes. To smooth out the jitter and maintain a stable clock with a limited buffer size at the receiver, the receiver will generally resort to a smoothing technique like the Phase-Locked Loop (PLL) [7] to generate a stable clock from the jittered PCR packets. A PLL is a feedback loop that uses an external signal (the incoming PCR packets in our case) to tune a local signal source (generated by a local oscillator in our case) and produce a relatively more stable result signal (the receiver's reconstructed local clock in our case). So long as the timing relation between PCR packets is correct, the jitter can be smoothed out with the PLL.
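The slot interpolation and rate calculation described above can be sketched as follows. Packets are modeled as abstract slot indices, and the 27 MHz PCR tick rate follows the MPEG-2 system clock; the function names are ours:

```python
# Sketch: interpolate the implied timestamp of a 188-byte slot between two
# PCR packets, and derive the constant read-in bit rate from a PCR pair.

TS_PACKET_BITS = 188 * 8     # one transport packet = 1504 bits
PCR_HZ = 27_000_000          # MPEG-2 system clock frequency

def slot_timestamp(pcr1_ticks, slot1, pcr2_ticks, slot2, slot):
    """Linearly interpolate the time (in seconds) of packet slot `slot`
    lying between two consecutive PCR packets at slots slot1 and slot2."""
    frac = (slot - slot1) / (slot2 - slot1)
    return (pcr1_ticks + frac * (pcr2_ticks - pcr1_ticks)) / PCR_HZ

def stream_bitrate(pcr1_ticks, slot1, pcr2_ticks, slot2):
    """Bits between two PCR packets divided by their time difference."""
    bits = (slot2 - slot1) * TS_PACKET_BITS
    return bits / ((pcr2_ticks - pcr1_ticks) / PCR_HZ)

# 100 packets between PCRs 225600 ticks (about 8.36 ms) apart is an 18 Mbps stream
assert abs(stream_bitrate(0, 0, 225600, 100) - 18_000_000) < 1
```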
2.2 HDTV Stream Editing/Streaming Test Bed
The following discussion is based on the video editing/streaming test bed shown in Figure 3. A live high-definition digital TV stream from the satellite or the HD storage device is fed into the server PC, which encodes it into an MPEG2 Transport stream and multicasts this stream over the high-speed local area network. Players on the client PCs join this multicast group to receive the HD stream, and then feed this stream into the decoding board. The decoded analogue signal is then sent to the wide-screen TV for display. Our filter receives this stream in the same way as a normal player, and performs various kinds of video editing operations on it in real time, such as low-pass filtering, frame/color dropping and visual information embedding [11]. There can be multiple editing operations applied to the same stream in a chain, and the resulting streams at all stages are available to clients through other multicast groups.
2.3 How Video Editing Affects Clock Reconstruction
Since the timing and spacing of PCR packets are very important for clock reconstruction, it is obvious that video editing operations will cause malfunctions, since they change both.
First, all intermediate operations a video stream goes through before it reaches the decoder contribute to the delay and jitter of the PCR sub-stream. Different filtering operations, such as low-pass filtering and Picture-in-Picture, or even the same operation, take varying processing time to do the necessary calculation for different frames or different parts of the same frame. To compensate, traditional solutions would either try to adjust the resulting stream at each intermediate point or push all the trouble to the final client. The former solution suffers from the fact that processing times for different operations and frames tend to be quite different and varying, which makes it very hard to find a local optimal answer. The latter solution implies that the client needs a very large buffer and a long waiting time because of the unpredictable delay and jitter of the incoming stream. We will see later how our solutions solve this problem by utilizing the inherent PCR time stamps of the streams.
The second problem, the changed spacing between PCR packets, is even more intractable. As we said above, each access unit (video frame or audio packet) should be positioned within the time line formed by the PCR sub-stream. If a video frame arrives at the receiver at its destined time point, the decoder will be able to correctly schedule where and how long to buffer it before decoding it. However, after the filtering operations a video frame normally becomes smaller or larger. It takes fewer or more packets to carry, and so the following frames are dragged earlier or pushed later along the time line. In such circumstances, if we keep both the time stamps and the spacing of the PCR packets unchanged, then the receiver's clock can still be correctly reconstructed, but the arrival time of each frame will be skewed along the time line. For example, if the stream is low-pass filtered, then every frame becomes shorter, and so the following frames are dragged forward to fill the vacancy left behind. If the decoder still reads in data at the original speed, more and more future frames begin to arrive earlier and earlier. Since they are all buffered until their stamped decoding time, the buffer will overflow in the long run, no matter how large it is. The fundamental problem is that after the filtering, the actual bit rate becomes lower or higher, but the data is still read in by the decoder at the original rate, since the timing and spacing of the PCR packets are not changed. So if the new rate is lower, more and more future frames are read in by the decoder, causing the receiving buffer for the network connection to be emptied while the decoder's decoding buffer overflows; on the other hand, if the new rate is higher, then at some point in the future, data will remain in the receiving buffer and not be read in by the decoder even at its decoding time.
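A toy calculation makes the drift concrete. The frame sizes below are invented for illustration; the point is only that a constant per-frame surplus accumulates without bound:

```python
# Toy illustration of the drift described above: the decoder keeps reading
# at the original rate while each filtered frame consumes fewer bits, so
# the decode buffer occupancy grows linearly and eventually overflows.

def buffer_after(frames, original_bits_per_frame, filtered_bits_per_frame):
    """Bits left in the decode buffer after `frames` frame intervals, when
    the decoder reads original_bits_per_frame per interval but each decoded
    frame only consumes filtered_bits_per_frame."""
    occupancy = 0
    for _ in range(frames):
        occupancy += original_bits_per_frame   # read in at the original rate
        occupancy -= filtered_bits_per_frame   # only the filtered bits are consumed
    return occupancy

# 300 frames (10 s at 30 fps) of a 600 kbit frame filtered down to 450 kbit
# leave a 45 Mbit backlog in the decode buffer:
assert buffer_after(300, 600_000, 450_000) == 45_000_000
```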
3 OUR SOLUTIONS
To solve the problems described above, an immediate thought would be to do the same kind of clock reconstruction at the filter as the decoder does, and then re-generate the PCR packets to reflect the changes at the filter output. However, we know that smoothing mechanisms like the PLL are implemented in hardware circuits containing a voltage-controlled oscillator that generates high-frequency signals to be tuned with the incoming PCR time stamps. This is not easy, if not impossible, to do in software on computers without real-time support in hardware. Therefore, a pure software mechanism that does not require hardware real-time support would enable us to distribute the video editing service across the network to any point on the streaming path. Another goal is to achieve a cheap and efficient solution that can be easily implemented and carried out by any computer with modest CPU and memory resources available.

The key idea behind our solution comes from the observation that the DTS and PTS are only associated with the beginning bit of each frame. Consequently, so long as we manage to fix that point to the correct position on the time line, the decoder should work fine even if the remaining bits of that frame following the starting point are stretched shorter or longer.
3.1 Simple Solution: Padding
Following the discussion above, we have designed a simple solution that works for bit-rate-reducing video editing operations. We do not change the time stamp or the position of any PCR packet along the time line within the stream, and we also preserve the position of the frame header, and so that of the beginning bit, of every frame. What is changed is the size of each frame in number of bits, and we pack the filtered bits of a frame closely following the picture header. Since each frame takes fewer 188-byte packets to carry, yet the frame headers are still positioned at their original time points, there will be some "white space" left between the last bit of one frame and the first bit of the header of the next frame. The capacity of this space is exactly the reduction in the number of bits used to encode this frame as a result of the video editing operations, and we can simply pad this space with empty packets (NULL packets).
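A minimal sketch of the padding step, using a simplified packet model of (PID, payload) tuples rather than real transport packets. 0x1FFF is the PID MPEG-2 reserves for NULL packets; the function name is ours:

```python
# Sketch of the padding solution: keep every frame header at its original
# slot by emitting the filtered frame's packets first, then filling the
# freed slots with NULL packets so the next frame header stays in place.

NULL_PID = 0x1FFF   # PID reserved for NULL packets in MPEG-2 Systems

def pad_frame(filtered_packets, original_slot_count):
    """Return the filtered packets followed by enough NULL packets that
    the frame still occupies original_slot_count slots on the time line."""
    if len(filtered_packets) > original_slot_count:
        raise ValueError("padding only works for bit-rate-reducing filters")
    padding = original_slot_count - len(filtered_packets)
    return filtered_packets + [(NULL_PID, b"")] * padding

# a 16-slot frame whose filtered version needs only 10 packets gets 6 NULLs
out = pad_frame([("video", b"x")] * 10, 16)
assert len(out) == 16 and out.count((NULL_PID, b"")) == 6
```

The ValueError branch reflects the first drawback discussed below: a frame that grows past its original slot count simply cannot be placed this way.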
This solution is very simple to understand and implement, and it preserves the timing synchronization, since we only need to pack the filtered bits of each frame continuously after the picture header and then insert NULL packets until the header of the next frame. No new time stamps need to be generated in real time, and the bit rate remains stable at the original rate. However, it inevitably has some drawbacks. First, it can only handle bit-rate-reduction operations. We only try to fix the header of each frame to its original position on the time line, which means a changed frame must not occupy more bits than the distance between the current frame header and the next. This property does not always hold, since some filtering operations like information embedding and watermarking may increase the frame size in bits. Secondly, the saved bits are padded with NULL packets to maintain the original constant bit rate and the starting point of each frame, and this ironically runs counter to our initial goal of bit-rate reduction for some operations like low-pass filtering and color/frame dropping. The resulting stream contains the same number of packets as the original one; the only difference is that the number of bits representing each frame has been shrunk, yet this saving is spent immediately by padding NULL packets at the end of each frame.
Here we want to mention that there does exist another approach to bypass the second problem. Up to now, we have been using a filter model that is transparent to the client player, which confines us strictly to the MPEG2 standard syntax. However, if some of the filtering intelligence is exported to the end hosts, then some savings can be expected. For example, instead of inserting a run of NULL packets, we may compress them by inserting only a special packet saying how many NULL packets should follow. At the end host, a stub proxy could watch the incoming stream and, on seeing this packet, replace it with the supposed amount of padding packets before sending the stream to the client player. Note that this padding is important to maintain correct timing, especially if the client is using a standard hardware decoding board. This way, the bandwidth is indeed saved, but at the price of relying on a non-standard protocol outside MPEG2. Of course, this in turn introduces the problems associated with non-standardized solutions, such as difficulty in software maintenance and upgrading. Therefore, we only consider this a secondary choice, and not a major solution.
Figure 4: Example: 2/3 shrinking
3.2 Enhanced Solution: Time-Invariant Bitrate Scaling
To ultimately solve the synchronization problem, a more general algorithm has been designed. The key insight behind it is that we can change the bit rate to another constant value, while preserving the PCR time stamps, by changing the number of packets between any PCR pair to another constant value. This way, we can scale the PCR packets' distance and achieve a fixed new bit rate, as if the time line were scaled looser or tighter to carry more or fewer packets, yet we do not need to re-generate new PCR time stamps, which would rely on hardware real-time support. All non-video packets can simply be mapped to the new position on the scaled output time line that corresponds to the same time point as on the original input time line. In case no exact mapping is available, because packets are aligned at units of 188 bytes, we can simply use the nearest time point on the new time line without introducing any serious problem. For the video stream, the same kind of picture header fixing and frame data packing is conducted as in the first solution, but in a scaled way.
An example of shrinking the stream to 2/3 of its bandwidth is given in Figure 4. All non-video packets and video packets that carry picture headers are mapped to their corresponding positions on the new time line, and so their distance is also shrunk to 2/3 of the original. After the video editing operations, the resulting video packets are packed closely and as early as possible within the new stream following the frame header. Intuitively, the filtered video data is squeezed into the remaining space between all non-video packets and picture header packets. For example, suppose that in the input stream, packet 6 is a frame header, packet 9 is an audio packet, and packets 7, 8, and 10 through 24 are video data from the same frame as packet 6. After 2/3 shrinking, packet 6 is positioned in slot 4, and packet 9 goes to slot 6. The other video data packets are processed by the video editing filter, and the resulting bits are packed again into packets of 188 bytes each. All empty slots, such as slot 5 and slots 7 through 16, are then used to carry the resulting bits. If the filter shrinks the video frame to occupy less than 2/3 of its original number of bits, then the new slots are enough to carry the resulting frame.
This algorithm is also very simple to implement. For each non-video packet, its distance (in number of packets) from the last PCR packet is multiplied by a scaling factor α, and the result is used to set the distance between this packet and the last PCR packet in the output stream. For video frames, the header containing the DTS and PTS is scaled and positioned in the same way, and the remaining bits are closely appended to the header in the result stream. Note that when α is set to 1, this reduces to the simple padding solution above.
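The slot remapping can be sketched as follows, in a simplified model where packets are identified only by their slot index; scale_slot is our name for the mapping:

```python
# Sketch of time-invariant bit-rate scaling: a packet keeps its timing
# relation to the preceding PCR packet, but its slot distance from that
# PCR is multiplied by the scaling factor and rounded to a whole
# 188-byte slot. With a factor of 1 this degenerates to the padding
# solution; filtered video data is packed into the slots left in between.

def scale_slot(slot, last_pcr_slot, new_last_pcr_slot, alpha):
    """Map a packet's slot onto the scaled output time line, scaling its
    distance from the preceding PCR packet by alpha and rounding to the
    nearest whole slot."""
    distance = slot - last_pcr_slot
    return new_last_pcr_slot + round(distance * alpha)

# reproduces the 2/3-shrinking example: with the last PCR at slot 0,
# frame header packet 6 lands in slot 4 and audio packet 9 in slot 6
assert scale_slot(6, 0, 0, 2 / 3) == 4
assert scale_slot(9, 0, 0, 2 / 3) == 6
```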
Figure 5: Result of time line scaling

Now the only problem is how to determine α for a specific streaming path. If we shrink the time line too much, and for some frames the bit-rate-reducing operation does not have a significant effect, then again we will not have enough space to squeeze in such a frame, which will push the beginning bit of the next frame behind schedule. On the other hand, if we shrink the time line too little, or expand it (α > 1) too much, then more space will be padded with NULL packets to preserve the important time points, leading to a waste of bandwidth. There exists an optimal scale factor α* that balances these two forces: it fulfills the conditions that (1) the filtered frame data can always be squeezed into the scaled stream, and (2) the number of NULL packets needed for padding is minimum.
However, this optimal scale factor is hard to estimate in advance, since different operations with various parameters have quite varying effects on distinct video clips in terms of bit-rate change. Therefore, in our current implementation, we simply use a slightly exaggerated scale factor based on the operation type and parameters. For example, for low-pass filtering with a threshold of 5, a scaling factor of 0.80 works for almost all streams. Even if we meet a frame that still occupies more than 0.9 of its original number of packets after the filtering, only the next few frames may be slightly affected. Since a smaller-than-average frame is expected to follow shortly, this local skew can be absorbed by the decoder easily and does not have any chain effect.

Our next step will be looking into how to "learn" this optimal scale factor by analyzing the history of bit-rate change of a stream and adjusting the factor α on the fly. It is not specified by the MPEG standard how a decoder, especially a hardware decoding board, should react if the incoming stream changes from one constant bit rate to another, and it is also an open question how quickly it would adapt to the new rate.
4 EXPERIMENTAL RESULTS
Figure 5 shows the effect of the time line scaling approach for a low-pass filter with threshold 5. Each point on the x axis represents an occurrence of a PCR packet, and the y axis shows in three colors how many video packets, NULL packets, or packets from other data streams lie between each pair of PCR packets. We can see that the distribution of the three areas is kept almost constant for the original stream, except for more NULL packets at the end of a frame. Without scaling, however, the number of video packets varies across different PCR intervals and a lot of extra space is padded with NULL packets, as shown in the upper right subfigure. On the other hand, if we do scaling with a scaling factor of 80%, then the padding occurs mostly only at the end of frames and the stream contains mostly useful data.

Table 1: Final Statistics

                                LP (10)   LP (5)    PIP
  Original BR (Mbps)               18.0     18.0   18.0
  Average resulting BR (Mbps)     15.45    13.71  19.04
  Average relative change          0.86     0.76   1.06
  Suggested scale factor α         0.90     0.80   1.10
One thing we need to point out is that the skew of video access units along the time line still exists with this scaling approach. After the filtering operation, each frame shrinks to a size mostly less than 80% of its original size. If we mask out all other packets, we can see that in the video stream, frames are packed closely one after another. If one frame takes more space than its share, then the next frame may be pushed behind its time point, but this skew will be compensated later by another frame with a larger shrinking effect. As we said before, this kind of small jitter around the exact time points on the scaled time line is acceptable, and it is the change in the bit rate at which the decoder reads in the data that fundamentally enables our scaling algorithm to solve the problem.
Another experiment was done on how to determine the scaling factor α for a particular kind of video editing operation. Three kinds of operations were tested: low-pass filtering with thresholds of 10 and 5, and Picture-in-Picture. The original stream is an HD stream, "stars1.mpg", with a bit rate of 18 Mbps in MPEG2 Transport Layer format. The embedded frame used for Picture-in-Picture is a shrunk version of another HD stream, "football1.mpg". Since the content of this stream is more condensed than the background stream (i.e., more DCT coefficients are used to describe each block), it is expected that the bit rate will increase after the Picture-in-Picture operation. The final statistics are shown in Table 1. The experimental results show that with the suggested scaling factor based on real-world statistics, our time-invariant scaling algorithm can successfully solve the synchronization problem.
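The suggested factors in Table 1 look like the average relative change rounded up to the next 0.05 step; a sketch under that reading (the rounding rule is our interpretation, not stated in the text):

```python
# Sketch of deriving a "slightly exaggerated" scaling factor from measured
# bit rates, assuming the rule is: round the average relative bit-rate
# change up to the next multiple of 0.05.

import math

def suggest_alpha(original_bps, resulting_bps):
    """Round the observed relative bit-rate change up to the next 0.05."""
    return math.ceil(resulting_bps / original_bps * 20) / 20

# reproduces the suggested factors in Table 1
assert suggest_alpha(18.0e6, 15.45e6) == 0.90   # LP (10)
assert suggest_alpha(18.0e6, 13.71e6) == 0.80   # LP (5)
assert suggest_alpha(18.0e6, 19.04e6) == 1.10   # PIP
```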
5 CONCLUSION
In this paper, we focus on the scenario of streaming HD video in MPEG2 Transport Layer syntax with software streaming/processing and hardware decoding, which will be commonplace until the processing power of personal computers becomes strong enough to cope with high-bandwidth, high-definition video streams. We have studied an important problem: decoders may lose synchronization and fail to reconstruct the encoder's clock because of video editing operations on the streaming path. We have proposed two solutions to this problem, both based on the idea of reusing the original time stamp packets (PCR packets) and adjusting the number of packets between them to reflect the changes in bit rate caused by video editing operations. Experimental results have shown that our solutions are efficient and work well without any requirement for real-time support from the system.

As far as we know, our work is among the first efforts in promoting real-time software filtering of High Definition MPEG2 streams, and can be beneficial to many real-time applications that work with MPEG2 system streams, such as HDTV broadcast.
6 ACKNOWLEDGMENT
This work was supported by NASA under contract number NASA NAG 2-1406 and by the National Science Foundation under contract numbers NSF CCR-9988199, NSF CCR-0086094, NSF EIA 99-72884 EQ, and NSF EIA 98-70736. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or NASA.
7 REFERENCES
[1] ProxiNet http://www.proxinet.com.
[2] Emerging high-speed xDSL access services: architectures, issues, insights, and implications. IEEE Communications Magazine, Vol. 37, No. 11, Nov. 1999, pp. 106-114.
[3] Gigabit Ethernet Circuits and Systems. Tutorial Guide, ISCAS 2001, The IEEE International Symposium on Circuits and Systems, 2001, pp. 9.4.1-9.4.16.
[4] ISO/IEC 13818. Generic coding of moving pictures and associated audio information. 1994.
[5] A. Fox, S. D. Gribble, Y. Chawathe, and E. Brewer. Adapting to network and client variation using active proxies: lessons and perspectives. IEEE Personal Communications, Vol. 5, No. 4, pp. 10-19, 1998.
[6] A. Dutta-Roy. An overview of cable modem technology and market perspectives. IEEE Communications Magazine, Vol. 39, No. 6, June 2001, pp. 81-88.
[7] C. E. Holborow. Simulation of Phase-Locked Loop for processing jittered PCRs. ISO/IEC JTC1/SC29/WG11, MPEG94/071, 1994.
[8] J. R. Smith, R. Mohan, and C.-S. Li. Scalable multimedia delivery for pervasive computing. ACM Multimedia 1999, 1999.
[9] J. Spragins. Fast Ethernet: Dawn of a New Network [New Books and Multimedia]. IEEE Network, Vol. 10, No. 2, March-April 1996, p. 4.
[10] W.-Y. Ma, B. Shen, and J. Brassil. Content Services Network: The Architecture and Protocols. Proceedings of the Sixth International Workshop on Web Caching and Content Distribution, 2001.
[11] B. Yu and K. Nahrstedt. A Compressed-Domain Visual Information Embedding Algorithm for MPEG2 HDTV Streams. ICME 2002, 2002.