A REALTIME SOFTWARE SOLUTION FOR RESYNCHRONIZING FILTERED MPEG2
TRANSPORT STREAM
Bin Yu, Klara Nahrstedt
Department of Computer Science, University of Illinois at Urbana-Champaign, DCL, 1304 W Springfield, Urbana, IL 61801
binyu, klara@cs.uiuc.edu
ABSTRACT
With the increasing demand and popularity of multimedia streaming applications over the current Internet, manipulating MPEG streams in a real-time software manner is gaining more and more importance. In this work, we studied the synchronization problem that arises when a gateway changes the data content carried in an MPEG2 Transport stream. In short, the distance between original time stamps is changed non-uniformly when the video frames are resized, and decoders fail to reconstruct the encoding clock from the resulting stream. We propose a cheap software real-time approach to solve this problem, which basically reuses the original time stamp packets and adapts their spacing to accommodate the changes in bit rate. Experimental results from a real-time HDTV stream filter show that our approach is correct and efficient.
1 INTRODUCTION
Video streaming is gaining more and more attention from both academia and industry, and three things are primarily behind this popularity: a widely accepted video compression standard, MPEG2 [4]; a widely available Internet with high bandwidth becoming commonplace; and ever-growing user demand for the more easily understood visual presentation of information. Beyond simply sending the video content, people are working on adapting the content at intermediate gateways before it reaches the client, either to tackle heterogeneity in resource availability or to increase client customization and interaction. Example prototype systems include ProxiNet [1], the IBM transcoding proxy [8], UC Berkeley TranSend [5] and Content Services Network [10]. There could be many kinds of video editing services, such as watermarking, frequency-domain low-pass filtering, frame/color dropping, external content embedding [11] and so on.
As we focus on the case of HDTV streaming, the problem of streaming vs. decoding becomes obvious. On the one hand, the Internet is bringing to end hosts video streams above 10 Mbps, thanks to technologies like IP multicast on the MBone, Fast Ethernet [9] and Gigabit Ethernet [3] in office buildings, and xDSL [2] and cable modems [6] at home. On the other hand, PCs and even gateway servers are still not able to decode or do non-trivial video manipulation on high-volume HD streams in real time, for lack of enough computing power and real-time support. For example, with an ordinary 30-frames-per-second HDTV stream at 18 Mbps, a 100 Mbps local area network could afford 4 or 5 high-definition video conference sessions in an office building, but even the most advanced desktop computer could only decode and render two frames per second. Also, the PC monitor could never match the great experience rendered by TV screens and big video walls, and in many situations the high-definition video needs to be shown on large screens for a large audience.
In such a situation, we propose to combine the software video delivery channel with hardware decoding/rendering interfaces by using desktop PCs to receive and process the HD video streams and then feed the resulting streams into a hardware decoding board. For example, in [11] we presented how we implemented software real-time Picture-in-Picture for HDTV streams in this way. However, one key problem we face is that hardware decoding boards rely on the time stamps contained in MPEG2 Transport Layer streams to maintain their hardware clock, while almost all software editing operations would compromise these time stamps. This problem has to be solved before any similar software video manipulations can be applied to HD video streams, and in this paper we present our solutions to it. Our solutions are cheap in the sense that they are simple and easy to implement, and no hardware real-time support is necessary. This way, they can be adopted by desktop PCs or intermediate gateway servers with minimum extra cost.

This paper is organized as follows. In section 2, we briefly introduce how the synchronization between MPEG encoder and decoder works according to the MPEG2 standard, and the problem of re-synchronization that arises after video editing operations. Our solution is then discussed in detail in section 3, and experiment results follow in section 4. Finally, we discuss related work and conclude this paper in sections 5 and 6.

Figure 1: Synchronization between the encoder and decoder
2 THE SYNCHRONIZATION PROBLEM
In this section, we first briefly review how the time stamps encoded in an MPEG2 Transport stream are used by the decoder to reconstruct the encoder's clock, and then we introduce what kind of video editing system we are focusing on and how it affects the synchronization between the encoder and the decoder.
2.1 The MPEG2 Transport Layer Stream Timestamps
Figure 1 shows how MPEG2 Transport streams manage to maintain synchronization between the sender, which encodes the stream, and the receiver, which decodes it. As the elementary streams carrying video and audio content are packetized, their target Decoding Time Stamp (DTS) and Presentation Time Stamp (PTS) are determined based on the current sender clock and inserted into the packet headers. For video streams, the access unit is a frame, and both DTS and PTS are given only for the first bit of each frame, after its picture header. These time stamps are later used by the decoder to control the timing at which it starts to do decoding and presentation. For example, suppose an encoded frame comes to the multiplexing stage at the sending side at time t, and the encoder believes (based on calculation using predefined parameters) that the decoder should begin to decode this frame d seconds after it receives it and output the decoded frame p seconds thereafter. Assuming the decoder can reconstruct the encoder clock, so that the time at which it receives this frame is also t, then the DTS should be set to t + d and the PTS to t + d + p. After that, as all of these packetized elementary stream packets are further multiplexed together, the final stream is time-stamped with the Program Clock Reference (PCR), which is given by periodically sampling the encoder clock. The resulting transport layer stream is then sent over the network to the receiver, or stored in storage devices for the decoder to read in the future. As long as the delay the whole stream experiences remains constant from the receiver's point of view, the receiver should be able to reconstruct the sender's clock that was used when the stream was encoded. The accuracy and stability of this recovered clock is very important, since the decoder will try to match the PTS and DTS against this clock to guide its decoding and displaying activities.
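As a concrete illustration, the timestamp arithmetic above can be sketched as follows. The 90 kHz tick unit is the resolution MPEG-2 uses for PTS/DTS; the function and variable names are ours, not from the standard:

```python
# Sketch of DTS/PTS assignment at the sender. MPEG-2 expresses PTS/DTS in
# 90 kHz clock ticks, so times in seconds are converted to ticks here.

MPEG2_PTS_HZ = 90_000

def assign_timestamps(mux_time_s, decode_delay_s, present_delay_s):
    """Return (DTS, PTS) in 90 kHz ticks for a frame that enters the
    multiplexer at mux_time_s, should be decoded decode_delay_s after
    arrival, and displayed present_delay_s after decoding starts."""
    dts = round((mux_time_s + decode_delay_s) * MPEG2_PTS_HZ)
    pts = round((mux_time_s + decode_delay_s + present_delay_s) * MPEG2_PTS_HZ)
    return dts, pts

# a frame multiplexed at t = 1.0 s, decoded 0.2 s later, shown 0.1 s after that
dts, pts = assign_timestamps(1.0, 0.2, 0.1)
assert (dts, pts) == (108000, 117000)
```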
Figure 2: MPEG2 Transport Stream Syntax

Knowing the general idea in timing, we now introduce how the Transport Layer syntax works, as shown in Figure 2. All sub-streams (video, audio, data and time stamps) are segmented into small packets of constant size (188 bytes), and the Packet ID (PID) field in the 4-byte header of each packet tells which sub-stream that packet belongs to. The PCR packets are placed at constant intervals, and they form a running time line along which all other packets are positioned at their target time points. On this time line, each 188-byte packet occupies one time slot, and the exact time stamp of each packet/slot can be interpolated using neighboring PCR packets.

Figure 3: Layered coding scheme of MPEG-2 Transport Stream

Data packets arrive and are read into the decoder buffer at a constant rate, and this rate can be calculated by dividing the number of bits between any two consecutive PCR packets by the time difference between their time stamps. In other words, if the number of packets between any two PCR packets remains constant, then the difference between their time stamps should also be constant. In an ideal state, packets are read into the decoder at the constant bit rate, and whenever a new PCR packet arrives, its time stamp should match exactly with the receiver clock, which confirms to the decoder that so far it has successfully reconstructed the same clock as the encoder. However, since PCR
packets may have experienced jitter in network transmission or storage device access before they arrive at the receiver, we cannot simply set the receiver's local clock to the time stamp carried by the next incoming PCR, no matter when it comes. To smooth out the jitter and maintain a stable clock with a limited buffer size at the receiver, the receiver will generally resort to a smoothing technique like the Phase-Locked Loop (PLL) [7] to generate a stable clock from the jittered PCR packets. A PLL is a feedback loop that uses an external signal (the incoming PCR packets in our case) to tune a local signal source (generated by a local oscillator in our case) and produce a relatively more stable result signal (the receiver's reconstructed local clock in our case). So long as the timing relation between PCR packets is correct, the jitter can be smoothed out with the PLL.
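The slot interpolation and rate calculation described above can be sketched as follows. Packets are modeled as abstract slot indices, and the 27 MHz PCR tick rate follows the MPEG-2 system clock; the function names are ours:

```python
# Sketch: interpolate the implied timestamp of a 188-byte slot between two
# PCR packets, and derive the constant read-in bit rate from a PCR pair.

TS_PACKET_BITS = 188 * 8     # one transport packet = 1504 bits
PCR_HZ = 27_000_000          # MPEG-2 system clock frequency

def slot_timestamp(pcr1_ticks, slot1, pcr2_ticks, slot2, slot):
    """Linearly interpolate the time (in seconds) of packet slot `slot`
    lying between two consecutive PCR packets at slots slot1 and slot2."""
    frac = (slot - slot1) / (slot2 - slot1)
    return (pcr1_ticks + frac * (pcr2_ticks - pcr1_ticks)) / PCR_HZ

def stream_bitrate(pcr1_ticks, slot1, pcr2_ticks, slot2):
    """Bits between two PCR packets divided by their time difference."""
    bits = (slot2 - slot1) * TS_PACKET_BITS
    return bits / ((pcr2_ticks - pcr1_ticks) / PCR_HZ)

# 100 packets between PCRs 225600 ticks (about 8.36 ms) apart is an 18 Mbps stream
assert abs(stream_bitrate(0, 0, 225600, 100) - 18_000_000) < 1
```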
2.2 HDTV Stream Editing/Streaming Test Bed
The following discussion is based on the video editing/streaming test bed shown in Figure 3. A live high-definition digital TV stream from the satellite or the HD storage device is fed into the server PC, which encodes it into an MPEG2 Transport stream and multicasts this stream over the high-speed local area network. Players on the client PCs join this multicast group to receive the HD stream, and then feed this stream into the decoding board. The decoded analogue signal is then sent to the wide-screen TV for display. Our filter receives this stream in the same way as a normal player, and performs various kinds of video editing operations on it in real time, such as low-pass filtering, frame/color dropping and visual information embedding [11]. There can be multiple editing operations applied to the same stream in a chain, and the resulting streams at all stages are available to clients through other multicast groups.
2.3 How Video Editing Affects Clock Reconstruction
Since the timing and spacing of PCR packets are very important for clock reconstruction, it is obvious that video editing operations will cause malfunctions, since they change both.
First, all intermediate operations a video stream goes through before it reaches the decoder contribute to the delay and jitter of the PCR sub-stream. Different filtering operations, such as low-pass filtering and Picture-in-Picture, or even the same operation, take varying processing time to do the necessary calculation for different frames or different parts of the same frame. To compensate, traditional solutions would either try to adjust the resulting stream at each intermediate point or push all the trouble to the final client. The former solution suffers from the fact that processing times for different operations and frames tend to be quite different and varying, which makes it very hard to find a local optimal answer. The latter solution implies that the client needs a very large buffer and a long waiting time because of the unpredictable delay and jitter of the incoming stream. We will see later how our solutions solve this problem by utilizing the inherent PCR time stamps of the streams.
The second problem, the changed spacing between PCR packets, is even more intractable. As we said above, each access unit (video frame or audio packet) should be positioned within the time line formed by the PCR sub-stream. If a video frame arrives at the receiver at its destined time point, the decoder will be able to correctly schedule where and how long to buffer it before decoding it. However, after the filtering operations a video frame normally becomes smaller or larger. It takes fewer or more packets to carry, and so the following frames are dragged earlier or pushed later along the time line. In such circumstances, if we keep both the time stamps and the spacing of the PCR packets unchanged, then the receiver's clock can still be correctly reconstructed, but the arrival time of each frame will be skewed along the time line. For example, if the stream is low-pass filtered, then every frame becomes shorter, and so the following frames are dragged forward to fill the vacancy left behind. If the decoder still reads in data at the original speed, more and more future frames begin to arrive earlier and earlier. Since they are all buffered until their stamped decoding time, the buffer will overflow in the long run, no matter how large it is. The fundamental problem is that after the filtering, the actual bit rate becomes lower or higher, but the data is still read in by the decoder at the original rate, since the timing and spacing of the PCR packets are not changed. So if the new rate is lower, more and more future frames are read in by the decoder, causing the receiving buffer for the network connection to be emptied while the decoder's decoding buffer overflows; on the other hand, if the new rate is higher, then at some point in the future, data will remain in the receiving buffer and not be read in by the decoder even at its decoding time.
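A toy calculation makes the drift concrete. The frame sizes below are invented for illustration; the point is only that a constant per-frame surplus accumulates without bound:

```python
# Toy illustration of the drift described above: the decoder keeps reading
# at the original rate while each filtered frame consumes fewer bits, so
# the decode buffer occupancy grows linearly and eventually overflows.

def buffer_after(frames, original_bits_per_frame, filtered_bits_per_frame):
    """Bits left in the decode buffer after `frames` frame intervals, when
    the decoder reads original_bits_per_frame per interval but each decoded
    frame only consumes filtered_bits_per_frame."""
    occupancy = 0
    for _ in range(frames):
        occupancy += original_bits_per_frame   # read in at the original rate
        occupancy -= filtered_bits_per_frame   # only the filtered bits are consumed
    return occupancy

# 300 frames (10 s at 30 fps) of a 600 kbit frame filtered down to 450 kbit
# leave a 45 Mbit backlog in the decode buffer:
assert buffer_after(300, 600_000, 450_000) == 45_000_000
```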
3 OUR SOLUTIONS
To solve the problems described above, an immediate thought would be to do the same kind of clock reconstruction at the filter as the decoder does, and then re-generate the PCR packets to reflect the changes at the filter output. However, we know that smoothing mechanisms like the PLL are implemented in hardware circuits containing a voltage-controlled oscillator that generates high-frequency signals to be tuned with the incoming PCR time stamps. This is not easy, if not impossible, to do in software on computers without real-time support in hardware. Therefore, a pure software mechanism that does not require hardware real-time support would enable us to distribute the video editing service across the network to any point on the streaming path. Another goal is to achieve a cheap and efficient solution that can be easily implemented and carried out by any computer with modest CPU and memory resources available.

The key idea behind our solution comes from the observation that the DTS and PTS are only associated with the beginning bit of each frame. Consequently, so long as we manage to fix that point to the correct position on the time line, the decoder should work fine even if the remaining bits of that frame following the starting point are stretched shorter or longer.
3.1 Simple Solution: Padding
Following the discussion above, we have designed a simple solution that works for bit-rate-reducing video editing operations. We do not change the time stamp or the position of any PCR packet along the time line within the stream, and we also preserve the position of the frame header, and so that of the beginning bit, of every frame. What is changed is the size of each frame in number of bits, and we pack the filtered bits of a frame closely following the picture header. Since each frame takes fewer 188-byte packets to carry, yet the frame headers are still positioned at their original time points, there will be some "white space" left between the last bit of one frame and the first bit of the header of the next frame. The capacity of this space is exactly the reduction in the number of bits used to encode this frame as a result of the video editing operations, and we can simply pad this space with empty packets (NULL packets).
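A minimal sketch of the padding step, using a simplified packet model of (PID, payload) tuples rather than real transport packets. 0x1FFF is the PID MPEG-2 reserves for NULL packets; the function name is ours:

```python
# Sketch of the padding solution: keep every frame header at its original
# slot by emitting the filtered frame's packets first, then filling the
# freed slots with NULL packets so the next frame header stays in place.

NULL_PID = 0x1FFF   # PID reserved for NULL packets in MPEG-2 Systems

def pad_frame(filtered_packets, original_slot_count):
    """Return the filtered packets followed by enough NULL packets that
    the frame still occupies original_slot_count slots on the time line."""
    if len(filtered_packets) > original_slot_count:
        raise ValueError("padding only works for bit-rate-reducing filters")
    padding = original_slot_count - len(filtered_packets)
    return filtered_packets + [(NULL_PID, b"")] * padding

# a 16-slot frame whose filtered version needs only 10 packets gets 6 NULLs
out = pad_frame([("video", b"x")] * 10, 16)
assert len(out) == 16 and out.count((NULL_PID, b"")) == 6
```

The ValueError branch reflects the first drawback discussed below: a frame that grows past its original slot count simply cannot be placed this way.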
This solution is very simple to understand and implement, and it preserves the timing synchronization, since we only need to pack the filtered bits of each frame continuously after the picture header and then insert NULL packets until the header of the next frame. No new time stamps need to be generated in real time, and the bit rate remains stable at the original rate. However, it inevitably has some drawbacks. First, it can only handle bit-rate-reduction operations. We only try to fix the header of each frame to its original position on the time line, which means a changed frame must not occupy more bits than the distance between the current frame header and the next. This property does not always hold, since some filtering operations like information embedding and watermarking may increase the frame size in bits. Secondly, the saved bits are padded with NULL packets to maintain the original constant bit rate and the starting point of each frame, and this ironically runs counter to our initial goal of bit-rate reduction for some operations like low-pass filtering and color/frame dropping. The resulting stream contains the same number of packets as the original one; the only difference is that the number of bits representing each frame has been shrunk, yet this saving is spent immediately by padding NULL packets at the end of each frame.
Here we want to mention that there does exist another approach to bypass the second problem. Up to now, we have been using a filter model that is transparent to the client player, which confines us strictly to the MPEG2 standard syntax. However, if some of the filtering intelligence is exported to the end hosts, then some savings can be expected. For example, instead of inserting a run of NULL packets, we may compress them by inserting only a special packet saying how many NULL packets should follow. At the end host, a stub proxy could watch the incoming stream and, on seeing this packet, replace it with the supposed amount of padding packets before sending the stream to the client player. Note that this padding is important to maintain correct timing, especially if the client is using a standard hardware decoding board. This way, the bandwidth is indeed saved, but at the price of relying on a non-standard protocol outside MPEG2. Of course, this in turn introduces the problems associated with non-standardized solutions, such as difficulty in software maintenance and upgrading. Therefore, we only consider this a secondary choice, and not a major solution.
Figure 4: Example: 2/3 shrinking
3.2 Enhanced Solution: Time-Invariant Bitrate Scaling
To ultimately solve the synchronization problem, a more general algorithm has been designed. The key insight behind it is that we can change the bit rate to another constant value, while preserving the PCR time stamps, by changing the number of packets between any PCR pair to another constant value. This way, we can scale the PCR packets' distance and achieve a fixed new bit rate, as if the time line were scaled looser or tighter to carry more or fewer packets, yet we do not need to re-generate new PCR time stamps, which would rely on hardware real-time support. All non-video packets can simply be mapped to the new position on the scaled output time line that corresponds to the same time point as on the original input time line. In case no exact mapping is available, because packets are aligned at units of 188 bytes, we can simply use the nearest time point on the new time line without introducing any serious problem. For the video stream, the same kind of picture header fixing and frame data packing is conducted as in the first solution, but in a scaled way.
An example of shrinking the stream to 2/3 of its bandwidth is given in Figure 4. All non-video packets and video packets that carry picture headers are mapped to their corresponding positions on the new time line, and so their distance is also shrunk to 2/3 of the original. After the video editing operations, the resulting video packets are packed closely and as early as possible within the new stream following the frame header. Intuitively, the filtered video data is squeezed into the remaining space between all non-video packets and picture header packets. For example, suppose that in the input stream, packet 6 is a frame header, packet 9 is an audio packet, and packets 7, 8, and 10 through 24 are video data from the same frame as packet 6. After 2/3 shrinking, packet 6 is positioned in slot 4, and packet 9 goes to slot 6. The other video data packets are processed by the video editing filter, and the resulting bits are packed again into packets of 188 bytes each. All empty slots, such as slot 5 and slots 7 through 16, are then used to carry the resulting bits. If the filter shrinks the video frame to occupy less than 2/3 of its original number of bits, then the new slots are enough to carry the resulting frame.
This algorithm is also very simple to implement. For each non-video packet, its distance (in number of packets) from the last PCR packet is multiplied by a scaling factor α, and the result is used to set the distance between this packet and the last PCR packet in the output stream. For video frames, the header containing the DTS and PTS is scaled and positioned in the same way, and the remaining bits are closely appended to the header in the result stream. Note that when α is set to 1, this reduces to the simple padding solution above.
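The slot remapping can be sketched as follows, in a simplified model where packets are identified only by their slot index; scale_slot is our name for the mapping:

```python
# Sketch of time-invariant bit-rate scaling: a packet keeps its timing
# relation to the preceding PCR packet, but its slot distance from that
# PCR is multiplied by the scaling factor and rounded to a whole
# 188-byte slot. With a factor of 1 this degenerates to the padding
# solution; filtered video data is packed into the slots left in between.

def scale_slot(slot, last_pcr_slot, new_last_pcr_slot, alpha):
    """Map a packet's slot onto the scaled output time line, scaling its
    distance from the preceding PCR packet by alpha and rounding to the
    nearest whole slot."""
    distance = slot - last_pcr_slot
    return new_last_pcr_slot + round(distance * alpha)

# reproduces the 2/3-shrinking example: with the last PCR at slot 0,
# frame header packet 6 lands in slot 4 and audio packet 9 in slot 6
assert scale_slot(6, 0, 0, 2 / 3) == 4
assert scale_slot(9, 0, 0, 2 / 3) == 6
```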
Figure 5: Result of time line scaling

Now the only problem is how to determine α for a specific streaming path. If we shrink the time line too much, and for some frames the bit-rate-reducing operation does not have a significant effect, then again we will not have enough space to squeeze in such a frame, which will push the beginning bit of the next frame behind schedule. On the other hand, if we shrink the time line too little, or expand it (α > 1) too much, then more space will be padded with NULL packets to preserve the important time points, leading to a waste of bandwidth. There exists an optimal scale factor α* that balances these two forces: it fulfills the conditions that (1) the filtered frame data can always be squeezed into the scaled stream, and (2) the number of NULL packets needed for padding is minimum.
However, this optimal scale factor is hard to estimate in advance, since different operations with various parameters have quite varying effects on distinct video clips in terms of bit-rate change. Therefore, in our current implementation, we simply use a slightly exaggerated scale factor based on the operation type and parameters. For example, for low-pass filtering with a threshold of 5, a scaling factor of 0.80 works for almost all streams. Even if we meet a frame that still occupies more than 0.9 of its original number of packets after the filtering, only the next few frames may be slightly affected. Since a smaller-than-average frame is expected to follow shortly, this local skew can be absorbed by the decoder easily and does not have any chain effect.

Our next step will be looking into how to "learn" this optimal scale factor by analyzing the history of bit-rate change of a stream and adjusting the factor α on the fly. It is not specified by the MPEG standard how a decoder, especially a hardware decoding board, should react if the incoming stream changes from one constant bit rate to another, and it is also an open question how quickly it would adapt to the new rate.
4 EXPERIMENTAL RESULTS
Figure 5 shows the effect of the time line scaling approach for a low-pass filter with threshold 5. Each point on the x axis represents an occurrence of a PCR packet, and the y axis shows in three colors how many video packets, NULL packets, or packets from other data streams lie between each pair of PCR packets. We can see that the distribution of the three areas is kept almost constant for the original stream, except for more NULL packets at the end of a frame. Without scaling, however, the number of video packets varies across different PCR intervals and a lot of extra space is padded with NULL packets, as shown in the upper right subfigure. On the other hand, if we do scaling with a scaling factor of 80%, then the padding occurs mostly only at the end of frames and the stream contains mostly useful data.

Table 1: Final Statistics

                                LP (10)   LP (5)    PIP
  Original BR (Mbps)               18.0     18.0   18.0
  Average resulting BR (Mbps)     15.45    13.71  19.04
  Average relative change          0.86     0.76   1.06
  Suggested scale factor α         0.90     0.80   1.10
One thing we need to point out is that the skew of video access units along the time line still exists with this scaling approach. After the filtering operation, each frame shrinks to a size mostly less than 80% of its original size. If we mask out all other packets, we can see that in the video stream, frames are packed closely one after another. If one frame takes more space than its share, then the next frame may be pushed behind its time point, but this skew will be compensated later by another frame with a larger shrinking effect. As we said before, this kind of small jitter around the exact time points on the scaled time line is acceptable, and it is the change in the bit rate at which the decoder reads in the data that fundamentally enables our scaling algorithm to solve the problem.
Another experiment was done on how to determine the scaling factor α for a particular kind of video editing operation. Three kinds of operations were tested: low-pass filtering with thresholds of 10 and 5, and Picture-in-Picture. The original stream is an HD stream, "stars1.mpg", with a bit rate of 18 Mbps in MPEG2 Transport Layer format. The embedded frame used for Picture-in-Picture is a shrunk version of another HD stream, "football1.mpg". Since the content of this stream is more condensed than the background stream (i.e., more DCT coefficients are used to describe each block), it is expected that the bit rate will increase after the Picture-in-Picture operation. The final statistics are shown in Table 1. The experimental results show that with the suggested scaling factor based on real-world statistics, our time-invariant scaling algorithm can successfully solve the synchronization problem.
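The suggested factors in Table 1 look like the average relative change rounded up to the next 0.05 step; a sketch under that reading (the rounding rule is our interpretation, not stated in the text):

```python
# Sketch of deriving a "slightly exaggerated" scaling factor from measured
# bit rates, assuming the rule is: round the average relative bit-rate
# change up to the next multiple of 0.05.

import math

def suggest_alpha(original_bps, resulting_bps):
    """Round the observed relative bit-rate change up to the next 0.05."""
    return math.ceil(resulting_bps / original_bps * 20) / 20

# reproduces the suggested factors in Table 1
assert suggest_alpha(18.0e6, 15.45e6) == 0.90   # LP (10)
assert suggest_alpha(18.0e6, 13.71e6) == 0.80   # LP (5)
assert suggest_alpha(18.0e6, 19.04e6) == 1.10   # PIP
```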
5 CONCLUSION
In this paper, we focus on the scenario of streaming HD video in MPEG2 Transport Layer syntax with software streaming/processing and hardware decoding, which will be commonplace until the processing power of personal computers becomes strong enough to cope with high-bandwidth, high-definition video streams. We have studied an important problem: decoders may lose synchronization and fail to reconstruct the encoder's clock because of video editing operations on the streaming path. We have proposed two solutions to this problem, both based on the idea of reusing the original time stamp packets (PCR packets) and adjusting the number of packets between them to reflect the changes in bit rate caused by video editing operations. Experimental results have shown that our solutions are efficient and work well without any requirement for real-time support from the system.

As far as we know, our work is among the first efforts in promoting real-time software filtering of High Definition MPEG2 streams, and can be beneficial to many real-time applications that work with MPEG2 system streams, such as HDTV broadcast.
6 ACKNOWLEDGMENT
This work was supported by NASA under contract number NASA NAG 2-1406 and by the National Science Foundation under contract numbers NSF CCR-9988199, NSF CCR-0086094, NSF EIA 99-72884 EQ, and NSF EIA 98-70736. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or NASA.
7 REFERENCES
[1] ProxiNet http://www.proxinet.com.
[2] Emerging high-speed xDSL access services: architectures, issues, insights, and implications. IEEE Communications Magazine, Vol. 37, No. 11, Nov. 1999, pp. 106-114.
[3] Gigabit Ethernet Circuits and Systems. Tutorial Guide, ISCAS 2001, The IEEE International Symposium on Circuits and Systems, 2001, pp. 9.4.1-9.4.16.
[4] ISO/IEC 13818. Generic coding of moving pictures and associated audio information. 1994.
[5] A. Fox, S. D. Gribble, Y. Chawathe, and E. Brewer. Adapting to network and client variation using active proxies: lessons and perspectives. IEEE Personal Communications, Vol. 5, No. 4, pp. 10-19, 1998.
[6] A. Dutta-Roy. An overview of cable modem technology and market perspectives. IEEE Communications Magazine, Vol. 39, No. 6, June 2001, pp. 81-88.
[7] C. E. Holborow. Simulation of Phase-Locked Loop for processing jittered PCRs. ISO/IEC JTC1/SC29/WG11, MPEG94/071, 1994.
[8] J. R. Smith, R. Mohan, and C.-S. Li. Scalable multimedia delivery for pervasive computing. ACM Multimedia 1999, 1999.
[9] J. Spragins. Fast Ethernet: Dawn of a New Network [New Books and Multimedia]. IEEE Network, Vol. 10, No. 2, March-April 1996, p. 4.
[10] W.-Y. Ma, B. Shen, and J. Brassil. Content Services Network: The Architecture and Protocols. Proceedings of the Sixth International Workshop on Web Caching and Content Distribution, 2001.
[11] B. Yu and K. Nahrstedt. A Compressed-Domain Visual Information Embedding Algorithm for MPEG2 HDTV Streams. ICME 2002, 2002.