
IEC TR 62251:2003 (PDF)


DOCUMENT INFORMATION

Basic information

Title: Quality assessment – Audio-video communication systems
Field: Multimedia systems and equipment
Document type: Technical Report
Year of publication: 2003
City: Geneva
Number of pages: 46
File size: 0,96 MB


Structure

  • 4.1 Input and output channels
  • 4.2 Points of input and output terminals
  • 5.1 Introduction
  • 5.2 End-to-end tone reproduction
  • 5.3 End-to-end colour reproduction
  • 5.4 End-to-end colour differences
  • 5.5 End-to-end peak-signal to noise ratio (PSNR)
  • 5.6 End-to-end objective assessment of video quality
  • 6.1 Perceived audio quality with full-reference signals
  • 6.2 Sampling rate and quantization resolution
  • 6.3 Delay
  • 7.1 Synchronization of audio and video (lip sync)
  • 7.2 Scalability
  • 7.3 Overall quality

Contents

TECHNICAL REPORT IEC TR 62251, First edition, 2003-05. Multimedia systems and equipment – Quality assessment – Audio-video communication systems. Reference number IEC/TR 62251:2003(E).

Input and output channels

Audio and video signals in audio-video streams are captured at both the input and output channels of the communication system, as illustrated in Figure 1.

LICENSED TO MECON Limited - RANCHI/BANGALORE FOR INTERNAL USE AT THIS LOCATION ONLY, SUPPLIED BY BOOK SUPPLY BUREAU.

Figure 1 – Model of audio-video communication systems

Points of input and output terminals

For effective end-to-end quality assessment of audio-video communication systems, it is crucial to acquire raw data from points that are as close to the ultimate endpoints as possible.

The measurement and characterization methods for equipment using input transducers, such as video cameras and microphones, are standardized in IEC 61146-1, IEC 61146-2, IEC 61966-9 and IEC 60268-4. Similarly, the standards for equipment with output transducers, such as video signal displays and loudspeakers, are outlined in IEC 61966-3 and IEC 61966-4.

IEC 61966-5 and IEC 60268-5; such equipment can fall outside the range covered by the end-to-end assessment.

Figure 2 shows a schematic diagram for quality assessment under double-ended and full reference conditions.

1 Original audio or video reference.

2 Pre-conditioner to reduce the dynamic range and frequency range for audio, and to adjust the frame size and frame rate for video, so that the quality assessment aligns with the conditions necessary for effective communication.

3 Encoder for network streaming with a specified bit-rate in order to fit to the bandwidth of end-to-end network connection.

4 Decoder and rendering for the received data to make them audible and visible.

4’ Rendering for the preconditioned data to make them audible and visible, optional.

5 Data acquisition and calculation for quality assessment to provide information specified in this report.

Figure 2 – Schematic diagram for quality assessment
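The double-ended, full-reference chain of Figure 2 can be sketched as a composition of stages; the function names below are illustrative placeholders, not terms from the report:

```python
def assess_full_reference(reference, precondition, encode_stream, decode_render, metric):
    """Sketch of the Figure 2 chain: the preconditioned reference (item 2)
    is compared by item 5 against the signal that has passed through the
    encode/decode chain (items 3 and 4)."""
    conditioned = precondition(reference)                  # item 2: reduce range / frame size
    received = decode_render(encode_stream(conditioned))   # items 3 and 4
    return metric(conditioned, received)                   # item 5: quality calculation
```

Any of the metrics defined later in this report (colour difference, PSNR, PEAQ) can be passed in as `metric`.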


Introduction

This Technical Report provides a comprehensive end-to-end assessment of video quality by examining two key aspects: static characteristics, including tone and color reproduction, as detailed in sections 5.2 and 5.3, and dynamic characteristics related to the streaming of video frames over networks, discussed in sections 5.4, 5.5, and 5.6.

Utilizing widely accessible video sources, such as the test sequences from the Canadian Research Centre (CRC), is advisable for the original video reference depicted as item 1 in Figure 2. Due to their high bit-rate and large frame size, it is essential to downscale the reference source's frame size and bit-rate for its application as item 2 of Figure 2, if necessary, for actual encoding as streaming video to a network with limited bandwidth.

For the dynamic characteristics, currently available reference video sequences are listed in Table A.1. All reference video sources in Table A.1 have been adopted in this Technical Report. The CRC granted permission for their use; the Video Quality Expert Group (VQEG) used them to conduct subjective video quality tests and obtained results in terms of Difference of Mean Opinion Score (DMOS) and also objective Video Quality Rating (VQR), as reported in ITU-R 10-11Q/56-E.

The format of each of the reference video sources is composed of 10 frames (leader) + video frames for 8 s + 10 frames (trailer). There are two video formats, 525/60 Hz and 625/50 Hz, but only the 525/60 Hz format shown in Table A.1 is adopted in this Technical Report.

Each line is in pixel-multiplexed 4:2:2 component video format as Cb Y Cr Y … and so on, encoded in line with ITU-R BT.601-5, with 720 bytes/line for Y, 360 bytes/line for Cb and 360 bytes/line for Cr. The lines are concatenated into frames, and frames are concatenated to form the sequence files.
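The Cb Y Cr Y multiplexing above can be unpacked with simple byte striding; a minimal sketch, assuming 8-bit samples in a single 1 440-byte line:

```python
def unpack_line_422(line: bytes):
    """Split one 1 440-byte ITU-R BT.601-5 line, pixel-multiplexed as
    Cb Y Cr Y ..., into its Y, Cb and Cr byte sequences."""
    assert len(line) == 1440, "720 Y + 360 Cb + 360 Cr bytes expected"
    y = line[1::2]    # every second byte is a luma sample: 720 values
    cb = line[0::4]   # one Cb sample per pixel pair: 360 values
    cr = line[2::4]   # one Cr sample per pixel pair: 360 values
    return y, cb, cr
```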

The format contains 720 pixels (1 440 bytes) per horizontal line and has 486 active lines per frame. The frame size is therefore 1 440 x 486 = 699 840 bytes/frame, and each sequence is 240 frames for 8 s + 20 frames, i.e. 260 frames. Thus, the file size is 699 840 bytes/frame x 260 frames = 181 958 400 bytes, and 30 frame/s results in a bit-rate of 699 840 bytes/frame x 30 frame/s x 8 bit/byte = 167 961 600 bit/s.

The original test sequences, with a bit-rate of 167 961 600 bit/s, are too demanding for standard personal computers and internet streaming. To accommodate typical video formats like AVI, the frame size has been reduced to 320 x 240 pixels, and the format has been changed to RGB 24-bit/pixel, adhering to IEC 61966-2-1.
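The size and bit-rate arithmetic above can be checked with a few constants (all values are taken from the text; nothing is assumed beyond 8 bits per byte):

```python
# Size and bit-rate arithmetic for one 525/60 reference sequence.
BYTES_PER_LINE = 720 * 2                     # 720 Y + 360 Cb + 360 Cr = 1 440 bytes
ACTIVE_LINES = 486
FRAME_BYTES = BYTES_PER_LINE * ACTIVE_LINES  # 699 840 bytes/frame
FRAMES = 10 + 8 * 30 + 10                    # leader + 8 s at 30 frame/s + trailer = 260
FILE_BYTES = FRAME_BYTES * FRAMES            # 181 958 400 bytes
BITRATE = FRAME_BYTES * 30 * 8               # 167 961 600 bit/s
```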

NOTE 1 Pixel-by-pixel error assessment requires a very high degree of normalisation to be used with confidence.

The normalisation requires both spatial and temporal alignment, as well as corrections for gain and offset. For this purpose, Clause A2 of ITU-R 6Q/39-E should be referred to.

NOTE 2 Since the values of objective quality metrics largely depend on video contents, varieties of commonly available video sources should be used as far as possible.

NOTE 3 Video quality metrics obtained by objective assessment in Clause 5 should be converted to a VQR by optimum correlation with DMOS, which is under consideration within ITU-R WP 6Q.


End-to-end tone reproduction

End-to-end non-linearity in terms of tone reproduction.

For item 1 of Figure 2, refer to the grey-steps chart defined in IEC 61146-1, illustrated in Figure 3. Additionally, prepare a still neutral image as a file for item 2 of Figure 2, which will be encoded multiple times to create a streaming video for network transmission.

Figure 3 – The image of the grey steps defined in IEC 61146-1

The received streaming video should be decoded and rendered by a streaming-video viewer. The image data to be displayed should be captured at an output terminal.

The image data should be compared in terms of three component data, R, G, and B, averaged in each of the corresponding areas.

The results of the display data compared to the input image data should be presented in both a table and a plot, as illustrated in Table 1 and Figure 4. This should accompany details about the audio-video communication system being evaluated and the specifications of the input-output points.
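The per-area averaging of the three components can be sketched as follows; the box coordinates are an assumption, since the report only speaks of "corresponding areas" of the grey-steps chart:

```python
import numpy as np

def mean_rgb_per_area(frame, areas):
    """Average the R, G and B components over each grey-step area.
    frame: H x W x 3 uint8 array captured at the output terminal;
    areas: list of (top, left, bottom, right) boxes."""
    return np.array([frame[t:b, l:r].reshape(-1, 3).mean(axis=0)
                     for (t, l, b, r) in areas])
```

The resulting rows (one mean R, G, B triple per step) are what Table 1 and Figure 4 tabulate against the input levels.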


Table 1 – An example of tone reproduction

(Vertical axis: output level, 8-bit)

Figure 4 – An example plot of tone reproduction


End-to-end colour reproduction

End-to-end colour shifts in the CIELAB colour space for a still colour image.

The colour reproduction chart outlined in IEC 61146-1, depicted in Figure 5, serves as the reference for item 1 in Figure 2. Additionally, a still colour image must be created as a file for item 2 of Figure 2 and encoded multiple times to facilitate streaming video transmission over a network.

Figure 5 – The image of the colour reproduction chart defined in IEC 61146-1

The received streaming video should be decoded and rendered by a streaming-video viewer. The colour image data to be displayed should be captured at an output terminal.

The image data should be acquired in terms of three component data, R, G and B, averaged in each of the corresponding areas.

Input colours and output colours in R, G and B data should be regarded as being in the sRGB colour space defined in IEC 61966-2-1. They should be converted to the CIE 1976 L*a*b* uniform colour space.

Colour differences ∆E*ab between the reference data and the received data should be calculated and reported as shown in Table 2.
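A sketch of the required sRGB-to-CIELAB conversion and ∆E*ab computation, assuming the IEC 61966-2-1 transfer function and the D65 white point of the sRGB specification (NumPy used for brevity):

```python
import numpy as np

def srgb_to_lab(rgb8):
    """Convert 8-bit sRGB values (IEC 61966-2-1) to CIE 1976 L*a*b*,
    assuming the D65 white point of the sRGB specification."""
    c = np.asarray(rgb8, dtype=float) / 255.0
    # sRGB electro-optical transfer function
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    m = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = lin @ m.T / np.array([0.9505, 1.0, 1.0890])  # normalise to D65 white
    f = np.where(xyz > (6 / 29) ** 3, np.cbrt(xyz), xyz / (3 * (6 / 29) ** 2) + 4 / 29)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def delta_e_ab(lab1, lab2):
    """CIELAB colour difference between corresponding pixels."""
    return np.linalg.norm(np.asarray(lab1) - np.asarray(lab2), axis=-1)
```

Applying `srgb_to_lab` to the averaged input and output R, G, B triples and then `delta_e_ab` to the pairs yields the values reported in Table 2.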


Table 2 – An example of colour reproduction

Specification Input (8-bit x 3) Output (8-bit x 3)

End-to-end colour differences

The average of colour differences in the psychophysically uniform colour space defined in CIE 15.2 between the reference video frame and the corresponding deteriorated video frame.

Reference videos listed in Table A.1 serve as item 1 in Figure 2. For item 2 of Figure 2, videos in uncompressed AVI format with reduced frame sizes must be prepared. It is essential to embed frame numbers to facilitate the identification of received frames that correspond to the transmitted frames.

Encoded and transmitted streaming videos shall be continuously captured. Pixel-by-pixel calculations should be conducted.

The average colour difference ∆E*ab,k in the psychophysically uniform colour space between the reference and the deteriorated frames k is defined as in equation (1),

where

L*o,k, a*o,k and b*o,k are the triplets in the CIELAB colour space corresponding to each pixel of the reference video frame k;

L*d,k, a*d,k and b*d,k are the triplets in the CIELAB colour space corresponding to each pixel of the deteriorated video frame k;

∆E*ab is the CIELAB colour difference between the pixels.

The triplets in the CIELAB colour space should be deduced from the pixel values R, G and B of the reference and the deteriorated video frames in the default RGB colour space (sRGB) defined in IEC 61966-2-1. Each pixel is positioned at row m and column n in a video frame.

The colour difference between each of the corresponding frames should be plotted versus frame numbers as shown in Figure 6 together with identifications of reference video sources.

The conditions of measurement, such as frame size in pixels, frame rate and streaming bit-rate, should also be reported.


Figure 6a – Example for SRC13_REF 525
Figure 6b – Example for SRC14_REF 525
Figure 6c – Example for SRC15_REF 525
Figure 6d – Example for SRC16_REF 525
Figure 6e – Example for SRC17_REF 525
Figure 6f – Example for SRC18_REF 525
Figure 6g – Example for SRC19_REF 525
Figure 6h – Example for SRC20_REF 525
Figure 6i – Example for SRC21_REF 525
Figure 6j – Example for SRC22_REF 525

(Each panel plots the colour difference versus frame number for the reference source after encoding/streaming.)

Condition of assessment: video frame size 320 pixels x 240 pixels; frame rate 30 frames/s; streaming bit-rate 250 kb/s; network bandwidth more than 250 kb/s; reproduction Microsoft Media Player® version 7.1.

Figure 6 – Colour differences between reference and streamed video frames at 250 kb/s and 30 frames/s

To summarize the assessment, the collected data must be averaged across frames, as indicated in equation (2), to yield a single metric for objective evaluation known as the grand average.

It should be reported as in Table 3.
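Equations (1) and (2) reduce to a per-frame mean and a per-sequence mean; a minimal sketch, assuming a precomputed array of per-pixel colour differences:

```python
import numpy as np

def colour_difference_summary(delta_e):
    """delta_e: per-pixel CIELAB colour differences, shape (frames, rows, cols).
    Returns the per-frame means (equation (1)) and their grand average
    over all frames (equation (2))."""
    per_frame = delta_e.mean(axis=(1, 2))
    return per_frame, float(per_frame.mean())
```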


Table 3 – Grand averages of colour differences

Grand average of colour difference

End-to-end peak-signal to noise ratio (PSNR)

Peak-signal to noise power ratios, PSNRs, in three-dimensional coordinate systems.

Reference videos listed in Table A.1 serve as item 1 in Figure 2. For item 2 of Figure 2, videos in uncompressed AVI format with reduced frame sizes must be prepared. It is essential to embed frame numbers to facilitate the identification of received frames that correspond to the transmitted frames.

Encoded and transmitted streaming videos shall be continuously captured. Pixel-by-pixel calculation should be conducted.

The peak-signal to noise ratio (PSNR) between a full reference image and a reproduced image recommended in ITU-T J.144 should be used. The PSNR is defined by the following equation (3), where

d(p,m,n) and o(p,m,n) represent, respectively, degraded and original pixel vectors at frame p, row m and column n;

S max is the maximum possible value of the pixel vectors.


For colour images, each picture element is normally composed of three-dimensional values: red (R), green (G) and blue (B). Thus, the definition in equation (4) applies for the mean-square errors.

MSE_RGB (4), where S²max(RGB) = 3 x (2^N − 1)² for the values in N-bit encoding.

It is recommended to evaluate the PSNR in the more uniform colour space, CIE 1976 L*a*b*, as in equation (5).

MSE_Lab (5), where S²max(Lab) = (L*max)² + (a*max)² + (b*max)², the actual value of which depends on the colour gamut of the original RGB colour space.

It is recommended to use the default RGB colour space defined by IEC 61966-2-1, in which

NOTE It should be noted that the terms for summation in equation (5) are the square of the colour differences in the psychophysically uniform colour space described in 5.4.

Additionally, the luminance signal Y and the two colour difference signals Cb and Cr, denoted as Ycc, will also be calculated for comparison, as in equation (6).

MSE_Ycc (6), where S²max(Ycc) = 1,01659 in the YCbCr system defined in IEC 61966-2-1, Amendment 1.

The PSNRs in the three-dimensional spaces Lab, Ycc and RGB, together with the one-dimensional PSNRs in L* and Y, should be reported as shown in Figure 7.

The conditions of measurement, such as frame size in pixels, frame rate and streaming bit-rate, should also be reported.
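The vector-space PSNR of equations (3) and (4) can be sketched as follows; the squared-peak constant reflects my reading of the garbled text (3 x (2^N − 1)² for 8-bit RGB), which should be checked against the original report:

```python
import numpy as np

def psnr_vector(original, degraded, peak_sq):
    """PSNR = 10 log10(S²max / MSE), with the MSE taken over frames p,
    rows m and columns n of the squared length of the pixel-vector
    difference d(p,m,n) - o(p,m,n), as in equations (3) and (4)."""
    diff = np.asarray(original, dtype=float) - np.asarray(degraded, dtype=float)
    mse = np.mean(np.sum(diff ** 2, axis=-1))  # vector mean-square error
    return 10.0 * np.log10(peak_sq / mse)

# Assumed squared peak for 8-bit sRGB vectors:
PEAK_SQ_RGB = 3 * (2 ** 8 - 1) ** 2  # 3 x 255² = 195 075
```

The same function applies in the Lab and Ycc spaces by converting the frames first and passing the corresponding S²max.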

NOTE In order to demonstrate the software developed by Chiba University in collaboration with Mitsubishi Electric Corp. for the various quality metrics, results for the known hypothetical deteriorations used by the Video Quality Expert Group (VQEG), in terms of three-dimensional PSNRs and one-dimensional PSNRs together with the average colour difference, are attached in Annex A for information.


Figure 7a – SRC13_REF 525 Figure 7b – SRC14_REF 525

Figure 7c – SRC15_REF 525 Figure 7d – SRC16_REF 525

Figure 7e – SRC17_REF 525 Figure 7f – SRC18_REF 525


Figure 7g – SRC19_REF 525 Figure 7h – SRC20_REF 525

Figure 7i – SRC21_REF 525 Figure 7j – SRC22_REF 525

Condition of assessment: video frame size 320 pixels x 240 pixels; frame rate 30 frames/s; streaming bit-rate 250 kb/s; network bandwidth more than 250 kb/s; reproduction Microsoft Media Player® version 7.1.

Figure 7 – Examples of PSNR assessment

To summarize the assessment, the PSNR values must be averaged across frames to yield comprehensive metrics for objective evaluation, as indicated in equation (7). The results should be presented in Table 4.


Table 4 – Overall PSNRs averaged over the frames

PSNR in RGB PSNR in L* PSNR in Y

End-to-end objective assessment of video quality

Estimation of subjective Difference Mean Opinion Scores (DMOS) using a model emulating human visual and perceptual characteristics for digital videos.

For this purpose, the VQEG in its phase 1 test studied the proposed models from ten proponents (actually 9 out of 10 were considered effective), as reported in the document ITU-R 10-11Q/56-E. The models include:

  • a segmentation-based image evaluation method that predicts quality across predefined scenes;
  • a visual discrimination model that mimics human spatiotemporal visual responses;
  • a model that replicates human visual characteristics through spatiotemporal 3-dimensional filters;
  • Mean Square Error (MSE) weighted by various human visual filters, including pixel-based, block-based and sequence-based filters;
  • a perceptual distortion metric based on a spatiotemporal model of the human visual system;
  • a combination of a perceptual model with a feature extractor tailored to specific distortion types;
  • a model addressing digital video quality by incorporating multiple aspects of human visual sensitivity into straightforward image processing;
  • a perceptual video quality measure that parallels the approach used in assessing speech quality;
  • a model that leverages reduced-bandwidth features from spatial-temporal regions to estimate subjective quality ratings through a linear combination of parameters.

Performance of all the models was tested in terms of feature-extraction capability against the conventional peak-signal-to-noise ratio method.


The VQEG is conducting the phase 2 test for full-reference television among new proponents.

A feasible method of assessment (model) is still under consideration.

A new method submitted by the Republic of Korea utilizes the spatiotemporal wavelet transform, as outlined in ITU-R 6Q/42-E. This Technical Report analyzes the method within the sRGB domain, as detailed in Annex B.

Video quality rating as an estimation of difference mean opinion scores should be reported together with the model used and the conditions.

NOTE An example of the presentation is under consideration.

Perceived audio quality with full-reference signals

Objective Difference Grade (ODG) values, measured by the PEAQ (Perceptual Evaluation of Audio Quality) method recommended in ITU-R BS.1387-1.

Perceived audio quality is crucial in the design of digital audio-video communication systems. While formal listening tests have traditionally assessed audio quality, they are often time-consuming and costly. Therefore, an objective measurement method is needed to estimate audio quality effectively. Conventional methods like signal-to-noise ratio (SNR) and total harmonic distortion (THD) have proven unreliable in correlating with perceived audio quality, especially when applied to modern codecs that exhibit non-linear and non-stationary characteristics. After extensive validation, the ITU-R has endorsed PEAQ as a reliable objective measurement method.

The perceptual evaluation of audio quality is essential for assessing the performance of audio equipment, such as low bit-rate codecs. This evaluation method is outlined in ITU-R BS.1387-1 and is briefly detailed in Annex B.

The PEAQ objective measurement method produces two key output variables: the objective difference grade (ODG) and the distortion index (DI). The ODG aligns with the subjective difference grade (SDG) in the subjective assessment. Notably, the ODG is expressed with a resolution limited to one decimal place.

It is important to exercise caution when interpreting differences of a tenth of a grade between any two ODGs, as such variations may not be significant. The DI has the same meaning as the ODG qualitatively, but the two cannot be compared quantitatively. Generally, the ODG serves as the quality measure for values exceeding approximately –3,6, where it shows a strong correlation with subjective assessments. When the ODG value is less than –3,6, the DI should be used. Therefore, both the ODG and DI variables shall be measured.
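The reporting rule above can be sketched as a simple selection (both variables are always measured; only the quoted metric switches):

```python
def reported_quality(odg, di):
    """Select which PEAQ output to quote: the ODG correlates well with
    subjective grades above about -3.6; below that the DI should be used."""
    return ("ODG", odg) if odg >= -3.6 else ("DI", di)
```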

6.1.3 Method of assessment and algorithm of PEAQ

The PEAQ objective measurement method is based on two key inputs: the unprocessed reference audio signal and the audio signal being tested The reference signal corresponds to item 2 in Figure 2, while the test signal may represent the output from digital audio-video communication systems, as indicated by item 4 in Figure 2, which is stimulated by the reference signal.


This measurement method is relevant for various audio signal processing equipment, encompassing both digital and analogue types. This document specifically emphasizes its application to digital audio communication channels, with the "device under test" referring to items 2 and 3.

(Figure 8 shows the reference signal and the device under test, for example a digital audio-visual communication system, feeding the objective measurement method, which outputs an audio quality estimate.)

Figure 8 – Basic concept for making objective measurements

The PEAQ model, illustrated in Figure 9, is grounded in established psychoacoustic principles and involves comparing a processed signal with its time-aligned reference Initially, the peripheral ear is represented through a "perceptual model" or "ear model." Both the reference and processed signals are transformed into outputs of these ear models Subsequently, the algorithm assesses the audible distortion in the test signal by analyzing the outputs from the ear models This process yields several values known as model output variables (MOVs), which are valuable for a detailed signal analysis.

The ultimate objective is to establish a quality metric represented by a single number that reflects the audibility of distortions in the tested signal Achieving this requires additional processing of the MOVs to emulate the cognitive functions of the human auditory system Consequently, the PEAQ algorithm integrates an artificial neural network.

The PEAQ offers two versions: a "Basic" version that utilizes a low-complexity approach, and an "Advanced" version designed for greater accuracy, albeit with increased complexity. Both versions share a similar structure corresponding to the PEAQ model illustrated in Figure 9. The major difference between the basic and the advanced versions lies in the respective ear models and the set of MOVs used. Annex C provides more information about PEAQ, which helps readers to understand the measurement results.

(Figure 9 shows the perceptual model producing MOVs for detailed analysis, which are combined into the ODG (DI) audio quality estimate.)

Figure 9 – Representation of the PEAQ model


It is recommended to use the reference items available from the ITU as WAV files (Microsoft RIFF format) on a CD-ROM. All reference items have been sampled at 48 kHz in 16-bit PCM.

The reference and test signals provided by the ITU have already been time and level adapted to each other, so that no additional gain or delay compensation is required.

The measurement algorithm must be adjusted to a listening level of 92 dB SPL.

PEAQ measurement results should be reported in terms of the names of the reference and signal-under-test items, and the resulting DI and ODG values, in a table.

Table 5 is related to the basic version, and Table 6 contains the values for the advanced version.

Table 5 – Test items and resulting DI and ODG values for the basic version

Item DI ODG Item DI ODG Item DI ODG

Acodsna.wav 1,304 –0,676 Fcodtr2.wav –0,045 –1,927 lcodhrp.wav 1,041 –0,876

Bcodtri.wav 1,949 –0,304 fcodtr3.wav –0,715 –2,601 lcodpip.wav 1,973 –0,293

Ccodsax.wav 0,048 –1,829 gcodcla.wav 1,781 –0,386 mcodcla.wav –0,436 –2,331

Dcodryc.wav 1,648 –0,458 hcodryc.wav 2,291 –0,166 ncodsfe.wav 3,135 0,045

Ecodsmg.wav 1,731 –0,412 Hcodstr.wav 2,403 –0,128 scodclv.wav 1,689 –0,435

Table 6 – Test items and resulting DI and ODG values for the advanced version

Item DI ODG Item DI ODG Item DI ODG

Acodsna.wav 1,632 –0,467 Fcodtr2.wav 0,162 –1,711 Lcodhrp.wav 1,538 –0,523

Bcodtri.wav 2,000 –0,281 Fcodtr3.wav –0,783 –2,662 Lcodpip.wav 2,149 –0,219

Ccodsax.wav 0,567 –1,300 Gcodcla.wav 1,457 –0,573 Mcodcla.wav 0,430 –1,435

Dcodryc.wav 1,725 –0,415 Hcodryc.wav 2,410 –0,126 Ncodsfe.wav 3,163 0,050

Ecodsmg.wav 1,594 –0,489 Hcodstr.wav 2,232 –0,187 Scodclv.wav 1,972 –0,293

Sampling rate and quantization resolution

Sampling rate and bandwidth of the reference and the processed audio signals.

The sampling rate determines the bandwidth of audio signals, with 48 kHz being the standard for high-quality audio. The sampling rate and bandwidth of both the reference and the processed audio signals should be extracted.

The reference items are named by substituting the substring "cod" in the test item names with "ref." For instance, the reference item for "bcodtri.wav" becomes "breftri.wav."


Resolution of quantization relates to the dynamic range of audio signals or quantization noise.

For high-quality audio signals, the linear (or uniform) quantization method, which has a 16-bit quantization resolution, is used. The resolution and the quantization method should be identified.

Extracted and identified values should be reported.

Delay

Delay time in seconds from audio inputs to an encoder and received digital audio signals.

Pulsed audio signals are recommended as the input, as indicated at item 2 of Figure 2. Additionally, it is essential to measure the time interval in seconds between the input of item 3 and the output of item 4 in Figure 2.

NOTE In most audio communication systems over the digital networks, a buffering scheme is incorporated.

Therefore, buffering time is also taken into account in the measurement.

The measured delay time should be reported in seconds.
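One practical way to measure this interval, not prescribed by the report, is to locate the pulsed input in the received audio via cross-correlation; a sketch assuming sample-synchronous capture of both signals:

```python
import numpy as np

def estimate_delay_seconds(sent, received, sample_rate):
    """Estimate end-to-end audio delay by locating the peak of the
    cross-correlation between the sent pulse and the received signal;
    the lag in samples divided by the sample rate is the delay
    (buffering time is included, as the NOTE above says)."""
    corr = np.correlate(received, sent, mode="full")
    lag = int(np.argmax(corr)) - (len(sent) - 1)
    return lag / sample_rate
```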

Synchronization of audio and video (lip sync)

Temporal synchronization between audio and video channels.

True multimedia systems are defined by their capability to maintain temporal synchronization across various media channels, rather than simply being a collection of unrelated media Consequently, incorporating a measurement for temporal synchronization quality is essential in evaluating the performance of audio-video communication systems.

The ITU-T Recommendation P.931 outlines a framework for measuring temporal synchronization across media channels, based on the premise that media signals can be captured at key interfaces For the visual channel, this includes the camera output and display input, while for the audio channel, it encompasses the microphone output and loudspeaker input, as illustrated in Figure 1.

Media signals at those interfaces are digitized, if necessary, broken into fixed-size frames, and given timestamps. For the details of this procedure, refer to ITU-T Recommendation P.931.

For the audio and the video media streams being considered, digitized frames are given sequence numbers as follows:

  • A(m) and V(n) are the input audio and the video frames, respectively, with m and n indicating the sequence numbers for each stream. It is assumed that these audio and video elements are linked, corresponding to the same event.


• A′(p) and V′(q) are the output audio and the video frames, respectively.

  • TA(m) and TA′(n) are the timestamps for A(m) and A′(n), respectively. Timestamps for other frames are defined in the same way.

For each input frame (although not all input frames are to be used, as described in ITU-T P.931), the corresponding output frame must be identified. This matching is complex because the media stream data may be modified, distorted, dropped or reshaped. For video frames, the matching employs the PSNR metrics outlined in Clause 5.

The audio matching involves a two-stage method that uses audio envelopes for an initial coarse analysis and power spectral densities for detailed refinement. For further information, refer to ITU-T P.931.

Assuming the established matching relationships between A(m) and A′(p), as well as between V(n) and V′(q), the time skew between audio and video frames can be expressed by equation (8).
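Equation (8) is not reproduced in this extract; under the definitions above, the skew can be written as the following hedged reconstruction (my reading of a P.931-style skew, not the verbatim IEC formula):

```latex
\mathrm{skew} = \left[\,T_{A'}(p) - T_A(m)\,\right] - \left[\,T_{V'}(q) - T_V(n)\,\right]
```

i.e. the difference between the audio path delay and the video path delay for frames belonging to the same event.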

Choosing suitable input audio signals is crucial for achieving valid and meaningful assessment results If the video signal features static or nearly static scenes, matching the input and output frames may become challenging or even unfeasible This caution also applies to the evaluation of the audio channel.

Modern video compression techniques exhibit varying times for compression, transmission, and decompression, particularly when using variable bit-rate encoding It is essential to utilize suitable input signals tailored to the intended application for accurate assessment.

In systems with low video frame rates, it can be beneficial to allow a greater temporal skew between video and audio streams, as video delays can vary while audio typically remains consistent.

Selection of standard audio-video input streams suited for common usage is left for future study.

The measurement report should clearly illustrate any variations between individual measurements Additionally, classical summary statistics such as minimum, maximum, average, and standard deviation can be included for comprehensive analysis.

Scalability

Autonomous function to tune frame rate dynamically depending on available bandwidth between the sender and the receiver.

The method for measurement of scalability is left for future study.


Overall quality

Overall quality factor in terms of audio and video interaction.

Overall quality of audio-video communication systems, OQ_AV, should be defined as

OQ_AV = aQ_V + bQ_A + cQ_{V&A}

where Q_V is the objective quality metric from Clause 5, Q_A is the metric from Clause 6, and Q_{V&A} is the metric from the current clause. The coefficients a, b and c serve as weighting factors that vary based on the specific applications of the audio-video communication system.

The overall quality factor should be reported with sufficient information on the audio-video communication system under assessment.
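The weighted combination is straightforward; the default weights below are illustrative only, since the report leaves a, b and c application-dependent:

```python
def overall_quality(q_video, q_audio, q_interaction, a=0.5, b=0.3, c=0.2):
    """OQ_AV = a*Q_V + b*Q_A + c*Q_V&A.  The weights a, b, c depend on the
    application; the defaults here are illustrative, not from the report."""
    return a * q_video + b * q_audio + c * q_interaction
```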


PSNRs defined in three-dimensional spaces applied to hypothetical deterioration over the reference video sources

This annex illustrates the definitions of the peak signal-to-noise ratio (PSNR) in three-dimensional vector space for each pixel in a video frame: the PSNR in CIELAB as defined in equation (5), the PSNR in sYCC in equation (6), and the PSNR in sRGB in equation (4). The average colour difference of equation (1) is also included, for comparison with the one-dimensional PSNRs in L* and Y.

The values of these objective quality measures can readily be compared with other possible future measures and with the results of subjective assessment of video quality.
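Equations (4) to (6) themselves are defined in clause 5 and are not reproduced in this excerpt. As a rough illustration only, a three-component PSNR of the common form 10·log10(peak²/MSE) over sRGB pixel vectors might be sketched as follows, assuming 8-bit channels:

```python
# Hedged sketch: a PSNR over three-component (e.g. sRGB) pixel vectors,
# taking the mean squared error per colour component over all pixels.
# This is an assumed common form, not a verbatim copy of equation (4).
import math

def psnr_3d(ref, deg, peak=255.0):
    """ref, deg: equal-length lists of (R, G, B) tuples."""
    n = len(ref) * 3  # total number of colour components
    sse = sum((r - d) ** 2
              for p_ref, p_deg in zip(ref, deg)
              for r, d in zip(p_ref, p_deg))
    if sse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / (sse / n))

ref = [(255, 0, 0), (0, 255, 0)]
deg = [(250, 5, 0), (0, 250, 10)]
print(round(psnr_3d(ref, deg), 1))  # -> 33.5
```

A PSNR in CIELAB or sYCC would apply the same form after converting the pixel data into the respective colour space.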

A.2 Test sources and hypothetical deterioration

This annex applies 16 distinct hypothetical degradations to digital video files in ITU-R BT.601-5 format used by the Video Quality Expert Group (VQEG). The reference source videos, SRC13_REF 525.yuv to SRC22_REF 525.yuv, are listed in Table A.1; they are used by permission. Software developed at Chiba University, Japan, in partnership with Mitsubishi Electric Corp., produced the values of the objective measures for a reduced frame size of 320 × 240 pixels over 260 frames.

The index ranges are M1 = 1, M2 = 260 and N1 = 1, N2 = 20 in equation (B.4). Numerical results are shown in Tables A.2 to A.6.

Table A.1 – Reference video sources available for objective assessment

SRC13_REF 525   Balloon-pops        Film, saturated colour, movement
SRC14_REF 525   New York 2          Masking effect, movement
SRC15_REF 525   Mobile & Calendar   Colour, movement
SRC16_REF 525   Betes_pas_betes     Colour, synthetic, movement, scene cut
SRC17_REF 525   Le_point            Colour, transparency, movement in all the directions
SRC18_REF 525   Autumn leaves       Colour, landscape, zooming, water fall movement
SRC19_REF 525   Football            Colour, movement
SRC20_REF 525   Sailboat            Almost still
SRC21_REF 525   Susie               Skin colour
SRC22_REF 525   Tempete             Colour, movement


Table A.2 – PSNRs in various colour spaces and the colour difference for SRC13 and SRC14

(Table body not legible in this extraction. For each of SRC13 and SRC14, the columns give the PSNR in CIELAB (Lab), sYCC, sRGB, L* and Y, and the colour difference ∆E*ab, with one row per hypothetical degradation hrc1 to hrc16.)

NOTE 1 hrc16/src14 and so on correspond to hypothetically degraded video (hrc16) from the reference source video (src14), respectively.

NOTE 2 All videos are in the size of 320 pixels x 240 pixels, each of which has a 24-bit colour depth.

Table A.3 – PSNRs in various colour spaces and the colour difference for SRC15 and SRC16

(Table body not legible in this extraction; columns as in Table A.2, for SRC15 and SRC16.)

NOTE 1 hrc16/src14 and so on correspond to hypothetically degraded video (hrc16) from the reference source video (src14), respectively.

NOTE 2 All videos are in the size of 320 pixels x 240 pixels, each of which has a 24-bit colour depth.


Table A.4 – PSNRs in various colour spaces and the colour difference for SRC17 and SRC18

(Table body not legible in this extraction; columns as in Table A.2, for SRC17 and SRC18.)

NOTE 1 hrc16/src14 and so on correspond to hypothetically degraded video (hrc16) from the reference source video (src14), respectively.

NOTE 2 All videos are in the size of 320 pixels x 240 pixels, each of which has a 24-bit colour depth.

Table A.5 – PSNRs in various colour spaces and the colour difference for SRC19 and SRC20

(Table body not legible in this extraction; columns as in Table A.2, for SRC19 and SRC20.)

NOTE 1 hrc16/src14 and so on correspond to hypothetically degraded video (hrc16) from the reference source video (src14), respectively.

NOTE 2 All videos are in the size of 320 pixels x 240 pixels, each of which has a 24-bit colour depth.


Table A.6 – PSNRs in various colour spaces and the colour difference for SRC21 and SRC22

(Table body not legible in this extraction; columns as in Table A.2, for SRC21 and SRC22.)

NOTE 1 hrc16/src14 and so on correspond to hypothetically degraded video (hrc16) from the reference source video (src14), respectively.

NOTE 2 All videos are in the size of 320 pixels x 240 pixels, each of which has a 24-bit colour depth.


Annex B – End-to-end objective assessment of video quality in the spatial frequency domain

This annex describes a measure based on the root mean square errors between corresponding blocks in the wavelet-transformed domain of the reference video and of the deteriorated video, as proposed in ITU-R document 6Q/42-E.

A three-level wavelet transform is assumed; there are therefore 10 blocks, as shown in Figure B.1.

Figure B.1 – Assignment of the block numbers

Figure B.2 – Example of wavelet decomposition visualized

The reference videos listed in Table A.1 serve as item 1 in Figure 2. For item 2 of Figure 2, videos in uncompressed AVI format with reduced frame sizes shall be prepared. Frame numbers should be embedded at this stage, so that each received frame can be matched to the corresponding transmitted frame.

The encoded and transmitted streaming videos shall be captured continuously, and the calculations shall be conducted pixel by pixel.

The root mean square errors between each pair of corresponding blocks p = 1, …, 10 of the original video frame and the deteriorated video frame k should be acquired as follows.


In the wavelet domain, let c^o_R,ijpk, c^o_G,ijpk and c^o_B,ijpk denote the coefficients of the reference red, green and blue pixel data at position (i, j) of block p in frame k, and let c^d_R,ijpk, c^d_G,ijpk and c^d_B,ijpk denote the corresponding coefficients of the deteriorated data.

The deterioration d_pk at block p of frame k in the wavelet domain should be evaluated by the sum of squared errors, as in equations (B.1) and (B.2):

    d_pk = Σ_i Σ_j [ (∆c_R,ijpk)² + (∆c_G,ijpk)² + (∆c_B,ijpk)² ]    (B.1)

where

    ∆c_R,ijpk = c^o_R,ijpk − c^d_R,ijpk
    ∆c_G,ijpk = c^o_G,ijpk − c^d_G,ijpk    (B.2)
    ∆c_B,ijpk = c^o_B,ijpk − c^d_B,ijpk

The sums of squared errors between corresponding blocks of the wavelet-transformed frames should be plotted against the frame number, as illustrated in Figure B.3, together with the identification of the reference video sources. The measurement conditions, including the frame size in pixels, the frame rate and the streaming bit rate, shall also be reported.
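A minimal sketch of the block decomposition and of the per-block error of equation (B.1), assuming a Haar filter (the report does not fix the wavelet filter here) and a single colour channel for brevity — equation (B.1) sums over the R, G and B components, and the ordering of the ten blocks below is illustrative rather than the numbering of Figure B.1:

```python
# Hedged sketch: 3-level 2-D Haar decomposition into 10 blocks, plus a
# per-block sum of squared coefficient differences (one channel only).
import numpy as np

def haar2d_step(x):
    """One 2-D Haar analysis step -> (LL, LH, HL, HH) subbands."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # row averages
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def wavelet_blocks(frame, levels=3):
    """Decompose a single-channel frame into 1 + 3*levels = 10 blocks."""
    blocks = []
    cur = frame.astype(float)
    for _ in range(levels):
        cur, lh, hl, hh = haar2d_step(cur)
        blocks.extend([lh, hl, hh])
    blocks.append(cur)  # final LL band
    return blocks

def deterioration(ref_frame, deg_frame):
    """Per-block sum of squared coefficient differences (one channel)."""
    ref_b = wavelet_blocks(ref_frame)
    deg_b = wavelet_blocks(deg_frame)
    return [float(np.sum((r - d) ** 2)) for r, d in zip(ref_b, deg_b)]

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(240, 320))
print(len(wavelet_blocks(frame)))      # 10 blocks for a 3-level transform
print(deterioration(frame, frame))     # all-zero when frames are identical
```

In a full implementation, these per-block errors would be computed for every captured frame and plotted against the frame number, as in Figure B.3.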


(In each panel, the vertical axis is the square error of the wavelet coefficients, plotted against the frame number.)

Figure B.3a – Example for SRC13_REF 525
Figure B.3b – Example for SRC14_REF 525
Figure B.3c – Example for SRC15_REF 525
Figure B.3d – Example for SRC16_REF 525
Figure B.3e – Example for SRC17_REF 525
Figure B.3f – Example for SRC18_REF 525
Figure B.3g – Example for SRC19_REF 525
Figure B.3h – Example for SRC20_REF 525
Figure B.3i – Example for SRC21_REF 525
Figure B.3j – Example for SRC22_REF 525

- video frame size: 320 pixels x 240 pixels;

- network bandwidth: more than 250 kbps;

- reproduction: Microsoft Media Player® version 7.1.

Figure B.3 – Trends of difference of coefficients of the wavelet transform between reference and streamed video frames at 250 kbps and 30 fps

To summarize the assessment, the acquired squared errors shall be averaged over all the frames, as in equation (B.3), to obtain the metrics for the objective assessment:

    C_p = (1/K) Σ_{k=1}^{K} d_pk    (B.3)

where K is the number of frames. The results should be presented as shown in Table B.1.

The video quality rating (VQR) of each received video shall be evaluated as the weighted sum of these metrics, calculated as in equation (B.4), and reported in the rightmost column of Table B.1.

    VQR = w_0 + Σ_{p=1}^{10} w_p C_p    (B.4)

where w_0 is an offset and w_p, p = 1, …, 10, are the weights chosen so that the VQR is best correlated with the DMOS for a set of reference videos, as studied by the former ITU-R 10-11Q and ITU-R WP 6Q.

Table B.1 – Summary of the differences of coefficients of the wavelet transform

NOTE The values of the VQRs depend directly on the particular set of weights used. The provisional example in the rightmost column is based on weights developed at Chiba University in January 2002.
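The frame averaging of equation (B.3) and the weighted sum of equation (B.4) can be sketched as follows; the offset and weights shown are placeholders, not the fitted values developed at Chiba University:

```python
# Hedged sketch of equations (B.3) and (B.4): average the per-frame block
# errors d_pk into C_p, then combine with an offset w_0 and weights w_p.
# The numeric weights below are illustrative placeholders only.

def block_metrics(d):
    """C_p = mean over frames k of d_pk; d is a list of 10-element lists."""
    n_frames = len(d)
    return [sum(frame[p] for frame in d) / n_frames for p in range(10)]

def vqr(c, w0, w):
    """VQR = w0 + sum_p w_p * C_p, as in equation (B.4)."""
    return w0 + sum(wp * cp for wp, cp in zip(w, c))

d_pk = [[1.0] * 10, [3.0] * 10]        # two frames of dummy block errors
c = block_metrics(d_pk)                # -> [2.0] * 10
print(vqr(c, w0=0.5, w=[0.1] * 10))
```

In practice the weights would be fitted so that the VQR correlates best with subjective DMOS scores, as the text describes.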


Annex C – PEAQ objective measurement method outline

C.1 Basic concept of the PEAQ measurement algorithm
