Robust and scalable video authentication: issues and solutions



My thanks also go to Professor Tian Qi, who has always been so caring and helpful to me. Every inspiring discussion I had with him gave me valuable rewards.

I cannot list all the individuals who have helped me in one way or another; among them are my colleagues in the Pervasive Media Laboratory: Zhang Zhishou, Ye Shuiming and Zhou Zhichen. Together they have created a harmonious office atmosphere, which is extremely conducive to my research activities.

I also want to extend special thanks to Wang Han of Cambridge University. He has helped me polish the language used in every paper I wrote.

No words can express my gratitude to my wife, Chen Yunping, who has provided, and will continue to provide, invaluable and indispensable support for my pursuit of dreams. She is the unsung hero behind all my accomplishments. My lovely babies, He Jia and He Xu, also deserve my appreciation for the happiness and colours they have brought to my life.

And, finally, I would like to give my most sincere thanks to my parents, who passed away a few years ago. Their love will forever live in my heart, giving me strength on my path to pursue the dreams of my life.


TABLE OF CONTENTS

ACKNOWLEDGEMENT
TABLE OF CONTENTS
SUMMARY
LIST OF TABLES
LIST OF FIGURES

CHAPTER 1 INTRODUCTION
1.1 Robust Video Authentication
1.1.1 Objective
1.1.2 Source
1.1.3 Requirements
1.1.4 Models of video authentication
1.1.5 Object-based video authentication
1.1.6 Scalable video authentication
1.2 Theoretical Analysis
1.3 Structure of Thesis

CHAPTER 2 STATE-OF-THE-ART
2.1 Features in Image/Video Authentication
2.2 Image Authentication
2.2.1 Signature-based authentication
2.2.2 Watermarking-based authentication
2.2.3 Signature and watermarking based authentication
2.3 Video Authentication
2.3.1 Frame-based video authentication
2.3.2 Object-based video authentication

CHAPTER 3 A ROBUST OBJECT-BASED VIDEO AUTHENTICATION SYSTEM
3.1 Introduction
3.2 Introduction to MPEG4 Video Coding
3.3 Overview of the Proposed System
3.3.1 Targeted acceptable video processes
3.3.2 Brief system description
3.4 Feature Selection, Authentication Information Generation and Authenticity Verification
3.4.1 Feature selection
3.4.2 Authentication information generation and authenticity verification
3.5 Object-based Video Watermarking Algorithm (1)
3.5.1 Challenges and solutions for object-based watermark embedding and extraction
3.5.2 Watermark embedding
3.5.3 Watermark extraction
3.5.4 Evaluation of the watermarking algorithm
3.6 Object-based Video Watermarking Algorithm (2)
3.6.1 Two important techniques
3.6.4 Evaluation of the watermarking algorithm
3.7 Experimental Results
3.8 Summary and Future Works

CHAPTER 4 A SCALABLE VIDEO AUTHENTICATION SYSTEM
4.1 Introduction
4.2 Brief Introduction to Video Streaming and Transcoding
4.3 Overview of the Proposed Scheme
4.3.1 System general requirements and overview
4.3.2 Countermeasures to the transcoding
4.4 Authentication Information Generation and Authenticity Verification
4.4.1 Authentication information generation
4.4.2 Authenticity verification
4.5 Video Watermarking Scheme Robust to Transcoding
4.6 Experimental Results and System Performance Analysis
4.7 Summary and Future Works

CHAPTER 5 THEORETICAL ANALYSIS ON SELF-EMBEDDING VIDEO AUTHENTICATION SYSTEM
5.1 Introduction
5.2 Relation between Feature Difference and Video Distortion
5.2.1 Mutual information and rate distortion function
5.2.2 Feature difference vs video distortion
5.2.3 Results of evaluation
5.2.4 Summary
5.3 Watermarking Capacity
5.3.1 General watermarking capacity
5.3.2 Specific watermarking capacity

CHAPTER 6 CONCLUSIONS AND FUTURE WORKS
6.1 Conclusions
6.2 Future Works

REFERENCE
APPENDIX

SUMMARY

As a by-product of the rapid development of digital technologies, attacking valuable video without causing any noticeable degradation in video quality has become easy. Video authentication aims to ensure the trustworthiness of video by verifying the integrity and the source of the video data.

In this thesis, we investigate the issues in designing a video authentication system and then propose two solutions using technologies such as digital signatures, watermarking and error correction coding (ECC). Both are secure and robust content authentication systems, in which the received video is considered authentic as long as the video content remains unchanged. The two systems, however, have different design considerations because they target different applications.

The first proposed system is an object-based video authentication system for MPEG4-based video applications. This system can tolerate object-based processing such as object segmentation and RST (rotation, scaling and translation) in addition to normal MPEG compression; on the other hand, object-based attacks, such as object modification, object replacement and background replacement, can be detected. To protect the integrity of video, we first propose to generate authentication information using features of both the object and the background.

The second proposed system is a scalable video authentication system for video streaming. We mainly focus on video transcoding, including format conversion, frame dropping and re-quantization. While being robust to such transcoding, the system is capable of detecting malicious attacks such as content modification and the injection of commercials or offensive materials into the video stream. Compressed-domain processing not only reduces the computational complexity of the proposed method but also makes it compliant with state-of-the-art transcoders, as most of them operate in the DCT domain rather than the pixel domain.


In addition, we present our work on analyzing two important components, feature selection and watermarking capacity, in a “self-embedding” video authentication system, a category that covers the two proposed systems. In this analysis, we theoretically analyze the relationship between video distortion and the feature difference it introduces once a certain set of features is selected based on the requirements of completeness, sensitivity and robustness. Furthermore, we estimate the capacity of watermarking schemes designed for authentication applications from the viewpoint of reliable detection rather than information theory.


LIST OF TABLES

Table 2-1 Features for image/video authentication
Table 3-1 Acceptable video processing and their parameters for system evaluation
Table 3-2 2 bits Quantization
Table 3-3 4 bits Quantization
Table 3-4 System performance when the video object undergoes various video processes
Table 4-1 The relationships between DCT coefficients in the QCIF video and their corresponding DCT coefficients in the CIF video
Table 4-2 Bit-rate comparison before and after watermarking


LIST OF FIGURES

Fig 1.1 General model for video authentication using digital signature and watermarking
Fig 2.1 Public key Digital Signature Scheme (DSS)
Fig 2.2 Signature-based image authentication system
Fig 2.3 Block-diagram of integrity verification in [13]
Fig 2.4 A general model of authentication solutions using digital signature and watermarking technologies
Fig 2.5 Image authentication classification
Fig 2.6 A practical solution for stream signing
Fig 3.1 An example of video surveillance system
Fig 3.2 Block diagram of the proposed system
Fig 3.3 A video frame is a composition of one video object and background
Fig 3.4 Block-diagram of object-based video coding
Fig 3.5 Structure of VOP encoder
Fig 3.6 VOP Formation
Fig 3.7 Attack that only modifies object content while preserving its shape
Fig 3.8 36 normalized ART coefficients from the first VOP of video “Akiyo”
Fig 3.9 Videos for feature selection evaluation
Fig 3.10 The maximum Hamming distances between feature vectors of the original object and object having undergone normal processes
Fig 3.11 Mask difference between the original and processed objects. This difference is similar to the segmentation error
Fig 3.12 The distance between feature vectors of different video objects
Fig 3.13 Procedure for authentication information generation
Fig 3.14 Illustration of authentication information generation
Fig 3.15 Procedure for authentication verification
Fig 3.16 16 DFT coefficients grouping for watermarking
Fig 3.17 Comparison between DFT of the original and scaled images
Fig 3.18 Area classification in the DFT domain
Fig 3.19 Surveillance video “Dajun”
Fig 3.20 PSNR of watermarked video “Akiyo”
Fig 3.21 Bit Error Rate (BER) of the extracted watermark under video processes
Fig 3.22 Block diagram of watermark embedding
Fig 3.23 Block diagram of watermark extraction
Fig 3.24 Four adjacent sampling points around A'
Fig 3.25 Watermarking area in the LPM domain
Fig 3.26 PSNR of watermarked video
Fig 3.27 The robustness to acceptable video processes
Fig 3.28 Robustness to compression
Fig 3.29 Comparison between the original and signed video objects (“Akiyo”)
Fig 3.30 The relationship between the correctly authenticated video frames and the scaling factor (“Bream”)
Fig 3.31 Attacked “Akiyo” videos for evaluation
Fig 4.1 A simple solution for signing pre-coded video streams
Fig 4.2 A practical solution for stream signing
Fig 4.3 Stream authentication resilient to packet loss
Fig 4.4 Typical transcoding methods for scalable video streaming
Fig 4.5 The relationship between 4 blocks in CIF and one block in QCIF
Fig 4.6 Block-diagram of the scalable video authentication system
Fig 4.7 Diagram of the proposed solution that is robust to video transcoding
Fig 4.8 Illustration of authentication information generation
Fig 4.9 Procedure of authenticity verification
Fig 4.10 Watermark extraction
Fig 4.11 First frames of 5 testing videos employed for evaluation
Fig 4.12 Bit Error Rate of the extracted watermark under transcoding: CIF to QCIF conversion
Fig 4.13 Bit Error Rate of the extracted watermark under transcoding
Fig 4.14 Bit Error Rate of the extracted watermark under transcoding: re-quantization and CIF to QCIF conversion
Fig 4.15 PSNR of video before and after signing
Fig 4.16 The signed video vs its attacked video
Fig 5.1 A “self-embedding” authentication system
Fig 5.2 Sphere covering for feature selection
Fig 5.3 Relationships among visual quality, robustness and amount of embedded information
Fig 5.4 Feature difference and video distortion

Fig 5.5 Relationship among H(V0), H(V0|V2) and I(V0;V2)

Fig 5.6 Maximum tolerable feature differences for video “Akiyo”
Fig 5.7 Relationship between feature difference and quantization step
Fig 5.8 Watermarking communication problem
Fig 5.9 The watermarking capacity can be calculated in four parts

Declaration

Parts of this thesis have been published in or submitted to the following conferences and journals:

1. Dajun He, Qibin Sun and Qi Tian, “An Object Based Watermarking Solution for MPEG4 Video Authentication”, ICASSP2003, Hong Kong, April 2003

2. Dajun He, Qibin Sun and Qi Tian, “A Semi-fragile Object Based Video Authentication System”, ISCAS2003, Thailand, May 2003

3. Dajun He, Qibin Sun and Qi Tian, “A Robust Object-based Video Authentication System”, IEEE International Conference on Information Technology: Research and Education (ITRE), USA, August 2003

4. Dajun He, Tian-Tsong Ong, Zhishou Zhang and Qibin Sun, “A Practical Watermarking Scheme Aligned with Compressed-domain CIF-to-QCIF Video Transcoding”, ICICS-PCM2003, Singapore, December 2003

5. Qibin Sun, Dajun He, Zhishou Zhang and Qi Tian, “A Secure and Robust Approach to Scalable Video Authentication”, ICME2003, USA, July 2003

6. Qibin Sun, Dajun He, Zhicheng Zhou and Shuiming Ye, “Feature Selection for Semi-fragile Signature-based Authentication Systems”, IEEE International Conference on Information Technology: Research and Education (ITRE), USA, August 2003

7. Dajun He, Zhiyong Huang, Ruihua Ma and Qibin Sun, “Feature Difference Analysis in Video Authentication System”, ISCAS2004, Canada, June 2004

8. Dajun He, Qibin Sun and Qi Tian, “A Secure and Robust Object-based Video Authentication System”, accepted, EURASIP J on Applied Signal Processing, Special issue on Multimedia Security and Right Management, November 2004

9. Dajun He and Qibin Sun, “A RST Resilient Object-based Video Watermarking Scheme”, ICIP2004, Singapore, October 2004

10. Qibin Sun, Dajun He and Qi Tian, “A Secure and Robust Authentication Scheme for Video Transcoding”, submitted to IEEE CSVT


CHAPTER 1 INTRODUCTION

This thesis addresses the problem of video authentication. After examining the issues of video authentication, we provide some solutions to them.

1.1 Robust Video Authentication

With the rapid development of digital technologies, video applications are entering our daily life at an accelerating pace, from traditional television broadcasting to Internet/Intranet, wireless communication and consumer products such as VCD/DVD. It is no exaggeration to call this a revolution in daily life. The digital technologies behind this revolution include hardware and software:

Hardware: improvements in hardware capability, including the processing power of chips, the bandwidth of media/channels and the capacity of storage devices.

Software: various advanced compression algorithms.

In particular, the remarkable growth in chip processing power not only makes real-time video processing possible, which significantly broadens the prospective areas of video applications, but also simplifies video editing, which greatly enriches the content of video programs. However, the latter, which allows video content to be modified easily without any noticeable degradation in video quality, decreases the trustworthiness of video. In other words, “seeing is believing” is no longer true unless the trustworthiness of the video can be established. Thus, many video authentication techniques have been developed to protect this trustworthiness by verifying the integrity and source of the video data.

1.1.1 Objective

Any video application involves at least three parties: the provider, the receiver and a third party. The provider sends video to the receiver via the third party; hence, the provider is also called the sender in some applications. Here, the third party is a general concept: it could be a storage device in a consumer product or a noisy channel in video transmission. An attacker who intends to modify the video content is also classified as a third party.

Video authentication techniques are developed to verify whether the received video is the original one (integrity protection) or whether the video is from a particular provider (repudiation prevention). Lin and Chang [1] classified multimedia authentication techniques into two categories: complete authentication and content authentication. In complete authentication, no change in the multimedia data is allowed. In content authentication, as long as the meaning of the multimedia data remains unchanged, the received multimedia data is considered authentic, regardless of the processing or transformation the data has undergone.

Obviously, video authentication should be content authentication, because a receiver usually does not need an exact, distortion-free copy of the original video. For example, digital video is usually stored in compressed format due to its huge volume, and most video compression schemes, such as MPEG1/2/4, are lossy. As a result, the decompressed video is not identical to the original one; however, it should still be considered authentic. Another example is video transcoding, in which the bit-rate of a video stream is adjusted to adapt to a variable transmission channel.
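
The distinction between complete and content authentication can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the frame is a flat list of 8-bit samples, `requantize` is a hypothetical stand-in for lossy processing, and `coarse_feature` (the top two bits of each block mean) is an illustrative feature, not one proposed in this thesis.

```python
import hashlib

def requantize(pixels, step):
    """Stand-in for lossy processing: snap each 8-bit sample to the centre of its bin."""
    return [min(255, (p // step) * step + step // 2) for p in pixels]

def exact_digest(pixels):
    """Complete authentication: a cryptographic hash over every sample."""
    return hashlib.sha256(bytes(pixels)).hexdigest()

def coarse_feature(pixels, block=4):
    """Content authentication (toy): the top 2 bits of each block mean."""
    feats = []
    for i in range(0, len(pixels), block):
        chunk = pixels[i:i + block]
        feats.append((sum(chunk) // len(chunk)) >> 6)
    return feats

original = [12, 15, 11, 14, 200, 198, 205, 201, 90, 92, 88, 91, 140, 143, 139, 141]
processed = requantize(original, step=8)          # acceptable, content-preserving change
tampered = original[:4] + [10, 12, 9, 11] + original[8:]  # second block replaced

print(exact_digest(original) == exact_digest(processed))      # hash breaks under lossy processing
print(coarse_feature(original) == coarse_feature(processed))  # coarse feature survives it
print(coarse_feature(original) == coarse_feature(tampered))   # but changes when content changes
```

A real feature would have to be far more discriminative than this one; the sketch only shows why a bit-exact hash is unsuitable once lossy processing is considered acceptable.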

Thus, video authentication should in theory be robust to all normal video processes. Nevertheless, it is difficult to define all acceptable video processes because of the huge diversity of video applications. For instance, object-based video processing such as rotation, scaling and translation (RST) is very different from traditional frame-based video processing. Therefore, video authentication systems are usually application-dependent.


1.1.2 Source

The source for video authentication could be either raw data or compressed data. Since most video is stored or transmitted in compressed format (MPEG1/2/4), an authentication solution that operates in the compressed domain is preferred in order to reduce the computation load. On the other hand, to increase trustworthiness, in some applications such as video surveillance we prefer to insert the authentication information into the video immediately after it is captured, before it is subsequently processed and stored. For example, in a trustworthy digital camera [2], the image authentication solution is embedded in the camera. In such a case, a video authentication solution that operates on raw data is needed. In practice, a video authentication solution need not operate on both raw and compressed data, because such a solution may be more complex than one that operates on only one of them.

1.1.3 Requirements

Based on the particularities of video applications, the following requirements must be considered in designing a robust video authentication system:

• Blindness: The original video is not available during authenticity verification.

• Robustness: The system should be robust to compression, transcoding and any other content-preserving video processing. Different applications may have different robustness requirements.

• Complexity: This includes computational complexity and memory requirements. Complex computation may make real-time authentication impossible, and a high memory requirement, which is common when many video frames are employed to authenticate one video frame, will cause delay.

• Sensitivity: The authentication system should be sensitive to malicious manipulations.


The relative significance of the above requirements varies across applications. For example, complexity is less important than security in military applications; however, the reverse is true in many civil or domestic applications.

1.1.4 Models of video authentication

Video authentication solutions can be roughly classified into signature-based authentication, watermarking-based authentication, and signature and watermarking based authentication.

Signature-based video authentication originates from signature-based data authentication, since the latter provides a good solution for integrity protection and repudiation prevention, which match the objectives of the former. Initially, many researchers directly applied cryptographic signature schemes from data authentication, such as DSA and RSA [3], to create an image/video authenticator [2, 4, 5, 6, 7], which belongs to complete authentication. Later, features of the image/video were employed to create a content-based signature, which belongs to content authentication. We will analyze image/video features in detail in later chapters. The obvious shortcoming of signature-based authentication is the large size of the signature. One way to reduce the signature size is to cryptographically hash the feature before encrypting it. If locating the manipulated area is not a major concern, this method is a good choice for a video authentication system. To send the digital signature to the receiver, it can be stored in the header of the compressed bit-stream (for example, “user data” in an MPEG bit-stream) or in a separate file attached to the video. This approach, however, has some disadvantages: the size of the video data is increased, and the digital signature may be lost in applications that involve multi-cycle compression.
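
The idea of hashing the feature before signing it can be sketched as follows. This is a minimal illustration, not the signing scheme used in this thesis: the RSA key is a toy built from two Mersenne primes and is far too small to be secure, the hash is truncated to 128 bits so that it fits under the toy modulus, and the 512-bit feature vector is hypothetical.

```python
import hashlib

# Toy RSA key built from two Mersenne primes; illustrative only, not secure.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)

def feature_digest(feature_bits):
    """Hash an arbitrarily long feature vector down to a 128-bit integer."""
    digest = hashlib.sha256(bytes(feature_bits)).digest()
    return int.from_bytes(digest[:16], "big")

def sign(feature_bits):
    """Provider side: apply the private key to the hashed feature."""
    return pow(feature_digest(feature_bits), D, N)

def verify(feature_bits, signature):
    """Receiver side: recover the digest with the public key and compare."""
    return pow(signature, E, N) == feature_digest(feature_bits)

feature = [1, 0, 1, 1, 0, 0, 1, 0] * 64   # 512 hypothetical feature bits
sig = sign(feature)

tampered = list(feature)
tampered[0] ^= 1                           # a single flipped feature bit

print(verify(feature, sig))    # signature matches the untouched feature
print(verify(tampered, sig))   # any change to the feature invalidates it
print(sig.bit_length() <= N.bit_length())  # size is bounded by the modulus, not the feature
```

Because the hash changes completely when any feature bit changes, the verifier learns only that the feature differs, not where, which is why this shortcut suits applications in which locating the manipulated area is not a major concern.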

Watermarking-based authentication solutions modify the video data to embed some defined codes, which could be a user ID or features of the image/video, so no additional data is required. However, watermarking-based approaches only work well in protecting the integrity of the content; they are unable to solve the non-repudiation issue, because a symmetric key is used in watermark embedding and extraction [8].

Thus, the latest authentication solutions tend to combine signature-based and watermarking-based authentication. A general model of such video authentication solutions is shown in Fig 1.1. At the providing site (or sending site), features of the video, together with user-defined information, are encrypted with a private key to create a content-based signature; this signature is then encoded to generate a content-based watermark, which is finally embedded into the video to obtain a signed video. During authenticity verification, the watermark is extracted from the received video and decoded to obtain the digital signature; decrypting the signature with the public key recovers the feature of the original video; finally, the feature of the received video is compared with the feature of the original video to decide the authenticity of the received video.

[Figure: block diagram linking Video, Feature Extraction, User Info, Private Key, Watermark Generator, Embedder, Channel / Storage, Watermark Extraction, Signature Extraction, Public Key and Authenticity Decision]

Fig 1.1 General model for video authentication using digital signature and watermarking
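
The signing and verification flow of this general model can be traced end to end in a short Python sketch. Everything in it is an assumption made for illustration: the frame is a flat list of 8-bit samples, the feature is the top two bits of each block mean computed with the least significant bits ignored, the RSA key is a toy that offers no security, user information is omitted, and the watermark is a plain LSB substitution, which is fragile and would not survive compression. It is not one of the systems proposed in this thesis.

```python
# Toy RSA key (two Mersenne primes); illustrative only, not secure.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))    # needs Python 3.8+
SIG_BITS = N.bit_length()            # number of watermark bits to embed

def extract_feature(frame, block=16):
    """Pack the top 2 bits of each block mean into one integer.
    LSBs are dropped first so that embedding does not disturb the feature."""
    value = 0
    for i in range(0, len(frame), block):
        chunk = [p >> 1 for p in frame[i:i + block]]
        value = (value << 2) | ((sum(chunk) // len(chunk)) >> 5)
    return value

def sign_frame(frame):
    """Provider: feature -> signature (private key) -> watermark bits -> embed."""
    signature = pow(extract_feature(frame), D, N)
    bits = [(signature >> k) & 1 for k in range(SIG_BITS)]
    signed = list(frame)
    for k, b in enumerate(bits):
        signed[k] = (signed[k] & ~1) | b     # LSB substitution
    return signed

def verify_frame(frame):
    """Receiver: extract watermark -> signature -> original feature (public key) -> compare."""
    signature = sum((frame[k] & 1) << k for k in range(SIG_BITS))
    original_feature = pow(signature, E, N)
    return original_feature == extract_feature(frame)

frame = [40] * 64 + [200] * 64 + [120] * 64 + [90] * 64   # 256 samples, 16 blocks
signed = sign_frame(frame)

tampered = list(signed)
tampered[240:256] = [255] * 16       # replace the content of the last block

print(verify_frame(signed))      # signed frame is accepted
print(verify_frame(tampered))    # content change is rejected
print(verify_frame(frame))       # an unsigned frame carries no valid watermark
```

Verification needs only the received frame and the public key, which reflects the blindness requirement listed earlier.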

Based on this model of signature and watermarking based video authentication, we propose two video authentication solutions for different video applications in this thesis.

1.1.5 Object-based video authentication

Nowadays, the object-based MPEG4 standard [9] is becoming increasingly attractive for applications in areas such as the Internet, video editing and wireless communication because of its object-based nature. For instance, in video editing, it is the object of interest, not the whole video, that needs to be processed; in video transmission, if the bandwidth of the channel is limited, only the objects, not the background, are transmitted in real time. Generally speaking, object-based video processing can simplify video editing, reduce the bit rate in transmission and make video search efficient. Such flexibility, however, also poses new challenges for multimedia, because a video object (VO) can easily be accessed, modified or even replaced by another VO.

The objective is to propose a secure and robust object-based video authentication solution that protects the authenticity of video objects and their associated backgrounds. In this content-based video authentication system, watermarking, error correction coding (ECC) and digital signature techniques are employed together to ensure that the system is robust to MPEG4 compression, object segmentation errors and some common object-based video processing such as object RST, while securely preventing malicious attacks such as content modification and object or background replacement. In this solution, Angular Radial Transformation (ART) [10, 11] coefficients are selected as the features that represent the video object and the background respectively; ECC and cryptographic hashing are applied to generate a robust authentication watermark; and the content-based, semi-fragile watermark is embedded into the objects frame by frame before MPEG4 coding.

The system’s robustness to normal video processes is guaranteed by ECC and watermarking, and the system’s security is protected by the cryptographic hash [3].
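
The role that ECC plays in robustness can be illustrated with the simplest possible code. The sketch below uses a 3-fold repetition code with majority voting as a stand-in; the ECC actually used in the proposed system is not specified in this section, and the function names are hypothetical.

```python
def ecc_encode(bits, r=3):
    """Repetition code: send every message bit r times."""
    return [b for b in bits for _ in range(r)]

def ecc_decode(coded, r=3):
    """Majority vote inside each group of r received bits."""
    return [int(sum(coded[i:i + r]) * 2 > r) for i in range(0, len(coded), r)]

message = [1, 0, 1, 1, 0, 0, 1, 0]
sent = ecc_encode(message)

# Incidental distortion: one bit error in each of four different groups.
received = list(sent)
for pos in (0, 7, 13, 22):
    received[pos] ^= 1

# Heavier distortion: two bit errors inside the same group.
badly_damaged = list(sent)
for pos in (0, 1):
    badly_damaged[pos] ^= 1

print(ecc_decode(received) == message)        # isolated errors are corrected
print(ecc_decode(badly_damaged) == message)   # errors beyond the code's capability are not
```

Once the extracted bits are corrected exactly, a cryptographic hash computed over them matches the original, so small watermark errors caused by acceptable processing do not break verification, while distortion that exceeds the correction capability does.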

1.1.6 Scalable video authentication

With the rapid progress in multimedia and broadband network technologies, advanced multimedia services are becoming more and more popular. Examples of such services include videoconferencing, distance learning, networked video and advanced video workstations. These applications require scalable video streaming, in which a transcoder is often adopted to adapt the coded video bit-stream to dynamic changes in channels as well as user devices. Popular transcoding approaches include resizing, frame dropping and re-quantization.

The objective is to propose a secure and robust authentication scheme for scalable video streaming. This scheme must be not only robust to video transcoding but also capable of detecting malicious attacks such as content modification and the injection of commercials or offensive materials into the video stream. Since many advanced transcoders operate in the DCT domain rather than the pixel domain to keep the computation load low, this scalable video authentication is also performed in the DCT domain in order to be compatible with video transcoding. By employing ECC in different ways, the proposed scheme is an end-to-end authentication scheme that is independent of the transcoding infrastructure and provides a good compromise between system robustness and security.

1.2 Theoretical Analysis

In a “self-embedding” video authentication system, the authenticity of the received video is decided by the difference between the features extracted from the received video and those of the original video. If the difference exceeds a threshold, the received video is declared unauthentic. This threshold, which is the maximum allowable feature difference between the original video and the video that has undergone acceptable video processes, should be determined before an authentication system is designed. Nevertheless, to the best of my knowledge, no theoretical analysis of this threshold has been provided so far, and many researchers determine the threshold empirically [12, 13, 14]. The appropriateness of such a threshold cannot be guaranteed: it depends heavily on the number of acceptable video processes and the number of videos used for evaluation. This motivates us to develop a theoretical approach to analyze the relationship between the feature difference and the video distortion caused by normal video processing and/or malicious attacks, and subsequently to derive a theoretical threshold.
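
The threshold decision and the empirical way of choosing the threshold can be sketched in a few lines of Python. The binary feature vectors, the use of the Hamming distance and the function names are assumptions for illustration only.

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length feature vectors differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

def is_authentic(original_feature, received_feature, threshold):
    """Accept the received video if its feature is within the threshold."""
    return hamming_distance(original_feature, received_feature) <= threshold

def empirical_threshold(original_feature, processed_features):
    """Empirical choice: the largest distance seen over the tested acceptable processes."""
    return max(hamming_distance(original_feature, f) for f in processed_features)

original = [1, 0, 1, 1, 0, 0, 1, 0]
after_compression = [1, 0, 1, 1, 0, 1, 1, 0]   # hypothetical acceptable process
after_scaling = [0, 0, 1, 1, 0, 0, 1, 1]       # hypothetical acceptable process
attacked = [0, 1, 0, 0, 0, 0, 1, 0]            # hypothetical malicious change

threshold = empirical_threshold(original, [after_compression, after_scaling])
print(threshold)
print(is_authentic(original, after_scaling, threshold))
print(is_authentic(original, attacked, threshold))
```

The value returned by `empirical_threshold` is only as good as the set of processed samples passed to it, which is exactly the weakness of the empirical approach noted above.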

Watermarking capacity is the theoretical upper bound on the amount of information that can be hidden in an image or a video frame, regardless of the details of the watermarking scheme. In a video authentication system, a more critical and practical issue is the number of watermark bits that can be hidden in, and then reliably detected from, the watermarked and subsequently distorted image/video frame once a watermarking algorithm is fixed. This motivates us to estimate watermarking capacity from the viewpoint of reliable detection.

1.3 Structure of Thesis

In this thesis, after investigating the issues in video authentication, we propose two secure and robust solutions for different applications, which incorporate technologies such as digital signatures, watermarking, ECC and cryptographic hashing. In addition, two important components of a video authentication system, feature selection and watermarking capacity, are analyzed theoretically.

In Chapter 2, we review previous work in the area of video authentication. Work on image authentication is also reviewed, since it has inspired many video authentication solutions. In fact, some image authentication solutions can be employed directly in video authentication if a video sequence is considered as a series of video frames.

In Chapter 3, we propose an object-based video authentication solution that is robust to incidental distortion while being able to detect intentional distortion. (We define the distortions introduced by malicious attacks as intentional distortions and the distortions introduced by acceptable video processes as incidental distortions.) In this chapter, a new feature that is robust to object-based video processing is proposed, followed by signing schemes which ensure that a legitimate receiver verifying the authenticity of the received video can recover the feature of the original video. The watermarking scheme is performed in the DFT domain or the polar domain.

In Chapter 4, we propose a scalable video authentication solution and explain the methods employed to design an authentication system, operating in the DCT domain, that is robust to the most common video transcoding approaches such as frame dropping, frame size conversion and re-quantization.

In Chapter 5, we address two important issues: one is the method used to theoretically analyze the relation between feature difference and distortion of video content; the other is the watermarking capacity. The concept of mutual information [15] is employed to analyze the theoretical relationship between feature difference and distortion of video content. As an example of the application of this relationship, the maximum allowable difference between the features of the original video frame and those of the processed video frame is studied. Instead of presenting the theoretical upper bound on the information that can be hidden in a video frame (the general definition of watermarking capacity), in this chapter we estimate the capacity of a video frame from the viewpoint of reliable detection, given a particular watermarking scheme designed for video authentication.

In Chapter 6, we present the conclusions of this thesis and discuss directions for future work.


CHAPTER 2 STATE-OF-THE-ART

In this chapter, we review work on video authentication. To make the review more comprehensive, work on image authentication is also included, for the following reason: image authentication can be considered a special form of video authentication, or the latter can be seen as an extension of the former to a certain extent. In the spatial domain, video can be considered a collection of video frames or images; even in the compressed domain, an I frame within a GOP (in MPEG1/2/4 format) is comparable to a JPEG-compressed image. Thus, many image authentication solutions can be employed directly in video authentication with little or no modification.

Since video authentication should be content authentication rather than complete authentication in most applications, as explained in Chapter 1, the method used to extract features that represent the image/video is the first critical issue in designing a content authentication solution. Thus, in this chapter, we also briefly review the features employed in image/video authentication.

2.1 Features in Image/Video Authentication

Features employed in content authentication should meet the following requirements:

• Completeness: The features must be able to represent the image/video. Features from similar images/videos should be similar, while features from different images/videos should be different.

• Sensitivity: The features must be able to detect content modification in the image/video.

• Robustness: The features must be robust to content-preserving processing. This property is only required in a semi-fragile authentication solution, which is not only robust to content-preserving processing but also able to detect content modification.

As listed in Table 2-1 at the end of this section, many types of features have been adopted in image authentication.

The simplest approach is to select the image itself as the feature. Wong and Memon [16] set the LSB of each pixel to zero and employed the resulting image as the feature of the original image, and Byun et al. [17] used the LSB bit-planes of the R and G channels as the feature for a color image. This type of feature is useful in localizing modifications on an image/video frame, since even a one-pixel modification can be detected. However, it is a completely fragile feature and is not suitable for content authentication.

To improve the robustness of the feature, some researchers divide the image into blocks of a specific size and select statistical parameters of the blocks as the feature of the image. Although the capability of modification localization is preserved in such a feature, the localization unit is no longer a pixel but a block. Two commonly used parameters are the mean and the variance [18, 19]. In [20], the MSB of the mean, rather than the mean itself, is employed as the feature to further improve the feature’s robustness. One benefit of these algorithms is that the means of the original blocks can be used for image recovery: during verification, if a block is detected to have been modified, it can be replaced with the mean of its corresponding original block. However, this type of feature is block-wise independent. This independence makes authentication solutions based on this scheme vulnerable to a block-wise counterfeiting attack [21]. To overcome this weakness, [22, 23] create the feature of a block from the block itself and its neighboring blocks.
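The block-mean feature described above can be illustrated with a short sketch. It is not taken from [18-20]; the function names, the 8-bit pixel range and the choice of keeping 3 most significant bits are assumptions made only for illustration.

```python
def block_means(image, block_size):
    """Return the mean intensity of each block, scanned row by row."""
    height, width = len(image), len(image[0])
    means = []
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            pixels = [image[y][x]
                      for y in range(top, min(top + block_size, height))
                      for x in range(left, min(left + block_size, width))]
            means.append(sum(pixels) / len(pixels))
    return means


def msb_feature(means, bits=3):
    """Keep only the `bits` most significant bits of each 8-bit block mean.

    The value bits=3 is an illustrative assumption, not a value from [20].
    """
    return [int(mean) >> (8 - bits) for mean in means]


image = [[10, 12, 200, 202],
         [14, 16, 204, 206],
         [100, 100, 50, 50],
         [100, 100, 50, 50]]
feature = msb_feature(block_means(image, 2))
```

In this toy image a uniform brightness shift of 2 grey levels leaves the MSB feature unchanged, which is the kind of robustness the MSB variant aims for, while the localization unit is the 2x2 block rather than the pixel.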


The histogram of intensity, which can tolerate compression, is also employed as a feature to represent the image. Schneider and Chang [24] used the histogram of each block as the feature; Coltu et al. [25] calculated the histogram based on selected pixels, which are pre-set by a private key. Bhattacharjee and Kutter [26] proposed to exploit “perceptually interesting” image features, also known as visually salient image features [27], in a semi-fragile authentication system, owing to their robustness to most image transformations. Features of this type could be edges [12, 13] or a critical pixel set in the image [28]. Sun and Chang [29] used the Significance-Linked Connected Component (SLCC) in the wavelet domain as the invariant feature, while Nour El-Din and Moniri [30] extracted a perceptual self-similarity feature of an image from the automate domain. Chang et al. [31] pointed out that acceptable manipulations are usually global distortions, while illegal manipulations tend to be localized distortions. Based on this observation, they compressed the original image by content-based compression into an extremely low bit rate version and employed this compressed version as the feature of the image. The content-based compression is guided by a space-variant weighting function.

Since most image/video processing is performed in the DCT domain, and it is also easier to locate visually important components in the DCT domain, some authors proposed to select features in the DCT domain. Zou et al. [32] used the quantized DCT coefficients as the feature, and Sun [8] employed the remainders of the quantized DC and 3 AC coefficients in one block as the feature of that block. To improve the feature’s robustness to compression, some researchers generate the image feature from groups of DCT coefficients instead of individual DCT coefficients. Lin and Chang [33] used the relationship between DCT coefficients in the same position of different DCT blocks as the feature and claimed that this relationship is robust to JPEG compression. Wang et al. [34] pointed out that this relationship may fluctuate under some “content-preserving” modifications if the difference between the two DCT coefficients is small enough, and proposed a more robust algorithm in which the partial energy relation between two DCT blocks is selected as the feature. Uehara et al. [35] proposed a flexible authentication system and claimed that Lin’s algorithm can be considered a special case of their algorithm.
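The coefficient-relationship feature and the small-difference problem can be illustrated with a simplified sketch. It assumes integer coefficients, a single quantization step shared by both blocks, and a one-bit feature defined as "first coefficient is not smaller than the second"; these choices and the function names are illustrative and are not the exact formulation of [33] or [34].

```python
def quantize(coefficient, step):
    """Round an integer coefficient to the nearest multiple of step (halves round up)."""
    return ((2 * coefficient + step) // (2 * step)) * step


def relation_bit(coeff_p, coeff_q):
    """One-bit feature: 1 if the coefficient of block p is >= that of block q."""
    return 1 if coeff_p >= coeff_q else 0


# A clear ordering survives quantization with a common step.
clear_before = relation_bit(40, 11)
clear_after = relation_bit(quantize(40, 8), quantize(11, 8))

# A small difference can collapse to equality, which flips this simplified bit.
close_before = relation_bit(10, 11)
close_after = relation_bit(quantize(10, 8), quantize(11, 8))
```

Because rounding is monotonic, a pair with `coeff_p >= coeff_q` keeps that ordering after both are quantized with the same step. The second pair shows the case where the two coefficients are close: both are rounded to the same value and the bit changes, which is the kind of instability that motivates a more robust relation.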

With the increase in object-based image/video applications, features robust to object-based processing are also required. Compared with an image or a traditional frame-based video, an object includes shape information in addition to texture information. Thus, the feature of an object should also include shape information. Moment-based features, which have been widely studied in pattern recognition, are employed for this purpose. Datta et al. [36] employed Hu’s moments and Flusser’s moments to represent an object.

Table 2-1 Features for image/video authentication

Wavelet

DCT domain: Quantized DCT coefficients [8, 32]; Relationship between DCT coefficients [33, 35]

As stated in Chapter 1, authentication solutions can be classified into three categories: signature-based authentication, watermarking-based authentication, and signature and watermarking based authentication. Thus, we will review previous works in this order and finally summarize these works in Fig 2.5.

2.2.1 Signature-based authentication

A data authentication solution should make it possible for the receiver of a message to ascertain its origin; an intruder should not be able to masquerade as someone else [3]. In other words, it is a process of integrity verification and repudiation prevention. For image authentication, the capabilities of integrity verification and repudiation prevention are also required. The only difference is the definition of “integrity”. In data authentication, not even a one-bit alteration of the data is allowed; in image authentication, any manipulation of the image is allowed as long as it is content-preserving.

A digital signature scheme (DSS) is a typical technology for data authentication, and it includes signature generation and signature verification. A general public key DSS is shown in Fig 2.1. The left part is the procedure for signature generation, and the right part is the procedure for signature verification. The sender’s private key is used to encrypt the message to generate a digital signature, while the corresponding public key is used to decrypt the digital signature to recover the original message. The Public Key Infrastructure (PKI) scheme solves the problem of repudiation prevention, while data integrity can be verified by checking whether the original message is correctly recovered, since any bit modification of the signature will lead to a failure in recovering the original message. Well-known digital signature schemes include RSA and DSA [3].

Fig 2.1 Public key Digital Signature Scheme (DSS)
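The sign/verify flow of a public key signature can be sketched with textbook RSA. The primes below are a classroom-sized toy (an assumption for illustration only and far too small to be secure), and the sketch signs a SHA-256 digest reduced modulo N rather than the raw message, which is the usual hash-then-sign practice and not something stated in the paragraph above. Real systems should rely on a vetted cryptographic library.

```python
import hashlib

# Toy textbook RSA key (hypothetical values, insecure by design).
P, Q = 61, 53
N = P * Q                    # public modulus
PHI = (P - 1) * (Q - 1)
E = 17                       # public exponent
D = pow(E, -1, PHI)          # private exponent (requires Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce the digest into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Signature generation: apply the private exponent to the digest."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Signature verification: apply the public exponent and compare digests."""
    return pow(signature, E, N) == digest_as_int(message)


message = b"image feature bits"
signature = sign(message)
accepted = verify(message, signature)
rejected = verify(message, (signature + 1) % N)
```

Only the holder of `D` can produce a signature that verifies under `(E, N)`, which is what provides repudiation prevention; any change to the signature makes verification fail.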


Owing to its great success in data authentication, DSS is also employed in image authentication. A general signature-based image authentication system is depicted in Fig 2.2. The upper part is the block diagram of digital signature generation, and the lower part is the block diagram of authenticity verification. The PKI signing and decryption in image authentication are similar to those in data authentication. Based on the input of the encryption (signing) module, signature-based image authentication solutions can be further classified into two sub-categories: non-hash and hash digest, which are depicted as (a) and (b) in Fig 2.2.

Fig 2.2 Signature-based image authentication system

a. Non-hash

In this type of image authentication, the feature of the original image is signed with the sender’s private key to generate a digital signature, and this digital signature is sent to the end user together with the original image. During verification, the original feature is obtained by decrypting the received digital signature using the public key and is then compared with the feature extracted from the received image to decide the authenticity of the received image. Schemes of this type include [13, 18, 26, 28, 33]. The received image can be claimed to be authentic only if two conditions are met: the public key is valid, and the original and extracted features match.


It is not easy to decide whether two features match, because features from the original and received images may differ even if the received image has only undergone content-preserving processing. Usually, a threshold ($Th$) is employed to decide whether this difference is allowable. Let $f_o$ and $f_r$ represent the features of the original and received images respectively. The authenticity can be decided by

\[
\begin{cases}
\lvert f_r - f_o \rvert \le Th & \text{Authentic} \\
\lvert f_r - f_o \rvert > Th & \text{Unauthentic}
\end{cases}
\]
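The threshold decision can be written as a few lines of code. The distance measure is not fixed by the text; the sum of absolute differences used here, like the function names, is an assumption for illustration.

```python
def feature_distance(f_o, f_r):
    """Sum of absolute differences between two equal-length feature vectors."""
    if len(f_o) != len(f_r):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(f_o, f_r))


def is_authentic(f_o, f_r, threshold):
    """Accept the received image if the feature difference does not exceed Th."""
    return feature_distance(f_o, f_r) <= threshold


original_feature = [12, 40, 7, 33]
compressed_feature = [13, 39, 7, 34]   # small drift, e.g. after compression
tampered_feature = [12, 40, 90, 33]    # one component changed substantially
```

With a threshold of 5, `compressed_feature` is accepted and `tampered_feature` is rejected; the practical difficulty is choosing a threshold that separates the two cases for real images.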

As an example, the block diagram of integrity verification in [13] is shown in Fig 2.3. In [13], edges are employed as the feature of the image; a difference between the edge bit-maps of the original and compressed images may exist, since most compression techniques distort contours. During verification, first, the edge difference bit-map is calculated; second, error relaxation is employed to delete the false edges in the edge difference bit-map; third, the maximum connected region in the error difference bit-map is computed; and finally, an integrity violation is declared if the maximum connected region exceeds a pre-defined threshold.

Fig 2.3 Block-diagram of integrity verification in [13]
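The four verification steps can be sketched on binary edge maps. This is a loose illustration rather than the method of [13]: error relaxation is approximated here by dropping difference pixels that have no set 8-neighbour, regions are 4-connected, and all names and parameters are assumptions.

```python
from collections import deque


def difference_map(edges_original, edges_received):
    """Step 1: XOR the two binary edge bit-maps."""
    return [[a ^ b for a, b in zip(row_o, row_r)]
            for row_o, row_r in zip(edges_original, edges_received)]


def relax_errors(diff, min_neighbours=1):
    """Step 2 (assumed form): drop set pixels with too few set 8-neighbours."""
    height, width = len(diff), len(diff[0])
    relaxed = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not diff[y][x]:
                continue
            neighbours = sum(
                diff[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ) - 1  # exclude the pixel itself
            if neighbours >= min_neighbours:
                relaxed[y][x] = 1
    return relaxed


def largest_region(bitmap):
    """Step 3: size of the largest 4-connected region of set pixels."""
    height, width = len(bitmap), len(bitmap[0])
    seen = [[False] * width for _ in range(height)]
    best = 0
    for y in range(height):
        for x in range(width):
            if not bitmap[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, size = deque([(y, x)]), 0
            while queue:
                cy, cx = queue.popleft()
                size += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and bitmap[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best


def integrity_violated(edges_original, edges_received, threshold):
    """Step 4: flag a violation if the largest region exceeds the threshold."""
    relaxed = relax_errors(difference_map(edges_original, edges_received))
    return largest_region(relaxed) > threshold
```

An isolated difference pixel is discarded by the relaxation step, while a compact cluster of differences survives and is measured against the threshold.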

In [13], the value of the threshold is determined statistically, through further tests using different compression rates and different kinds of manipulation. Nevertheless, we find that this value is hard to determine in practice.

Another shortcoming of this type of scheme is the size of the signature. Since the size of the signature is proportional to the size of the feature, it can be excessively large in some cases. This will significantly increase the computation in signature generation and verification, and the storage of such a signature will also be a problem. An alternative solution is to hash the feature before signing it.

The concept of content hash was first proposed by Fridrich and Goljan [37, 38], who call it visual hash or robust hash. In their scheme, N random smooth patterns with uniform distribution in the interval [0, 1] are generated using a secret key; the image is divided into equal-size blocks; and each block is projected onto these patterns to extract an N-bit data string, which is robust to different types of image manipulation such as JPEG compression. The data strings of all blocks are concatenated into a content hash digest. The security of this content hash lies in the confidentiality of the random smooth patterns: an attacker cannot modify the projection without knowledge of the secret key, which is similar to a crypto-hash.
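A heavily simplified sketch of such a projection-based hash is given below. It assumes key-seeded patterns drawn uniformly from [0, 1] and shifted to zero mean, omits the smoothing step, and takes the sign of each projection as the extracted bit; these simplifications and all names are assumptions, not the construction of [37, 38].

```python
import random


def make_patterns(key, count, block_size):
    """Generate `count` key-dependent zero-mean patterns (smoothing omitted)."""
    rng = random.Random(key)
    patterns = []
    for _ in range(count):
        values = [rng.random() for _ in range(block_size * block_size)]
        mean = sum(values) / len(values)
        patterns.append([v - mean for v in values])
    return patterns


def block_bits(block_pixels, patterns):
    """Project one block onto every pattern and keep the sign as one bit."""
    return [1 if sum(p * w for p, w in zip(block_pixels, pattern)) >= 0 else 0
            for pattern in patterns]


def content_hash(image, key, block_size=4, bits_per_block=8):
    """Concatenate the bit strings of all full blocks, scanned row by row."""
    patterns = make_patterns(key, bits_per_block, block_size)
    height, width = len(image), len(image[0])
    bits = []
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            pixels = [image[top + dy][left + dx]
                      for dy in range(block_size)
                      for dx in range(block_size)]
            bits.extend(block_bits(pixels, patterns))
    return bits
```

Without the key, the patterns cannot be regenerated, so the projections (and hence the bits) cannot be predicted; with the key, the same image always yields the same digest.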

Another type of content hash is the Approximate Message Authentication Code (AMAC) [39], which is generated through a series of processing steps including pseudorandom permutation, masking and majority voting. AMACs provide a way to measure the similarity between two different images with a short checksum. However, limitations exist: for example, an AMAC cannot distinguish modifications caused by an intentional attack from normal image manipulation such as compression [40]. To overcome this shortcoming, Xie et al. extended AMAC to the Approximate Image MAC (IMAC), in which the most significant bits are extracted, followed by a parallel AMAC computation to generate the IMAC of an image [40]. The designed length of the IMAC in their algorithm is 128 bits, and they claimed that this type of IMAC is robust to moderate compression while being able to detect and locate intentional attacks.

A threshold is still required to verify the integrity of the received image in the above two solutions. Furthermore, locating a local attack is also a difficult task if the digital signature is generated using a content hash.

b.2 Crypto-hash

A crypto-hash function is a one-way function with the following properties:

• Given a message m and a hash function H, it should be easy and fast to compute the hash digest h = H(m).

• Given a hash digest h, it is very hard to compute m such that h = H(m).

• Given m, it is hard to find another message m’ such that H(m’) = H(m).

These properties of a one-way function ensure the security of the crypto-hash function against various attacks on both the message and the hash digest. The security increases with the length of its output. For example, SHA-1 [3] is a typical crypto-hash function whose output is 160 bits; an attacker would have to try, on average, $2^{160}$ random messages to find a second message whose hash digest is identical to that of a given message.

Owing to its security and its fixed-length output for a message of any length, the crypto-hash has been widely employed in generating digital signatures for image authentication [12, 16, 24]. In such image authentication schemes, the input of the crypto-hash function is the image feature, and the output is the hash digest.

Unlike a content hash, for which the similarity between two digests is proportional to the similarity between the two images, in a crypto-hash a one-bit difference in the input feature results in a totally different output (hash digest). Thus, the threshold employed in content hash based image authentication is no longer required in crypto-hash based image authentication. The authenticity is decided by comparing two hash digests: one is from the original image, obtained by decrypting the digital signature; the other is generated from the received image. Even if there is only a one-bit difference between the two hash digests, the received image will be considered tampered. This property of the crypto-hash, however, decreases the robustness of the image authentication solution, because acceptable manipulations may cause changes to the feature, even though these changes may be small compared with the changes caused by content-altering attacks. To improve the robustness, Sun et al. [41] proposed a new semi-fragile authentication solution in which a feature Error Correcting Coding (ECC) scheme is exploited to tackle feature distortions caused by acceptable image manipulations. Since watermarking is often involved in this type of image authentication, we will explain it in Section 2.2.3.
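The all-or-nothing behaviour of a crypto-hash can be observed directly with SHA-1 from Python’s standard `hashlib` module. The feature encoding (one byte per feature bit) and the helper names are assumptions chosen for brevity.

```python
import hashlib


def feature_digest(feature_bits):
    """SHA-1 digest (hex) of a feature given as a list of 0/1 values."""
    return hashlib.sha1(bytes(feature_bits)).hexdigest()


def digests_match(original_digest, received_feature_bits):
    """Exact comparison: no threshold is involved."""
    return feature_digest(received_feature_bits) == original_digest


def differing_bits(hex_a, hex_b):
    """Number of bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


feature = [1, 0, 1, 1, 0, 0, 1, 0] * 8
distorted = feature[:]
distorted[5] ^= 1                      # a single-bit change in the feature

original_digest = feature_digest(feature)
changed = differing_bits(original_digest, feature_digest(distorted))
```

Flipping one feature bit typically changes about half of the 160 digest bits, so the comparison fails outright. This is why a slightly distorted but acceptable feature is rejected unless the distortion is corrected before hashing.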

2.2.2 Watermarking-based authentication

In signature-based authentication, the digital signature is stored either in the header of the image format or in a separate file, so the risk of losing the signature is always a major concern. Watermarking-based image authentication solves this problem by embedding a watermark in the original image.

Most watermarking-based authentication solutions verify the authenticity of an image by comparing the original watermark with the watermark extracted from the image to be authenticated. The watermark could be a binary string, a logo, a feature of the image or even a compressed version of the image, and the watermarking could be performed in either the spatial domain or a transform domain (DCT, DWT, DFT, etc.). According to the distortion caused by watermarking, authentication solutions can be divided into lossless authentication and lossy authentication; according to their robustness to acceptable manipulations, they can be divided into fragile authentication and semi-fragile authentication. Furthermore, some authentication solutions employ multiple watermarks for different purposes. Although lossless authentication is necessarily fragile, in this chapter we classify it as a separate class. Thus, we will review the existing watermarking-based authentication solutions in the following order: lossless authentication, fragile authentication, semi-fragile authentication and multipurpose watermarking based authentication.

a. Lossless authentication

Although a watermark is inserted into the original image by modifying the image imperceptibly, some distortion is still introduced. This distortion is unacceptable in some cases, such as military and medical usage. In a lossless authentication solution, this type of distortion can be completely removed. Fridrich et al. [42] proposed an invertible authentication solution in which the LSB bit-plane of the selected quantized DCT coefficients is losslessly compressed to make space for embedding the watermark, which is the hash digest of the quantized DCT coefficients. During verification, the hash digest is extracted before the LSB bit-plane is decompressed to calculate the hash digest of the received image. The two hash digests are subsequently compared bit by bit to verify the authenticity of the received image. To increase the payload of the watermarking, Fridrich et al. further proposed an RS-vector lossless authentication solution in [43]. Mohan et al. [44] also proposed a high-capacity lossless watermarking algorithm for authentication that embeds the watermark in a specific number of LSBs in the texture blocks instead of the LSB bit-plane of [42]; the selection of LSBs is guided by the Human Visual System (HVS), and the selected LSBs are compressed using arithmetic coding to make sufficient space to embed the watermark or authentication information.

Meanwhile, Shi et al. [32] proposed a content-based lossless authentication system in which a tag or watermark, consisting of the digital signature generated from the quantized DCT coefficients and the index of the corresponding quantization function, is embedded into the image using the “Circular Histogram” algorithm.
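The idea of making room by compressing the LSB plane can be sketched as follows. This is an illustration in the spirit of the invertible scheme of [42], not its implementation: `zlib` stands in for the lossless coder, SHA-1 over the textual form of the coefficient list stands in for the hash, a 2-byte length header is added, and all names are assumptions.

```python
import hashlib
import zlib

HASH_LEN = 20  # SHA-1 digest length in bytes


def _bits_to_bytes(bits):
    padded = bits + [0] * (-len(bits) % 8)
    return bytes(int("".join(map(str, padded[i:i + 8])), 2)
                 for i in range(0, len(padded), 8))


def _bytes_to_bits(data):
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def embed(coeffs):
    """Replace the LSB plane by: length header + compressed LSB plane + digest."""
    compressed = zlib.compress(_bits_to_bytes([c & 1 for c in coeffs]), 9)
    digest = hashlib.sha1(repr(coeffs).encode()).digest()
    payload = _bytes_to_bits(len(compressed).to_bytes(2, "big") + compressed + digest)
    if len(payload) > len(coeffs):
        raise ValueError("LSB plane is not compressible enough")
    payload += [0] * (len(coeffs) - len(payload))
    return [(c & ~1) | bit for c, bit in zip(coeffs, payload)]


def verify_and_restore(marked):
    """Return (authentic, restored coefficients or None)."""
    data = _bits_to_bytes([c & 1 for c in marked])
    size = int.from_bytes(data[:2], "big")
    compressed = data[2:2 + size]
    digest = data[2 + size:2 + size + HASH_LEN]
    try:
        lsbs = _bytes_to_bits(zlib.decompress(compressed))[:len(marked)]
    except zlib.error:
        return False, None
    restored = [(c & ~1) | bit for c, bit in zip(marked, lsbs)]
    authentic = hashlib.sha1(repr(restored).encode()).digest() == digest
    return authentic, restored if authentic else None


coeffs = [(i % 7) * 2 for i in range(1024)]   # toy data with a compressible LSB plane
marked = embed(coeffs)
authentic, restored = verify_and_restore(marked)
```

When the marked data is untouched, the original coefficients are recovered exactly, which is the lossless property. The scheme only works when the LSB plane compresses well enough to leave room for the digest.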


b. Fragile authentication

In fragile authentication, any manipulation of the image can be detected or located by comparing the original and extracted watermarks. Yeung and Mintzer [45] proposed a scheme that can authenticate individual pixels. In this scheme, a key-dependent binary look-up table (LUT) is employed as a watermark extraction function to extract the watermark pixel by pixel; a binary watermark image is embedded into the original image by modifying every pixel value until the watermark bit extracted from this pixel is identical to the desired watermark bit. A similar LUT is also used in Wu and Liu’s solution [46], in which watermarking is performed in the DCT domain.
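A minimal sketch of LUT-based pixel-wise embedding and extraction is shown below. It assumes 8-bit grey-level pixels, a LUT drawn from a key-seeded pseudo-random generator, and a nearest-value search; it leaves out refinements of the actual scheme in [45], and the names are illustrative.

```python
import random


def make_lut(key):
    """Key-dependent binary look-up table over the 256 grey levels."""
    rng = random.Random(key)
    return [rng.randint(0, 1) for _ in range(256)]


def embed_pixel(value, bit, lut):
    """Move the pixel to the nearest grey level whose LUT entry equals `bit`."""
    for delta in range(256):
        for candidate in (value - delta, value + delta):
            if 0 <= candidate <= 255 and lut[candidate] == bit:
                return candidate
    raise ValueError("LUT contains no entry for the requested bit")


def embed(pixels, watermark, lut):
    return [embed_pixel(p, b, lut) for p, b in zip(pixels, watermark)]


def extract(pixels, lut):
    """Extraction is a plain table look-up per pixel."""
    return [lut[p] for p in pixels]


lut = make_lut(key=1234)
pixels = [12, 200, 77, 145, 90, 31]
watermark = [1, 0, 0, 1, 1, 0]
marked = embed(pixels, watermark, lut)
```

Since each pixel carries its own bit, a modified pixel whose new value maps to the opposite LUT entry is flagged at exactly that position.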

Fridrich et al. [47] pointed out some weak points in Yeung and Mintzer’s scheme, such as key management, and proposed an improved version in which the key-dependent LUT for a single pixel is replaced by an encryption map. However, Fridrich’s scheme still has some drawbacks; for example, cropping on the right and at the bottom of the watermarked image cannot be detected [48]. In [48], Li et al. proposed an improved version of Fridrich’s scheme.

Wong and Memon [16] proposed a public key based fragile authentication scheme in which a watermark, consisting of the size of the image, an approximation of the image and block information, is embedded into the LSB of each pixel of the image. If the correct key is specified in the watermark extraction procedure, an output image showing a proper watermark is returned; any modification is reflected in a corresponding error in the watermark. If the key is incorrect, or if the watermarked image is cropped, an image that resembles random noise is returned. Since a user key is required during both the watermark insertion and extraction procedures, an unauthorized user cannot insert a new watermark or alter the existing one. The scheme can detect any modification of the image and indicate the specific locations that have been modified; however, it is vulnerable to the block-wise counterfeiting attack. To thwart such attacks, Lu and Liao [49] proposed a pixel-wise fragile authentication system, while Celik et al. [50] gave a hierarchical watermarking scheme in which hierarchical signatures are generated and embedded into the LSBs of the image. A public key based fragile authentication scheme for color images is proposed in [17].

Besides localizing modifications of the image, recovering the modified image is also important in image authentication. Fridrich and Goljan proposed a fragile image authentication scheme with self-correcting capabilities [51]. In this scheme, for each 8x8 DCT block, the first 11 DCT coefficients are quantized and binary encoded into a 64-bit string; this binary string is encrypted and inserted into the LSB of each DCT coefficient. The authors claimed that the quality of the image reconstructed from this binary string is roughly half as good as that of a JPEG-compressed image. Lee [52] also proposed an authentication solution with error correction. In his scheme, the LSB of every pixel is set to “0”, the zero-replaced image is encoded by an ECC scheme, and finally the ECC parities are inserted into the LSB of every pixel. During verification, these parities can be employed to authenticate the received image and correct modified pixels.

Besides the pixel domain and the DCT domain, fragile image authentication in the wavelet domain has also been proposed [53], where the watermark is embedded by quantizing the corresponding DWT coefficients.

c. Semi-fragile authentication

Although fragile authentication can easily detect an alteration of the image, it cannot decide whether this alteration is an acceptable image manipulation or a malicious attack. Thus, semi-fragile authentication, which is robust to incidental distortions while being sensitive to intentional distortions, attracts more and more attention in image/video authentication research. The existing semi-fragile authentication solutions can be roughly classified into three classes: spatial domain based authentication, DCT domain based authentication and DWT domain based authentication. Chotikakamthorn and Sangiamkun [23] proposed a semi-fragile authentication scheme in the spatial domain. In their scheme, the similarity between neighboring blocks is employed as the feature of the image; this feature is embedded into the image using a general spread-spectrum method; and the authenticity of the image is decided by comparing the original feature with the feature newly generated from the received image. Based on the hash function proposed in [37, 38], Fridrich et al. have also proposed some semi-fragile authentication solutions [54, 55]. In these solutions, the image hash is first calculated by projecting the image onto the hash functions; then a watermark is inserted by modifying the image hash [54], or by watermarking the image hash and inserting it into the DCT coefficients located in the middle frequencies [55]; and the authenticity of the image is decided by checking whether the original watermark is present.

Since the DCT domain has the property of frequency localization, it is easier to design a watermarking scheme that is robust to normal image manipulations in the DCT domain. Hence, most semi-fragile authentication solutions in the DCT domain achieve robustness by inserting watermarks into DCT coefficients located in the low or middle frequencies. The only differences between them are how the authentication information is generated and how the watermark is embedded.

Eggers and Girod [56] took a binary sequence as the watermark and embedded it into the DCT coefficients from the 2nd to the 8th in the zigzag order of an 8x8 DCT block, using a dither quantization rule; a secret dither sequence is used to extract the embedded binary sequence for authenticity verification. In [57], pseudo-random numbers with a zero-mean, unit-variance Gaussian distribution are employed as the watermark; this watermark is placed in the upper triangle of a block, excluding the DC component, and then converted to a spatial domain watermark using the IDCT for spatial domain embedding; the verification is based on the correlation of the extracted data with the original watermark. The security of this scheme is preserved by assigning different watermark patterns to different blocks. Lin and Chang [58] pointed out a property of JPEG compression: if a DCT coefficient is modified to be an integral multiple of a quantization step that is larger than the steps used in later JPEG compressions, then this coefficient can be exactly reconstructed after these compressions. In [58], they employed this property to embed the watermark.
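The quantization property pointed out by Lin and Chang can be checked numerically. The sketch assumes integer coefficients, uniform rounding quantization and an embedding step of 16; the helper name and these values are illustrative assumptions.

```python
def round_to_step(value, step):
    """Quantize an integer to the nearest multiple of step (halves round up)."""
    return ((2 * value + step) // (2 * step)) * step


EMBED_STEP = 16   # the larger step the coefficient is pre-quantized to


def survives(coefficient, later_step):
    """True if the pre-quantized coefficient is recovered after a later quantization."""
    pre_quantized = round_to_step(coefficient, EMBED_STEP)
    after_compression = round_to_step(pre_quantized, later_step)
    return round_to_step(after_compression, EMBED_STEP) == pre_quantized


all_recovered = all(survives(c, q)
                    for c in range(-300, 301)
                    for q in range(1, EMBED_STEP + 1))
```

The later quantization moves the coefficient by at most half of its own step, which is less than half of the embedding step, so re-quantizing with the embedding step lands on the original multiple. This is what allows information carried by such coefficients to survive moderate JPEG compression.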

Besides frequency localization, the DWT domain also has the property of space localization. Thus, watermarking solutions performed in the DWT domain could be more robust to geometric attacks [59].

Shi and Yi [60] proposed a semi-fragile authentication solution in the DWT domain. In this scheme, a one-level DWT is employed; the approximation coefficients (LL band) and a chaotic sequence are used as the watermark; the watermark is inserted into the DWT coefficients in the HL and LH bands using an HVS-based technique; and the authenticity is determined by comparing the difference between the extracted watermark and the watermark re-generated from the received image with a threshold. If the difference exceeds the threshold, the tampering is considered an intentional attack; otherwise, it is considered an acceptable image manipulation.

Yu et al. [61] and Han et al. [59] proposed watermarking schemes for image authentication based on multi-level DWT. In both schemes, to improve the robustness against JPEG compression, a statistical value is calculated from the LL band and quantized for watermark embedding. In [61], the statistical value is the mean value of the weighted magnitudes of the selected DWT coefficients; the watermark is embedded by rounding this mean value to quantization levels specified

the wavelet coefficients is modeled as a Gaussian distribution with small and large variance for incidental distortion and intentional distortion. Yu et al. pointed out that the weighted mean has the advantage of preserving small variation introduced by incidental distortion. In [59], the statistical value is the sum of DWT coefficients in a 2x2 block ( ); the watermark is embedded by quantizing the to


A self-recovery scheme for semi-fragile image authentication has been proposed in [62]. In this scheme, a one-level DWT is applied to the image; then the low-pass version of the original image is transformed using a full-frame DCT, followed by scaling down the DCT coefficients to decrease their obtrusiveness; the scaled DCT coefficients are scanned in zigzag order; finally, some significant DCT coefficients are selected, scrambled and embedded into the high-pass sub-bands of the DWT. During verification, the hidden DCT coefficients are extracted and transformed using the IDCT to obtain a reference image; this reference image is visually compared with the received image to determine the authenticity of the received image.

d. Multipurpose watermarking based authentication

Instead of a single watermark, some authentication solutions employ multiple watermarks for multiple purposes. In [49], robust and fragile watermarks are simultaneously embedded using cocktail watermarking, for copyright protection and content authentication respectively. In [63], the fragile watermark is for subtle change detection, while the robust watermark is for malicious attack detection.

2.2.3 Signature and watermarking based authentication

As we have stated in the previous subsections, the issues of integrity verification and repudiation prevention are well solved by signature-based authentication, while watermarking-based authentication has the advantages of tamper localization and no extra data requirement. Naturally, authentication solutions that combine signature and watermarking attract many researchers. Such solutions include [8, 29, 41, 64]; all are semi-fragile authentication solutions. A general model of these solutions is shown in Fig 2.4. First, the feature of the original image is encoded using a systematic Error Correction Coding (ECC) scheme [65] to obtain the ECC codeword and the Parity Check Bits (PCB). (We define this ECC scheme as the Feature ECC Coding scheme.) A systematic ECC scheme means that after ECC encoding, the codeword can be separated into two parts: one is the original message and the other is its PCB data. Second, the hash digest of the ECC codeword is signed using the sender’s private key to create a digital signature. Third, this signature is concatenated with the PCB data to generate the authentication information. Finally, this authentication information is encoded using another ECC scheme to generate a watermark, which is embedded into the original image. (We define this ECC scheme as the Watermarking ECC Coding scheme.) During authenticity verification, the authentication information (PCB data and digital signature) is extracted by decoding the watermark using the Watermarking ECC Coding scheme; the PCB data is concatenated with the feature of the received image to create a feature ECC codeword. The same Feature ECC Coding scheme is employed to decode this feature ECC codeword. If the codeword cannot be decoded, the received image is claimed to be unauthentic. Otherwise, the decoded codeword is crypto-hashed to create a hash digest, and this hash digest, together with the hash digest obtained from the digital signature, is employed to decide the authenticity of the received image.

In [8, 41, 64], the authentication information does not include the signature, so the signature must be stored in additional space.


Fig 2.4 A general model of authentication solutions that use digital signature and watermarking technologies. The upper part is the block diagram of signing, and the lower part is the block diagram of authenticity verification.

Since this type of authentication scheme will also be employed in our proposed video authentication solutions in Chapter 3 and Chapter 4, we will briefly introduce the Feature ECC Coding and Watermark ECC Coding schemes. Please refer to [14] for more details on employing the ECC scheme.

Feature ECC Coding. Although a difference between the features of the original and the processed images may exist, this difference is relatively small compared with the difference between the features of two different images. Therefore, an ECC scheme can be exploited to tackle the distortion caused by acceptable image manipulations. The design of the Feature ECC Coding scheme should follow this rule: it should be able to correct the difference between the features of the original image and those of an image that has undergone acceptable manipulations; on the other hand, the difference between the features of different images should not be correctable.
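The role of the parity check bits can be illustrated with a systematic Hamming(7,4) code, chosen here only because it is small; the ECC actually used in the thesis is not specified in this section. The sketch also omits the private-key signing of the digest and simply compares SHA-256 digests, and all function names are assumptions. A Hamming decoder always returns some codeword, so in this sketch the decision rests on the digest comparison alone.

```python
import hashlib

# Syndrome -> position of the single erroneous bit in [d1, d2, d3, d4, p1, p2, p3].
SYNDROME_TO_POSITION = {
    (1, 1, 0): 0, (1, 0, 1): 1, (0, 1, 1): 2, (1, 1, 1): 3,
    (1, 0, 0): 4, (0, 1, 0): 5, (0, 0, 1): 6,
}


def parity_bits(data):
    """Parity check bits (PCB) of four data bits."""
    d1, d2, d3, d4 = data
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4]


def correct(data, pcb):
    """Correct at most one bit error in the codeword formed by data + pcb."""
    word = list(data) + list(pcb)
    syndrome = tuple(a ^ b for a, b in zip(parity_bits(word[:4]), word[4:]))
    position = SYNDROME_TO_POSITION.get(syndrome)
    if position is not None:
        word[position] ^= 1
    return word


def sign_feature(feature):
    """Return (PCB data, digest of the codeword); len(feature) must be a multiple of 4."""
    groups = [feature[i:i + 4] for i in range(0, len(feature), 4)]
    pcb = [parity_bits(group) for group in groups]
    codeword = [bit for group, parity in zip(groups, pcb) for bit in group + parity]
    return pcb, hashlib.sha256(bytes(codeword)).hexdigest()


def verify_feature(received_feature, pcb, digest):
    """Combine the received feature with the original PCB, decode, then compare digests."""
    groups = [received_feature[i:i + 4] for i in range(0, len(received_feature), 4)]
    decoded = [bit for group, parity in zip(groups, pcb) for bit in correct(group, parity)]
    return hashlib.sha256(bytes(decoded)).hexdigest() == digest


feature = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
pcb, digest = sign_feature(feature)

slightly_distorted = feature[:]
slightly_distorted[5] ^= 1              # one bit error in the second group
heavily_distorted = feature[:]
heavily_distorted[0] ^= 1
heavily_distorted[1] ^= 1               # two bit errors in the first group
```

A single bit error per group is corrected before hashing, so `slightly_distorted` still verifies, while two errors in one group exceed the correction capability and the digest comparison fails. Choosing the correction capability is exactly the design rule stated above.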


Watermark ECC Coding. Since an image may undergo a series of manipulations before reaching its final users, the watermark extracted from the received image is usually not identical to the original watermark. However, the original authentication information contained in the original watermark should be extracted correctly for authenticating the received image if the manipulations are acceptable. The Watermark ECC Coding scheme is employed to solve this problem. Its design should take into account the error ratio between the extracted watermark and the original watermark, in order to make sure that the original authentication information can be recovered free of errors.
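As a minimal illustration of protecting the authentication information against watermark extraction errors, the sketch below uses a repetition code with majority decoding. The repetition code and the factor of 3 are assumptions chosen for simplicity; a practical design would pick a stronger code matched to the expected error ratio.

```python
def encode(info_bits, repeat=3):
    """Repeat every authentication bit `repeat` times to form the watermark."""
    return [bit for bit in info_bits for _ in range(repeat)]


def decode(watermark_bits, repeat=3):
    """Majority vote within each group of `repeat` extracted bits."""
    return [1 if 2 * sum(watermark_bits[i:i + repeat]) > repeat else 0
            for i in range(0, len(watermark_bits), repeat)]


info = [1, 0, 1, 1]
watermark = encode(info)

extracted = watermark[:]
extracted[0] ^= 1        # one extraction error in the first group
extracted[4] ^= 1        # one extraction error in the second group
recovered = decode(extracted)
```

With one error per group of three, the authentication information is recovered exactly. Two errors in the same group would defeat the majority vote, so the code rate has to be matched to the error ratio observed after acceptable manipulations.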

