On the correlation analysis of sequences designed for spread spectrum watermarking


Nguyen Le Cuong*

Abstract: In this paper, some concepts of correlation analysis, which is mandatory in spread spectrum (SS) watermarking, are reviewed. Next, the link between two analysis methods, cyclic difference sets (CDS) and the D-transform, which look quite different, is studied and shown for the first time. Their application to checking the auto-correlation function (ACF) of the pseudo-noise (PN) sequences employed in SS watermarking is investigated. A comparison between them is made to give a broader view of the mathematical tools employed in sequence design for SS watermarking.

Keywords: Watermarking, Spread spectrum, Correlation analysis

1 INTRODUCTION

Along with the digitalization of media assets, the rapid growth of the Internet, and the speed of file transfers, the danger of intellectual property violation has become obvious. Therefore, mechanisms are needed to protect these digital assets and the associated rights. In this regard, digital watermarking is considered an effective measure against illegal copies of images, music titles, and video films.

In the digital watermarking process, information such as hidden copyright notices or verification messages is added to a cover medium, such as a digital image, audio/video, or document signal, to protect ownership rights. These hidden messages consist of a group of bits giving information about the author of the signal or about the signal itself. Correlation analysis is used to extract or detect the hidden messages. Since spread spectrum communications (CDMA, HSPA) and cryptographic techniques [5] developed rapidly in the last decade, this concept has been borrowed successfully not only for watermarking [1-4, 6-11] but also for other applications such as RFID, EPC, and barcodes [12, 13], and anti-jamming and protective jamming [14-16]. In the above-mentioned applications, the following properties of the sequences are most desired: a sharp auto-correlation function (ACF), large linear complexity (LC), large length L, a normal distribution assumption, bipolar form…

This is an encouraging motive for an in-depth study of nonlinear sequences that satisfy those demands. As is well known, in spread spectrum communication a narrow-band signal is spread and then transmitted over a much larger bandwidth, such that the signal energy present in any single frequency is undetectable. The spreading signal is generated from a PN sequence running periodically at a much higher rate than the original data signal. At the receiving end, the receiver first performs a correlation process on the incoming signal so that the original data signal is recovered in its original bandwidth.
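As a minimal sketch of the spread-and-correlate idea (our own illustration, not taken from the cited works), the snippet below spreads each bipolar data symbol over a 7-chip PN block and recovers it by correlating with the same block. The chip sequence 1001011 is the length-7 m-sequence that reappears later in the text, mapped to bipolar form with 0 → +1 and 1 → −1. The sinusoidal interferer and its amplitude are arbitrary assumptions.

```python
import math

# Length-7 m-sequence (0/1) mapped to bipolar chips: 0 -> +1, 1 -> -1.
PN_BITS = [1, 0, 0, 1, 0, 1, 1]
PN = [1 - 2 * b for b in PN_BITS]


def spread(symbols):
    """Replace each bipolar data symbol d by the chip block d * PN."""
    return [d * c for d in symbols for c in PN]


def despread(chips):
    """Correlate each block of len(PN) chips with PN and decide by the sign."""
    n = len(PN)
    decisions = []
    for k in range(0, len(chips), n):
        corr = sum(x * c for x, c in zip(chips[k:k + n], PN))
        decisions.append(1 if corr >= 0 else -1)
    return decisions


data = [1, -1, -1, 1, 1]
tx = spread(data)
# Illustrative narrow-band interferer (an assumption, not from the text).
rx = [x + 0.5 * math.sin(0.3 * n) for n, x in enumerate(tx)]
print(despread(rx))  # [1, -1, -1, 1, 1]
```

Each symbol contributes 7 coherent chip products, while the interference term per block is at most 7 × 0.5 in magnitude, so the sign decision is always correct here. This ratio between coherent and interfering energy is the processing gain at work.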

Similarly, a watermark is spread over many frequency bins so that the energy in any one bin is very small and certainly undetectable if the spreading sequence is not known. Since the location, the spreading sequence, and the content of the watermark are known to the watermark verification process, these weak signals can be concentrated into a single output with high SNR thanks to the so-called processing gain. In technical terms, this process can be expressed as [17] adding a modified maximal-length linear shift register sequence (m-sequence) to the pixel data. The watermark is identified by correlation techniques. (Two types of sequences may be formed from an m-sequence: unipolar and bipolar. The elements of a bipolar sequence are {-1, 1} and the elements of a unipolar sequence are {0, 1}.)

It is worth remarking that the correlation process is not only applied in spread spectrum watermarking (SS watermarking) but is also widely used in other watermarking schemes [16]. Without going into the details of the watermarking process, we concentrate on the selection of appropriate sequences. The paper is organized as follows: at the end of this section (I), we review related work on the spread spectrum watermarking process. The preliminaries are presented in the next section (II). In the following section, attention is paid to design and analysis issues. However, due to the limited scope of the paper, only correlation analysis, which is the most important property in watermarking, is discussed. The remaining requirements (robustness against attacks, balance, …) will be discussed in other contributions. In the last section, some comparisons and conclusions are given.

Sequences with the above-mentioned properties have found wide application in the watermarking process (image as well as audio watermarking) to improve its robustness, fidelity, capacity, and detection [18].

To improve these attributes, pseudo-random sequences with good randomness criteria (uniform distribution, sharp ACF) are required. To serve as a secret key, they should also have large linear complexity (LC).

For audio watermarking, [19] points out that by applying redundant-chip coding to spread spectrum watermarking, the system can effectively resist geometric distortions such as time scaling and frequency scaling of up to 4% by performing multiple correlations. Besides the spreading PN sequences, a so-called tracking sequence is also employed as a key sequence. At the receiving end, the watermark embedding position can then be located by extensive computation of the correlation to find its maximum value.
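The position search described above can be pictured with a small sliding-correlation sketch. This is our own illustration, not the scheme of [19]. It assumes the watermark is a bipolar length-63 m-sequence added at an unknown offset to a stand-in host signal, and it returns the offset where the correlation peaks. The recurrence b[i] = b[i−1] + b[i−6] (mod 2) with initial bits 0, 1, 1, 1, 1, 1 reproduces the 63-bit m-sequence of Example 3.3. The offset 137 and the host signal are hypothetical.

```python
import math


def m_sequence_63():
    """63-bit m-sequence from b[i] = b[i-1] + b[i-6] (mod 2), seeded with 0,1,1,1,1,1."""
    b = [0, 1, 1, 1, 1, 1]
    while len(b) < 63:
        b.append(b[-1] ^ b[-6])
    return b


def locate(received, pn):
    """Offset at which the sliding correlation between received and pn is largest."""
    n = len(pn)
    scores = [sum(received[o + i] * pn[i] for i in range(n))
              for o in range(len(received) - n + 1)]
    return max(range(len(scores)), key=scores.__getitem__)


pn = [1 - 2 * bit for bit in m_sequence_63()]           # bipolar PN, 0 -> +1, 1 -> -1
host = [0.1 * math.sin(0.05 * t) for t in range(400)]   # stand-in host signal
true_offset = 137                                        # hypothetical embedding position
received = host[:]
for i, chip in enumerate(pn):
    received[true_offset + i] += chip

print(locate(received, pn))  # 137
```

Because the periodic ACF of an m-sequence is −1 off-peak, every partial overlap of the PN with itself has magnitude at most 32 here, well below the peak of about 63. The maximum is therefore unambiguous despite the host signal.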

According to [11], the spread spectrum watermarking technique offers robustness. Spread spectrum embedding resists a number of attacks and is especially resistant to collusion attacks, since the watermarks have a component-wise Gaussian distribution and are statistically independent. The randomness inherent in such watermarks makes accusing an innocent user very unlikely. Spread spectrum embeds the watermark in overlapping regions, and this spreading makes it difficult to change even a single bit at will.

The popular direct-sequence spread spectrum watermarking approach is employed in the DCT domain because such spreading gives a robust but invisible watermark and allows various types of detectors for blind watermark extraction [7]. In [20], the need for PN sequences to spread the watermark signal and improve robustness is highlighted. The PN sequence is generated by a pseudo-random noise generator initialized with a seed that depends on the secret key, which significantly improves the robustness of the system. This key is known only to the legal owner of the watermarked document, and without it the watermark cannot be generated at the receiver. Furthermore, the spreading sequence must have noise-like properties in order to spread the spectrum of the input signal. In other words, the mean of the sequence should be precisely zero and its auto-correlation should approach the delta function. Consequently, a bipolar pseudo-random sequence that takes the values −1 and +1 with relative frequency 1/2 each is a suitable candidate.

To survive attacks, [20] proposed combining PN sequences with concatenated turbo codes. In [10], a PN sequence of length $L = 2^{96} - 1$ bits is used to ensure large LC.

2 PRELIMINARIES

To make sure that the generated sequences meet the above-mentioned requirements, many methods have been proposed for analyzing the properties of nonlinear sequences, among which LC and ACF, the two most important sequence criteria, have attracted much attention. Recently, the methods for generating nonlinear sequences and analyzing their LC were thoroughly reviewed in [21], but ACF analysis was only briefly mentioned owing to the scope of that paper. Since correlation analysis is the most crucial tool not only in every watermarking scheme but also in frequency hopping and anti-jamming techniques [14, 15] and in security mechanisms for digital watermarking [16], it deserves special attention and a separate treatment.

When designing a watermark detection system, we need to consider the desired performance and robustness of the system. The watermark should remain detectable after common signal processing operations such as digital-to-analog and analog-to-digital conversion, linear and nonlinear filtering, compression, and scaling. Watermark detection is based on correlation analysis [22].

The correlation coefficient can be defined either directly via the related variables [6] or via the statistical parameters of the variables. In [6], where the concept of SS watermarking was first introduced (1997), the correlation coefficient was used as a measure of similarity between the original signal X and the extracted signal X*.

Furthermore, [23] employs the concepts of linear and normalized correlation of two vectors. The linear correlation between two vectors is the average product of their elements. When the linear correlation is normalized by the magnitudes of the two vectors, detection is unaffected if all elements of either vector are multiplied by a constant. A system using the correlation coefficient (CC) is robust to changes in image brightness and contrast.
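The two correlation measures used in [23] can be written in a few lines. In the sketch below the test vectors are hypothetical values of our own choosing. Linear correlation scales with the extracted vector, whereas normalized correlation is unchanged when that vector is multiplied by a positive constant, for example after a contrast change.

```python
import math


def linear_correlation(u, v):
    """Average product of corresponding elements."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


def normalized_correlation(u, v):
    """Inner product divided by the product of the two vector magnitudes."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


w = [1, -1, -1, 1, -1, 1, 1, -1]                   # reference watermark
c = [0.9, -1.2, -0.8, 1.1, -0.7, 1.0, 1.3, -0.9]   # extracted vector (hypothetical)
scaled = [3.0 * x for x in c]                      # e.g. after a contrast change

print(linear_correlation(c, w), linear_correlation(scaled, w))          # grows 3x
print(normalized_correlation(c, w), normalized_correlation(scaled, w))  # unchanged
```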

In some papers, correlation coefficients are defined via the statistical parameters of the variables [24]. The watermark energy resides in all frequency bands. Compression and other degradations may remove signal energy from certain parts of the spectrum, but since the energy is distributed over the whole spectrum, part of the watermark remains.

In probability theory and statistics, covariance is a measure of the joint variability of two random variables.

Variance is a statistical measure of how measured data vary around the average value of the data set. In other words, variance is the mean of the squared deviations from the arithmetic mean of a data set.
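Covariance, variance and the correlation coefficient can be sketched as follows. We use population definitions (dividing by the number of samples). This normalization is our assumption, and it cancels in the coefficient anyway. Because the coefficient subtracts the means and divides by the standard deviations, an affine change a·x + b with a > 0 leaves it unchanged, for example a brightness offset combined with a contrast gain. The sample values are hypothetical.

```python
import math


def mean(x):
    return sum(x) / len(x)


def covariance(x, y):
    """Population covariance: mean product of deviations from the two means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def variance(x):
    """Mean of the squared deviations from the arithmetic mean."""
    return covariance(x, x)


def correlation_coefficient(x, y):
    """Covariance normalized by the product of the standard deviations."""
    return covariance(x, y) / math.sqrt(variance(x) * variance(y))


original = [12.0, 15.0, 11.0, 18.0, 14.0, 16.0]    # hypothetical samples of X
extracted = [11.5, 15.5, 10.0, 18.5, 13.0, 16.5]   # hypothetical samples of X*
adjusted = [1.5 * v + 20.0 for v in extracted]     # contrast x1.5, brightness +20

cc = correlation_coefficient(original, extracted)
cc_adjusted = correlation_coefficient(original, adjusted)
print(cc, cc_adjusted)  # equal up to floating-point rounding
```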

When the signal is expressed as a sequence of samples, the sample correlation statistic is introduced [25]. Given a received signal, the watermark detector makes a (possibly incorrect) decision about the presence or absence of a watermark. We assume that the detector is synchronized with the embedded watermark W[n]. A popular detection method is correlation detection, in which the detector computes the sample correlation statistic and then compares it with a threshold to decide whether W[n] is present. A larger value of the statistic corresponds to increasing confidence that W[n] is indeed present. To evaluate the performance of this decision, a false alarm criterion is introduced [22].
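A minimal correlation detector can be sketched as below. The exact statistic of [25] is not reproduced in the text, so we assume the common form (1/N)·Σ y[n]·W[n]. The embedding strength, the host signal and the threshold are hypothetical values. In practice the threshold would be derived from a false-alarm target, as in [22].

```python
import math
import random


def correlation_statistic(y, w):
    """Assumed form of the sample correlation statistic: (1/N) * sum y[n] * W[n]."""
    return sum(a * b for a, b in zip(y, w)) / len(w)


def detect(y, w, threshold):
    """Declare W[n] present when the statistic exceeds the threshold."""
    return correlation_statistic(y, w) > threshold


rng = random.Random(2024)
N = 1000
W = [rng.choice((-1, 1)) for _ in range(N)]           # bipolar watermark W[n]
host = [0.1 * math.sin(0.01 * n) for n in range(N)]   # stand-in host signal
strength = 0.5                                         # hypothetical embedding strength
marked = [h + strength * w for h, w in zip(host, W)]

threshold = 0.25  # hypothetical; normally chosen from a false-alarm target
print(detect(marked, W, threshold), detect(host, W, threshold))  # True False
```

With |host| ≤ 0.1, the host contributes at most 0.1 to the statistic, so the marked signal scores at least 0.4 and the unmarked one at most 0.1.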

In some watermarking schemes [10, 24], a two-dimensional processing technique is applied. Therefore, 2D pseudo-random sequences (arrays) and 2D correlation are defined correspondingly. In these schemes, a very long m-sequence is employed to increase the LC, and some bits are discarded to resist correlation attacks (as in mobile communication cryptography).

Recently, more sophisticated watermarking schemes that consider the effect of quantization noise and nonlinear correlations on the performance of spread spectrum watermarking have been presented [7].

The section can be summarized as follows: correlation analysis is a crucial operation in watermark detection and extraction. The performance of these operations depends closely on the correlation properties of the employed PN sequences (the best possible ACF). In the next section, we investigate methods to check whether the designed sequences ensure these properties.

3 CORRELATION ANALYSIS OF PN SEQUENCES DESIGNED FOR WATERMARKING

In this section, we review two well-known methods for ACF analysis, namely the combinatorial (CDS) method and the D-transform method, and point out the link between them, which, to the best of our knowledge, is still missing in the literature.

3.1 The mathematical (combinatorial) approach

Since there is a one-to-one correspondence between cyclic difference sets and almost-balanced binary sequences with the desired auto-correlation property [26-28], constructing all cyclic difference sets is equivalent to finding all almost-balanced binary sequences with that property. This problem has been thoroughly discussed in the literature, so we give only a short summary here.

Definition 1 [26-28] (difference set): A set of distinct integers D = {d1, d2, …, dk} modulo an integer υ is called an integer difference set, or simply a difference set, denoted by (υ, k, λ), if every integer b ≢ 0 (mod υ) can be expressed in exactly λ ways in the form di − dj ≡ b (mod υ), where di, dj ∈ D.

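Example 3.1: For D = {1, 3, 4, 5, 9} modulo 11, the table below lists dj − di (mod 11), with di labelling the rows and dj the columns.

 di\dj |  1   3   4   5   9
-------+--------------------
   1   |  0   2   3   4   8
   3   |  9   0   1   2   6
   4   |  8  10   0   1   5
   5   |  7   9  10   0   4
   9   |  3   5   6   7   0

Every nonzero residue modulo 11 appears exactly twice off the diagonal, so D = {1, 3, 4, 5, 9} is an (11, 5, 2) difference set, with λ = 2.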
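Definition 1 can be checked mechanically by counting differences, just as the table of Example 3.1 does by hand. The helper names below are our own. The sketch simply tallies di − dj (mod υ) over all ordered pairs of distinct elements.

```python
from collections import Counter


def difference_counts(D, v):
    """Tally d_i - d_j (mod v) over all ordered pairs of distinct elements of D."""
    return Counter((a - b) % v for a in D for b in D if a != b)


def is_difference_set(D, v, lam):
    """Definition 1: every nonzero residue mod v occurs exactly lam times."""
    counts = difference_counts(D, v)
    return all(counts[r] == lam for r in range(1, v))


D = [1, 3, 4, 5, 9]
print(dict(sorted(difference_counts(D, 11).items())))
print(is_difference_set(D, 11, 2))  # True
```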

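It is well known [27-29] that the characteristic sequence of a (υ, k, λ) CDS D, with period υ, defined by

$$ s(t) = \begin{cases} 0, & t \in D \\ 1, & t \notin D \end{cases} \tag{1} $$

has the two-level autocorrelation function (indices taken modulo υ)

$$ R_s(\tau) = \sum_{t=0}^{\upsilon - 1} (-1)^{s(t) + s(t + \tau)} = \begin{cases} \upsilon, & \tau \equiv 0 \pmod{\upsilon} \\ \upsilon - 4(k - \lambda), & \text{otherwise} \end{cases} \tag{2} $$

This follows because, for τ ≢ 0 (mod υ), t and t + τ lie in D simultaneously for exactly λ values of t, so s(t) ≠ s(t + τ) for exactly 2(k − λ) values of t.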

Example 3.2: Consider the (15, 7, 3) CDS D = {0, 5, 7, 10, 11, 13, 14}. The corresponding sequence s(t) determined by (1) is 011110101100100, and it has a two-level ACF: R_s(0) = 15 and R_s(τ) = 15 − 4(7 − 3) = −1 for τ ≢ 0 (mod 15).
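The characteristic sequence (1) and its periodic ACF can be computed directly. The sketch below assumes the bipolar mapping (−1)^s(t) used in the two-level ACF formula above, and it reproduces Example 3.2.

```python
def characteristic_sequence(D, v):
    """Equation (1): s(t) = 0 for t in D and 1 otherwise, t = 0, ..., v - 1."""
    members = set(D)
    return [0 if t in members else 1 for t in range(v)]


def periodic_acf(s):
    """R(tau) = sum_t (-1)^(s(t) + s(t + tau)), indices taken mod v."""
    v = len(s)
    return [sum(1 if s[t] == s[(t + tau) % v] else -1 for t in range(v))
            for tau in range(v)]


v, k, lam = 15, 7, 3
D = [0, 5, 7, 10, 11, 13, 14]                # (15, 7, 3) CDS of Example 3.2
s = characteristic_sequence(D, v)
R = periodic_acf(s)
print("".join(map(str, s)))                  # 011110101100100
print(R[0], set(R[1:]), v - 4 * (k - lam))   # 15 {-1} -1
```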

Definition 2 (CDS with Singer parameters): Cyclic difference sets in GF($2^n$) with Singer parameters are those with parameters $(2^n - 1,\ 2^{n-1} - 1,\ 2^{n-2} - 1)$ for some integer n, or their complements. Sequences with Singer parameters include q-ary m-sequences, q-ary GMW sequences, and q-ary cascaded GMW sequences. All of them have an interleaved structure and a two-level ACF. For details, see [30, 31].

3.2 The (technically oriented) D-transform method [21, 30, 31]

This method was first proposed as early as 1985, much earlier than [29], but it is more hardware-oriented and less well known. In technical terms, interleaving is nothing but time multiplexing, which is very well known to telecommunication engineers and is traditionally represented via the delay operation (D-transform).

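Definition 4 [21, 30, 31]: The D-transform of a sequence {bi} over GF(p) is denoted by $\mathcal{D}[b_i]$ or $F(D)$ and defined by

$$ F(D) = \mathcal{D}[b_i] = \sum_{i \ge 0} b_i D^i \tag{3} $$

For example, for {bi} = 010111, the D-transform is $\mathcal{D}[b_i] = D + D^3 + D^4 + D^5$. The inverse transform recovers the sequence: $\mathcal{D}^{-1}[F(D)] = \{b_i\}$.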
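A finite binary sequence and its D-transform (3) can be converted back and forth by listing the exponents with nonzero coefficients. This is a minimal sketch for p = 2 only, and the helper names are hypothetical.

```python
def d_transform(bits):
    """Exponents i with b_i = 1, i.e. the terms D^i of sum_i b_i D^i over GF(2)."""
    return [i for i, b in enumerate(bits) if b]


def inverse_d_transform(exponents, length):
    """Recover b_0, ..., b_{length-1} from the exponents of the polynomial."""
    present = set(exponents)
    return [1 if i in present else 0 for i in range(length)]


def poly_to_str(exponents):
    """Pretty-print a GF(2) polynomial in D, lowest degree first."""
    terms = ["1" if e == 0 else "D" if e == 1 else f"D^{e}" for e in exponents]
    return " + ".join(terms) if terms else "0"


bits = [0, 1, 0, 1, 1, 1]                              # {b_i} = 010111
poly = d_transform(bits)
print(poly_to_str(poly))                               # D + D^3 + D^4 + D^5
print(inverse_d_transform(poly, len(bits)) == bits)    # True
```

The inverse needs the sequence length because trailing zeros leave no trace in the polynomial.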

The D-transform of the sequence {bi} generated by a linear feedback shift register (LFSR) is then given by

$$ \mathcal{D}[b_i] = \frac{S(D)}{G(D)} \tag{4} $$

where G(D), of degree n, is the generating polynomial of the LFSR, and S(D), of degree ≤ n − 1, specifies the initial condition corresponding to a particular shifted version of {bi}.
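Equation (4) can be evaluated as a power series over GF(2). From B(D)·G(D) = S(D), each new coefficient is b_i = s_i + Σ_{j≥1} g_j·b_{i−j} (mod 2). As an illustration (our own check, not from the source), the 63-bit sequence of Example 3.3 satisfies b_i = b_{i−1} + b_{i−6}. In this convention that corresponds to G(D) = 1 + D + D^6, the reciprocal of the polynomial D^6 + D^5 + 1 quoted there, with S(D) = D.

```python
def series_expand(S, G, n_terms):
    """First n_terms coefficients of S(D) / G(D) over GF(2).

    S and G are coefficient lists, lowest degree first, with G[0] == 1.
    From B(D) * G(D) = S(D): b_i = s_i + sum_{j>=1} g_j * b_{i-j} (mod 2).
    """
    if G[0] != 1:
        raise ValueError("G(D) must have constant term 1")
    b = []
    for i in range(n_terms):
        acc = S[i] if i < len(S) else 0
        for j in range(1, min(i, len(G) - 1) + 1):
            acc ^= G[j] & b[i - j]
        b.append(acc)
    return b


G = [1, 1, 0, 0, 0, 0, 1]   # G(D) = 1 + D + D^6 (reciprocal of D^6 + D^5 + 1)
S = [0, 1]                  # S(D) = D, the initial condition
b = series_expand(S, G, 63)
print("".join(map(str, b)))  # reproduces {b_i} of Example 3.3
```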

- Shift sequences (interleaving orders) by D-transform

Since the interleaving process and the D-transform are both forms of time multiplexing [8, 10], the interleaving order ITp can be derived straightforwardly. There are two methods for deriving ITp, namely expansion and decomposition (decimation). For simplicity, we show only the result obtained by decimation, as in the following example.

Example 3.3: Let m = 3, n = 6, and let α be a primitive element of GF($2^6$) with primitive polynomial $b(D) = D^6 + D^5 + 1$ over GF(2). Let {bi} denote the m-sequence generated by b(D):

{bi} = {0 1 1 1 1 1 1 0 1 0 1 0 1 1 0 0 1 1 0 1 1 1 0 1 1 0 1 0 0 1 0 0 1 1 1 0 0

0 1 0 1 1 1 1 0 0 1 0 1 0 0 0 1 1 0 0 0 0 1 0 0 0 0}

Decimating {bi} by T = 9, we obtain $\{a_i\} = \{b_{9i}\}$, and we rearrange the sequence as a [9x7] matrix whose row j (j = 0, …, 8) is the sub-sequence $b_j, b_{j+9}, \dots, b_{j+54}$:

0000000

1110010

1011100

1110010

1100101

1011100

1011100

0101110

1110010

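We can see that the rows are cyclically shift-equivalent, and the shift sequence is {z, 5, 3, 5, 6, 3, 3, 2, 5}, where z denotes the null (all-zero) sequence. The shift sequence gives the relative phase shifts of the sub-sequences. For example, row 1, 1110010, is the basic sub-sequence 1001011 cyclically shifted left by 5 positions.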
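The decimation of Example 3.3 and the resulting shift sequence can be reproduced with the sketch below. It rests on three assumptions. Row j is the sub-sequence b_j, b_{j+9}, …. Shifts are measured as left cyclic shifts relative to the basic sub-sequence 1001011 (listed as {bn1} in Example 3.4). None stands for the null sequence z.

```python
def m_sequence_63():
    """63-bit m-sequence of Example 3.3 via b[i] = b[i-1] + b[i-6] (mod 2)."""
    b = [0, 1, 1, 1, 1, 1]
    while len(b) < 63:
        b.append(b[-1] ^ b[-6])
    return b


def decompose(seq, T):
    """Row j is the decimated sub-sequence seq[j], seq[j + T], seq[j + 2T], ..."""
    return [seq[j::T] for j in range(T)]


def left_shift(a, k):
    """Cyclic left shift of a by k positions."""
    return a[k:] + a[:k]


def shift_sequence(rows, base):
    """Shift of each row relative to base; None stands for the null row z."""
    result = []
    for row in rows:
        if not any(row):
            result.append(None)
            continue
        k = next((k for k in range(len(base)) if left_shift(base, k) == row), None)
        if k is None:
            raise ValueError("row is not a cyclic shift of the base sequence")
        result.append(k)
    return result


rows = decompose(m_sequence_63(), 9)   # nine sub-sequences of length 7
base = [1, 0, 0, 1, 0, 1, 1]           # basic sub-sequence {bn1}
for row in rows:
    print("".join(map(str, row)))
print(shift_sequence(rows, base))      # [None, 5, 3, 5, 6, 3, 3, 2, 5]
```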

In the D-transform method, the ACF of an interleaved sequence is explained differently, without any concept of CDS. The two-level ACF is ensured by the phase-shift relations between the sub-sequences and is proved via the coincidence matrix (or ACF matrix), whose row and column labels are the relative phase shifts of the sub-sequences.

Note: The key remark here is that in each off-main diagonal (τ ≠ 0) there is exactly one place where the labels of the column and the row coincide (marked by +). In other words, the shift sequences form one-coincidence sequences, which ensures that the ACF is almost ideal. For details, see [21, 30, 31].
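One way to see why a single coincidence per diagonal yields an almost ideal ACF is to compute the ACF of the interleaved sequence both directly and from coincidence counts. The sketch below reflects our reading of the criterion; it is not code from [21, 30, 31]. Write τ = τ2·T + τ1. A row pair (j, j + τ1) coincides when e_j ≡ e_{j+τ1} + w + τ2 (mod N), where w = 1 if j + τ1 wraps past the last row and 0 otherwise. Two null rows always coincide. Each coincidence contributes N and every other pair contributes −1, so C(τ) = (N + 1)·c(τ) − T. In all examples of this section T − 1 = N + 1, so C(τ) = −1 exactly when c(τ) = 1.

```python
def interleave(base, shifts):
    """u[i*T + j] = base[(i + e_j) mod N]; a shift of None marks a null row."""
    N, T = len(base), len(shifts)
    u = [0] * (N * T)
    for j, e in enumerate(shifts):
        if e is None:
            continue  # null sub-sequence z: all zeros
        for i in range(N):
            u[i * T + j] = base[(i + e) % N]
    return u


def acf_direct(u):
    """Periodic ACF of the bipolar form (-1)^u, computed over all L shifts."""
    L = len(u)
    return [sum(1 if u[t] == u[(t + tau) % L] else -1 for t in range(L))
            for tau in range(L)]


def acf_from_coincidences(N, shifts):
    """C(tau) = (N + 1) * c(tau) - T, with c(tau) the number of coinciding row pairs."""
    T = len(shifts)
    acf = []
    for tau in range(N * T):
        tau2, tau1 = divmod(tau, T)
        c = 0
        for j in range(T):
            jj, wrap = (j + tau1) % T, (j + tau1) // T
            a, b = shifts[j], shifts[jj]
            if a is None or b is None:
                if a is None and b is None:
                    c += 1
            elif (b + wrap + tau2 - a) % N == 0:
                c += 1
        acf.append((N + 1) * c - T)
    return acf


base = [1, 0, 0, 1, 0, 1, 1]             # basic sub-sequence, N = 7
shifts = [None, 5, 3, 5, 6, 3, 3, 2, 5]  # shift sequence of Example 3.3, T = 9
u = interleave(base, shifts)
direct = acf_direct(u)
print(direct == acf_from_coincidences(len(base), shifts))  # True
print(direct[0], set(direct[1:]))                          # 63 {-1}
```

The coincidence count touches only T × T label pairs for each τ2, while the direct ACF works on the full length-L sequence. This is the computational advantage discussed at the end of this section.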

3.3 The missing link between the two methods

Even though the two methods were developed independently with different mathematical tools and have been used in parallel in sequence design for a long time, no link between them has been shown so far. Since it is hard to find a common language for them, the link is best illustrated via numerical examples.

Example 3.4:

Step I: From the list of CDS, we pick some CDS corresponding to sequences with an interleaved structure (Definition 2):

- 1st: CDS (7, 3, 1), D = {1, 2, 4}, λ = 1

- 2nd: CDS (15, 7, 3), D = {0, 5, 7, 10, 11, 13, 14}, λ = 3

- 3rd: CDS (63, 31, 15), D = {0, 1, 2, 3, 4, 6, 7, 8, 9, 12, 13, 14, 16, 18, 19, 24, 26, 27, 28, 32, 33, 35, 36, 38, 41, 45, 48, 49, 52, 54, 56}, λ = 15

- 4th: CDS (255, 127, 63), D = {0, 7, 11, 13, 14, 17, 19, 22, 23, 26, 27, 28, 34,

38, 39, 43, 44, 46, 47, 49, 51, 52, 53, 54, 55, 56, 57, 63, 67, 68, 76, 77, 78, 83,

85, 86, 88, 89, 92, 94, 95, 97, 98, 99, 101, 102, 104, 106, 108, 110, 111, 112,

113, 114, 115, 119, 121, 123, 125, 126, 131, 133, 134, 136, 137, 139, 141,

147, 149, 151, 152, 153, 154, 155, 156, 159, 161, 166, 169, 170, 172, 175,

176, 177, 178, 183, 184, 185, 187, 188, 189, 190, 193, 194, 196, 197, 198,

201, 202, 203, 204, 205, 207, 208, 212, 215, 216, 219, 220, 221, 222, 224,

226, 228, 229, 230, 231, 235, 237, 238, 242, 243, 245, 246, 249, 250, 252}…

Step II: Conversion of the CDS into binary sequences

We use s(t), the binary incidence indication sequence (or characteristic sequence) of the CDS defined in (1):

- 1st binary incidence indication sequence: {bn1} = {1,0,0,1,0,1,1} (used as a basic sub-sequence)

- 2nd binary incidence indication sequence: {bn2} = {0,1,1,1,1,0,1,0,1,1,0,0,1,0,0}

- 3rd binary incidence indication sequence: {bn3} = {0,0,0,0,0,1,0,0,0, 0,1,1,0,0,0,1,0,1, 0,0,1,1,1,1,0,1,0, 0,0,1,1,1,0,0,1,0, 0,1,0,1,1,0,1,1,1, 0,1,1,0,0,1,1,0,1, 0,1,0,1,1,1,1,1,1}

- 4th binary incidence indication sequence (255 bits, in groups of 12):

011111101110/100110101100/110001111101/110011100100/101000000011/111011100111/111100011110/100100110100/100010010101/010000001110/101010011110/100100101011/111010100000/011010111101/100101100001/111000100001/100100011000/001001110110/011000010101/000011101001/110010011001/011

Step III: Decomposition of the binary indication sequences into sub-sequences of the same kind as the 1st and 2nd sequences. This operation is useful for sequences of large composite length L = N·T, where N is the length of the sub-sequences [29-31].

We now decompose the 3rd and 4th sequences into their sub-sequences and arrange them into [9x7] and [17x15] matrices, respectively (rows represent sub-sequences):

- {bn3} as a [9x7] matrix:

0 0 0 0 0 0 0

0 1 0 0 1 1 1

0 1 1 1 0 1 0

0 0 1 1 1 0 1

0 0 1 1 1 0 1

1 0 1 0 0 1 1

0 1 0 0 1 1 1

0 0 1 1 1 0 1

0 1 0 0 1 1 1

with shift sequence {z, 0, 4, 5, 5, 1, 0, 5, 0} (cyclic right shifts relative to the row 0100111).

- {bn4} as a [17x15] matrix:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

1 1 1 0 1 0 1 1 0 0 1 0 0 0 1

1 0 1 0 1 1 0 0 1 0 0 0 1 1 1

1 1 1 0 1 0 1 1 0 0 1 0 0 0 1

1 1 0 0 1 0 0 0 1 1 1 1 0 1 0

1 0 0 0 1 1 1 1 0 1 0 1 1 0 0

1 0 1 0 1 1 0 0 1 0 0 0 1 1 1

0 1 1 1 1 0 1 0 1 1 0 0 1 0 0

1 1 1 1 0 1 0 1 1 0 0 1 0 0 0

1 0 0 1 0 0 0 1 1 1 1 0 1 0 1

1 0 0 1 0 0 0 1 1 1 1 0 1 0 1

0 0 1 1 1 1 0 1 0 1 1 0 0 1 0

1 1 0 0 1 0 0 0 1 1 1 1 0 1 0

0 1 0 1 1 0 0 1 0 0 0 1 1 1 1

0 1 1 1 1 0 1 0 1 1 0 0 1 0 0

1 1 0 1 0 1 1 0 0 1 0 0 0 1 1

1 1 1 0 1 0 1 1 0 0 1 0 0 0 1

with shift sequence {z, 0, 2, 0, 6, 10, 2, 13, 14, 7, 7, 12, 6, 3, 13, 1, 0} (cyclic left shifts relative to the row 111010110010001). Note: other algorithms for calculating the shift sequences also exist.

Step IV: Checking the ACF with the correlation matrix [30, 31]

We create the ACF matrix by writing the shift sequence {z, 0, 2, 0, 6, 10, 2, 13, 14, 7, 7, 12, 6, 3, 13, 1, 0} along its rows and columns, and we mark the coincidences between them with +.

[ACF matrix for {bn4}: rows are labelled by the shift sequence {z, 0, 2, 0, 6, 10, 2, 13, 14, 7, 7, 12, 6, 3, 13, 1, 0}; columns by the same sequence followed by its entries increased by 1 (mod 15), {z, 1, 3, 1, 7, 11, 3, 14, 0, 8, 8, 13, 7, 4, 14, 2, 1}; coincidences between row and column labels are marked by +.]

Similarly, the CDS (1023, 511, 255) corresponds to the binary sequence generated over GF($2^{10}$) by the polynomial $1 + D^3 + D^{10}$. We can decompose it into sub-sequences corresponding to GF($2^5$), that is, T = 33 sub-sequences of length N = 31. We then obtain the shift sequence {z, 23, 15, 20, 30, 10, 9, 17, 29, 13, 20, 21, 18, 21, 3, 9, 27, 12, 26, 21, 9, 7, 11, 11, 5, 22, 11, 4, 6, 27, 18, 14, 23} and the ACF matrix:

[ACF matrix for the (1023, 511, 255) sequence: rows and columns are labelled by the shift sequence above; coincidences between row and column labels are marked by +.]

In each diagonal, there is exactly one place where the labels of the column and the row coincide (marked by +). The resulting ACF is exactly the one given by the CDS, which clearly indicates that the CDS and D-transform methods are equivalent. This fact gives new insight into the relation between two popular mathematical tools used in correlation analysis, which has not been reported in the literature so far. In this regard, we would like to emphasize that checking the coincidences between the rows and columns of the T×T matrix is much faster and far more intuitive than comparing two shifts of a sequence of length L = T·N. The matrix check involves on the order of $T^2$ comparisons, whereas computing the ACF directly over all L shifts takes on the order of $L^2 = T^2 N^2$ operations, which matters especially when L is very large in practice. For example, for a sequence of length $L = 2^{18} - 1 = 262143$ bits we have:

- Sub-sequence length $N = 2^9 - 1 = 511$ bits

- T = L/N = 513, so $T^2 = 263169$, whereas $L^2 \approx 6.9 \times 10^{10}$

= {, 356, 201, 66, 402, 410, 132, 221, 293, 312, 309, 157, 264, 39, 442, 178,

75, 328, 113, 56, 107, 476, 314, 165, 17, 136, 78, 139, 373, 394, 356, 275, 150,

107, 145, 12, 226, 438, 112, 396, 214, 346, 441, 139, 117, 506, 330, 191, 34, 220,

272, 446, 156, 280, 278, 115, 235, 357, 277, 166, 201, 14, 39, 38, 300, 411, 214,

404, 290, 369, 24, 270, 452, 481, 365, 299, 224, 265, 281, 364, 428, 260, 181, 244,

371, 173, 278, 347, 234, 284, 501, 138, 149, 472, 382, 258, 68, 322, 440, 447, 33,

400, 381, 75, 312, 95, 49, 150, 45, 175, 230, 203, 470, 244, 203, 57, 43, 371, 332,

356, 402, 347, 28, 114, 78, 390, 76, 357, 89, 400, 311, 295, 428, 142, 297, 35, 69,

131, 227, 35, 48, 495, 29, 425, 393, 229, 451, 75, 219, 189, 87, 418, 448, 112, 19,

421, 51, 270, 217, 412, 345, 74, 9, 9, 362, 19, 488, 361, 231, 258, 346, 341, 45,

353, 183, 179, 468, 127, 57, 389, 491, 134, 276, 220, 298, 397, 433, 204, 253, 206,

5, 338, 136, 371, 133, 507, 369, 119, 383, 227, 66, 302, 289, 22, 251, 242, 150,

430, 113, 82, 190, 24, 98, 386, 300, 81, 90, 143, 350, 368, 460, 100, 406, 169, 429,

499, 488, 54, 406, 286, 114, 198, 86, 328, 231, 304, 153, 179, 201, 500, 293, 378,

183, 312, 56, 327, 228, 19, 156, 82, 269, 329, 152, 363, 203, 32, 178, 434, 289,

461, 111, 412, 79, 20, 345, 420, 284, 494, 83, 324, 70, 453, 138, 54, 262, 475, 454,

429, 70, 509, 96, 366, 479, 396, 58, 179, 339, 263, 275, 206, 458, 185, 391, 241,

150, 133, 438, 386, 378, 87, 174, 398, 325, 492, 385, 417, 224, 456, 38, 48, 331,

88, 102, 378, 29, 186, 434, 174, 313, 451, 179, 456, 148, 327, 18, 66, 18, 248, 213,

115, 38, 95, 465, 312, 211, 391, 462, 293, 5, 10, 181, 385, 171, 177, 90, 64, 195,

323, 366, 199, 358, 359, 425, 186, 254, 60, 114, 407, 267, 377, 471, 297, 268, 449,

41, 72, 440, 306, 85, 250, 283, 399, 355, 420, 408, 90, 506, 445, 412, 164, 10, 297,

165, 182, 272, 473, 231, 462, 266, 466, 503, 418, 227, 283, 238, 215, 255, 439,

454, 90, 132, 359, 93, 121, 67, 449, 44, 455, 502, 209, 484, 280, 300, 445, 349,

343, 226, 484, 164, 289, 380, 58, 48, 412, 196, 147, 261, 193, 89, 288, 162, 100,

180, 349, 286, 204, 189, 149, 225, 292, 409, 381, 200, 466, 301, 223, 338, 149,
