ON THE GENERATION AND SELECTION OF SEQUENCES FOR SPREAD SPECTRUM WATERMARK AND STEGANOGRAPHY
Nguyen Le Cuong*
Abstract: In this contribution, criteria for the sequences employed in spread spectrum (SS) watermarking and steganography are discussed. These criteria are: sharp autocorrelation function (ACF), large linear complexity (LC), large length (L), normal distribution assumption, and bi-polarity. The reasons for choosing those criteria, and the methods for generating and evaluating sequences that satisfy them, are explained and analyzed via the so-called D-transform; as a result, these sequences can be represented in a hardware-oriented form. Some simulation results are also referenced to demonstrate outstanding features of the generated sequences, such as a very good ACF and a large LC, for watermarking and steganography.
Keywords: Nonlinear sequences, Spread spectrum watermark, Steganography
1 INTRODUCTION
Along with the digitalization of media assets, the rapid growth of the Internet, and the increasing speed of file transfers, the danger of intellectual property rights violations has become obvious. Therefore, mechanisms are needed to protect these digital assets and the associated rights. In this regard, digital watermarking is considered an effective measure against illegal copies of images, music titles, and video films. In the digital watermarking process, information such as hidden copyright notices or verification messages is added to cover media such as digital images, audio/video, or document signals to protect the ownership rights. These hidden messages consist of a group of bits giving information about the author of the signal or about the signal itself. In the last decade, spread spectrum communications (CDMA, HSPA) have developed rapidly, and the concept has been borrowed successfully not only for watermarking [1-11] but also for applications beyond cryptography, such as RFID, EPC, and barcodes [12,13], as well as anti-jamming and protective jamming [14,15]. In the above-mentioned applications the following merits of the sequences are most desired:
Sharp ACF
Large linear complexity (LC)
Large length (L)
Normal distribution assumption
Bi-polar
This fact is an encouraging motive for carrying out an in-depth study of nonlinear sequences satisfying those demands. In spread spectrum communication, a narrow-band signal is spread and then transmitted over a much larger bandwidth, such that the signal energy present at any single frequency is undetectable. That effect is achieved as follows: the transmitter first modulates the data signal with a carrier signal and then spreads the modulated signal by modulo-2 addition with a spreading signal. The spreading signal is generated from a PN sequence running periodically at a much higher rate than the original data signal.
On the receiving end, the receiver first performs a correlation process on the incoming signal; that is, it applies modulo-2 addition to the incoming signal with a synchronized copy of the spreading signal, so that the original data signal is recovered in its original bandwidth. At the same time, the identical modulo-2 addition at the receiving end spreads out the power of the interference, which is assumed to be narrowband; this provides interference rejection for the SS signal and hence increases the received signal-to-noise ratio (SNR) of the signal of interest.
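The spreading and despreading steps described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: carrier modulation is omitted, the PN sequence is a short period-15 binary sequence chosen for the example, and the interference is modeled hypothetically as a few flipped chips per data bit. The function names `spread` and `despread` are introduced here and do not come from the paper.

```python
def spread(data_bits, pn):
    """Spread each data bit over len(pn) chips by modulo-2 addition (XOR)."""
    return [bit ^ chip for bit in data_bits for chip in pn]


def despread(chips, pn):
    """XOR with the synchronized PN copy, then majority-vote each chip block."""
    n = len(pn)
    bits = []
    for start in range(0, len(chips), n):
        block = chips[start:start + n]
        ones = sum(c ^ p for c, p in zip(block, pn))
        bits.append(1 if 2 * ones > n else 0)
    return bits


# Illustrative period-15 PN sequence (an assumption for this sketch).
pn = [0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0]
data = [1, 0, 1, 1]
tx = spread(data, pn)

# Hypothetical interference: flip three chips inside every 15-chip block.
rx = list(tx)
for start in range(0, len(rx), len(pn)):
    for offset in (2, 7, 11):
        rx[start + offset] ^= 1

recovered = despread(rx, pn)
```

Because each data bit is carried by 15 chips, three corrupted chips per block are outvoted after despreading, which is a crude picture of the interference rejection mentioned in the text.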
Similarly, a watermark is spread over many frequency bins so that the energy in any one bin is very small and certainly undetectable if the spreading sequence is not known. Since the location, the spreading sequence, and the content of the watermark are known to the watermark verification process, it is possible to concentrate these weak signals into a single output with high SNR, thanks to the so-called processing gain.
It is worth remarking that the correlation process is applied not only in spread spectrum watermarking (SS watermarking) but is also widely used in other watermarking schemes [16-19]. Without going into the details of the watermarking process, we concentrate on the issue of selecting appropriate sequences. The paper is organized as follows: at the end of this section, we review the related work on the spread spectrum watermarking process. The preliminaries are presented in Section 2. Section 3 addresses the design and analysis issues. However, due to the limited scope of the paper, only the correlation and linearity analysis, which concerns the most important properties in watermarking, is discussed. The remaining requirements (robustness against attacks, balance, …) will be discussed in other contributions. In the last section, some comparisons and conclusions are given.
It is clear that spread spectrum watermarking is closely related to spreading sequences. Sequences with the above-mentioned properties have been widely used to improve the attributes of the watermarking process (image as well as audio watermarking). In the DCT domain, the popular direct-sequence spread-spectrum watermarking approach is employed because such spreading gives a robust but invisible watermark and allows various types of detectors for blind watermark extraction [1]. The PN sequence is generated by a pseudorandom noise generator initialized with a seed that depends on the secret key, which significantly improves the robustness of the system. This key is known only to the legal owner of the watermarked document, and without it the generation of the watermark at the receiver is impossible. Furthermore, the spreading sequence must have noise-like properties in order to spread the spectrum of the input signal. In other words, the mean of the sequence should be zero and its autocorrelation should approach the delta function. Consequently, a bi-polar pseudorandom sequence that takes the values ±1 with relative frequency 1/2 each is a suitable candidate [2, 3]. To better survive attacks, nonlinear sequences have also been introduced [4, 5]. There have also been proposals for employing a PN key or PN sequences in copyright protection and steganography [6, 7]. However, in those works the correlation properties are somewhat relaxed or not sufficiently discussed.
2 PRELIMINARY
Watermark detection is, in general, based on correlation analysis (not only for spread spectrum watermarking), because correlation is crucial for noise removal, which plays a decisive role in signal quality [8-11,17-19]. Therefore, in this paper we focus on finding PN sequences with a spiky ACF and a high LC value. The similarity between the original watermark and the watermark extracted from the attacked image is measured using the correlation factor given below:
$$\rho(w,\hat{w}) = \frac{\sum_{i=1}^{N} w_i \hat{w}_i}{\sqrt{\sum_{i=1}^{N} w_i^2}\,\sqrt{\sum_{i=1}^{N} \hat{w}_i^2}}$$
where N is the number of pixels in the watermark, and w and ŵ are the original and the extracted watermarks, respectively. The correlation factor takes values between 0 and 1. In general, a correlation coefficient of about 0.75 or above is considered acceptable. This correlation factor can also be taken as a measure of robustness [17]. In many papers, the structural similarity index measure (SSIM), based on the normalized correlation factor, is widely used and defined as:
$$\mathrm{NC}(W,\hat{W}) = \frac{\sum_{i=1}^{h}\sum_{j=1}^{w} W_{ij}\,\hat{W}_{ij}}{\sum_{i=1}^{h}\sum_{j=1}^{w} W_{ij}\,W_{ij}}$$
The Structural Similarity Index Measure (SSIM) is closely related to PSNR (peak signal to noise ratio):
[Equation relating SSIM and PSNR; surviving terms: 255^2, SSIM, PSNR]
where σ is the covariance between W_ij and W*_ij.
For more details, please see [16-19]. Clearly, the correlation property is the first preference for sequence selection in watermarking and steganography.
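As a small worked illustration of the correlation factor used in this section, the sketch below assumes the usual normalized form (sum of products divided by the product of the Euclidean norms) and applies the 0.75 acceptance threshold quoted in the text. The function name `correlation_factor` and the two short watermarks are hypothetical examples, not data from the paper.

```python
import math


def correlation_factor(w, w_hat):
    """Normalized correlation between original w and extracted w_hat."""
    num = sum(a * b for a, b in zip(w, w_hat))
    den = math.sqrt(sum(a * a for a in w)) * math.sqrt(sum(b * b for b in w_hat))
    return num / den if den else 0.0


# Hypothetical 8-pixel binary watermarks; one pixel was lost by an attack.
original = [1, 0, 1, 1, 0, 1, 0, 1]
extracted = [1, 0, 1, 0, 0, 1, 0, 1]

rho = correlation_factor(original, extracted)
acceptable = rho >= 0.75  # threshold quoted in the text
```

Here the numerator is 4 and the norms are sqrt(5) and 2, so the factor is about 0.894 and the extraction would count as acceptable.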
3 DESIGN AND ANALYSIS ISSUES
3.1 The combinatorial approach
Since there is a one-to-one correspondence between cyclic difference sets and almost-balanced binary sequences with the desired autocorrelation property [20-23], constructing all cyclic difference sets is equivalent to finding all almost-balanced binary sequences with that autocorrelation property. This problem has been thoroughly discussed and reported in the literature, so we give only a short reference here.
Definition 1 - cyclic difference set (CDS) [20, 21]: A set of distinct integers D = {d1, d2, …, dk} modulo an integer υ is called an integer difference set, or difference set, denoted by (υ, k, λ), if every integer b ≠ 0 (mod υ) can be expressed in exactly λ ways in the form di − dj ≡ b (mod υ), where di and dj belong to D.
Example 3.1: D = {1, 3, 4, 5, 9} is an (11, 5, 2) difference set, i.e. λ = 2.
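Definition 1 can be checked mechanically for the set in Example 3.1. The sketch below counts every difference di − dj (mod υ) and confirms that each nonzero residue appears exactly λ times; the helper names `difference_counts` and `is_difference_set` are introduced for illustration only, and `v` stands for υ.

```python
from collections import Counter


def difference_counts(D, v):
    """Count how often each residue arises as d_i - d_j (mod v), d_i != d_j."""
    return Counter((di - dj) % v for di in D for dj in D if di != dj)


def is_difference_set(D, v, lam):
    """True if every nonzero residue mod v occurs exactly lam times."""
    counts = difference_counts(D, v)
    return all(counts[b] == lam for b in range(1, v))


example_ok = is_difference_set([1, 3, 4, 5, 9], 11, 2)
counter_example = is_difference_set([1, 2, 3, 4, 5], 11, 2)
```

The second call uses an arbitrary 5-element set to show that the property is not automatic: there the difference 1 occurs four times, so the check fails.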
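It is well known that the CDS characteristic sequence of period υ, defined by
$$s(t) = \begin{cases} 0 & \text{for } t \in D \\ 1 & \text{for } t \notin D \end{cases} \qquad (1)$$
has the two-level autocorrelation function
$$R_s(\tau) = \begin{cases} \upsilon & \text{for } \tau \equiv 0 \pmod{\upsilon} \\ \upsilon - 4(k-\lambda) & \text{otherwise} \end{cases} \qquad (2)$$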
Example 3.2: Consider a CDS (15, 7, 3) with D = {0, 5, 7, 10, 11, 13, 14}. The corresponding sequence s(t) determined by (1) is 011110101100100, and it has a two-level ACF.
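Example 3.2 can be reproduced numerically. The sketch below builds the characteristic sequence from equation (1), taking D as the zero positions of the printed sequence, and evaluates the periodic ACF of its bipolar image (−1)^s(t). The out-of-phase values are then compared with the two-level value υ − 4(k − λ). The function names are illustrative, and `v` stands for υ.

```python
def characteristic_sequence(D, v):
    """s(t) = 0 if t is in D, 1 otherwise, following equation (1)."""
    return [0 if t in D else 1 for t in range(v)]


def periodic_acf(s, tau):
    """Periodic ACF of the bipolar image (-1)^s(t) at cyclic shift tau."""
    v = len(s)
    return sum((-1) ** (s[t] ^ s[(t + tau) % v]) for t in range(v))


v, k, lam = 15, 7, 3
D = {0, 5, 7, 10, 11, 13, 14}  # zero positions of the printed sequence

s = characteristic_sequence(D, v)
acf = [periodic_acf(s, tau) for tau in range(v)]
two_level = all(r == v - 4 * (k - lam) for r in acf[1:])
```

For these parameters the in-phase value is 15 and every out-of-phase value is 15 − 4(7 − 3) = −1.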
Definition 2 - CDS with Singer parameters [22-23]: Cyclic difference sets in GF(2^n) with Singer parameters are those with parameters (2^n − 1, 2^(n−1) − 1, 2^(n−2) − 1) for some integer n, or their complements. Sets of sequences with Singer parameters include q-ary m-sequences, q-ary GMW sequences, and q-ary cascaded GMW sequences; they have an interleaved structure and an ideal two-level ACF [24].
Example 3.3: The GMW or m-like sequence [20,25]:
{bi} = {0,1,1,1,0,0,0,1,1,0,0,1,1,1,0,1,1,0,0,0,0,0,1,1,1,1,0,0,1,0,0,1,0,1,0,1,0,0,1,1,0,1,0,0,0,0,1,0,0,0,1,0,1,1,0,1,1,1,1,1,1,0,1} has an ideal two-level ACF.
3.2 D-transform for interleaved sequences
Since most sequences with an ideal ACF have an interleaved structure, the time-multiplexing technique is a very useful tool for analyzing them.
In technical terms, interleaving is nothing but time multiplexing, which is well known to telecommunication engineers and is traditionally represented via the delay operation (D-transform) [25], [26], defined by:
$$D[b_i] = F(D) = \sum_{i=0}^{\infty} b_i D^i \qquad (3)$$
For example, let {bi} = 010111, with the first term taken as b0; the D-transform of {bi} is D[bi] = D + D^3 + D^4 + D^5.
The inverse transform is D^(-1)[F(D)] = {bi}.
The D-transform of the generator sequence {bi} of a linear feedback shift register (LFSR) is then given by:
$$b(D) = \frac{S(D)}{G(D)}$$
where G(D), of degree n, is the generating polynomial of the LFSR and S(D), of degree ≤ n − 1, specifies the initial condition corresponding to a particular shifted version of {bi}.
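The ratio S(D)/G(D) can be expanded as a power series over GF(2), which is exactly what an LFSR computes chip by chip. The sketch below performs that expansion by long division. The polynomial G(D) = 1 + D + D^4 and the initial condition S(D) = 1 are assumptions chosen for illustration, not values taken from the paper, and `lfsr_sequence` is a hypothetical helper name.

```python
def lfsr_sequence(g, s, length):
    """Expand b(D) = S(D) / G(D) over GF(2) as a power series.

    g: coefficients [g0, g1, ..., gn] of G(D), with g0 = 1.
    s: coefficients of S(D), of degree <= n - 1.
    """
    if g[0] != 1:
        raise ValueError("G(D) must have constant term 1")
    b = []
    for i in range(length):
        # b_i = s_i + g_1 b_{i-1} + ... + g_n b_{i-n}  (mod 2)
        bit = s[i] if i < len(s) else 0
        for j in range(1, min(i, len(g) - 1) + 1):
            bit ^= g[j] & b[i - j]
        b.append(bit)
    return b


# Assumed example: G(D) = 1 + D + D^4, S(D) = 1.
g = [1, 1, 0, 0, 1]
b = lfsr_sequence(g, [1], 30)
period = b[:15]
```

With these assumed inputs the output repeats with period 15, and one period is a cyclic shift of the sequence 011110101100100 from Example 3.2. Changing S(D) selects a different shifted version of the same sequence, as stated in the text.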
3.3 Shift sequences (interleaving orders) by D-transform
Since the interleaving process and the D-transform are both forms of time multiplexing [25,26], the multiplexing (interleaving) order IpT can be derived straightforwardly.
Example 3.5: Let m = 3, n = 6 and let α be a primitive element of GF(2^6) with primitive
b(D):
{bi} = {0 1 1 1 1 1 1 0 1 0 1 0 1 1 0 0 1 1 0 1 1 1 0 1 1 0 1 0 0 1 0 0 1 1 1 0 0 0 1 0 1 1 1 1 0 0 1 0 1 0 0 0 1 1 0 0 0 0 1 0 0 0 0}
Decimating {bi} by T = 9 gives the subsequences {ai}, which are arranged in a time-multiplexed manner (T = 9 terms per row) as:
0 1 1 1 1 1 1 0 1
0 1 0 1 1 0 0 1 1
0 1 1 1 0 1 1 0 1
0 0 1 0 0 1 1 1 0
0 0 1 0 1 1 1 1 0
0 1 0 1 0 0 0 1 1
0 0 0 0 1 0 0 0 0
We can see that the columns give the time multiplexing order (shift sequence) IpT = {∞, 5, 3, 5, 6, 3, 3, 2, 5}, where ∞ represents the null (all-zero) sequence.
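The shift sequence of Example 3.5 can be recovered from the 7 × 9 array by comparing every column with the cyclic shifts of one reference period-7 subsequence. In the sketch below the reference phase 1001011 is an assumption: it is not stated in the text and was chosen because it reproduces the printed shift values. The function name `shift_sequence` is likewise illustrative, and `None` plays the role of ∞.

```python
# Rows of the time-multiplexed array from Example 3.5.
rows = [
    "011111101",
    "010110011",
    "011101101",
    "001001110",
    "001011110",
    "010100011",
    "000010000",
]


def shift_sequence(rows, base):
    """Per column, find the left shift e with column[t] == base[(t + e) % m].

    None marks an all-zero (null) column, written as infinity in the text.
    """
    m = len(base)
    rotations = [base[e:] + base[:e] for e in range(m)]
    shifts = []
    for col in zip(*rows):
        col = "".join(col)
        if set(col) == {"0"}:
            shifts.append(None)
        else:
            shifts.append(rotations.index(col))
    return shifts


base = "1001011"  # assumed reference phase of the period-7 subsequence
ipt = shift_sequence(rows, base)
```

Every nonzero column turns out to be a cyclic shift of the same period-7 sequence, which is the interleaved structure the section relies on.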
3.4 Correlation analysis of PN sequences designed for watermark and steganography
As shown in [24, 25, 27], the ACF of PN sequences is closely related to IpT and can be demonstrated intuitively as follows:
Fig. 1. Correlation matrix of (a) GF(2^8), 1 + d^2 + d^3 + d^4 + d^8, with IpT = {Inf, 2, 4, 2, 8, 12, 4, 0, 1, 9, 9, 14, 8, 5, 0, 3, 2}; (b) GF(2^8), 1 + d + d^3 + d^5 + d^8, with IpT = {Inf, 5, 10, 8, 5, 6, 1, 3, 10, 3, 12, 11, 2, 2, 6, 9, 5}
It is clear that, in order to ensure the best ACF of the sequence, there must be only one … In other words, the interleaved sequence and its shifted version, represented by IpT, have only one position where the subsequences are exactly in the same phase.
3.5 Linear complexity analysis of PN sequences designed for watermark and steganography
LC can be defined in many ways:
The linear complexity of a sequence S is equal to the degree of the minimal polynomial that generates the sequence;
In the trace representation method, the linear complexity is determined by the minimum number of terms in its trace function expression;
In the D-transformation, the linear complexity can be calculated by the Euclidean algorithm.
Let S(t) = S(0), S(1), …, S(L−1) be a binary sequence of period L, and define the sequence polynomial (similar to the D-transform):
$$S(x) = S(0) + S(1)x + \dots + S(L-1)x^{L-1}$$
Then its minimal polynomial is determined as follows:
$$m_s(x) = \frac{x^L - 1}{\gcd(x^L - 1, S(x))}$$
and its linear complexity is
$$LC_s = L - \deg(\gcd(x^L - 1, S(x)))$$
where gcd(x^L − 1, S(x)) denotes the greatest common divisor of x^L − 1 and S(x).
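The gcd formula for the linear complexity can be evaluated directly with the Euclidean algorithm over GF(2). In the sketch below polynomials are stored as Python integers used as bit masks (bit t holds the coefficient of x^t); that encoding and the helper names are choices made for this illustration only.

```python
def gf2_mod(a, b):
    """Remainder of a(x) divided by b(x) over GF(2); b must be nonzero."""
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a


def gf2_gcd(a, b):
    """Euclidean algorithm for polynomials over GF(2)."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a


def linear_complexity(seq):
    """LC = L - deg(gcd(x^L - 1, S(x))) for one period of a binary sequence."""
    L = len(seq)
    s_poly = sum(bit << t for t, bit in enumerate(seq))  # S(x) = sum S(t) x^t
    x_l_minus_1 = (1 << L) | 1  # x^L + 1, which equals x^L - 1 over GF(2)
    g = gf2_gcd(x_l_minus_1, s_poly)
    return L - (g.bit_length() - 1)


lc_m_sequence = linear_complexity([0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0])
lc_all_ones = linear_complexity([1] * 7)
lc_single_one = linear_complexity([1, 0, 0, 0, 0])
```

The period-15 sequence of Example 3.2 obeys a four-term linear recurrence, so its complexity is only 4, whereas a period-5 sequence containing a single 1 reaches the maximum value 5. This gap between sequences with equally good ACF is what the LC tables in this section are meant to expose.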
According to [22-27], the linear complexity can be calculated by:
Trace function representation [22-24];
D-transformation (Euclidean algorithm and discrete Fourier transform (DFT)) [25-27].
The tables below show simulation results of LC for different sequences:
Table 1. LC for sequences in GF(2^10) with subsequences in GF(2^5)
No. | GF(2^10) | GF(2^5): 101001, 100101, 111101, 101111, 111011, 110111
Table 2. LC for sequences in GF(2^12) with subsequences in GF(2^4) and GF(2^3)
No. | GF(2^12) | GF(2^4)
The following conclusion can be drawn: while the ACF remains the same for all kinds of interleaving, the LC spectrum shows significant differences. That means we can expect the SSIM and PSNR to be almost the same for all chosen sequences, and we should pay more attention to the robustness against the Berlekamp–Massey attack.
3.6 Raising the security to GPS and military level
For this purpose, PN sequences are useful in two ways. First, PN sequences are exploited to assign a certain number of chips in a PN sequence to represent secret data [28]. This approach will be discussed in another contribution.
Second, in the context of secure wireless communications, the pseudo-noise (PN) masking technique has been proven effective against unauthorized data collection (eavesdropping). Typical examples of the PN masking technique are military-grade communications and global positioning systems (GPS) [28,29]. In a PN-masked secure SS embedding approach, the embedded SS signal is scrambled by random-like PN masks such that no subspace of the embedded signal can be found and tracked in the data-embedded host. Thanks to the scrambling process (randomization) in the SS embedding scheme, the recovery bit-error rate (BER) at the intended receiver is maintained at almost the same level, since the ACF of the scrambled data is as good as that of conventional SS embedding (i.e. almost no performance loss), while unauthorized users will have a BER close to 0.5 (i.e. almost perfect security) due to a high spike in cross-correlation (see fig). Since the proposed PN-masked SS embedding schemes can efficiently minimize the likelihood that embedded data are "stolen" by unauthorized users, they are suitable for applications with high security requirements, such as steganography and covert communications. With random-like PN-masked carriers, the SS signal of interest behaves like white noise, and no subspace of the embedded signal (e.g. its statistical distribution) can be tracked from the observation data. Therefore, PN-masked SS embedding can efficiently prevent illegitimate data extraction by unauthorized users who have no knowledge of the PN masks. Mathematically, the whitening effect can be explained as follows.
3.7 {1} and {0} distribution
Let {O} be the output signal of the scrambler, {I} be the input sequence (the embedded SS signal, generating the forced response), and {u} the LFSR sequence (free response). The probabilities of '0' and '1' bits in {O} are P_O(0) and P_O(1), respectively. The probabilities of '0' and '1' bits in {I} are P_I(0) and P_I(1), respectively. Similarly, the probabilities of '0' and '1' bits in {u} are P_u(0) and P_u(1), respectively. Since the scrambler is a linear system in GF(2^n), we can apply the superposition rule and have:
$$O = I \oplus u$$
So, the probability of '1' in {O} equals the probability of '1' in the XOR operation:
$$P_O(1) = P_I(1)\,P_u(0) + P_I(0)\,P_u(1)$$
Since {u} satisfies the balance condition, we get:
$$P_u(1) = P_u(0) = \frac{1}{2}$$
Under the assumption that {I} and {u} are statistically independent, we have:
$$P_O(1) = P_O(0) = \frac{1}{2}$$
So, any input sequence with an unbalanced distribution will become balanced (noise-like).
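The balancing effect can be checked both from the probability formula and empirically. In the sketch below the mask bits come from Python's seeded pseudorandom generator as a balanced stand-in for the LFSR sequence {u}; this substitution, the 0.9 input bias, and the function name `p_out_one` are assumptions made for the illustration.

```python
import random


def p_out_one(p_in_one, p_mask_one):
    """P_O(1) for O = I xor u, with I and u statistically independent."""
    return p_in_one * (1 - p_mask_one) + (1 - p_in_one) * p_mask_one


rng = random.Random(0)  # fixed seed, for reproducibility only
n = 100_000

# Heavily biased input {I}: about 90% ones.
inp = [1 if rng.random() < 0.9 else 0 for _ in range(n)]
# Balanced stand-in for the mask {u} (not an actual LFSR output).
mask = [rng.getrandbits(1) for _ in range(n)]

out = [i ^ u for i, u in zip(inp, mask)]
freq_in = sum(inp) / n
freq_out = sum(out) / n
```

The formula gives 0.9 · 0.5 + 0.1 · 0.5 = 0.5 regardless of the input bias, and the empirical frequency of ones in the output settles near one half even though the input is about 90% ones.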
3.8 ACF
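The ACF of {O} in binary form is calculated as:
$$R(k) = A - D$$
where A and D are the numbers of agreement and disagreement positions between {O} and its version shifted by k, respectively. For a balanced, noise-like output of length L, A ≈ D ≈ L/2 at every nonzero shift, so the ACF is:
$$R(k) \approx 0, \quad k \neq 0$$
In other words, {O} looks like noise with:
$$P_O(1) = \frac{1}{2}, \qquad R(k) \approx 0$$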
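The agreement/disagreement form of the ACF is easy to evaluate for a scrambled output. The sketch below takes a deliberately extreme input, an all-ones block, scrambles it with one period of the period-15 sequence from Example 3.2 used as the mask {u}, and counts A − D at every cyclic shift. The choice of input and mask is an assumption for illustration, as is the function name `acf_agreements`.

```python
def acf_agreements(seq, k):
    """R(k) = A - D between one period of seq and its cyclic shift by k."""
    n = len(seq)
    agree = sum(1 for t in range(n) if seq[t] == seq[(t + k) % n])
    return agree - (n - agree)


mask = [0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0]  # period-15 PN mask {u}
inp = [1] * 15                                          # fully unbalanced input {I}
out = [i ^ u for i, u in zip(inp, mask)]                # scrambler output {O}

acf = [acf_agreements(out, k) for k in range(15)]
ones_in_output = sum(out)
```

Although the input contains only ones, the output has 7 ones and 8 zeros, and its out-of-phase ACF stays at −1 against an in-phase peak of 15, which is the noise-like behaviour described in this subsection.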
The following simulation results from [30] show the randomization effect of scramblers:
Fig 2 ACF of nonlinear PN sequence generated by polynomial of degree 12
Fig 3 CCF between two sequences
4 CONCLUSIONS AND FUTURE WORKS
The spiky ACF of nonlinear sequences ensures good quality of the embedding and extracting processes in both watermarking and steganography.
Since the LC spectrum is widespread, it is important to select the right sequences. The differences in LC may be as high as n.100 or even n.1000. This will certainly affect the robustness of watermarking and steganography.
To raise the security of watermarking and steganography to a higher level (military or GPS), nonlinear PN sequences can be used as masking sequences. Large values of LC ensure resistance against the Berlekamp–Massey algorithm attack.
PN-masked SS embedding can efficiently prevent illegitimate data extraction by unauthorized users who have no knowledge of the PN masks.
Mathematical tools for finding the desired sequences are also presented via the D-transform, a hardware-oriented method, and any input sequence with an unbalanced distribution becomes balanced (noise-like) through the randomization effect of scramblers.
We hope that, in future contributions, the process for choosing the right interleaving will be addressed in detail. In our opinion, the application of nonlinear interleaved sequences in steganography and cryptography fully deserves the attention of the academic community.
REFERENCES
[1] Ashok Patel, Bart Kosko, “Noise Benefits in Quantizer-Array Correlation Detection and Watermark Decoding,” IEEE Transactions on Signal Processing, Vol. 59, No. 2, February 2011, pp. 448-505.
[2] Alexia Briassouli and Michael G. Strintzis, “Locally Optimum Nonlinearities for DCT Watermark Detection,” IEEE Transactions on Image Processing, Vol. 13, No. 12, December 2004, pp. 1604-1617.
[3] X. Kang, J. Huang, and W. Zeng, “Improving Robustness of Quantization-Based Image Watermarking via Adaptive Receiver,” IEEE Transactions on Multimedia, Vol. 10, No. 6, October 2008, pp. 953-959.
[4] Al-Rawi S.S., Sadiq A.T., Farhan B.G., “Digital Video Quality Metric Based on Watermarking Technique with Gaffe Generator,” Computer Science and Engineering 2012, 2, pp. 138-146.
[5] T. Rachwalik, J. Szmidt, R. Wicik, and J. Zabłocki, “Generation of Nonlinear Feedback Shift Registers with special-purpose hardware,” Military Communication Institute, Poland, 2012.
[6] P. Senatore, “A Blind Video Watermarking Algorithm for Copyright Protection based on Dual Tree Complex Wavelet Transform,” Journal of Information Hiding and Multimedia Signal Processing, November 2006, pp. 1147-116.
[7] Rizky M. Nugraha, “Implementation of Direct Sequence Spread Spectrum Steganography on Audio Data,” 2011 International Conference on Electrical Engineering and Informatics, 17-19 July 2011, Bandung, Indonesia.
[8] Sudip Ghosh, Somsubhra Talapatra, Debasish Mondal, Navonil Chatterjee, Hafizur Rahaman and Santi P. Maity, “VLSI Architecture for Spread Spectrum Image Watermarking using Binary Watermark,” IEEE International Conference on Advances in Computing and Communications (ICACC), 9-11 August 2012, Rajagiri School of Engineering & Technology, Cochin, Kerala, India, pp. 166-169, 2012.
[9] Sudip Ghosh, Somsubhra Talapatra, Debasish Mondal, Navonil Chatterjee,
Hafizur Rahaman and Santi P Maity, “VLSI Architecture for Spread Spectrum
Image Watermarking using BinaryWatermark,” IEEE International Conference on