MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
——————————
Nguyen Huy Truong
RESEARCH ON DEVELOPMENT OF METHODS
OF GRAPH THEORY AND AUTOMATA
IN STEGANOGRAPHY AND SEARCHABLE ENCRYPTION
DOCTORAL DISSERTATION IN MATHEMATICS AND
INFORMATICS
Hanoi - 2020
MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
——————————
Nguyen Huy Truong
RESEARCH ON DEVELOPMENT OF METHODS
OF GRAPH THEORY AND AUTOMATA
IN STEGANOGRAPHY AND SEARCHABLE ENCRYPTION
Major: Mathematics and Informatics
Major code: 9460117
DOCTORAL DISSERTATION IN MATHEMATICS AND INFORMATICS
SUPERVISORS:
1. Assoc. Prof. Dr. Sc. Phan Thi Ha Duong
2. Dr. Vu Thanh Nam
Hanoi - 2020
DECLARATION OF AUTHORSHIP
I hereby certify that I am the author of this dissertation, and that I have completed it under the supervision of Assoc. Prof. Dr. Sc. Phan Thi Ha Duong and Dr. Vu Thanh Nam. I also certify that the dissertation's results have not been published by other authors.
Hanoi, May 18, 2020
PhD Student
Nguyen Huy Truong
Supervisors
I am extremely grateful to Assoc. Prof. Dr. Sc. Phan Thi Ha Duong.
I want to thank Dr. Vu Thanh Nam.
I would also like to extend my deepest gratitude to the late Assoc. Prof. Dr. Phan Trung Huy.
I would like to thank my co-workers at the School of Applied Mathematics and Informatics, Hanoi University of Science and Technology, for all their help.
I also wish to thank the members of the Seminar on Mathematical Foundations for Computer Science at the Institute of Mathematics, Vietnam Academy of Science and Technology, for their valuable comments and helpful advice.
I give thanks to the PhD students of the late Assoc. Prof. Dr. Phan Trung Huy for sharing and exchanging information on steganography and searchable encryption.
Finally, I must also thank my family for supporting all my work.
CONTENTS
Page
LIST OF SYMBOLS iii
LIST OF ABBREVIATIONS iv
LIST OF FIGURES v
LIST OF TABLES vi
INTRODUCTION 1
CHAPTER 1 PRELIMINARIES 4
1.1 Basic Structures 4
1.1.1 Strings 4
1.1.2 Graph 4
1.1.3 Deterministic Finite Automata 6
1.1.4 The Galois Field GF(p^m) 7
1.2 Digital Image Steganography 8
1.3 Exact Pattern Matching 11
1.4 Longest Common Subsequence 12
1.5 Searchable Encryption 15
CHAPTER 2 DIGITAL IMAGE STEGANOGRAPHY BASED ON THE GALOIS FIELD USING GRAPH THEORY AND AUTOMATA 16
2.1 Introduction 16
2.2 The Digital Image Steganography Problem 18
2.3 A New Digital Image Steganography Approach 19
2.3.1 Mathematical Basis based on The Galois Field 19
2.3.2 Digital Image Steganography Based on The Galois Field GF(p^m) Using Graph Theory and Automata 21
2.4 The Near Optimal and Optimal Data Hiding Schemes for Gray and Palette Images 29
2.5 Experimental Results 34
2.6 Conclusions 38
CHAPTER 3 AN AUTOMATA APPROACH TO EXACT PATTERN MATCHING 40
3.1 Introduction 40
3.2 The New Algorithm - The MRc Algorithm 42
3.3 Analysis of The MRc Algorithm 48
3.4 Experimental Results 51
3.5 Conclusions 56
CHAPTER 4 AUTOMATA TECHNIQUE FOR THE LONGEST COMMON SUBSEQUENCE PROBLEM 57
4.1 Introduction 57
4.2 Mathematical Basis 58
4.3 Automata Models for Solving The LCS Problem 62
4.4 Experimental Results 67
4.5 Conclusions 68
CHAPTER 5 CRYPTOGRAPHY BASED ON STEGANOGRAPHY AND AUTOMATA METHODS FOR SEARCHABLE ENCRYPTION 69
5.1 Introduction 69
5.2 A Novel Cryptosystem Based on The Data Hiding Scheme (2, 9, 8) 71
5.3 Automata Technique for Exact Pattern Matching on Encrypted Data 75
5.4 Automata Technique for Approximate Pattern Matching on Encrypted Data 77
5.5 Conclusions 79
CONCLUSION 81
LIST OF PUBLICATIONS 82
BIBLIOGRAPHY 83
LIST OF SYMBOLS
Σ∗ The set of all strings on Σ
|S| The number of elements of a set S
GF(p^m) The Galois field constructed from the polynomial ring Zp[x], where p is prime and m is a positive integer
(GF^n(p^m), +, ·) A vector space over the field GF(p^m)
LCS(p, x) A longest common subsequence of p and x
lcs(p, x) The length of an LCS(p, x)
LeftID(u) The least element the leftmost location of u
Rmp(u) The last component of LeftID(u) in p
(I, M, K, Em, Ex) A data hiding scheme
I A set of all image blocks with the same size and image format
Em An embedding function that embeds a secret element in an image block
Ex An extracting function that extracts an embedded secret element from an image block
qcolour The number of different ways to change the colour of each pixel in an arbitrary image block
Adjacent(cp, a) An adjacent vertex of cp
c block A string of length c
Posp(z) The last position of appearance of z in p
Mp An automaton accepting the pattern p
Config(p) The set of all the configurations of p
Wp(u) The weight of u in p
WConfig(p) The set of the weights of all the configurations of p
Wpi(a) The weight of a at the location i in p
Wmp(a) The heaviest weight of a in p
LIST OF ABBREVIATIONS
BNDM Backward Nondeterministic Dawg Matching
FOPA Fastest Optimal Parity Assignment
HCIH High Capacity of Information Hiding
TVSBS Thathoo Virmani Sai Balakrishnan Sekar
LIST OF FIGURES
Figure 1.1 A simple graph 5
Figure 1.2 A spanning tree of the graph given in Figure 1.1 6
Figure 1.3 The transition diagram of A in Example 1.3 7
Figure 1.4 The basic diagram of digital image steganography 9
Figure 1.5 The degree of appearance of the pattern p 12
Figure 2.1 The nine commonly used 8-bit gray cover images sized 512 × 512 pixels 35
Figure 2.2 The nine commonly used 8-bit palette cover images sized 512 × 512 pixels 36
Figure 2.3 The binary cover image sized 2592 × 1456 pixels 36
Figure 3.1 Sliding window mechanism 41
Figure 3.2 The basic idea of the proposed approach 45
Figure 3.3 The transition diagram of the automaton Mp, p = abcba 47
LIST OF TABLES
Table 1.1 An adjacency list representation of the simple graph given in Figure 1.1 5
Table 1.2 The performing steps of the BF algorithm 11
Table 1.3 The dynamic programming matrix L 13
Table 2.1 Elements of the Galois field GF(2^2) represented by binary strings and decimal numbers 30
Table 2.2 Operations + and · on the Galois field GF(2^2) 30
Table 2.3 The representation of E and the arc weights of G for the gray image 31
Table 2.4 The payload, ER and PSNR for the optimal data hiding scheme (1, 2^n − 1, n) for palette images with qcolour = 1 37
Table 2.5 The payload, ER and PSNR for the near optimal data hiding scheme (2, 9, 8) for gray images with qcolour = 3 37
Table 2.6 The payload, ER and PSNR for the near optimal data hiding scheme (2, 9, 8) for palette images with qcolour = 3 38
Table 2.7 The comparisons of embedding and extracting time between the chapter's and Chang et al.'s approach for the same optimal data hiding scheme (1, N, ⌊log2(N + 1)⌋), where N = 2^n − 1, for the binary image with qcolour = 1. Time is given in seconds. 38
Table 3.1 The performing steps of the MR1 algorithm 47
Table 3.2 Experimental results on rand4 problem 52
Table 3.3 Experimental results on rand8 problem 52
Table 3.4 Experimental results on rand16 problem 53
Table 3.5 Experimental results on rand32 problem 53
Table 3.6 Experimental results on rand64 problem 54
Table 3.7 Experimental results on rand128 problem 54
Table 3.8 Experimental results on rand256 problem 55
Table 3.9 Experimental results on a genome sequence (with |Σ| = 4) 55
Table 3.10 Experimental results on a protein sequence (with |Σ| = 20) 56
Table 4.1 The Refp of p = bacdabcad 60
Table 4.2 The comparisons of the lcs(p, x) computation time for n = 50666 67
Table 4.3 The comparisons of the lcs(p, x) computation time for n = 102398 68
INTRODUCTION
In modern life, as the use of computers and the Internet becomes ever more essential, digital data (information) can be copied and accessed illegally. As a result, information security is becoming increasingly important. There are two popular methods of providing security: cryptography and data hiding [2, 5, 6, 20, 56, 62, 81]. Cryptography is used to encrypt data in order to make them unreadable by a third party [5]. Data hiding is used to embed data in digital media. Based on the purpose of the application, data hiding is generally divided into steganography, which hides the existence of data to protect the embedded data, and watermarking, which protects the copyright ownership and authentication of the digital media carrying the embedded data. Steganography can be used as an alternative to cryptography. However, steganography becomes weak if attackers detect the existence of hidden data. Hence, integrating cryptography with steganography offers a third choice for data security [2, 5, 6, 12, 19, 61, 62, 81, 86, 93].
With the rapid development of applications based on the Internet infrastructure, cloud computing has become one of the hottest topics in the information technology area. Indeed, it is an Internet-based computing system that provides on-demand services, ranging from application and system software and storage to data processing. For example, when cloud users use the storage service, they can upload information to the servers and then access it online via the Internet. Meanwhile, enterprises do not need to spend large sums of money on owning and maintaining a system of hardware and software. Although cloud computing brings many benefits to individuals and organizations, cloud security is still an open problem, since cloud providers can abuse users' information and cloud users lose control of it. Thus, guaranteeing the privacy of tenants' information without negating the benefits of cloud computing is necessary [28, 38, 40, 41, 60, 95, 102]. In order to protect cloud users' privacy, sensitive data need to be encrypted before being outsourced to servers. Unfortunately, encryption makes searching on ciphertext much more difficult for the servers than searching on plaintext. To solve this problem, many searchable encryption techniques have been presented since 2000. Searchable encryption not only stores users' encrypted data securely but also allows information search over ciphertext [26, 28, 29, 38, 40, 60, 71, 85, 102].
Searchable encryption for exact pattern matching is a new class of searchable encryption techniques. The solutions for this class have been based on algorithms for [26] or approaches to [41, 89] exact pattern matching.
As in information retrieval from plaintexts, searchable encryption with approximate string matching capability needs to be developed, where the search string can be either a predetermined keyword that is encrypted and stored on cloud servers or an arbitrary pattern [28, 40, 71].
Motivated by the above problems, by the high efficiency of the techniques using graphs and automata proposed by P. T. Huy et al. for exact pattern matching (2002), the longest common subsequence problem (2002) and steganography (2011, 2012 and 2013), and by the potential applications of graph theory and automata approaches in steganography and searchable encryption suggested by the late Assoc. Prof. Phan Trung Huy, and under the direction of the supervisors, the title assigned to this dissertation is: research on development of methods of graph theory and automata in steganography and searchable encryption.
The purpose of the dissertation is to develop new, high-quality solutions using graph theory and automata, and to suggest their applications in, and apply them to, steganography and searchable encryption.
Based on the results published and the suggestions presented by the late Assoc. Prof. Phan Trung Huy in steganography and searchable encryption, the dissertation focuses on the following four problems in these fields:
- Digital image steganography;
- Exact pattern matching;
- Longest common subsequence;
- Searchable encryption
The first problem is newly stated in Chapter 2; the three remaining problems are recalled and clarified in Chapter 1. In addition, the background related to these problems is presented and analysed carefully in the chapters of the dissertation.
For the first three problems, the dissertation's work is to find new and efficient solutions using graph theory and automata. These solutions are then applied to solve the last problem.
The dissertation is structured as follows. Apart from the Introduction at the beginning and the Conclusion at the end, its main content is divided into five chapters.
Chapter 1. Preliminaries. This chapter recalls basic knowledge used throughout the dissertation (strings, graphs, deterministic finite automata, digital images, the basic model of digital image steganography, some parameters to determine the quality of digital image steganography, the exact pattern matching problem, the longest common subsequence problem, and searchable encryption), and re-presents important concepts and results that are used and further developed in the remaining chapters (adjacency lists, breadth first search, the Galois field, the fastest optimal parity assignment method, the module method and the concept of the maximal secret data ratio, the concept of the degree of fuzziness (appearance), the Knapsack Shaking approach, and the definition of a cryptosystem).
Chapter 2. Digital image steganography based on the Galois field using graph theory and automata. Firstly, from some proposed concepts of optimal and near optimal secret data hiding schemes, this chapter states the problem of interest in digital image steganography. Secondly, the chapter proposes a new approach based on the Galois field using graph theory and automata to design a general form of steganography in binary, gray and palette images, gives sufficient conditions for existence and proves the existence of some optimal and near optimal secret data hiding schemes, applies the proposed schemes to the process of hiding a finite sequence of secret data in an image, and gives security analyses. Finally, the chapter presents experimental results to show the efficiency of the proposed results.
Chapter 3. An automata approach to exact pattern matching. This chapter proposes a flexible approach using automata to design an effective algorithm for exact pattern matching in practice. For the given cases of patterns and alphabets, the efficiency of the proposed algorithm is shown by theoretical analyses and experimental results.
Chapter 4. Automata technique for the longest common subsequence problem. This chapter proposes two efficient algorithms, one sequential and one parallel, for computing the length of a longest common subsequence of two strings in practice, using the automata technique. The theoretical analysis of the parallel algorithm and the experimental results confirm that the automata technique is a very effective choice for designing algorithms that solve the longest common subsequence problem.
Chapter 5. Cryptography based on steganography and automata methods for searchable encryption. This chapter first proposes a novel cryptosystem with high security based on a data hiding scheme proposed in Chapter 2. Additionally, unlike in existing hybrid techniques of cryptography and steganography, ciphertexts do not depend on the input image size, and encoding and embedding are done at once. The chapter then applies the automata-based results of Chapters 3 and 4 to construct two algorithms for exact and approximate pattern matching on secret data encrypted by the proposed cryptosystem. These algorithms have O(n) time complexity in the worst case, under the assumption that the approximate algorithm uses ⌈(1 − ε)m⌉ processors, where ε is the error of the string similarity measure proposed in this chapter, and m and n are the lengths of the pattern and the secret data, respectively. In searchable encryption, the cryptosystem can be used to encode and decode secret data on the users' side, and the pattern matching algorithms can be used to perform pattern search on the cloud providers' side.
The contents of the dissertation are based on the paper [T1] published in 2019, the paper [T4] accepted for publication in 2020 in KSII Transactions on Internet and Information Systems (ISI), and the papers [T2, T3] published in the Journal of Computer Science and Cybernetics in 2019. The main results of the dissertation have been presented at:
- Seminar on Mathematical Foundations for Computer Science at the Institute of Mathematics, Vietnam Academy of Science and Technology,
- The 9th Vietnam Mathematical Congress, Nha Trang, August 14-18, 2018,
- Seminar at the School of Applied Mathematics and Informatics, Hanoi University of Science and Technology.
CHAPTER 1 PRELIMINARIES
1.1 Basic Structures
1.1.1 Strings
In this dissertation, secret data are considered as strings. Therefore, some terms related to strings are recalled here [11, 24, 83].
A finite set Σ is called an alphabet. The number of elements of Σ is denoted by |Σ|. An element of Σ is called a letter. A string (also referred to as a text) x of length n on the alphabet Σ is a finite sequence of letters of Σ, and we write
x = x[1]x[2]...x[n], x[i] ∈ Σ, 1 ≤ i ≤ n,
where n is a positive integer.
A special string is the empty string, which has no letters and is denoted by ε. The length of the string x is the number of letters in it, denoted by |x|. Then |ε| = 0.
Notice that for the string x = x[1]x[2]...x[n], we can also write x = x[1..n] for short. The set of all strings on the alphabet Σ is denoted by Σ∗. The operation on strings is concatenation, which writes strings one after another as a compound. The concatenation of the two strings u1 and u2 is denoted by u1u2.
Let x be a string. A string p is called a substring of the string x if x = u1pu2 for some strings u1 and u2. In case u1 = ε (resp. u2 = ε), the string p is called a prefix (resp. suffix) of the string x. The prefix (resp. suffix) p is called proper if p ≠ x. Note that a prefix or a suffix can be empty.
1.1.2 Graph
Besides some basic concepts in graph theory, this subsection recalls how to represent a graph by adjacency lists and the breadth first search algorithm [82]. These are used in Chapter 2.
A finite undirected graph (hereafter called a graph for short) G = (V, E) consists of a nonempty finite set of vertices V and a finite set of edges E, where each edge has either one or two vertices associated with it. A graph with weights assigned to its edges is called a weighted graph.
An edge connecting a vertex to itself is called a loop. Multiple edges are edges connecting the same vertices. A graph having no loops and no multiple edges is called a simple graph. In a simple graph, the edge associated with an unordered pair of vertices {i, j} is called the edge {i, j}.
Two vertices i and j in a graph G are called adjacent if they are the vertices of an edge of G.
A graph without multiple edges can be described by adjacency lists, which specify the vertices adjacent to each vertex of the graph.
Example 1.1 Using adjacency lists, the simple graph given in Figure 1.1 can be represented as in Table 1.1.
Given a simple graph G, a subgraph of G that is a tree including every vertex of G is called a spanning tree of G. A spanning tree of a connected simple graph can be built using breadth first search (BFS). This algorithm is shown in pseudo-code as follows.
Breadth First Search:
Input: A connected simple graph G with vertices ordered as i1, i2, ..., in
Output: A spanning tree T
Set T to be a tree consisting only of i1;
Set L to be an empty list;
Put i1 in L;
While (L is not empty) {
  Remove the first vertex i from L;
  For each adjacent vertex j of i {
    If (j is neither in L nor in T) {
      Add j to the end of L;
      Add j and the edge {i, j} to T;
    }
  }
}
Return T;
End
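The pseudo-code above can be sketched in Python as follows. This is a minimal illustration, assuming vertices are integers and that neighbours are scanned in the vertex order i1, i2, ..., in; the helper names `adjacency_lists` and `bfs_spanning_tree` are hypothetical, and since Figure 1.1 is not reproduced in this text, the example graph is hypothetical as well. The list L becomes a `deque`, and because every vertex placed in L is also added to T, the test "neither in L nor in T" reduces to "not in T".

```python
from collections import deque
from typing import Dict, List, Tuple


def adjacency_lists(vertices: List[int], edges: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Adjacency lists of a simple undirected graph given by its edge list."""
    adj: Dict[int, List[int]] = {v: [] for v in vertices}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def bfs_spanning_tree(adj: Dict[int, List[int]], order: List[int]) -> List[Tuple[int, int]]:
    """Edges {i, j} of a BFS spanning tree T of a connected simple graph.

    order = [i1, i2, ..., in]; neighbours are scanned following this order.
    """
    rank = {v: r for r, v in enumerate(order)}
    in_tree = {order[0]}          # vertices of T (T starts as the single vertex i1)
    queue = deque([order[0]])     # the list L of the pseudo-code
    tree_edges: List[Tuple[int, int]] = []
    while queue:
        i = queue.popleft()       # remove the first vertex of L
        for j in sorted(adj[i], key=rank.__getitem__):
            # Every vertex put into L is also put into T, so checking T suffices.
            if j not in in_tree:
                in_tree.add(j)
                queue.append(j)
                tree_edges.append((i, j))
    return tree_edges


# Hypothetical graph on vertices 1..5.
adj = adjacency_lists([1, 2, 3, 4, 5], [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 5)])
print(bfs_spanning_tree(adj, [1, 2, 3, 4, 5]))
```

For this graph the sketch returns the four tree edges (1, 2), (1, 3), (2, 4), (3, 5), i.e. n − 1 edges as expected for a spanning tree.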
Example 1.2 For the graph given in Figure 1.1, a spanning tree found by using BFS is shown in Figure 1.2.
Figure 1.2 A spanning tree of the graph given in Figure 1.1
A graph with directed edges (or arcs) is called a directed graph. Each arc is associated with an ordered pair of vertices. In a simple directed graph, the arc associated with the ordered pair (i, j) is called the arc (i, j); the vertex i is said to be adjacent to the vertex j, and the vertex j is said to be adjacent from the vertex i.
1.1.3 Deterministic Finite Automata
Studying the construction and use of deterministic finite automata is one of the objectives of the dissertation. Hence, this subsection clarifies this model of computation [44, 82].
Definition 1.1 ([44]) Let Σ be an alphabet. A deterministic finite automaton (hereafter called an automaton for short) A = (Σ, Q, q0, δ, F) over Σ consists of:
• A finite set Q of elements called states,
• An initial state q0, one of the states in Q,
• A set F of final states The set F is a subset of Q,
• A state transition function (or simply, transition function), denoted by δ, that takes
as arguments a state and a letter, and returns a state, so that δ : Q × Σ → Q,
• The transition function δ can be extended so that it takes a state and a string and returns a state. Formally, this extended transition function δ : Q × Σ∗ → Q can be defined recursively such that for all q ∈ Q, s ∈ Σ∗, a ∈ Σ, δ(q, as) = δ(δ(q, a), s) and δ(q, ε) = q.
An alternative, simple way of presenting an automaton is the "transition diagram". A transition diagram of an automaton A = (Σ, Q, q0, δ, F) is a directed graph given as follows [44].
a) Each state of Q is a vertex
b) Let q' = δ(q, a), where q is a state of Q and a is a letter of Σ. Then the transition diagram has an arc (q, q') labeled a. If there are several letters that cause transitions from q to q', then the arc (q, q') is labeled by a list of these letters.
c) There is an arrow into the initial state q0. This arrow does not originate at any vertex.
d) Vertices corresponding to states not in F are marked by a single circle. Vertices corresponding to final states are marked by a double circle.
Example 1.3 Consider an automaton A = (Σ, Q, q0, δ, F) over Σ = {a, b}, where Q = {q0, q1, q2}, F = {q2}, and δ is given by the following table. Then the transition diagram of A is shown in Figure 1.3.
Figure 1.3 The transition diagram of A in Example 1.3
Definition 1.2 ([82]) A string p is said to be accepted by the automaton A = (Σ, Q, q0, δ, F) if it takes the initial state q0 to a final state, that is, δ(q0, p) ∈ F.
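Definitions 1.1 and 1.2 can be illustrated with a minimal Python sketch. The class name `DFA` and the transition table below are hypothetical (they are not the table of Example 1.3); the automaton shown accepts exactly the strings over {a, b} that end with "ab". The extended transition function is computed iteratively, which unfolds the recursion δ(q, as) = δ(δ(q, a), s) with δ(q, ε) = q.

```python
from typing import Dict, FrozenSet, Tuple


class DFA:
    """A deterministic finite automaton A = (Sigma, Q, q0, delta, F)."""

    def __init__(self, delta: Dict[Tuple[str, str], str], q0: str, finals: FrozenSet[str]):
        self.delta = delta      # transition function delta: Q x Sigma -> Q
        self.q0 = q0            # initial state
        self.finals = finals    # set F of final states

    def run(self, q: str, s: str) -> str:
        """Extended transition function delta(q, s) on strings.

        Reading s letter by letter unfolds delta(q, as) = delta(delta(q, a), s);
        for the empty string the loop does nothing, so delta(q, empty) = q.
        """
        for a in s:
            q = self.delta[(q, a)]
        return q

    def accepts(self, p: str) -> bool:
        """p is accepted iff delta(q0, p) is a final state."""
        return self.run(self.q0, p) in self.finals


# Hypothetical automaton over {a, b}: accepts exactly the strings ending with "ab".
A = DFA(
    delta={("q0", "a"): "q1", ("q0", "b"): "q0",
           ("q1", "a"): "q1", ("q1", "b"): "q2",
           ("q2", "a"): "q1", ("q2", "b"): "q0"},
    q0="q0",
    finals=frozenset({"q2"}),
)
print(A.accepts("aab"), A.accepts("aba"))
```

Storing δ as a dictionary keyed by (state, letter) mirrors the transition table used to define an automaton.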
1.1.4 The Galois Field GF(p^m)
This subsection describes how to construct a finite field with p^m elements, called the Galois field GF(p^m), where p is prime and m ≥ 1 is an integer [88]. This algebraic structure will be used in Chapter 2.
Let p be a prime number. Define Zp[x] to be the set of all polynomials in the variable x whose coefficients belong to the field Zp. Addition and multiplication in Zp[x] are defined in the usual way, followed by reducing the coefficients modulo p.
For f(x) ∈ Zp[x], the degree of f(x), denoted by deg(f), is the largest exponent of x in f(x). A polynomial f(x) ∈ Zp[x] is called irreducible if there do not exist polynomials f1(x), f2(x) ∈ Zp[x] such that
f(x) = f1(x)f2(x),
where deg(f1) > 0 and deg(f2) > 0.
Let f(x) ∈ Zp[x] be an irreducible polynomial with deg(f) = m ≥ 1. Define Zp[x]/(f(x)) to be the set of the p^m polynomials of degree at most m − 1 in Zp[x]. Addition and multiplication in Zp[x]/(f(x)) are given as in Zp[x], followed by a reduction modulo f(x). Then Zp[x]/(f(x)) with these operations is a field having p^m elements, called the Galois field GF(p^m). Note that for p prime and m ≥ 1, the Galois field GF(p^m) is unique up to isomorphism.
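The construction of Zp[x]/(f(x)) can be sketched in a few lines of Python. This is a minimal illustration, assuming f is monic and given by its coefficient list from the lowest degree up, and elements are tuples (c0, c1, ..., c_{m−1}) of coefficients; the class name `GF` is hypothetical. The example uses GF(2^2) with f(x) = x^2 + x + 1, which is irreducible over Z2, and GF(3^2) with f(x) = x^2 + 1, which is irreducible over Z3.

```python
from itertools import product
from typing import List, Tuple

Poly = Tuple[int, ...]  # coefficients c0, c1, ..., c_{m-1} (lowest degree first)


class GF:
    """Sketch of GF(p^m) = Zp[x]/(f(x)) for a monic irreducible f of degree m."""

    def __init__(self, p: int, f: List[int]):
        assert f[-1] == 1, "f is assumed to be monic"
        self.p, self.f, self.m = p, f, len(f) - 1

    def elements(self) -> List[Poly]:
        """All p^m polynomials of degree at most m - 1."""
        return [tuple(c) for c in product(range(self.p), repeat=self.m)]

    def add(self, u: Poly, v: Poly) -> Poly:
        """Coefficient-wise addition modulo p."""
        return tuple((a + b) % self.p for a, b in zip(u, v))

    def mul(self, u: Poly, v: Poly) -> Poly:
        """Multiply in Zp[x], then reduce modulo f(x)."""
        p, m = self.p, self.m
        prod = [0] * (2 * m - 1)
        for i, a in enumerate(u):
            for j, b in enumerate(v):
                prod[i + j] = (prod[i + j] + a * b) % p
        # Eliminate x^k for k >= m using x^m = -(f0 + f1 x + ... + f_{m-1} x^{m-1}).
        for k in range(len(prod) - 1, m - 1, -1):
            c = prod[k]
            if c:
                prod[k] = 0
                for t in range(m):
                    prod[k - m + t] = (prod[k - m + t] - c * self.f[t]) % p
        return tuple(prod[:m])


F4 = GF(2, [1, 1, 1])               # GF(2^2) with f(x) = x^2 + x + 1
x = (0, 1)
print(F4.mul(x, x))                 # x^2 = x + 1, i.e. the tuple (1, 1)
```

In GF(2^2), x · x reduces to x + 1, and every nonzero element has a multiplicative inverse, which is what makes the quotient ring a field.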
1.2 Digital Image Steganography
The problem of interest in Chapter 2 is digital image steganography. This section recalls the concept of digital images, the basic model of digital image steganography and some parameters to determine the efficiency of digital image steganography, and lastly re-presents results that are further developed and used in Chapter 2, such as the fastest optimal parity assignment (FOPA) method, the module method and the concept of the maximal secret data ratio (MSDR) [18, 20, 21, 39, 49, 50, 51, 53, 61, 63, 65, 76, 78, 104].
A digital image is a matrix of pixels. Each pixel is represented by a nonnegative integer in the form of a string of binary bits. This value indicates the colour of the pixel [39].
Based on how the colours of pixels are represented, digital images can be divided into the following types [78].
1. Binary image: Each pixel is represented by one bit. In this image type, the colour of a pixel is white (value "1") or black (value "0").
2. Gray image: Each pixel is typically represented by eight bits (called an 8-bit gray image). Then the colour of any pixel is a shade of gray, from black, corresponding to colour value "0", to white, corresponding to colour value "255".
3. Red green blue image: Each pixel is usually represented by a string of 24 bits (called a 24-bit RGB image), where the first 8 bits, the next 8 bits and the last 8 bits correspond to shades of red, green and blue, specifying the red, green and blue colour components of the pixel, respectively. Then the colour of the pixel is a combination of these three components.
4. Palette image: The colour of each pixel is not given directly by the number representing the pixel, as it is for RGB images. Instead, this number is the index of the pixel's colour in the colour table (the palette), an ordered set of values (strings of 24 bits, as in RGB images) that represents all colours used in the image and is stored in the file with the image. The number of bits representing a pixel determines the maximum size of the palette and is limited to 8 bits. When each pixel is represented by 8 bits, the image is called an 8-bit palette image.
The objective of digital image steganography is to protect data by hiding them in a digital image well enough that unauthorized users will not even be aware of their existence [21, 18]. Figure 1.4 shows the basic model of digital image steganography, where the cover image is a digital image used as a carrier to embed secret data into, and the stego image is the digital image obtained after embedding secret data into the cover image by the function block Embed with the secret key on the Sender side. Generally, in steganography, the secret data need to be extracted fully by the block Extract with the secret key on the Receiver side [20, 61, 63, 76].
Figure 1.4 The basic diagram of digital image steganography
The total number of bits of the secret data sequence embedded in the cover image is called the Payload. For a given Payload, the embedding capacity of the cover image is measured by the embedding rate (ER), defined as follows [104]:
ER = Payload / (H × W) (bits per pixel),
where H × W is the number of pixels of the cover image.
The quality of the stego image is measured by the peak signal-to-noise ratio (PSNR). Based on the value of PSNR, we can judge the degree of similarity between the cover image and the stego image: if the PSNR value is high, then the quality of the stego image is high; conversely, if it is low, the quality of the stego image is low. In general, for a digital image of H × W pixels, PSNR is defined by
PSNR = 10 log10(255^2 / MSE) (dB), where
MSE = (1 / (3HW)) Σ_{i=1..H} Σ_{j=1..W} [(B(i, j) − B'(i, j))^2 + (G(i, j) − G'(i, j))^2 + (R(i, j) − R'(i, j))^2],
and B(i, j), G(i, j), R(i, j), B'(i, j), G'(i, j) and R'(i, j) are the colour values of the Blue, Green and Red components of a pixel at position (i, j) in the cover and stego image, respectively. For human eyes, the threshold PSNR value is 30 dB [20, 53, 65, 104]: if the PSNR value is higher than 30 dB, it is hard to distinguish between the cover image and its stego image.
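The two quality measures can be computed as in the following Python sketch. It assumes the common definitions ER = Payload / (number of pixels) in bits per pixel and PSNR = 10 log10(255^2 / MSE), with MSE averaged over the three colour components and 8 bits per component; the function names are hypothetical, and images are plain nested lists of (R, G, B) tuples rather than any particular image library's type.

```python
import math
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # three 8-bit colour components


def embedding_rate(payload_bits: int, height: int, width: int) -> float:
    """ER in bits per pixel: embedded bits divided by the number of pixels."""
    return payload_bits / (height * width)


def psnr(cover: List[List[Pixel]], stego: List[List[Pixel]]) -> float:
    """PSNR in dB between a cover image and its stego image of the same size."""
    h, w = len(cover), len(cover[0])
    sq = 0
    for row_c, row_s in zip(cover, stego):
        for pc, ps in zip(row_c, row_s):
            sq += sum((a - b) ** 2 for a, b in zip(pc, ps))
    mse = sq / (3 * h * w)          # mean over all pixels and all 3 components
    if mse == 0:
        return math.inf             # identical images
    return 10 * math.log10(255 ** 2 / mse)


cover = [[(10, 20, 30), (40, 50, 60)], [(70, 80, 90), (100, 110, 120)]]
stego = [[(10, 20, 31), (40, 50, 60)], [(70, 80, 90), (100, 110, 120)]]
print(embedding_rate(8, 2, 2), round(psnr(cover, stego), 2))
```

Changing a single colour component by 1 in this 2 × 2 image gives MSE = 1/12, so the PSNR is far above the 30 dB threshold.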
Let G be a palette image and P = {c1, c2, ..., cn} be its palette, where ci is the colour of a pixel of G corresponding to the colour index i. Each colour c in P is considered as a vector consisting of red, green and blue components. Suppose d is a distance function on P. The FOPA method [50] seeks two functions Next: P → P and Val: P → Z2 such that the following two conditions are satisfied for all c ∈ P:
1. d(c, Next(c)) = min_{v ∈ P, v ≠ c} d(c, v),
2. Val(c) = Val(Next(c)) + 1 on the field Z2.
Call GP = (VP, EP) the weighted complete undirected graph of the palette image G, where VP = P and the weight of the edge {c, c'} is d(c, c'). The function Nearest: P → P is given by Nearest(c) = c', where d(c, c') = min_{v ∈ P, v ≠ c} d(c, v). A rho forest F = (V, E) is a directed graph with vertices weighted by the function Val, where V = VP, E is the set of all arcs (v, Next(v)), and each vertex v ∈ V has the weight Val(v). The construction of an algorithm determining F is the essence of the FOPA method.
Algorithm for FOPA:
Input: A weighted complete undirected graph GP, the function Nearest
Output: A rho forest F = (V, E)
Choose a vertex c ∈ P, set V = {c}, and set C = P\{c};
Set Val(c) = 0; // Or 1 randomly
While (C is not empty) // Update F
{
a) Take one element v ∈ C;
b) Initialize v0 = v and set Val(v0) = 0 (or 1 randomly). By a finite loop, find a longest sequence of k + 1 consecutive distinct elements v0, v1, ..., vk of P such that Nearest(vi) = vi+1 and vi ∈ C for i = 0, ..., k − 1, and either vk ∈ C or vk ∈ V, and set Next(vi) = vi+1 for i = 0, ..., k − 1;
b1) Case vk ∈ C: Set Val(vi) = 1 + Val(vi−1) for i = 1, ..., k, and Next(vk) = vk−1;
Set V = V ∪ {v0, v1, ..., vk} and C = C\{v0, v1, ..., vk};
b2) Case vk ∈ V: Set Val(vi) = 1 + Val(vi+1) for i = k − 1, ..., 1, 0;
Set V = V ∪ {v0, v1, ..., vk−1} and C = C\{v0, v1, ..., vk−1};
}
Return F;
End
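A simplified Python sketch of the rho forest construction is given below. It is an illustration under stated assumptions, not the implementation of [50]: colours are RGB tuples indexed 0..n−1 (n ≥ 2), d is assumed to be the squared Euclidean distance, and ties in Nearest are broken by a hypothetical rule (distance first, then vertex indices). With a strict total order on edges, a chain v0, v1, ... can only stop at a pair of mutual nearest neighbours (case b1) or at a vertex already in the forest (case b2). Unlike the pseudo-code, the sketch starts from an empty forest, so every vertex, including the first one, receives a value of Next.

```python
from typing import Dict, List, Tuple

Colour = Tuple[int, int, int]


def fopa_forest(palette: List[Colour]) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Return (nxt, val) with nxt[i] = index of Next(c_i) and val[i] = Val(c_i) in Z2."""
    n = len(palette)

    def d(i: int, j: int) -> int:
        # Assumed distance: squared Euclidean distance between RGB vectors.
        return sum((a - b) ** 2 for a, b in zip(palette[i], palette[j]))

    def nearest(i: int) -> int:
        # Hypothetical tie-breaking: a strict total order on edges.
        return min((j for j in range(n) if j != i),
                   key=lambda j: (d(i, j), min(i, j), max(i, j)))

    nxt: Dict[int, int] = {}
    val: Dict[int, int] = {}
    done = set()                          # the vertex set V built so far
    for start in range(n):                # vertices not in `done` form C
        if start in done:
            continue
        chain = [start]                   # v0, v1, ...
        while True:
            w = nearest(chain[-1])
            if w in done or (len(chain) > 1 and w == chain[-2]):
                break
            assert w not in chain         # cannot happen with a strict edge order
            chain.append(w)
        if w in done:                     # case b2: chain ends at a vertex of V
            for i in range(len(chain) - 1, -1, -1):
                succ = chain[i + 1] if i + 1 < len(chain) else w
                nxt[chain[i]] = succ
                val[chain[i]] = (1 + val[succ]) % 2
        else:                             # case b1: last two are mutual nearest
            val[chain[0]] = 0
            for i in range(1, len(chain)):
                val[chain[i]] = (1 + val[chain[i - 1]]) % 2
            for i in range(len(chain) - 1):
                nxt[chain[i]] = chain[i + 1]
            nxt[chain[-1]] = chain[-2]
        done.update(chain)
    return nxt, val


pal = [(0, 0, 0), (10, 0, 0), (200, 200, 200), (210, 200, 200), (100, 100, 100), (5, 5, 5)]
print(fopa_forest(pal))
```

For every colour, the returned maps satisfy both FOPA conditions: Next(c) is a nearest colour, and Val(c) = Val(Next(c)) + 1 in Z2.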
Definition 1.3 ([51]) Let M be a module over the ring Zm, k > 0 be a natural number, and U be a subset of M\{0}. Call U a k-base of M if for any v in M\{0}, there exist t elements v1, v2, ..., vt ∈ U, t ≤ k, together with a1, a2, ..., at ∈ Zm such that
v = v1a1 + v2a2 + ... + vtat.
Let G be a digital image, and let CG be the set of all colours of pixels in G. Consider the case m = 2 and G is a binary image. Then CG = {0, 1}, and for a positive integer n, the set M = Z2^n = {(x1, x2, ..., xn) | xi ∈ Z2, i = 1, ..., n} with elementwise addition and scalar multiplication defined as usual is a module over the ring Z2 [49]. For k = 1, the set U = M\{0} is the unique 1-base of M [51]. Two functions Next: CG → CG and Val: CG → Z2 satisfying the condition Val(c) = Val(Next(c)) + 1 on the ring Z2 are defined in [49]. Suppose that for N ≥ |U|, I = {I1, I2, ..., IN} is an arbitrary image block of G, K = {K1, K2, ..., KN | Ki ∈ Z2, i = 1, ..., N} is a secret key, d is any element of M, and h is a surjective function from I to U. In the module method, d is considered as secret data, which are embedded in and extracted from the image block I with the key K by the blocks Embed and Extract as follows [49, 51].
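The block Embed (embedding d in I):
Step 1) Compute m = Σ_{i=1..N} h(Ii)(Val(Ii) + Ki);
Step 2) Case d = m: Keep I intact;
Case d ≠ m: Find v ∈ U such that d + (−m) = v. Based on v and h, determine an element Ii of I with h(Ii) = v (it exists since h is surjective). Then change Ii to Ii' = Next(Ii);
Return I';
The block Extract (extracting d from I'): d = Σ_{i=1..N} h(Ii')(Val(Ii') + Ki);
Extraction is correct because changing Ii to Next(Ii) increases Val(Ii) by 1, which adds h(Ii) = v to the sum of Step 1 and turns m into m + v = d.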
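For a binary image with k = 1 and N = 2^n − 1, the blocks Embed and Extract can be sketched in Python as follows. Several choices here are assumptions for illustration only: vectors of Z2^n are encoded as n-bit integers, so addition (and negation) in Z2^n is bitwise XOR; Val(c) = c and Next(c) = 1 − c are used for binary pixels (this pair satisfies Val(c) = Val(Next(c)) + 1, but the definitions in [49] may differ); and the surjection `h` from pixel positions onto U is hypothetical.

```python
from typing import List


def h(i: int, n: int) -> int:
    """Hypothetical surjection from pixel positions onto U = Z2^n minus {0}.

    Nonzero vectors of Z2^n are encoded as the integers 1 .. 2^n - 1.
    """
    return (i % (2 ** n - 1)) + 1


def weighted_sum(block: List[int], key: List[int], n: int) -> int:
    """Sum over i of h(I_i) * (Val(I_i) + K_i) in Z2^n, with Val(c) = c assumed."""
    m = 0
    for i, (pixel, k) in enumerate(zip(block, key)):
        if (pixel + k) % 2 == 1:
            m ^= h(i, n)                 # addition in Z2^n is bitwise XOR
    return m


def embed(block: List[int], key: List[int], d: int, n: int) -> List[int]:
    """Embed the n-bit secret d by changing at most one pixel."""
    assert len(block) >= 2 ** n - 1 and 0 <= d < 2 ** n
    m = weighted_sum(block, key, n)
    out = list(block)
    if d != m:
        v = d ^ m                        # d + (-m) in Z2^n
        i = next(j for j in range(len(block)) if h(j, n) == v)
        out[i] = 1 - out[i]              # Next(c) = 1 - c for binary pixels
    return out


def extract(block: List[int], key: List[int], n: int) -> int:
    return weighted_sum(block, key, n)


block = [0, 1, 1, 0, 1, 0, 0]            # N = 7 = 2^3 - 1 binary pixels
key = [1, 0, 1, 1, 0, 0, 1]
stego = embed(block, key, 5, 3)
print(extract(stego, key, 3))
```

With n = 3, this hides 3 bits in a block of 7 pixels by changing at most one pixel, matching the parameters of the scheme (1, 2^n − 1, n).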
Definition 1.4 ([49]) MSDRk(N) is the largest number of bits of secret data that can be embedded in an image block of N pixels by changing the colours of at most k pixels in the image block, where k and N are positive integers.
Let qcolour be a positive integer equal to the number of different ways to change the colour of each pixel in an arbitrary image block of N pixels. According to [49],
MSDRk(N) = ⌊log2(1 + qcolour·C(N, 1) + qcolour^2·C(N, 2) + · · · + qcolour^k·C(N, k))⌋, (1.3)
where C(N, i) denotes the binomial coefficient "N choose i".
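Formula (1.3) can be evaluated exactly with integer arithmetic, as in this short Python sketch (the function name `msdr` is hypothetical). Using `int.bit_length() - 1` gives the exact floor of log2 for a positive integer and avoids floating-point rounding.

```python
from math import comb


def msdr(k: int, n_pixels: int, q_colour: int) -> int:
    """MSDR_k(N) = floor(log2(sum over i = 0..k of q_colour^i * C(N, i)))."""
    total = sum(q_colour ** i * comb(n_pixels, i) for i in range(k + 1))  # i = 0 gives the leading 1
    return total.bit_length() - 1   # exact floor(log2(total)) for total >= 1


print(msdr(1, 7, 1))   # binary image, one changed pixel, N = 2^3 - 1
print(msdr(2, 9, 3))   # qcolour = 3, at most two changed pixels, N = 9
```

These two values, 3 and 8, agree with the parameters of the schemes (1, 2^n − 1, n) and (2, 9, 8) listed among the tables of Chapter 2.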
1.3 Exact Pattern Matching
This section restates the exact pattern matching problem and recalls the concept of the degree of fuzziness (appearance) used in Chapter 3 [24, 52, 68].
Let x be a string of length n. Denote the substring x[i]x[i + 1]...x[j] of x by x[i..j] for all 1 ≤ i ≤ j ≤ n, and the ith element of x by x[i]; i is called a position in x. Let p be a substring of length m of x, where m is a positive integer. Then there exists i with 1 ≤ i ≤ n − m + 1 such that p = x[i..i + m − 1], and we say that i is an occurrence of p in x, or that p occurs in x at position i.
Definition 1.5 ([68]) Let p be a pattern of length m and x be a text of length n over the alphabet Σ. Then the exact pattern matching problem is to find all occurrences of the pattern p in x.
The following example uses the Brute Force (BF) algorithm [24] to demonstrate the most straightforward way of solving this problem.
Table 1.2 The performing steps of the BF algorithm
Example 1.4 Given a pattern p = fah and a text x = dfahfkfaha, there are two occurrences of p in x, at positions 2 and 7. The BF algorithm is performed by the steps presented in Table 1.2, where the bold letters correspond to the mismatches and the underlined letters represent the matches when comparing the letters of the pattern and the text. Many letters are scanned more than once by the BF algorithm because, each time either a mismatch or a match occurs, the pattern is shifted to the right by only one position.
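The BF algorithm can be written as a short Python sketch. The function name is hypothetical, and positions are reported 1-based to match the text. The window is shifted by exactly one position after every comparison run, which is why letters of x are rescanned.

```python
from typing import List


def brute_force(p: str, x: str) -> List[int]:
    """All 1-based occurrences of p in x, trying every window from left to right."""
    m, n = len(p), len(x)
    occ: List[int] = []
    for i in range(n - m + 1):           # window start (0-based)
        j = 0
        while j < m and p[j] == x[i + j]:
            j += 1                       # compare letters until a mismatch
        if j == m:
            occ.append(i + 1)            # full match at position i + 1
        # Whether matching or not, the pattern moves right by one position.
    return occ


print(brute_force("fah", "dfahfkfaha"))
```

On the data of Example 1.4, the function returns [2, 7]. Its worst-case cost is O(mn) letter comparisons.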
Chapter 3 uses the degree of fuzziness from [52] to determine, at every position of the text, the longest prefix of the pattern ending there. However, this terminology can be misleading, so throughout this dissertation the degree of fuzziness is replaced with the degree of appearance. The concept of the degree of appearance is restated as follows.
Definition 1.6 ([52]) Let p be a pattern and x be a text of length n over the alphabet Σ. Then for each 1 ≤ i ≤ n, the degree of appearance of p in x at position i is the length of a longest substring of x that is a prefix of p and whose right end letter is x[i].
Clearly, if the degree of appearance of p in x at a position i equals |p|, then p occurs in x at position i − |p| + 1. Figure 1.3 illustrates the concept of the degree of appearance of the pattern p in x.
Figure 1.5 The degree of appearance of the pattern p
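A direct, quadratic-time sketch of Definition 1.6 follows. It is not the method used in Chapter 3 or in [52]; it simply tests every candidate length at each position, which is enough to reproduce the degrees for the text of Example 1.4 and the matching rule stated above.

```python
def degrees_of_appearance(p: str, x: str) -> list[int]:
    """For each 1-based position i of x: the length of the longest substring
    of x that ends at x[i] and is a prefix of p (Definition 1.6)."""
    degrees = []
    for i in range(1, len(x) + 1):
        best = 0
        for length in range(1, min(len(p), i) + 1):
            # x[i-length+1 .. i] in 1-based terms is x[i-length:i] in Python
            if x[i - length:i] == p[:length]:
                best = length
        degrees.append(best)
    return degrees

deg = degrees_of_appearance("fah", "dfahfkfaha")
print(deg)  # [0, 1, 2, 3, 1, 0, 1, 2, 3, 0]
# A degree equal to |p| at position i means p occurs at i - |p| + 1:
print([i - 3 + 1 for i, d in enumerate(deg, start=1) if d == 3])  # [2, 7]
```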
1.4 Longest Common Subsequence
This section recalls the longest common subsequence (LCS) problem and the Knapsack Shaking approach to it, which is developed further in Chapter 4 [24, 47, 94, 101].
Definition 1.7 ([101]) Let p be a string of length m and u be a string over the alphabet Σ. Then u is a subsequence of p if there exists an integer sequence j_1, j_2, ..., j_t such that 1 ≤ j_1 < j_2 < · · · < j_t ≤ m and u = p[j_1]p[j_2] · · · p[j_t].
(i) u is a common subsequence of p and x,
(ii) There does not exist any common subsequence v of p and x such that |v| > |u|
Denote an arbitrary longest common subsequence of p and x by LCS(p, x). The length of LCS(p, x) is denoted by lcs(p, x).
Problem 1 Find a longest common subsequence of p and x
Problem 2 Compute the length of a longest common subsequence of p and x
A simple way to solve the LCS problem is to use the algorithm introduced by Wagner and Fischer in 1974 (called the Algorithm WF). This algorithm defines a dynamic programming matrix L(m, n) recursively to find an LCS(p, x) and compute lcs(p, x) as follows [94].
Example 1.6 Let p = bgcadb and x = abhcbad. Using the Algorithm WF, the matrix L(m, n) shown in Table 1.3 is obtained. Then lcs(p, x) = L(6, 7) = 4. In Table 1.3, a traceback procedure starting from the value 4 and going back to the value 1 yields the LCS(p, x) bcad.
Table 1.3 The dynamic programming matrix L
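The recurrence of the Algorithm WF is not reproduced above, so the sketch below uses the standard LCS recurrence, which we assume matches [94]: L(i, 0) = L(0, j) = 0, L(i, j) = L(i − 1, j − 1) + 1 if p[i] = x[j], and L(i, j) = max(L(i − 1, j), L(i, j − 1)) otherwise. The traceback moves up on ties; with that rule it returns bcad for Example 1.6, while a different tie-breaking rule may return another LCS of the same length.

```python
def lcs_wf(p: str, x: str) -> tuple[int, str]:
    m, n = len(p), len(x)
    # L[i][j] = lcs(p[1..i], x[1..j]); row 0 and column 0 stay 0.
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if p[i - 1] == x[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Traceback from L[m][n] to recover one LCS(p, x).
    i, j, out = m, n, []
    while i > 0 and j > 0:
        if p[i - 1] == x[j - 1]:
            out.append(p[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1                      # ties: move up
        else:
            j -= 1
    return L[m][n], "".join(reversed(out))

print(lcs_wf("bgcadb", "abhcbad"))  # (4, 'bcad')
```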
Definition 1.10 ([47]) Let u = p[j_1]p[j_2] · · · p[j_t] be a subsequence of p. Then the tuple (j_1, j_2, ..., j_t) is called a location of u in p.
By Definition 1.10, the subsequence u has at least one location in p. If all the different locations of u are arranged in dictionary (lexicographic) order, the least element is called the leftmost location of u, denoted by LeftID(u). The last component of LeftID(u) is denoted by Rmp(u) [47].
Example 1.7 Let p = aabcadabcd and u = abd. Then u is a subsequence of p and has eight different locations in p; in dictionary order they are
(1, 3, 6), (1, 3, 10), (1, 8, 10), (2, 3, 6), (2, 3, 10), (2, 8, 10), (5, 8, 10), (7, 8, 10)
It follows that LeftID(u) = (1, 3, 6) and Rmp(u) = 6
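Example 1.7 can be checked by brute force. The sketch enumerates index tuples with itertools.combinations, which yields them in lexicographic order, so the first matching tuple is LeftID(u). It is exponential in |u| and intended only for small examples; for longer strings, LeftID(u) can instead be found by a single greedy left-to-right scan of p.

```python
from itertools import combinations

def locations(u: str, p: str) -> list[tuple[int, ...]]:
    """All locations (j1, ..., jt) of u in p (Definition 1.10), 1-based,
    listed in dictionary order."""
    # combinations() yields index tuples in lexicographic order.
    return [
        tuple(j + 1 for j in idx)
        for idx in combinations(range(len(p)), len(u))
        if all(p[j] == c for j, c in zip(idx, u))
    ]

locs = locations("abd", "aabcadabcd")
left_id = locs[0]          # LeftID(u): the least location
rmp = left_id[-1]          # Rmp(u): the last component of LeftID(u)
print(len(locs), left_id, rmp)  # 8 (1, 3, 6) 6
```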
Definition 1.11 ([47]) Let p be a string of length m. Then a configuration C of p is defined as follows.
1. Either C is the empty set; then C is called the empty configuration of p, denoted by C_0.
2. Or C = {x_1, x_2, ..., x_t} is an ordered set of t subsequences of p for 1 ≤ t ≤ m such that the following two conditions are satisfied:
(i) For all 1 ≤ i ≤ t, |x_i| = i,
(ii) For all x_i, x_j ∈ C, if |x_i| > |x_j| then Rmp(x_i) > Rmp(x_j).
The set of all configurations of p is denoted by Config(p).
Definition 1.12 ([47]) Let p be a string of length m over the alphabet Σ, C ∈ Config(p) and a ∈ Σ. Then the state transition function ϕ : Config(p) × Σ → Config(p) is defined as follows.
In 2002, P. T. Huy et al. introduced a method for solving Problem 1 using the automaton given in the following theorem, and named their method the Knapsack Shaking approach [47].
Theorem 1.1 ([47]) Let p and x be two strings of lengths m and n over the alphabet Σ, m ≤ n. Let A_p = (Σ, Q, q_0, ϕ, F) be the automaton over Σ corresponding to p, where
• The set of states Q = Config(p),
• The initial state q_0 = C_0,
• The transition function ϕ is given as in Definition 1.12,
• The set of final states F = {C_n}, where C_n = ϕ(q_0, x).
Suppose C_n = {x_1, x_2, ..., x_t} for 1 ≤ t ≤ m. Then
1. For every common subsequence u of p and x, there exists x_i ∈ C_n, 1 ≤ i ≤ t, such that the following two conditions are satisfied:
(i) |u| = |x_i|,
(ii) Rmp(x_i) ≤ Rmp(u).
2. x_t is an LCS(p, x).
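The formula for ϕ in Definition 1.12 is not reproduced in this section, so the update rule below is an assumption: a threshold-style transition that, for each length, keeps the subsequence with the smallest Rmp. It is consistent with Definition 1.11 and the two conclusions of Theorem 1.1, but it is not necessarily the exact ϕ of [47]. Rmp of the empty subsequence is taken as 0 (also an assumption).

```python
from typing import Optional

def next_position(p: str, a: str, r: int) -> Optional[int]:
    """Smallest 1-based position j > r with p[j] == a, or None."""
    j = p.find(a, r)          # 0-based index r is 1-based position r + 1
    return None if j < 0 else j + 1

def phi(p: str, config: list, a: str) -> list:
    """Assumed transition C -> phi(C, a); config[i] = (x_{i+1}, Rmp(x_{i+1}))."""
    new = list(config)
    # Try to extend the empty subsequence and every x_i by the letter a,
    # always reading the OLD configuration so a letter of x is used at most once.
    for length, (s, r) in enumerate([("", 0)] + config):
        j = next_position(p, a, r)
        if j is None:
            continue
        if length < len(new):
            if j < new[length][1]:        # same length, smaller Rmp: replace
                new[length] = (s + a, j)
        else:
            new.append((s + a, j))        # a strictly longer subsequence
    return new

def lcs_by_configurations(p: str, x: str) -> str:
    config: list = []                      # the empty configuration C_0
    for a in x:
        config = phi(p, config, a)
    return config[-1][0] if config else "" # x_t, cf. Theorem 1.1 (2)

print(lcs_by_configurations("bgcadb", "abhcbad"))  # bcad
```

Each letter of x triggers one transition, and each transition touches at most m + 1 entries, so this sketch reads x once with O(m) work per letter.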
1.5 Searchable Encryption
This section clarifies the term searchable encryption (SE) and recalls the definition of a cryptosystem; both are studied and used in Chapter 5 [26, 40, 60, 85, 88, 102].
Consider the following problem arising in cloud security [60, 85, 102]. Cloud tenants, for example enterprises and individuals with limited software and hardware resources, store data containing sensitive information on cloud servers. Assume that these servers cannot be fully trusted: they may not only be curious about the users’ information but also abuse the data they receive. Users therefore wish to encrypt their data before uploading them to the servers. Because of the limitations of their own information technology systems, users also wish that cloud providers could perform information searches directly on the ciphertexts. However, encryption makes it difficult for servers to search the encrypted data. This leads to the problem of finding a solution that satisfies both wishes of cloud users when they choose a cloud storage service.
SE is a way to solve the above problem. It is a system consisting of two main components: a cryptosystem used for encryption and decryption on the cloud users’ side, and algorithms for searching on the encrypted data, executed on the cloud providers’ side [40, 102].
In cryptography, SE can be either searchable symmetric encryption (SSE) or searchable asymmetric encryption (SAE). In SSE, only private key holders can create encrypted data and produce trapdoors for search. In SAE, users who have the public key can create ciphertexts, but only private key holders can generate trapdoors [26, 102].
Since the dissertation proposes a new symmetric encryption system for SSE in Chapter 5, the correctness of this system needs to be proved. In this dissertation, the components and properties of a cryptosystem defined in [88] are taken as the standard form for verification. This definition is recalled here.
Definition 1.13 ([88]) A cryptosystem is a five-tuple (P, C, K, E, D) such that the following properties are satisfied:
1 P is a finite set of plaintexts,
2 C is a finite set of ciphertexts,
3 K is a finite set of secret keys,
4 For every k ∈ K, there exists an encrypting function e_k ∈ E and a corresponding decrypting function d_k ∈ D, where e_k : P → C and d_k : C → P satisfy d_k(e_k(x)) = x for each x ∈ P.
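As a small sanity check of Definition 1.13, the sketch below verifies property 4 exhaustively for the shift cipher over Z_26 (P = C = K = Z_26, e_k(x) = (x + k) mod 26, d_k(y) = (y − k) mod 26). The shift cipher is chosen only as a familiar textbook example; it is not the cryptosystem proposed in Chapter 5.

```python
# Shift cipher over Z_26: P = C = K = {0, 1, ..., 25}.
P = C = K = range(26)

def e(k: int, x: int) -> int:
    return (x + k) % 26                 # encrypting function e_k

def d(k: int, y: int) -> int:
    return (y - k) % 26                 # decrypting function d_k

# e_k maps P into C, and property 4 of Definition 1.13 holds for every k and x.
assert all(e(k, x) in C for k in K for x in P)
assert all(d(k, e(k, x)) == x for k in K for x in P)
print("d_k(e_k(x)) = x for all 26 * 26 pairs")
```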
on the Galois field GF(p^m) using graph theory and automata to design the data hiding scheme of the general form (k, N, $\lfloor \log_2 p^{mn} \rfloor$) for binary, gray and palette images under the given assumptions, where k, m, n, N are positive integers and p is prime, gives sufficient conditions for existence and proves the existence of some optimal and near optimal secret data hiding schemes. These results are derived from the concept of the maximal secret data ratio of embedded bits, the module method and the FOPA method proposed by P. T. Huy et al. in 2011, 2012 and 2013, recalled in Section 1.2 of Chapter 1. An application of the schemes to the process of hiding a finite sequence of secret data in an image is also considered. Security analyses and experimental results confirm that the proposed approach can create steganographic schemes that achieve high efficiency in embedding capacity, visual quality, speed and security, which are the key properties of steganography.
The results of Chapter 2 have been published in [T1]
2.1 Introduction
In steganography, depending on the type of digital media, there are many types of steganography, such as image, audio and video steganography [4, 5, 20, 61, 62, 75, 76, 96]. However, image steganography is the most widely used, because digital images are often transmitted over the Internet and have a high degree of redundancy. Furthermore, image steganography is mainly performed in the spatial domain, where data are hidden by directly changing the colours of some pixels in the image [17, 57, 62, 76, 100]. The chapter’s work focuses on steganography in digital images in the spatial domain.
Digital image steganography studies steganographic schemes, each of which consists of an embedding function and an extracting function. The embedding function describes how to embed secret data in a digital image, and the extracting function describes how to extract the data from the digital image carrying the embedded data [46, 87].
In digital image steganography, a few main factors must be taken into consideration when designing a new secret data hiding scheme: the embedding capacity of the cover image, the quality of the stego image, and security. However, as is well known, the embedding capacity of the cover image and the quality of its stego image conflict with each other, and a balance between the two is struck according to the requirements of each application. In addition to these three main factors, the speed of the embedding and extracting functions also plays an important role in steganographic schemes; it is considered the last constraint in determining the efficiency of schemes [46, 53, 65, 69, 87, 104].
The simplest and most popular spatial domain image steganography method is least significant bit (LSB) substitution (called the LSB based method). For 24-bit RGB and 8-bit gray images, this method embeds data in the cover image by directly changing the least significant bits of the image, which makes it vulnerable to security attacks [18, 62, 72, 75, 76, 97, 104]. The EZ Stego method for palette images is similar to the commonly used LSB based method; however, it does not guarantee the quality of stego images [36, 37, 97]. To alleviate this problem, in 1999 Fridrich proposed a new method based on the parity bits of the colour indexes of pixels in palette cover images, called the parity assignment (PA) method. The EZ Stego method can then be regarded as an example of the PA method [36, 50]. In 2000, Fridrich et al. improved the method by investigating the problem of optimal parity assignment for the palette; this version is called the optimal parity assignment (OPA) method [37]. To control the quality of stego images easily, Huy et al. introduced another OPA method, called the FOPA method, in 2013 [50].
Unlike in colour and gray images, each pixel in a binary image requires only one bit to represent its colour value (black or white), so modified pixels can be detected easily. Binary image steganography is therefore a more difficult and challenging problem. For binary images, the block based method is usually used to maintain the quality of stego images. In this method, the cover and stego images are partitioned into individual image blocks of the same size, and embedding and extracting secret data are based on characteristic values calculated for the blocks. The WL (Wu et al., 1998), PCT (Pan et al., 2000), modified PCT (Tseng et al., 2001) and CTL (Chang et al., 2005) schemes are all well-known block based schemes for binary images [21, 18, 48, 75, 92].
Given q_colour, the number of different ways to change the colour of each pixel in an arbitrary image block, and using the concept of the maximal secret data ratio of embedded bits proposed by Huy et al. in 2011 [49], the chapter introduces the concepts of optimal and near optimal secret data hiding schemes. The optimality of steganographic schemes has also been considered in [37, 46]; however, those authors used the time complexity of the embedding and extracting functions, or the concept of optimal parity assignment, which minimizes the energy of the parity assignment for the colour palette, to determine whether a steganographic scheme is optimal.
Following the block based method, a secret data hiding scheme is called a data hiding scheme (k, N, r), where k, N, r are positive integers, if its embedding function can embed r bits of secret data in each image block of N pixels by changing the colours of at most k pixels in the image block. The chapter’s work is concerned with the problem of designing optimal or near optimal data hiding schemes (k, N, r) for digital images (binary, gray and palette images).
Based on the module approach and the FOPA method using graph theory, proposed by Huy et al. in 2011 and 2013 [49, 50], the chapter proposes a new approach based on the Galois field using graph theory and automata to solve this problem. With this approach, the chapter proposes the optimal data hiding scheme (1, 2^n − 1, n) for binary, gray and palette images with q_colour = 1, where n is a positive integer, as well as the near optimal data hiding scheme (2, 9, 8) and the optimal data hiding scheme (1, 5, 4) for gray and palette images with q_colour = 3. Security analyses show that applying these schemes to the process of hiding a finite sequence of secret data in an image can resist brute-force attacks.
The experimental results reveal that the efficiency in embedding capacity and visual quality of the near optimal data hiding scheme (2, 9, 8) for gray images with q_colour = 3 is better than that of the HCIH scheme [104]. The embedding and extracting times of the proposed approach are shorter than those of Chang et al.’s approach [18]. For the near optimal data hiding scheme (2, 9, 8) for palette images with q_colour = 3 and the optimal data hiding scheme (1, 2^n − 1, n) for palette images with q_colour = 1, values of ER can be selected suitably to achieve acceptable quality of the stego images.
The rest of the chapter is organized as follows. Section 2.2 gives some new concepts and states the chapter’s digital image steganography problem. Section 2.3 consists of two subsections, 2.3.1 and 2.3.2. Subsection 2.3.1 introduces a mathematical basis built on the Galois field GF(p^m) for the digital image steganography problem, where p is prime and m is a positive integer. Subsection 2.3.2 first proposes a digital image steganography approach based on the Galois field GF(p^m) using graph theory and automata to design the data hiding scheme of the general form (k, N, $\lfloor \log_2 p^{mn} \rfloor$) under the given assumptions, where k, m, n, N are positive integers and p is prime. Secondly, the subsection gives sufficient conditions for the existence of the optimal data hiding schemes (1, $\frac{p^{mn}-1}{p^m-1}$, $\lfloor \log_2 p^{mn} \rfloor$) and (2,
of hiding a finite sequence of secret data of length $\lfloor \log_2 p^{mn} \rfloor$ bits in an image is considered. Section 2.4 proves that there exist the near optimal data hiding scheme (2, 9, 8) and the optimal data hiding scheme (1, 5, 4) for gray and palette images with q_colour = 3. Section 2.5 presents experimental results evaluating the efficiency of the proposed data hiding schemes and approach. Lastly, Section 2.6 draws some conclusions from the proposed approach and the experimental results.
2.2 The Digital Image Steganography Problem
This section gives some new concepts and states the chapter’s digital image steganography problem.
Definition 2.1 A block based secure data hiding scheme in digital images (for short, a data hiding scheme) is a five-tuple (I, M, K, Em, Ex) satisfying the following conditions:
1 I is a set of all image blocks with the same size and image type,
2 M is a finite set of secret elements,
3 K is a finite set of secret keys,
4 Em is an embedding function to embed a secret element in an image block,
Definition 2.2 A data hiding scheme (I, M, K, Em, Ex) is called a data hiding scheme (k, N, r), where k, N, r are positive integers, if each image block in I has N pixels and the embedding function Em can embed r bits of secret data in an arbitrary image block by changing the colours of at most k pixels in the image block.
Definition 2.3 For a given q_colour, a data hiding scheme (k, N, r) is called an optimal data hiding scheme if r = MSDR_k(N) and there does not exist a positive integer N′ such that N′ < N and r = MSDR_k(N′). Then N is denoted by N_optimum.
Definition 2.4 For a given q_colour, a data hiding scheme (k, N, r) is called a near optimal data hiding scheme if r = MSDR_k(N) and N > N_optimum.
The chapter’s digital image steganography problem: design optimal or near optimal data hiding schemes (k, N, r) for digital images (binary, gray and palette images).
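Definitions 2.3 and 2.4 can be checked numerically with formula (1.3). The sketch below (helper names are ours) searches for N_optimum as the smallest N with MSDR_k(N) = r, using the fact that MSDR_k(N) is nondecreasing in N, and returns None when MSDR_k jumps over r. For q_colour = 3 it gives N_optimum = 8 for k = 2, r = 8, so (2, 9, 8) is near optimal, and N_optimum = 5 for k = 1, r = 4, so (1, 5, 4) is optimal; for q_colour = 1 and k = 1 it gives N_optimum = 2^n − 1.

```python
from math import comb

def msdr(k, n_pixels, q_colour):
    total = 1 + sum(q_colour ** j * comb(n_pixels, j) for j in range(1, k + 1))
    return total.bit_length() - 1       # exact floor(log2(total))

def n_optimum(k, r, q_colour, n_max=10_000):
    """Smallest N with MSDR_k(N) == r, or None if MSDR_k skips the value r."""
    for n in range(1, n_max + 1):
        value = msdr(k, n_pixels=n, q_colour=q_colour)
        if value == r:
            return n
        if value > r:                   # MSDR_k(N) is nondecreasing in N
            return None
    return None

def classify(k, n_pixels, r, q_colour):
    if msdr(k, n_pixels, q_colour) != r:
        return "neither"                # Definitions 2.3 and 2.4 do not apply
    return "optimal" if n_pixels == n_optimum(k, r, q_colour) else "near optimal"

print(classify(2, 9, 8, 3))   # near optimal (N_optimum = 8)
print(classify(1, 5, 4, 3))   # optimal
print(classify(1, 7, 3, 1))   # optimal
```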
2.3 A New Digital Image Steganography Approach
This section introduces a mathematical basis built on the Galois field for the digital image steganography problem (Subsection 2.3.1); proposes a digital image steganography approach based on the Galois field using graph theory and automata to design the data hiding scheme of the general form (k, N, $\lfloor \log_2 p^{mn} \rfloor$) under the given assumptions, where k, m, n, N are positive integers and p is prime; and gives sufficient conditions for existence and proves the existence of some optimal data hiding schemes (Subsection 2.3.2). Security analyses and an application of these data hiding schemes to the process of hiding a finite sequence of secret data in an image are also considered in Subsection 2.3.2.
2.3.1 Mathematical Basis based on The Galois Field
This subsection constructs a mathematical basis built on the Galois field GF(p^m) for the digital image steganography problem, where p is prime and m is a positive integer (Propositions 2.2, 2.4 and Theorem 2.1).
Consider the Galois field GF(p^m), recalled in Subsection 1.1.4 of Chapter 1, where p is prime and m is a positive integer. Let $GF^n(p^m) = \{(x_1, x_2, \ldots, x_n) \mid x_i \in GF(p^m),\ i = 1, \ldots, n\}$, where n is a positive integer, with the two operations of vector addition + and scalar multiplication · defined as follows:
$$x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n),$$
$$ax = (ax_1, ax_2, \ldots, ax_n), \quad a \in GF(p^m),$$
where x, y ∈ GF^n(p^m), x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n). Recall that (GF^n(p^m), +, ·) is a vector space over the field GF(p^m) [13].
Definition 2.5 The class of an element x ∈ GF^n(p^m), denoted by [x], is given by
Proof Suppose [x] ∩ [y] ≠ ∅; then there exists z in [x] ∩ [y]. By Definition 2.5, z = ax = by. Since a ∈ GF(p^m)\{0}, x = a^{−1}by. Thus x ∈ [y] and therefore [x] ⊂ [y]. Similarly, [y] ⊂ [x], and hence [x] = [y].
Proposition 2.1 The set of all classes forms a partition of the set GF^n(p^m).
Proof For all x ∈ GF^n(p^m), x ∈ [x] by Definition 2.5. Thus the union of all classes is GF^n(p^m). By Lemma 2.1, any two distinct classes are disjoint. The proof is complete.
Denote the set of all classes by [GF^n(p^m)], that is, [GF^n(p^m)] = {[x] | x ∈ GF^n(p^m)}. The number of elements of a set S is denoted by |S|.
Proposition 2.2 $|[GF^n(p^m)] \setminus \{0\}| = \frac{p^{mn} - 1}{p^m - 1}$.
y, y′ ∈ [x], y ≠ y′, [x] ∈ [GF^n(p^m)]\{0}, then y = ax, y′ = bx for a, b ∈ GF(p^m)\{0}. Since y ≠ y′ and x ≠ 0, a ≠ b. Clearly, |GF(p^m)\{0}| = p^m − 1 (see [88]). Since x ≠ 0, then
b) For all v ∈ V\{0}, there exists t such that 1 ≤ t ≤ k and $v = \sum_{i=1}^{t} a_i v_i$, where v_1, v_2, ..., v_t ∈ S and a_1, a_2, ..., a_t ∈ K\{0}.
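For a prime field (m = 1), the class structure behind Proposition 2.2 and condition b) of the k-Generators definition can be checked by brute force. The sketch assumes m = 1, so that GF(p) arithmetic is ordinary arithmetic modulo p; for m > 1 the arithmetic of GF(p^m) would be needed instead. Condition a) is not reproduced in this section, so spans_within_k checks only condition b), taking the v_i to be distinct elements of S (our reading).

```python
from itertools import combinations, product

def classes(p: int, n: int) -> set:
    """Nonzero classes [x] = {a x : a in Z_p \\ {0}} of Z_p^n (m = 1, p prime)."""
    result = set()
    for x in product(range(p), repeat=n):
        if any(x):
            result.add(frozenset(tuple(a * xi % p for xi in x) for a in range(1, p)))
    return result

def spans_within_k(S: list, p: int, n: int, k: int) -> bool:
    """Condition b): every nonzero v equals a_1 v_{i1} + ... + a_t v_{it},
    1 <= t <= k, with distinct v_{ij} in S and nonzero a_j in Z_p."""
    reachable = set()
    for t in range(1, k + 1):
        for vecs in combinations(S, t):
            for coeffs in product(range(1, p), repeat=t):
                reachable.add(tuple(
                    sum(a * w[i] for a, w in zip(coeffs, vecs)) % p for i in range(n)))
    nonzero = {v for v in product(range(p), repeat=n) if any(v)}
    return nonzero <= reachable

print(len(classes(2, 3)), len(classes(3, 2)))   # 7 4, i.e. (p^n - 1)/(p - 1)
S4 = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
print(spans_within_k(S4, 2, 3, 2), spans_within_k(S4, 2, 3, 1))  # True False
```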
Lemma 2.2 Let S = {v_1, v_2, ..., v_t} be a k-Generators for the vector space GF^n(p^m). Then S′ = {[v_1], [v_2], ..., [v_t]} is a k-[Generators] for the set [GF^n(p^m)].
Proof Since S is a k-Generators for GF^n(p^m), for all distinct v, v′ ∈ S there does not exist a in GF(p^m) such that v′ = av. By Proposition 2.1 and Definition 2.5, [v] ≠ [v′] and [v] ≠ 0 for all distinct v, v′ ∈ S. For all [u] ∈ [GF^n(p^m)]\{0}, then u = Σ^k
The proof is complete
Lemma 2.3 Let S′ = {[v_1], [v_2], ..., [v_t]} be a k-[Generators] for the set [GF^n(p^m)]. Then S = {v_1, v_2, ..., v_t} is a k-Generators for the vector space GF^n(p^m).
Proof For all v ∈ GF^n(p^m)\{0}, then
i = 1, ..., k′, k′ ≤ k. For all distinct [v], [v′] ∈ S′, there does not exist a in GF(p^m) such that v′ = av, by Proposition 2.1. It means that for all distinct v, v′ ∈ S, there does not exist a in GF(p^m) such that v′ = av. The proof is complete.
Theorem 2.1 There exists a k-Generators S for the vector space GF^n(p^m) with |S| = N if and only if there exists a k-[Generators] S′ for the set [GF^n(p^m)] with |S′| = N.
Proof This is deduced immediately from Lemmas 2.2 and 2.3
Proposition 2.4 Let c be the number of k-[Generators] of N elements for the set [GF^n(p^m)]. Then the number of k-Generators of N elements for the vector space
I = {I_1, I_2, ..., I_N}, where I_i is the colour value (for binary and gray images) or the colour index in the palette (for palette images) of the ith pixel in I, i = 1, ..., N. Let C be the set of all colour values or indexes of pixels of I.
Let M be a finite set of secret elements and set M = GF^n(p^m).
Let K be a finite set of secret keys. For all K ∈ K, also assume that the structure of the key K is the same as that of the image block I, so we can write
K = {K_1, K_2, ..., K_N} for K_i ∈ GF(p^m), i = 1, ..., N.
Assume that we find a k-Generators S for GF^n(p^m) with |S| = N and
Assume that we build a flip graph G = (V, E)
From the way to determine the arc set E in Definition 2.8, assume that
Definition 2.10 Let Σ_2 = GF^n(p^m), $\mathcal{N} = \{1, 2, \ldots, N\}$, and let $2^{\mathcal{N} \times (GF(p^m) \setminus \{0\})}$ be the set of all subsets of the set $\mathcal{N} \times (GF(p^m) \setminus \{0\})$. Then δ_2 is a function such that
Remark 2.1 For the case v ≠ q, v + (−q) ≠ 0. Since S is a k-Generators for GF^n(p^m) with |S| = N, S = {v_1, v_2, ..., v_N}, there exist k′ ≤ k, v_{i_t} ∈ S with 1 ≤ i_t ≤ N, and a_t ∈ GF(p^m)\{0}, t = 1, ..., k′, such that $v + (-q) = \sum_{t=1}^{k'} a_t v_{i_t}$ (in GF^n(p^m)).
So δ_2 given in Definition 2.10 is a function.
Definition 2.11 Let I ∈ I, M ∈ M and K ∈ K. The automaton A(I, M, K) is a five-tuple (Σ, Q, q_0, δ, T), where
1 The alphabet Σ = C ∪ Σ_2,
2 The set of states $Q = \{q_i,\ i = 0, \ldots, N+1 \mid q_0 = \sum_{i=1}^{N} K_i v_i,\ q_i = \delta_1(q_{i-1}, (i, I_i)),\ i = 1, \ldots, N,\ q_{N+1} = \delta_2(q_N, M)\}$,
3 The initial state q_0,
4 The set of final states T = {q_{N+1}},
5 The transition function δ : Q × Σ → Q, δ(q_{i−1}, I_i) = q_i, i = 1, ..., N, δ(q_N, M) = q_{N+1}.
Remark 2.2 The set of states Q and the transition function δ given in Definition 2.11 are completely determined by the functions δ_1 and δ_2, so the automaton A(I, M, K) in Definition 2.11 is well defined.
Let I ∈ I be an image block, M ∈ M a secret element and K ∈ K a key. Using the automaton A(I, M, K) and the flip graph G, the two functions Em and Ex of the data hiding scheme (I, M, K, Em, Ex) are designed as follows.
The function Em (embedding M in I):
Remark 2.3 Let I′ = Em(I, M, K). By (2.5), Em changes the colours of only |q| pixels in I based on the flip graph G, so I′ ∈ I. Hence Em satisfies Definition 2.1.
The function Ex (extracting M from I′):
Proposition 2.5 For all (I, M, K) ∈ I × M × K, Ex(Em(I, M, K), K) = M.
Proof Set M′ = Ex(I′, K). By Definitions 2.9 and 2.11,
$$M' = \sum_{i=1}^{N} (\mathrm{Val}(I'_i) + K_i) v_i. \quad (2.9)$$
After implementing (2.3), $q = \sum_{i=1}^{N} (\mathrm{Val}(I_i) + K_i) v_i$. By Definitions 2.10 and 2.11, after implementing (2.4) we consider two cases of q:
If q = ∅, then (2.5) is not implemented and hence I is not changed. Thus I′ ≡ I and therefore M′ = M.
Theorem 2.2 Suppose that a k-Generators S for the vector space GF^n(p^m) is found and a flip graph G is built. Then there exists the data hiding scheme (k, N, $\lfloor \log_2 p^{mn} \rfloor$), where N = |S|.
Proof Under the assumption that a k-Generators S for GF^n(p^m) with |S| = N is found and a flip graph G is built, we construct the data hiding scheme (I, M, K, Em, Ex) based on the Galois field GF(p^m) by using the flip graph G and the automaton A(I, M, K).
Em changes the colours of at most k pixels of I to embed M in I for all I ∈ I, M ∈ M, by Definition 2.10 and Statement (2.5).
Let B be the set of all secret data of length r bits; then |B| = 2^r, and |M| = p^{mn} since M = GF^n(p^m). Suppose that we construct an injective function f : B → M. Then Em is used to embed b ∈ B in I as follows:
I′ = Em(I, M, K);
Since f is injective by our supposition, after M is extracted from I′ by Ex, the secret data b are determined uniquely via f.
Since B and M are finite sets, the injective function f exists only if |B| ≤ |M|, that is, 2^r ≤ p^{mn}, so r ≤ log_2 p^{mn}; we choose r = $\lfloor \log_2 p^{mn} \rfloor$. Thus, for r = $\lfloor \log_2 p^{mn} \rfloor$, the r bits of the secret data b can be embedded in I. By Definition 2.2, the data hiding scheme (I, M, K, Em, Ex) is a data hiding scheme (k, N, $\lfloor \log_2 p^{mn} \rfloor$). Hence the data hiding scheme (k, N, $\lfloor \log_2 p^{mn} \rfloor$) exists.
Security analysis of the proposed data hiding scheme (k, N, $\lfloor \log_2 p^{mn} \rfloor$): Assume that the parameters k, N, Em, Ex, the vector space GF^n(p^m) and the flip graph G in the data hiding scheme (k, N, $\lfloor \log_2 p^{mn} \rfloor$) are published. The secret element M is extracted from I′ by the extracting function Ex as follows:
M = Ex(I′, K),
and from Definitions 2.9 and 2.11 and by (2.9):
of elements in S also affects the formula (2.11)). The number of choices for the key K is p^{mN} because K ∈ K. Let GF be an arbitrary subset of $2^{\lfloor \log_2 p^{mn} \rfloor}$ elements of the set GF^n(p^m) and B the set of all secret data of length $\lfloor \log_2 p^{mn} \rfloor$ bits, that is, $B = \{0, 1, \ldots, 2^{\lfloor \log_2 p^{mn} \rfloor} - 1\}$ in the decimal system. Then there exists a bijective function f : B → GF. By (2.10), to recover the secret data b from the secret element M, one needs to know f. The number of choices for the bijective function f is $C_{p^{mn}}^{2^{\lfloor \log_2 p^{mn} \rfloor}} \left(2^{\lfloor \log_2 p^{mn} \rfloor}\right)!$. Then, in a brute force attack, an attacker has to try every possible combination of S, K and f in the given data hiding scheme. The number of combinations of S, K and f is
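$$c(p^m - 1)^N N!\, p^{mN} C_{p^{mn}}^{2^{\lfloor \log_2 p^{mn} \rfloor}} \left(2^{\lfloor \log_2 p^{mn} \rfloor}\right)! \quad (2.12)$$
Theorem 2.3 Suppose that a flip graph G is built. Then there exists the optimal data hiding scheme (1, $\frac{p^{mn}-1}{p^m-1}$, $\lfloor \log_2 p^{mn} \rfloor$) for q_colour = p^m − 1.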
Proof Set S′ = [GF^n(p^m)]\{0}; then S′ is a 1-[Generators] for [GF^n(p^m)] by Definition 2.6. Consider [v] ∈ S′; then S′\{[v]} is not a 1-[Generators] for [GF^n(p^m)], because [v] ∉ {[av′] | a ∈ GF(p^m), [v′] ∈ S′\{[v]}} by Proposition 2.1. Therefore
S′ is the unique 1-[Generators] for [GF^n(p^m)]\{0}, (2.13)
and |S′| = $\frac{p^{mn}-1}{p^m-1}$ by Proposition 2.2. By Theorem 2.1, there exists a 1-Generators S for GF^n(p^m) with |S| = |S′| = $\frac{p^{mn}-1}{p^m-1}$. By (2.13) and Theorem 2.1, there does not exist another 1-Generators S″ for GF^n(p^m) with |S″| < |S|, so
S is a 1-Generators for GFn(pm) with the smallest number of elements (2.14)
By Assumption (2.1), q_colour = p^m − 1, and for k = 1, N = |S| = $\frac{p^{mn}-1}{p^m-1}$, we obtain
Proposition 2.6 For every positive integer n, there exists the optimal data hiding scheme (1, 2^n − 1, n) for binary, gray and palette images with q_colour = 1.
Proof For q_colour = 1, (2.1) gives p = 2 and m = 1. If we build a flip graph G, then there exists the optimal data hiding scheme (1, 2^n − 1, n) with q_colour = 1 by Theorem 2.3. The Galois field GF(p^m) = GF(2) is the same as the field Z_2 (see [88]). Next, we show how to build flip graphs G = (V, E) on the field Z_2 for binary, gray and palette images.
For binary images, C = {0, 1}, where c_p ∈ C is the colour value of a pixel:
• V = C, and each vertex v ∈ V is assigned a weight by a function Val such that Val(v) = v;
• E = {(c_p, c_p′) | c_p, c_p′ ∈ V, c_p ≠ c_p′}, and every arc (c_p, c_p′) has the same weight 1.
For gray images, C = {0, 1, ..., 255}, where c_p ∈ C is the colour value of a pixel:
• V = C, and each vertex v ∈ V is assigned a weight by a function Val such that Val(v) = v mod 2;
• E = {(255, 254), (c_p, c_p + 1) | c_p ∈ V, 0 ≤ c_p ≤ 254}, and every arc (c_p, c_p′) is assigned the same weight 1.
For palette images, C = {0, 1, ..., 2^t − 1}, where t is the number of bits used to represent colour indexes and c_p ∈ C is the colour index of a pixel. The palette is P = {p_0, p_1, ..., p_{2^t−1}}, where p_i ∈ P is the colour corresponding to the colour index i, i = 0, ..., 2^t − 1. To unify notation throughout this dissertation, the function Val of the FOPA method, recalled in Section 1.2 of Chapter 1, is renamed Val_p, and we set Val(c_p) = Val_p(p), where the colour index c_p ∈ C corresponds to the colour p ∈ P.
• Let G be the rho forest built by the algorithm for FOPA, with all colours of the rho forest replaced by their colour indexes, and assign the same weight 1 to all arcs of G.
By Definition 2.8, it is not difficult to verify that the graphs G for binary, gray and palette images built as above are all flip graphs on the field Z_2. Hence there exists the optimal data hiding scheme (1, 2^n − 1, n) for binary, gray and palette images with q_colour = 1.
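The following sketch illustrates the scheme (1, 2^n − 1, n) for gray images with q_colour = 1. It is not the dissertation's Em and Ex, whose full statements are not reproduced in this section; it rests on three assumptions: S consists of all 2^n − 1 nonzero vectors of Z_2^n, ordered so that v_i is the binary representation of i; a secret element M is an n-bit integer; and the key entries K_i are bits. The gray flip graph is taken with the arc (c, c + 1) for every 0 ≤ c ≤ 254 plus the arc (255, 254). Extraction evaluates the sum of (Val(I_i) + K_i)v_i as in (2.9); embedding computes M + (−q) and, because S contains every nonzero vector, moves exactly one pixel along its arc.

```python
import random

def val(c: int) -> int:
    return c % 2                        # vertex weight Val(c) in the gray flip graph

def flip(c: int) -> int:
    return c + 1 if c < 255 else 254    # the out-arc (c, c+1), or (255, 254)

def extract(block: list[int], key: list[int]) -> int:
    """M = sum_i (Val(I_i) + K_i) v_i over GF(2)^n, where v_i is i written in binary."""
    m = 0
    for i, (c, k) in enumerate(zip(block, key), start=1):
        if (val(c) + k) % 2:            # coefficient is 1 in GF(2)
            m ^= i                      # vector addition in GF(2)^n is bitwise XOR
    return m

def embed(block: list[int], secret: int, key: list[int]) -> list[int]:
    diff = secret ^ extract(block, key) # M + (-q), and -q = q in characteristic 2
    stego = list(block)
    if diff:                            # diff is itself a generator: v_diff = diff
        stego[diff - 1] = flip(stego[diff - 1])
    return stego

n = 3
N = 2 ** n - 1                          # the scheme (1, 7, 3)
rng = random.Random(1)
for _ in range(1000):
    block = [rng.randrange(256) for _ in range(N)]
    key = [rng.randrange(2) for _ in range(N)]
    secret = rng.randrange(2 ** n)
    stego = embed(block, secret, key)
    assert extract(stego, key) == secret
    assert sum(a != b for a, b in zip(block, stego)) <= 1
print("ok")
```

For binary images, the same sketch applies with val as the identity and flip as c ↦ 1 − c.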
Notice that if we set N = 2^n − 1, then the data hiding scheme (1, 2^n − 1, n) becomes the data hiding scheme (1, N, ⌊log_2(N + 1)⌋). Recall that for a positive integer N, the data hiding scheme (1, N, ⌊log_2(N + 1)⌋) for binary images with q_colour = 1 is the CTL data hiding scheme [18]. Hence Proposition 2.6 shows that the CTL scheme is optimal for N = 2^n − 1, where n is a positive integer.
Theorem 2.4 Suppose that a 2-Generators S for the vector space GF^n(p^m) with
Suppose the data hiding scheme (2, N, r) is optimal for q_colour = p^m − 1; then
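$$r = \mathrm{MSDR}_2(N) = \left\lfloor \log_2\left(1 + q_{colour} C_N^1 + q_{colour}^2 C_N^2\right) \right\rfloor = \left\lfloor \log_2\left(1 + q_{colour} N + \frac{q_{colour}^2 N(N-1)}{2}\right) \right\rfloor,$$
so that $1 + q_{colour} N + \frac{q_{colour}^2 N(N-1)}{2} \ge 2^r$, that is,
$$\frac{q_{colour}^2}{2} N^2 - \frac{q_{colour}(q_{colour} - 2)}{2} N + (1 - 2^r) \ge 0.$$
Since q_colour ≥ 1, $\frac{q_{colour}^2}{2} > 0$. To have (2.16), we let N hold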
Given an image F used as a carrier in which to embed a secret data sequence, partition F into disjoint image blocks of N pixels, F = {F_1, F_2, ..., F_{t_2}}. Let D = D_1 D_2 ... D_{t_3} be a secret data sequence to be embedded in the cover image F, where D_i is secret data of length $\lfloor \log_2 p^{mn} \rfloor$ bits, i = 1, ..., t_3. Since each block of $\lfloor \log_2 p^{mn} \rfloor$ bits of secret data is embedded in only one image block of F, t_3 ≤ t_2.
Let Jump : {1, 2, ..., t_2} → {1, 2, ..., t_2} be a bijective function used to determine the order of blocks in F in the process of hiding D in F.
Let GF be an arbitrary subset of $2^{\lfloor \log_2 p^{mn} \rfloor}$ elements of the set GF^n(p^m) and B the set of all secret data of length $\lfloor \log_2 p^{mn} \rfloor$ bits, that is, $B = \{0, 1, \ldots, 2^{\lfloor \log_2 p^{mn} \rfloor} - 1\}$ in the decimal system. Then there exists a bijective function f : B → GF.
In real applications, when the data hiding scheme (k, N, $\lfloor \log_2 p^{mn} \rfloor$) based on the proposed approach is applied to the process of hiding D in F, a secret key set K = {K_1, K_2, ..., K_{t_1}} is used instead of a single secret key. The process of hiding D in F using the data hiding scheme (k, N, $\lfloor \log_2 p^{mn} \rfloor$) consists of the embedding algorithm EmDF and the extracting algorithm ExDF, proposed as follows.
The embedding algorithm EmDF (embedding a secret data sequence D in F):
F′ = F; // F′ is called the stego image
The extracting algorithm ExDF (extracting the secret data sequence D embedded in F′):
t = 1;
For i = 1 to t_3 Do
{
M = Ex(F′_{Jump(i)}, K_t); // Use the automaton A(F′_{Jump(i)}, M, K_t) (2.24)
D_i = f^{−1}(M); // f^{−1} is the inverse function of f (2.25)
}
D = D_1 D_2 ... D_{t_3};
Proposition 2.7 Given a cover image F, a secret data sequence D, a bijective function Jump, a bijective function f, a secret key set K and the data hiding scheme (k, N, $\lfloor \log_2 p^{mn} \rfloor$) based on the proposed approach as above, suppose the stego image F′ is generated after D is embedded in F by the embedding algorithm EmDF. Then the data sequence D′ extracted from F′ by the extracting algorithm ExDF is exactly the secret data sequence D.
Proof By (2.21) and (2.23), EmDF in (2.22) and ExDF in (2.24) use the same secret key K_t. The bijective function Jump guarantees that Jump(i) ≠ Jump(j) for all i, j ∈ {1, 2, ..., t_3} with i ≠ j, which means that each image block in F is used at most once in the process of hiding. By Proposition 2.5, M extracted by (2.24) is the same as M embedded by (2.22). Then the bijective function f guarantees that D_i encoded by (2.20) is the same as D_i decoded by (2.25), i ∈ {1, 2, ..., t_3}. This completes the proof.
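The whole process of Proposition 2.7 can be simulated on small binary blocks. The sketch below is hypothetical in several places, because the body of EmDF and the key-update step of ExDF are not reproduced in this section: it uses the (1, 7, 3) scheme over GF(2) with v_i being the binary representation of i, a seeded random permutation (0-based) as Jump, a key index that cycles through the key set K (assumed rule), and f taken as the identity, which is a bijection here because 2^r = p^{mn} = 8.

```python
import random

def extract_block(block, key):
    m = 0
    for i, (c, k) in enumerate(zip(block, key), start=1):
        if (c + k) % 2:                 # binary pixel: Val(c) = c
            m ^= i                      # v_i = binary representation of i
    return m

def embed_block(block, secret, key):
    diff = secret ^ extract_block(block, key)
    out = list(block)
    if diff:
        out[diff - 1] ^= 1              # binary flip graph: 0 <-> 1
    return out

def em_df(F, D, jump, keys):
    """Hypothetical EmDF: embed D_i into block F_Jump(i) under key K_t."""
    stego = [list(b) for b in F]        # F' = F
    for i, d_i in enumerate(D):
        key = keys[i % len(keys)]       # assumed key-cycling rule
        j = jump[i]
        stego[j] = embed_block(stego[j], d_i, key)   # f taken as the identity
    return stego

def ex_df(stego, t3, jump, keys):
    """Hypothetical ExDF mirroring (2.24) and (2.25)."""
    return [extract_block(stego[jump[i]], keys[i % len(keys)]) for i in range(t3)]

n, N = 3, 7                             # scheme (1, 7, 3) on binary blocks
rng = random.Random(7)
F = [[rng.randrange(2) for _ in range(N)] for _ in range(10)]    # t2 = 10 blocks
D = [rng.randrange(2 ** n) for _ in range(6)]                    # t3 = 6 <= t2
jump = list(range(len(F)))
rng.shuffle(jump)                                                # a bijective Jump
keys = [[rng.randrange(2) for _ in range(N)] for _ in range(3)]  # t1 = 3 keys
F_stego = em_df(F, D, jump, keys)
assert ex_df(F_stego, len(D), jump, keys) == D
print("recovered", D)
```

Because Jump is a bijection, no block is written twice, which is exactly the property the proof above relies on.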
Security analysis of the process of hiding D in F: Assume that the parameters k, N, Em, Ex, the vector space GF^n(p^m) and the flip graph G in the data hiding scheme (k, N, $\lfloor \log_2 p^{mn} \rfloor$) are published. The secret element M is extracted from F′_{Jump(i)} by (2.24); we have
M = Ex(F′_{Jump(i)}, K_t),
and from Definitions 2.9 and 2.11 and by (2.9), we obtain
of choices for the k-Generators S is c(p^m − 1)^N N!. The numbers of choices for the key set K and the two bijective functions Jump and f are p^{m t_1 N}, t_2! and $C_{p^{mn}}^{2^{\lfloor \log_2 p^{mn} \rfloor}} \left(2^{\lfloor \log_2 p^{mn} \rfloor}\right)!$, respectively (see the security analysis of the data hiding scheme (k, N, $\lfloor \log_2 p^{mn} \rfloor$) above). Then, in a brute force attack, an attacker has to try every possible combination of S, K, Jump and f in the given process of hiding. The number of combinations of S, K, Jump and f is
Following the construction of the Galois field GF(p^m) from the polynomial ring Z_p[x], where p is prime and m is a positive integer [88], consider the case p = m = 2 and use the irreducible polynomial g(x) = x^2 + x + 1 in Z_2[x] to construct the Galois field GF(2^2) from Z_2[x]. We obtain
GF(2^2) = {0, 1, x, x + 1}
with the two operations, addition + and multiplication ·, defined as in Z_2[x] followed by reduction modulo g(x).
Notice that the polynomial g(x) is irreducible in Z_2[x]. Indeed, if g(x) had non-constant factors, they would be polynomials of degree 1, and hence g(x) would have a root in Z_2; this cannot happen because g(0) = g(1) = 1.
To save memory space, this section writes every polynomial of GF(2^2) as the sequence of its coefficients and denotes that sequence by a binary string and a decimal number, as in Table 2.1.
From Table 2.1, for convenience in programming, GF(2^2) is hereafter considered in the decimal system as GF(2^2) = {0, 1, 2, 3}. Then the two operations in GF(2^2) are presented in Table 2.2, and GF^4(2^2) is represented by GF^4(2^2) = {0, 1, ..., 255}. Thus the two operations of vector addition + and scalar multiplication · on GF^4(2^2) are completely determined by the operations on the Galois field GF(2^2) in Table 2.2.
Table 2.2 Operations + and · on the Galois field GF (22)
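Since the contents of Tables 2.1 and 2.2 are not reproduced here, the sketch below assumes the natural encoding in which a polynomial a_1x + a_0 is written as the binary string a_1a_0 and read as a decimal number, so x ↦ 2 and x + 1 ↦ 3. Under that assumption, addition is bitwise XOR and multiplication is carry-less multiplication followed by reduction modulo g(x) = x^2 + x + 1. A vector of GF^n(2^2) is packed into an integer with two bits per component (the component order is also an assumption), so vector addition on GF^4(2^2) = {0, ..., 255} is simply XOR of bytes.

```python
def gf4_add(a: int, b: int) -> int:
    return a ^ b                         # coefficient-wise addition in Z_2

def gf4_mul(a: int, b: int) -> int:
    """Product in GF(2^2) = Z_2[x]/(x^2 + x + 1); bit 1 holds the coefficient of x."""
    prod = 0
    for shift in range(2):               # carry-less (polynomial) multiplication
        if (b >> shift) & 1:
            prod ^= a << shift
    if prod & 0b100:                     # reduce with x^2 = x + 1
        prod ^= 0b111
    return prod

def gf4_scale(a: int, v: int, n: int) -> int:
    """Scalar multiplication on GF^n(2^2), two bits per component (assumed packing)."""
    out = 0
    for j in range(n):
        out |= gf4_mul(a, (v >> (2 * j)) & 0b11) << (2 * j)
    return out

for a in range(4):
    print([gf4_mul(a, b) for b in range(4)])
# [0, 0, 0, 0]
# [0, 1, 2, 3]
# [0, 2, 3, 1]
# [0, 3, 1, 2]

# Nonzero classes of GF^2(2^2): (4^2 - 1) / (4 - 1) = 5, cf. Proposition 2.2.
gf4_classes = {frozenset(gf4_scale(a, v, 2) for a in (1, 2, 3)) for v in range(1, 16)}
print(len(gf4_classes))  # 5
```

With n = 2, the count of nonzero classes is 5, which matches N in the optimal scheme (1, 5, 4) for q_colour = 3 = 2^2 − 1 obtained from Theorem 2.3.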