Research on development of methods of graph theory and automata in steganography and searchable encryption


MINISTRY OF EDUCATION AND TRAINING

HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY


Nguyen Huy Truong

RESEARCH ON DEVELOPMENT OF METHODS OF GRAPH THEORY AND AUTOMATA IN STEGANOGRAPHY AND SEARCHABLE ENCRYPTION

DOCTORAL DISSERTATION IN MATHEMATICS AND INFORMATICS

Hanoi - 2020

Major: Mathematics and Informatics

Major code: 9460117

SUPERVISORS:

1. Assoc. Prof. Dr. Sc. Phan Thi Ha Duong

2. Dr. Vu Thanh Nam


I am extremely grateful to Assoc. Prof. Dr. Sc. Phan Thi Ha Duong.

I want to thank Dr. Vu Thanh Nam.

I would also like to extend my deepest gratitude to the late Assoc. Prof. Dr. Phan Trung Huy.

I would like to thank my co-workers from the School of Applied Mathematics and Informatics, Hanoi University of Science and Technology, for all their help.

I also wish to thank the members of the Seminar on Mathematical Foundations for Computer Science at the Institute of Mathematics, Vietnam Academy of Science and Technology, for their valuable comments and helpful advice.

I give thanks to the PhD students of the late Assoc. Prof. Dr. Phan Trung Huy for sharing and exchanging information on steganography and searchable encryption.

Finally, I must also thank my family for supporting all my work.

LIST OF SYMBOLS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
INTRODUCTION

CHAPTER 1 PRELIMINARIES
1.1 Basic Structures
1.1.1 Strings
1.1.2 Graph
1.1.3 Deterministic Finite Automata
1.1.4 The Galois Field $GF(p^m)$
1.2 Digital Image Steganography
1.3 Exact Pattern Matching
1.4 Longest Common Subsequence
1.5 Searchable Encryption

CHAPTER 2 DIGITAL IMAGE STEGANOGRAPHY BASED ON THE GALOIS FIELD USING GRAPH THEORY AND AUTOMATA
2.1 Introduction
2.2 The Digital Image Steganography Problem
2.3 A New Digital Image Steganography Approach
2.3.1 Mathematical Basis based on The Galois Field
2.3.2 Digital Image Steganography Based on The Galois Field $GF(p^m)$ Using Graph Theory and Automata
2.4 The Near Optimal and Optimal Data Hiding Schemes for Gray and Palette Images
2.5 Experimental Results
2.6 Conclusions

CHAPTER 3 AN AUTOMATA APPROACH TO EXACT PATTERN MATCHING
3.1 Introduction
3.2 The New Algorithm - The MRc Algorithm
3.3 Analysis of The MRc Algorithm
3.5 Conclusions

CHAPTER 4 AUTOMATA TECHNIQUE FOR THE LONGEST COMMON SUBSEQUENCE PROBLEM
4.1 Introduction
4.2 Mathematical Basis
4.3 Automata Models for Solving The LCS Problem
4.5 Conclusions

CHAPTER 5 CRYPTOGRAPHY BASED ON STEGANOGRAPHY AND AUTOMATA METHODS FOR SEARCHABLE ENCRYPTION
5.1 Introduction
5.2 A Novel Cryptosystem Based on The Data Hiding Scheme (2, 9, 8)
5.3 Automata Technique for Exact Pattern Matching on Encrypted Data
5.4 Automata Technique for Approximate Pattern Matching on Encrypted Data
5.5 Conclusions

CONCLUSION
LIST OF PUBLICATIONS
BIBLIOGRAPHY


LIST OF SYMBOLS

$\Sigma$   An alphabet
$\Sigma^*$   The set of all strings on $\Sigma$
$\varepsilon$   The empty string
$|S|$   The number of elements of a set $S$
$|u|$   The length of a string $u$
$GF(p^m)$   The Galois field constructed from the polynomial ring $Z_p[x]$, where $p$ is prime and $m$ is a positive integer
$(GF^n(p^m), +, \cdot)$   A vector space over the field $GF(p^m)$
$LCS(p, x)$   A longest common subsequence of $p$ and $x$
$lcs(p, x)$   The length of a $LCS(p, x)$
$LeftID(u)$   The least element, the leftmost location of $u$
$Rm_p(u)$   The last component of $LeftID(u)$ in $p$
$(I, M, K, Em, Ex)$   A data hiding scheme
$I$   A set of all image blocks with the same size and image format
$M$   A finite set of secret elements
$Em$   An embedding function that embeds a secret element in an image block
$Ex$   An extracting function that extracts an embedded secret element from an image block
$q_{colour}$   The number of different ways to change the colour of each pixel in an arbitrary image block
$Adjacent(cp, a)$   An adjacent vertex of $cp$
$c$ block   A string of length $c$
$Pos_p(z)$   The last position of appearance of $z$ in $p$
$M_p$   An automaton accepting the pattern $p$
$Config(p)$   The set of all the configurations of $p$
$W_p(u)$   The weight of $u$ in $p$
$W_p(C)$   The weight of $C$
$WConfig(p)$   The set of the weights of all the configurations of $p$
$W_p^i(a)$   The weight of $a$ at the location $i$ in $p$
$Wm_p(a)$   The heaviest weight of $a$ in $p$
$W(a)$   The weight of $a$ in $p$


LIST OF ABBREVIATIONS

AOSO Average Optimal Shift Or

BNDM Backward Nondeterministic Dawg Matching

EBOM Extended Backward Oracle Matching

FOPA Fastest Optimal Parity Assignment

HCIH High Capacity of Information Hiding

MSDR Maximal Secret Data Ratio

PSNR Peak Signal to Noise Ratio

SAE Searchable Asymmetric Encryption

SSE Searchable Symmetric Encryption

TVSBS Thathoo Virmani Sai Balakrishnan Sekar


LIST OF FIGURES

Figure 1.1 A simple graph
Figure 1.2 A spanning tree of the graph given in Figure 1.1
Figure 1.3 The transition diagram of $A$ in Example 1.3
Figure 1.4 The basic diagram of digital image steganography
Figure 1.5 The degree of appearance of the pattern $p$
Figure 2.1 The nine commonly used 8-bit gray cover images sized 512 × 512 pixels
Figure 2.2 The nine commonly used 8-bit palette cover images sized 512 × 512 pixels
Figure 2.3 The binary cover image sized 2592 × 1456 pixels
Figure 3.1 Sliding window mechanism
Figure 3.2 The basic idea of the proposed approach
Figure 3.3 The transition diagram of the automaton $M_p$, $p = abcba$


LIST OF TABLES

Table 1.1 An adjacency list representation of the simple graph given in Figure 1.1
Table 1.2 The performing steps of the BF algorithm
Table 1.3 The dynamic programming matrix $L$
Table 2.1 Elements of the Galois field $GF(2^2)$ represented by binary strings and decimal numbers
Table 2.2 Operations $+$ and $\cdot$ on the Galois field $GF(2^2)$
Table 2.3 The representation of $E$ and the arc weights of $G$ for the gray image
Table 2.4 The payload, ER and PSNR for the optimal data hiding scheme $(1, 2^n - 1, n)$ for palette images with $q_{colour} = 1$
Table 2.5 The payload, ER and PSNR for the near optimal data hiding scheme $(2, 9, 8)$ for gray images with $q_{colour} = 3$
Table 2.6 The payload, ER and PSNR for the near optimal data hiding scheme $(2, 9, 8)$ for palette images with $q_{colour} = 3$
Table 2.7 The comparisons of embedding and extracting time between the chapter's and Chang et al.'s approach for the same optimal data hiding scheme $(1, N, \lfloor \log_2(N + 1) \rfloor)$, where $N = 2^n - 1$, for the binary image with $q_{colour} = 1$. Time is given in seconds
Table 3.1 The performing steps of the MR1 algorithm
Table 3.2 Experimental results on rand4 problem
Table 3.9 Experimental results on a genome sequence (with $|\Sigma| = 4$)
Table 3.10 Experimental results on a protein sequence (with $|\Sigma| = 20$)
Table 4.1 The $Ref_p$ of $p = bacdabcad$
Table 4.2 The comparisons of the $lcs(p, x)$ computation time for $n = 50666$
Table 4.3 The comparisons of the $lcs(p, x)$ computation time for $n = 102398$


INTRODUCTION

In modern life, as the use of computers and the Internet becomes more and more essential, digital data (information) can be copied as well as accessed illegally. As a result, information security becomes increasingly important. There are two popular methods to provide security: cryptography and data hiding [2, 5, 6, 20, 56, 62, 81]. Cryptography is used to encrypt data in order to make the data unreadable by a third party [5]. Data hiding is used to embed data in digital media. Based on the purpose of the application, data hiding is generally divided into steganography, which hides the existence of data to protect the embedded data, and watermarking, which protects the copyright ownership and authentication of the digital media carrying the embedded data.

Steganography can be used as an alternative to cryptography. However, steganography becomes weak if attackers detect the existence of hidden data. Hence, integrating cryptography with steganography is a third choice for data security [2, 5, 6, 12, 19, 61, 62, 81, 86, 93].

With the rapid development of applications based on Internet infrastructure, cloud computing has become one of the hottest topics in information technology. It is an Internet-based computing system that provides on-demand services ranging from application and system software to storage and data processing. For example, when cloud users use the storage service, they can upload information to the servers and then access it online. Meanwhile, enterprises need not spend large sums on owning and maintaining a system of hardware and software. Although cloud computing brings many benefits for individuals and organizations, cloud security is still an open problem, since cloud providers can abuse users' information and cloud users lose control of it. Thus, guaranteeing the privacy of tenants' information without negating the benefits of cloud computing is necessary [28, 38, 40, 41, 60, 95, 102]. In order to protect cloud users' privacy, sensitive data need to be encrypted before being outsourced to servers. Unfortunately, encryption makes it much more difficult for the servers to search on ciphertext than on plaintext. To solve this problem, many searchable encryption techniques have been presented since 2000. Searchable encryption not only stores users' encrypted data securely but also allows information search over ciphertext [26, 28, 29, 38, 40, 60, 71, 85, 102].

Searchable encryption for exact pattern matching is a new class of searchable encryption techniques. The solutions for this class have been based on algorithms for [26] or approaches to [41, 89] exact pattern matching.

As in retrieving information from plaintexts, the development of searchable encryption with approximate string matching capability is necessary, where the search string can be a keyword that is determined, encrypted and stored in cloud servers, or an arbitrary pattern [28, 40, 71].

Given the above problems, the high efficiency of the graph and automata techniques proposed by P. T. Huy et al. for exact pattern matching (2002), longest common subsequence (2002) and steganography (2011, 2012 and 2013), the potential applications of graph theory and automata approaches suggested by the late Assoc. Prof. Phan Trung Huy in steganography and searchable encryption, and the direction of the supervisors, the dissertation title assigned is "Research on development of methods of graph theory and automata in steganography and searchable encryption".

The purpose of the dissertation is to develop new, high-quality solutions using graph theory and automata, to suggest their applications in steganography and searchable encryption, and to apply them there.

Based on the results published and the suggestions presented by the late Assoc. Prof. Phan Trung Huy in steganography and searchable encryption, the dissertation focuses on the following four problems in these fields:

- Digital image steganography;

- Exact pattern matching;

- Longest common subsequence;

- Searchable encryption.

The first problem is newly stated in Chapter 2; the three remaining problems are recalled and clarified in Chapter 1. In addition, the background related to these problems is presented and analysed in the chapters of the dissertation.

For the first three problems, the dissertation's work is to find new and efficient solutions using graph theory and automata. These solutions are then applied to solve the last problem.

The dissertation is structured as follows. Apart from the Introduction at the beginning and the Conclusion at the end, its main content is divided into five chapters.

Chapter 1. Preliminaries. This chapter recalls basic knowledge used throughout the dissertation (strings, graphs, deterministic finite automata, digital images, the basic model of digital image steganography, some parameters to determine the quality of digital image steganography, the exact pattern matching problem, the longest common subsequence problem, and searchable encryption), and re-presents important concepts and results that are used and developed further in the remaining chapters (adjacency lists, breadth first search, the Galois field, the fastest optimal parity assignment method, the module method and the concept of the maximal secret data ratio, the concept of the degree of fuzziness (appearance), the Knapsack Shaking approach, and the definition of a cryptosystem).

Chapter 2. Digital image steganography based on the Galois field using graph theory and automata. Firstly, from some proposed concepts of optimal and near optimal secret data hiding schemes, this chapter states the problem of interest in digital image steganography. Secondly, the chapter proposes a new approach based on the Galois field using graph theory and automata to design a general form of steganography in binary, gray and palette images, shows sufficient conditions for the existence of some optimal and near optimal secret data hiding schemes and proves their existence, applies the proposed schemes to the process of hiding a finite sequence of secret data in an image, and gives security analyses. Finally, the chapter presents experimental results to show the efficiency of the proposed results.

Chapter 3. An automata approach to exact pattern matching. This chapter proposes a flexible approach using automata to design an effective algorithm for exact pattern matching in practice. For given cases of patterns and alphabets, the efficiency of the proposed algorithm is shown by theoretical analyses and experimental results.

Chapter 4. Automata technique for the longest common subsequence problem. This chapter proposes two efficient algorithms, one sequential and one parallel, for computing the length of a longest common subsequence of two strings in practice, using the automata technique. Theoretical analysis of the parallel algorithm and experimental results confirm that the use of the automata technique in designing algorithms for solving the longest common subsequence problem is the best choice.

Chapter 5. Cryptography based on steganography and automata methods for searchable encryption. This chapter first proposes a novel cryptosystem with high security, based on a data hiding scheme proposed in Chapter 2. Additionally, unlike existing hybrid techniques of cryptography and steganography, ciphertexts do not depend on the input image size, and encoding and embedding are done at once. The chapter then applies the automata-based results of Chapters 3 and 4 to construct two algorithms for exact and approximate pattern matching on secret data encrypted by the proposed cryptosystem. These algorithms have $O(n)$ time complexity in the worst case, under the assumption that the approximate algorithm uses $\lceil (1 - \epsilon)m \rceil$ processors, where $\epsilon$, $m$ and $n$ are the error of the string similarity measure proposed in this chapter and the lengths of the pattern and secret data, respectively. In searchable encryption, the cryptosystem can be used to encode and decode secret data on the users' side, and the pattern matching algorithms can be used to perform pattern search on the cloud providers' side.

The contents of the dissertation are based on the paper [T1] published in 2019, the paper [T4] accepted for publication in 2020 in KSII Transactions on Internet and Information Systems (ISI), and the papers [T2, T3] published in the Journal of Computer Science and Cybernetics in 2019. The main results of the dissertation have been presented at:

- Seminar on Mathematical Foundations for Computer Science at Institute of Mathematics, Vietnam Academy of Science and Technology,

- The 9th Vietnam Mathematical Congress, Nha Trang, August 14-18, 2018,

- Seminar at School of Applied Mathematics and Informatics, Hanoi University of Science and Technology.

CHAPTER 1 PRELIMINARIES

This chapter recalls the terminology, concepts, algorithms and results needed to present the dissertation's new results clearly and logically, and to help readers follow the content of the dissertation easily. The background knowledge re-presented here consists of basic structures (Section 1.1: strings (Subsection 1.1.1), graph (Subsection 1.1.2), deterministic finite automata (Subsection 1.1.3), and the Galois field $GF(p^m)$ (Subsection 1.1.4)), digital image steganography (Section 1.2), exact pattern matching (Section 1.3), longest common subsequence (Section 1.4) and searchable encryption (Section 1.5).

1.1 Basic Structures

1.1.1 Strings

A string $x$ on an alphabet $\Sigma$ has the form

$x = x[1]x[2]\ldots x[n], \quad x[i] \in \Sigma,\ 1 \le i \le n,$

where $n$ is a positive integer.

A special string is the empty string, which has no letters and is denoted by $\varepsilon$. The length of the string $x$ is the number of letters in it, denoted by $|x|$. Then $|\varepsilon| = 0$.

Notice that for the string $x = x[1]x[2]\ldots x[n]$, we can also write $x = x[1..n]$ for short. The set of all strings on the alphabet $\Sigma$ is denoted by $\Sigma^*$. The operator on strings is concatenation, which writes strings one after another as a compound. The concatenation of the two strings $u_1$ and $u_2$ is denoted by $u_1u_2$.

Let $x$ be a string. A string $p$ is called a substring of the string $x$ if $x = u_1pu_2$ for some strings $u_1$ and $u_2$. In case $u_1 = \varepsilon$ (resp. $u_2 = \varepsilon$), the string $p$ is called a prefix (resp. suffix) of the string $x$. The prefix (resp. suffix) $p$ is called proper if $p \ne x$. Note that the prefix or the suffix can be empty.

1.1.2 Graph

Besides some basic concepts in graph theory, this subsection recalls the representation of a graph by adjacency lists and breadth first search [82]. These are used in Chapter 2.

A finite undirected graph (hereafter called a graph for short) $G = (V, E)$ consists of a nonempty finite set of vertices $V$ and a finite set of edges $E$, where each edge has either one or two vertices associated with it. A graph with weights assigned to its edges is called a weighted graph.

An edge connecting a vertex to itself is called a loop. Multiple edges are edges connecting the same vertices. A graph having no loops and no multiple edges is called a simple graph.

In a simple graph, the edge associated to an unordered pair of vertices $\{i, j\}$ is called the edge $\{i, j\}$.

Two vertices $i$ and $j$ in a graph $G$ are called adjacent if they are vertices of an edge of $G$.

A graph without multiple edges can be described by using adjacency lists, which specify the adjacent vertices of each vertex of the graph.

Example 1.1 Using adjacency lists, the simple graph given in Figure 1.1 can be described as in Table 1.1.

Breadth First Search:

Input: A connected simple graph $G$ with vertices ordered as $i_1, i_2, \ldots, i_n$

Output: A spanning tree $T$

Set $T$ to be a tree consisting only of $i_1$;
Set $L$ to be an empty list;
Put $i_1$ in $L$;
While ($L$ is not empty)
{
    Remove the first vertex $i$ from $L$;
    For each adjacent vertex $j$ of $i$
        If ($j$ is not in $L$ and $T$)
        {
            Add $j$ to the end of $L$;
            Add $j$ and the edge $\{i, j\}$ to $T$;
        }
}
Return $T$;

Figure 1.2 A spanning tree of the graph given in Figure 1.1
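The breadth first search above can be sketched in Python as follows. This is a minimal illustration, not the dissertation's implementation: the function name `bfs_spanning_tree`, the dictionary-based adjacency lists and the five-vertex graph are assumptions made for the example (the graph is not the one in Figure 1.1).

```python
from collections import deque


def bfs_spanning_tree(adj, order):
    """Return the edges of a BFS spanning tree of a connected simple graph.

    adj   -- adjacency lists: vertex -> list of adjacent vertices
    order -- the vertices in their given order i1, i2, ..., in
    """
    root = order[0]
    in_tree = {root}        # vertices already added to T
    tree_edges = []         # edges of T, in the order they are added
    queue = deque([root])   # the list L of vertices still to be processed
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            # Every vertex placed in L is also placed in T at the same time,
            # so testing membership in T covers "j is not in L and T".
            if j not in in_tree:
                in_tree.add(j)
                queue.append(j)
                tree_edges.append((i, j))
    return tree_edges


# Hypothetical graph used only for illustration.
ADJ = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 5], 4: [2, 5], 5: [3, 4]}
TREE = bfs_spanning_tree(ADJ, [1, 2, 3, 4, 5])
print(TREE)
```

A spanning tree of a connected graph with $n$ vertices always has $n - 1$ edges, which gives a quick sanity check on the result.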

A graph with directed edges (or arcs) is called a directed graph. Each arc is associated with an ordered pair of vertices. In a simple directed graph, the arc associated with the ordered pair $(i, j)$ is called the arc $(i, j)$. The vertex $i$ is said to be adjacent to the vertex $j$, and the vertex $j$ is said to be adjacent from the vertex $i$.

1.1.3 Deterministic Finite Automata

The study of the construction and the use of deterministic finite automata is one of the objectives of the dissertation. Hence, this subsection clarifies this model of computation [44, 82].

Definition 1.1 ([44]) Let $\Sigma$ be an alphabet. A deterministic finite automaton (hereafter called an automaton for short) $A = (\Sigma, Q, q_0, \delta, F)$ over $\Sigma$ consists of:

- A finite set $Q$ of elements called states,

- An initial state $q_0$, one of the states in $Q$,

- A set $F$ of final states. The set $F$ is a subset of $Q$,

- A state transition function (or simply, transition function), denoted by $\delta$, that takes as arguments a state and a letter, and returns a state, so that $\delta: Q \times \Sigma \to Q$.

The transition function $\delta$ can be extended so that it takes a state and a string, and returns a state. Formally, this extended transition function can be defined recursively by $\delta: Q \times \Sigma^* \to Q$ such that for all $q \in Q$, $s \in \Sigma^*$, $a \in \Sigma$, $\delta(q, as) = \delta(\delta(q, a), s)$ and $\delta(q, \varepsilon) = q$.
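The extended transition function can be sketched in a few lines of Python. The transition table `DELTA` below is a hypothetical automaton over $\{a, b\}$ chosen for illustration (it is not the table of Example 1.3); it reaches the final state `q2` exactly on strings ending in `ab`. The helper names `extended_delta` and `accepts` are likewise assumptions, with acceptance taken in the usual sense that the string leads from the initial state to a final state.

```python
def extended_delta(delta, q, s):
    """Extended transition: delta(q, as) = delta(delta(q, a), s), delta(q, empty) = q."""
    for a in s:
        q = delta[(q, a)]
    return q


def accepts(delta, q0, final_states, s):
    """A string is accepted if it takes the initial state to a final state."""
    return extended_delta(delta, q0, s) in final_states


# Hypothetical transition table, for illustration only.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}

print(accepts(DELTA, "q0", {"q2"}, "aab"))
print(accepts(DELTA, "q0", {"q2"}, "aba"))
```

Storing the transition function as a dictionary keyed by (state, letter) pairs keeps each step a constant-time lookup, so a string of length $n$ is processed in $O(n)$ steps.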


An alternative and simple way of presenting an automaton is to use a transition diagram. A transition diagram of an automaton $A = (\Sigma, Q, q_0, \delta, F)$ is a directed graph given as follows [44].

a) Each state of $Q$ is a vertex.

b) Let $q' = \delta(q, a)$, where $q$ is a state of $Q$ and $a$ is a letter of $\Sigma$. Then the transition diagram has an arc $(q, q')$ labeled $a$. If there are several letters that cause transitions from $q$ to $q'$, then the arc $(q, q')$ is labeled by a list of these letters.

c) There is an arrow into the initial state $q_0$. This arrow does not originate at any vertex.

d) States not in $F$ have a single circle. Vertices corresponding to final states are marked by a double circle.

Example 1.3 Consider an automaton $A = (\Sigma, Q, q_0, \delta, F)$ over $\Sigma = \{a, b\}$, where $Q = \{q_0, q_1, q_2\}$, $F = \{q_2\}$, and $\delta$ is given by the following table. Then the transition diagram of $A$ is shown in Figure 1.3.

Figure 1.3 The transition diagram of A in Example 1.3

Definition 1.2 ([82]) A string $p$ is said to be accepted by the automaton $A = (\Sigma, Q, q_0, \delta, F)$ if it takes the initial state $q_0$ to a final state, that is, $\delta(q_0, p)$ is in $F$.

1.1.4 The Galois Field $GF(p^m)$

This subsection describes how to construct a finite field with $p^m$ elements, called the Galois field $GF(p^m)$, where $p$ is prime and $m \ge 1$ is an integer [88]. The algebraic structure will be used in Chapter 2.

Let $p$ be a prime number. Define $Z_p[x]$ to be the set of all polynomials with the variable $x$ whose coefficients belong to the field $Z_p$. Addition and multiplication in $Z_p[x]$ are defined in the usual way, with the coefficients reduced modulo $p$ at the end.

For $f(x) \in Z_p[x]$, the degree of $f(x)$, denoted by $\deg(f)$, is the largest exponent of $x$ in $f(x)$.

A polynomial $f(x) \in Z_p[x]$ is called irreducible if there do not exist polynomials $f_1(x), f_2(x) \in Z_p[x]$ such that

$f(x) = f_1(x)f_2(x),$

where $\deg(f_1) > 0$ and $\deg(f_2) > 0$.

Let $f(x) \in Z_p[x]$ be an irreducible polynomial with $\deg(f) = m \ge 1$. Define $Z_p[x]/(f(x))$ to be the set of $p^m$ polynomials of degree at most $m - 1$ in $Z_p[x]$. Addition and multiplication in $Z_p[x]/(f(x))$ are given as in $Z_p[x]$, followed by a reduction modulo $f(x)$. Then $Z_p[x]/(f(x))$ with these operations is a field having $p^m$ elements, called the Galois field $GF(p^m)$. Note that for $p$ prime and $m \ge 1$, the Galois field $GF(p^m)$ is unique up to isomorphism.

1.2 Digital Image Steganography
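To make the construction of $GF(p^m)$ from Subsection 1.1.4 concrete, the following Python sketch builds $GF(2^2)$ as $Z_2[x]/(f(x))$. The choice $f(x) = x^2 + x + 1$ (irreducible over $Z_2$ because it has no root in $Z_2$), the coefficient-tuple representation and the function names are assumptions made for illustration; the dissertation's own representation may differ.

```python
from itertools import product


def poly_add(a, b, p):
    """Add two elements given as coefficient tuples (lowest degree first)."""
    return tuple((ai + bi) % p for ai, bi in zip(a, b))


def poly_mulmod(a, b, f, p):
    """Multiply a and b over Z_p, then reduce modulo the monic polynomial f.

    a and b have length m; f has length m + 1 with f[m] == 1.
    """
    m = len(f) - 1
    prod = [0] * (2 * m - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    # Remove the terms of degree >= m by subtracting multiples of f.
    for d in range(len(prod) - 1, m - 1, -1):
        c = prod[d]
        if c:
            for k in range(m + 1):
                prod[d - m + k] = (prod[d - m + k] - c * f[k]) % p
    return tuple(prod[:m])


P = 2
F = (1, 1, 1)  # assumed modulus f(x) = 1 + x + x^2
ELEMENTS = list(product(range(P), repeat=len(F) - 1))  # all polynomials of degree < m

x = (0, 1)
print(poly_mulmod(x, x, F, P))  # x * x reduces to 1 + x
```

Because $f$ is irreducible, every nonzero element has a multiplicative inverse, which can be confirmed by exhaustive search over the four elements.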

The problem of interest in Chapter 2 is digital image steganography. This section recalls the concept of digital images, the basic model of digital image steganography and some parameters to determine the efficiency of digital image steganography, and lastly re-presents results that are used and developed further in Chapter 2, such as the fastest optimal parity assignment (FOPA) method, the module method and the concept of the maximal secret data ratio (MSDR) [18, 20, 21, 39, 49, 50, 51, 53, 61, 63, 65, 76, 78, 104].

A digital image is a matrix of pixels. Each pixel is represented by a non-negative integer in the form of a string of binary bits. This value indicates the colour of the pixel [39].

Based on the way the colours of pixels are represented, digital images can be divided into the following types [78].

1. Binary image: Each pixel is represented by one bit. In this image type, the colour of a pixel is white (value "1") or black (value "0").

2. Gray image: Each pixel is typically represented by eight bits (called an 8-bit gray image). Then the colour of any pixel is a shade of gray, from black, corresponding to colour value "0", to white, corresponding to colour value "255".

3. Red green blue image: Each pixel is usually represented by a string of 24 bits (called a 24-bit RGB image), where the first 8 bits, the next 8 bits and the last 8 bits correspond to shades of red, green and blue, specifying the red, green and blue colour components of the pixel, respectively. Then the colour of the pixel is a combination of these three components.

4. Palette image: The colour of each pixel is not given directly by the number representing the pixel, as it is for RGB images. Instead, this number is the index of the pixel's colour in the colour table (the palette), an ordered set of values (strings of 24 bits, as in RGB images) that represent all colours used in the image and that is contained in the file with the image. The size of the palette is determined by the length of the bit string representing a pixel, which is limited to 8 bits, so the palette holds at most 256 colours. Palette images whose pixels are strings of 8 bits are called 8-bit palette images.

The objective of digital image steganography is to protect data by hiding the data in a digital image well enough that unauthorized users will not even be aware of their existence [18, 21]. Figure 1.4 shows the basic model of digital image steganography, where the cover image is a digital image used as a carrier to embed secret data into, and the stego image is the digital image obtained after embedding secret data into the cover image by the function block Embed with the secret key on the Sender side. For steganography generally, [...] a Payload. Corresponding to a certain Payload, to measure the embedding capacity of the cover image, the embedding rate (ER) is used and defined as in [104].

Figure 1.4 The basic diagram of digital image steganography

The peak signal to noise ratio (PSNR) is used to evaluate the quality of the stego image. Based on the value of PSNR, we can know the degree of similarity between the cover image and the stego image. If the PSNR value is high, then the quality of the stego image is high; conversely, the quality of the stego image is low. In general, for a digital image, PSNR is defined in terms of $B(i, j)$, $G(i, j)$, $R(i, j)$, $B'(i, j)$, $G'(i, j)$ and $R'(i, j)$, the colour values of the Blue, Green and Red components of the pixel at position $(i, j)$ in the cover and stego image, respectively. For human eyes, the threshold value of PSNR is 30 dB [20, 53, 65, 104]: when the PSNR value is higher than 30 dB, it is hard to distinguish between the cover image and its stego image.
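As an illustration of how these two quality parameters are typically computed, the sketch below assumes the commonly used definitions: ER as the number of embedded bits per pixel, and PSNR as $10 \log_{10}(255^2 / \mathrm{MSE})$ with the mean squared error averaged over all pixels and the three colour components. These are assumptions for the example; the exact normalisation used in [104] and in the dissertation may differ, and the function names are hypothetical.

```python
import math


def embedding_rate(payload_bits, height, width):
    """Assumed definition: embedded bits per pixel of the cover image."""
    return payload_bits / (height * width)


def psnr_rgb(cover, stego, max_value=255):
    """PSNR in dB between two images given as rows of (R, G, B) tuples.

    Assumes the usual definition 10 * log10(max_value^2 / MSE), with the MSE
    averaged over all pixels and colour components.
    """
    total, count = 0, 0
    for row_c, row_s in zip(cover, stego):
        for px_c, px_s in zip(row_c, row_s):
            for c, s in zip(px_c, px_s):
                total += (c - s) ** 2
                count += 1
    if total == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / (total / count))


# Hypothetical 2 x 2 cover image and a stego image differing in one component by 1.
COVER = [[(10, 20, 30), (40, 50, 60)], [(70, 80, 90), (100, 110, 120)]]
STEGO = [[(10, 20, 31), (40, 50, 60)], [(70, 80, 90), (100, 110, 120)]]
print(psnr_rgb(COVER, STEGO))
```

Under these assumptions, a change of one colour level in a single component leaves the PSNR far above the 30 dB threshold mentioned above.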

Let $G$ be a palette image and $P = \{c_1, c_2, \ldots, c_n\}$ be its palette, where $c_i$ is the colour of a pixel of $G$ corresponding to the colour index $i$. Each colour $c$ in $P$ is considered as a vector consisting of red, green and blue components. Suppose $d$ is a distance function on $P$. The FOPA method [50] tries to get functions Next, Next: $P \to P$, and Val, Val: $P \to Z_2$, such that the following two conditions are satisfied for all $c \in P$:

1. $d(c, \mathrm{Next}(c)) = \min_{v \ne c \in P} d(c, v)$,

2. $\mathrm{Val}(c) = \mathrm{Val}(\mathrm{Next}(c)) + 1$ on the field $Z_2$.

Call $G_P = (V_P, E_P)$ a weighted complete undirected graph of the palette image $G$, where $V_P = P$ and the weight of the edge $\{c, c'\}$ is $d(c, c')$. The function Nearest, Nearest: $P \to P$, is given by $\mathrm{Nearest}(c) = c'$ with $d(c, c') = \min_{v \ne c \in P} d(c, v)$. A rho forest $F = (V, E)$ is a directed graph with vertices weighted by the function Val, where $V = V_P$, $E$ is the set of all arcs $(v, \mathrm{Next}(v))$, and the vertex $v$ has the weight $\mathrm{Val}(v)$ for all $v \in V$. The construction of an algorithm determining $F$ is the essence of the FOPA method.

Algorithm for FOPA:

Input: A weighted complete undirected graph $G_P$, the function Nearest

Output: A rho forest $F = (V, E)$

Choose a vertex $c \in P$, set $V = \{c\}$, and set $C = P \setminus \{c\}$;
Set $\mathrm{Val}(c) = 0$; // Or 1 randomly
While ($C$ is not empty) // Update $F$
{
    a) Take one element $v \in C$;
    b) Initialize $v_0 = v$, set $\mathrm{Val}(v_0) = 0$ (or 1 randomly); by a finite loop, find a longest sequence of $k + 1$ different elements in $P$ consecutively, $v_0, v_1, \ldots, v_k$, such that $\mathrm{Nearest}(v_i) = v_{i+1}$ for $i = 0, \ldots, k - 1$; $v_i \in C$; $v_k \in C$ or $v_k \in V$, and set $\mathrm{Next}(v_i) = v_{i+1}$, $i = 0, \ldots, k - 1$;
    b1) Case $v_k \in C$: Set $\mathrm{Val}(v_i) = 1 + \mathrm{Val}(v_{i-1})$, $i = 1, \ldots, k$, and $\mathrm{Next}(v_k) = v_{k-1}$;
        Set $V = V \cup \{v_0, v_1, \ldots, v_k\}$ and $C = C \setminus \{v_0, v_1, \ldots, v_k\}$;
    b2) Case $v_k \in V$: Set $\mathrm{Val}(v_i) = 1 + \mathrm{Val}(v_{i+1})$, $i = k - 1, \ldots, 1, 0$;
        Set $V = V \cup \{v_0, v_1, \ldots, v_{k-1}\}$ and $C = C \setminus \{v_0, v_1, \ldots, v_{k-1}\}$;
}
Return $F$;
End
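The chain-following idea of the algorithm can be sketched in Python as below. This is an illustration in the spirit of the FOPA method rather than a transcription of it: the function name `fopa_sketch`, the squared Euclidean distance and the six-colour palette are assumptions, ties in Nearest are broken by palette order, the distance is assumed symmetric, and the sketch does not seed the forest with a separately chosen first vertex.

```python
def fopa_sketch(colours, dist):
    """Build Next and Val so that Next(c) is a nearest colour of c and
    Val(c) = Val(Next(c)) + 1 in Z_2. Needs at least two distinct colours."""
    def nearest(c):
        # Ties are broken by the order of `colours`.
        return min((v for v in colours if v != c), key=lambda v: dist(c, v))

    nxt, val = {}, {}
    for start in colours:
        if start in val:
            continue
        # Follow Nearest until the chain meets the forest or closes on itself.
        chain, seen = [start], {start}
        w = nearest(start)
        while w not in val and w not in seen:
            chain.append(w)
            seen.add(w)
            w = nearest(w)
        if w in val:
            # Case b2: the chain runs into a vertex already in the forest.
            for v, succ in zip(chain, chain[1:] + [w]):
                nxt[v] = succ
            for v in reversed(chain):
                val[v] = (val[nxt[v]] + 1) % 2
        else:
            # Case b1: the chain closes on itself; the last vertex points back.
            for v, succ in zip(chain, chain[1:]):
                nxt[v] = succ
            nxt[chain[-1]] = chain[-2]
            for i, v in enumerate(chain):
                val[v] = i % 2
    return nxt, val


def sq_dist(c1, c2):
    """Assumed distance: squared Euclidean distance between RGB vectors."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2))


# Hypothetical palette, for illustration only.
PALETTE = [(0, 0, 0), (10, 10, 10), (12, 12, 12),
           (200, 0, 0), (255, 255, 255), (250, 250, 250)]
NEXT, VAL = fopa_sketch(PALETTE, sq_dist)
```

In case b1 the last vertex may point back to its predecessor because nearest-neighbour distances do not increase along the chain, so the predecessor is itself a nearest colour of the last vertex when the distance is symmetric.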

Definition 1.3 ([51]) Let $M$ be a module over the ring $Z_m$, $k > 0$ be a natural number, and $U$ be a subset of $M \setminus \{0\}$. Call $U$ a $k$-base of $M$ if for any $v$ in $M \setminus \{0\}$, there exist $t$ elements $v_1, v_2, \ldots, v_t \in U$, $t \le k$, together with $a_1, a_2, \ldots, a_t \in Z_m$ such that $v = v_1a_1 + v_2a_2 + \ldots + v_ta_t$.

Let $G$ be a digital image, and call $C_G$ the set of all colours of pixels in $G$. Consider the case where $m = 2$ and $G$ is a binary image. Then $C_G = \{0, 1\}$, and for a positive integer $n$, the set $M = Z_2^n = \{(x_1, x_2, \ldots, x_n) \mid x_i \in Z_2,\ i = 1, \ldots, n\}$ with element addition and scalar multiplication defined as usual is a module over the ring $Z_2$ [49]. For $k = 1$, the set $U = M \setminus \{0\}$ is the unique 1-base of $M$ [51]. Two functions Next, Next: $C_G \to C_G$, and Val, Val: $C_G \to Z_2$, satisfying the condition $\mathrm{Val}(c) = \mathrm{Val}(\mathrm{Next}(c)) + 1$ on the ring $Z_2$, are defined in [49]. Suppose that for $N \ge |U|$, $I = \{I_1, I_2, \ldots, I_N\}$ is an arbitrary image block of $G$, $K = \{K_1, K_2, \ldots, K_N \mid K_i \in Z_2,\ i = 1, \ldots, N\}$ is a secret key, $d$ is any element in $M$, and $h$ is a surjective function from $I$ to $U$. In the module method, $d$ is considered as secret data, embedded in and extracted from the image block $I$ with the key $K$ by the blocks Embed and Extract as follows [49, 51].


The block Embed (embedding d in I):

Definition 1.4 ([49]) $MSDR_k(N)$ is the largest number of embedded bits of secret data in an image block of $N$ pixels by changing colours of at most $k$ pixels in the image block, where $k$, $N$ are positive integers.

Given a positive integer $q_{colour}$, call $q_{colour}$ the number of different ways to change the colour of each pixel in an arbitrary image block of $N$ pixels. According to [49],

$MSDR_k(N) = \lfloor \log_2(1 + q_{colour}C_N^1 + q_{colour}^2C_N^2 + \ldots + q_{colour}^kC_N^k) \rfloor. \quad (1.3)$

1.3 Exact Pattern Matching
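As a numerical check of formula (1.3) for the MSDR, the following Python sketch evaluates it with exact integer arithmetic. The function name `msdr` is an assumption; $C_N^i$ is computed with `math.comb`, and the floor of the base-2 logarithm is obtained from the bit length of the integer to avoid floating-point rounding.

```python
from math import comb


def msdr(k, n, q_colour):
    """Evaluate formula (1.3): floor(log2(sum_{i=0..k} q_colour^i * C(n, i)))."""
    total = sum(q_colour ** i * comb(n, i) for i in range(k + 1))
    # For a positive integer, floor(log2(total)) equals bit_length() - 1.
    return total.bit_length() - 1


print(msdr(2, 9, 3))  # k = 2, N = 9, q_colour = 3
print(msdr(1, 7, 1))  # k = 1, N = 2^3 - 1, q_colour = 1
```

The two calls correspond to the parameters of the data hiding schemes $(2, 9, 8)$ with $q_{colour} = 3$ and $(1, 2^n - 1, n)$ with $q_{colour} = 1$ and $n = 3$ that appear in the list of tables; in both cases the value of (1.3) equals the third component of the scheme.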

This section restates the exact pattern matching problem and recalls the concept of the degree of fuzziness (appearance) used in Chapter 3 [24, 52, 68].

Let $x$ be a string of length $n$. Denote the substring $x[i]x[i + 1]\ldots x[j]$ of $x$ by $x[i..j]$ for all $1 \le i \le j \le n$, and the $i$th element of $x$ by $x[i]$; $i$ is called a position in $x$. Let $p$ be a substring of length $m$ of $x$, where $m$ is a positive integer; then there exists $i$ with $1 \le i \le n - m + 1$ such that $p = x[i..i + m - 1]$. We say that $i$ is an occurrence of $p$ in $x$, or that $p$ occurs in $x$ at position $i$.

Definition 1.5 ([68]) Let $p$ be a pattern of length $m$ and $x$ be a text of length $n$ over the alphabet $\Sigma$. Then the exact pattern matching problem is to find all occurrences of the pattern $p$ in $x$.

The following example uses the Brute Force (BF) algorithm [24] to demonstrate the most basic way of solving this problem.

Example 1.4 Given a pattern $p = fah$ and a text $x = dfahfkfaha$. Then there are two occurrences of $p$ in $x$, at positions 2 and 7. The BF algorithm performs the steps presented in Table 1.2, where the bold letters correspond to the mismatches and the underlined letters represent the matches when comparing the letters of the pattern and the text. Many scanned letters are scanned again by the BF algorithm because, each time either a mismatch or a match occurs, the pattern is moved to the right by only one position.

Table 1.2 The performing steps of the BF algorithm
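The behaviour described in Example 1.4 can be reproduced with a short Python sketch of brute force matching. The function name `brute_force_search` is an assumption; positions are reported 1-based, as in the text.

```python
def brute_force_search(p, x):
    """Return all 1-based positions at which the pattern p occurs in the text x."""
    m, n = len(p), len(x)
    occurrences = []
    for i in range(n - m + 1):
        # Compare the pattern with the window x[i..i+m-1], letter by letter.
        j = 0
        while j < m and x[i + j] == p[j]:
            j += 1
        if j == m:
            occurrences.append(i + 1)
        # After a match or a mismatch, the window moves right by one position.
    return occurrences


print(brute_force_search("fah", "dfahfkfaha"))
```

Because the window always advances by a single position, the worst-case number of letter comparisons is proportional to $m \cdot n$, which is the inefficiency that motivates the automata approach of Chapter 3.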

Chapter 3 uses the degree of fuzziness in [52] to determine the longest prefix of the pattern in the text at any position. However, this terminology can lead to misunderstandings. So throughout this dissertation, the degree of fuzziness is replaced with the degree of appearance. The concept of the degree of appearance is restated as follows.

Definition 1.6 ([52]) Let $p$ be a pattern and $x$ be a text of length $n$ over the alphabet $\Sigma$. Then for each $1 \le i \le n$, the degree of appearance of $p$ in $x$ at position $i$ is equal to the length of a longest substring of $x$ such that this substring is a prefix of $p$, where the right end letter of the substring is $x[i]$.

Notice that if the degree of appearance of $p$ in $x$ at an arbitrary position $i$ equals $|p|$, then a match for $p$ in $x$ occurs at position $i - |p| + 1$. Figure 1.5 illustrates the concept of the degree of appearance of the pattern $p$ in $x$, for a position being scanned at which the degree of appearance of $p$ in $x$ is equal to 4.

Figure 1.5 The degree of appearance of the pattern p
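As a small illustration of Definition 1.6, the following Python sketch computes the degree of appearance at every position directly from the definition. It is a naive quadratic computation written for clarity, not the automaton based method of Chapter 3; the function name is an assumption.

```python
def degrees_of_appearance(p: str, x: str) -> list[int]:
    """degrees[i-1] = length of the longest prefix of p that ends at position i of x."""
    degrees = []
    for i in range(1, len(x) + 1):                 # 1-based position i in x
        best = 0
        for length in range(1, min(len(p), i) + 1):
            if x[i - length:i] == p[:length]:      # substring ending at x[i] is a prefix of p
                best = length
        degrees.append(best)
    return degrees


p, x = "fah", "dfahfkfaha"
deg = degrees_of_appearance(p, x)
print(deg)  # [0, 1, 2, 3, 1, 0, 1, 2, 3, 0]
# A degree equal to |p| at position i signals a match at position i - |p| + 1.
print([i - len(p) + 1 for i, d in enumerate(deg, start=1) if d == len(p)])  # [2, 7]
```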

1.4 Longest Common Subsequence

This section recalls the longest common subsequence (LCS) problem and the Knapsack Shaking approach to this problem, which is developed further in Chapter 4 [24, 47, 94, 101].

Definition 1.7 ([101]) Let $p$ be a string of length $m$ and $u$ be a string over the alphabet $\Sigma$. Then $u$ is a subsequence of $p$ if there exists an integer sequence $j_1, j_2, \ldots, j_t$ such that $1 \le j_1 < j_2 < \cdots < j_t \le m$ and $u = p[j_1]p[j_2] \ldots p[j_t]$.

(i) $u$ is a common subsequence of $p$ and $x$,

(ii) There does not exist any common subsequence $v$ of $p$ and $x$ such that $|v| > |u|$.


Denote an arbitrary longest common subsequence of $p$ and $x$ by $LCS(p, x)$. The length of a $LCS(p, x)$ is denoted by $lcs(p, x)$.

By convention, if two strings $p$ and $x$ do not have any longest common subsequence, then $lcs(p, x)$ is considered to equal 0.

Example 1.5 Let $p = bgcadb$ and $x = abhcbad$. Then the string $bcad$ is a $LCS(p, x)$ and $lcs(p, x) = 4$.

Let $p$ and $x$ be two strings of lengths $m$ and $n$ over the alphabet $\Sigma$, $m \le n$. The longest common subsequence problem for two strings (LCS problem) can be stated in the two following forms [24, 47].

Problem 1 Find a longest common subsequence of $p$ and $x$.

Problem 2 Compute the length of a longest common subsequence of $p$ and $x$.

The simple way to solve the LCS problem is to use the algorithm introduced by Wagner and Fischer in 1974 (called the Algorithm WF). This algorithm defines a dynamic programming matrix $L(m, n)$ recursively to find a $LCS(p, x)$ and compute $lcs(p, x)$ as

$L(i, j) = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0, \\ L(i-1, j-1) + 1 & \text{if } p[i] = x[j], \\ \max\{L(i, j-1), L(i-1, j)\} & \text{otherwise,} \end{cases}$

where $L(i, j)$ is $lcs(p[1..i], x[1..j])$ for $1 \le i \le m$, $1 \le j \le n$.

Example 1.6 Let $p = bgcadb$ and $x = abhcbad$. Using the Algorithm WF, the matrix $L(m, n)$ is obtained as shown in Table 1.3. Then $lcs(p, x) = L(6, 7) = 4$. In Table 1.3, by the traceback procedure, starting from value 4 back to value 1, the $LCS(p, x)$ found is the string $bcad$.

Table 1.3 The dynamic programming matrix L
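The dynamic programming computation of Example 1.6 can be sketched as follows. The recurrence used is the standard one for $L(i, j) = lcs(p[1..i], x[1..j])$; the function name and the tie-breaking rule in the traceback are assumptions, so the string returned may be a different longest common subsequence than the one traced in Table 1.3.

```python
def wagner_fischer_lcs(p: str, x: str) -> tuple[int, str]:
    """Return (lcs(p, x), one LCS(p, x)) using the dynamic programming matrix L."""
    m, n = len(p), len(x)
    L = [[0] * (n + 1) for _ in range(m + 1)]     # row 0 and column 0 stay 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if p[i - 1] == x[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Traceback from L(m, n) towards the top-left corner.
    i, j, letters = m, n, []
    while i > 0 and j > 0:
        if p[i - 1] == x[j - 1]:
            letters.append(p[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return L[m][n], "".join(reversed(letters))


length, subsequence = wagner_fischer_lcs("bgcadb", "abhcbad")
print(length)  # 4
```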

Definition 1.10 ([47]) Let $u = p[j_1]p[j_2] \ldots p[j_t]$ be a subsequence of $p$. Then an element of the form $(j_1, j_2, \ldots, j_t)$ is called a location of $u$ in $p$.

From Definition 1.10, the subsequence $u$ has at least one location in $p$. If all the different locations of $u$ are arranged in the dictionary order, then the least element is called the leftmost location of $u$, denoted by $LeftID(u)$. Denote the last component of $LeftID(u)$ by $Rm_p(u)$ [47].


Example 1.7 Let $p = aabcadabcd$ and $u = abd$. Then $u$ is a subsequence of $p$ and has eight different locations in $p$; in the dictionary order they are

$(1, 3, 6), (1, 3, 10), (1, 8, 10), (2, 3, 6), (2, 3, 10), (2, 8, 10), (5, 8, 10), (7, 8, 10).$

It follows that $LeftID(u) = (1, 3, 6)$ and $Rm_p(u) = 6$.
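The notions of location, leftmost location and $Rm_p(u)$ can be checked mechanically. The sketch below enumerates locations by brute force, which is only practical for short strings; the function names are assumptions chosen to mirror the notation of [47].

```python
from itertools import combinations


def locations(p: str, u: str) -> list[tuple[int, ...]]:
    """All locations (1-based index tuples) of u in p, in dictionary order."""
    return [
        idx
        for idx in combinations(range(1, len(p) + 1), len(u))  # lexicographic order
        if all(p[j - 1] == c for j, c in zip(idx, u))
    ]


def left_id(p: str, u: str) -> tuple[int, ...]:
    """Leftmost location of u in p (u is assumed to be a subsequence of p)."""
    return locations(p, u)[0]


def rm(p: str, u: str) -> int:
    """Last component of the leftmost location of u in p."""
    return left_id(p, u)[-1]


p, u = "aabcadabcd", "abd"
for loc in locations(p, u):
    print(loc)
print(left_id(p, u), rm(p, u))  # (1, 3, 6) 6
```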

Definition 1.11 ([47]) Let $p$ be a string of length $m$. Then a configuration $C$ of $p$ is defined as follows.

1. Either $C$ is the empty set. Then $C$ is called the empty configuration of $p$, denoted by $C_0$.

2. Or $C = \{x_1, x_2, \ldots, x_t\}$ is an ordered set of $t$ subsequences of $p$ for $1 \le t \le m$ such that the two following conditions are satisfied.

(i) For all $1 \le i \le t$, $|x_i| = i$,

(ii) For all $x_i, x_j \in C$, if $|x_i| > |x_j|$ then $Rm_p(x_i) > Rm_p(x_j)$.

The set of all the configurations of $p$ is denoted by $Config(p)$.

Definition 1.12 ([47]) Let $p$ be a string of length $m$ on the alphabet $\Sigma$, $C \in Config(p)$ and $a \in \Sigma$. Then a state transition function $\varphi$ on $Config(p) \times \Sigma$,

$\varphi : Config(p) \times \Sigma \to Config(p)$, is defined as follows.

1. $\varphi(C, a) = C$ if $a \notin p$.

2. $\varphi(C_0, a) = \{a\}$ if $a \in p$.

3. Set $C' = \varphi(C, a)$. Suppose $a \in p$ and $C = \{x_1, x_2, \ldots, x_t\}$ for $1 \le t \le m$. Then $C'$ is determined by a loop using the loop control variable $i$ whose value is changed from $t$ down to 0:

a) For $i = t$, if the letter $a$ appears at a location $index$ in $p$ such that $index$ is greater than $Rm_p(x_t)$, then $x_{t+1} = x_t a$;

b) Loop from $i = t - 1$ down to 1, if the letter $a$ appears at a location $index$ in $p$ such that $index \in (Rm_p(x_i), Rm_p(x_{i+1}))$, then $x_{i+1} = x_i a$;

c) For $i = 0$, if the letter $a$ appears at a location $index$ in $p$ such that $index$ is smaller than $Rm_p(x_1)$, then $x_1 = a$;

d) $C' = C$.

4. To accept an input string, the state transition function $\varphi$ is extended as follows: $\varphi : Config(p) \times \Sigma^* \to Config(p)$ such that for all $C \in Config(p)$, $s \in \Sigma^*$, $a \in \Sigma$, $\varphi(C, as) = \varphi(\varphi(C, a), s)$ and $\varphi(C, \varepsilon) = C$.

Example 1.8 Let $p = bacdabcad$ and $C = \{c, ad, bab\}$. Then $C$ is a configuration of $p$ and $C' = \varphi(C, a) = \{a, ad, ada, baba\}$.

In 2002, P. T. Huy et al. introduced a method to solve Problem 1 by using the automaton given in the following theorem. They named their method the Knapsack Shaking approach [47].

Theorem 1.1 ([47]) Let $p$ and $x$ be two strings of lengths $m$ and $n$ over the alphabet $\Sigma$, $m \le n$. Let $A_p = (\Sigma, Q, q_0, \varphi, F)$ corresponding to $p$ be an automaton over the alphabet $\Sigma$, where

The set of states $Q = Config(p)$,

The initial state $q_0 = C_0$,

The transition function $\varphi$ is given as in Definition 1.12,

The set of final states $F = \{C_n\}$, where $C_n = \varphi(q_0, x)$.

Suppose $C_n = \{x_1, x_2, \ldots, x_t\}$ for $1 \le t \le m$. Then

1. For every subsequence $u$ of $p$ and $x$, there exists $x_i \in C_n$, $1 \le i \le t$, such that the two following conditions are satisfied.

(i) $|u| = |x_i|$,

(ii) $Rm_p(x_i) \le Rm_p(u)$.

2. A $LCS(p, x)$ equals $x_t$.
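The transition function of Definition 1.12 and the automaton of Theorem 1.1 can be prototyped as below. This is a minimal sketch that follows the definition literally, storing a configuration as a Python list `[x_1, ..., x_t]`; the helper names `rm_p`, `phi` and `knapsack_shaking_lcs` are assumptions, and no attempt is made at the efficiency of the method in [47].

```python
def rm_p(p: str, u: str) -> int:
    """Last component of the leftmost location of u in p (greedy left-to-right match)."""
    pos = 0                                  # 1-based index of the last matched letter
    for c in u:
        pos = p.index(c, pos) + 1            # earliest occurrence after the previous match
    return pos


def phi(p: str, config: list[str], a: str) -> list[str]:
    """One step of the state transition function on the configuration `config`."""
    if a not in p:
        return list(config)                  # case 1
    if not config:
        return [a]                           # case 2: empty configuration C0
    t = len(config)
    positions = [i + 1 for i, c in enumerate(p) if c == a]
    rms = [rm_p(p, xi) for xi in config]     # Rm_p values of the old x_1..x_t
    new = list(config)
    if any(idx > rms[t - 1] for idx in positions):            # step a), i = t
        new.append(config[t - 1] + a)
    for i in range(t - 1, 0, -1):                              # step b), i = t-1 .. 1
        if any(rms[i - 1] < idx < rms[i] for idx in positions):
            new[i] = config[i - 1] + a                         # x_{i+1} = x_i a
    if any(idx < rms[0] for idx in positions):                 # step c), i = 0
        new[0] = a
    return new


def knapsack_shaking_lcs(p: str, x: str) -> str:
    """Feed x letter by letter to the automaton; the last element of C_n is an LCS."""
    config: list[str] = []
    for a in x:
        config = phi(p, config, a)
    return config[-1] if config else ""


print(phi("bacdabcad", ["c", "ad", "bab"], "a"))   # ['a', 'ad', 'ada', 'baba']
print(knapsack_shaking_lcs("bgcadb", "abhcbad"))   # bcad
```

The first call reproduces Example 1.8 and the second reproduces Example 1.5.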

1.5 Searchable Encryption

This section clarifies the term searchable encryption (SE) and recalls the definition of a cryptosystem. They will be studied and used in Chapter 5 [26, 40, 60, 85, 88, 102].

Consider the following problem in cloud security [60, 85, 102]. Cloud tenants, for example enterprises and individuals with limited resources including software and hardware, store data with sensitive information on cloud servers. Assume that these servers cannot be fully trusted. This means they may not only be curious about the users’ information but also abuse the data received. Users therefore wish to encrypt their data before uploading them to servers. Because of limitations of cloud users’ information technology systems, users also wish that cloud providers can help them perform information search directly on ciphertexts. However, encryption makes it difficult for servers to search on the encrypted data. This leads to the problem of finding a solution that satisfies both wishes of cloud users when they choose a cloud storage service.

SE is a way to solve the above problem. It is a system consisting of two main components: a cryptosystem used to encode and decode on the cloud users’ side, and algorithms for searching on encrypted data executed on the cloud providers’ side [40, 102].

In cryptography, SE can be either searchable symmetric encryption (SSE) or searchable asymmetric encryption (SAE). In SSE, only private key holders can create encrypted data and produce trapdoors for search. In SAE, users who have the public key can make ciphertexts but only private key holders can generate trapdoors [26, 102].

Since the dissertation proposes a new symmetric encryption system for SSE in Chapter 5, the correctness of this system needs to be proved. In this dissertation, the components and properties of a cryptosystem defined in [88] are considered as a standard form to verify. This definition is recalled here.

Definition 1.13 ([88]) A cryptosystem is a five-tuple $(P, C, K, E, D)$ such that the following properties are satisfied.

1. $P$ is a finite set of plaintexts,

2. $C$ is a finite set of ciphertexts,

3. $K$ is a finite set of secret keys,

4. For every $k \in K$, there exists an encrypting function $e_k \in E$ and a corresponding decrypting function $d_k \in D$, where $e_k : P \to C$ and $d_k : C \to P$ satisfy $d_k(e_k(x)) = x$ for each $x \in P$.
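To make property 4 of Definition 1.13 concrete, the following sketch checks it exhaustively for a toy shift cipher over $Z_{26}$. The shift cipher is used purely as an illustrative assumption; it is not the symmetric system proposed in Chapter 5, and the helper names are hypothetical.

```python
def is_cryptosystem(P, C, K, encrypt, decrypt) -> bool:
    """Check e_k: P -> C and d_k(e_k(x)) == x for every key k and plaintext x."""
    ciphertexts = set(C)
    for k in K:
        for x in P:
            y = encrypt(k, x)
            if y not in ciphertexts or decrypt(k, y) != x:
                return False
    return True


# Toy example (assumption): shift cipher with P = C = K = Z_26.
Z26 = range(26)
shift_encrypt = lambda k, x: (x + k) % 26
shift_decrypt = lambda k, y: (y - k) % 26

print(is_cryptosystem(Z26, Z26, Z26, shift_encrypt, shift_decrypt))  # True
```

Because $P$, $C$ and $K$ are finite, the property can be verified by enumeration for small systems, whereas for a realistic system it has to be proved.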


CHAPTER 2 DIGITAL IMAGE STEGANOGRAPHY BASED ON THE GALOIS FIELD USING GRAPH THEORY AND AUTOMATA

This chapter first proposes concepts of optimal and near optimal secret data hiding schemes. The chapter then proposes a new digital image steganography approach based on the Galois field $GF(p^m)$ using graph and automata to design the data hiding scheme of the general form $(k, N, \lfloor \log_2 p^{mn} \rfloor)$ for binary, gray and palette images with the given assumptions, where $k, m, n, N$ are positive integers and $p$ is prime. It also shows sufficient conditions for existence and proves existence of some optimal and near optimal secret data hiding schemes. These results are derived from the concept of the maximal secret data ratio of embedded bits, the module method and the FOPA method proposed by P. T. Huy et al. in 2011, 2012 and 2013, recalled in Section 1.2 of Chapter 1. An application of the schemes to the process of hiding a finite sequence of secret data in an image is also considered. Security analyses and experimental results confirm that the proposed approach can create steganographic schemes which achieve high efficiency in embedding capacity, visual quality, speed as well as security, which are key properties of steganography.

The results of Chapter 2 have been published in [T1]

in the image [17, 57, 62, 76, 100]. The chapter’s work focuses on steganography in digital images in the spatial domain.

Digital image steganography studies steganographic schemes, where each scheme consists of an embedding function and an extracting function. The embedding function shows how to embed secret data in the digital image and the extracting function describes how to extract the data from the digital image carrying the embedded data [46, 87].

In digital image steganography, a few main factors must be taken into consideration when designing a new secret data hiding scheme: embedding capacity of the cover image, quality of the stego image and security. However, as is well known, embedding capacity of the cover image and quality of its stego image are in irreconcilable conflict. A balance between the two factors can be achieved according to different application requirements. In addition to the three main factors, the speed of the embedding and extracting functions also plays an important role in steganographic schemes. It is considered as a last constraint to determine the efficiency of schemes [46, 53, 65, 69, 87, 104].

The simplest and most popular spatial domain image steganography method is least significant bit (LSB) substitution (called the LSB based method). For 24-bit RGB and 8-bit gray images, this method embeds the data in the cover image by changing the least significant bits of the image directly, so it is vulnerable to security attacks [18, 62, 72, 75, 76, 97, 104]. The EZ Stego method for palette images is similar to the commonly used LSB based method. However, this method does not guarantee the quality of stego images [36, 37, 97]. To alleviate this problem, in 1999, Fridrich proposed a new method based on the parity bits of colour indexes of pixels in palette cover images, called the parity assignment (PA) method. The EZ Stego method can then be considered as an example of the PA method [36, 50]. In 2000, Fridrich et al. improved the method by investigating the problem of optimal parity assignment for the palette; this version is called the optimal parity assignment (OPA) method [37]. To easily control the quality of stego images, Huy et al. introduced another OPA method, called the FOPA method, in 2013 [50]. Unlike colour and gray images, each pixel in a binary image requires only one bit to represent its colour value (black or white), so modified pixels can be easily detected. Binary image steganography is therefore a more difficult and challenging problem. For binary images, the block based method is usually used to maintain the quality of stego images. In this method, the cover and stego images are partitioned into individual image blocks of the same size, and embedding and extracting secret data are based on the characteristic values calculated for the blocks. The WL (Wu et al., 1998), PCT (Pan et al., 2000), modified PCT (Tseng et al., 2001) and CTL (Chang et al., 2005) schemes are all well known block based schemes for binary images [21, 18, 48, 75, 92].

Given $q_{colour}$, the number of different ways to change the colour of each pixel in an arbitrary image block, and using the concept of the maximal secret data ratio of embedded bits proposed by Huy et al. in 2011 [49], the chapter introduces concepts of optimal and near optimal secret data hiding schemes. The optimality of steganographic schemes has been considered in [37, 46]. However, those authors used the time complexity of the embedding and extracting functions, or the concept of optimal parity assignment that minimizes the energy of the parity assignment for the colour palette, to determine whether a steganographic scheme is optimal.

Following the block based method, call a secret data hiding scheme a data hiding scheme $(k, N, r)$, where $k, N, r$ are positive integers, if the embedding function can embed $r$ bits of secret data in each image block of $N$ pixels by changing colours of at most $k$ pixels in the image block. The chapter’s work is concerned with the problem of designing optimal or near optimal data hiding schemes $(k, N, r)$ for digital images (binary, gray and palette images).

Based on the module approach and the FOPA method using graph theory proposed by Huy et al. in 2011 and 2013 [49, 50], the chapter proposes a new approach based on the Galois field using graph and automata in order to solve the problem. By this approach, the chapter proposes the optimal data hiding scheme $(1, 2^n - 1, n)$ for binary, gray and palette images with $q_{colour} = 1$, where $n$ is a positive integer, the near optimal data hiding scheme $(2, 9, 8)$ and the optimal data hiding scheme $(1, 5, 4)$ for gray and palette images with $q_{colour} = 3$. Security analyses show that an application of these schemes to the process of hiding a finite sequence of secret data in an image can avoid detection from brute-force attacks.

The experimental results reveal that the efficiency in embedding capacity and visual quality of the near optimal data hiding scheme $(2, 9, 8)$ for gray images with $q_{colour} = 3$ is better than that of the HCIH scheme [104]. The embedding and extracting times of the proposed approach are shorter than those of the approach of Chang et al. [18]. For the near optimal data hiding scheme $(2, 9, 8)$ for palette images with $q_{colour} = 3$ and the optimal data hiding scheme $(1, 2^n - 1, n)$ for palette images with $q_{colour} = 1$, values of ER can be selected suitably to achieve acceptable quality of the stego images.

The rest of the chapter is organized as follows. Section 2.2 gives some new concepts and states the chapter’s digital image steganography problem. Section 2.3 consists of two Subsections 2.3.1 and 2.3.2. Subsection 2.3.1 introduces the mathematical basis based on the Galois field $GF(p^m)$ for the digital image steganography problem, where $p$ is prime and $m$ is a positive integer. Subsection 2.3.2 firstly proposes a digital image steganography approach based on the Galois field $GF(p^m)$ using graph and automata to design the data hiding scheme of the general form $(k, N, \lfloor \log_2 p^{mn} \rfloor)$ for the given assumptions, where $k, m, n, N$ are positive integers and $p$ is prime. Secondly, the subsection gives sufficient conditions for existence of the optimal data hiding schemes. The subsection also shows that there exists the optimal data hiding scheme $(1, 2^n - 1, n)$ for binary, gray and palette images with $q_{colour} = 1$, where $n$ is a positive integer. At the end of Subsection 2.3.2, the way of applying the data hiding scheme $(k, N, \lfloor \log_2 p^{mn} \rfloor)$ to the process of hiding a finite sequence of secret data of length $\lfloor \log_2 p^{mn} \rfloor$ bits in an image is considered. Section 2.4 proves that there exist the near optimal data hiding scheme $(2, 9, 8)$ and the optimal data hiding scheme $(1, 5, 4)$ for gray and palette images with $q_{colour} = 3$. Section 2.5 shows experimental results in order to evaluate the efficiency of the proposed data hiding schemes and approach. Lastly, some conclusions are drawn from the proposed approach and experimental results in Section 2.6.

2.2 The Digital Image Steganography Problem

This section gives some new concepts and states the chapter’s digital image steganography problem.

Definition 2.1 A block based secure data hiding scheme in digital images (for short, called a data hiding scheme) is a five-tuple $(\mathcal{I}, \mathcal{M}, \mathcal{K}, Em, Ex)$, where the following conditions are satisfied.

1. $\mathcal{I}$ is a set of all image blocks with the same size and image type,

2. $\mathcal{M}$ is a finite set of secret elements,

3. $\mathcal{K}$ is a finite set of secret keys,

4. $Em$ is an embedding function to embed a secret element in an image block,


Definition 2.2 A data hiding scheme $(\mathcal{I}, \mathcal{M}, \mathcal{K}, Em, Ex)$ is called a data hiding scheme $(k, N, r)$, where $k, N, r$ are positive integers, if each image block in $\mathcal{I}$ has $N$ pixels and the embedding function $Em$ can embed $r$ bits of secret data in an arbitrary image block by changing colours of at most $k$ pixels in the image block.

Definition 2.3 For a given $q_{colour}$, a data hiding scheme $(k, N, r)$ is called an optimal data hiding scheme if $r = MSDR_k(N)$ and there does not exist a positive integer $N'$ such that $N' < N$ and $r = MSDR_k(N')$. Then $N$ is denoted by $N_{optimum}$.

Definition 2.4 For a given $q_{colour}$, a data hiding scheme $(k, N, r)$ is called a near optimal data hiding scheme if $r = MSDR_k(N)$ and $N > N_{optimum}$.
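Definitions 2.3 and 2.4 can be checked numerically from formula (1.3). The following sketch, with hypothetical function names, computes $MSDR_k(N)$ with exact integer arithmetic and classifies the parameter triples discussed in this chapter. It only tests the counting conditions of the definitions; it does not construct any embedding function.

```python
from math import comb


def msdr(k: int, n_pixels: int, q_colour: int) -> int:
    """MSDR_k(N) = floor(log2(sum_{i=0..k} q_colour^i * C(N, i))), formula (1.3)."""
    total = sum(q_colour**i * comb(n_pixels, i) for i in range(k + 1))
    return total.bit_length() - 1          # exact floor(log2(total)) for total >= 1


def n_optimum(k: int, r: int, q_colour: int):
    """Smallest N with MSDR_k(N) == r, or None if no block size gives exactly r."""
    n = 1
    while msdr(k, n, q_colour) < r:        # MSDR_k(N) is non-decreasing in N
        n += 1
    return n if msdr(k, n, q_colour) == r else None


def classify(k: int, n_pixels: int, r: int, q_colour: int) -> str:
    if msdr(k, n_pixels, q_colour) != r:
        return "neither"
    return "optimal" if n_pixels == n_optimum(k, r, q_colour) else "near optimal"


print(classify(1, 7, 3, 1))   # optimal       (1, 2^n - 1, n) with n = 3
print(classify(1, 5, 4, 3))   # optimal
print(classify(2, 9, 8, 3))   # near optimal  (N_optimum = 8)
```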

The chapter’s digital image steganography problem: Design optimal or near optimal data hiding schemes $(k, N, r)$ for digital images (binary, gray and palette images).

2.3 A New Digital Image Steganography Approach

This section introduces the mathematical basis based on the Galois field for the digital image steganography problem (Subsection 2.3.1). It then proposes a digital image steganography approach based on the Galois field using graph theory and automata to design the data hiding scheme of the general form $(k, N, \lfloor \log_2 p^{mn} \rfloor)$ for the given assumptions, where $k, m, n, N$ are positive integers and $p$ is prime, shows sufficient conditions for existence and proves existence of some optimal data hiding schemes (Subsection 2.3.2). Security analyses and an application of these data hiding schemes to the process of hiding a finite sequence of secret data in an image are considered in Subsection 2.3.2.

2.3.1 Mathematical Basis based on The Galois Field

This subsection constructs the mathematical basis based on the Galois field $GF(p^m)$ for the digital image steganography problem, where $p$ is prime and $m$ is a positive integer (Propositions 2.2, 2.4 and Theorem 2.1).

Given the Galois field $GF(p^m)$, recalled in Subsection 1.1.4 of Chapter 1, where $p$ is prime and $m$ is a positive integer. Let $GF^n(p^m) = \{(x_1, x_2, \ldots, x_n) \mid x_i \in GF(p^m), i = \overline{1, n}\}$, where $n$ is a positive integer, with the two operations of vector addition $+$ and scalar multiplication defined as follows.


Proof Suppose $[x] \cap [y] \ne \emptyset$, then there exists $z$ in $[x] \cap [y]$. By Definition 2.5, $z = ax = by$. Since $a \in GF(p^m) \setminus \{0\}$, $x = a^{-1}by$. Thus $x \in [y]$ and therefore $[x] \subseteq [y]$. Similarly, $[y] \subseteq [x]$ and hence $[x] = [y]$.

Proposition 2.1 The set of all classes forms a partition of the set $GF^n(p^m)$.

Proof For all $x \in GF^n(p^m)$, $x \in [x]$ by Definition 2.5. Thus the union of all classes is $GF^n(p^m)$. By Lemma 2.1, any two distinct classes are disjoint. The proof is complete.

Denote the set of all classes by $[GF^n(p^m)]$. This can be represented by $[GF^n(p^m)] = \{[x] \mid x \in GF^n(p^m)\}$. The number of elements of a set $S$ is denoted by $|S|$.

Proposition 2.2 $|[GF^n(p^m)] \setminus \{0\}| = \dfrac{p^{mn} - 1}{p^m - 1}$.
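The partition into classes and the count in Proposition 2.2 can be verified by enumeration in small cases. The sketch below assumes, as the proofs in this subsection suggest, that the class of $x$ is $[x] = \{ax \mid a \in GF(p^m) \setminus \{0\}\}$, and it restricts itself to the prime field case $m = 1$ so that field arithmetic is plain arithmetic modulo $p$. Both the restriction and the function name are assumptions made for illustration.

```python
from itertools import product


def classes(p: int, n: int) -> list[frozenset]:
    """Classes [x] = {a*x : a in GF(p), a != 0} of the vector space GF(p)^n."""
    seen, result = set(), []
    for x in product(range(p), repeat=n):
        if x in seen:
            continue
        cls = frozenset(tuple(a * xi % p for xi in x) for a in range(1, p))
        seen |= cls
        result.append(cls)
    return result


for p, n in [(2, 3), (3, 2), (5, 3)]:
    cls = classes(p, n)
    nonzero = [c for c in cls if c != frozenset({(0,) * n})]
    # The classes partition GF(p)^n, and the non-zero ones number (p^n - 1)/(p - 1).
    print(p, n, len(nonzero), (p**n - 1) // (p - 1))
```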

$\sum_{i=1}^{t} a_i v_i$, where

Proof Evidently, $|GF^n(p^m) \setminus \{0\}| = p^{mn} - 1$ ... $GF(p^m)$ ... Consider

$B = \{[\sum_{i=1}^{t} a'_i v'_i] \mid a'_i \in GF(p^m) \setminus \{0\}, [v'_i] \in S, i = \overline{1, t}, t \le k\}.$

To prove that $S$ does not depend on the choice of representatives of classes, it suffices to show that $A = B$. By the hypothesis $[v'_i] = [v_i]$, then $v_i = b_i v'_i$. Suppose $[x] \in A$, then $[x] = [\sum_{i=1}^{t} a_i v_i] = [\sum_{i=1}^{t} a_i b_i v'_i]$. Clearly, $a_i b_i \ne 0$ by the definition of the class, then $[x] \in B$. Since $b_i \ne 0$, there exists $b_i^{-1}$, thus $v'_i = b_i^{-1} v_i$. Similarly, $B \subseteq A$.

So, $A = B$.

Definition 2.7 Let $V$ be a vector space over a field $K$, $S \subseteq V$. Then $S$ is called a k-Generators for $V$, where $k$ is a positive integer, if the two following conditions are satisfied.

a) For all $v, v' \in S$, there does not exist $a \in K$ such that $v' = av$,

b) For all $v \in V \setminus \{0\}$, there exists $t$ such that $1 \le t \le k$ and $v = \sum_{i=1}^{t} a_i v_i$, where $v_1, v_2, \ldots, v_t \in S$; $a_1, a_2, \ldots, a_t \in K \setminus \{0\}$.
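A brute force test of Definition 2.7 is sketched below for the prime field case $K = GF(p)$, so that arithmetic is modulo $p$. Two assumptions are made and should be read as such: condition a) is applied to pairs of distinct elements of $S$, and the $v_1, \ldots, v_t$ in condition b) are taken to be distinct elements of $S$. The function name is hypothetical.

```python
from itertools import combinations, product


def is_k_generators(S: list[tuple[int, ...]], p: int, k: int) -> bool:
    """Check conditions a) and b) of a k-Generators for GF(p)^n by enumeration."""
    n = len(S[0])

    def scale(a, v):
        return tuple(a * vi % p for vi in v)

    # a) no element of S is a scalar multiple of a different element of S
    for v, w in combinations(S, 2):
        if any(scale(a, v) == w or scale(a, w) == v for a in range(p)):
            return False

    # b) every non-zero vector is a combination of at most k elements of S
    #    with non-zero coefficients
    reachable = set()
    for t in range(1, k + 1):
        for vs in combinations(S, t):
            for coeffs in product(range(1, p), repeat=t):
                reachable.add(
                    tuple(sum(a * v[j] for a, v in zip(coeffs, vs)) % p for j in range(n))
                )
    return all(v in reachable for v in product(range(p), repeat=n) if any(v))


# One representative per non-zero class of GF(3)^2 gives a 1-Generators of 4 elements.
print(is_k_generators([(1, 0), (0, 1), (1, 1), (1, 2)], p=3, k=1))  # True
print(is_k_generators([(1, 0), (0, 1)], p=3, k=1))                  # False
print(is_k_generators([(1, 0), (0, 1)], p=3, k=2))                  # True
```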

Lemma 2.2 Let $S = \{v_1, v_2, \ldots, v_t\}$ be a k-Generators for the vector space $GF^n(p^m)$. Then $S' = \{[v_1], [v_2], \ldots, [v_t]\}$ is a k-[Generators] for the set $[GF^n(p^m)]$.


Proof Since $S$ is a k-Generators for $GF^n(p^m)$, for all $v, v' \in S$, there does not exist $a$ in $GF(p^m)$ such that $v' = av$. By Proposition 2.1 and Definition 2.5, $[v_i] \ne [v_{i'}]$ and $[v_i] \ne 0$, for all $v_i \in S$, $1 \le i \le t$. For all $[u] \in [GF^n(p^m)] \setminus \{0\}$, then $u = \sum_{i=1}^{k} \ldots$

The proof is complete.

Lemma 2.3 Let $S' = \{[v_1], [v_2], \ldots, [v_t]\}$ be a k-[Generators] for the set $[GF^n(p^m)]$. Then $S = \{v_1, v_2, \ldots, v_t\}$ is a k-Generators for the vector space $GF^n(p^m)$.

Proof For all $v \in GF^n(p^m) \setminus \{0\}$, then ... $i = \overline{1, k'}$, $k' \le k$. For all $[v], [v'] \in S'$, there does not exist $a$ in $GF(p^m)$ such that $v' = av$ by Proposition 2.1. It means that for all $v, v' \in S$, there does not exist $a$ in $GF(p^m)$ such that $v' = av$. The proof is complete.

Theorem 2.1 There exists a k-Generators $S$ for the vector space $GF^n(p^m)$ with $|S| = N$ if and only if there exists a k-[Generators] $S'$ for the set $[GF^n(p^m)]$ with $|S'| = N$.

Proof This is deduced immediately from Lemmas 2.2 and 2.3.

Proposition 2.4 Let $c$ be the number of k-[Generators] of $N$ elements for the set $[GF^n(p^m)]$. Then the number of k-Generators of $N$ elements for the vector space $GF^n(p^m)$ is $c(p^m - 1)^N$.

Proof Suppose $S'$ is a k-[Generators] for $[GF^n(p^m)]$ with $|S'| = N$. Since $S'$ does not depend on the choice of representatives of classes by Proposition 2.3, the number of ways to change representatives of all classes in $S'$ is $(p^m - 1)^N$. By the hypothesis, the number of k-[Generators] of $N$ elements for the set $[GF^n(p^m)]$ is $c$, so the number of k-Generators of $N$ elements for the vector space $GF^n(p^m)$ is $c(p^m - 1)^N$ by Lemma 2.3 and Theorem 2.1.

2.3.2 Digital Image Steganography Based on The Galois Field $GF(p^m)$ Using Graph Theory and Automata

This subsection firstly proposes a digital image steganography approach based on the Galois field $GF(p^m)$ using graph and automata to design the data hiding scheme of the general form $(k, N, \lfloor \log_2 p^{mn} \rfloor)$ for the given assumptions, where $k, m, n, N$ are positive integers and $p$ is prime (Theorem 2.2 and Security analysis (2.12)). Secondly, the subsection gives sufficient conditions for existence of the optimal data hiding schemes


Let $\mathcal{I}$ be a set of all image blocks with the same size and image type, and assume that each image block in $\mathcal{I}$ has $N$ pixels, where $N$ is a positive integer. For simplicity, the structure of an arbitrary image block $I$ in $\mathcal{I}$ can be represented by

$I = \{I_1, I_2, \ldots, I_N\},$

where $I_i$ is a colour value (for binary and gray images) or a colour index in the palette (for palette images) of the $i$th pixel in $I$, $i = \overline{1, N}$. Consider $C$ to be a set of all colour values or indexes of pixels of $I$.

Let $\mathcal{M}$ be a finite set of secret elements and set $\mathcal{M} = GF^n(p^m)$.

Let $\mathcal{K}$ be a finite set of secret keys. For all $K \in \mathcal{K}$, also assume that the structure of the key $K$ is the same as the structure of the image block $I$. So, we can write

$K = \{K_1, K_2, \ldots, K_N\}$ for $K_i \in GF(p^m)$, $i = \overline{1, N}$.

Assume that we find a k-Generators $S$ for $GF^n(p^m)$ with $|S| = N$ and $S = \{v_1, v_2, \ldots, v_N\}$.

Given a flip graph $G$, we denote by $Adjacent(cp, a)$ an adjacent vertex of $cp$ ($Adjacent(cp, a)$ is adjacent from $cp$), where the weight $a$ is assigned to the arc $(cp, Adjacent(cp, a))$.

Assume that we build a flip graph $G = (V, E)$.

From the way to determine the arc set $E$ in Definition 2.8, assume that


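Definition 2.10 Let $\Sigma_2 = GF^n(p^m)$, $\mathcal{N} = \{1, 2, \ldots, N\}$ and $2^{\mathcal{N} \times (GF(p^m) \setminus \{0\})}$ be the set of all subsets of the set $\mathcal{N} \times (GF(p^m) \setminus \{0\})$ ...

Remark 2.1 For the case $v = q$, $v + (-q) = 0$. Since $S$ is a k-Generators for $GF^n(p^m)$, $|S| = N$, $S = \{v_1, v_2, \ldots, v_N\}$, there exist $k'$, $k' \le k$; $v_{i_t} \in S$; $1 \le i_t \le N$; $a_t \in GF(p^m) \setminus \{0\}$; $t = \overline{1, k'}$ such that $v + (-q) = \sum_{t=1}^{k'} a_t v_{i_t}$. So, $\delta_2$ given in Definition 2.10 is a function.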

Definition 2.11 Let $I \in \mathcal{I}$, $M \in \mathcal{M}$ and $K \in \mathcal{K}$. The automaton $A(I, M, K)$ is a five-tuple $(\Sigma, Q, q_0, \delta, T)$, where

2. The set of states $Q = \{q_i, i = \overline{0, N+1} \mid q_0 = \sum_{i=1}^{N} K_i v_i;\ q_i = \delta_1(q_{i-1}, (i, I_i)),\ i = \overline{1, N};\ q_{N+1} = \delta_2(q_N, M)\}$;

3. The initial state $q_0$;

4. The set of final states $T = \{q_{N+1}\}$;

5. The transition function $\delta : Q \times \Sigma \to Q$, $\delta(q_{i-1}, I_i) = q_i$, $i = \overline{1, N}$; $\delta(q_N, M) = q_{N+1}$.

Remark 2.2 The set of states $Q$ and the transition function $\delta$ given in Definition 2.11 are completely determined by the functions $\delta_1$, $\delta_2$, and it follows that the automaton $A(I, M, K)$ is well defined in Definition 2.11.

Let an image block $I \in \mathcal{I}$, a secret element $M \in \mathcal{M}$ and a key $K \in \mathcal{K}$ be given. By using the automaton $A(I, M, K)$ and the flip graph $G$, the two functions $Em$ and $Ex$ in the data hiding scheme $(\mathcal{I}, \mathcal{M}, \mathcal{K}, Em, Ex)$ are designed as follows.

The function Em (embedding M in I):

Remark 2.3 Consider $I' = Em(I, M, K)$. By (2.5), $Em$ only changes colours of $|q|$ pixels in $I$ based on the flip graph $G$, so $I' \in \mathcal{I}$. So, the designed $Em$ satisfies Definition 2.1.

The function $Ex$ (extracting $M$ from $I'$):


Proposition 2.5 For all $(I, M, K) \in \mathcal{I} \times \mathcal{M} \times \mathcal{K}$, $Ex(Em(I, M, K), K) = M$.

Proof Set $M' = Ex(I', K)$. By Definitions 2.9 and 2.11,

$M' = \sum_{i=1}^{N} (Val(I'_i) + K_i) v_i.$ (2.9)

After implementing (2.3), $q = \sum_{i=1}^{N} (Val(I_i) + K_i) v_i$. By Definitions 2.10 and 2.11, after implementing (2.4) we consider two cases of $q$:

If $q = \emptyset$, then (2.5) is not implemented and hence $I$ is not changed. Thus $I' = I$ and therefore

Theorem 2.2 Suppose that a k-Generators $S$ for the vector space $GF^n(p^m)$ is found and a flip graph $G$ is built. Then there exists the data hiding scheme $(k, N, \lfloor \log_2 p^{mn} \rfloor)$, where $N = |S|$.

Proof Under the assumption that a k-Generators $S$ for $GF^n(p^m)$, $|S| = N$, is found and a flip graph $G$ is built, we have shown how to construct the data hiding scheme $(\mathcal{I}, \mathcal{M}, \mathcal{K}, Em, Ex)$ based on the Galois field $GF(p^m)$ by using the flip graph $G$ and the automaton $A(I, M, K)$. $Em$ changes colours of at most $k$ pixels of $I$ to embed $M$ in $I$ for all $I \in \mathcal{I}$, $M \in \mathcal{M}$ by Definition 2.10 and Statement (2.5).

Consider $B$ to be the set of all secret data of length $r$ bits, then $|B| = 2^r$, and $|\mathcal{M}| = p^{mn}$ by $\mathcal{M} = GF^n(p^m)$. Suppose that we construct an injective function $f$, $f : B \to \mathcal{M}$. Then $Em$ is used to embed $b \in B$ in $I$ as follows

$I' = Em(I, M, K);$

Since $f$ is injective by our supposition, after extracting $M$ from $I'$ by $Ex$, the secret data $b$ will be determined accurately based on $f$.

Since $B$ and $\mathcal{M}$ are finite sets, for the injective function $f$ to exist we require $|B| \le |\mathcal{M}|$, that is $2^r \le p^{mn}$, so $r \le \log_2 p^{mn}$; choose $r = \lfloor \log_2 p^{mn} \rfloor$. So, for $r = \lfloor \log_2 p^{mn} \rfloor$, the $r$ bits of the secret data $b$ can be embedded in $I$. By Definition 2.2, the data hiding scheme $(\mathcal{I}, \mathcal{M}, \mathcal{K}, Em, Ex)$ is a data hiding scheme $(k, N, \lfloor \log_2 p^{mn} \rfloor)$. So, the data hiding scheme $(k, N, \lfloor \log_2 p^{mn} \rfloor)$ exists.
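The capacity $r = \lfloor \log_2 p^{mn} \rfloor$ and one possible injective function $f : B \to GF^n(p^m)$ can be sketched as follows. Elements of $GF(p^m)$ are represented only by integer labels $0, \ldots, p^m - 1$, and $f$ is taken to be the base-$p^m$ digit expansion of $b$; both choices are illustrative assumptions, since any injective $f$ satisfies the argument above.

```python
def capacity_bits(p: int, m: int, n: int) -> int:
    """r = floor(log2(p^(m*n))), computed with exact integer arithmetic."""
    return (p ** (m * n)).bit_length() - 1


def f(b: int, p: int, m: int, n: int) -> tuple[int, ...]:
    """Injective map from r-bit data b to an n-tuple of labels of GF(p^m) elements."""
    q = p**m
    if not 0 <= b < 2 ** capacity_bits(p, m, n):
        raise ValueError("b does not fit in r bits")
    digits = []
    for _ in range(n):
        b, d = divmod(b, q)      # base-q digits, least significant first
        digits.append(d)
    return tuple(digits)


def f_inverse(M: tuple[int, ...], p: int, m: int) -> int:
    q = p**m
    return sum(d * q**i for i, d in enumerate(M))


print(capacity_bits(2, 1, 3))  # 3  (p^m = 2,  n = 3)
print(capacity_bits(2, 2, 2))  # 4  (p^m = 4,  n = 2)
print(capacity_bits(2, 2, 4))  # 8  (p^m = 4,  n = 4)
print(capacity_bits(3, 1, 2))  # 3  (9 elements, only 8 of them are used by f)
```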

Security analysis of the proposed data hiding scheme $(k, N, \lfloor \log_2 p^{mn} \rfloor)$: Assume that the parameters $k$, $N$, $Em$, $Ex$, the vector space $GF^n(p^m)$ and the flip graph $G$ in the data hiding scheme $(k, N, \lfloor \log_2 p^{mn} \rfloor)$ are published. The secret element $M$ is extracted from $I'$ by the extracting function $Ex$ as follows

... is $p^{mN}$ because $K \in \mathcal{K}$. Consider $\mathcal{GF}$ to be an arbitrary subset of $2^{\lfloor \log_2 p^{mn} \rfloor}$ elements of the set $GF^n(p^m)$ and $B$ to be the set of all secret data of length $\lfloor \log_2 p^{mn} \rfloor$ bits, it means ... $f$, $f : B \to \mathcal{GF}$. By (2.10), to decrypt the secret element $M$ to the secret data $b$, we need to ... $2^{\lfloor \log_2 p^{mn} \rfloor}!$. Then for a brute force attack, an attacker has to try every possible combination of $S$, $K$ and $f$ in the given data hiding scheme. The number of combinations of $S$, $K$ and $f$ is


Proposition 2.6 For a positive integer $n$, there exists the optimal data hiding scheme $(1, 2^n - 1, n)$ for binary, gray and palette images with $q_{colour} = 1$.

Proof For $q_{colour} = 1$, from (2.1), $p = 2$ and $m = 1$. If we build a flip graph $G$, then there exists the optimal data hiding scheme $(1, 2^n - 1, n)$ with $q_{colour} = 1$ by Theorem 2.3. The Galois field $GF(p^m) = GF(2)$ is the same as the field $Z_2$ (see [88]). Next, we show ways to build flip graphs $G = (V, E)$ on the field $Z_2$ for binary, gray and palette images as follows.

For the binary image, $C = \{0, 1\}$, $cp \in C$, where $cp$ is a colour value of a pixel.

$V = C$ and for all $v \in V$, the vertex $v$ is assigned a weight by a function $Val$ such that $Val(v) = v$;

$E = \{(cp, cp') \mid cp, cp' \in V, cp \ne cp'\}$ and every arc $(cp, cp')$ has the same weight 1.

For the gray image, $C = \{0, 1, \ldots, 255\}$, $cp \in C$, where $cp$ is a colour value of a pixel.

$V = C$ and for all $v \in V$, the vertex $v$ is assigned a weight by a function $Val$ such that $Val(v) = v \bmod 2$;

$E = \{(255, 254), (cp, cp + 1) \mid cp \in V, 1 \le cp \le 254\}$ and every arc $(cp, cp')$ is assigned the same weight 1.

For the palette image, $C = \{0, 1, \ldots, 2^t - 1\}$, where $t$ is the number of bits to represent colour indexes, $cp \in C$, and $cp$ is a colour index of a pixel. The palette is $P = \{p_0, p_1, \ldots, p_{2^t - 1}\}$, $p_i \in P$, where $p_i$ is the colour corresponding to the colour index $i$, $i = \overline{0, 2^t - 1}$. To unify notations throughout this dissertation, the function $Val$ in the FOPA method, recalled in Section 1.2 of Chapter 1, is renamed $Val_p$, and we set $Val(cp) = Val_p(p)$, where the colour index $cp \in C$ corresponds to the colour $p \in P$.

Consider $G$ to be the rho forest built by the algorithm for FOPA and assign the same weight 1 to all arcs of $G$. However, all colours of the rho forest are replaced with their colour indexes.

By Definition 2.8, it is not difficult to verify that the graphs $G$ for binary, gray and palette images built as above are all flip graphs on the field $Z_2$. So, there exists the optimal data hiding scheme $(1, 2^n - 1, n)$ for binary, gray and palette images with $q_{colour} = 1$.

Notice that if we set $N = 2^n - 1$, then the data hiding scheme $(1, 2^n - 1, n)$ becomes the data hiding scheme $(1, N, \lfloor \log_2(N + 1) \rfloor)$. Recall that for a positive integer $N$, the data hiding scheme $(1, N, \lfloor \log_2(N + 1) \rfloor)$ for binary images with $q_{colour} = 1$ is the data hiding scheme CTL [18]. So, Proposition 2.6 shows that the data hiding scheme CTL is an optimal data hiding scheme for $N = 2^n - 1$, where $n$ is a positive integer.
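For the binary case ($p = 2$, $m = 1$, $k = 1$, $N = 2^n - 1$), the scheme can be prototyped in a few lines. The sketch below rests on assumptions that are not spelled out in this excerpt: the 1-Generators $S$ is taken to be all non-zero vectors of $GF^n(2)$, with $v_i$ the binary representation of the integer $i$; $Val(v) = v$ as for the binary flip graph above; the state is the sum $\sum_i (Val(I_i) + K_i) v_i$ used in the proof of Proposition 2.5; and a secret element is stored as an integer in $[0, 2^n)$. The function names are hypothetical.

```python
from itertools import product


def state(block: list[int], key: list[int]) -> int:
    """sum_i (Val(I_i) + K_i) * v_i over GF(2), with v_i encoded as the integer i."""
    q = 0
    for i, (pixel, k) in enumerate(zip(block, key), start=1):
        if (pixel + k) % 2:      # coefficient Val(I_i) + K_i in GF(2)
            q ^= i               # vector addition in GF(2)^n is bitwise XOR
    return q


def embed(block: list[int], secret: int, key: list[int]) -> list[int]:
    """Embed an n-bit secret by flipping at most one pixel of the block."""
    d = secret ^ state(block, key)      # M + (-q) over GF(2)
    stego = list(block)
    if d:                               # d != 0 equals exactly one generator v_d
        stego[d - 1] ^= 1               # flip pixel d (arc of weight 1 in the flip graph)
    return stego


def extract(stego: list[int], key: list[int]) -> int:
    return state(stego, key)


n = 3
key = [1, 0, 1, 1, 0, 0, 1]             # N = 2^n - 1 = 7 key components
block = [0, 1, 1, 0, 1, 0, 0]
stego = embed(block, 5, key)
print(extract(stego, key))               # 5
print(sum(a != b for a, b in zip(block, stego)) <= 1)  # True
```

Flipping pixel $d$ changes the state by exactly $v_d$, so the state of the stego block equals the secret element.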

Theorem 2.4 Suppose that a 2-Generators $S$ for the vector space $GF^n(p^m)$ with ... exists the optimal data hiding scheme $(2, |S|, \lfloor \log_2 p^{mn} \rfloor)$ for $q_{colour} = p^m - 1$.

Proof Under the assumption of the theorem, by Theorem 2.2, there exists the data hiding scheme $(2, |S|, \lfloor \log_2 p^{mn} \rfloor)$. According to the proposed approach, the data hiding scheme $(2, |S|, \lfloor \log_2 p^{mn} \rfloor)$ is designed based on the assumption $q_{colour} = p^m - 1$ by (2.1). Now, we prove it to be optimal for $q_{colour} = p^m - 1$.


Suppose the data hiding scheme $(2, N, r)$ is optimal for $q_{colour} = p^m - 1$, then

$r = MSDR_2(N) = \lfloor \log_2(1 + q_{colour} C_N^1 + q_{colour}^2 C_N^2) \rfloor = \lfloor \log_2(1 + q_{colour} N + q_{colour}^2 \frac{N(N-1)}{2}) \rfloor.$

For $r = \lfloor \log_2 p^{mn} \rfloor$ and $q_{colour} = p^m - 1$, from (2.19), we obtain


Given an image $F$ used as a carrier to embed a secret data sequence into, partition $F$ into disjoint image blocks of $N$ pixels, $F = \{F_1, F_2, \ldots, F_{t_2}\}$. Let $D = D_1 D_2 \ldots D_{t_3}$ be a secret data sequence embedded in the cover image $F$, where $D_i$ is secret data of length $\lfloor \log_2 p^{mn} \rfloor$ bits, $i = \overline{1, t_3}$. Since each $\lfloor \log_2 p^{mn} \rfloor$ bits of secret data is embedded in only one image block of $F$, $t_3 \le t_2$.

Let $Jump$ be a bijective function used to determine the order of blocks in $F$ in the process of hiding $D$ in $F$, $Jump : \{1, 2, \ldots, t_2\} \to \{1, 2, \ldots, t_2\}$.

Consider $\mathcal{GF}$ to be an arbitrary subset of $2^{\lfloor \log_2 p^{mn} \rfloor}$ elements of the set $GF^n(p^m)$ and $B$ to be the set of all secret data of length $\lfloor \log_2 p^{mn} \rfloor$ bits, that is, $B = \{0, 1, \ldots, 2^{\lfloor \log_2 p^{mn} \rfloor} - 1\}$ in the decimal system. Then there exists a bijective function $f$, $f : B \to \mathcal{GF}$.

In real applications, when applying the data hiding scheme $(k, N, \lfloor \log_2 p^{mn} \rfloor)$ based on the proposed approach to the process of hiding $D$ in $F$, use the secret key set $\mathcal{K} = \{K_1, K_2, \ldots, K_{t_1}\}$ instead of one secret key. The process of hiding $D$ in $F$ by using the data hiding scheme $(k, N, \lfloor \log_2 p^{mn} \rfloor)$ consists of the embedding algorithm $Em_{DF}$ and the extracting algorithm $Ex_{DF}$ proposed as follows.

The embedding algorithm $Em_{DF}$ (embedding a secret data sequence $D$ in $F$):

F' = F; // F' is called a stego image

The extracting algorithm $Ex_{DF}$ (extracting the embedded secret data sequence $D$ from $F'$):

t = 1;

For i = 1 to t3 Do

{

    M = Ex(F'_{Jump(i)}, K_t); // Use the automaton A(F'_{Jump(i)}, M, K_t) (2.24)

    D_i = f^{-1}(M); // f^{-1} is the inverse function of f (2.25)

}

D = D_1 D_2 ... D_{t3};

Proposition 2.7 Let a cover image $F$, a secret data sequence $D$, a bijective function Jump, a bijective function $f$, a secret key set $K$ and the data hiding scheme $(k, N, \lfloor \log_2 p^{mn} \rfloor)$ based on the proposed approach be given as above. Suppose the stego image $F'$ is generated after $D$ is embedded in $F$ by the embedding algorithm EmDF. Then the data sequence $D'$ extracted from $F'$ by the extracting algorithm ExDF is exactly the secret data sequence $D$.


Proof. By (2.21) and (2.23), EmDF in (2.22) and ExDF in (2.24) use the same secret key $K_t$. The bijective function Jump guarantees that for all $i, j \in \{1, 2, \ldots, t_3\}$, $i \ne j$, we have $Jump(i) \ne Jump(j)$, which means that any image block in $F$ is used at most once in the process of hiding. By Proposition 2.5, $M$ extracted by (2.24) is the same as $M$ embedded by (2.22). Then the bijective function $f$ guarantees that $D_i$ encrypted by (2.20) is the same as $D_i$ decrypted by (2.25), $i \in \{1, 2, \ldots, t_3\}$. This completes the proof.
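The round trip stated in Proposition 2.7 can be illustrated with a small sketch. It is not the automaton-based Em and Ex of the proposed approach: `toy_em` and `toy_ex` are hypothetical stand-ins implementing a simple binary $(1, 7, 3)$ scheme (at most one pixel flipped per block of 7 pixels, 3 bits hidden, $q_{colour} = 1$). The key schedule (cycling through the key set), the 0-based indices, and the concrete values of Jump, $f$ and $K$ are likewise assumptions made only for illustration.

```python
from typing import Dict, List, Sequence


def toy_ex(block: Sequence[int], key: int) -> int:
    """Toy extractor: XOR of the 1-based positions of all 1-pixels, masked by key."""
    acc = key
    for position, pixel in enumerate(block, start=1):
        if pixel:
            acc ^= position
    return acc


def toy_em(block: Sequence[int], message: int, key: int) -> List[int]:
    """Toy embedder: flip at most one pixel so that toy_ex returns message."""
    out = list(block)
    diff = message ^ toy_ex(out, key)
    if diff:                      # diff is in 1..7, a valid 1-based position
        out[diff - 1] ^= 1
    return out


def em_df(blocks, data, jump, f: Dict[int, int], keys) -> List[List[int]]:
    """Block-level embedding loop: one data item per block, visited in Jump order."""
    stego = [list(b) for b in blocks]          # F' = F
    t = 0
    for i, d in enumerate(data):
        target = jump[i]
        stego[target] = toy_em(stego[target], f[d], keys[t])
        t = (t + 1) % len(keys)                # assumed key schedule
    return stego


def ex_df(stego, count: int, jump, f: Dict[int, int], keys) -> List[int]:
    """Block-level extracting loop mirroring em_df with the same Jump, f and keys."""
    f_inv = {m: d for d, m in f.items()}       # inverse of the bijection f
    data, t = [], 0
    for i in range(count):
        m = toy_ex(stego[jump[i]], keys[t])
        data.append(f_inv[m])
        t = (t + 1) % len(keys)                # same assumed key schedule
    return data


blocks = [[0] * 7, [1, 0, 1, 1, 0, 0, 1], [1] * 7]    # cover image, t2 = 3
data = [5, 0, 7]                                       # secret data, t3 = 3
jump = [2, 0, 1]                                       # bijection on block indices
f = dict(enumerate([3, 1, 4, 0, 7, 2, 6, 5]))          # bijection B -> GF
keys = [6, 1]                                          # key set, t1 = 2

stego = em_df(blocks, data, jump, f, keys)
assert ex_df(stego, len(data), jump, f, keys) == data
```

Because Jump is a bijection, no block is overwritten by a later embedding step, which is the property the proof relies on.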

Security analysis of the process of hiding $D$ in $F$: Assume that the parameters $k$, $N$, Em, Ex, the vector space $GF^n(p^m)$ and the flip graph $G$ in the data hiding scheme $(k, N, \lfloor \log_2 p^{mn} \rfloor)$ are published. The secret element $M$ is extracted from $F'_{Jump(i)}$ by (2.24), so we have

$M = Ex(F'_{Jump(i)}, K_t)$;

from Definitions 2.9 and 2.11 and by (2.9), we obtain

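of choices for the $k$-Generators $S$ is $c\,(p^m - 1)^N N!$. The numbers of choices for the key set $K$ and the two bijective functions Jump and $f$ are $p^{m t_1 N}$, $t_2!$ and $C_{p^{mn}}^{2^{\lfloor \log_2 p^{mn} \rfloor}} \cdot 2^{\lfloor \log_2 p^{mn} \rfloor}!$, respectively.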

This section shows that there exist the near optimal data hiding scheme $(2, 9, 8)$ (Theorem 2.5 and Security analyses (2.45), (2.46)) and the optimal data hiding scheme $(1, 5, 4)$ (Corollary 2.1 and Security analyses (2.47), (2.48)) for gray and palette images with $q_{colour} = 3$.

According to the way of constructing the Galois field $GF(p^m)$ from the polynomial ring $Z_p[x]$, where $p$ is prime and $m$ is a positive integer [88], consider the case $p = m = 2$ and use the irreducible polynomial $g(x) = x^2 + x + 1$ in $Z_2[x]$. We obtain the Galois field $GF(2^2)$ as follows:

$GF(2^2) = \{0, 1, x, x + 1\}$

with the two operations addition $+$ and multiplication $\cdot$ defined as in $Z_2[x]$, followed by a reduction modulo $g(x)$.
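As a minimal sketch of this construction, the arithmetic of $GF(2^2)$ can be written out directly. The encoding of elements as 2-bit integers (0, 1, 2 for $x$, 3 for $x + 1$) and the function names `gf4_add` and `gf4_mul` are assumptions made here for illustration.

```python
G = 0b111  # g(x) = x^2 + x + 1, coefficients as bits (x^2, x, 1)


def gf4_add(a: int, b: int) -> int:
    """Addition in Z_2[x] is coefficient-wise XOR; no reduction is needed."""
    return a ^ b


def gf4_mul(a: int, b: int) -> int:
    """Multiply as polynomials over Z_2, then reduce modulo g(x)."""
    prod = 0
    for shift in range(2):            # b has degree at most 1
        if (b >> shift) & 1:
            prod ^= a << shift
    if prod & 0b100:                  # an x^2 term is present
        prod ^= G                     # replace x^2 by x + 1
    return prod


NAMES = {0: "0", 1: "1", 2: "x", 3: "x + 1"}

# Multiplication table of GF(2^2)
for a in range(4):
    print([NAMES[gf4_mul(a, b)] for b in range(4)])
```

In this encoding $x \cdot x = x + 1$ and $x \cdot (x + 1) = 1$, so every nonzero element has a multiplicative inverse, as expected of a field.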
