

HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

MASTER THESIS

Fooling Deepfake detectors with fake personas using

semantic adversarial examples

NGUYEN HONG NGOC
Ngoc.NH202706M@sis.hust.edu.vn
School of Information and Communication Technology

Supervisor’s signature

May 19, 2022


Graduation Thesis Assignment

Name: Nguyen Hong Ngoc

Phone: +84947265498

Email: Ngoc.NH202706M@sis.hust.edu.vn; ngocnguyen.nd97@gmail.com

Class: 20BKHDL-E

Affiliation: Hanoi University of Science and Technology

I, Nguyen Hong Ngoc, hereby warrant that the work and presentation in this thesis were performed by myself under the supervision of Assoc. Prof. Huynh Thi Thanh Binh and Prof. Yew Soon Ong. All the results presented in this thesis are truthful and are not copied from any other works. All references in this thesis, including images, tables, figures, and quotes, are clearly and fully documented in the bibliography. I will take full responsibility for any copied content that violates school regulations.

Student

Signature and name

Nguyen Hong Ngoc


This Master thesis would not have been possible without the support of many people. First of all, I would like to acknowledge and give my warmest thanks to my supervisor, Assoc. Prof. Huynh Thi Thanh Binh, who has given me a lot of motivation to complete this work.

I also thank Prof. Yew Soon Ong, Doctor Alvin Chan, and especially Doctor Nguyen Thi My Binh, for being wonderful mentors and for all the support; I could not have made it without your help and guidance. I would also like to thank my committee members for your thoughtful comments and suggestions to complete this thesis.

I would also like to give special thanks to my wife Pham Diem Ngoc and my family as a whole for their mental support during my thesis writing process; you truly mean the world to me. Furthermore, in the absence of my friends, Vinh Tong, Quang Thang, Minh Tam, Thanh Dat, and Trung Vu, I could hardly melt away all the tension from my work. Thanks for always accompanying me through ups and downs.

Finally, this work was funded by Vingroup and supported by the Vingroup Innovation Foundation (VINIF) under project code VINIF.2020.ThS.BK.06. I enormously appreciate all the financial support from Vingroup, allowing me to stay focused on my research without worrying about my financial burden.


Recent advances in deep generative modeling techniques such as Generative Adversarial Networks (GANs) can synthesize high-quality media content, including images, videos, and sounds. This content, collectively known as deepfake, can be really difficult to distinguish from real content due to its extremely realistic looks and high resolution. The initial purpose of synthesizing media content is to provide more examples for training deep models, thus improving the performance and robustness of these models. However, nowadays, deepfakes are also being abused for many cybercrimes such as fake personas, online frauds, misinformation, or producing media featuring people without their consent. Deepfake has become an emerging threat to human life in the age of social networks.

To fight against and prevent these aforementioned deepfake abuses, forensic systems with the ability to detect synthetic content have recently been extensively studied by the research community. At the same time, anti-forensic deepfakes are being investigated to understand the gaps in these detection systems and pave the way for improvement. In the scope of this Master thesis, I investigate the threat of anti-forensic fake personas with the use of semantic adversarial examples, where a fraudster creates a fake personal profile from multiple anti-forensic deepfakes portraying a single identity. To comprehensively study this threat model, three approaches that an attacker may use to conduct such attacks are considered, encompassing both white- and black-box scenarios. A range of defense strategies is then proposed with the aim of improving the robustness of current forensic systems against such threats. Experiments show that while the attacks can bypass current detection, the proposed defense approaches that consider the multi-image nature of a fake persona can effectively mitigate this threat by lowering the attack success rate. The results of this thesis can help strengthen the defense in the fight against many cybercrimes utilizing deepfakes.

Student

Signature and Name

Nguyen Hong Ngoc


TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION

1.1 Deepfake
1.2 Applications of deepfake
1.2.1 Image editing
1.2.2 Digital cinematic actors
1.2.3 Generating training examples
1.3 Deepfake abuses
1.3.1 Disinformation
1.3.2 Fake personas/identities
1.4 Forensic and anti-forensic deepfake
1.5 Research challenge: Anti-forensic deepfake personas
1.6 Motivations
1.7 Thesis methodology
1.8 Contributions
1.9 Thesis organization

CHAPTER 2 BACKGROUND

2.1 Deepfake generators
2.1.1 Autoencoder
2.1.2 Generative Adversarial Networks
2.2 Semantic modification for GAN
2.3 Deepfake forensic systems
2.4 Attacks to deepfake forensic systems
2.4.1 Spatial transformations
2.4.2 Pixel-level adversarial examples
2.4.3 Semantic adversarial examples

CHAPTER 3 ANTI-FORENSIC FAKE PERSONA ATTACK

3.1 Problem modeling
3.2 White-box approaches
3.2.1 Two-phases approach
3.2.2 Semantic Aligned Gradient Descent approach
3.3 Black-box approach
3.3.1 Introduction to Evolutionary Algorithms
3.3.2 Semantic Aligned Evolutionary Algorithm

CHAPTER 4 DEFENSES AGAINST ANTI-FORENSIC FAKE PERSONAS

4.1 Defense against Single-image Semantic Attack task
4.2 Defenses against anti-forensic fake persona attack
4.2.1 Naive Pooling defense
4.2.2 Feature Pooling defense

CHAPTER 5 EXPERIMENT RESULTS AND ANALYSIS

5.1 Experiment setup
5.1.1 General setup
5.1.2 Hyper-parameters setting
5.2 Single-image Semantic Attack task evaluation
5.2.1 Baseline
5.2.2 Two-phases white-box approach evaluation
5.2.3 SA-GD white-box approach evaluation
5.2.4 SA-EA black-box approach evaluation
5.2.5 Comparison between the approaches for SiSA
5.2.6 Visual quality evaluation
5.2.7 Computational time evaluation
5.3 Anti-forensic fake persona attack evaluation
5.4 Discussions
5.4.1 Visual quality trade-off between approaches
5.4.2 Query-based defenses
5.4.3 Ethical discussions

CHAPTER 6 CONCLUSION AND FUTURE WORKS

6.1 Contributions
6.2 Limitations and future works


LIST OF FIGURES

1.1 Examples of deepfake images from the website thispersondoesnotexist.com. These images are generated by StyleGAN2 [2]
1.2 Barack Obama deepfake video created from a random source video
1.3 The four types of face manipulation in deepfake
1.4 Popular Faceapp filters, utilizing deepfake technology to edit images in various ways such as: older self, cartoon style, adding facial hair, or swapping the gender
1.5 CGI in the Rogue One movie to recreate young Princess Leia, later improved with deepfakes by fans
1.6 Deepfake video of Donald Trump aired by Fox affiliate KCPQ
1.7 With the rise of deepfake technology, any social account could be fake
1.8 Andrew Walz was, according to his Twitter account and webpage, running for a congressional seat in Rhode Island. In reality, Mr. Walz does not exist and is the creation of a 17-year-old high-school student
1.9 Original deepfake image is detected 'fake' by the forensic system. However, after adding specially crafted imperceptible adversarial perturbations, the deepfake image, even though it looks the same, is detected 'real'
1.10 Attacker bypasses forensic systems with a seemingly legitimate fake persona profile, created by semantically modifying certain attributes of one source deepfake

2.1 Architecture of an autoencoder, including an encoder and a decoder
2.2 Architecture of a Generative Adversarial Network, including a generator and a discriminator [23]
2.3 The modeling of a simple GAN-based deepfake generator. The GAN generator takes latent code z as input and outputs the deepfake image x
2.4 Semantically modifying the attribute smile of a face image using the attribute vector V_a = V_smile. The attribute vector is learned from the latent space, using the method proposed in [24]
2.5 Spatial transformation adversarial attack on a CNN classifier. The classifier fails to classify these images after simple rotation and translation
2.6 The creation of pixel-level adversarial examples, which uses gradient back-propagation to update the perturbations. The loss function here is the prediction score F_d(x) of the detector
2.7 The creation of semantic adversarial examples based on gradient back-propagation. Different from pixel-level adversarial examples, the gradient is back-propagated to update a perturbation δ, which is added directly to the original latent code z

3.1 Two-phases approach illustration. Phase 1: semantically modifying the original deepfake x along the target attributes to create x′ = G(z + αV_A). Phase 2: adding pixel-level adversarial perturbation σ to create the anti-forensic deepfake x′ + σ
3.2 Gradient back-propagation step of the Semantic Aligned Gradient Descent approach, where a perturbation δ is added to the latent code z and updated by gradient descent. This step is similar to the semantic adversarial example attack
3.3 Example of semantically aligning the perturbation δ into δ′, with the orthogonal threshold h⊥ and only one attribute vector V_a targeted. In the case of two or more target attributes, the perturbation is projected onto the space spanned by the target attribute vectors
3.4 An example of 1-point crossover in SA-EA. The first half of f and the second half of m are concatenated to create offspring c
3.5 An example of average crossover in SA-EA. Offspring c is created by taking the average of f and m
3.6 An example of random noise mutation in SA-EA. Chromosome c is mutated to chromosome c′ by adding a noise uniformly sampled in range ∆

4.1 Retraining the deepfake detector with the addition of semantic attack images
4.2 Illustration of the Naive Max-pooling defense, where m images of the profile are fed into the detector D to get m corresponding prediction scores. Then, the m prediction scores are fed through a max-pooling layer to get the overall score of the profile
4.3 Illustration of the Feature Max-pooling defense, where m images of the profile are fed into the CNN layers of the detector and then into a max-pooling layer to get the profile feature vector. Lastly, the profile feature vector is fed into the fc layer to get the prediction

5.1 Two-phases white-box ASR: (a) against the original detector with different target attributes; (b) against the original and defense detectors (average value across target attributes)
5.2 The ASR of SA-GD white-box: (a) against the original detector with different target attributes; (b) against the original and defense detectors (average value across target attributes)
5.3 The attack success rate of SA-EA black-box: (a) against the original detector with different target attributes; (b) against the original and defense detectors (average value across target attributes)
5.4 The ASR of the SA-GD white-box, SA-EA black-box, and grid-search approaches given the same h⊥ (average value across target attributes)
5.5 FID_CelebA score (smaller is better) of each attack approach against the original and defense detectors. The red dashed line shows the FID_CelebA value of StyleGAN-generated images from the input latent codes
5.6 Two-phases approach: samples of inputs and corresponding outputs with different target attributes (ϵ = 0.25). Inputs are predicted 'fake' while outputs are predicted 'real'
5.7 SA-GD approach: samples of inputs and corresponding outputs with different values of the orthogonal threshold h⊥. Besides the target attribute age, other attributes such as smile, pose, hairstyle, and background are sometimes changed, more often and more intensely as the orthogonal threshold h⊥ increases
5.8 The P-ASR of the two-phases approach: (a) against the Naive Max-pooling strategy with different ϵ; (b) Naive Max-pooling vs. Feature Max-pooling strategy where ϵ = 0.2 (m is the number of images in a profile)
5.9 Exaggerated examples of how larger perturbation affects the visual quality: the two-phases approach generates noisier images while SA-GD/SA-EA output non-target attribute changes


LIST OF TABLES

5.1 Comparison on the accuracy (Acc.) and the average precision (AP) between the defense and the original detectors; test sets are from [17] (no-crop evaluation)


CHAPTER 1 INTRODUCTION

In this chapter, deepfake technology, together with its promising applications as well as its malicious abuses, is introduced. The concepts of forensic deepfake systems and anti-forensic deepfake examples are also presented. Lastly, the research challenges of the thesis are raised and the motivations behind these challenges are discussed.

1.1 Deepfake

Originating from a Reddit user who shared synthetic fake pornography videos featuring the faces of celebrities, the term "deepfakes" refers to high-quality media content generated by deep-learning generative techniques. Even though the term has only been popular since 2019, techniques of image manipulation were developed as far back as the 19th century and were mostly applied to motion pictures. The technology was steadily improved during the 20th century, and more quickly with the invention of digital video [1]. Deepfake technology has been developed by researchers at academic institutions, beginning in the 1990s, and later by amateurs in online communities. Over the last few years, deepfake has drastically improved in generation quality due to advances in graphic computational power and deep learning techniques.

Figure 1.1: Examples of deepfake images from the website thispersondoesnotexist.com. These images are generated by StyleGAN2 [2].

Nowadays, with the power of artificial intelligence and deep learning, the quality of deepfake synthetic content has been greatly enhanced to a remarkably realistic level. For instance, in Figure 1.1, these two seemingly normal facial photos of two normal people turn out to be deepfake images, which are taken from the website thispersondoesnotexist.com. True to its name, these two people do not exist, since these images are generated completely at random by a computer; to be more specific, by a deep generative architecture called StyleGAN2 [2]. Even if we examine these images carefully, it is nearly impossible to tell any difference between these deepfake images and real ones, not to mention that the resolution of these deepfakes is also profoundly high, with razor-sharp image quality.

Deepfake gained a lot of attention in 2018 when Jordan Peele and BuzzFeed cooperated to synthesize a fake PSA video delivered by Barack Obama [3] utilizing deepfake technology. From an arbitrary source video of a person giving a random speech, deepfake can swap the face and the voice of the person with the face and voice of Barack Obama while the content of the speech is unchanged (Figure 1.2). Even though the deepfake video was supposed to be for entertainment purposes, the realism of its visual and audio content made many wonder about the safety of the technology and the possibility of abusing deepfake for cybercrimes.

Figure 1.2: Barack Obama deepfake video created from a random source video.

Deepfake comes in many forms, from the most common form of images [4]–[6] to videos [7]–[9] and even audio deepfakes [9], [10]. The Barack Obama deepfake video mentioned above (Figure 1.2) is an example that combines all of these forms together.

Among the subjects of deepfakes, the most widely studied is the human facial deepfake, as it can be used for many applications. Within the field of human facial deepfakes, there are four common types of face manipulation techniques (Figure 1.3) [11]:

• Entire face synthesis: refers to the case where an entire facial image is generated/synthesized by computer techniques. The face image is synthesized from a random seed and usually belongs to a non-existent person.

• Face identity swap: deepfakes where a target facial image of a person is swapped with a source facial image of another person. To be more specific, only the face identity is swapped while other content in the image is unchanged.

• Facial attributes manipulation: manipulation of a target facial image to change certain attributes such as hairstyle, eyeglasses, or even age. For instance, this manipulation technique can semantically change a facial image to create an older look of the person.

• Facial expression manipulation: manipulation of a target facial image to change the expression of the person, such as smile, surprise, anger, etc.


Figure 1.3: The four types of face manipulation in deepfake.

Even though each type of face manipulation has its own applications, in the scope of this thesis, I exclusively study face synthesis techniques [11], in which entire non-existent face images are generated.

1.2 Applications of deepfake

1.2.1 Image editing

One of the most well-known applications of deepfake technology is image editing. Faceapp (https://www.faceapp.com/) is a famous piece of software that allows image editing using deepfake. Faceapp provides dozens of different filters that can be used on users' uploaded images to create various effects. These filters usually apply the aforementioned facial attributes manipulation deepfake, targeting different attributes that can semantically modify the image in the most realistic way. Figure 1.4 illustrates a few of the most popular filters in Faceapp, including:

• Older filter: creates an image of the older self from the input image, allowing users to see what they may look like in the future.

• Genderswap filter: swaps the gender of the person in the input image, allowing users to see what they look like in the opposite gender.

• Cartoon filter: creates a cartoon version of the input image.

• Add facial hair filter: adds facial hair to the input image.

Figure 1.4: Popular Faceapp filters, utilizing deepfake technology to edit images in various ways such as: older self, cartoon style, adding facial hair, or swapping the gender.

Facial expression manipulation deepfake can also be used to edit images and videos. People may use expression manipulation to change the expression of a person in images or videos as they desire. Furthermore, face identity swap deepfake can be used for image editing by allowing users to insert their face identity into the images of others.

Compared to traditional image processing techniques (e.g., tools such as OpenCV and frameworks such as Photoshop), deepfake image editing has the advantage of being fully automatic, since, with a well-trained deepfake generative model, an input image passed through the generator is automatically transformed. In contrast, with traditional techniques, each input must be manually handled, which often takes a lot of time and effort. Not only that, deepfake can give a very natural look to the image, which with traditional techniques depends a lot on the skills of the editor.

1.2.2 Digital cinematic actors

As mentioned above, one of the biggest applications of deepfakes is creating digital actors in the cinematography industry. Although image manipulation appeared a long time ago in the form of computer-generated imagery (CGI), more recent deepfake technology promises even better quality in a much shorter time and with much less effort. Deepfake technology has already been used by fans to insert faces into existing films, such as the insertion of Harrison Ford's young face onto Han Solo's face in the movie Solo: A Star Wars Story, and similar techniques were used for the acting of Princess Leia in the movie Rogue One [1].

Figure 1.5: CGI in the Rogue One movie to recreate young Princess Leia, later improved with deepfakes by fans.

As in Figure 1.5, CGI was used to recreate young Princess Leia, based on the facial expressions scanned from another actress using hundreds of motion sensors. With deepfake technology, instead of motion sensors to capture the facial expression, we only need reference videos, which can either be the original video of the target (in this case, Princess Leia from the original movies) or the source video where we want to replace the face of the target. The quality of deepfake videos is getting better every day, while the cost in time and resources to synthesize deepfakes is much less than the cost of CGI.

1.2.3 Generating training examples

One other important application of deepfake is to generate more examples for training deep neural networks. As many of us may know, the capability of artificial neural networks, regardless of size, is highly dependent on the data on which the networks are trained. If the data is too small or too biased, the performance of the networks in real life may be significantly affected. For instance, a face recognition system that is only trained on facial images of young people will perform poorly when recognizing the faces of older people. An animal image classifier that is trained only with images of black cats will likely not be able to correctly classify images of white cats. Since the era of Artificial Intelligence (AI) began, biased data has always been one of the biggest problems when training neural networks. The problem also takes an extreme amount of effort to solve, because the only way to make biased data unbiased is to collect even more data to improve the diversity of training samples. Collecting data is usually very time-consuming and often costs a fortune.

Deepfake comes in handy as a promising answer to the biased-data problem without costing too many resources. With deepfake, people can easily generate new examples to improve the diversity of the training dataset. For instance, in the above face recognition system example, we can use deepfake to generate older versions of young people's images in the dataset and use those deepfakes to train the model. In the case where the deep learning system lacks training data, a deepfake generator can be used to generate more data for training. This solution helps save a lot of time and money for the system designers compared to manually collecting real data.

1.3 Deepfake abuses

In contrast to its promising applications, deepfakes are mostly being abused for many illegal activities and cybercrimes. The two most dangerous crimes that can be done with deepfakes are disinformation and fake personas/identities.

1.3.1 Disinformation

Deepfake's remarkable performance in generating photo-realistic content involving faces and humans has raised concerns about issues such as the malicious use of fake media to spread misinformation [12] and fabricating content of people without their consent [13]. With deepfakes, one can also spread fake news and hoaxes targeting celebrities, backed up by convincingly high-quality images/videos. For instance, the origin of deepfakes is in synthetic pornographic videos featuring the faces of celebrities, which can be used to blackmail or disrepute these people without their consent. A report published in October 2019 by Dutch cyber-security startup Deeptrace estimated that 96% of all deepfakes online were pornographic [1].

Figure 1.6: Deepfake video of Donald Trump aired by Fox affiliate KCPQ.

Some people can also use deepfake videos to misrepresent well-known politicians in videos, targeting their rivals to achieve an advantage in politics. Some incidents have been recorded in the past. In January 2019, Fox affiliate KCPQ aired a deepfake video of Donald Trump during his Oval Office address, mocking his appearance and skin color [1] (Figure 1.6). In April 2020, the Belgian branch of Extinction Rebellion published a deepfake video of Belgian Prime Minister Sophie Wilmès on Facebook [1].

1.3.2 Fake personas/identities

Deepfake is also being abused a lot to create fake personas/identities and pretend to be other people. For instance, someone with access to the technology may open product/social accounts using the identities of others, or even of non-existent people, with the intention of committing cybercrimes such as scams and financial fraud. Criminals can easily pretend to be other people online and commit crimes without the consequence of being tracked (see Figures 1.7 and 1.8). With the support of deepfake, they can even generate photo-realistic ID card images to gain the trust of others, thus successfully scamming in online transactions.

Figure 1.7: With the rise of deepfake technology, any social account could be fake.

A famous example of an online deepfake fake persona is the case of the Twitter account Andrew Walz. According to this account, Andrew was a congressional candidate running for office in Rhode Island, who called himself a "Republican" with the tagline "Let's make changes in Washington together." Walz's Twitter account was complete with his picture and a prized blue check-mark, showing that he had been verified by Twitter as one of the accounts of congressional and gubernatorial candidates (Figure 1.8). Andrew Walz, however, was actually the creation of a 17-year-old high-school student. During his holiday break, this student created a website and a Twitter account for this fictional candidate [14]. The Twitter profile picture was downloaded from the website thispersondoesnotexist.com.

These are just a few of the many abuses of deepfake, which are increasing in quantity and quality every day. Even though deepfake has a lot of great applications, we need to be more aware and cautious of deepfake's potential threats.


Figure 1.8: Andrew Walz was, according to his Twitter account and webpage, running for a congressional seat in Rhode Island. In reality, Mr. Walz does not exist and is the creation of a 17-year-old high-school student.

1.4 Forensic and anti-forensic deepfake

Since the advent of deepfake abuses and cybercrimes, a wide array of defenses has been proposed to mitigate this emerging threat and prevent the risk. These defenses usually aim to counter deepfakes by detecting and classifying deepfake content among real content, and are also known as deepfake forensic/detection systems. In recent years, deepfake forensic systems have been extensively studied and developed by the research community. Most forensic systems can be divided into two main groups:

• The first group of measures seeks to detect fake content based on high-level semantic features such as behavioral cues [13], like inconsistent blinking of the eyes [15]. These methods have the advantage of fast validation of new instances but are usually quickly outdated as deepfake technology improves over time. Today's deepfake content has developed to near-perfect quality and exceptionally natural looks, which makes these high-level features highly realistic and non-distinguishable.

• The second group of defenses is based on low-level features underneath the image pixels, training a convolutional neural network (CNN) to classify images/videos as either fake or real [16]–[19]. These forensic detectors normally achieve state-of-the-art performance due to the CNN's ability to automatically learn feature extraction.

On the opposite side of forensic systems, we have anti-forensic deepfakes: deepfake examples that are specially crafted to bypass forensic systems, fooling these detectors into classifying synthetic content as real. These anti-forensic deepfakes, also called adversarial examples, are most commonly generated by using gradient back-propagation to add imperceptible adversarial perturbations to the pixels of the original deepfake [12], [20]. Figure 1.9 illustrates an adversarial example. Despite the fact that, to human eyes, the deepfake image seems to remain unchanged after adding the perturbations, forensic systems are fooled and decide that the image is real. Many experiments have shown that recent deepfake forensic systems are extremely vulnerable to adversarial examples, revealing a big gap in current detection techniques [12].

Figure 1.9: Original deepfake image is detected 'fake' by the forensic system. However, after adding specially crafted imperceptible adversarial perturbations, the deepfake image, even though it looks the same, is detected 'real'.

The fight between forensic and anti-forensic deepfake has gone back and forth for years. While forensic systems improve over time due to recent advanced classification techniques and extensive training, anti-forensics is also getting more effective, with new generative networks developed every day. Nonetheless, forensics (defenses) and anti-forensics (attacks) are two sides of the same problem. To make progress on one side, researchers must have knowledge from both sides. For instance, knowledge obtained from the attack methods can be used to understand the weaknesses of the defenses, thus proposing counter techniques to mitigate the attacks.

Understanding this relationship between forensics and anti-forensics, in the fight against deepfake abuses there are normally two main groups of approaches. The first group is to explore and discover different types of attacks on the forensic detectors, since it is crucial to be aware of possible attacks and prepare the corresponding counter defenses against them. The second group is to propose techniques that focus directly on improving the forensic systems, whether to boost the performance of the deepfake detectors in general or simply to gain robustness against a certain type of attack. Either way, both groups of approaches are equally important and must be pursued simultaneously for the best efficiency.


1.5 Research challenge: Anti-forensic deepfake personas

As introduced in Section 1.3.2, the astonishing ability of deepfake technology to synthesize photo-realistic media has raised many questions about the risk of deepfake abuses and cybercrimes. Deepfake persona/identity attacks, among those crimes, are highly dangerous since they can cause tremendous loss to victims. To create such fake personas/identities on the internet, a fraudster/attacker has to generate a set of many deepfake media (including images, voices, and videos) that satisfy the three following conditions:

(i) Quality and quantity: the quality of the deepfake media has to be realistic enough to fool a human, and so does the quantity of the media, since a profile that contains only one image of the person would not be so convincing to others.

(ii) Identity consistency: the identity of the fake persona has to remain consistent across the deepfake media. For instance, deepfake images have to be of the same person, preferably with different semantics (different poses, facial expressions, ages) and scenarios to make the profile seem more legitimate.

(iii) Anti-forensic: with the reputation of deepfake, many of today's social networks are being integrated with forensic systems to help detect and filter out malicious fake content. Therefore, the generated deepfake media used to form the fake persona have to bypass current forensic systems for the attack to succeed. As a result, the fake persona profile has to be anti-forensic.

In the scope of this Master thesis, I perform an exclusive study of this anti-forensic deepfake persona/identity abuse. Keeping in mind that both forensic and anti-forensic approaches are equally important in the fight against deepfake abuses, the challenges of this research are:

1. to study and come up with different attack methods that can be used to create anti-forensic deepfake persona profiles that satisfy all three aforementioned conditions;

2. to analyze the attack methods and propose corresponding defenses that counter the attacks, improving the robustness of deepfake forensics against them.

To the best of my knowledge, this thesis is among the first research works to study the threat of anti-forensic fake persona attacks. Although recent works [12], [20] show that it is possible to create high-quality anti-forensic deepfakes that bypass state-of-the-art forensic systems with adversarial examples, these attacks can only create separate anti-forensic fake images with no correlation. Hence, in the context of deepfake persona attacks, adversarial attacks alone do not satisfy the identity consistency and quantity conditions. For these reasons, I find this research topic to be novel, highly challenging, and perfectly compliant with the requirements of my Master program.


1.6 Motivations

Today, social network media has become an irreplaceable part of our lives. This network environment allows great connection between people, but at the same time provides ideal places for fraudsters to commit cybercrimes online. Many of these cybercrimes in social networks apply deepfakes to create fake identities, fake personas, and fake news, causing tremendous loss to millions of victims. Online transaction frauds can cause a loss of money/properties depending on the size of the transactions. Fake news attacks, on the other hand, may be used to create a bad reputation for celebrities or politicians, indirectly causing an enormous loss in many aspects to the victim.

Realizing this emerging threat of deepfake, this Master work is performed with the aim of providing a safer and more secure social network life, protecting users from becoming victims of the aforementioned cybercrimes. The proposed methods in this research can be used to strengthen current deepfake detectors, improving the robustness of forensic systems against fake persona attacks in particular and adversarial attacks in general. With the obtained results, I also hope to facilitate the research community in developing better defense techniques, bridging the gap in current deepfake detection.

1.7 Thesis methodology

In this thesis, to study the anti-forensic deepfake persona abuse threat, I investigate three different approaches that an attacker in real life may use to perform an attack satisfying all three conditions presented in Section 1.5. Essentially, these approaches are designed to increase the classification error of the forensic systems through iterative edits on a source fake image. Simultaneously, to ensure identity consistency, the changes are constrained to alter only certain target semantic attributes of the image (e.g., identity-preserving attributes such as pose or facial expressions) with minimal changes to others. This process can be repeated on the same source image with different target attributes to create a set of diverse deepfake images that are consistent in identity, which is later used to form a fake persona profile (see Figure 1.10).

The proposed methods also take into account both the white-box setting (based on gradient back-propagation) and the black-box setting (based on an evolutionary algorithm). To be more specific, the white-box scenario refers to the case where the attackers have full access to the forensic detectors, useful when testing the limits of the defense. In contrast, the black-box scenario assumes a more realistic case where the attackers do not have information about the architecture and the gradient of the detectors. Experiments of the proposed fake persona attacks on a state-of-the-art forensic system show that the attacks can achieve a high success rate, revealing a gap in current detection techniques against deepfake.


Figure 1.10: Attacker bypasses forensic systems with a seemingly legitimate fake persona profile, created by semantically modifying certain attributes of one source deepfake.

As a means to defend against and counter this threat, two defense strategies are proposed to improve the robustness of forensic systems. The first defense strategy is based on adversarial retraining, where adversarial examples are used to augment the training data and improve the detection accuracy against future attacks. The second strategy is specially made for the fake persona attacks: it treats the profile of interest as a set of images and takes into account the correlation between these images when making a decision. Through the experiments, the defense strategies are shown to be effective in reducing the attack success rate of such threats.

1.8 Contributions

The fake persona attack approaches investigated in this work enable the forensic designer to identify the major weaknesses, if any, of current deepfake detector systems. From there, designers and managers can have a better idea of the limitations of the systems and implement suitable actions in response. Simultaneously, defense techniques are also proposed in this research to improve the performance of current forensic systems, boosting the robustness of these systems against attacks and deepfake abuses.

Toward the research community, the scientific contributions of this work are summarized as follows:

• This thesis investigates the possibility of fake persona attacks on the internet that satisfy the quality, quantity, identity consistency, and anti-forensic conditions (i.e., the ability to bypass forensic systems).

• To achieve such attacks, different approaches that alter only targeted semantic attributes while fooling forensic systems are proposed, including:

– Two approaches based on the white-box attack assumption, which are the Two-phases approach and the Semantic Aligned Gradient Descent (SA-GD) approach.

– The Semantic Aligned Evolutionary Algorithm (SA-EA) approach, which is inspired by Evolutionary Algorithms and is based on the black-box attack assumption.

• To counter the attacks, three defense strategies are proposed, including the adversarial retraining defense, the Naive Pooling defense, and the Feature Pooling defense. These defenses are shown to improve the robustness of the forensic systems and mitigate the fake persona threat.

• Discussion of the performance of the approaches under different circumstances shows great insights. While the Two-phases approach has the highest success rate, it may degrade the visual quality of the deepfake images in certain ways compared to other approaches.

1.9 Thesis organization

The remainder of this thesis is organized as follows. Chapter 2 presents the fundamental background knowledge and a survey of scientific works related to the research challenges addressed in this thesis. Chapters 3 and 4 address the research challenge stated above by proposing different approaches for fake persona attacks and discussing defense strategies, respectively. Chapter 5 presents the experimental results, evaluation, and analysis of the proposed approaches. Finally, Chapter 6 concludes the thesis and discusses future works.


CHAPTER 2 BACKGROUND

In this second chapter, fundamental background knowledge necessary to formulate the proposed research challenges is presented. Some of the most famous attacks on deepfake forensic systems and their counter defenses are introduced as works related to this thesis. The differences between these works and this Master thesis are also raised to show the novelty of this work.

2.1 Deepfake generators

2.1.1 Autoencoder

In early works, content such as images and written texts was usually generated with autoencoders: a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning). The encoding is validated and refined by attempting to regenerate (or decode) the input from the encoding. A common autoencoder includes two main components:

• Encoder: a neural network which maps from the data space to an encoding space that normally has far fewer dimensions than the input data space. The encoder tries to learn an encoding (representation) for a set of data, typically for dimensionality reduction, by training the network to ignore insignificant data.

• Decoder: a neural network which maps from the encoding space back to the data space. The decoder learns to reconstruct the data from an arbitrary encoding.

Figure 2.1 illustrates a simple autoencoder. To use an autoencoder to generate data (in this case, deepfake content), the autoencoder is first trained to reconstruct the exact input image. More specifically, each input sample is forwarded through the encoder and then through the decoder to get the corresponding output. The reconstruction loss (the difference between the input and the output) is calculated, and back-propagation is performed to update the encoder and the decoder. After training, the autoencoder has learned the mapping from the data to the encoding and vice versa. At this step, we take the decoder part of the autoencoder to make the deepfake generator. By feeding a random encoding sampled from the encoding space into the decoder, we receive a deepfake image as the output. The more recent variational autoencoder, also known as a VAE, is an improved version of the autoencoder which offers better training quality.

Recent works [21], [22] applied VAEs to generating images. Although the VAE has a simple architecture and is easy to use, the generated deepfakes are somewhat restricted to the data and their quality is not particularly high. Therefore, most autoencoders are used to learn the representation of the data instead of generating new data.
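To make the encoder/decoder structure concrete, the following is a minimal sketch of an autoencoder in PyTorch; the layer sizes, the 64x64 single-channel image shape, and the 128-dimensional code are illustrative assumptions, not the architecture of any model used later in this thesis.

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    # Minimal autoencoder: 64x64 grayscale image <-> 128-dim encoding.
    def __init__(self, code_dim=128):
        super().__init__()
        # Encoder: data space -> much lower-dimensional encoding space.
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 64, 512), nn.ReLU(),
            nn.Linear(512, code_dim),
        )
        # Decoder: encoding space -> data space.
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 512), nn.ReLU(),
            nn.Linear(512, 64 * 64), nn.Sigmoid(),
            nn.Unflatten(1, (1, 64, 64)),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(16, 1, 64, 64)               # dummy batch of input images
loss = nn.functional.mse_loss(model(x), x)  # reconstruction loss
loss.backward()                             # back-propagate to both parts
opt.step()

# After training, the decoder alone can act as a (weak) generator:
z = torch.randn(1, 128)                     # random encoding
fake = model.decoder(z)                     # decoded "deepfake" image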


Figure 2.1: Architecture of an autoencoder, including an encoder and a decoder.

2.1.2 Generative Adversarial Networks

Generative Adversarial Networks (GANs), proposed by Goodfellow in 2014 [4], are among the most efficient generative architectures for synthesizing deepfake content. Compared to VAEs, GANs provide much superior image quality, and the generated deepfakes can be much more diverse. Figure 2.2 illustrates the architecture of the original GAN model, which includes two main components:

Figure 2.2: Architecture of a Generative Adversarial Network, including a generator and a discriminator [23].

• Generator: a neural network that takes a random noise vector z (also known as the latent code) as input and outputs an image. The initial generator simply outputs a noisy image with no semantic content. The generator is sometimes compared to the decoder in the autoencoder due to their similarity.

• Discriminator: a neural network, usually a convolutional neural network (CNN), which acts as a classifier that classifies an input image into two classes: 'real' (the image originates from real data) and 'fake' (the image is generated by the generator).


The training process of a GAN requires a training dataset of real data, for example, real images collected online. During the training, the generator and the discriminator are trained simultaneously with different loss functions. The discriminator is trained to better classify and discriminate generated samples from real ones. In contrast, the generator is trained to generate better samples that look more similar to the real ones, i.e., that are harder for the discriminator to classify correctly. The generator and the discriminator are trained and improved together over time until they reach convergence, similar to the minimax rule in game theory. The only downside of the training process of GAN is that it takes a long time to complete, since a GAN trains two networks simultaneously.

After training, the generator part is taken to create the final generator. We simply input a random latent code vector and the generator outputs the deepfake example. The semantic content of the deepfake relies highly on the value of the input latent code. Recently proposed GANs such as ProGAN [5], StyleGAN [6], and StyleGAN2 [2] have shown highly realistic results with a spectacular level of detail. For example, Figure 1.1 in Section 1.1 shows deepfake facial images generated by StyleGAN2.
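The sketch below illustrates this simultaneous training for one step, using the standard binary cross-entropy GAN losses; the tiny MLP generator/discriminator, the flattened 64x64 images, and all hyper-parameters are toy assumptions for illustration only.

import torch
import torch.nn as nn

latent_dim = 100
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, 64 * 64), nn.Tanh())         # toy generator
D = nn.Sequential(nn.Linear(64 * 64, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))                          # toy discriminator

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(32, 64 * 64) * 2 - 1      # dummy batch of "real" images
z = torch.randn(32, latent_dim)             # batch of random latent codes

# Discriminator step: push D(real) toward 'real' (1), D(G(z)) toward 'fake' (0).
loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(G(z).detach()), torch.zeros(32, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: fool the discriminator into predicting 'real' for G(z).
loss_g = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()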

Figure 2.3: The modeling of a simple GAN-based deepfake generator. The GAN generator takes latent code z as input and outputs the deepfake image x.

Currently, most deepfake generators are based on GANs due to the high visual quality of the deepfake images. In this thesis, GAN is mainly used as the deepfake generator model. Hereinafter, a GAN is notated and modeled as G: Z → X such that x = G(z), where z ∈ Z is the input latent code vector and x ∈ X is the output synthesized image (Figure 2.3). Z is also called the latent space of the GAN.

2.2 Semantic modification for GAN

The proposed anti-forensic fake persona attacks in this thesis also involve the concept of semantic modification for GANs. Here, semantics refers to the content of the deepfake images that establishes the meaning of the image. For a facial image, semantic modifications can be modifications to certain facial attributes of the image, such as the expression, age, hair, eyes, or skin of the person. In a GAN model, certain semantic attributes of the synthetic images can be modified by properly editing the input latent codes. However, these attributes are usually entangled in the latent space, making controlled semantic modifications to images difficult to accomplish. In [24], the authors proposed a method to interpret the latent space of GANs, thus providing a way to arbitrarily edit certain attributes of face images, such as age, gender, smile, and pose. For each attribute, a corresponding attribute vector V_a ∈ Z is generated by learning from the latent space of the GAN. Semantic modification is done by translating the latent code along the direction defined by a selected attribute vector V_a. Figure 2.4 illustrates an example of semantically modifying the attribute smile of an image by translating the latent code along the attribute vector V_smile.

It is worth noting that, in this thesis, the semantic modification for GAN is done with the pre-trained attribute vectors proposed in [24], including vectors for several different attributes: age, gender, smile, and pose.
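In code, such an edit is just a translation in the latent space before decoding. Below is a minimal sketch, assuming a pre-trained generator G and a learned attribute vector v_attr in the style of [24]; both names are hypothetical stand-ins.

import torch

def semantic_edit(G, z, v_attr, alpha):
    # Translate latent code z along the attribute direction v_attr by step
    # alpha, then decode the edited code; larger |alpha| gives a stronger edit.
    return G(z + alpha * v_attr)

# Example usage: strengthen the 'smile' attribute of one generated face.
# z = torch.randn(1, 512)                   # latent code (e.g., StyleGAN-sized)
# x_smiling = semantic_edit(G, z, v_smile, alpha=2.0)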

2.3 Deepfake forensic systems

As introduced earlier in Section 1.4, deepfake forensics are techniques that allow distinguishing deepfake content from real content. At first, most forensic systems were based on high-level semantic features or fingerprints that GANs produce in the generative process [8], [15], [25]–[27]. Currently, the state-of-the-art detectors [7], [17]–[19], [28], [29] are based on image classifiers that work on low-level, pixel-related features, which are usually not visible to humans. These detectors are regarded as more versatile across different deepfake techniques and generators. Experiments showed that these CNN-based detectors can generalize to effectively detect deepfakes from many previously unseen generators.

Figure 2.4: Semantically modifying the attribute smile of a face image using the attribute vector V_a = V_smile. The attribute vector is learned from the latent space, using the method proposed in [24].

Here, forensic systems are modeled as binary neural classifiers which map a given input image x ∈ X into two classes: 'real' and 'fake'. For notation, hereinafter, a detector D is defined as in (2.1), where F_d(x): X → [0, 1] is the predicted probability of the image x being synthesized, and λ is the real/fake threshold, often set at a neutral value such as 0.5.

    D(x) = real,  if F_d(x) = activation(CNN(x)) < λ
           fake,  otherwise                                        (2.1)
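As a sketch, the decision rule (2.1) amounts to thresholding a sigmoid output; `cnn` below stands for any binary forensic classifier returning one logit per image, which is an assumption for illustration rather than a specific detector from the literature.

import torch

def detect(cnn, x, lam=0.5):
    # F_d(x) = activation(CNN(x)): predicted probability that x is synthesized.
    # Here x is a single-image batch, so the output is one scalar logit.
    f_d = torch.sigmoid(cnn(x)).item()
    # Decision rule of Eq. (2.1): 'real' if F_d(x) < lambda, otherwise 'fake'.
    return "real" if f_d < lam else "fake"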

2.4 Attacks to deepfake forensic systems

Since state-of-the-art detectors are image classifiers, techniques used to manipulate CNN classifiers' predictions can also be applied to attack deepfake forensic systems. Most attacks exploit the unrefined boundaries between classes in the loss landscape of the classifier. In this section, some of the most popular attacks on deepfake forensic systems in particular, and CNN classifiers in general, are introduced.

2.4.1 Spatial transformations

These attacks apply simple spatial transformations, such as rotation and translation, to the deepfake image, which alone can be enough to make the classifier fail (Figure 2.5).

Figure 2.5: Spatial transformation adversarial attack on a CNN classifier. The classifier fails to classify these images after simple rotation and translation.


2.4.2 Pixel-level adversarial examples

Adversarial examples [31]–[33] are a more popular choice for attacking deepfake forensic systems due to their high success rate and image quality retention. Here, adversarial examples refer to deepfake images that are adversarial to deepfake forensic detectors, meaning that these deepfakes are detected as real by the detectors.

The pixel-level adversarial example is the first and most common type of adversarial example, in which adversarial perturbations are added to the deepfake images in the pixel space. These adversarial perturbations are visually imperceptible to human eyes and are crafted by increasing the classification loss of the classifier. More recently, [12], [34] proposed white- and black-box attacks based on pixel-level adversarial perturbations to fool deepfake detectors. Figure 2.6 illustrates the process of creating pixel-level adversarial examples, where the gradient is back-propagated to update the perturbations, with the loss function being the prediction score of the detector. The perturbations are also limited by the condition that their p-norm has to be lower than an amount ϵ. This limitation ensures that the perturbations are imperceptible to human eyes, thus retaining the quality of the image.

Figure 2.6: The creation of pixel-level adversarial examples, which uses gradient back-propagation to update the perturbations. The loss function here is the prediction score F_d(x) of the detector.
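A minimal white-box sketch of this attack follows: a PGD-style loop that lowers the detector score F_d(x + σ) while clipping σ to an L-infinity ball of radius ϵ. The detector `cnn`, the step count, and the step size are assumptions for illustration, not the exact procedure of [12] or [34].

import torch

def pixel_adversarial(cnn, x_fake, eps=0.03, steps=10, step_size=0.005):
    # Craft a pixel-space perturbation sigma so that the detector's 'fake'
    # probability F_d(x + sigma) decreases, with ||sigma||_inf <= eps.
    sigma = torch.zeros_like(x_fake, requires_grad=True)
    for _ in range(steps):
        score = torch.sigmoid(cnn(x_fake + sigma)).sum()  # F_d over the batch
        score.backward()                                  # gradient w.r.t. sigma
        with torch.no_grad():
            sigma -= step_size * sigma.grad.sign()        # lower the 'fake' score
            sigma.clamp_(-eps, eps)                       # enforce the norm bound
        sigma.grad.zero_()
    return (x_fake + sigma).detach()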

2.4.3 Semantic adversarial examples

Several previous works [12], [20], [35] proposed semantic adversarial attacks on classifiers by altering the semantics of images through a generator's latent space. Instead of updating the perturbations in the pixel space of the image, back-propagation in a semantic adversarial example updates the latent code directly, hence changing the semantic meaning of the image. This attack exploits the fact that attributes uncommon in the training data may cause the forensic detector to fail. Several works also propose different ways to semantically modify deepfakes with the aim of bypassing deepfake detectors. For example, Ho et al. in [36] designed attacks based on both small and large image perturbations, resulting from camera shake and pose variation.

Figure 2.7 illustrates the creation of semantic adversarial examples, in which the perturbation (denoted δ) is added directly to the original latent code z. Gradient back-propagation is then performed to update the perturbation δ, again with the loss function being the prediction score of the detector.

Figure 2.7: The creation of semantic adversarial examples based on gradient back-propagation. Different from pixel-level adversarial examples, the gradient is back-propagated to update the perturbation δ, which is added directly to the original latent code z.

These works, however, do not constrain the modifications to targeted attributes and are thus prone to altering the identity of a fake persona. Different from these works, this thesis studies a different scenario where attackers create a fake persona profile by generating multiple semantically different examples portraying one single identity. Current attacks cannot be directly applied here; thus, I design targeted semantic perturbations that aim to retain the identity of the image while still fooling the detector. Nonetheless, the defense approach of augmenting the training data is partially inspired by adversarial training [37], [38], where image classifiers are trained on adversarial examples to improve their adversarial robustness.
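For reference, here is a minimal sketch of the unconstrained semantic adversarial update that these prior works perform, and that the targeted perturbations of this thesis refine; the generator G and detector cnn are assumed stand-ins, and note that nothing here stops δ from drifting along identity-changing directions.

import torch

def semantic_adversarial(G, cnn, z, steps=50, lr=0.01):
    # Optimize a latent-space perturbation delta so that G(z + delta) is
    # classified 'real'; delta is unconstrained, so any attribute may change.
    delta = torch.zeros_like(z, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x = G(z + delta)                      # decode the perturbed latent code
        loss = torch.sigmoid(cnn(x)).sum()    # detector's 'fake' probability
        opt.zero_grad()
        loss.backward()                       # back-propagate through cnn and G
        opt.step()
    return (z + delta).detach()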


REFERENCES

[2] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, "Analyzing and improving the image quality of StyleGAN," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 8110–8119.

[3] "Watch Jordan Peele use AI to make Barack Obama deliver a PSA about fake news," The Verge, 2021, https://www.theverge.com/tldr/2018/4/17/17247334/ai-fake-news-video-barack-obama-jordan-peele-buzzfeed (accessed Sep 1, 2021).

[4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.

[5] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of GANs for improved quality, stability, and variation," arXiv preprint arXiv:1710.10196, 2017.

[6] T. Karras, S. Laine, and T. Aila, "A style-based generator architecture for generative adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4401–4410.

[7] D. Güera and E. J. Delp, "Deepfake video detection using recurrent neural networks," in 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), IEEE, 2018, pp. 1–6.

[8] Y. Li and S. Lyu, "Exposing deepfake videos by detecting face warping artifacts," arXiv preprint arXiv:1811.00656, 2018.

[9] S. Suwajanakorn, S. M. Seitz, and I. Kemelmacher-Shlizerman, "Synthesizing Obama: Learning lip sync from audio," ACM Transactions on Graphics (TOG), vol. 36, no. 4, pp. 1–13, 2017.

[10] T. Chen, A. Kumar, P. Nagarsheth, G. Sivaraman, and E. Khoury, "Generalization of audio deepfake detection," in Proc. Odyssey 2020: The Speaker and Language Recognition Workshop, 2020, pp. 132–137.

[11] R. Tolosana, R. Vera-Rodriguez, J. Fierrez, A. Morales, and J. Ortega-Garcia, "Deepfakes and beyond: A survey of face manipulation and fake detection," arXiv preprint arXiv:2001.00179, 2020.

[12] N. Carlini and H. Farid, "Evading deepfake-image detectors with white- and black-box attacks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 658–659.

[13] S. Agarwal, H. Farid, Y. Gu, M. He, K. Nagano, and H. Li, "Protecting world leaders against deep fakes," in CVPR Workshops, 2019, pp. 38–45.

[14] "A high school student created a fake 2020 candidate. Twitter verified it," CNN, 2021, https://edition.cnn.com/2020/02/28/tech/fake-twitter-candidate-2020/index.html.

[15] Y. Li, M.-C. Chang, and S. Lyu, "In ictu oculi: Exposing AI created fake videos by detecting eye blinking," in 2018 IEEE International Workshop on Information Forensics and Security (WIFS), IEEE, 2018, pp. 1–7.

[16] A. Rossler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, "FaceForensics++: Learning to detect manipulated facial images," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 1–11.

[17] S.-Y. Wang, O. Wang, R. Zhang, A. Owens, and A. A. Efros, "CNN-generated images are surprisingly easy to spot... for now," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 7, 2020. Code: https://github.com/peterwang512/CNNDetection (accessed Nov 1, 2020).

[18] J. Stehouwer, H. Dang, F. Liu, X. Liu, and A. Jain, "On the detection of digital face manipulation," arXiv preprint arXiv:1910.01717, 2019.

[19] J. Frank, T. Eisenhofer, L. Schönherr, A. Fischer, D. Kolossa, and T. Holz, "Leveraging frequency analysis for deep fake image recognition," arXiv preprint arXiv:2003.08685, 2020.

[20] D. Li, W. Wang, H. Fan, and J. Dong, "Exploring adversarial fake images on face manifold," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 5789–5798.

[21] W. Xu, S. Keshmiri, and G. Wang, "Adversarially approximated autoencoder for image generation and manipulation," IEEE Transactions on Multimedia, vol. 21, no. 9, pp. 2387–2396, 2019.
