Thanks to visit codestin.com
Credit goes to www.scribd.com

0% found this document useful (0 votes)
5 views46 pages

Media Forensics and Deepfake - Systematic Survey: RS Open Journal On Innovative Communication Technologies - Issue 8

Uploaded by

Ashu Ashu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views46 pages

Media Forensics and Deepfake - Systematic Survey: RS Open Journal On Innovative Communication Technologies - Issue 8

Uploaded by

Ashu Ashu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 46

RS Open Journal on Innovative Communication Technologies • Issue 8

Media Forensics and


Deepfake - Systematic
Survey
Nadeem Jabbar CH*1 Aqib Saghir2 Ayaz Ahmad Meer2
Salman Ahmad Sahi2 Bilal Hassan3 Siddiqui Muhammad Yasir4
1Department of
Computer Science,The Superior University, Lahore, Pakistan
[email protected],
2Department of Computer Science,The Superior University, Lahore, Pakistan,
3Northumbria University, London Campus, UK,
4School of Arti cial Intelligence, Tongmyong University, Busan, South Korea

ResearcherStore

Published on: May 31, 2023


DOI: https://doi.org/10.46470/03d8 bd.7351a3bb
License: Creative Commons Attribution-NonCommercial 4.0 International License (CC-BY-
NC 4.0)
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

ABSTRACT
Deepfake is a generative deep learning algorithm that creates or changes facial features in a very realistic way,
making it hard to differentiate the real from the fake features. It can be used to make movies look better, as
well as to spread false information by imitating famous people. In this paper, many different ways to make a
Deepfake are explained, analyzed and separated categorically. Using Deepfake datasets, models are trained and
tested for reliability through experiments. Deepfakes are a type of facial manipulation that allow people to
change their entire faces, identities, attributes, and expressions. The trends in the available Deepfake datasets
are also discussed, with a focus on how they’ve changed. Using Deep learning, a general Deepfake detection
model is made. Moreover, the problems in making and detecting Deepfakes are also mentioned. As a result of
this survey, it is expected that the development of new Deepfake-based imaging tools will speed up in the
future. This survey gives in-depth review of methods for manipulating images of face, and various techniques
to spot altered face images. Four types of facial manipulation are specifically discussed which are attribute
manipulation, expression swap, entire face synthesis, and identity swap. Across every manipulation category,
we yield information on manipulation techniques, significant benchmarks for technical evaluation of
counterfeit detection techniques, available public databases, and a summary of the outcomes of all such
analyses. From all of the topics in the survey, we focus on the most recent development of Deepfake, showing
its advances and obstacles in detecting fake images.

1 Introduction
Many deep learning-based methodologies are used to generate temporal data and fake spatial by changing one
person’s face (posture, identification, expression) and so on with another is referred to as the famous term
"Deepfake." Digitally altered photos and videos, especially with Deepfake technologies, have been a major
source of public concern. This technology was being used for fake news, fake pornography, hoaxes, and
financial fraud, etc.[1]

2
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

Fig. 1. Different categories of deepfake Data [1]

However, many things changed dramatically in the recent years, starting with the early works and continuing to
the present. The ability to mechanically synthesize non-existent faces or modify the genuine face of a single
person in an image or video is becoming increasingly common these days. Before Deepfake, fake media
generation required expertise in the field, and took more time and effort. Tools like Adobe Lightroom and
Adobe Pohtoshop are used by experts for retouching and modifying a picture[2]. Deepfakes, which are now
generated using artificial intelligence, have no need for too much expertise. Fake data can be generated only in
a few clicks. In graphics industries and computer vision Deepfake has a wide range of benefits as well as
harmful uses; with the help of Deepfake stunning landscapes can be generated; on the other hand, fake news
generation can mislead the audience, which may increase security risks. To generate fake movies of female
celebrities, many people with malicious motives have used these technologies in such a way that triggered
important societal challenges, including sexual harassment. Furthermore, several malicious applications, like
DeepNude4[3], have taken advantage of Deepfake it allows them to take a snapshot of a fully suited lady and
produce an image in which her clothing are removed. Since a few years, researchers have been working on
developing various techniques to distinguish images and videos that are synthesized with artificial intelligence,
especially those manipulated with GANs and their variants) from those that are captured naturally with a
camera. There are competition such as DFGC ( Deepfake game challenge)[4] encourage researchers to propose
better solution for detection. Detecting fakes in the spatial domain is one of the most commonly used strategies
for finding fake videos and images in recent studies [2]. In the frequency domain, several researchers are
attempting to distinguish between original and forged data, which is a relatively new field[2] Deepfake creation
is classified into four types: 1. Facial Reenactment 2. Face Replacement 3. Face Editing 4. Face synthesis

In this survey, Deepfake detection strategies for images and videos are gone through. Moreover, the Deepfake
creation processes and datasets that are used to detect Deepfakes, as well as the Deepfake detection algorithms,
are discussed.

Below are the major objectives of this article, which are summarized as follows:

3
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

Deepfake tools, which are used to change many features of photos and videos, are undergone.
Deepfake datasets, as well as certain standard datasets, are introduced for forensic review.
Some of the most recent methods for detecting Deepfake in images and videos are mentioned.
The rest of the article is decomposed as: Paper collection and review methods are provided in 2 . Then 3 is
about deepfake creation. 4 is outlined with Public Datasets. However, in 5, Deepfake detection is focused.
Finally, discussions and future work are drawn in 6.

2 Prepare Collection And Review Methods

Fig. 2. Tree diagram showing structure of the paper

2.1 Planning the Review


The primary objective of this step is to deduce the problem that needs to be solved. Next, to come up with the
idea through which the problem can be solved, and evaluate or solve the problem. In this step, the idea and its
requirements are provided. Eventually, the final step is to test the reliability of the criteria and procedures used
to solve the problem.

2.2 Conducting the Review


It includes six obligatory steps on the basis of the guiding concepts outlined in this phase.

Research Questions (RQs)

Because the proper question leads to the right direction and gives the study significance. Therefore, the
success of the study is dependent on the appropriate and relevant research questions[5]. So, we outline the

4
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

following research questions: 1) The first question is, what are the most famous Deepfake detection methods?
2) What datasets are typically utilized for detecting Deepfake? 3) What are the features typically utilized in
detecting Deepfake? 4) What models are used to detect Deefake manipulation? 5) For Deepfake Detection
Techniques, what is the Classification Structure? 6) What is the overall productivity of various Deepfake
detection techniques on the basis of experimental evidence?

Search Strategy

In this stage, the maximum possible relevant data is gathered. In order to detect Deepfake, this article
includes as many combinations of related search phrases as possible. Using Boolean terminology, search terms
were combined with "AND" or "OR," which is the main idea. The search terms are: Deepfake AND Digital
Media Forensics, FaceSwap OR Deepfake, Video manipulation OR Fake image/video/face detection OR Facial
Manipulation. For research purposes, more than one or two sources were used. Furthermore, the authenticity of
the research was tested thoroughly using multiple resources as mentioned below. The numerous research
articles are available online, the top repositories that were both important and easy to use were selected which
are shown

ScienceDirect (ELSEVIER)
ACM Digital Library
Web of Science
Google Scholar
Database Systems and Logic Programming (DBLP)
SpringerLink
Computing Research Repository
IEEE Xplore Digital Library
Semantic Scholar

Selection Criteria

To find the most reliable articles, three rules were set up for this research.

The keywords, title, or abstract must contain the search phrases (Some articles, although based on Deepfake,
did not contain words like "Deepfake" in the keywords, title, or abstract.)
In writing, empirical evidence is made clear.
The main goal of this study is about how images or videos can be changed and how we can detect change
images and videos

Quality Assessment Criteria (QAC)

5
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

Analyzing the material in the document and evaluating the quality of the evidence are equally important.
Researchers should be cautious when interpreting the findings of a poorly executed study since they may have
been influenced by methodological flaws. For that reasons, right criteria must be chosen in order to assess the
quality of the evidence. We validate the selected studies using the criteria defined in[6] and evaluate the
gathered material by applying these criteria.

In our quality assessment phase, we decided 96 research articles and 21 additional reviews (7 SLRs, 10
analyses, and 4 surveys) relating to Deepfake detection.

Data Synthesis The goal of data synthesis is to arrange and compile the findings of the conducted
research. We evaluate the data, once accumulated, in order to extract additional information. Also, we use
various tools for data visualization, including tables and charts, to display the acquired data.

Reporting the Review After reviewing all of the research, we organized the results properly for the target
readers and distribution medium.

2.3 Survey Scope


The technical aspects of Deepfakes are the paper’s primary focus. This paper briefly discussed spotting a
deepfake and avoiding being caught, as different research publications cover it. The moral and social elements
of Deepfakes, which are discussed in this survey, are not the primary focus of this study. There are several
types of media manipulation, including body, expression, and tone. However, we focused only on Deepfakes of
facial in our survey.

6
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

Fig. 3. Number of published paper from year 2015 to 2022 as per Google Scholar and [7]

2.4 Paper Collection Method


First, over 100 papers on Deepfakes generation, Deepfakes detection, and Deepfakes evasion were gathered
from arxiv. Next, on two major scientific databases (IEEE, DBLP, Google Scholar, and various other sources
from which the latest research publish) we used keywords in searching to recognize and collect papers on
Deepfake. To ensure a more comprehensive and accurate survey, publications in major conferences and
respective workshops were also browsed of last three years to ensure a more thorough and precise survey. The
following keywords are used: video synthesis, tampered face, facial image, face swapping, deepfake, GAN-
synthesized, , forgery, and AI-synthesized - manipulation.

2.5 Paper Collection Reserves


We found 287 papers from arXiv, Google Scholar and IEEE. The papers were mostly about how to make
Deepfake, Deepfake detection, and how to avoid detection of Deepfakes. Figure 3illustrates the allocation of
research that was released in various locations. Here, papers from the top journals and sources were divided
into several groups. The "Others" category included publications from lesser-known publications and
conferences. Furthermore, Unpublished papers were categorized as "arXiv".

7
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

3 Deepfake Creation
In recent years, Computer vision has progressed and achieved excellent results through deep learning. Deep
learning is also one of the most widely used technologies for creating artificial content. Meanwhile, Digital
images and videos manipulation is one of the leading interests in deep learning. A wide range of workshops
and conferences have a special session for Deepfake, which reflects the interest of the research community [7].
Furthermore, different challenges like Media forensics Challenge (MFC 2018) and Deep Fake Detection
Challenge (DFDC) have been launched, which also reflects the significance of the topic.

Fig. 4. Number of papers collected for the current study

The technology to create Deepfakes has existed for decades; however, creating these effects took an entire
studio of experts with just a short time ago. Currently, Deepfake technologies are capable of manipulating
pictures or videos much more swiftly due to the addition of Artificial Intelligence (Deepfake means AI, Deep

8
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

learning base content). GAN’s (Generative Adversarial Networks) is the most known source that provides
unbelievably realistic results in deep learning[8]. An undeniable fact is that GANs are tough to work with and
need a lot of training data. [9]There are many challenges in the development of GAN while creating smooth
and manageable syntheses, mostly focused on the high-resolution domains. On top of that, there are many
mobile and desktop-based applications in the market that assist people in creating Deepfakes. Each application
uses its own developed GAN to achieve its purposes. ZAO, a mobile application[10], AutoFaceswap[11] and
FaceApp[12] help the general public on the internet to create Deepfakes without any difficulties. Face
swapping is a highly used technique in the field of Deepfake. Through this technique, one face can be moved
from the source image to the target image and the main idea behind this technique is also GANs. Over time,
images created by Style-GAN[13] and its variants such as Style-Gan2[14] and Style-GAN-Ada[15]are getting
ever more lifelike and utterly unrecognizable to the naked human eye. Each manipulation method has its
unique effect, such as without modifying any other facial traits changing eye size or altering skin color. In
contrast, it is impossible to generate HD human faces using Style-GAN[13]. On the other hand, BigGAN[16]
cannot manipulate the length and complexion of a person without modifying other features of images.

9
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

Fig. 5. Flow chart of explained information in current study.

Below, we cover the categories of manipulation in more detail.

3.1 Editing
In the enchantment, Deepfake, the features of the target image can be changed, removed or added. The
technology can be used to manipulate facial hair, weight, age and clothes[17] etc. FaceApp[12]is widely used
for editing that is done just for entertainment purposes. However, it can also be used falsely to attack a target
personality. Due to this, a constantly growing concern has settled related to the developing of Deepfake
mechanics.This technology has made it possible to produce proofs of the actions that never happened. For
instance, (i)an ill leader could be made to appear well.[18]. (ii) remove object’s clothes for the entertainment
and blackmailing purpose[19] etc.

10
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

Fig. 6. GAN fingerprint example, that can be found in Deepfake-generated media

11
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

3.2 Synthesis
Synthesis method uses extremely intelligent GANs such as Style-GAN[13] to generate faces which do not exist
in the form of images. This method generates amazing outputs such as facial images of high-level resolution. It
can be used in different fields such as video games, movies. Yet, like the editing method, synthesis can also be
used for negative purposes.

3.3 Identity Replacement


It is another trait of deep learning. When the content of the target is replaced by the source while taking care of
identity, it is called identity replacement. The following are the two methods used for identity replacement.

Swap

Identity swap is quite common in the world of Deepfake. This method is also commonly known as face
swap. It can be generated by transferring content to target from source which is driven by target. In simple
words, one can replace only a few features of the face like open eyes OR smile persons digitally.

Transfer

In the transfer technique output is generated when the content of the source image is exchanged with that
of the target image. The fashion business frequently uses facial transfers to visualize people wearing various
outfits.

Identity Replacement or attack model

It is well known and widely used in negative activities and applications[20]. For example, an attacker can
swap the face of the victim with a naked porn-stars in order to defame and blackmail the victim. This strategy
has previously been employed as a means of disseminating political viewpoints.[21]

3.4 Reenactment
Another manipulation technique for creating Deepfakes is reenactment. Based on GAN architectures, the
available techniques like neural textures[22] and Face2Face[23]are used to make a video, in which the facial
expression of one person can be replaced by the facial expression of some other person. The following list of
reenactment’s branches is provided:

Expression Swap is best described as a face modifier technique, which is generated when source image is
used to derive the expression of target. It is also the most common technique used in movies, video games,
digital media etc. where the performance of the person can be enhanced and improved. Researchers altered 3D-
scanned head models in 2003[24]. Without a 3D model, it was proved that it was possible in 2005. Eventually,
between 2015 and 2018, Justus Thies et al. [23][25][26] demonstrated the use of 3D parametric models to

12
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

provide high-quality, in-the-moment outcomes. In 2022, S.M. Yasir [27]proposed a method to detect 3D
instance from 3D point cloud that is valuable for 3D face detection and its applications.

Mouth reenactment has been used for many years. This method is recognized as dubbing in which the
mouth of the target person is derived by the source or an audio input containing conversation. For long, this
technology is being widely used when dubbing a movie into another language. For example, if a movie is
originally directed in Spanish Language, it can be dubbed into English language using this method. The
developers of Obama Net[28] proposed a network that uses text as input rather than audio to reproduce a
person’s mouth and voice.

Gaze reenactment is another kind of manipulation approach generated when the direction of target eyes
and eyelids are driven by the source. This procedure is generally used to get realistic output of photographs to
maintain eyes contact[29] or to change the direction of one’s gaze. The Gaze Redirection Network was
proposed by Y. Yu et al. [30] (GRN). In GRN, the source angle, head pose, and cropped eye of the target are
individually encoded before being sent via an encoder-decoder network to produce an optical flow field. The
generator is driven by an optical flow field that is applied to successive frames in a source video.

Pose reenactment is a commonly used technique of manipulation generated when the head position of
the target is derived by the source. This method has mainly been used for scanning the faces of individuals in a
security footage. Moreover, this approach can be used as a means for improving facial recognition
software[31]. J. Cao et al.[32], The authors recommend employing two GANs: The first frontalis the face to
produce a Ultraviolet map, [27] and the second turn around the face at the required target angle. As a result,
each model executes a simpler procedure. Thus, the models work together to create an image of superior
quality.

Body reenactment, which is also known as pose transfer and human pose synthesis which is a branch of
reenactment approach that are listed above and except that it is pose of source body transfer to the target body.

Table 1: Facial manipulation techniques used to create Deepfakes

Table 1
Facial Manipulation Key Idea

Entire Face Synthesis Generates perfect, nonexistent facial images using a powerful
GAN model, such as StyleGAN or StyleGAN2-Ada.

Identity Swap Using FaceSwap or Deepfake methods to swap out a person’s


face for another person’s face in a photograph or video.

13
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

Attribute Manipulation Altering some facial features, such as the colour, age, gender,
skin, hair, and eyeglasses, for example, using Star-GAN.

Expression Swap Altering the facial expression of one person in a video with the
facial expression of another person, e.g., Face2Face,
NeuralTextures.

Miscellaneous Face morphing is the process of producing synthetic biometric


face samples that mirror the provided biometric data. Remove
the identity information shown on a face image or video with
face de-identification. Facial expression swap, sometimes
known as lipsinc deepfakes, is a technique used to synthesise
video from audio or text.

3.5 Miscellaneous
Other than the above given categories of creating Deepfake, there are some further types of manipulation.
Hereunder, we discuss three of the various evolving techniques, including Face morphing, Face de-
identification, and text to video or audio to video swaps of facial expression.

Face Morphing generates biometric faces, which are purely artificial and are copies of biometric features
from different sources. In other words, face morphing is the utilization of picture control projects to consolidate
two separate pictures into a fresh brand-new picture. U. Scherhag et al.[33] investigation of the face morphing
approach in 2019 demonstrates the use of attack detectors and morphing algorithms.

Face de-identification The primary objective is to obscure a person’s identity from a facial video or
image to protect their privacy [34]. There are various ways to accomplish this. The simplest method may be to
blur or pixelate the face (e.g., in Street View of Maps). More advanced techniques attempt to produce face
images with distinct identities while keeping all other elements (position, emotion, illumination, etc.) the same.
As a result, the idea of face de-ID is relatively broad. Face identity swapping could be one method for
achieving face de-identification. Applying face de-ID to still photos was the foundation of earlier studies in this
field. A multifactor framework for de-ID was provided by Gross et al. in [35], including linear, bilinear, and
quadratic models. On an expression variant face database, they demonstrated that their technology could
safeguard anonymity while maintaining the usefulness of the data. Recent advances in image synthesis
methods depend on generative deep neural networks, especially GAN, and have prompted new face de-ID
techniques that substitute synthesized faces for real ones, such as those used in [36] and [37]. A further
suggestion made by the authors in [38] was to confuse random face-based gender classifiers by using semi-
adversarial networks (SAN). Recently, in [39], Gafni et al. revealed 2019 a technique for face de-ID with
compelling performance even in unconstrained movies. Their strategy is based on a learned face classifier and
an adversarial auto-encoder. They can create a rich latent space in which identification and expression

14
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

information are embedded. A fresh face de-ID technique built on a deep transfer model was also given in [40].
[41]This technique interprets the non-identity-related facial characteristics as the original faces’ style. It applies
a trained facial attribute transfer model to extract and map those characteristics to various faces, yielding
highly encouraging results in still photos and videos.

Text to video or Audio to video both are the branches that also come under the tree of lip-sync
Deepfakes[42]. These methods create videos by synthesizing the target’s expression through text or voice.
Artificial Intelligence has made people capable of generating Deepfakes that they type whatever they want
their subject will say. For example, these Deepfake methods are used in making three-dimensional videos, such
as cartoon videos. Using these methods, the written script or the audio of a speaker can directly be converted
into a video just like converting text to video. The method of generating high-quality video of someone
conversing with an exact lip-sync track is explained through an example of a video[43]. In [44], [45] discuss
more key state-of-the-art techniques Video portrait editing technique and Audio-based technique. Furthermore,
Fried et al.[46] came up with a text-based editing approach for changing speech content in videos. This is done
to create a new video by taking clips from a video of a person speaking and combining the necessary spoken
material with the person’s lips to match the new words.

4 Public Datasets
4.1 Real Datasets
CelebA: Liu et al. [47] it built through labeled images of celebrities faces. There are ten thousand celebrities in
CelebA dataset, each of which has twenty images. This means there are 200,000 images in total. A professional
labelling company annotates every photo in CelebA with and five key points and forty face characteristics.

Fig. 7. The frequency of used datasets in recent researchs.

15
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

4.2 Fake Datasets


The UADFV is the dataset from Li et al.[48]. It has 98 videos, 49 of which are real YouTube videos, and the
number of fake videos created with the help of FakeApp is 49.
Korshunov et al.[49] made a database called Deepfake-TIMIT in 2018. From 32 different subjects, It has
620 videos which are Deepfake. There are 20 different Deepfake videos in each dataset. Ten videos are 128
x 128 in size, while the others are 64 x 64 in size. Faceswap-GAN is used to make the synthesized videos.
DFDC Preview is a dataset in which 5000 videos generated with two facial manipulation
algorithms.Dolhansky et al.[50] The character of said dataset are of different genders, skin tones, ages, etc.
Video participants were allowed to record their videos with any background they desired, which resulted in
visually diverse backgrounds. Videos include a variety of lighting conditions and head poses.
Google DFD Dufour and Gully et al.[51] More than 3000 tampered videos featuring 28 characters in various
scenes. Deepfake generation techniques are used to create the videos from 100s of real videos that are freely
available on the internet.
FaceForensics++ is a popular dataset that consists of one thousand real video sequences and four thousand
videos Rossler et al. [52] that were generated by four automatic face manipulation algorithms as Deepfakes,
Face2Face, FaceSwap, and NeuralTexture A total of 977 traceable YouTube videos were used in the creation
of the videos. The majority of people in the videos have frontal faces.
Celeb-DF comes with a tremendous challenging video dataset of Deepfake [53], Celeb-Df used an improved
synthesis method which generate around 5,639 high-quality videos.Its also have 590 original videos
collected from YouTube with different ages, ethic groups and genders.
DeeperForensics-1.0 The largest dataset for face forgery detection has been made available Jiang et al. [54];
it consists of 60,000 videos with 17.6 million frames. There are 100 actors included in the 10,000 edited and
50,000 raw videos. There are 55 men and 45 women among them. The actors’ skin tones range from black,
brown, yellow, and white. Without glasses or decorations, all faces are clean in it.
Wild Deepfake Zi et al. [55] The dataset Wild Deepfake , is made up of 7,314 face sequences takenfrom
707 Deepfake videos that were all found on the internet. By testing Deepfake detectors against real-life
Deepfake , WildDeepfake is able to evaluate their effectiveness.
OpenForensics On multi-person images, Le et al. [56] takes care of Deepfake’s detection capabilities. With
almost 334,126 faces and a total of 115,325 images (45,473 original photos and 70,325 manipulated
images). OpenForensics can also be utilized for segmentation and tasks standard object detection,
demonstrating its versatility. It also includes facial information such as facial landmarks, a forgery category,
a segmentation mask, and a bounding box.
The Diverse Fake Face Dataset (DFFD) Dang et al. [57] have created a large dataset of face counterfeit.
Males account for 47.7 percent of all photographs or videos, 52.37 percent are female, with an averageage of
21-50 years. The real face samples are taken from FFHQ, CelebA, and FaceForensics++. Video clips from
FaceForensics++ were used for facial identification and expression swapping. In order to generate attribute

16
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

manipulated images, they have used two techniques:StarGAN[17] and FaceAPP[12]. In recent years,
PGGAN and StyleGAN [58] have been used to synthesize face images.
The GBDF Gender Balanced Deepfake Dataset[59] dataset is created using FF++(c23 version), Celeb-DF,
Deeper Forensics-1.0 and consist of 10, 000 videos with 5000 each for males and females.
DFDC Facebook has built a huge dataset of face videos that can be used to train detection models. A well-
known dataset Dolhansky et al. [60] Deepfake Detection Challenge (DFDC) Kaggle competition was
arranged by them. There are 128,514 videos in DFDC, around 3,426 actors helped to create 100,000.
MICC: F220 , F2000 , F600 [61]Images from the MICC-F600, MICCF2000, and MICC-F220 datasets are
used to spot copy-move modifications. In MICC-F220, there are 110 altered and 110 original photographs;
there are 700 tampered and 1300 authentic images in MICC-F2000. Moreover, in MICC-F600, there are 160
modified and 440 raw images.
Table 2: Details of publicly available data set for Deepfake

Table 2

Year Dataset Original Fake Details of


Dataset

Image Videos Image Videos

2011 MICC [61] 1300 700 Used for


image copy-
move
tampering
detection.

2015 WWD [62] 13,500 92 variations


of forgery, 82
instances of
forgery,

and 101
different splice
detection
masks.

17
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

2015 CelebA [47] 202,000 This dataset of


photos
includes a
wide range of

poses and
cluttered
backgrounds.

2017 VISION 34,400 1914 A video and


[63] image source
identification

application-
based dataset
(35 portable
devices

of 11 major
brands).

2018 UADFV 17,300 49 17,300 49 The FakeAPP


[48] is used to
create the
Deepfake

videos; it is
simple to use
and only
contains two

classes: real
and fake.

18
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

2018 DF-TIMIT 34,000 320 68,000 640 Low-


[49] quality(LQ)
and high-
quality
DFTIMIT(

HQ), obtained
using a
Faceswap
GAN

model.

2019 FF [64] 500,000 1004 521,400 Two ways to


generate
Deepfake:
Face2Face,
and

self-
reenactment.

2019 FF++ [52] 509,900 1,000 509,000 4000 Face2Face and


FaceSwap are
two graphics-
based

methods, and
two learning-
based methods
(Deepfakes

and Neural
Textures).

2019 DFFD [57] 58,700 1,000 240,300 3,000 Several forgery


types are
combined in
the DFFD

dataset.

19
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

2019 DFD [51] 315,400 363 2,242.7k 3,068 Google


contributed
with the FF++
Dataset;
additionally

actors, to
create
maipulated
videos

2019 DFDC-P 488,400 1,131 1,783,300 4,113 Featuring two


[50] facial
modification
algorithm

2020 DFDC [60] 23k 104k To expand the


DFDC-P, 8
facial
modification
algorithms

have been
applied.

2020 Celeb-DF 225,400 590 2,116,800 5,639 59 celebrity


[53] YouTube
videos were
created
utilising

an upgraded
Deepfake
synthesis
technique.

2020 DF-1.0 [54] 12.6M 50,000 5.0M 10,000 A large-scale


dataset for
real-world face
forgery

detection.

20
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

2020 WDF [55] 11.8M 7,314 707 A modest


dataset that
examines the
real-world

dataset’s
ability to
recognise
Deepfakes.

2022 GBDF [59] 7500 2500 It consist of


10,000 videos
with 5000 each
for males

and females
with 1:4 real to
fake ratio.

2021 OF [56] 16,000 173,000 Huge


challenging
dataset for
segmentation
in the

wild and multi-


face forgery
detection.

WWD The Wild Web Dataset (WWD) [62]contains 101 distinct mask splice detections and 82 instances of
92 forging variations. The WWD aims to close a gap in the evaluation of localization techniques for image
alteration.
VISION A VISION dataset that contains 648 raw videos and 11,732 original images. [63] The pictures were
posted on social websites such as WhatsApp and Facebook, while the videos were distributed on WhatsApp
and YouTube, generating 1,914 videos and 34,427 images.
FaceFornesics (FF) [64] dataset is based on DeepFake aims to carry out forensic operations on manipulated
images, such as segmentation and facial recognition. Over 500,000 frames from 1004 (facial videos taken
from YouTube) are included.

21
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

5 Deepfake Detection
5.1 Fake Image Detection

Spatial based Detection


Afchar et al.[65] trained CNN classifiers by deepfake online video and Face2Face dataset. Two types of
Meso-4, Mesoscopic and MesoInception-4, have been presented in it. In detection of fake videos the accuracy
was 98 percent and 95 percent achieved respectively. Nataraj et al.[66] purposed a co-occurrence matrices on
the bases of more than 56,000 RGB images, generated through cycleGAN and StarGAN, and achieved 99.71
percent and 99.37 percent accuracy respectively. Another method based on inter-frame and temporal
inconsistencies used by Güera et al.[67] through CNN and LSTM models, trained by a collection of 600 videos
obtained from multiple websites. The highest accuracy of the model is greater than 97 percent. Nguyen et al.
[68] used VGG-19, Capsule networks on Deepfake online videos, FF, replay-attack datasets and achieved
accuracy 96.52 percent, 94.47 percent and 99.13 percent respectively moreover, presented an FDFtNet, a novel
detector to increase the capacity of current CNN models like SqueezeNet, ShallowNetV3, ResNetV2, and
Xception. FDFtNet achieves an overall accuracy of 90.29 percent in detecting fake images generated from the
GANs-based dataset, outperforming the state-of-the-art. Jeon et al.[69] used PGGAN, DF, and FF datasets, and
obtained 90.29 percent, 97.02 percent, 96.67 percent accuracy respectively. Using universal texture data,
researchers enhanced the validity and adaptability capacities of existing CNNs in identifying generated
fraudulent faces. Fung et al.[70] used FF++, UADFV and Celeb-DF datasets with Xception network, SVM,
and Bayes classifier technique and found high accuracy 99.7 percent, 96.8 percent, 90.5 percent. Li and Lyu.
[71] embraced a model based on deep learning capable of distinguishing between Deepfake and real videos.
Furthermore, VGG-16, ResNet with UADFV, and DF-TIMIT datasets were prepared using the Face-warping
artifacts feature. It gained 97.4 percent and 93.2 percent accuracy. Using CelebA, LSUN bedroom scene
datasets with Image fingerprint feature, Yu et al.[72] achieved 99.43 percent, 98.58 percent accuracy.

Fine-tuning is a powerful strategy to protect the DNN model from adverse disruptions in fingerprint pictures.
According to Chai et al. [73], duplicative artifacts were assessed from local patches in order to find the fake
face. This theory has been tried using various available techniques, such as CNN [74], Xception [75],
MesoInception4 [65], and Resnet18 [76] having p values 0.5 and 0.1 using CelebA-HQ dataset.

Matern et al. [77] adopted visual features and fake detection methods founded on elementary visual elements,
like missing reflections, eye color, and missing details in the dental and eye regions. He included two distinct
classifiers in this case study: 1st is the Logistic regression model, and the 2nd is Multilayer Perceptron (MLP).
The own dataset was used to train the model, and 85 percent accuracy was achieved. Yang et al. [78] presented
Head Pose Features for detecting the Deepfake efficiently. SVM classifier trained by Own dataset like:
UADFV, Deepfake TIMIT (LQ) Deepfake TIMIT (HQ), FF++ / DFD, DFDC Preview, Celeb-DF, used for
detection and result obtained upto 89 percent.

22
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

Bharati et al. [79]proposed a novel supervised deep Boltzmann machine algorithm.The proposed approach for
classifying images as original or retouched yields an accuracy of over 99 percent on three makeup data
sets[17]. Rossler¨ et al.[52] came up with the Mesoscopic, Steganalysis, and Deep Learning Features. The
detection system was totally based on a convolutional neural network classifier that was trained on FF++
(Face2Face, RAW), FF++ (NeuralTextures, RAW) datasets.. The maximum accuracy was approximately 99 to
100 percent.

Table 3: Spatial based techniques used for deepfake detection

Table 3

Year Study Methods Techniques Accuracy Dataset

2022 [80] Hybrid Image Dual CNN 98.57% FaceForensics++


Transformer

2018 [65] Meso-4, CNN 95% & 98% Deepfake online


MesoInception-4 videos, FF.

2019 [66] Co-occurrence DNNs 99.71% &99.37% Own(CycleGAN


matrices and StarGAN)
datasets

2018 [67] inter-frame and CNN, LSTM 97% A bundle of 600


temporal videos gathered
inconsistencies from various
websites.

2019 [68] Capsule-forensics VGG-19, Capsule 94.47% & 99.13% Deepfake online
networks videos, FF,
REPLAYATTACK
database

2020 [69] Fine-Tune and SqueezeNet, 90.29%, 97.02%, PGGAN, DF, FF


transformer ShallowNet,ResNe 96.67%
t, Xception

2021 [70] Unsupervised Xception network, 99.7%, 96.8%, FF++, UADFV


Contrastive SVM, and Bayes 90.5% and Celeb-DF.
Learning classifier

2018 [71] Face-warping VGG-16, ResNet 97.4%, 93.2% UADFV and DF-
artifacts TIMIT

23
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

2019 [72] Image fingerprint DNN 99.43%, 98.58% CelebA, LSUN


bedroom scene
dataset

2020 [73] Patch-based Resnet-18, 100% CelebA-HQ


classification Xception,
MesoInception4

2019 [77] Visual Features Logistic 85% Customized dataset


Regression, MLP (DFDC Preview,
DeepfakeTIMIT
LQ/HQ, Celeb-DF
, UADFV,
FF++/DFD )

2019 [78] Head Pose SVM 89% Customized dataset


Features (DFDC Preview,
DeepfakeTIMIT
LQ/HQ, Celeb-DF
, UADFV,
FF++/DFD )

2019 [52] Mesoscopic CNN 100%, 99% FF++ (Face2Face,


,Steganalysis and RAW), FF++
Deep Learning (NeuralTextures,
Features RAW)

2020 [56] Deep Learning CNN + Attention 99% FF++ (Face2Face,


Features Mechanism -)

Khan et al. [80] proposed hybrid image transformation using XceptionNet and EfficientNet-B4. The
Faceforeincis ++ dataset was used to train the model and achieved 98.57 percent accuracy.

Frequency based Detection


In 2019, a generic fake face photodetection technique was presented by Xuan et al. [81]. DCGAN,
WGAN-GP, PGGAN trained by CelebA-HQ dataset. The primary goal was to explicitly include a
preprocessing stage in the training process for removing petty instability artifacts in GAN images, which drives
the forensics classifier to concentrate on higher intrinsic forensic indications for recognizing GAN-based
pictures. The highest accuracy was 95.45 percent. Another method based on Temporal discrepancies was
created by Sabir et al.[82]. CNN and RNN models were trained by FF++ dataset to detect the Deepfake. In this
analysis, only low-quality videos were included and 96.9 percent accuracy was gained. Jeon et al. [83] came up

24
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

with a self-training method and trained EfficientNet and ResNext architectures by their own (TPGGAN and
StyleGAN) based dataset, which efficiently detects Deepfake images with the accuracy of 98.49 percent.

Best results have been recently addressed by Guarnera et al.[84] presented a fake detection model. This model
is based on the research of convolutional imprints. Expectation Maximization Algorithm [85] was used to
extract features. K-NN, SVM, and LDA classifiers were trained by Own(AttGAN, GDWCT,StarGAN,
StyleGAN, StyleGAN2) dataset for best detection. Final result was 99.81 percent accurate.

Nirkin et al. [86] identified the distinctions between real faces and their setting to identify unreal faces. The
first network is prepared to determine an individual’s face with FF++, Celeb-DF, DFDC datasets, whereas the
second context recognition network considers the context of the person’s face. 99.7 percent, 66.0 percent, 75
percent accuracy was achieved using.

McCloskey et al. [87] presented a detection approach subject to color components and a Linear Support Vector
Machine (SVM). It was proposed for the GAN-Pipeline Features technique, having a detection method based
on color characteristics and a linear SVM for the conclusive categorization with NIST MFC2018 dataset. The
writers assumed that the color differs between fake synthesis pictures and real camera photographs. While
evaluating 70.0 percent accuracy was achieved. Marra et al. [88] adopted unique Deep Learning Features for
the best detection. Authors suggested a multi-task incremental learning detection method for detecting and
classifying new types of GAN generated images while maintaining past performance. Own (CycleGAN,
ProGAN,Glow, StarGAN, StyleGAN) dataset was used for model training and achieved 99.3 percent accuracy.

Dang et al. [57] comprehensively analyzed different facial manipulations. The authors suggested using
attention mechanisms and famous CNN models such as Xception- Net and VGG16 using a dataset based on
DFFD (ProGAN, StyleGAN).Final accuracy was 100 percent. Hulzebosch et al. [89] adopted Deep Learning
Features and considered different scenarios like cross-data, cross-model, and Post-processing. Fake detection
techniques were based entirely on the renowned Xception network along with ForensicTransfer [90], which is
basically an Autoencoder method. CNN, AE classifiers were trained by their own (StarGAN, Glow, ProGAN,
StyleGAN) based dataset to detect fake images and got 99.8 percent accuracy, which is best performance.

A deep learning technique based on Restricted Boltzmann Machine (RBM) was proposed by Bharati et al. [79]
in order to identify digital editing of face pictures. As a concerned database, the writers designed two fake
databases from the actual ND-IIITD database [91] and a group of celebrity face photos that were downloaded
from the internet. Their approach achieved 96.20 percent accuracy respectively.

Zhang et al.[92] presented a spectrum domain component to witness faux images. It was trained on its own
(StarGAN/CycleGAN) dataset. Regarding the classifier, AutoGAN was put forward; it’s a GAN simulator
capable of making GAN artifacts in any photo without using an already trained GAN model. Ultimately, the
accuracy obtained was 100 percent.

25
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

Table 4: Frequency based techniques used for deepfake detection

Table 4
Year Study Methods Techniques Accuracy Dataset

2019 [81] Preprocessing DCGAN, WGAN- 95.45% CelebA-HQ


combined with GP, PG-GAN
deep network

2019 [82] Temporal CNN and RNN 96.9, 94.35, 96.3% FF++
discrepancies

2020 [83] Self-training EfficientNet and 98.49% Customized dataset


ResNext (StyleGAN and
TPGGAN)

2021 [86] Discrepancies Xception networks 99.7%, 66.0%, FF++, Celeb-DF,


between two 75% DFDC
regions

2018 [87] GAN-Pipeline SVM 70.00% NIST MFC2018

2020 [89] GAN-Pipeline k-NN, SVM, LDA 99.81% Own(AttGAN,


GDWCT,StarGAN
, StyleGAN,
StyleGAN2)

2019 [88] Deep Learning CNN + 99.30% Own(CycleGAN,


Features Incremental ProGAN,Glow,
Learning StarGAN,
StyleGAN)

2020 [57] Deep Learning CNN + Attention 100.00% DFFD (ProGAN,


Features Mechanism StyleGAN)

2020 [89] Deep Learning CNN, AE 99.80% Own(StarGAN,Pro


GAN, StyleGAN)

2016 [79] Deep Learning RBM 96.20% Own(Celebrity


Features (Face Retouching, ND-
Patches) IIITD Retouching)

26
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

2019 [92] Spectrum Domain GAN 100.00% Own


Features Discriminator StarGAN/CycleG
AN based

5.2 Fake Video Detection


Lip Forensics is a method for detecting face forgeries in videos that is generalizable and robust. Haliassos et al.
[93] used Semantic irregularities methods and used FaceForensics++, DF-1.0, Celeb-DF, DFDC datasets with
the ResNet-18 technique and obtained the accuracy of 99.7 percent, 82.4 percent, 73.5 percent respectively.
Tariq et al.[94] used FF++, DFD and Deepfake Wild videos datasets on Convolutional LSTM-based Residual
Network (CLRNet) and got 97.50 percent, 97.13 percent, 97.23 percent accuracy. The fundamental concept is
to use a convolutional LSTM-based residual network (CLRNet) with a unique training technique to track the
spatial and temporal information in Deepfakes.

In [95] Deepfake detection, a blend of stationary biometrics on facial recognition and secular behavioral
biometrics on facial sentiments and head motions was presented. In addition, their focus was on features,
appearance and behavior so they used ResNet-101, VGG technique utilizing DFD, Celeb-DF, DFDC, and
FF++ datasets and achieved 97.7 percent, 98.9 percent, 93.2 percent, 96.45 percent accuracy. Agarwal and
Farid [96] proposed a detection system based on both facial expressions and head movements, using an SVM
(Support Vector Machine) classifier trained over the Own (FaceSwap, HQ) dataset to detect the fake videos.
The accuracy was achieved 96.3

Mittal et al.[97] utilized the Siamese network to simulate the visuals and audio in movies as well as trained the
Siamese network with DF-TIMIT and DFDC datasets and gained 94.9 percent, 84.4 percent accuracy.
Furthermore, Chugh et al. [98]also used DFDC and DF-TIMIT datasets with MDS(Modality Dissonance
Score) network technique, to obtain 91.54 percent and 94.7 percent accuracy. Evolutionary divergence results
are computed between audiovisual segments across one-second video gaps, and the MDS is ascertained after
aggregating all parts.

Agarwal et al.[42] Proposing an approach for detecting fraudulent videos based on irregularities in the
dynamics of the mouth shape (visemes) and the prominent phonetic even more Phoneme-viseme mismatches
technique was used through CNN along with A2V, T2V datasets OpenFace2 toolkit OpenFace 2.0: Facial
Behavior Analysis Toolkit [99] was considered for feature extraction. and gained 96.9 percent, 71.1 percent,
accuracy. Qi et al.[100] adopted the Heartbeat rhythms method to highlight cardiac rhythm signals; the
scientists built a motion-magnified spatial-temporal representation (MMSTR) for the video. After using
DeepRhythm technique with FF++, DFDC-P datasets, they obtain 99.7 percent, 74.5 percent results.

Table 5: List of techniques and methods used for fake video detection

Table 5

27
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

Year Study Methods Techniques Accuracy Dataset

2021 [93] Semantic ResNet-18 99.7%, 82.4%, FF++, DFDC,


irregularities 73.5% Celeb-DF, DF-1.0

2021 [94] Spatial and CLRNet 97.50%, 97.13%, DFD, FF++,


temporal 97.23% Deepfake -in-the-
information Wild videos

2018 [101] Eye blinking CNN, LRCN 98%, 99% For LRCN EBV
dataset and For
CNN CEW
Dataset

2020 [95] Using appearance ResNet-101, VGG 97.7%, 98.9%, The world leaders
and behavior 93.2% dataset along
DFDC, DFD,
FF++ and Celeb-
DF.

2020 [97] Using emotion Siamese network 94.9%, 84.4% DF-TIMIT and
audio-visual architecture DFDC.
affective

2020 [102] Phoneme-viseme CNN 96.9%, 71.1% A2V, T2V


mismatches 80.7%

2020 [98] Modality MDS network 91.54%, 94.7% DFDC, DF-TIMIT


Dissonance Score

2020 [100] Heartbeat rhythms DeepRhythm 99.7%, 74.5% FF++, DFDC-P

2021 [103] Consistency of 3 dense layers 93.28%, 80.00% FF++, CelebDF


eyes and gazes network
architecture

2019 [96] Head Pose and SVM 96.30% Own (FaceSwap,


Facial Features HQ)

2020 [104] Eye Blinking Distance 87.50% Own

28
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

2018 [105] Deep Learning CNN, SVM 85.10% Customized dataset


Features+ (DFDC Preview,
Steganalysis DeepfakeTIMIT
Features LQ/HQ, Celeb-DF
, UADFV,
FF++/DFD )

2019 [106] Deep Learning AE + Multi-Task 76.30% FF++ and DFD


Features Learning

2019 [79] Deep Learning CNN + AM 99.40% DFFD


Features

2020 [57] Facial Regions CNN 100%, 99.4%, DFDC Preview,


Features 91%, 83.6% UADFV, FF++
(HQ, FaceSwap),
Celeb-DF

In [101] three steps were involved in the composition of the LRCM model: (i) Characteristic extraction from
the eye series using VGG16, (ii) Sequence comprehension through LSTM, and (iii) State prediction, which
yields the probability of eye open and closed conditions on the basis of the result of LSTM. Li et al. [101] used
CEW Dataset and Eye blinking method for EBV and CNN dataset for the LRCN system and attained 98
percent and 99 percent accuracy. Furthermore, Jung et al. [104] developed a system known as DeepVision to
examine fluctuations in blinking patterns. Fast-HyperFace and Eye-Aspect-Ratio (EAR) were used together to
find the face and figure out the eye aspect ratio. Finally features based on the number of blinks and the time of
the blinks were encountered to determine whether the video was fake or real. This technique had an absolute
accuracy rate of 87.5 percent over its own dataset.

Zhou et al. [105] proposed Deep Learning Features and Steganalysis Features methods with convolutional
neural network and SVM classifier using UADFV dataset and a final accuracy rate of 85.1 percent was
obtained.

Nguyen et al. [106] presented a Capsule Networks-based fake detection model. This system had relatively
fewer parameters than a standard CNN; however, it had the same outcomes. A detection system that used an
auto-encoder was considered. Final accuracy was 76.3 percent. Nguyen et al. [68] proposed a new fake
detection system based on Capsule Networks. This method had fewer parameters than a traditional CNN with
the same results. The proposed detection system was tested with the FaceForensics++ dataset, which had
accuracy rates of more than 96 percent. Dang et al. [57] came up with the convolutional neural network and
Attention Mechanism classifier using the DFFD dataset. In the case of detecting identity swaps, the proposed
method had an accuracy of 99.43 percent.

29
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

Guerra et al . [67] proposed video temporal features that were capable of telling if a frame is manipulated or
not with a combination of CNNs and RNNs model.The authors utilized InceptionV3 which had been pre-
trained with the ImageNet dataset for the CNN and LSTM model with one hidden layer and 2048 memory
blocks that could be used for the RNN system. The accuracy was 97.1 percent. Tolosana et al. [107] came up
with the Facial Regions Features method. The author considered a fake detection system based on a CNN
classifier trained on DFDC Preview, Celeb-DF datasets. The result obtained was 91.0 percent, 83.6 percent
respectively.

5.3 Methodologies for Deepfake Detection


In this part, we will look at various approaches to Deepfake detection. The PRNU (Photo Response Non
Uniformity) pattern of a digital image, is a noise pattern created by small factory defects in the light-sensitive
sensors of a digital camera. This noise pattern is highly individualized and often referred to as the fingerprint of
the digital image. PRNU analysis considers a method of interest because it expects that manipulating the facial
area will affect the local PRNU pattern in the video frames [108].

Face tempting videos are also automatically detected using Meso-4 and Misconception-4 [19], both the
networks used less number of layers for data. As a result of this study, an accuracy of 98 percent and 95
percent have been achieved.

Nataraj et al. [66] used a steg analysis method to find fake images. The co-occurrence matrices methods were
made from RGB images, and the values were trained with a deep convolutional neural network to differentiate
the real ones from the fake ones. RGB images were used to make the co-occurrence matrices methods.
Moreover, the values were prepared with a deep convolutional neural network (CNN) to distinguish the real
from the fake ones. In [57] StarGAN and CycleGAN-based forged photos could be classified, achieving 99
percent valid results. Li et al. [109] have looked at the statistical properties of deep network-generated pictures,
like the connection between bordering pixels in HSV and YCbCr color spaces, in order to see if the difference
between real and fake images can be detected using DNNs with the linear discriminative technique using
dataset LFW, LSUN, FFHQ, CelebA, FFHQ.

There is another generalizable and robust technique lip forensics which is used for detecting face forgeries
Haliassos et al. [93] propose to utilize the high-level semantic irregularities using face forensics ++, DF-1.0,
Celeb-DF, DFDC datasets with the ResNet-18 model and obtained this accuracy 99.7 percent, 82.4 percent,
73.5 percent.

Lugstein et al. [110] came up with a new way to tell whether data is a Deepfake or not, based on how different
the photo response no uniformity (PRNU) is. So, the PRNU and SVM technique is well-known because facial
retouching and face morphing attacks can be detected.

30
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

Guerra et al. [67]provided CNN and LSTM detection method which is based on face-swapping. In this method,
InceptionV3 which is CNN is used to gather frame-level features, those features passed to LSTM to build a
sequence descriptor for classification and 97.1 percent accuracy was achieved.

Fig. 8. List of models used for deepfake detection in current study

Nguyen et al.[68] operated VGG-19 along with Capsule networks in order to recognize fake images and
videos, including computer-generated image detection and replay attack detection using these techniques with
dataset Deepfake online videos and FF and got the accuracy of 94.47 percent, 99.13 percent.

Xuan et al.[81] used Preprocessing combined with a deep network method with the DCGAN, WGAN-GP, and
PGGAN techniques. The intention was to dismiss low-level, inconsistent artifacts of GAN visuals to make the
forensics classifier prioritize higher innate forensic indicators so that GAN-based images could be detected The
95.45 percent accuracy was achieved . Sabir et al. [82] created a method based on Temporal discrepancies,
using CNN and RNN technique on the video to recognize, trim, and align faces in a video. The model used
micro-, meso-, and macroscopic features for detection. These landmark-based components produce facial
alignment with recurring directional DenseNet that excels at spotting face-manipulated videos. With CNN and
RNN models trained on FF++ dataset,accuracy was gained 96.9 percent, 94.35 percent, 96.3 percent. Jeon et
al. [69] created CNN models like Squeeze Net, ShallowNetV3, ResNet2, and Xception more powerful. The
fine-tuning technique is employed to get the characteristics out of MBblockV3 whereas PGGAN, DF, FF

31
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

datasets are used for training . This approach is a fine-tuning transformation. It obtained 90.29 percent, 97.02
percent, 96.67 percent accuracy. Jeon et al. [83] ddevised a GAN-image detection framework (T-GD) to
assemble a model trained on TPGGAN and StyleGAN- based datasets that quickly and accurately detects
Deepfake images using Efficient Net and ResNext techniques. The model is based on the relationship between
the teacher and the students, improving the detection performance. It obtained 87.80 percent and 98.49 percent
accuracy.

Hsu et al. [111] developed a couple-wise learning model capable of telling if a GAN-generated fake image is
real. He merged an enhanced model of the DenseNet backbone network with an entirely different network
called the Siamese network to make a unique model. That model is known as Common Fake Feature Network
(CFFN).

Jain et al.[112] used an adversarial perturbation method to improve the performance of existing Deepfake
models.DIP - deep image prior - and Lipschitz regularization are used to improve the validity of CNN-based
deepfake sensors. DIP defence achieved 98 percent accuracy. Wu et al. [113] developed a system called
SSTNet that integrates spatial, steganography, and characteristic extraction techniques to determine Deepfakes.
In this method, XceptionNet monitors the image’s spatial features and statistics. Also, to mine temporal
characteristics, RNN is used. This extracted data is applied for binary classification for identifying Deepfakes.
Global texture data can be analyzed to improve the existing CNNs by making them more robust and
generalized when they are used to identify fake faces. The gram-Net technique evaluates JPEG compression,
downsampling, noise, and blur. Gram-Net has proven to work with a wide range of GANs. Conclusively, using
the Analyzing global image texture method, 95.51 percent and 90 percent accuracy was achieved with the help
of the ResNet model technique by usingCelebA-HQ and FFHQ images datasets[114]

To differentiate between real and fake images, Khalid et al.[115] created the OC-FakeDect technique that
utilizes a one-class Variation Auto Encoder (VAE) and solely trains on real face photos.

32
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

Fig. 9. Generic steps used in deepfake detection

Fung et al. [70] came up with an unsupervised way to tell when someone’s face is changed. It is a new way to
detect unlabeled data to see if someone has changed their face. When a face image creates an image, it goes
through two different transformations and into two different subnetworks, Xception and projection head
network. When they used two different transformations, an FF++ and UADFV, as well as Celeb-DF and Celeb-
DF datasets, were performed with the Xception network, SVM, and Bayes techniques; it appeared that they all
had high accuracy.

Tariq et al. [94] came up with a way to find many different types of Deepfakes. To analyze the spatial and
secular data in Deepfakes, they use a convolutional LSTM-based residual network (CLRNet) approach
qualified on trained DFD, FF++, and Deepfake-in-the-Wild video datasets. Also, the model tested on
Deepfakes that had never been seen before, it got 97.50 percent, 97.13 percent, and 97.23 percent accuracy.

In less than a minute, this app can alter faces. For this purpose, the essential components of the face image are
determined and indicated as descriptors. Because every key point is distinctive, one must use a second
clustering procedure to make the codebook per picture.[116].

Using a CNN and an RNN, Li et al. [101] devised a method based on eye blinking for detecting fake faces in
videos. With VGG16, a type of RNN, features were extracted from the eye sequence using an LRCN model.

33
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

Sequence learning will be done using LSTM, predicting whether the eye is open or closed. CEW and LRCN
datasets achieved 98 percent and 99 percent accuracy. Similarly, a unique deep learning-based model was
introduced that was capable of distinguishing real videos from fake ones. The model employs the warping
action that is used to create Deepfake. There is a gap in the resolution between the warped face area and the
rest of the image, creating artefacts. First, CNN is prepared to identify faces. Then, it attempts to find
landmarks and calculates transformed matrices to ensure the faces are in the right place. UADFV and DF-
TIMIT datasets were used with VGG-16, ResNet technique and 97.4 percent, 93.2 percent accuracy was faced.

Agarwal et al.[96] proposed a detection system based on Head Pose and Facial Features method. For the
classification, SVM was used.

McCloskey et al. [87] designed a GAN generator’s architecture, making it easier to find visual flaws in
Deepfake images. in Deepfake visuals. The generator’s normalizing techniques are adopted, which reduce the
total pixels that are overexposed or underexposed. A detection system subject to color attributes and a linear
Support Vector Machine (SVM) for the ultimate classification gained 70.0 percent accuracy. Yu et al. [72]
looked at GAN fingerprints for image attribution and used them to tell whether images were real or made by
GANs. If the model is trained with slight modifications in the dataset, its fingerprint would be distinct, making
it easier to identify the model. Fine-tuning is also a good way to protect the DNN model from changes in
fingerprint images that aren’t good for it. CelebA, LSUN bedroom scene datasets with Image fingerprint
feature achieved 99.43 percent, 98.58 percent accuracy. Yu et al. [78] created GAN fingerprints to attribute
images with SVM classification using Deepfake TIMIT (LQ) Deepfake TIMIT (HQ), UADFV, Celeb-DF ,
FF++ / DFD, DFDC Preview datasets found 89.0 percent, 55.1 percent, 53.2 percent, 47.3 percent, 55.9
percent, 54.6 percent results.

Matern et al. [77] utilized Deepfake and face manipulation tools based on visual characteristics such as teeth,
eyes, and facial features. The method was proposed by adopting the Visual Features method and Fake detection
methods based on very simple visual characteristics such as eye colour, missing reflections, and missing details
in the eye and dental areas. Deepfake TIMIT (LQ) Deepfake TIMIT (HQ), UADFV, Celeb-DF , FF++ / DFD,
DFDC Preview datasets used for the training and 70.2 percent, 77.0 percent, 77.3 percent, 78.0 percent, 66.2
percent, 55.1 percent accuracy achieved in the testing.

Fernandez et al. [117] used a heartbeat rhythm and designed a Deep Rhythm model that could tell if a video
was fake. For the purpose of showing heart rhythm signals, A motion-magnified spatial-temporal
representation (MMSTR) model was used to make the video. As a result, a dual-spatial-temporal attentional
network was produced to catch if a video was fake.

There are many ways to find digital fingerprints in images that can be learned and used to tell if they were
made by a GAN or not[95]. CNN’s are used to examine pixel-level artefacts when a face region moves onto a
target. A convolutional neural network with only a few layers is called MesoNet. It can use to build a model of

34
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

a microscopic object. L. Chai et al. [73] used redundant artefacts and a Patch-based classification method that
can evaluate local patches. This hypothesis was assessed on the FFHQ and CelebA-HQ datasets, using
recognized models like MesoInception4, Xception, Resnet-18, and CNN, having p values 0.5 and 0.1,
respectively. The datasets of CelebA-HQ got 99.97 percent accuracy.

A Siamese network imitates the visuals and sounds in movies, combined with a blend of the two triplet loss
functions to confine alikeness. Mittal et al.[97] trained Siamese network architecture technique with the DF-
TIMIT and DFDC datasets and gained 94.9 percent and 84.4 percent accuracy .

Agarwal et al. [102] used the Phoneme-viseme mismatches technique trained on A2V, T2V datasets and
proposed an approach for detecting fraudulent videos based on irregularities in the dynamics of the mouth
shape-viseme mismatches) and the prominent phonetic. Some phonemes need the lips to be completely closed
to be spoken correctly: A few of them are papa, mama, and baba. It gained 96.9 percent, 71.1 percent, 80.7
percent accuracy.

Deepfake videos can be detected using a method proposed Chugh et al. [98] . After the calculation of disparity
scores across audiovisual segments over one second video chunk, the MDS was determined by using
aggregation to all parts and taking the average.

Guarneraet. al.[84] represented a way to tell if an image is fake by looking for forensic traces that are hiddenin
pictures. The EM algorithm - expectation maximization - is utilized to discover a bunch of local characteristics
for modeling the underlying convolutional productive procedure. StarGAN, GDWCT, StyleGAN, StyleGAN2,
and ATTGAN made Deepfakes. CelebA along with LFW datasets obtained 90.22 percent accuracy.

To find fake videos the ABC metric shown in [118] . To take into account the baseline input, the attribution
methods employ Shapley values. ResNet50 trained on VGGFace2 represents the original class label, while P
represents the class label that is most likely to occur.

Qi et al. [100] used the Heartbeat rhythms method to highlight cardiac rhythm signals; the scientists created an
MMSTR for the video—a motion-magnified spatial-temporal representation.. After training FF++, and DFDC-
P datasets using the Deep Rhythm technique, we obtained 99.7 percent and 74.5 percent accuracy. Hu et al.
[119] examined the distinction between the pair of eyes while detecting Deepfake facial photos and presented a
detection standard that exploits the physical boundaries in GAN-based pictures. Then it estimates the
difference between both eyes to reveal whether a picture is real or fake. Demir et al. [103] ddevised a system to
identify Deepfakes by examining the gaze in videos and looking for the consistency of eyes and gazes’ method
to analyze OR detect the video. Using three dense layers of network architecture technique with face-
forensics++, CelebDF datasets, 93.28 percent, 80.00 percent accuracy were obtained.

The contrast between faces and their surroundings was exploited by Nirkin et al. [86] to identify false faces. He
trained two networks: one to identify a person’s face and the other to account for certain things, like hair, ears,

35
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

and neck. The researchers found that discrepancies detected by comparing these two networks.

6 Conclusion
The study on Deepfake production and Deepfake detection has been comprehensively reviewed and analyzed
in this survey, which hopefully will be useful to anyone interested in the subject. A taxonomy of diverse
Deepfake evolution methods and the categorization of various Deepfake detection practices, has been offered
among the most significant aspects of the technological growth of the approaches. It explains the fundamentals,
advantages, and risks related to Deepfake , GAN-based Deepfake applications. Additionally, models for
detecting Deepfake are addressed. In the investigations, the FF++ dataset accounts for the majority of the data
points. Deep learning models (mostly CNNs) provide a large proportion of the total number of models. Deep
learning-based techniques are commonly applied to detect Deepfake, as the methods are becoming increasingly
popular. Using Deepfake as an experiment, the conclusion indicates that learning methods are useful in
identifying Deepfake. Deepfake detection still confronts several hurdles, despite the tremendous advancements
in essential multimedia technology and the development of tools and applications in recent years. This survey
paper is intended to help scholars and practitioners in the respective field identify the most important research
topics., and in luring more scholars to participate in this emerging and constantly expanding subject.

References
1.
Y. Mirsky and W. Lee, “The Creation and Detection of Deepfakes,” ACM Computing

Surveys, vol. 54, no. 1, pp. 1–41, 2021. ↩


2. Juefei-Xu, F., Wang, R., Huang, Y., Guo, Q., Ma, L., & Liu, Y. (2021). Countering Malicious DeepFakes:
Survey, Battleground, and Horizon. Retrieved from http://arxiv.org/abs/2103.00218 ↩
3. Malik, A., Kuribayashi, M., Member, S., Abdullahi, S. M., & Khan, A. N. (2022). DeepFake Detection
for Human Face Images and Videos: A Survey. IEEE Access, 10, 18757–18775. ↩
4. Peng, B., Xiang, W., Jiang, Y., Wang, W., Dong, J., Sun, Z., … Lyu, S. (2022). DFGC 2022: The Second
DeepFake Game Competition. Retrieved from http://arxiv.org/abs/2206.15138 ↩
5. Kitchenham, B., & Charters, S. M. (2007). Guidelines for performing Systematic Literature Reviews in
Software Engineering Guidelines for performing Systematic Literature Reviews in Software Engineering
EBSE Technical Report EBSE-2007-01 Software Engineering Group School of Computer Science and
Mathematics Keele University Keele , Staffs Department of Computer Science University of Durham.
(October 2021). ↩

36
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

6. Do, H., Elbaum, S., & Rothermel, G. (2005). Supporting controlled experimentation with testing
techniques: An infrastructure and its potential impact. Empirical Software Engineering, 10(4), 405–435. ↩
7. Siddiqui Muhammad Yasir, H. A. (2022). Deep learning-based 3D instance and semantic segmentation: A
review. Journal on Artificial Intelligence, 4(2), 99–114. Retrieved from
http://www.techscience.com/jai/v4n2/48857 ↩
8. Mahdizadehaghdam, S., Panahi, A., & Krim, H. (2019). Sparse generative adversarial network.
Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019, 3063–3071.
Retrieved from https://arxiv.org/abs/1908.08930 ↩
9. Hui, J. (n.d.). GAN — Why it is so hard to train Generative Adversarial Networks! https://jonathan-
hui.medium.com/gan-why-it-is-so-hard-to-train-generative-advisory-networks-819a86b3750b. Retrieved
from https://jonathan-hui.medium.com/gan-why-it-is-so-hard-to-train-generative-advisory-networks-
819a86b3750b ↩
10. ZAO Application. (n.d.). https://apps.apple.com/cn/app/id1465199127. Retrieved from
https://apps.apple.com/cn/app/id1465199127 ↩
11. Auto Faceswap Application. (n.d.). https://www.microsoft.com/en-us/p/auto-face-swap/9nblggh3m5nq.
Retrieved from https://www.microsoft.com/en-us/p/auto-face-swap/9nblggh3m5nq ↩
12. FaceApp. (n.d.). https://apps.apple.com/gb/app/faceapp-ai-face-editor/id1180884341. Retrieved from
https://apps.apple.com/gb/app/faceapp-ai-face-editor/id1180884341 ↩
13. Karras, T., Laine, S., & Aila, T. (2021). A Style-Based Generator Architecture for Generative
Adversarial Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(12), 4217–
4228. Retrieved from https://arxiv.org/abs/1812.04948 ↩
14. Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2020). Analyzing and improving
the image quality of stylegan. Proceedings of the IEEE Computer Society Conference on Computer Vision
and Pattern Recognition, 8107–8116. Retrieved from https://arxiv.org/abs/1912.04958 ↩
15. Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., & Aila, T. (2020). Training generative
adversarial networks with limited data. Advances in Neural Information Processing Systems, 2020-Decem.
Retrieved from https://arxiv.org/abs/2006.06676 ↩
16. Brock, A., Donahue, J., & Simonyan, K. (2019). Large scale GaN training for high fidelity natural image
synthesis. 7th International Conference on Learning Representations, ICLR 2019, 1–35. Retrieved from
https://arxiv.org/abs/1809.11096 ↩

37
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

17. Choi, Y., Choi, M., Kim, M., Ha, J. W., Kim, S., & Choo, J. (2018). StarGAN: Unified Generative
Adversarial Networks for Multi-domain Image-to-Image Translation. Proceedings of the IEEE Computer
Society Conference on Computer Vision and Pattern Recognition, 8789–8797. Retrieved from
https://arxiv.org/abs/1711.09020 ↩
18. The biggest threat of deepfakes isn’t the deepfakes themselves. (n.d.).
https://www.technologyreview.com/2019/10/10/132667/the-biggest-threat-of-deepfakes-isnt-the-deepfakes-
themselves/. Retrieved from https://www.technologyreview.com/2019/10/10/132667/the-biggest-threat-of-
deepfakes-isnt-the-deepfakes-themselves/ ↩
19. Nguyen, T. T., Nguyen, C. M., Nguyen, D. T., Nguyen, D. T., & Nahavandi, S. (2019). Deep learning for
deepfakes creation and detection. CoRR, abs/1909.11573. Retrieved from http://arxiv.org/abs/1909.11573 ↩
20. Mirsky, Y., & Lee, W. (2021). The Creation and Detection of Deepfakes. ACM Computing Surveys,
54(1), 1–41. ↩
21. Oscar Schwartz. 2018. You thought fake news was bad? The Guardian. Retrieved March 2, 2020. (n.d.).
https://www.theguardian.com/technology/2018/nov/12/deep-fakes-fake-news-truth. Retrieved from
https://www.theguardian.com/technology/2018/nov/12/deep-fakes-fake-news-truth ↩
22. Thies, J., Zollhöfer, M., & Nießner, M. (2019). Deferred neural rendering: Image Synthesis using Neural
Textures. ACM Transactions on Graphics, 38(4). Retrieved from https://arxiv.org/abs/1904.12356 ↩
23. Thies, J., Zollhofer, M., Stamminger, M., Theobalt, C., & Niebner, M. (2016). Face2Face: Real-Time
Face Capture and Reenactment of RGB Videos. Proceedings of the IEEE Computer Society Conference on
Computer Vision and Pattern Recognition, 2016-Decem, 2387–2395. ↩
24. Blanz, V., Basso, C., Poggio, T., & Vetter, T. (2003). Reanimating Faces in Images and Video. Computer
Graphics Forum, 22(3), 641–650. ↩
25. Thies, J., Zollhöfer, M., Nießner, M., Valgaerts, L., Stamminger, M., & Theobalt, C. (2015). Real-time
expression transfer for facial reenactment. ACM Transactions on Graphics, 34(6). ↩
26. Thies, J., Zollhöfer, M., Theobalt, C., Stamminger, M., & Niessner, M. (2018). HeadOn: Real-time
reenactment of human portrait videos. ACM Transactions on Graphics, 37(4). Retrieved from
https://arxiv.org/abs/1805.11729 ↩
27. Siddiqui Muhammad Yasir, H. A., Amin Muhammad Sadiq. (2022). 3D instance segmentation using
deep learning on RGB-d indoor data. Computers, Materials & Continua, 72(3), 5777–5791. Retrieved from
http://www.techscience.com/cmc/v72n3/47479 ↩

38
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

28. Kumar, R., Sotelo, J., Kumar, K., Brebisson, A., & Bengio, Y. (2017). ObamaNet: Photo-realistic lip-
sync from text. 1–4. Retrieved from http://arxiv.org/abs/1801.01442 ↩
29. Zhang, J., Chen, J., Tang, H., Wang, W., Yan, Y., Sangineto, E., & Sebe, N. (n.d.). Dual In-painting
Model for Unsupervised Gaze Correction and Animation in the Wild. Retrieved from
https://arxiv.org/abs/arXiv:2008.03834v1 ↩
30. Yu, Y., Liu, G., & Odobez, J. M. (2019). Improving few-shot user-specific gaze adaptation via gaze
redirection synthesis. Proceedings of the IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, 2019-June, 11929–11938. Retrieved from https://arxiv.org/abs/1904.10638 ↩
31. Tran, L., Yin, X., & Liu, X. (n.d.). Representation Learning by Rotating Your Faces. Retrieved from
https://arxiv.org/abs/arXiv:1705.11136v2 ↩
32. Cao, J., Hu, Y., Yu, B., He, R., & Sun, Z. (2019). 3D Aided Duet GANs for Multi-View Face Image
Synthesis. IEEE Transactions on Information Forensics and Security, 14(8), 2028–2042. ↩
33. Scherhag, U., Rathgeb, C., Merkle, J., Breithaupt, R., & Busch, C. (2019). Face recognition systems
under morphing attacks: A survey. IEEE Access, 7, 23012–23026. ↩
34. Gross, R., Sweeney, L., Torre, F. D., & Baker, S. (2006). Model-Based Face De-Identification. ↩
35. Gross, R., Sweeney, L., Cohn, J., Torre, F., & Baker, S. (2009). Face de-identification. In A. Senior (Ed.),
Protecting privacy in video surveillance (pp. 129–146). London: Springer London. ↩
36. Meden, B., Malli, R. C., Fabijan, S., Ekenel, H. K., Štruc, V., & Peer, P. (2017). Face deidentification
with generative deep neural networks. IET Signal Processing, 11(9), 1046–1054. Retrieved from
https://arxiv.org/abs/1707.09376 ↩
37. Pan, Y. L., Haung, M. J., Ding, K. T., Wu, J. L., & Jang, J. S. (2019). K-Same-siamese-GAN: K-Same
algorithm with generative adversarial network for facial image de-identification with hyperparameter tuning
and mixed precision training. 2019 16th IEEE International Conference on Advanced Video and Signal
Based Surveillance, AVSS 2019. Retrieved from https://arxiv.org/abs/1904.00816 ↩
38. Mirjalili, V., Raschka, S., & Ross, A. (2019). FlowSAN: Privacy-Enhancing Semi-Adversarial Networks
to Confound Arbitrary Face-Based Gender Classifiers. IEEE Access, 7, 99735–99745. Retrieved from
https://arxiv.org/abs/1905.01388 ↩
39. Gafni, O., Wolf, L., & Taigman, Y. (2019). Live Face De-Identification in Video. CoRR. Retrieved from
http://arxiv.org/abs/1911.08348 ↩
40. Li, Y., & Lyu, S. (2019). De-identification Without Losing Faces. 83–88. ↩

39
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

41. Siddiqui Muhammad Yasir, H. A. (2023). Faster metallic surface defect detection using deep learning
with channel shuffling. Computers, Materials & Continua, 75(1), 1847–1861. Retrieved from
http://www.techscience.com/cmc/v75n1/51506 ↩
42. Agarwal, S., & Farid, H. (n.d.). Detecting Deep-Fake Videos from Phoneme-Viseme Mismatches. ↩
43. Suwajanakorn, S., Seitz, S. M., & Kemelmacher-Shlizerman, I. (2017). Synthesizing obama: Learning
lip sync from audio. ACM Transactions on Graphics, 36(4). ↩
44. Song, Y., Zhu, J., Li, D., Wang, A., & Qi, H. (2019). Talking face generation by conditional recurrent
adversarial network. IJCAI International Joint Conference on Artificial Intelligence, 2019-Augus, 919–925.
Retrieved from https://arxiv.org/abs/1804.04786 ↩
45. Song, L., Wu, W., Qian, C., He, R., & Loy, C. C. (2020). Everybody’s Talkin’: Let Me Talk as You Want.
Retrieved from http://arxiv.org/abs/2001.05201 ↩
46. Fried, O., Tewari, A., Zollhöfer, M., Finkelstein, A., Shechtman, E., Goldman, D. B., … Agrawala, M.
(2019). Text-based Editing of Talking-head Video. ACM Transactions on Graphics, 38(4). Retrieved from
https://arxiv.org/abs/1906.01524 ↩
47. Liu, Z., Luo, P., Wang, X., & Tang, X. (2015). Deep learning face attributes in the wild. Proceedings of
the IEEE International Conference on Computer Vision, 2015 Inter, 3730–3738. Retrieved from
https://arxiv.org/abs/1411.7766 ↩
48. Xie, D., Chatterjee, P., Liu, Z., Roy, K., & Kossi, E. (2020). DeepFake detection on publicly available
datasets using modified AlexNet. 2020 IEEE Symposium Series on Computational Intelligence (SSCI),
1866–1871. ↩
49. Korshunov, P., & Marcel, S. (2018). DeepFakes: a New Threat to Face Recognition? Assessment and
Detection. 1–5. Retrieved from http://arxiv.org/abs/1812.08685 ↩
50. Dolhansky, B., Howes, R., Pflaum, B., Baram, N., & Ferrer, C. C. (2019). The deepfake detection
challenge (DFDC) preview dataset. https://arxiv.org/abs/1910.08854. Retrieved from
https://arxiv.org/abs/1910.08854 ↩
51. Contributing Data to Deepfake Detection Research. (n.d.).
https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html. Retrieved from
https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html ↩
52. Rossler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., & Niessner, M. (2019). FaceForensics++:
Learning to detect manipulated facial images. Proceedings of the IEEE International Conference on

40
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

Computer Vision, 2019-Octob, 1–11. Retrieved from https://arxiv.org/abs/1901.08971↩


53. Li, Y., Yang, X., Sun, P., Qi, H., & Lyu, S. (2020). Celeb-DF: A Large-Scale Challenging Dataset for
DeepFake Forensics. Proceedings of the IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, 3204–3213. Retrieved from https://arxiv.org/abs/1909.12962 ↩
54. Jiang, L., Li, R., Wu, W., Qian, C., & Loy, C. C. (2020). DeeperForensics-1.0: A large-scale dataset for
real-world face forgery detection. CVPR. ↩
55. Zi, B., Chang, M., Chen, J., Ma, X., & Jiang, Y. G. (2020). WildDeepfake: A Challenging Real-World
Dataset for Deepfake Detection. MM 2020 - Proceedings of the 28th ACM International Conference on
Multimedia, 2382–2390. Retrieved from https://arxiv.org/abs/2101.01456 ↩
56. Le, T.-N., Nguyen, H. H., Yamagishi, J., & Echizen, I. (2021). OpenForensics: Large-Scale Challenging
Dataset For Multi-Face Forgery Detection And Segmentation In-The-Wild. Retrieved from
http://arxiv.org/abs/2107.14480 ↩
57. Dang, H. (n.d.). On the Detection of Digital Face Manipulation. Retrieved from
https://arxiv.org/abs/arXiv:1910.01717v5 ↩
58. Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of GANs for improved quality,
stability, and variation. CoRR, abs/1710.10196. Retrieved from http://arxiv.org/abs/1710.10196 ↩
59. Nadimpalli, A. V., & Rattani, A. (n.d.). GBDF: Gender Balanced DeepFake Dataset Towards Fair
DeepFake Detection. Retrieved from https://arxiv.org/abs/arXiv:2207.10246v1 ↩
60. Dolhansky, B., Bitton, J., Pflaum, B., Lu, J., Howes, R., Wang, M., & Ferrer, C. C. (2020). The
DeepFake Detection Challenge (DFDC) Dataset. Retrieved from http://arxiv.org/abs/2006.07397 ↩
61. Elaskily, M., Elnemr, H., Sedik, A., Dessouky, M., El Banby, G., Elaskily, O., … Elshakankiry, O.
(2020). A novel deep learning framework for copy-move forgery detection in images. Multimedia Tools and
Applications, 79. ↩
62. WWD. (n.d.). https://mklab.iti.gr/results/the-wild-web-tampered-image-dataset/. Retrieved from
https://mklab.iti.gr/results/the-wild-web-tampered-image-dataset/ ↩
63. Shullani, D., Fontani, M., Iuliani, M., Shaya, O. A., & Piva, A. (2017). VISION: a video and image
dataset for source identification. Eurasip Journal on Information Security, 2017(1). ↩
64. Rössler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., & Nießner, M. (2018). FaceForensics: A
Large-scale Video Dataset for Forgery Detection in Human Faces. Retrieved from
http://arxiv.org/abs/1803.09179 ↩

41
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

65. Afchar, D., Nozick, V., Yamagishi, J., & Echizen, I. (2019). MesoNet: A compact facial video forgery
detection network. 10th IEEE International Workshop on Information Forensics and Security, WIFS 2018.
Retrieved from https://arxiv.org/abs/1809.00888 ↩
66. Nataraj, L., Mohammed, T. M., Manjunath, B. S., Chandrasekaran, S., Flenner, A., Bappy, J. H., & Roy-
Chowdhury, A. K. (2019). Detecting GAN generated Fake Images using Co-occurrence Matrices. IS and T
International Symposium on Electronic Imaging Science and Technology, 2019(5), 1–7. Retrieved from
https://arxiv.org/abs/1903.06836 ↩
67. Guera, D., & Delp, E. J. (2019). Deepfake Video Detection Using Recurrent Neural Networks.
Proceedings of AVSS 2018 - 2018 15th IEEE International Conference on Advanced Video and Signal-Based
Surveillance. ↩
68. Nguyen, H. H., Yamagishi, J., & Echizen, I. (2019). Capsule-forensics: Using Capsule Networks to
Detect Forged Images and Videos. ICASSP, IEEE International Conference on Acoustics, Speech and Signal
Processing - Proceedings, 2019-May, 2307–2311. Retrieved from https://arxiv.org/abs/1810.11215 ↩
69. Jeon, H., Bang, Y., & Woo, S. S. (2020). FDFtNet: Facing Off Fake Images Using Fake Detection Fine-
Tuning Network. IFIP Advances in Information and Communication Technology, 580 IFIP, 416–430.
Retrieved from https://arxiv.org/abs/2001.01265 ↩
70. Fung, S., Lu, X., Zhang, C., & Li, C. T. (2021). DeepfakeUCL: Deepfake Detection via Unsupervised
Contrastive Learning. Proceedings of the International Joint Conference on Neural Networks, 2021-July.
Retrieved from https://arxiv.org/abs/2104.11507 ↩
71. Li, Y., & Lyu, S. (2018). Exposing DeepFake Videos By Detecting Face Warping Artifacts. Retrieved
from http://arxiv.org/abs/1811.00656 ↩
72. Yu, N., Davis, L., & Fritz, M. (2019). Attributing fake images to GANs: Learning and analyzing GAN
fingerprints. Proceedings of the IEEE International Conference on Computer Vision, 2019-Octob, 7555–
7565. Retrieved from https://arxiv.org/abs/1811.08180 ↩
73. Chai, L., Bau, D., Lim, S. N., & Isola, P. (2020). What Makes Fake Images Detectable? Understanding
Properties that Generalize. 12371 LNCS, 103–120. Retrieved from https://arxiv.org/abs/2008.10588 ↩
74. Wang, S.-Y., Wang, O., Zhang, R., Owens, A., & Efros, A. A. (2019). CNN-generated images are
surprisingly easy to spot... for now. CoRR, abs/1912.11035. Retrieved from http://arxiv.org/abs/1912.11035 ↩
75. Chollet, F. (2016). Xception: Deep Learning with Depthwise Separable Convolutions. CoRR,
abs/1610.02357. Retrieved from http://arxiv.org/abs/1610.02357 ↩

42
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

76. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings
of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-Decem, 770–
778. Retrieved from https://arxiv.org/abs/1512.03385 ↩
77. Exploiting visual artifacts to expose deepfakes and face manipulations. (2019). Proceedings - 2019 IEEE
Winter Conference on Applications of Computer Vision Workshops, WACVW 2019, (1), 83–92. ↩
78. Yang, X., Li, Y., & Lyu, S. (2018). Exposing deep fakes using inconsistent head poses. CoRR,
abs/1811.00661. Retrieved from http://arxiv.org/abs/1811.00661 ↩
79. Bharati, A., Singh, R., Vatsa, M., & Bowyer, K. W. (2016). Detecting facial retouching using supervised
deep learning. IEEE Transactions on Information Forensics and Security, 11(9), 1903–1913. ↩
80. Khan, S. A. (n.d.). Hybrid Transformer Network for Deepfake Detection (Vol. 1). Association for
Computing Machinery. ↩
81. Xuan, X., Peng, B., Wang, W., & Dong, J. (2019). On the Generalization of GAN Image Forensics.
Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture
Notes in Bioinformatics), 11818 LNCS(61502496), 134–141. Retrieved from
https://arxiv.org/abs/1902.11153 ↩
82. Sabir, E., Cheng, J., Jaiswal, A., AbdAlmageed, W., Masi, I., & Natarajan, P. (2019). Recurrent
Convolutional Strategies for Face Manipulation Detection in Videos. 80–87. Retrieved from
http://arxiv.org/abs/1905.00582 ↩
83. Jeon, H., Bang, Y., Kim, J., & Woo, S. S. (2020). T-GD: Transferable GAN-generated Images Detection
Framework. 37th International Conference on Machine Learning, ICML 2020, PartF16814, 4696–4711.
Retrieved from https://arxiv.org/abs/2008.04115 ↩
84. Guarnera, L., Giudice, O., & Battiato, S. (2020). DeepFake detection by analyzing convolutional traces.
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2020-June,
2841–2850. Retrieved from https://arxiv.org/abs/2004.10448 ↩
85. Moon, T. K. (1996). The expectation-maximization algorithm. IEEE Signal Processing Magazine, 13(6),
47–60. ↩
86. Nirkin, Y., Wolf, L., Keller, Y., & Hassner, T. (2020). DeepFake Detection Based on the Discrepancy
Between the Face and its Context. 1–10. Retrieved from http://arxiv.org/abs/2008.12262 ↩
87. McCloskey, S., & Albright, M. (2019). Detecting GAN-Generated Imagery Using Saturation Cues.
Proceedings - International Conference on Image Processing, ICIP, 2019-Septe, 4584–4588. ↩

43
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

88. Marra, F., Gragnaniello, D., Verdoliva, L., & Poggi, G. (2019). Do GANs Leave Artificial Fingerprints?
Proceedings - 2nd International Conference on Multimedia Information Processing and Retrieval, MIPR
2019, 506–511. Retrieved from https://arxiv.org/abs/1812.11842 ↩
89. Hulzebosch, N., Ibrahimi, S., & Worring, M. (2020). Detecting CNN-generated facial images in real-
world scenarios. IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Workshops, 2020-June, 2729–2738. Retrieved from https://arxiv.org/abs/2005.05632 ↩
90. Cozzolino, D., Thies, J., Rössler, A., Riess, C., Nießner, M., & Verdoliva, L. (2018). ForensicTransfer:
Weakly-supervised Domain Adaptation for Forgery Detection. CoRR, abs/1812.02510. Retrieved from
http://arxiv.org/abs/1812.02510 ↩
91. Flynn, P. J., Bowyer, K. W., & Phillips, P. J. (2003). Assessment of time dependency in face recognition:
An initial study. In J. Kittler & M. S. Nixon (Eds.), Audio- and video-based biometric person authentication
(pp. 44–51). Berlin, Heidelberg: Springer Berlin Heidelberg. ↩
92. Zhang, X., Karaman, S., & Chang, S. F. (2019). Detecting and Simulating Artifacts in GAN Fake
Images. 2019 IEEE International Workshop on Information Forensics and Security, WIFS 2019. Retrieved
from https://arxiv.org/abs/1907.06515 ↩
93. Haliassos, A., Vougioukas, K., Petridis, S., & Pantic, M. (2021). Lips Don’t Lie: A Generalisable and
Robust Approach to Face Forgery Detection. Proceedings of the IEEE Computer Society Conference on
Computer Vision and Pattern Recognition, 5037–5047. Retrieved from https://arxiv.org/abs/2012.07657 ↩
94. Tariq, S., Lee, S., & Woo, S. (2021). One detector to rule them all: Towards a general deepfake attack
detection framework. The Web Conference 2021 - Proceedings of the World Wide Web Conference, WWW
2021, 3625–3637. Retrieved from https://arxiv.org/abs/2105.00187 ↩
95. Agarwal, S., Farid, H., El-Gaaly, T., & Lim, S. N. (2020). Detecting Deep-Fake Videos from Appearance
and Behavior. 2020 IEEE International Workshop on Information Forensics and Security, WIFS 2020.
Retrieved from https://arxiv.org/abs/2004.14491 ↩
96. Agarwal, S., Farid, H., Gu, Y., He, M., Nagano, K., & Li, H. (2019). Protecting world leaders against
deep fakes. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Workshops. ↩
97. Mittal, T., Bhattacharya, U., Chandra, R., Bera, A., & Manocha, D. (2020). Emotions Don’t Lie: An
Audio-Visual Deepfake Detection Method using Affective Cues. MM 2020 - Proceedings of the 28th ACM
International Conference on Multimedia, 2823–2832. Retrieved from https://arxiv.org/abs/2003.06711 ↩

44
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

98. Chugh, K., Gupta, P., Dhall, A., & Subramanian, R. (2020). Not made for each other- Audio-Visual
Dissonance-based Deepfake Detection and Localization. MM 2020 - Proceedings of the 28th ACM
International Conference on Multimedia, 439–447. Retrieved from https://arxiv.org/abs/2005.14405 ↩
99. Baltrusaitis, T., Zadeh, A., Lim, Y. C., & Morency, L.-P. (2018). OpenFace 2.0: Facial Behavior Analysis
Toolkit. 2018 13th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2018),
59–66. ↩
100. Qi, H., Guo, Q., Juefei-Xu, F., Xie, X., Ma, L., Feng, W., … Zhao, J. (2020). DeepRhythm: Exposing
DeepFakes with Attentional Visual Heartbeat Rhythms. MM 2020 - Proceedings of the 28th ACM
International Conference on Multimedia, 1318–1327. Retrieved from https://arxiv.org/abs/2006.07634 ↩
101. Li, Y., Chang, M. C., & Lyu, S. (2019). In Ictu Oculi: Exposing AI created fake videos by detecting eye
blinking. 10th IEEE International Workshop on Information Forensics and Security, WIFS 2018, (1), 1–7. ↩
102. Agarwal, S., Farid, H., Fried, O., & Agrawala, M. (2020). Detecting deep-fake videos from phoneme-
viseme mismatches. IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Workshops, 2020-June, 2814–2822. ↩
103. Demir, I., & Ciftci, U. A. (2021). Where Do Deep Fakes Look? Synthetic Face Detection via Gaze
Tracking. Eye Tracking Research and Applications Symposium (ETRA), PartF169256. Retrieved from
https://arxiv.org/abs/2101.01165 ↩
104. Jung, T., Kim, S., & Kim, K. (2020). DeepVision: Deepfakes Detection Using Human Eye Blinking
Pattern. IEEE Access, 8, 83144–83154. ↩
105. Two-Stream Neural Networks for Tampered Face Detection. (2017). IEEE Computer Society
Conference on Computer Vision and Pattern Recognition Workshops, 2017-July, 1831–1839. ↩
106. Nguyen, H. H., Fang, F., Yamagishi, J., & Echizen, I. (2019). Multi-task Learning for Detecting and
Segmenting Manipulated Facial Images and Videos. 2019 IEEE 10th International Conference on
Biometrics Theory, Applications and Systems, BTAS 2019. Retrieved from https://arxiv.org/abs/1906.06876 ↩
107. Neves, J. C., Tolosana, R., Vera-Rodriguez, R., Lopes, V., Proença, H., & Fierrez, J. (2020).
GANprintR: Improved Fakes and Evaluation of the State of the Art in Face Manipulation Detection. IEEE
Journal on Selected Topics in Signal Processing, 14(5), 1038–1048. Retrieved from
https://arxiv.org/abs/1911.05351 ↩
108. Amerini, I., Conti, M., Giacomazzi, P., & Pajola, L. (2022). PRaNA: PRNU-based technique to tell real
and deepfake videos apart. 2022 International Joint Conference on Neural Networks (IJCNN), 1–7. ↩

45
RS Open Journal on Innovative Communication Technologies • Issue 8 Media Forensics and Deepfake - Systematic Survey

109. Li, H., Li, B., Tan, S., & Huang, J. (2020). Identification of deep network generated images using
disparities in color components. Signal Processing, 174. Retrieved from https://arxiv.org/abs/1808.07276 ↩

110. Lugstein, F., Baier, S., Bachinger, G., & Uhl, A. (2021). PRNU-based Deepfake Detection (Vol. 1, pp.
7–12). Association for Computing Machinery. ↩

111. Hsu, C.-C., Zhuang, Y.-X., & Lee, C.-Y. (2020). Deep fake image detection based on pairwise learning.
Applied Sciences, 10(1). Retrieved from https://www.mdpi.com/2076-3417/10/1/370 ↩

112. Gandhi, A., & Jain, S. (2020). Adversarial Perturbations Fool Deepfake Detectors. Proceedings of the
International Joint Conference on Neural Networks. Retrieved from https://arxiv.org/abs/2003.10596 ↩

113. Wu, X., Xie, Z., Gao, Y., & Xiao, Y. (2020). SSTNet: Detecting manipulated faces through spatial,
steganalysis and temporal features. ICASSP 2020 - 2020 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), 2952–2956. ↩

114. Liu, Z., Qi, X., Jia, J., & Torr, P. H. S. (2020). Global texture enhancement for fake face detection in the
wild. CoRR, abs/2002.00133. Retrieved from https://arxiv.org/abs/2002.00133 ↩

115. Khalid, H., & Woo, S. S. (2020). OC-FakeDect: Classifying deepfakes using one-class variational
autoencoder. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops,
2020-June, 2794–2803. ↩

116. Zhang, Y., Zheng, L., & Thing, V. L. L. (2017). Automated face swapping and its detection. 2017 IEEE
2nd International Conference on Signal and Image Processing, ICSIP 2017, 2017-Janua, 15–19. ↩

117. Fernandes, S., Raj, S., Ortiz, E., Vintila, I., Salter, M., Urosevic, G., & Jha, S. (2019). Predicting heart
rate variations of deepfake videos using neural ODE. Proceedings - 2019 International Conference on
Computer Vision Workshop, ICCVW 2019, 1721–1729. ↩

118. Fernandes, S., Raj, S., Ewetz, R., Pannu, J. S., Kumar Jha, S., Ortiz, E., … Salter, M. (2020). Detecting
deepfake videos using attribution-based confidence metric. IEEE Computer Society Conference on
Computer Vision and Pattern Recognition Workshops, 2020-June, 1250–1259. ↩

119. Hu, S., Li, Y., & Lyu, S. (2021). Exposing GaN-generated faces using inconsistent corneal specular
highlights. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing -
Proceedings, 2021-June, 2500–2504. Retrieved from https://arxiv.org/abs/2009.11924 ↩

46

You might also like