
IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 20, NO. 4, FOURTH QUARTER 2018

Deep Learning for Intelligent Wireless Networks: A Comprehensive Survey

Qian Mao, Student Member, IEEE, Fei Hu, Member, IEEE, and Qi Hao, Member, IEEE

Abstract—As a promising machine learning tool to handle accurate pattern recognition from complex raw data, deep learning (DL) is becoming a powerful method to add intelligence to wireless networks with large-scale topology and complex radio conditions. DL uses many neural network layers to achieve a brain-like acute feature extraction from high-dimensional raw data. It can be used to find the network dynamics (such as hotspots, interference distribution, congestion points, traffic bottlenecks, spectrum availability, etc.) based on the analysis of a large amount of network parameters (such as delay, loss rate, link signal-to-noise ratio, etc.). Therefore, DL can analyze extremely complex wireless networks with many nodes and dynamic link quality. This paper performs a comprehensive survey of the applications of DL algorithms for different network layers, including physical layer modulation/coding, data link layer access control/resource allocation, and routing layer path search and traffic balancing. The use of DL to enhance other network functions, such as network security, sensing data compression, etc., is also discussed. Moreover, the challenging unsolved research issues in this field are discussed in detail, which represent the future research trends of DL-based wireless networks. This paper can help readers to deeply understand the state of the art of DL-based wireless network designs, and to select interesting unsolved issues to pursue in their research.

Index Terms—Wireless networks, deep learning (DL), deep reinforcement learning (DRL), protocol layers, performance optimization.

Manuscript received January 20, 2018; revised April 20, 2018; accepted June 5, 2018. Date of publication June 12, 2018; date of current version November 19, 2018. This work was supported in part by the Science and Technology Innovation Commission of Shenzhen under Grant CKFW2016041415372174, and in part by the National Natural Science Foundation of China under Grant 61773197. (Corresponding author: Qi Hao.)
Q. Mao and F. Hu are with the Department of Electrical and Computer Engineering, University of Alabama, Tuscaloosa, AL 35401 USA (e-mail: [email protected]; [email protected]).
Q. Hao is with the Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China.
Digital Object Identifier 10.1109/COMST.2018.2846401
1553-877X (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

Human brains possess powerful data processing capabilities. Every day we confront numerous data from the external world. Under a complex environment, a large number of object features are first collected by our sense organs. Then the brain extracts the abstract characteristics from those feature data and finally makes a decision. In many fields computers have already shown comparable or even more powerful capabilities than human beings, such as game playing, automatic control, and voice and image recognition. The approach for the computer to achieve these abilities is very similar to what the human brain does, and it has been developed into an eye-catching technology, i.e., Deep Learning (DL) [1]. In the DL process, computers first need to learn from experiences and build up a training model. This training process allows computers to determine appropriate weight values between neural nodes, which are able to extract the features from the input data. Once the neural network has been trained, an appropriate decision can be made to achieve a high reward. This idea has shown great success in many real-world control scenarios, such as voice recognition [2], [3], image recognition [4]–[7], semantic analysis [8], [9], language interpretation [10], [11], game control [12], drug discovery [13], and biomedical sciences [14]–[16].

DL is a subclass of machine learning which uses cascaded layers to extract features from the input data and eventually forms a decision. The application of DL should consider four aspects: (1) how to represent the state of the environment in suitable numerical formats, which will be taken as the input layer of the DL network; (2) how to represent/interpret the recognition results, i.e., the physical meaning of the output layer of the DL network; (3) how to compute/update the reward value, and what is the proper reward function that can guide the iterative weight updating in each neural layer; (4) the structure of the DL system, including how many hidden layers to use, the structure of each layer, and the connections between layers.

Currently, many DL systems are tied with Reinforcement Learning (RL) models [17], which comprise three parts: 1) an environment which can be described by some features; 2) an agent which takes actions to change the environment; and 3) an interpreter which announces the current state and the action the agent takes. Meanwhile, the interpreter announces the reward after the action takes effect in the environment, as shown in Fig. 1. The goal of RL is to train the agent in such a way that for a given environment state, it chooses the optimal action that yields the highest reward. Therefore, one of the main differences between DL and RL is that the former usually learns from examples (e.g., training data) to create a model that classifies data, whereas the latter trains the model by maximizing the reward associated with different actions.

DL has already shown astonishing capabilities in dealing with many real-world scenarios, such as the success of AlphaGo, face recognition on mobile phones, etc. Researchers in the computer network area have also cast strong interest in DL applications. By using a DL model, the complex network environment can be represented, abstract features can be obtained, and a better decision can finally be achieved for the computer

Authorized licensed use limited to: BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE. Downloaded on November 01,2020 at 07:47:46 UTC from IEEE Xplore. Restrictions apply.

Fig. 1. Schematic Diagram of Reinforcement Learning.

network nodes to achieve improved network quality-of-service (QoS) and quality-of-experience (QoE).

Wireless networks yield complex features, such as communication signal characteristics, channel quality, queueing state of each node, path congestion situation, etc. On the other hand, there are many network control targets having significant impacts on the communication performance, such as resource allocation, queue management, congestion control, etc. To handle the complicated situations, machine learning techniques have been extensively explored [18]. Chen et al. [19] presented a comprehensive summary of the ML applications in wireless networks, including wireless communications and networking using Unmanned Aerial Vehicles (UAVs), wireless virtual reality, mobile edge caching and computing, spectrum management and co-existence of multiple radio access, the Internet of Things, etc. The applications of ML in these areas present astonishing improvement compared to traditional methods.

On the other hand, since modern wireless networks are becoming more and more complex, more demands are placed on the learning system, such as higher computing capacity, bigger datasets, faster and more intelligent learning algorithms, more flexible input mechanisms [19], etc. To meet these demands, deep learning applications in wireless networks have drawn lots of interest. DL equips the wireless network with a 'human brain': it accepts a large number of network performance parameters, such as link signal-to-noise ratios (SNRs), channel holding time, link access success/collision rates, routing delay, packet loss rate, bit error rate, etc., and performs deep analysis on the intrinsic patterns (such as congestion degree, interference alignment effect, hotspot distributions, etc.). Such patterns can be used to perform the protocol controls in different protocol layers. For example, the routing layer may start to look for a new alternate path; the transport layer can shrink the congestion window size; and so on. Compared to traditional machine learning techniques, DL provides more promising improvement in wireless network applications: 1) Higher prediction accuracy. Wireless networks have highly complicated features, such as node mobility, channel variation, channel interference, etc. Machine learning methods cannot analyze these features deeply enough because of the lack of deep neural layers. However, the in-depth patterns hidden in the input parameters can be abstracted layer by layer through deep learning algorithms, thereby providing higher prediction accuracy. 2) No need to pre-process input data. The prediction accuracy of ML depends much on the data pre-processing. However, the input of DL is usually feature parameters directly collected from the network. Considering the significant diversity of wireless network parameters, this advantage of DL decreases the design complexity and increases the prediction accuracy.

The success of applying DL for wireless networking is due to the following three similarities between DL and the human brain:

(1) Tolerance of incomplete or even erroneous raw input data: The human brain can tolerate distorted samples. For example, we can still recognize the image shape of '1' even when some sections of '1' are missing, and we can recognize people from an obscure face image even when some pixels are missing. Likewise, DL uses deep neural networks to tolerate missing or distorted input data. This capability is important to wireless networks since it is not possible to accurately collect all the radio links' states due to channel fading, node mobility, and control channel failure.

(2) Capability of handling a large amount of input information: The human brain can simultaneously absorb multiple types of complex information and make a good judgement. For example, we can use sound, image, and smell to detect the coming of a dog. Likewise, DL can simultaneously accept a huge amount of performance data from multiple protocol layers (such as 1000 nodes' queueing status data and a link interference matrix), and then determine the concrete congestion place in a large network. DL will play a critical role in big-data wireless transmissions due to its capability of analyzing the performance parameters of huge traffic flows.

(3) Capability of making control decisions: Our brains learn things and guide our behaviors. Passive learning may not be the final goal of network analysis. Using the learning results to guide the proper network control is the ultimate goal. With the Markov decision model, DL is able to evolve into the deep reinforcement learning (DRL) model, which can use the updates of system states, the reward function, and policy seeking to make a suitable network control decision based on the maximum reward calculation. Thus we can use DRL to achieve large-scale wireless network control.

In this article, a comprehensive survey of the applications of DL in wireless networks is presented. Fig. 2 shows the taxonomy we have used when reviewing various uses of DL for each network aspect. Our contributions in this review consist of three important aspects:

(1) DL applications in different layers: We will systematically analyze the benefits of adopting DL/DRL for network feature extraction in different layers. As shown in Fig. 2, in the physical layer, DL can be used for interference alignment. It can also be used to classify modulation modes, design efficient error correction codes, etc. In the data link layer, DL can be used for resource (such as channel) allocation, link quality evaluation, and so on. In the network (routing) layer, it can help to seek an optimal routing path. In higher layers (such as the application layer), it is used to enhance data compression and multi-session scheduling. We will provide the core design ideas for each DL application, and compare different solutions.
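The reward-driven interaction loop described above (the environment exposes a state, the agent takes an action, and the interpreter announces the resulting reward, as in Fig. 1) can be made concrete with a deliberately small sketch. The channel names and per-channel success probabilities below are hypothetical stand-ins for real link measurements, and a simple running-mean estimate replaces the deep network; this illustrates the RL interaction pattern only, not any specific scheme surveyed here.

```python
import random

random.seed(7)

# Hypothetical per-channel packet-success probabilities (illustration only;
# a real agent would observe these rewards from a live network).
SUCCESS_PROB = {"ch1": 0.3, "ch2": 0.8, "ch3": 0.5}

def environment(action):
    """Interpreter role from Fig. 1: announce the reward after the action takes effect."""
    return 1.0 if random.random() < SUCCESS_PROB[action] else 0.0

value = {ch: 0.0 for ch in SUCCESS_PROB}   # estimated reward of each action
count = {ch: 0 for ch in SUCCESS_PROB}

for t in range(3000):
    if random.random() < 0.1:                       # occasionally explore
        action = random.choice(list(SUCCESS_PROB))
    else:                                           # otherwise exploit the best estimate
        action = max(value, key=value.get)
    reward = environment(action)                    # interpreter announces the reward
    count[action] += 1
    value[action] += (reward - value[action]) / count[action]  # running-mean update

best_channel = max(value, key=value.get)
```

After enough interactions, the agent should settle on the channel with the highest expected reward, which is exactly the "choose the action that yields the highest reward" behavior that RL training aims for.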


Fig. 2. Taxonomy of Deep Learning Applications in Wireless Networks.

(2) DL advantages in security and other network functions: Besides the above protocol stack, we will also discuss the advantages of using DL in other network functions. One critical area is security and privacy protection. Today, intrusion detection becomes more challenging due to the increase of network scale and the huge amount of traffic passing through the attack detectors/filters. DL is an ideal tool to perform large-scale network profile analysis to detect potential intrusion events. We will explain how DL can be used to classify packets into benign/malicious types, and how it can be integrated with other machine learning schemes, such as unsupervised clustering, to achieve a better anomaly detection effect.

(3) Future trends: Since this field is still far from maturity and many issues are not solved yet, we will introduce 10 challenging problems on the use of DL to enhance some of the popular wireless networks, such as cognitive radio networks (CRNs), software-defined networks (SDNs), dew/fog computing, etc. We will provide the context, motivation, problem statement, and concrete unsolved issues for each of those 10 problems. They are helpful to readers who are seeking new research directions.

Roadmap: The rest of this paper is organized as follows. In Section II, to prepare for the discussions of DL applications for wireless network functions, we first explain the fundamental math models of DL, including its relations with general machine learning and the graph-based learning framework. Then we move to the discussions of DL-based physical layer enhancements in terms of signal interference and modulation classification in Section III. Section IV discusses the importance of DL in data link layer design. Some typical MAC design examples are explained with DL-based enhancements. In Section V, the DL-based routing layer operations such as path establishment and optimization are described. The utilizations of DL for security and other network functions are discussed in Section VI. Section VII summarizes some DL implementation platforms that have been extensively used in wireless network research. Ten challenging research issues to be solved next are stated in Section VIII, followed by the concluding remarks in Section IX.

II. FUNDAMENTALS OF DEEP LEARNING

DL originated from Machine Learning (ML). In this section, we first analyze the differences and relationship between those two techniques. Then, a brief introduction to DL principles is presented.

A. From Machine Learning to Deep Learning

Both ML and DL solve real-world problems with neural networks. A typical ML system is composed of three parts: 1) the input layer, which takes pre-processed data as the system input. The features of the real-world data (e.g., pixel values, shape, texture, etc.) need to be pre-processed and identified by humans so that the ML system can deal with them. 2) The feature extraction and processing layer, in which a single layer of data processing is used to extract the data patterns. Currently, Support Vector Machines (SVM), Principal Component Analysis (PCA), Hidden Markov Models (HMM), etc., are extensively used for feature extraction. 3) The output layer, which produces the results of classification, regression, clustering, density estimation, or dimensionality reduction, depending on the task of the ML model. The schematic structure of ML is shown in Fig. 3(a).

The original data input into the learning system could be quite diverse, varying from natural information such as image, audio, and video, to various quantitative event descriptions. Although the input of the learning system may be different, the core data learning module requires that the input data has a uniform form, based on which the input events are classified. Therefore, to enable the learning process to "recognize" the input data, the original natural data needs to be pre-processed, i.e., the raw data needs to be transformed into a suitable representation or feature vector, which can be accepted by the ML classification system. This pre-processing needs to be carefully designed in such a way that the features of the original natural data that are relevant to classification are well preserved, and the classification accuracy is significantly affected by the data pre-processing schemes.

Machine learning systems usually have only one hidden layer between the input and output layers. This type of learning


Fig. 3. Schematics of Machine Learning and Deep Learning. (a) Machine Learning; (b) Deep Learning.

system is also referred to as a shallow learning network, which provides an arbitrary function approximator with enough hidden units in one hidden layer and learns more-or-less independent features from the input layer. For example, Chen et al. [20] proposed a radio map learning system based on the shallow learning network, which uses a machine learning method to exploit the segmentation models and signal strength models of UAV-assisted wireless networks and reconstructs a finely structured radio map to improve the service coverage. On the contrary, most deep learning systems have more than one hidden layer between the input and output layers, where the input of an upper layer is the output of its lower layer, such as the learning networks proposed in [21]–[24]. The DL technique avoids the sophisticated input data pre-processing by employing multiple hidden layers between the input and the output layers, as shown in Fig. 3(b). The natural data is input into the learning system in its raw form. The DL system then automatically extracts appropriate representations for classification or detection purposes. Starting with the natural data, each layer extracts different features from the input data, gradually amplifying features that are more relevant to decision making and suppressing irrelevant features. Each layer is connected to neighboring layers with different weights attached to the connections. To determine the values of the weights, a large number of samples are sent to the system for training purposes, which could be either supervised learning or unsupervised learning. In supervised learning, a gradient vector is computed for each weight, indicating the amount of error change with the variation of that weight. According to the gradient vector, the weight is adjusted to decrease the error.

B. Deep Learning Framework

Human beings spontaneously interact with the environment by using a combination of reinforcement learning and a hierarchical sensory processing system to accomplish many tasks, such as object recognition [25], conditioning and choice-making [26], etc. Inspired by animal behavior, deep reinforcement learning was proposed and has drawn much attention in computer intelligence. A DL model includes two crucial elements: forward feature abstraction and backward error feedback. The training process usually needs both elements, while the verification process solely implements the former.

Forward feature abstraction: Assume there are N layers in the DL network, as shown in Fig. 4. For the j-th node in layer i, denoted as n_{ij}, the output is obtained through two steps. First, node n_{ij} computes a weighted sum of all its inputs, denoted as z_{ij}. Then z_{ij} is sent to a non-linear function f(·) to obtain the output y_{ij} of node n_{ij}:

z_{ij} = Σ_{k=1}^{L_{i−1}} w_{kj}^{i} y_{i−1,k},  y_{ij} = f(z_{ij}),   (1)

where w_{kj}^{i} is the weight from node n_{i−1,k} to node n_{ij} and L_{i−1} is the number of nodes in layer i − 1. For the choice of the non-linear function f(·), the rectified linear unit (ReLU) f(z) = max(0, z), the hyperbolic tangent function f(z) = [exp(z) − exp(−z)]/[exp(z) + exp(−z)], and the logistic function f(z) = 1/[1 + exp(−z)] are the popular options [7].

Backward error feedback: The initial weights are random or empirical values. To improve the accuracy of the final output of the learning system, these weight values are adjusted by the backward error feedback technique, i.e., the classification accuracy is fed back, according to which the connection weights are modified. For a node of the deepest layer, say, node n_{Nj}, the error derivative is y_{Nj} − t_{Nj}, where y_{Nj} and t_{Nj} are the generated output and the correct output, respectively. Then the error derivative of the lower-layer connection is

∂E/∂z_{Nj} = (∂E/∂y_{Nj}) (∂y_{Nj}/∂z_{Nj}),   (2)

where ∂E/∂y_{Nj} = y_{Nj} − t_{Nj} and j = 1, 2, ..., L_N.

For the j-th node of layer i (i = 1, 2, ..., N − 1), first a weighted sum of the error derivatives of all the inputs (from the deeper layer) to the node is computed, denoted as ∂E/∂y_{ij}. Then the error derivative of the lower-layer connection is

∂E/∂z_{ij} = (∂E/∂y_{ij}) (∂y_{ij}/∂z_{ij}),   (3)

where ∂E/∂y_{ij} = Σ_{k=1}^{L_{i+1}} w_{jk}^{i+1} ∂E/∂z_{i+1,k}, i = 1, 2, ..., N − 1, and j = 1, 2, ..., L_i.


Fig. 4. Deep Learning Operations. (a) Forward feature abstraction; (b) Backward error feedback.

In practice, stochastic gradient descent (SGD) is extensively used [7], since SGD finds a good set of weights with relatively fast speed. The training process of SGD is composed of many rounds, each of which is trained with a small set of samples, and the final gradient is the average over the rounds.

Traditionally, reinforcement learning is limited to circumstances in which some useful environment features can be handcrafted, or in which the environment's state space is low-dimensional and can be fully observed. Furthermore, reinforcement learning tends to be unstable or even to diverge when a nonlinear function approximator is used to represent the reward. To solve these problems, the deep Q-network (DQN) was proposed, which employs two novel strategies to overcome the instability problem of deep learning, i.e., experience replay and iterative update [27], [28]. In DQN, the agent interacts with the environment through a sequence of actions, with the goal of maximizing the cumulative reward. Assuming the environment's state is s at moment t and the agent takes action a according to policy π, the environment then transfers to the next state at moment t + 1 according to the environment's transition probability P. Meanwhile, a reward r_t is given at moment t. The goal of the agent is to maximize the cumulative reward Q_t, i.e., [28],

max_π Q_t = max_π E[ r_t + γ r_{t+1} + γ² r_{t+2} + ··· | s_t = s, a_t = a, π ],   (4)

where γ is a future reward discount, since the current action a_t impacts not only the current reward, but also the future reward with a diminishing strength. In [27], the reward is represented as Q_t(s, a, θ_i), where θ_i is the weight of the Q-network at iteration i. To replay the experience, for each moment t, the agent's experience, e_t = (s_t, a_t, r_t, s_{t+1}), is stored in the experience pool U(D). Each time the agent needs to adopt an action during the learning process, a sample of the stored experience is randomly chosen by the agent. Thus,

L_i(θ_i) = E_{(s,a,r,s′)∼U(D)} [ ( r + γ max_{a′} Q(s′, a′, θ_i^−) − Q(s, a, θ_i) )² ],   (5)

where θ_i^− is the weight of the Q-network used to compute the target at iteration i, which is only updated with the weight θ_i every c steps and is fixed between individual updates (c is a constant). The parameterization of the reward Q for each action is achieved by a neural network, where each possible action is provided a separate output unit. Therefore, only the state representation serves as the input to the network, and the network yields a predicted Q value for each possible action.

A classical application of DQN is video games (Atari 2600) [27], as shown in Fig. 5. For instant t, the state is the screenshot of the game, denoted as x_t. (Note that the internal state of the game is not accessible by the observer. Instead, only the screen can be observed.) The action is the operation that the Atari emulator takes, denoted as a_t, and the reward is the game score gained due to the action, denoted as r_t. To learn the strategy with full observation, the state input to the DQN at instant t is a finite Markov Decision Process (MDP), indicating both the current observation and the previous observations and actions, as shown in Table I. Assuming the game terminates at instant T, the future discounted reward at instant t is R_t = Σ_{i=t}^{T} γ^{i−t} r_i. Apparently, the goal of the agent at instant t is to take an action that maximizes the future rewards given the current observation representation and policy π, i.e., max_π E(R_t | s_t = s, a_t = a, π), which can be represented as the following equation:

Q*(s, a) = E_{s′}[ r + γ max_{a′} Q*(s′, a′) | s, a ],   (6)

where s′ is the observation of the next instance and a′ is the action taken at that instance. In practice, an approximator is used to estimate the action reward, i.e., using Q(s, a, θ_i^−)

Fig. 5. Schematic of Deep Q-Learning for Video Game.

TABLE I
PARAMETERS OF DEEP LEARNING SCHEME FOR ANTI-JAMMING
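The DQN training loop described above stores each experience e_t in a replay pool, samples a stored experience, forms the target y = r + γ max_{a′} Q(s′, a′, θ⁻) of Eq. (7), and syncs θ⁻ with θ every c steps. The following deliberately tiny sketch makes that loop concrete: a lookup table over a made-up two-state environment stands in for both the Q-network and the Atari emulator, so all names and numbers here are assumptions for illustration, not the setup of [27].

```python
import random

random.seed(1)
GAMMA, ALPHA, SYNC_EVERY = 0.9, 0.1, 20

# Toy stand-in for the emulator: 2 states, 2 actions.
# step(s, a) -> (next_state, reward); taking action 1 in state 0 pays best long-run.
def step(s, a):
    if s == 0:
        return (1, 1.0) if a == 1 else (0, 0.0)
    return (0, 0.5) if a == 0 else (1, 0.0)

Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}  # online "network" (theta)
Q_target = dict(Q)                                 # target "network" (theta minus)
replay = []                                        # experience pool U(D)

s = 0
for t in range(2000):
    # epsilon-greedy action selection
    if random.random() < 0.2:
        a = random.choice((0, 1))
    else:
        a = max((0, 1), key=lambda x: Q[(s, x)])
    s_next, r = step(s, a)
    replay.append((s, a, r, s_next))               # store e_t = (s_t, a_t, r_t, s_{t+1})

    # sample a stored experience and apply the Eq. (7) target:
    # y = r + gamma * max_a' Q(s', a'; theta minus)
    e = random.choice(replay)
    y = e[2] + GAMMA * max(Q_target[(e[3], 0)], Q_target[(e[3], 1)])
    Q[(e[0], e[1])] += ALPHA * (y - Q[(e[0], e[1])])  # shrink the Eq. (5) squared error

    if t % SYNC_EVERY == 0:
        Q_target = dict(Q)                         # theta minus <- theta every c steps
    s = s_next
```

After enough iterations, the table should order the actions by their long-run discounted reward, so the greedy policy in state 0 prefers action 1; in a real DQN, the tabular update is replaced by a gradient step on the Eq. (5) loss over the network weights.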

to replace Q*(s, a), where θ_i^− is the weight of the neural network in an iteration before i (for instance, θ_i^− = θ_{i−1}). Thus, an approximate estimated reward value is

y = r + γ max_{a′} Q(s′, a′, θ_i^−),   (7)

For each round of the Q-network training, say round i, the training is implemented by adjusting the weights with the aim of reducing the mean square error of (7).

C. Deep Learning for Graph-Structured Data

In many practical applications the data often has structured features, i.e., nodes are connected with each other spatially, temporally, or both. For instance, when predicting the behavior of a person in the kitchen, the interactions between the person and the appliances are connected spatially or temporally. Under such a circumstance, considering the spatio-temporal relations among nodes in the DL framework, we can use the graph-based structure to achieve promising performance [29].

Currently, graph-structured data is usually generalized with Convolutional Neural Networks (CNNs). A CNN is a sequence of layers, each of which transforms one volume of activations to another through a differentiable function. Usually, there are three main types of layers in a CNN architecture, i.e., the convolutional layer, the pooling layer, and the fully connected layer. The main difference between a graph-oriented CNN and a regular CNN is that the former builds graphs for each neural node of the learning system, which is achieved by selecting neighbors and determining the connection weights with the neighbors for each real-world node, as shown in Fig. 6(a) and (b). To represent graphs in DL models, the input data is denoted by vertices and edges, i.e., G = (V, E, A), where V represents the set of vertices, E represents the set of edges, and A is the weighted adjacency matrix. Then the graph is input into the learning system monolithically.

Graph DL can be conducted in either the spectral domain or the spatial domain [30]. The spatial approach generalizes the CNN using the graph's spatial structure, capturing the essence of the convolution as an inner product of the parameters with spatially close neighbors, as shown in Fig. 6(b). Bruna et al. [31] use multi-scale clustering to define the network architecture, in which the convolutions are defined for each cluster. However, the spatial approaches tend to have difficulties in finding a shift-invariant convolution for non-grid data. To overcome this problem, Hechtlinger et al. [32] proposed a spatial CNN, which uses the relative distance between nodes. Assume G = (V, E) is a graph, where

V = (X1, X2, . . . , XN) is a set of N features (vertices) and E is a set of edges. To select the neighbors of a node, a graph transition matrix, P, is used, whose element p_ij denotes the probability of moving from node Xi to node Xj. Meanwhile, the expected number of visits within k steps, Q_k, is calculated as Q_k = Σ_{i=0}^{k} P^i, where the element p^k_ij of P^k is the probability of moving from node Xi to node Xj in k steps. The convolution for a node is conducted upon the top α neighbors with the highest expected visit numbers (α is a constant). Therefore, the weights are decided according to the distance indicated by the transition matrix.

The graph-oriented CNN can also operate in the spectral domain, i.e., the input graph is first transformed from the graph domain to the spectral domain, and the spectral-domain graph is then sent to the CNN for training and testing, as shown in Fig. 6(c). For instance, in [31] and [33], the spectral decomposition of the graph Laplacian is first used to derive the eigenvectors of the spatial-domain graph, and the convolution is then implemented upon the spectral graphs. Furthermore, in [34] and [35], a ChebNet is proposed, which uses Chebyshev polynomials of the Laplacian to learn the filter structures for the graph data. Lee et al. [36] proposed a DL scheme performed in the spectral domain, which incorporates transfer learning to allow training data and testing data to be drawn from different feature spaces and distributions. Transfer learning stores knowledge gained from solving one set of problems and applies it to a different but related problem set. By using transfer learning in deep learning networks, the intrinsic information learned by the DL network is transferred from the source domain to the target domain, thereby building a model for a new but related task in the target domain without using new data [37]. In Lee et al.'s scheme, a graph is first generated by co-occurrence graph estimation (CoGE) [29] or supervised graph estimation (SGE) [33]. Then, the intrinsic geometric characteristics of the graph are extracted via the Laplacian matrix, where three Laplacian operators are used for comparison purposes. Following that, the convolutional networks are applied to the graph, for which the weights of the DL model are determined through the training process. To learn various data features, the convolution operation is re-defined with spectral information from the spatial domain. This operation allows the DL model to transfer the data-driven structural features from the original domain to an appropriate spectral domain, so that the intrinsic geometric information of the spatial graphs is effectively extracted.

Fig. 6. Convolutional Neural Networks. (a) Grid Structure CNN; (b) Graph Structure CNN in Spatial Domain; (c) Graph Structure CNN in Spectral Domain.

III. DEEP LEARNING FOR PHYSICAL LAYER DESIGN

DL plays important roles in the Physical Layer (PL) of wireless networks. For instance, DL can help to determine the most suitable modulation/encoding schemes according to a comprehensive analysis of the complex radio conditions, including spectrum availability, interference distribution, node mobility, application types, etc. In the following discussions, we provide some typical DL applications for PL function control.

A. DL for Interference Alignment

Interference Alignment (IA) has attracted extensive interest nowadays, for its improved channel utilization achieved by allowing multiple transmitter-receiver pairs to communicate via the same radio resources. In Multi-Input Multi-Output (MIMO) networks, IA uses a linear precoding technique to align transmission signals in such a way that the interference signal lies in a reduced-dimensional subspace at each receiver [38], [39]. This coordination between the transmitter and receiver breaks the throughput limitation imposed by the MIMO antennas' interference problem.

He et al. [40] proposed to use deep Q-learning to obtain the optimal IA user selection policy in cache-enabled opportunistic IA networks, as shown in Fig. 7. In this scheme, a central scheduler collects the channel condition and the cache status of each user, and allocates channel resources to each user. All the users are connected to the central scheduler via a backhaul network with a total capacity of Ctotal. The channel is time-varying and characterized by a finite-state Markov model. Each transmitter is equipped with a cache, which stores an amount of frequently requested information. This in-network caching design efficiently reduces the transmission of duplicate contents. Assume that there are L candidates wanting to join the IA network; an action is determined in each time slot, indicating which candidates are chosen to be allocated communication resources based on their current Signal-to-Noise


Fig. 7. Cache-Enabled Opportunistic IA Networks.
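He et al.'s cache-aware user selection [40] can be illustrated with a deliberately small sketch. Everything below is an illustrative assumption rather than the authors' implementation: the scheme is collapsed to a single-state (bandit-style) Q-learning selector over two hypothetical candidates, with a reward that pays the full IA data rate on a cache hit and a backhaul-discounted rate on a cache miss.

```python
import random

random.seed(7)

# Hypothetical candidates: (achievable rate from the SNR, content cached locally?)
CANDIDATES = [
    {"rate": 1.0, "cached": True},   # candidate 0: good channel, cache hit
    {"rate": 0.8, "cached": False},  # candidate 1: cache miss -> backhaul cost
]
BACKHAUL_PENALTY = 0.5  # illustrative fraction of rate lost to backhaul transfer

def reward(c):
    """Throughput-style reward: cache hits deliver the full IA data rate."""
    return c["rate"] if c["cached"] else c["rate"] * BACKHAUL_PENALTY

# Single-state Q-learning: Q[a] tracks the expected reward of selecting candidate a.
Q = [0.0, 0.0]
ALPHA, EPSILON = 0.1, 0.2
for _ in range(500):
    if random.random() < EPSILON:                          # explore
        a = random.randrange(len(CANDIDATES))
    else:                                                  # exploit
        a = max(range(len(CANDIDATES)), key=lambda i: Q[i])
    r = reward(CANDIDATES[a])
    Q[a] += ALPHA * (r - Q[a])                             # bandit-style Q update

best = max(range(len(CANDIDATES)), key=lambda i: Q[i])
print("learned Q values:", [round(v, 2) for v in Q], "-> select candidate", best)
```

In the full scheme the scheduler conditions on the time-varying channel and cache states of all L candidates, so the state and action spaces are far larger; the point here is only the reward shaping, where a cache hit yields the maximum data rate while a miss consumes backhaul bandwidth.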

Fig. 8. DQN-based Anti-Jamming System.

Ratios (SNRs). The system state at time slot t is defined as x(t) = {γ1(t), c1(t), γ2(t), c2(t), . . . , γL(t), cL(t)}, where γi(t) and ci(t) denote the channel state and the cache state of candidate i, respectively (i = 1, 2, . . . , L). The system action is represented as a(t) = {a1(t), a2(t), . . . , aL(t)}, where ai(t) = 0 indicates that candidate i is not selected to be allocated communication resources, and ai(t) = 1 indicates that it is selected. The reward function is defined so as to maximize the throughput of the IA network. If the requested content is in the local cache, the candidate is provided the maximum data rate by the IA. However, if the content is not in the candidate's cache, a certain amount of bandwidth needs to be used for content transmission.

Note that in this scheme, a central scheduler is needed, which collects channel state information as the input of the deep Q network and performs the resource allocation computation. This centralized structure makes the entire system vulnerable. Furthermore, deploying such a central scheduler may be intractable in practice.

B. DL for Jamming Resistance

In cognitive radio networks, when the secondary users (SUs) try to join the network, they need to 1) avoid interfering with the primary users (PUs) and 2) counteract jammers. Spread spectrum is one of the most popular anti-jamming techniques. However, smart and cooperative jammers can still block some channels and eavesdrop on the control channel.

Han et al. [41] proposed a deep Q-learning based, two-dimensional anti-jamming scheme executed by the SUs. It utilizes both frequency hopping and the SU's mobility to confront smart jammers, as shown in Fig. 8. In this scheme, the action executed by the SU is represented as a_t ∈ {0, 1, . . . , N}, where a_t = 1, 2, . . . , N indicates which frequency channel the SU is going to access (frequency hopping), and a_t = 0 indicates that the SU will leave the area and find another Access Point (AP) due to heavy jamming (mobility). However, it is not clearly addressed how the SU moves. An efficient moving strategy for the SU is crucial to find an optimal access point and to avoid duplicated requests. The scheme is summarized in Table I, where N is the number of frequency channels, W is the memory length, and r is the utility of the SU. Considering the huge number of frequency channels and the time constraint of the decision-making process, a convolutional neural network adopted in [27] was used to estimate the reward for each action, which consists of two convolutional layers and two fully connected layers (see Fig. 5).

C. DL for Modulation Classification

Modulation classification identifies the modulation type of the received signals. To improve the classification accuracy for complex modulation signals, Peng et al. [42] proposed a CNN-based DL scheme. Since different modulation methods may have particular constellation diagrams, this scheme uses AlexNet to classify constellation diagrams, thereby pinpointing the modulation method. AlexNet is a CNN-based deep learning model that comprises thousands of neurons and millions of connections, and it can classify 1.2 million images into 1000 classes [4], [43]. Simulations show that this scheme can accurately differentiate QPSK, 8PSK, 16QAM and 64QAM signals, with accuracy comparable to traditional modulation classification schemes such as the cumulant-based scheme and the Support Vector Machine (SVM) based scheme. DL-based modulation classification is a promising topic. However, merely considering the graphic pattern of the constellation diagram may limit the classification effect. The classification performance could be improved if the image classification were aided by an analysis of the modulation parameters.

D. DL for Physical Coding

In addition, deep learning is also used in error correcting codes. In [44] and [45], the belief propagation (BP) decoding algorithm of low-density parity-check (LDPC) codes is improved by DL. The Tanner graph is extensively used in the BP decoding process. However, it is a challenging task to build an efficient parity check matrix, which can be expressed by the edges of the Tanner graph. Nachmani et al. [45] assigned a weight to each edge of the Tanner graph; these weights are then trained using stochastic gradient descent. Using this DL-trained Tanner graph, the Bit Error Rate (BER) is significantly decreased. In [46] and [47], the performance of polar codes is improved by using a decoding algorithm trained by


TABLE II
COMPARISON OF DEEP LEARNING APPLICATIONS IN ERROR CORRECTING CODES AND SIGNAL DETECTION

DL. These works represent a promising application of DL, i.e., learning a structure-based decoding network. In addition, DL networks are also used in signal detection scenarios, such as multiple-input multiple-output (MIMO) signal detection [48], [49] and chemical signal detection [50]. Using the detection models optimized by DL, the transmitted signals are more accurately deduced, and the BER is decreased as a consequence.

O'Shea and Hoydis [51] proposed a novel idea that treats the physical layer as an end-to-end autoencoder. The autoencoder includes the functions of modulation, error correcting coding, signal classification, etc. The autoencoder is then trained as a CNN. This approach is tested in a single end-to-end communication system, a multiple-transmitter/receiver system, and radio transformer networks. Compared to traditional modulation methods (e.g., BPSK and QAM) and error correcting codes (e.g., the Hamming code), the autoencoder decreases the block error rate (BLER) by 1-5 times in multiple-transmitter/receiver systems, and by 1-7 times in Rayleigh fading communication scenarios (with radio transformer networks). A comparison of the DL applications in error correcting codes and signal detection is presented in Table II.

E. A Brief Discussion on DL Application in Physical Layer

In wireless networks, interference alignment and jamming resistance are two of the trickiest problems, considering the large number of nodes, the nodes' mobility, the variation of channel conditions, the complex frequency usage, etc. DL is an ideal tool to deal with these complicated problems, since it abstracts the intrinsic patterns from hybrid and vast physical layer parameters. In addition, modulation and error correction coding are basic functions of the physical layer, which tend to demand huge computation in modern networks, such as Orthogonal Frequency-Division Multiplexing (OFDM) modulation, Trellis Coded Modulation (TCM), Turbo codes, LDPC codes, etc. The performance of these operations can be significantly improved by DL techniques. However, most physical layer problems impose strict limits on reaction time; therefore, the deployment of the DL server and the control of computational complexity are crucial issues for DL applications in the physical layer.

Fig. 9. RL-LSTM based Scheme for Spectrum Allocation.

IV. DL FOR DATA LINK LAYER

A. DL for Spectrum Allocation

LTE-U allows Small Base Stations (SBSs) to access the unlicensed spectrum, thereby providing an efficient solution for radio spectrum utilization. To efficiently and proactively allocate the unlicensed spectrum, Challita et al. [52] proposed a resource allocation scheme for LTE in unlicensed spectrum (LTE-U) using Reinforcement Learning and Long Short-Term Memory (RL-LSTM). In this scheme, the time domain is divided into multiple time windows, denoted as T. Each window is further divided into multiple time epochs, denoted as t. Assume there are J SBSs and M − J WiFi stations. Each node has an LSTM encoder unit, which learns a vector representing the historical traffic loads of the SBS or the WiFi station [53], as shown in Fig. 9. All the LSTMs comprise a traffic encoder. Following that, a Multi-Layer Perceptron (MLP) abstracts all the historical traffic load vectors into a single vector, which indicates the traffic values of all SBSs and WiFi stations on all


TABLE III
PARAMETERS OF DEEP Q-LEARNING SCHEME FOR RESOURCE ALLOCATION

the unlicensed channels. Finally, an action decoder interprets the abstract vector into multiple predicted action sequences for the SBSs. For an SBS j, the goal is to maximize the total throughput, u_j, during its allocated airtime with the selected channel C and the time window T, i.e.,

u_j(a_j, a_{−j}) = Σ_{t=1}^{T} Σ_{c=1}^{C} α_{j,c,t} γ_{j,c,t},   (8)

where a_j denotes the action vector of SBS j, a_{−j} denotes the action vector of all other SBSs except j, α_{j,c,t} is the achievable airtime fraction of SBS j on channel c in time epoch t, and γ_{j,c,t} is a channel-related parameter. To achieve the optimization goal, the RL algorithm is used to train the weights of the traffic encoder and the action decoder, for which the reward is defined as an approximation of the SBS's throughput, û_j(a_j, a_{−j}). By maximizing the expected reward û_j(a_j, a_{−j}) according to the gradient with respect to the policy parameters, the weights of the RL neural network can be trained [54], [55]. The simulations were run upon the dataset provided by [56]. Compared to traditional reactive allocation approaches, this scheme increases the average airtime allocated to LTE-U by around 18%.

The Cloud Radio Access Network (RAN), a centralized, cloud-computing-based radio access network, is proposed for future cellular networks, e.g., 5G. In a cloud RAN, there are a central Base Band Unit (BBU) pool in the cloud and many distributed Remote Radio Heads (RRHs) near the users. The RRHs only maintain basic transmission functions, compressing and forwarding users' radio signals to the BBUs via fronthaul links. The resource allocation problem, i.e., how to minimize the power consumption of the RRHs while satisfying users' demands, has become one of the main tasks in cloud RANs. To tackle this problem, Xu et al. [57] proposed a DL-based scheme for power-efficient resource allocation in RANs. In this scheme, the decisions are made in two steps: first, a deep Q-learning algorithm determines which RRHs should be turned on or switched into sleep status; second, a convex optimization algorithm calculates the beamforming weights from the RRHs to the users for all the active RRHs. The parameters of the first step, i.e., the Q-learning operation, are shown in Table III. The state representation of time slot t is s_t = (m1, m2, . . . , mR, d1, d2, . . . , dU), where m_i ∈ {0, 1} denotes whether the i-th RRH is active or asleep, R is the total number of RRHs, and d_j ∈ [dmin, dmax] represents the demand of the j-th active RRH. In each time slot, the DRL agent determines which RRHs are active. The immediate reward is defined as the gap between the maximum possible power consumption Pmax and the actual power consumption, i.e., Pmax − P(A, S, G), where the actual power consumption P(A, S, G) comprises the transmit power consumption and the transition power (due to sleep/active switching), i.e.,

P(A, S, G) = Σ_{r∈A} Σ_{u∈U} (1/η_l) |w_{r,u}|² + Σ_{r∈A} P_{r,active} + Σ_{r∈S} P_{r,sleep} + Σ_{r∈G} P_{r,transition},   (9)

where w_{r,u} is the beamforming weight from RRH r to user u, η_l is the drain efficiency constant of the power amplifier, A, S and G represent the sets of active, sleep and transition RRHs, respectively, U is the user set, and P_{r,active}, P_{r,sleep} and P_{r,transition} are an RRH's power consumptions in the active, sleep and transition modes, respectively.

For the active RRHs selected by the first step, the DRL agent computes the optimal beamforming weights by solving the following optimization problem:

minimize    Σ_{r∈A} Σ_{u∈U} (1/η_l) |w_{r,u}|²
subject to  SINR_u ≥ γ_u,  u ∈ U
            Σ_{u∈U} (1/η_l) |w_{r,u}|² ≤ P_r,  r ∈ A   (10)

where R_u is user u's demanded data rate, P_r is RRH r's maximum allowable transmit power, and γ_u = Γ_m (2^{R_u/B} − 1) (B is the channel bandwidth, and Γ_m is the SNR gap depending on the modulation). Compared to the single BS association approach, this scheme is shown to satisfy the users' demands better when the amount of demand is high; and compared to the fully coordinated association approach, this scheme consumes less power.

Sun et al. [58] proposed a DL-based wireless resource allocation scheme. This scheme puts the resources into a "black box" and trains a DNN in such a way that the power allocation for each transmitter is optimized and the system throughput is maximized. The input layer of the DNN is fully connected, the multiple hidden layers use the Rectified Linear Unit (ReLU), max(x, 0), as the activation function, and the output layer uses min(max(x, 0), Pmax) to incorporate the resource constraints, where x is the input of a neural node and Pmax is the power budget of each transmitter. The proposed DL-based power allocation scheme is tested in the Gaussian interference channel (IC) and the multi-cell interfering multiple-access channel (IMAC), respectively. It is shown that, compared to random power allocation and maximum power allocation, the DL-based scheme provides much higher throughput; and compared to WMMSE [59], the throughput of the DL-based scheme is close to that of WMMSE while the computation time is much shorter.
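The constraint handling in Sun et al.'s DNN can be seen directly in a forward pass. The sketch below is a hand-rolled, randomly initialized network whose layer sizes, weights, and inputs are illustrative assumptions, not the trained model from [58]; it only demonstrates how ReLU hidden layers combined with a min(max(x, 0), Pmax) output layer keep every allocated power inside [0, Pmax].

```python
import random

random.seed(0)
P_MAX = 10.0  # per-transmitter power budget (illustrative value)

def dense(x, w, b):
    """Fully connected layer: returns w @ x + b for plain Python lists."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(x, 0.0) for x in v]

def clipped_output(v):
    """Output activation from the scheme: min(max(x, 0), Pmax)."""
    return [min(max(x, 0.0), P_MAX) for x in v]

def init(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Toy network: 4 channel-gain inputs -> two ReLU hidden layers -> 4 power outputs.
w1, b1 = init(8, 4), [0.0] * 8
w2, b2 = init(8, 8), [0.0] * 8
w3, b3 = init(4, 8), [0.0] * 4

channel_gains = [random.uniform(0, 5) for _ in range(4)]  # hypothetical input
h = relu(dense(channel_gains, w1, b1))
h = relu(dense(h, w2, b2))
powers = clipped_output(dense(h, w3, b3))
print("allocated powers:", [round(p, 2) for p in powers])
```

Whatever these (untrained) weights produce, the output activation guarantees feasibility with respect to the power budget; in [58] the weights are then trained so that the feasible allocations also maximize throughput.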


Fig. 10. Multi-User DQN for Spectrum Access Optimization.

To maximize the channel utility for multi-user wireless networks with less computation and limited observations, Naparstek and Cohen [60] proposed a deep multi-user reinforcement learning approach. Assume that there are N users randomly accessing K orthogonal channels. At each time slot, a user accesses a channel with a certain probability. If there is no interference during the channel access and the message is successfully received, a positive ACK will be received by the source node. To optimize the channel utility, the action of user n at time slot t is defined as a vector of size K + 1, i.e., a_n(t) = (0, 0, . . . , 0, 1, 0, . . . , 0), where the single 1 indicates the channel chosen by the user. (If the first element is 1, it indicates that no channel is chosen.) Meanwhile, define a_{−n}(t) = {a_i(t)}_{i≠n} as the actions of all users except user n. The ACK message serves as the observation, i.e., o_n(t) = 1 indicates a successful delivery and o_n(t) = 0 indicates a failed delivery. For user n, the history is defined as H_n(t) = (a_n(1), . . . , a_n(t), o_n(1), . . . , o_n(t)), and the policy σ_n(t) is the set of weights used when mapping from history H_n(t − 1) to action a_n(t). The accumulated discounted reward of user n is R_n = Σ_{t=1}^{T} γ^{t−1} r_n(t), where r_n(t) is the reward of user n at time slot t, which depends on both a_n(t − 1) and a_{−n}(t − 1), γ is the discount factor, and T is the time duration. For an arbitrary user, say user n, the goal of training is to find a policy that maximizes the expected accumulated reward for the user, i.e., max_{σ_n} E[R_n | σ_n].

The architecture of the multi-user DQN for spectrum access optimization of user n is shown in Fig. 10. The input is composed of user n's action a_n(t − 1), the capacity of each channel, and the observation of user n, o_n(t − 1). An LSTM is adopted to maintain internal states and to aggregate observations, since the network state is only partially observed and depends on multiple users. Since some states are independent of the users' actions, a dueling DQN is adopted to achieve accurate estimation. The V-value DQN estimates the average Q-value of the state, V(s_n(t)). The A-value DQN estimates the advantage of each action, A(a_n(t)). The final Q-value of the action a_n(t) is then composed of V(s_n(t)) and A(a_n(t)). The output of the system is a vector of size K + 1, each element of which indicates the estimated Q-value for transmitting the message on the corresponding channel, including no transmission (indicated by the first element of the output). It was shown that this scheme almost doubled the average channel utilization compared to the traditional slotted-Aloha scheme [61].

Most DL-based spectrum allocation schemes first estimate the channel conditions. Based on the channel-related parameters, the rewards of an action (e.g., throughput, power consumption, etc.) are determined by the deep learning system. For example, in (8), the channel-related parameters are used to calculate the throughput of the small base station, and in Xu et al.'s scheme [57], the channel-related parameters are used to calculate the achievable data rate for the users. However, the real channels might be so complicated that the channel estimates are not accurate, which may bias the DL training, thereby resulting in deteriorated spectrum allocation decisions. Therefore, accurate estimation of the channel condition is a crucial and challenging issue in DL-based spectrum allocation models.

Fig. 11. Spatiotemporal Hybrid Modeling System for Traffic Prediction.

B. DL for Traffic Prediction

Most existing schemes that optimize resource allocation assume some given factors, such as traffic load, spectrum usage, etc. However, Wang et al. [62] pointed out that these factors can vary significantly both temporally and spatially. Therefore, simply assuming constant values for these parameters may deteriorate the effect of resource allocation. To solve this problem, a spatiotemporal modeling scheme based on hybrid DL was proposed in [62] to predict the traffic in cellular networks. In this scheme, an autoencoder model, which consists of a Global Stacked Auto Encoder (GSAE) and multiple Local Stacked Auto Encoders (LSAEs), is used for spatial modeling. Meanwhile, long short-term memory units (LSTMs) are adopted for temporal modeling, as shown in Fig. 11. When predicting the traffic of a cell, the historical data of both the cell itself (red hexagon) and its neighboring cells (green hexagons) are collected. Each cell has its own LSAE for representation encoding. Meanwhile, a GSAE takes all the cell data and produces a global representation. The local representation is combined with the global representation to produce the spatial modeling and prediction. The output of the spatial modeling is then sent to the LSTM for temporal modeling and prediction. Using this spatiotemporal modeling scheme, the traffic of a large LTE network with 2,844 base stations (BSs) and a coverage of 6,500 km² is precisely predicted, and the metrics of Mean Square Error (MSE), Mean Absolute Error (MAE) and Log Loss are measured to evaluate the prediction performance. It was shown that this scheme achieved a significant improvement compared to Auto Regression Integrated Moving Average (ARIMA) [63] and Support Vector Regression (SVR) [64], [65], which are two of the most widely used methods for time series analysis.
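The temporal half of such a pipeline rests on the standard LSTM cell update. The sketch below is a generic single LSTM cell stepping over a short traffic series, with tiny fixed weights chosen purely for illustration; it is not Wang et al.'s trained configuration, only the gate arithmetic (forget, input, and output gates plus a tanh candidate) that lets the unit carry long-range temporal state.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w):
    """One LSTM cell update for a scalar input and scalar hidden state.

    w holds per-gate (input-weight, recurrent-weight, bias) triples.
    """
    f = sigmoid(w["f"][0] * x + w["f"][1] * h + w["f"][2])    # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h + w["i"][2])    # input gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h + w["o"][2])    # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h + w["g"][2])  # candidate state
    c = f * c + i * g          # cell state mixes old memory with new input
    h = o * math.tanh(c)       # hidden state exposed to the next layer
    return h, c

# Illustrative weights and a toy normalized traffic-load series.
weights = {"f": (0.5, 0.1, 0.0), "i": (0.6, 0.2, 0.0),
           "o": (0.7, 0.1, 0.0), "g": (0.9, 0.3, 0.0)}
traffic = [0.2, 0.4, 0.9, 0.7, 0.3]  # e.g., hourly load samples of one cell

h, c = 0.0, 0.0
for x in traffic:
    h, c = lstm_step(x, h, c, weights)
print("final hidden state:", round(h, 4))
```

In [62] the input at each step is not the raw load but the spatial representation produced by the LSAE/GSAE stage, and the hidden state is vector-valued; the recurrence itself is unchanged.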


TABLE IV
COMPARISON OF DEEP LEARNING APPLICATIONS IN DATA LINK LAYER

To predict the traffic of a cell, the traffic of both the cell itself and its neighbors is input into the LSAE and GSAE. The size of the neighboring region should be carefully balanced between the prediction accuracy and the computation. In the simulations in [62], an 11 × 11 square is used as the neighboring region, i.e., for each cell, the traffic of its 120 neighboring cells is considered. Another tricky issue of traffic prediction is the temporal correlation, which may have periodicity over a day, a week, a month, or a year.

C. DL for Link Evaluation

Due to the huge size and complex structure of modern networks (such as the multi-layer structure, heterogeneous characteristics, hybrid network resources, etc.), the scale of the network optimization problem tends to be enormous. Therefore, reducing the computational complexity is a critical problem. For the link-evaluation-based optimization problem, Liu et al. [66] proposed to reduce the problem size instead of reducing the algorithm complexity. In their scheme, one possible status of all virtual links is defined as a network pattern (denoted as set A), and the goal is to minimize the overall power consumption by scheduling all the patterns appropriately. This optimization goal can be achieved by solving a Linear Programming (LP) problem, for which the objective function is min E = Σ_{a∈A} P_a t_a. Here P_a is the power consumption of pattern a, and t_a is its active time. However, the problem scale is huge due to the large number of virtual links. To reduce the problem size, the authors note that many virtual links of the network would not be scheduled, or would merely carry a small amount of traffic. If these links are excluded from the LP problem, the computation will be significantly decreased without much degradation of the optimization objective. Therefore, a Deep Belief Network (DBN) [67] is first used before the LP model to evaluate the link quality. The input of the DBN is the flow information, which is represented by a flow demand vector X = (x1, x2, . . . , xN), where N is the total number of network nodes. For a flow with a demand of dc travelling from the source node ns to the destination node nd, the elements of X are xi = dc if i = ns, xi = −dc if i = nd, and xi = 0 otherwise (i = 1, 2, . . . , N). The output of the DBN is the evaluation values of all links, Y = (y1, y2, . . . , yM), where M is the number of links in the entire network. Each element of Y, denoted as yi, indicates the probability of the input flow belonging to link i (i = 1, 2, . . . , M). Apparently, the higher the value of yi, the more likely that link i will be used by the flow. Based on the evaluation results, the links that are not likely to be scheduled for a flow will be excluded from the link optimization process. This approach efficiently reduces the problem size of the link optimization. Simulation results show that the scheme reduces the computation cost by at least 50% without decreasing the optimization performance.

D. A Brief Discussion on DL Application in Data Link Layer

The applications of DL in the data link layer mostly focus on resource allocation, traffic prediction, and link evaluation problems, and they yield promising performance improvements, as shown in Table IV. Considering the large size of modern networks, a DL system usually needs to read a tremendous number of DLL parameters to make a decision. Therefore, how to limit the computation and the data size are huge challenges for deep learning applications in the DLL. Meanwhile, accurate estimation of the channel conditions is crucial for the deep learning system to make accurate DLL decisions, which is challenging due to the fast channel variations and the time limit of the decision-making process.

V. ROUTING LAYER

Modern routing protocols developed for wireless networks are basically categorized into four types: routing-table-based proactive protocols, on-demand reactive protocols, geographical protocols, and ML/DL-based routing protocols. DL-based routing protocols have been extensively studied in the past several years due to their superior performance in complex networks.

A. Lifetime-Aware Routing Based on RL

Underwater sensor networks usually confront two big challenges, i.e., a large propagation delay due to the use of

acoustic channels, and stringent power budgets due to the high power consumption and the inconvenience of battery charging. To deal with these challenges, a balanced routing protocol that distributes traffic evenly among all sensors was suggested in [68], which proposed an adaptive, energy-efficient, and lifetime-aware routing scheme, called QELAR, based on the Q-learning algorithm. For the Q-learning model [S, A, Pa(s, s'), Ra(s, s')], where S, A, P and R are the sets of states, actions, state transition probabilities and rewards, the value of taking action a in state s under a policy π, Qπ(s, a), is defined as

Qπ(s, a) = Eπ{Rt | st = s, at = a} = Eπ{Σ_{k=0}^{∞} γ^k r_{t+k} | st = s, at = a}.   (11)

The optimal value of state s is defined as V*(s) = max_a Q*(s, a). To consider the nodes' energy conditions, the scheme assumes that the residual energy of a node is Eres(s), the initial energy of a node is Einit(s), and the average residual energy in the group including the node is Ē(s). If a packet is successfully transferred from node s to node s', the reward is

R_{a'}(s, s') = −g − α1[c(s) + c(s')] + α2[d(s) + d(s')],   (12)

where c(s) = 1 − Eres(s)/Einit(s) and d(s) = (2/π) arctan(Eres(s) − Ē(s)) are residual-energy-related rewards, α1 and α2 are their weights, and g is a punishment coefficient accounting for the power consumed when a node attempts to forward a packet. On the other hand, if the packet forwarding from node s to node s' fails, the reward is

R_{a'}(s, s) = −g − β1 c(s) + β2 d(s),   (13)

where β1 and β2 are weights. Then the overall reward rt in (14) is

rt = P_{a'}(s, s') R_{a'}(s, s') + P_{a'}(s, s) R_{a'}(s, s),   (14)

where P_{a'}(s, s') is the transition probability from node s to node s' with action a', and P_{a'}(s, s) is the transition probability from node s to node s with action a' (failed data forwarding). For instance, in the network shown in Fig. 12, node s1 wants to send packets to node s4. Initially, all the Q values and V values are set to 0, and let γ = 0.5 and g = 1. If the nodes' residual energy is not considered, α1 = α2 = 0. For node s1, since its immediate neighbors are nodes s2 and s3, it calculates the following Q values:

Q(s1, a2) = rt + γ[P^{a2}_{s1,s2} V(s2) + P^{a2}_{s1,s1} V(s1)] = −1 + 0.5 V(s2) = −1
Q(s1, a3) = rt + γ[P^{a3}_{s1,s3} V(s3) + P^{a3}_{s1,s1} V(s1)] = −1 + 0.5 V(s3) = −1   (15)

Thus, node s1 updates its V value as V(s1) = max_a Q(s1, a) = −1 by choosing either node s2 or node s3, since Q(s1, a2) = Q(s1, a3). Node s2 then forwards the packet toward the destination. The V values of each node (after the packet has been delivered) are shown in Fig. 12(a).

For the second packet, node s1 calculates the Q values of its neighbors as follows:

Q(s1, a2) = rt + γ[P^{a2}_{s1,s2} V(s2) + P^{a2}_{s1,s1} V(s1)] = −1 + 0.5(−1) = −1.5
Q(s1, a3) = rt + γ[P^{a3}_{s1,s3} V(s3) + P^{a3}_{s1,s1} V(s1)] = −1 + 0.5 × 0 = −1   (16)

Therefore, node s1 updates its V value as V(s1) = max_a Q(s1, a) = −1, and chooses the node with the larger Q value, which is s3, to forward the packet. In this way, the previous packet forwarding conducted by node s2 acts as a 'penalty' in (16), which causes node s1 to choose node s3 to forward the current packet. Node s3 then calculates the Q values of its neighbors as Q(s3, a1) = −1.5, Q(s3, a2) = −1.5 and Q(s3, a5) = −1. Thus node s3 forwards the packet to node s5 and updates its V value as V(s3) = −1. This procedure is repeated for each packet. Finally, the V values of each node converge to a stable status, as shown in Fig. 12(c). To balance the tasks among nodes, the residual energy of each node should be considered; therefore, in (12) and (13), α1, α2, β1, β2 ∈ (0, 1]. In this circumstance, the V value of each node may converge as shown in Fig. 12(d), where the number tagged to each node represents the residual energy. Compared to the Vector-Based Forwarding (VBF) scheme [69], which is a popular routing protocol for underwater sensor networks, the lifetime of QELAR is 20% longer.

QELAR forms the routing topology based on the task balance among nodes, which significantly increases the batteries' lifetime if the link conditions are perfect. However, in a real-life network, many factors may deteriorate the link quality, such as a large queue in a node's data sending buffer, high mobility, weak signal strength, interference, etc. The deteriorated link quality may decrease the end-to-end transmission quality and increase packet retransmissions. As a consequence, the batteries' lifetime is shortened. Therefore, considering other factors together with task balance might be a good routing strategy, especially when the size of the wireless network is large or the payload is heavy.

B. DL for Routing Path Search

In many networks the conditions of routers vary from time to time, due to saturated caches, overloaded routers, malfunctioning hardware, etc. All these factors may degrade the routers' performance. In a network with numerous routers, the data forwarding capability varies in each local network region at each time slot. Since a communication session may involve many hops from the source to the destination, routing algorithms confront significant challenges in finding a globally optimal path among many candidate nodes in a highly dynamic network environment, and some nodes may provide good local transmission performance while deteriorating the global end-to-end routing performance.
since Q(s1 , a2 ) = Q(s1 , a3 ). Node s2 then forwards the pack- while deteriorate the global end-to-end routing performance.
ets to node s4 and updates its V value through the same Finding a global optimal path demands heavy computa-
procedure. The path of the first packet and all the V values of tion load. DL can be an efficient approach to relieve the path

Authorized licensed use limited to: BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE. Downloaded on November 01,2020 at 07:47:46 UTC from IEEE Xplore. Restrictions apply.
2608 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 20, NO. 4, FOURTH QUARTER 2018

Fig. 12. Reward Value Variation of QELAR Scheme. (a) first packet (without energy consideration) (b) second packet (without energy consideration)
(c) converged V values (without energy consideration) (d) converged V values (with energy consideration).

search burden. Kato et al. [70] proposed a DL approach for


the traffic control in heterogeneous networks. First, the tra-
ditional routing protocols such as Open Shortest Path First
(OSPF) are executed in the network for performance collection purposes. Once enough parameters have been collected,
a supervised training process is implemented. In the training
phase, each node trains M models, where M is the number of
potential receivers in the entire network. The training proce-
dure is initialized by a greedy layer-wise training method and
is fine-tuned by a backpropagation algorithm [71]. Once the
training is finished, the optimal path can be found in the run-
ning phase. To find the optimal path from a source to the
destination, the DL model needs to be run for k rounds by the
source node, where k is the number of hops from the source
to the destination. For each round, the history of the traffic patterns of all the routers in the network serves as the input, and only one router is chosen as the output.

Fig. 13. A Deep Learning based Routing Protocol.
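The hop-by-hop running phase described above can be sketched as follows. Note that the model interface, the lookup-table stand-in for a trained model DL_{i,d}, and the example topology are illustrative assumptions for this sketch, not the authors' actual implementation.

```python
class NextHopModel:
    """Stand-in for a trained per-destination model DL_{i,d}: maps the
    traffic-pattern vector A of all routers to the next router toward d."""
    def __init__(self, next_hop):
        self.next_hop = next_hop

    def predict(self, traffic_patterns):
        # A real model would run DL inference on the pattern vector A;
        # this stub returns a fixed choice for illustration.
        return self.next_hop


def find_path(source, destination, models, traffic_patterns, max_hops=32):
    """Run the per-hop model k times (k = number of hops) to build the path,
    as in the running phase: each round outputs exactly one next router."""
    path = [source]
    current = source
    while current != destination and len(path) <= max_hops:
        model = models[(current, destination)]  # model DL_{current,destination}
        current = model.predict(traffic_patterns)
        path.append(current)
    return path


# Hypothetical topology loosely mirroring Fig. 13: n1 -> n3 -> ... -> n10.
models = {(1, 10): NextHopModel(3), (3, 10): NextHopModel(7), (7, 10): NextHopModel(10)}
A = [0.1] * 10  # traffic pattern of each of the 10 routers
print(find_path(1, 10, models, A))  # -> [1, 3, 7, 10]
```

Since the source node runs every round itself, no central controller is needed, at the cost of each node storing one model per potential destination.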
Fig. 13 shows an example. Assume there are 10 routers in the entire network, the source node is n1, and the destination node is n10. The first step of the running phase is to find the optimal router immediately next to the source. To do so, the DL model DL1,10 is run by n1. The input of DL1,10 is the vector A = [α1, α2, . . . , α10], where αi is the traffic pattern of router i. At the output side, one out of the 10 routers is chosen, indicating the first hop in the path chosen by the source, which is n3 in the example. Following that, node n1 runs model DL3,10 to find the second-hop node. This operation is repeated until the destination node is reached. Compared to OSPF, this scheme decreases the routing overhead by about 70% and increases the throughput by about 2%. Apparently, there is no need for a central controller to implement the DL-based routing algorithm, which increases the flexibility. However, the source node has to train many DL models, which demands huge computation power and storage for every node. Mao et al. [72] integrated this method with programmable routers and achieved good performances.

C. DL for Other Routing Performance Optimizations

Natural disasters and terrorist attacks may damage communication infrastructures. In these situations, collaborations between infrastructure devices and wireless nodes (e.g., ad-hoc nodes) are crucial to maintain effective communications. The routing design in such a hybrid wireless network is challenging. Lee [73] proposed a DL-based routing scheme, which treats connectivity as the routing priority. In their scheme, the degree of each node, which indicates

MAO et al.: DL FOR INTELLIGENT WIRELESS NETWORKS: COMPREHENSIVE SURVEY 2609

the connectivity of a node, is first evaluated using a DL algorithm. Following that, a virtual route is generated by the Viterbi algorithm [74] with the consideration of node degree. Then, an IP-based routing procedure is implemented to establish the route in the hybrid network. This scheme increased the reachability compared to the AODV, OLSR, and ZRP routing protocols. Note that this scheme requires a Route Information Server (RIS), which determines the node degree using a deep learning algorithm and particular hardware.

Stampa et al. [75] used deep reinforcement learning to optimize the routing performance with the aim of reducing transmission delay. The DRL network uses the traffic matrix as the state, a path from the source to the destination as the action, and the mean of the end-to-end delays as the reward. Note that the scheme only considers the traffic matrix, i.e., the bandwidth requests of the traffic flows, as the state and does not consider other network factors, such as nodes' queue size, link quality, etc. The routing results may be further optimized if more conditions are considered. To test the performance, they used the OMNeT++ discrete event simulator [76], [77] to collect transmission delay with given traffic and routing parameters [78]. Their experimental results showed a significant improvement in transmission delay under various traffic intensities, compared to the benchmark routing scheme.

Valadarsky et al. [79] applied machine learning and DRL respectively for network routing. In the DRL approach, the environment of the network is described by the demand matrix (DM), of which element dij indicates the traffic demand between the source node i and the destination node j (i, j = 1, 2, . . . , N, and N is the number of nodes in the entire network). The reward of the DRL is the link utilization rate. In each time slot, the agent chooses a routing scheme based on the routing strategy and DMs. Then the DRL system learns a mapping from DMs to the routing schemes in such a manner that the discounted reward is maximized. Their simulations use the open-source implementation of TRPO [80], [81], and it is shown that learning from the historical routing schemes with the consideration of the demand matrix and link utilization provides an efficient approach for the agent to smartly choose the optimal routing topology for future data forwarding.

Valadarsky's scheme is similar to Stampa's, but it optimizes a different reward objective. From their work we see that how to choose the reward function is a crucial issue for DL applications in the routing layer. According to the network characteristics and environmental features, designers choose the most important attribute to optimize, which could be throughput, end-to-end delay, link utilization, flow completion time, etc.

D. A Brief Discussion on DL Application in Routing Layer

Centralized routing versus distributed routing is a tricky choice for DL-based routing schemes. This is because the deep learning process demands tremendous parameters as input to make a decision, as well as huge computation to train the neural network.

If a centralized routing strategy is adopted, three main issues should be carefully addressed. First, a large amount of network environment data, such as nodes' energy condition, queue size, signal strength, etc., has to be sent to the central controller. In this circumstance, the transmission load yielded by the environment data is huge, and too much overhead decreases the goodput of the network. Second, the routing topology needs to be built up within limited time. However, the environment data may be delayed when transmitted to the central controller, thereby delaying the routing formation. Third, there is less flexibility, i.e., a central node running the DL algorithm is not always available. For instance, the routing method proposed in [73] adopts a centralized DL strategy, which trains the DL model in the base station and thereby classifies nodes' connectivity levels. However, for some ad hoc networks, it is difficult to find a central server that has huge computation power as well as an appropriate geographic location.

On the other hand, if a distributed routing strategy is used, each node (or each source node) has to train several DL models. Therefore, huge computation power and storage are needed for every node. For instance, the distributed DL strategy has been adopted by QELAR [68] and Kato's scheme [70], where the source node triggers the DL process and generates the routing topology using the trained models.

Therefore, for DL-based routing design, a sophisticated choice between the centralized and distributed strategies is crucial, which should be made based on many considerations, such as the network structure and size, routing algorithm, deep learning method, etc.

VI. DL FOR OTHER NETWORK FUNCTIONS

A. Vehicle Network Scheduling

Vehicular Ad-Hoc Networks (VANETs) provide a fully connected network among vehicles and infrastructures, and are the foundation for establishing an intelligent transportation system. There are two types of communications in VANETs, i.e., Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I), where the infrastructures are usually composed of Road-Side Units (RSUs). The primary communication in VANETs is Driving-Safety-Related (DSR) services. However, in V2I communications, there are many non-DSR services, such as Web browsing and online games. To guarantee QoS performance, Atallah et al. [82] proposed a DRL-based scheduling scheme among the RSUs, which targets reducing the energy consumption of the road-side units while providing a safe driving environment. The DRL agent is deployed at the RSUs and interacts with the VANET environment. Assuming that there are M vehicles in an RSU's coverage, at time slot i, the action ai taken by the agent is either to receive DSR messages from the vehicle, represented as ai = 0, or to send non-DSR messages to a vehicle upon a download service request, represented as ai = j, where j = 1, 2, . . . , M indicates which vehicle is downloading non-DSR messages. If the RSU chooses to transmit data to a vehicle, the reward is measured by the number of transmitted bits, and the cost is composed of two parts: 1) the power consumed by the RSU, and 2) the waiting time of a DSR message potentially occurring during this non-DSR communication period. On the other hand, if the RSU chooses to receive DSR messages, the induced cost is the power consumption


Fig. 14. BW-NQ-DNN Framework.

of the message receiving operations. The DRL-based scheme can 1) minimize the delay of DSR messages, 2) maximize the service amount, including both DSR and non-DSR messages, and 3) extend the battery lifetime of RSUs. Their simulations showed that the DQN-based approach achieves better performance compared to the Random Vehicle Selection (RVS), Least Residual Time (LRT), and Greedy Power Conservation (GPC) algorithms in terms of RSU battery lifetime, RSU busy time, and incomplete request ratio.

B. Sensor Data Reduction

Wireless sensor networks have strict constraints on the size of the data they transmit due to their limited network capacity. For instance, Implanted Medical Devices (IMDs) are usually constrained by power consumption. Unfortunately, the network tends to be overwhelmed by a large amount of sensor data. To reduce the data size, the Quantized Compressed Sensing (QCS) technique is extensively used. Assuming that the original data measured by sensors is X ∈ R^N, the compressed data is obtained as Y = ΦX, where Y ∈ R^M and Φ is the measurement matrix with a size of M × N (M < N). To retrieve the original data from Y, the input X must be sparse and the measurement matrix Φ must satisfy the Restricted Isometry Property (RIP), which raises big challenges for computation. Sun et al. [83] thus proposed a Binary-Weighed, Non-uniform Quantizer, and Deep Neural Network (BW-NQ-DNN) approach for QCS. Instead of using a randomly or deterministically generated measurement matrix, BW-NQ-DNN learns a measurement matrix from previous experiences (i.e., training data). Furthermore, BW-NQ-DNN is used for the optimization of the non-uniform quantizer.

BW-NQ-DNN is composed of a compression net, a quantization net, and a recovery net, as shown in Fig. 14. The compression network is fully connected, i.e., it collects the original N-dimensional data X and represents it with M-dimensional data Y, with each element of Y being connected with all the elements of X. To reduce the hardware implementation complexity, the weights of the compression network are constrained as wij ∈ {−1, +1}. The quantization network is composed of nonlinear companding and gradient cancelation. Each element of Y, say yi, is nonlinearly companded as φ(yi) = Σ_{j=−K}^{K} c_j Ψ(yi/Δ − j), where c_j is the coefficient of the nonlinear compand function, Ψ is a basis function, and Δ is a constant. All the components of Y are companded by the same process. Without loss of generality, only the process of y1 is shown in Fig. 14. To avoid the discontinuity of the derivative of the quantized value φ, a straight-through estimator [84] that considers the saturation effect, denoted as ES, is used upon the gradient ∂C/∂φ. Finally, the compressed and quantized measurements can be recovered through a nonlinear recovery network based on the Multi-Layer Perceptron (MLP) [85] architecture. Since the whole system aims at optimizing the Signal to Noise and Distortion Ratio (SNDR) of the recovered messages, the cost function is chosen as the Mean Squared Error (MSE) between the recovered message, X̂, and the original message, X. To update the parameters of the compression, quantization and recovery networks, the Stochastic Gradient Descent (SGD) algorithm [86] is used in the backward propagation of each part. Compared to popular QCS schemes such as SDNCS [87], BPDQ [88] and QVMP [89], the DL-based BW-NQ-DNN provides higher SNDR and classification accuracy.

C. Hardware Resource Allocation

The operating system supports some application layer tasks for network communications, and hardware resource allocation significantly impacts the communication performance. Mao et al. [90] proposed to use reinforcement learning in resource management. Assume there are D resources serving M tasks, and task i lasts for ti seconds and is finished in ci seconds (i = 1, 2, . . . , M). The slowdown value of task i is defined as si = ci/ti, which can effectively serve as an evaluation index for the resource allocation. The goal of resource management is to minimize the average slowdown value. In the DL-based scheme, the reward at each timestep is defined as Σ_{i∈J} −si, where J is the current task set. The state of the DL system is the current resource allocation plus the resource demands of all the tasks. The action space is {0, 1, 2, . . . , M}, where at each timestep, action a = 0 indicates the agent does not schedule any task and a = i (i = 1, 2, . . . , M)


indicates that the agent schedules task i for a specific resource. Simulation shows that the DL-based resource allocation outperforms popular methods, such as the Shortest Job First (SJF) scheme, Packer, and Tetris, in terms of the average task slowdown value.

D. Network Security

Traffic inference and intrusion detection are crucial issues for cyber security. The decision-making process of these problems requires an analysis of a large number of network features and an abstraction towards the attack-related characteristics. DL schemes have shown promising performance in these tasks.

1) Traffic Identification: Flow inference aims to describe the original flow features generated at the transmitter side according to the received packets at the receiver side. It is crucial for intrusion detection, traffic monitoring, queue management, etc. An easy way to identify traffic is to classify it by port numbers. However, many recent applications, such as P2P traffic and video calls, may use port numbers that are initially assigned to other traffic types. Therefore, more accurate ways are required to identify traffic types. In the past several years, traffic identification methods using statistical models [91], [92] or machine learning [93]–[95] have been extensively studied, for which traffic features such as the time interval between packets, packet size, etc., are exploited to analyze traffic types. Due to the complexity of the networks, the patterns of the received flows may have nonlinear alterations, which makes flow inference challenging.

In 2014, Gwon and Kung [96] proposed a DL-based flow inference scheme, which classifies the received packet patterns and infers the original properties (e.g., burst size and inter-burst gaps). Note that although the flow inference is achieved by exploiting MAC layer parameters, it analyzes the TCP/UDP/IP flows. The inference system is composed of two independent layers, each of which comprises a feature extractor (FE) and a classifier (CL). For each layer, sparse coding [97] is used to extract features from time-series data, and max pooling [98] is used to reduce the number of features for the purpose of computation reduction. The two-layer structure allows both the local features (such as run and gap sizes) and the global features (such as periodicity) to be extracted by the learning system. Simulations show that the deep learning based flow inference scheme has a high true positive rate and a low false positive rate, compared to ARMAX-least squares [99], Naive Bayes classifiers, and Gaussian mixture models.

In 2015, Wang [100] proposed a DL-based traffic identification scheme. In this scheme, the TCP flow is used for traffic identification, since the bytes of different protocol payloads exhibit different distributions. Therefore, the bytes of TCP sessions are first normalized from integers (ranging from 0-255) to decimals (ranging from 0-1). Then the normalized data is sent to an ANN as the input for traffic identification. Simulation shows that this scheme can distinguish the 25 most popular protocols, such as SSL, HTTP-Connect, MySQL, SMB, etc., most of which have a precision higher than 95%.

Lotfollahi et al. [101] proposed a DL-based traffic classification method, namely deep packet, which not only distinguishes traffic types (such as streaming, P2P, etc.) but also classifies application types (such as Spotify, BitTorrent, etc.). The 'ISCX VPN-nonVPN traffic dataset' [95] was adopted in their experiments. Two DL methods, i.e., a convolutional NN and a stacked autoencoder NN, are applied. The simulation platform is built based on the Keras library [102] and TensorFlow, and the scheme is shown to achieve 97.0% precision for traffic type classification and 95.4% accuracy for application type classification, both outperforming the general ML-based schemes [95], [103]. A comprehensive comparison of the traffic classification performances among four ANNs, i.e., backpropagation-based multilayer perceptron (BB-MLP), resilient-backpropagation-based multilayer perceptron (RBB-MLP), recurrent neural network (RNN), and deep learning stacked autoencoder (SAE), has been presented by Oliveira et al. [104].

Discussion: To identify traffic accurately, two crucial issues should be carefully considered. First, researchers need to decide on which layer the DL algorithm is implemented. Many traffic identification schemes are implemented upon the transport layer or IP layer, such as [100]. However, some DL-based schemes analyze MAC layer or application layer features to identify the traffic type. For instance, Gwon and Kung's scheme [96] uses the runs-and-gaps model upon the MAC layer as the input of the DL model and identifies the traffic type thereby. Another example is Lotfollahi et al.'s scheme [101], which provides both traffic characterization and application identification to meet various requirements. The second issue is to determine what features are used for DL analysis. The choice of data features used for DL models significantly impacts the accuracy of traffic identification. For instance, [100] uses the scaled values of the TCP flow data, especially the first 25 values of the payload, as the input of the DL network. Although from intuition and experience these payload values indicate the traffic type to a certain extent, more intrinsic features, such as the correlation and distribution features of the payload values, may further improve the identification performance of the DL network.

2) Intrusion Detection: Network Intrusion Detection (NID) protects networks from malicious attacks by detecting software intrusions. Traditional NID schemes are mostly based on user signatures. However, this method needs the administration center to maintain a large number of user signatures. Currently, anomaly-based detection is extensively studied, which analyzes network activities and marks out abnormal data access as an intrusion. Since the ultimate goal of NID is to classify network traffic into many categories (i.e., normal traffic and various abnormal traffic types) according to numerous traffic features, it is an ideal choice to use DL approaches to detect network intrusions by learning traffic features [105].

Niyaz et al. [106] proposed a flow-based, Self-taught Learning (STL) [107] approach to detect network intrusion. In their scheme, the NSL-KDD dataset [108], [109], a benchmark for network intrusion, is used for training and testing. The network traffic provided by the NSL-KDD dataset includes normal flows and various anomalous flows, including Denial-of-Service (DoS) attack flow, Remote-to-Local (R2L) attack


Fig. 15. Deep Learning Networks Used by Intrusion Detection. (a) Deep Auto-encoder (b) Deep Boltzmann Machine (c) Deep Believe Network.

flow, User-to-Root (U2R) attack flow, Probe attack flow, etc. For each traffic flow, forty-one features are provided, including the average packet number per flow, average time duration per flow, protocol types (e.g., TCP, UDP), etc. The scheme in [106] chooses 22 out of the 41 features for the DL process, which consists of two stages: 1) an Unsupervised Feature Learning (UFL) process, which is based on a sparse auto-encoder, and 2) a supervised learning process with the goal of classification. An auto-encoder is a feedforward non-recurrent neural network with an input layer, an output layer, and one or several hidden layers, as shown in Fig. 15(a). Specifically, the node number of the input layer and the output layer is the same, which is larger than the node number of the hidden layer(s). The goal of the output layer is to reconstruct the input. Therefore, the cost function is composed of an average of sum-of-square errors upon all the inputs, a weight decay term to avoid over-fitting, and a sparsity penalty term to maintain low activation values. Using the trained DL network, the testing traffic is classified into two types, i.e., normal traffic and anomalous traffic. Simulations showed that the STL scheme achieves 88.39% accuracy for 2-class detection (normal and anomaly), and 79.1% accuracy for 5-class detection (normal and four different attack categories), which are higher than the accuracies achieved by the Soft-Max Regression (SMR) scheme.

To reduce the number of features the learning system uses, Tang et al. [110] proposed a DL-based NID scheme for software defined networking (SDN). The data used to train the learning network is also chosen from the NSL-KDD dataset. However, only six features of each traffic flow are considered in order to reduce the computation cost, i.e., the flow's duration, protocol type (e.g., TCP, UDP), number of data bytes from source to destination, number of data bytes from destination to source, number of connections to the same host, and number of connections to the same service. A deep neural network with three hidden layers is adopted. Although there are four malicious traffic types in the dataset, this scheme only classifies the traffic into two types (normal and anomalous), and it achieved an accuracy of 74.67%. Comparing the schemes in [106] and [110], it can be seen that the detection accuracy depends on the traffic features the system used for training and classification to a large extent, and one of the most challenging topics is to select appropriate features to balance the detection accuracy and computation cost.

Another type of DL network used for intrusion detection is the Deep Boltzmann Machine (DBM) [111], in which each node is bidirectionally connected with the nodes of other layers, as shown in Fig. 15(b). To decrease the computation cost of the gradient, the intra-layer links (the red dotted lines in Fig. 15(b)) are abandoned, yielding a Restricted Boltzmann Machine (RBM) [112]. As a matter of fact, in many real applications, the network detectors may not know what features the anomalous traffic possesses. Thus, Fiore et al. [113] proposed a Discriminative RBM (DRBM) based intrusion detection method, which is a semi-supervised learning system, i.e., the system is trained only by normal traffic data. The trained network is tested by real-world traffic collected from a workstation for 24 hours and by the KDD CUP 1999 dataset, respectively, both of which include normal and anomalous traffic. Simulation results show that, when the learning system is trained and tested with the real-world data, a high accuracy (about 94%) is obtained. However, when the DRBM is trained with the KDD dataset and tested with real-world data, the accuracy is as low as around 84%.

If we limit the node connections to only those between the layers, the DBM is transformed into a Deep Belief Network (DBN). Alternatively, a DBN can be formed by cascading a stack of Restricted Boltzmann Machines in serial. Furthermore, one or more additional layers can be added to perform classification after a supervised learning process. Therefore, a DBN is often pre-trained with unlabeled data (unsupervised learning) and then fine-tuned with labeled data (supervised learning), as shown in Fig. 15(c). Gao et al. [114] used a DBN to detect network intrusion, for which the learning network is trained with the KDD CUP 1999 dataset via three stages. The first stage pre-processes the data, for which the features of the traffic are digitized and normalized. The second stage pre-trains the DBN, i.e., the weights of a stack of RBMs are learned through an unsupervised greedy contrastive divergence algorithm. Finally, the weights of the entire DBN are fine-tuned through the backpropagation of error derivatives by the labeled data. In their simulations, 41 features of each KDD CUP 1999 traffic flow


TABLE V
COMPARISON OF DEEP LEARNING APPLICATIONS IN INTRUSION DETECTION

[Table body not recoverable from the source; its rows compare the intrusion detection schemes of [106], [110], [113], [114], and [116].]
are first mapped to 122 attributes, then several DBNs with different structures are established. Each DBN in their simulations has 122 input elements and 5 output elements (1 normal traffic and 4 different attack traffic types). However, the hidden layer structures are different, varying from as shallow as 122-5 (no hidden layer) to as deep as 122-150-90-50-5 (three hidden layers with 150, 90, and 50 nodes, respectively). Apparently, the deeper the DBN becomes, the better the detection accuracy that can be achieved. For the 122-150-90-50-5 DBN, the accuracy reaches 93.49%, outperforming SVM (86.82%) and NN (82.3%). Besides, Alom et al. [115] applied a similar method to the NSL-KDD dataset.

Using a similar DBN structure as shown in Fig. 15(c), Kang and Kang [116] proposed an intrusion detection scheme for the In-Vehicle Controller Area Network (CAN). Each CAN packet includes 12 bits of arbitration field, 6 bits of control field, 0-8 bytes of data field, etc. Kang and Kang's scheme utilizes the data field as the learning object. The data field is composed of mode information (such as controlling the wheels) and value information (such as the wheel angle) of the Electronics Control Unit (ECU), and they yield different bit distributions. Since there are different attack scenarios, the learning system first uses the mode information to identify the attack scenario, then trains the learning network for each scenario. The DBN has fewer than 64 input nodes (each node corresponds to a bit of the data field, but with a reduced number of bits considering the semantic redundancy), several hidden layers, and 2 output nodes (indicating normal and anomalous scenarios). In the testing phase, the attack scenario of each CAN packet is first determined by matching the mode information, then the corresponding trained model is used to determine whether the packet is normal or anomalous. Experiments show that the scheme achieves 97.8% accuracy, outperforming the Support Vector Machine (SVM) and Artificial Neural Network (ANN) approaches.

Discussion: A comparison of the DL-based intrusion detection schemes is shown in Table V. Note that for the output classification number, the number 2 indicates normal traffic and anomaly traffic, and the number n (n > 2) indicates normal traffic and n − 1 different anomaly traffic types. 1) The number of traffic types to be classified affects the detection accuracy. For instance, the scheme in [106] achieves 88.39% accuracy with two-type detection (normal and anomaly) and 79.1% accuracy with five-type detection (normal and four anomaly types). Thus, designers need to carefully balance the detection accuracy and the number of detection types. 2) How to select the network features as the input of the DL network. Many current DL-based intrusion detection schemes use the KDD dataset, which provides 41 features for each traffic flow. Apparently, using all 41 features for DL analysis yields a huge burden from the computation's point of view. Therefore, many NID schemes select a subset of features to detect intrusion. For instance, Niyaz et al.'s scheme [106] uses 22 features while Tang et al.'s scheme [110] uses only 6 features. As a consequence, Niyaz et al.'s scheme achieves 88.39% accuracy with two-type detection, while Tang et al.'s scheme yields 74.67% accuracy. 3) What dataset is used for DL network training. As we see from Table V, most current schemes use the NSL-KDD dataset, which is an improved version of the KDD Cup 99 dataset (proposed in 1999). The NSL-KDD dataset reduces some redundant records of the KDD Cup 99 dataset, which makes the sizes of the training set and testing set reasonable. The NSL-KDD dataset includes normal traffic and four attack categories, i.e., DoS, U2R, R2L, and probing. However, with the accelerated development of network attack techniques, new intrusion methods appear with astonishing speed. The DL models trained by the KDD dataset may yield deteriorated performance in detecting real-world data, as shown in [113].

VII. DL-BASED WIRELESS PLATFORM IMPLEMENTATION

There are abundant DL implementation methods, some of which have been applied in wireless networks. In the following, a summary of DL implementations is presented. These methods have been used in wireless testbeds.

• 1) MATLAB Neural Network Toolbox: This toolbox includes the most popular DL algorithms, such as ANN, CNN, DBN, SAE, and convolutional autoencoders (CAE). The input layer takes the original raw data. The hidden layers perform convolution, pooling, or ReLU functions upon the raw data. The convolution operation is composed of a set of convolutional filters, which
types. From the comparison we see that there are several extract certain features from the input data. The pooling
challenges in the topic. 1) How many intrusion types the operations perform the following operations: nonlinear
DL network detects. Apparently, the more types the network sampling upon the output of the convolutional filters,
detects, the lower accuracy will be given. For instance, reducing the dimensions of the parameters, and con-
Niyaz et al.’s scheme [106] achieves 88.39% accuracy with trolling the complexity of the deep learning network.
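The convolution, pooling, ReLU, and softmax operations described above can be illustrated with a minimal, dependency-free sketch. The 1-D input and filter values here are arbitrary toy numbers chosen for illustration, not taken from any of the surveyed schemes or from the toolbox itself:

```python
import math

def conv1d(x, w):
    """Valid-mode convolution: slide filter w over x, one dot product per step."""
    n = len(w)
    return [sum(xi * wi for xi, wi in zip(x[i:i + n], w))
            for i in range(len(x) - n + 1)]

def max_pool(x, size=2):
    """Nonlinear down-sampling: keep only the maximum of each window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

def relu(x):
    """Map negative values to zero and keep positive values."""
    return [max(0.0, v) for v in x]

def softmax(x):
    """Turn raw scores into a probability distribution for classification."""
    m = max(x)                        # subtract the max for numerical stability
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]

# Toy forward pass: raw data -> convolution -> pooling -> ReLU -> softmax
raw = [0.2, -1.0, 0.5, 3.0, -0.4, 1.5, 0.0, -2.0]
features = relu(max_pool(conv1d(raw, [1.0, -1.0])))
probs = softmax(features)             # class scores that sum to 1
```

Repeating the first three operations with trained filter weights is what extracts increasingly specific features; the softmax at the end corresponds to the classification step in the output layer.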

Authorized licensed use limited to: BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE. Downloaded on November 01,2020 at 07:47:46 UTC from IEEE Xplore. Restrictions apply.
2614 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 20, NO. 4, FOURTH QUARTER 2018

The ReLUs map negative values to zero and keep pos- For those applications, developers implemented the
itive values, thereby improving the training efficiency. algorithms by customized DL systems. The language
By repeating these three functions in the hidden layer used to develop these systems varies from C, C++,
and training the parameters of the functions, specific MATLAB, Python, to Java, depending on the features
features are extracted efficiently for the classification pur- of the learning system, the library used by the learning
pose. Then, the output layer performs classification upon process, and the communication patterns with the other
the features. A softmax function is typically adopted for simulators (such as wireless network simulators).
classification. In addition, there are lots of other popular deep learning
• 2) TensorFlow [117]: It is an open-source software orig- software, such as MXNet [120], developed by Distributed
inally developed by Google Brain team. TensorFlow is Machine Learning Community, Torch [121], Microsoft
written in Python, C++, and CUDA, and it is supported Cognitive Toolkit [122], developed by Microsoft Research, etc.
by Linux, macOS, Windows, and Android systems. It is However, few of them can be found in network applications.
a symbolic math library composed of nodes and edges. Table VI presents a comparison of deep learning software
The nodes in the graph represent mathematical opera- platforms used in wireless networks.
tions, and the edges represent the connections (tensors)
between nodes. TensorFlow is a flexible, flow-based pro- VIII. F UTURE R ESEARCH T RENDS
gramming model. Although it is originally developed
In order to help current researchers to identify unsolved
for conducting ML and deep neural network algorithms,
issues in this important field, in this section we will explain
TensorFlow is capable of conducting many other flow-
10 challenging issues and point out the future research
based implementations.
trends. Although those 10 issues do not represent all the
• 3) Caffe (Convolutional Architecture for Fast Feature
unsolved research topics on DL-based wireless networking,
Embedding) [43]: It is an open-source software tool,
they have long-term dominant impacts on today’s popular
and was developed by Berkeley AI Research (BAIR)
wireless infrastructures, including cognitive radio networks,
and community contributors. Caffe is a DL framework
software-defined networks, dew/cloud computing, big data
targeting image classification and segmentation. It has
networks, etc.
the features of expressive architecture, extensible code,
high speed, and modularity. Caffe supports CNN, RCNN,
LSTM and has fully connected neural network struc- A. (Challenge 1) DL for Transport Layer Optimizations
tures. There are a variety of functions to be chosen Congestion control is the main function of transport layer.
to build a DL network using Caffe, including con- However, the existing congestion control methods are mostly
volution, pooling, inner products, ReLU, logistic unit, based on the end-to-end ACK or NACK feedback to indi-
local response normalization, element-wise operations, rectly deduce the congestion occurrences. For example, TCP
softmax, and hinge. uses ACK feedback to infer the congestion event. The most
• 4) Theano (Keras) [118]: It an open-source Python accurate way is to directly analyze the queues in each node of
library that allows users to symbolically define, optimize, the routing path to pinpoint exactly which node’s queue has
and evaluate mathematical expressions such as multi- overflow event, which indicates the congestion in that node.
dimensional arrays. Users can use Theano to implement Apparently, a single node’s queue cannot reflect the conges-
and train NN models on fast concurrent graphics pro- tion distribution in the entire path. It is important to perform
cessing unit (GPU) architectures. The network is built multi-queue co-modeling between different nodes to detect
by apply nodes and variable nodes, which represent the ‘congestion propagation’. For example, one node may
mathematical operations and tensors, respectively. have very light congestion (i.e., overflow occurs sparsely) in
• 5) Keras [102]: It is an open-source neural network an earlier stage; however, multiple sparse congestions could
library that can run upon TensorFlow, CNTK, Theano, be accumulated into a serious congestion later on another
etc. Keras is originally developed by a Google engi- node. Multi-queue co-modeling can help in finding such an
neer, Francois Chollet. It provides neural network ele- accumulation pattern.
ments such as layers, objectives, optimizers, activation Particularly, DL can be used to perform large-scale network
functions, etc., and it supports convolutional networks, queueing analysis. Assume that each node reports their queue
recurrent networks, and the combination of the two types. status (such as size, input traffic rate, outgoing traffic rate,
In 2017, Google’s TensorFlow team decided to support etc.) to a central node via an out-of-band control channel, the
Keras in TensorFlow’s core library. central node can then run DL to analyze the queues’ data
• 6) WILL [119]: It is an open-source neural network accumulation status. For instance, it can find out whether the
toolkit created by Prevision Limited Company (Hong traffic gets accumulated in a particular node’s queue and may
Kong). It supports convolution, pooling layers, full- cause overflow with a high probability. DL can also help to
connection, and some popular functions such as reLU, find an optimal solution to relieve the congestion situation. For
sigmoid, tanh and softmax. example, it can find a node with relatively small queue size
• 7) Customized models: Many DL-based networks during most of the time, and that node may accept a higher
use special functions or neural network structures, incoming traffic rate; or, it may find another set of nodes near
and some systems run upon embedded systems. the RED zone (i.e., a network area with congested queues), to

MAO et al.: DL FOR INTELLIGENT WIRELESS NETWORKS: COMPREHENSIVE SURVEY 2615

TABLE VI
COMPARISON OF DEEP LEARNING SOFTWARE PLATFORMS USED IN WIRELESS NETWORKS
[Table body not recoverable from the source text; only stray reference numbers survived extraction.]

establish a back-up path to divert the congested traffic from the main path.
Many interesting research issues can be investigated in the above scenario. For example, how do we come up with a time-evolving DL algorithm to detect the multi-queue evolution pattern? How do we define the congestion threshold, i.e., what queue size is a good indication of a RED zone? How do we integrate the congestion control scheme with the back-up routing path establishment protocol? If the congestion occurs in multiple nodes which are not neighbors, how do we control the traffic rates in the congested/non-congested nodes to achieve a smooth flow in the entire path? etc.

B. (Challenge 2) Using DL to Facilitate Big Data Transmissions

Today there are many big data applications, such as large-scale smart city monitoring, national healthcare management, air pollution monitoring, etc. Wireless transmission of big data is necessary in remote sensing and in harsh environments without the deployment of wires. For example, in a large city, numerous mobile phones can send their data (such as user trajectories, user behaviors, patient healthcare data, etc.) to nearby wireless base stations (which can be Wi-Fi access points, cellular network towers, 5G routers, etc.), and eventually reach the cloud servers.
The transmission of big data is a challenging task for the following 3 reasons: First, there are no standards/protocols to specify the wireless network operations that can efficiently deliver >100T bits of data per second. Second, existing routing protocols cannot provide a 'thick' data pipe to concurrently deliver >1T packets each second. Third, the network status is extremely difficult to monitor in real-time, due to the huge traffic density within a very short time (recall that big data has the 4Vs features, i.e., Volume, Velocity, Veracity, and Variety).
DL is a promising method to analyze big data transmission dynamics in terms of routing delay analysis of big flows, traffic balancing among nodes, and link access control. For example, we can use DL to analyze the spatio-temporal patterns of huge traffic in each hop, and find out the hot spots of the network with the largest amount of big data traffic. By comparing hop-to-hop big traffic delivery delays, we get to know the average link quality/stability in each hop, and can then determine the traffic allocation in different links to avoid possible traffic bottlenecks.
Again, many research issues need to be investigated. For example, how does the DL help to build/maintain a thick routing pipe that can deliver >1T packets per second? How do we apply DL to predict the link failure in some hops? What are the appropriate MAC parameters (such as backoff window size, time slot length, RTS/CTS timing, etc.) to adapt to the QoS requirements for the huge velocity/volume of big traffic among a group of neighboring nodes? and so on.

C. (Challenge 3) DL-Based Network Swarming

There are many interesting wireless network swarming applications. A typical example is UAV swarming, i.e., a large number of UAVs establish various network topologies to achieve different missions (such as environment monitoring, enemy hunting, forest fire control, etc.), as shown in Fig. 16. Other application scenarios include swarming by undersea vehicles to explore sea resources, collaborative war-fighting robots, etc.

Fig. 16. BW-NQ-DNN Framework.

A challenging issue in swarming control is node placement based on both mission and communication requirements. In the beginning of the network deployment, all nodes are randomly distributed. Those nodes may have different density distributions (some places have more densely distributed nodes while other places may be sparse). Then the issue is: how do we guide each node's mobility trajectory to achieve two purposes: (1) forming the desired swarming shape; (2) maintaining a good communication architecture? Note that some


nodes may have stronger communication capability than others; for example, they may have more powerful directional antennas, higher-speed radio links, faster CPUs, etc. Those nodes may be placed into the "bridging" positions in the formation (such as the locations of A and B in Fig. 17). Those bridging places play critical roles in swarming shape maintenance as well as communication connectivity enhancement. Once they are broken, different swarming regions become isolated.

Fig. 17. BW-NQ-DNN Framework.

The entire swarming network may have different cluster distributions, i.e., some clusters have more nodes (higher density) than the others. This makes node placement a difficult task. What types of nodes should be moved to different clusters or between those clusters? DL could be used here to describe the topology profile, such as node communication/computation capacity, distance to the desired swarming position (for each node), mobility speed, etc. Then the bridging points can be weighted (i.e., their importance levels determined) based on their locations between the different types of clusters. The nodes can be dispatched to those places based on the topology profile.

D. (Challenge 4) Pairing DL With Software-Defined Network (SDN)

The software-defined network (SDN) has become a promising networking scheme due to its greatly simplified routing/switching management via centralized control. It adopts separate control plane (CP) and data plane (DP) management. The CP consists of one or multiple controllers that send the traffic forwarding control rules to the DP. The flow table(s) in the DP accept those rules to perform data forwarding functions. In the SDN, the conventional vendor-specific routers/switches have been replaced by universal, simple data forwarding devices (called switches) in the DP. The DP does not run protocol-specific data link/physical layer protocols. Instead, the switches simply interpret the flow table rules and use them to forward the packets to the next switch in the DP.
From the above SDN features we can see that DL and SDN could be "a match made in heaven" for two reasons: (1) The CP has a global network monitoring function. This is because each switch/router in the DP can feed back its data forwarding performance (such as queueing delay, link rate, packet loss rate, etc.) to the CP in real-time. Thus the CP is able to build the global network profile. (2) The CP has centralized control. DL can be executed in the CP controller(s) to analyze the network profile and determine the rule changes of the flow table in the DP.
Some research issues exist in the above scenario: First, we need to solve the network profile formation issue. What types of parameters need to be collected from the DP? What type of big data architecture is suitable for network profile description, e.g., big tensor, big graph, or big time series? What purposes should the DL be used for, e.g., routing decisions, queue control, or schedule control? How do multiple CP controllers coordinate with each other to achieve a consistent view of the entire network? and many other issues.

E. (Challenge 5) Distributed DL Implementation in Wireless Nodes

Although DL is a powerful scheme to extract the network dynamics/patterns, it may bring much burden if it runs in a single network node. By decreasing the DL algorithm complexity, we can relieve the overhead of the running node. Another efficient method is to distribute the DL computation load to multiple nodes, i.e., to use a distributed DL implementation.
Some challenging issues need to be solved when using a distributed DL model. First, which parts of the DL algorithm can be decomposed into distributed tasks? The gradient-based neural network layer weight updating/error propagation can be outsourced to a group of neighboring nodes based on the proper allocation of "neural cells" to different nodes. Second, how do different wireless nodes exchange the input parameters and output data (i.e., calculation results) with each other via MAC protocols? Note that such a MAC should minimize channel access collisions by using either well-scheduled RTS/CTS exchanges or TDMA-based transmissions. Third, which node is responsible for the final DL output layer assembly? How does this node ensure that the distributed algorithm converges to a stable result?

F. (Challenge 6) DL-Based Cross-Layer Design

While the above sections have covered different individual layers in terms of DL applications, cross-layer design may be a more efficient approach to fully exploit all the layers' information for long-term network performance optimization. As a matter of fact, each layer could provide many valuable performance metrics (some listed in Table VII) which can be used to achieve cross-layer global network performance optimization. For example, the channel quality, antenna beam orientation, node mobility, etc., can be used to determine the traffic allocation in each beam of the antenna. The routing layer hop-by-hop delay and packet loss rate can be used to determine the transport layer congestion window size. The MAC layer one-hop link access success/failure information can be used to determine the routing path selection, and so on.
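Aggregating such heterogeneous per-layer metrics into a single DL input is itself a design step. The sketch below shows one simple way to do it; the parameter names, units, and normalization ranges are illustrative assumptions for this example, not a standard parameter set:

```python
# Illustrative cross-layer metrics reported by one node; the names and
# units are assumptions made for this sketch.
sample = {
    "phy_channel_quality": 0.82,   # normalized PHY channel-quality score
    "mac_access_success": 0.91,    # MAC one-hop link access success ratio
    "net_hop_delay_ms": 12.0,      # routing-layer per-hop delay
    "net_packet_loss": 0.03,       # routing-layer packet loss rate
    "tsp_cwnd_pkts": 48.0,         # transport-layer congestion window
}

# Fixed feature order plus assumed min/max ranges, so every metric lands
# in [0, 1] before entering the same DL input layer.
FEATURES = [
    ("phy_channel_quality", 0.0, 1.0),
    ("mac_access_success", 0.0, 1.0),
    ("net_hop_delay_ms", 0.0, 100.0),
    ("net_packet_loss", 0.0, 1.0),
    ("tsp_cwnd_pkts", 1.0, 256.0),
]

def fuse(record):
    """Min-max normalize each layer's metric into one flat input vector."""
    vec = []
    for name, lo, hi in FEATURES:
        v = (record.get(name, lo) - lo) / (hi - lo)  # missing metric -> 0.0
        vec.append(min(1.0, max(0.0, v)))            # clamp outliers
    return vec

x = fuse(sample)  # ready for a DL input layer of width len(FEATURES)
```

Tolerating missing metrics (here mapped to the range minimum) matters in practice, since relay nodes may not report their status in time, as Challenge 10 below discusses.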


TABLE VII
NETWORK PARAMETERS CONSIDERED IN CROSS-LAYER DESIGN
[Table body not recoverable from the source text.]
DL is a perfect tool to fuse the above various cross-layer metrics and extract the intrinsic network patterns for protocol optimization. Using the big tensor concept, we can carefully arrange the above metrics into multi-type tensor records, and then apply tensor decomposition to extract the essential patterns. Those patterns can tell us whether the network will have significant packet loss in the near future, and can classify the network topology into hotspots and light-traffic areas. The patterns can also indicate the link interference distribution across the whole network, and help us to avoid the high-interference areas.
Based on the DL pattern extraction results, different layers should cooperate with each other to perform cross-layer optimization. For example, if the DL indicates that a group of nodes forms a high-packet-loss "dark hole", the MAC layer should use much stronger FEC to overcome the bit errors in that area; the routing layer can re-establish a new path to detour around such a hole; and the transport layer can use a much smaller congestion control window size.

G. (Challenge 7) DL-Based Application Layer Enhancement

So far, we have not discussed much on DL-based application layer (AL) enhancement, since this survey focuses on the core network protocol design issues. However, the AL has significant impacts on the other layers from the mission requirements viewpoint. For example, if our mission is to deliver an HDTV flow to multicast users, the AL should specify all the QoS and QoE (quality of experience) requirements for the video stream. Then the lower layers can take those QoS/QoE parameters as the performance goal and adjust their corresponding protocol operations.
DL can be used to learn the network status based on the collected network performance parameters (see Table VII). Then DL outputs the suggested performance goal change to the AL. For example, if the network can only provide >100ms of end-to-end delay, it will suggest that the AL use different video coding methods to meet the network limits.
Additionally, DL can be directly used to improve AL performance. For example, it can be used to analyze the webpage display performance (refresh rate, display speed, image resolution, etc.). It can also be used to perform cyber security analysis to detect spam emails and malicious Web sites.
The challenging issue here is to define a low-complexity DL model based on the AL performance goal or application profile data, and to solve the DL problem to generate a series of useful results that can be interpreted by the lower layers for protocol operation control purposes. For example, if the AL has a video streaming application, how do we define the AL model to translate the QoS/QoE performance goals into concrete congestion control and routing parameters? How does the AL classify different applications into various cross-layer protocol design options? etc.

H. (Challenge 8) DL-Based Dew-Fog-Cloud Computing Security

We have summarized the application of DL algorithms for network security, such as intrusion detection. This is a critical field and will continue to attract much research interest, since numerous new attacks keep coming up. Here we would like to point out that DL will play an important role in the security of a promising network infrastructure called dew/fog/cloud computing (DFC-C). This new network architecture has the following two important features: (1) Collecting data via dew computing units: The large number of dew computing devices (such as sensors, RFID chips, etc.) can be deployed everywhere, and get connected via wireless networks (such


as Zigbee-based systems). The fog computing infrastructure consists of a series of long-distance wireless relays, such as Wi-Max nodes or cell phone towers, to deliver the aggregated dew computing data to any cloud server.
From a security viewpoint, the above dew-fog-cloud architecture exposes many attack opportunities to adversaries. For example, one can "pollute" part of the dew computing data by falsifying the sensor data, or mislead the routing path selection in the fog computing segments by claiming a better path, and so on.
To handle the large-scale dew computing sources and the concurrent fog computing routing topology, DL is a natural choice to parse all the wireless node/link parameters and deduce the possible attack positions and types. For example, we can use all the dew nodes' sensor data as the samples, and run a DL-based data clustering test to find a potential data sample poisoning attack. The challenging issue here is to clearly define a DL model with the proper input/output layer interpretations based on a particular network security problem. Different security/privacy problems mean that the DL should have different input/output/gradient parameter updating structures. For example, privacy preservation emphasizes the protection of the sensitive data attributes (such as patients' names), and various ID-hiding models can be used to define the DL gradient weight updating process.

I. (Challenge 9) From DL to DRL: Applications for Cognitive Radio Network Control

DL focuses on "passive" data learning to recognize the intrinsic patterns hidden in the data. However, it does not have concrete "reactions" for each of the extracted data patterns. Deep reinforcement learning uses Markov decision models to guide the choices of different "actions" based on the state transition models. Therefore, in many practical applications DRL plays a more important role than DL algorithms.
Here we emphasize the benefits of DRL for cognitive radio network (CRN) control. The CRN is an important type of wireless network due to its flexible spectrum access, i.e., the nodes can grab any available (free) channel to send out data, and can timely vacate the channel if the primary user (PU) comes back to use the channel again.
DRL can be used to control the following CRN operations: (1) Spectrum sensing: the DRL model can be used to determine the channel scanning order. Channels with a higher chance of being idle should be scanned first. Note that spectrum scanning is a time-consuming process if thousands of channels need to be scanned and analyzed. By first checking the likely-free channels, we can save spectrum sensing time. (2) Spectrum handoff: When the PU comes back, the node should switch to another channel. The channel switching timing, and which channel to switch to, are two critical issues to be solved. Should we wait for the PU's arrival event to decide the channel switching operation, or can we predict the timing of the PU's transmissions and search for backup channels beforehand? Obviously, the latter gives a better communication quality and can avoid some packet loss events.
Proper DRL models need to be clearly defined based on the various CRN operation requirements. For example, the DRL may need to be integrated with queuing models to determine the spectrum handoff delay, i.e., how long a user can occupy the existing channel based on the PU traffic analysis, when the user should start to look for a new channel, and so on.

J. (Challenge 10) Efficient DL/DRL Implementations in Practical Wireless Platforms

The above DL/DRL algorithms eventually need to be implemented in practical wireless network products. However, the pure theoretical understandings cannot simply be programmed into wireless devices, due to the following challenges:
(1) Difficulty in collecting network parameters for DL input layers: All DL algorithms require training and testing phases. In each phase, the input layer of the deep neural network consists of the data samples' parameters. The more complete the samples are (in terms of data attributes), the more accurately the DL can recognize the network features. Many network parameters come from the MAC and routing layers, which involve many relay nodes' responses. However, those nodes may not provide fast feedback about their communication status, due to unpredictable link delays and radio interference. Therefore, the DL models should be designed to tolerate certain missing parameters or data errors in the input layers.
(2) The resource limits of the wireless devices: Many wireless products have limited memory and CPU capabilities. They do not allow complex algorithms to be programmed into their existing protocols. Since DL has an iterative execution nature, it may elongate the system response time. The DL algorithms should minimize the intermediate computation parameters to save memory space, and the algorithms should be optimized to reduce the execution time.
(3) Incomplete training sample collections: DL requires complete or nearly complete training samples to accurately recognize the network patterns. However, the training samples may be very limited, due to the difficulty of collecting so many data points for each possible network status. This requires that DL should have the capability of adding new samples after a failure to recognize a new pattern. The newly added samples can improve the accuracy of the DL models.
In addition, the network engineers/programmers should carefully define the DL data formats, since different network parameters have very different data attributes and formatting requirements. Proper numerical representations and data normalization methods should be defined clearly to aggregate multiple network parameters into the same DL input layer.

IX. CONCLUSION

This paper has comprehensively reviewed the methodologies of applying DL schemes for wireless network performance enhancement. In a nutshell, (1) DL/DRL is very useful for intelligent wireless network management due to its human-brain-like pattern recognition capability. With the hardware performance improvement of today's wireless products, its adoption becomes easier. (2) It plays critical roles in


multiple protocol layers. We have summarized its applications [18] M. A. Alsheikh, S. Lin, D. Niyato, and H.-P. Tan, “Machine learning
in physical, MAC and routing layers. It makes the network in wireless sensor networks: Algorithms, strategies, and applica-
tions,” IEEE Commun. Surveys Tuts., vol. 16, no. 4, pp. 1996–2018,
more intelligently realize the change of the entire topology and 4th Quart., 2014.
link conditions, and helps to generate more appropriate pro- [19] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, “Machine
tocol parameter controls. (3) It can be integrated with today’s learning for wireless networks with artificial Intelligence: A tutorial on
neural networks,” eprint arXiv: 1710.02913, Oct. 2017.
various wireless networking schemes, including CRNs, SDNs, [20] J. Chen, U. Yatnalli, and D. Gesbert, “Learning radio maps for UAV-
etc., to achieve either centralized or distributed resource allo- aided wireless networks: A segmented regression approach,” in Proc.
cation and traffic balancing functions. This article also lists IEE Int. Conf. Commun. (ICC) Signal Process. Commun. Symp., Paris,
France, May 2017, pp. 1–6.
ten important research issues that need to be solved in the [21] Y. Xiao, Z. Han, D. Niyato, and C. Yuen, “Bayesian reinforcement
near future in this field. They cover some promising wireless learning for energy harvesting communication systems with uncer-
applications such as network swarming, CRN spectrum hand- tainty,” in Proc. IEEE Int. Conf. Commun. (ICC) Next Gener. Netw.
Symp., London, U.K., Jun. 2015, pp. 5398–5403.
off, SDN flow table update, dew/fog computing security, etc. [22] M. Bennis and D. Niyato, “A Q-learning based approach to interference
This paper will help readers to understand the state-of-the-art avoidance in self-organized femtocell networks,” in Proc. IEEE Glob.
of DL-enhanced wireless networking protocols and find some Commun. Conf. (GLOBECOM) Workshops Femtocell Netw., Miami,
FL, USA, Dec. 2010, pp. 706–710.
interesting and challenging research topics to pursue in this [23] M. Chen et al., “Caching in the sky: Proactive deployment of cache-
critical field. enabled unmanned aerial vehicles for optimized quality-of-experience,”
IEEE J. Sel. Areas Commun., vol. 35, no. 5, pp. 1046–1061, May 2017.
[24] M. Chen, W. Saad, C. Yin, and M. Debbah, “Echo state networks for
proactive caching in cloud-based radio access networks with mobile
Qian Mao received the B.S. degree from the Nanjing University of Aeronautics and Astronautics, Jiangsu, China, in 2000, the M.E. degree from Shanghai Ship and Shipping Research Institute, Shanghai, China, in 2003, and the Ph.D. degree in traffic information engineering and control from Tongji University, Shanghai, in 2006. She is currently pursuing the Ph.D. degree with the University of Alabama, AL, USA. She was an Assistant Professor with the University of Shanghai for Science and Technology from 2006 to 2015. Her research interests include big data, cyber-physical system security, deep learning, and wireless networks.

Fei Hu (M'01) received the first Ph.D. degree in signal processing from Tongji University, Shanghai, China, in 1999, and the second Ph.D. degree in electrical and computer engineering from Clarkson University, New York, NY, USA, in 2002. He is currently a Professor with the Department of Electrical and Computer Engineering, University of Alabama, Tuscaloosa, AL, USA. He has published over 200 journal/conference papers and book chapters in the field of wireless networks and machine learning. His research interests are wireless networks, machine learning, big data, network security, and their applications. His research has been supported by the U.S. NSF, DoE, DoD, Cisco, and Sprint.

Qi Hao received the B.E. and M.E. degrees in electrical and computer engineering from Shanghai Jiao Tong University, Shanghai, China, in 1994 and 1997, respectively, and the Ph.D. degree in electrical and computer engineering from Duke University, Durham, NC, USA, in 2006. He was a Post-Doctoral Fellow with the Center for Visualization and Virtual Environment, University of Kentucky, Lexington, KY, USA. From 2007 to 2014, he was an Assistant Professor with the Department of Electrical and Computer Engineering, University of Alabama, Tuscaloosa, AL, USA. He is currently an Associate Professor with the Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China. His current research interests include smart sensors, machine learning, and autonomous systems.