Deep Learning for Intelligent Wireless Networks: A Comprehensive Survey

Abstract—As a promising machine learning tool for accurate pattern recognition from complex raw data, deep learning (DL) is becoming a powerful method to add intelligence to wireless networks with large-scale topology and complex radio conditions. DL uses many neural network layers to achieve a brain-like acute feature extraction from high-dimensional raw data. It can be used to find the network dynamics (such as hotspots, interference distribution, congestion points, traffic bottlenecks, spectrum availability, etc.) based on the analysis of a large amount of network parameters (such as delay, loss rate, link signal-to-noise ratio, etc.). Therefore, DL can analyze extremely complex wireless networks with many nodes and dynamic link quality. This paper performs a comprehensive survey of the applications of DL algorithms for different network layers, including physical layer modulation/coding, data link layer access control/resource allocation, and routing layer path search and traffic balancing. The use of DL to enhance other network functions, such as network security, sensing data compression, etc., is also discussed. Moreover, the challenging unsolved research issues in this field are discussed in detail, which represent the future research trends of DL-based wireless networks. This paper can help the readers to deeply understand the state-of-the-art of DL-based wireless network designs and select interesting unsolved issues to pursue in their research.

Index Terms—Wireless networks, deep learning (DL), deep reinforcement learning (DRL), protocol layers, performance optimization.

I. INTRODUCTION

HUMAN brains possess powerful data processing capabilities. Every day we confront numerous data from the external world. Under a complex environment, a large number of object features are first collected by our sense organs. Then the brain extracts the abstract characteristics from those feature data and finally makes a decision. In many fields computers have already shown comparable or even more powerful capabilities than human beings, such as game playing, auto control, and voice and image recognition. The approach for the computer to achieve these abilities is very similar to what the human brain does, and it has been developed into an eye-catching technology, i.e., Deep Learning (DL) [1]. In the DL process, computers first need to learn from experiences and build up a certain training model. This training process allows computers to determine appropriate weight values between neural nodes, which are able to extract the features from the input data. Once the neural network has been trained, an appropriate decision can be made to achieve a high reward. This idea has shown great success in many real-world control scenarios, such as voice recognition [2], [3], image recognition [4]–[7], semantic analysis [8], [9], language interpretation [10], [11], game control [12], drug discovery [13], and biomedical sciences [14]–[16].

DL is a subclass of machine learning which uses cascaded layers to extract features from the input data and eventually forms a decision. The application of DL should consider four aspects: (1) how to represent the state of the environment in suitable numerical formats, which will be taken as the input layer of the DL network; (2) how to represent/interpret the recognition results, i.e., the physical meaning of the output layer of the DL network; (3) how to compute/update the reward value, and what reward function can properly guide the iterative weight updating in each neural layer; and (4) the structure of the DL system, including how many hidden layers to use, the structure of each layer, and the connections between layers.

Currently, many DL systems are tied to Reinforcement Learning (RL) models [17], which comprise three parts: 1) an environment, which can be described by some features; 2) an agent, which takes actions to change the environment; and 3) an interpreter, which announces the current state and the action the agent takes. Meanwhile, the interpreter announces the reward after the action takes effect in the environment, as shown in Fig. 1. The goal of RL is to train the agent in such a way that, for a given environment state, it chooses the optimal action that yields the highest reward. Therefore, one of the main differences between DL and RL is that the former usually learns from examples (e.g., training data) to create a model to classify data, whereas the latter trains the model by maximizing the reward associated with different actions.

DL has already shown astonishing capabilities in dealing with many real-world scenarios, such as the success of AlphaGo, face recognition on mobile phones, etc. Researchers in computer network areas have also cast strong interest in DL applications. By using a DL model, the complex network environment can be represented, abstract features can be obtained, and a better decision can be achieved finally for the computer

Manuscript received January 20, 2018; revised April 20, 2018; accepted June 5, 2018. Date of publication June 12, 2018; date of current version November 19, 2018. This work was supported in part by the Science and Technology Innovation Commission of Shenzhen under Grant CKFW2016041415372174, and in part by the National Natural Science Foundation of China under Grant 61773197. (Corresponding author: Qi Hao.)
Q. Mao and F. Hu are with the Department of Electrical and Computer Engineering, University of Alabama, Tuscaloosa, AL 35401 USA (e-mail: [email protected]; [email protected]).
Q. Hao is with the Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China.
Digital Object Identifier 10.1109/COMST.2018.2846401
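To make the environment-agent-interpreter loop of Fig. 1 concrete, the following minimal Python sketch runs a few interaction rounds. The class names, the toy two-element state, the random action choice, and the reward definition are illustrative assumptions for this sketch only and are not taken from any of the surveyed schemes.

```python
import random

class Environment:
    """Toy environment whose state is a two-element feature vector (illustrative)."""
    def __init__(self):
        self.state = [0.0, 0.0]

    def apply(self, action):
        # The action changes the environment; the interpreter observes the effect.
        self.state[action] += 1.0

class Interpreter:
    """Announces the current state and the reward after an action takes effect."""
    def observe(self, env):
        return list(env.state)

    def reward(self, env):
        # Illustrative reward: prefer a balanced state.
        return -abs(env.state[0] - env.state[1])

class Agent:
    def choose_action(self, state):
        # A trained agent would pick the action with the highest expected reward;
        # a random choice stands in for that policy here.
        return random.choice([0, 1])

env, interp, agent = Environment(), Interpreter(), Agent()
for step in range(5):
    state = interp.observe(env)           # interpreter announces the state
    action = agent.choose_action(state)   # agent acts on the environment
    env.apply(action)
    r = interp.reward(env)                # interpreter announces the reward
    print(step, state, action, r)
```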
(2) DL advantages in security and other network functions: Besides the above protocol stack, we will also discuss the advantages of using DL in other network functions. One critical area is security and privacy protection. Today, intrusion detection becomes more challenging due to the increase of network scale and the huge amount of traffic passing through the attack detectors/filters. DL is an ideal tool to perform large-scale network profile analysis to detect potential intrusion events. We will explain how DL can be used to classify packets into benign/malicious types, and how it can be integrated with other machine learning schemes, such as unsupervised clustering, to achieve a better anomaly detection effect.

(3) Future trends: Since this field is still far from maturity and many issues are not solved yet, we will introduce 10 challenging problems on the use of DL to enhance some of the popular wireless networks, such as cognitive radio networks (CRNs), software-defined networks (SDNs), dew/fog computing, etc. We will provide the context, motivation, problem statement, and concrete unsolved issues for each of those 10 problems. They are helpful to readers who are seeking new research directions.

Roadmap: The rest of this paper is organized as follows: In Section II, to prepare for the discussions of DL applications for wireless network functions, we first explain the fundamental math models of DL, including its relations with general machine learning and the graph-based learning framework. Then we move to the discussions of DL-based physical layer enhancements in terms of signal interference and modulation classification in Section III. Section IV discusses the importance of DL in data link layer design. Some typical MAC design examples are explained with DL-based enhancements. In Section V the DL-based routing layer operations, such as path establishment and optimization, are described. The utilizations of DL for security and other network functions are discussed in Section VI. Section VII summarizes some DL implementation platforms that have been extensively used in wireless network research. Ten challenging research issues to be solved next are stated in Section VIII, followed by the concluding remarks in Section IX.

II. FUNDAMENTALS OF DEEP LEARNING

DL originated from Machine Learning (ML). In this section, we first analyze the differences and relationship between those two techniques. Then, a brief introduction to DL principles is presented.

A. From Machine Learning to Deep Learning

Both ML and DL solve real-world problems with neural networks. A typical ML system is composed of three parts: 1) the input layer, which takes pre-processed data as the system input. The features of the real-world data (e.g., pixel values, shape, texture, etc.) need to be pre-processed and identified by humans so that the ML system can deal with them. 2) The feature extraction and processing layer, in which a single layer of data processing is used to extract the data patterns. Currently, Support Vector Machines (SVM), Principal Component Analysis (PCA), Hidden Markov Models (HMM), etc., are extensively used for feature extraction. 3) The output layer, which spills out the results of classification, regression, clustering, density estimation, or dimensionality reduction, depending on the task of the ML model. The schematic structure of ML is shown in Fig. 3(a).

The original data input into the learning system could be quite diverse, varying from natural information such as image, audio, and video to various quantitative event descriptions. Although the input of the learning system may be different, the core data learning module requires that the input data have a uniform form, based on which the input events are classified. Therefore, to enable the learning process to "recognize" the input data, the original natural data needs to be pre-processed, i.e., the raw data needs to be transformed into a suitable representation or feature vector which can be accepted by the ML classification system. This pre-processing needs to be carefully designed in such a way that the features of the original natural data related to classification are well preserved, and the classification accuracy is significantly affected by the data pre-processing schemes.

Machine learning systems usually have only one hidden layer between the input and output layers. This type of learning
can be handcrafted, or the environment's state space is low-dimensional and can be fully observed, the performance of reinforcement learning is limited. Furthermore, reinforcement learning tends to be unstable or even diverge when a nonlinear function approximator is used to represent the reward. To solve these problems, the deep Q-network (DQN) was proposed, which employs two novel strategies to overcome the instability problem of deep learning, i.e., experience replay and iterative update [27], [28]. In DQN, the agent interacts with the environment through a sequence of actions, with the goal

instant t is to take an action that maximizes the future rewards with the current observation representation and policy π, i.e., max_π E(R_t | s_t = s, a_t = a, π), which can be represented as the following equation,

Q*(s, a) = E_{s'}[ r + γ max_{a'} Q*(s', a') | s, a ],   (6)

where s' is the observation of the next instance and a is the action taken at the current instance. In practice, an approximator is used to estimate the action reward, i.e., using Q(s, a, θ_i^-) to replace Q*(s, a), where θ_i^- denotes the weights of the neural network in an iteration before i (for instance, θ_i^- = θ_{i-1}). Thus, an approximate estimated reward value is

y = r + γ max_{a'} Q(s', a', θ_i^-).   (7)

For each round of the Q-network training, say round i, the training is implemented by adjusting the weights with the aim of reducing the mean square error of (7).

TABLE I
PARAMETERS OF DEEP LEARNING SCHEME FOR ANTI-JAMMING

C. Deep Learning for Graph-Structured Data

In many practical applications the data often has structured features, i.e., nodes are connected with each other spatially or temporally, or both. For instance, when predicting the behavior of a person in the kitchen, the interactions between the person and the appliances are connected spatially or temporally. Under such a circumstance, by considering the spatio-temporal relations among nodes in the DL framework, we can use the graph-based structure to achieve promising performance [29].

Currently, graph-structured data is usually generalized by Convolutional Neural Networks (CNNs). A CNN is a sequence of layers, each of which transforms one volume of activations to another through a differentiable function. Usually there are three main types of layers in a CNN architecture, i.e., the convolutional layer, the pooling layer, and the fully connected layer. The main difference between a graph-oriented CNN and a regular CNN is that the former builds graphs for each neural node of the learning system, which is achieved by selecting neighbors and determining the connection weights with the neighbors for each real-world node, as shown in Fig. 6(a) and (b). To represent graphs in DL models, the input data is denoted as vertices and edges, i.e., G = (V, E, A), where V represents the set of vertices, E represents the set of edges, and A is the weighted adjacency matrix. Then the graph is input into the learning system monolithically.

Graph DL can be conducted in either the spectral domain or the spatial domain [30]. The spatial approach generalizes the CNN using the graph's spatial structure, capturing the essence of the convolution as an inner product of the parameters with spatially close neighbors, as shown in Fig. 6(b). Bruna et al. [31] use multi-scale clustering to define the network architecture, in which the convolutions are defined for each cluster. However, the spatial approaches tend to have difficulties in finding a shift-invariant convolution for non-grid data. To overcome this problem, Hechtlinger et al. [32] proposed a spatial CNN, which uses the relative distance between nodes. Assume G = (V, E) is a graph, where
TABLE II
COMPARISON OF DEEP LEARNING APPLICATIONS IN ERROR CORRECTING CODES AND SIGNAL DETECTION
TABLE III
PARAMETERS OF DEEP Q-LEARNING SCHEME FOR RESOURCE ALLOCATION
the unlicensed channels. Finally, an action decoder interprets the abstract vector into multiple predicted action sequences for the SBSs. For an SBS j, the goal is to maximize the total throughput, u_j, during its allocated airtime with the selected channel C and the time window T, i.e.,

u_j(a_j, a_{-j}) = Σ_{t=1}^{T} Σ_{c=1}^{C} α_{j,c,t} γ_{j,c,t},   (8)

where a_j denotes the action vector of SBS j, a_{-j} denotes the action vector of all other SBSs except j, α_{j,c,t} is the achievable airtime fraction of SBS j on channel c in time epoch t, and γ_{j,c,t} is a channel-related parameter. To achieve the optimization goal, the RL algorithm is used to train the weights of the traffic encoder and the action decoder, for which the reward is defined as the approximation of the SBS's throughput, û_j(a_j, a_{-j}). By maximizing the expected reward û_j(a_j, a_{-j}) according to the gradient with respect to the policy parameters, the weights of the RL neural network can be trained [54], [55]. The simulations were run upon the dataset provided by [56]. Compared to the traditional reactive allocation approaches, this scheme increases the average airtime allocated for LTE-U by around 18%.

The Cloud Radio Access Network (RAN) is proposed for future cellular networks, e.g., 5G, as a centralized, cloud-computing-based radio access network. In a cloud RAN, there is a central Base Band Unit (BBU) pool in the cloud and many distributed Remote Radio Heads (RRHs) near the users. The RRHs only maintain basic transmission functions, compressing and forwarding users' radio signals to the BBUs via fronthaul links. The resource allocation problem, i.e., how to minimize the power consumption of the RRHs while satisfying users' demands, has become one of the main tasks in cloud RANs. To tackle this problem, Xu et al. [57] proposed a DL-based scheme for power-efficient resource allocation in cloud RANs. In this scheme, the decisions are made in two steps: first, a deep Q-learning algorithm determines which RRHs should be turned on or switched into sleep status; second, a convex optimization algorithm calculates the beamforming weights from the RRHs to the users for all the active RRHs. The parameters of the first step, i.e., the Q-learning operation, are shown in Table III. The state representation of time slot t is s_t = (m_1, m_2, ..., m_R, d_1, d_2, ..., d_U), where m_i ∈ {0, 1} denotes whether the i-th RRH is active or asleep, R is the total number of RRHs, and d_j ∈ [d_min, d_max] represents the demand of the j-th user. In each time slot, the DRL agent determines which RRHs are active. The immediate reward is defined as the gap between the maximum possible power consumption P_max and the actual power consumption, i.e., P_max − P(A, S, G), where the actual power consumption P(A, S, G) comprises the actual power consumption and the transition power amount (due to sleep/active switching), i.e.,

P(A, S, G) = Σ_{r∈A} Σ_{u∈U} (1/η_l) |w_{r,u}|^2 + Σ_{r∈A} P_{r,active} + Σ_{r∈S} P_{r,sleep} + Σ_{r∈G} P_{r,transition},   (9)

where w_{r,u} is the beamforming weight from RRH r to user u, η_l is the drain efficiency constant of the power amplifier, A, S and G represent the sets of active, sleeping and transitioning RRHs, respectively, U is the user set, and P_{r,active}, P_{r,sleep} and P_{r,transition} are RRH r's power consumptions in the active, sleep and transition modes, respectively.

For the active RRHs selected in the first step, the DRL agent computes the optimal beamforming weights by solving the following optimization problem:

minimize    Σ_{r∈A} Σ_{u∈U} (1/η_l) |w_{r,u}|^2
subject to  SINR_u ≥ γ_u,   u ∈ U
            Σ_{u∈U} (1/η_l) |w_{r,u}|^2 ≤ P_r,   r ∈ A   (10)

where R_u is user u's demanded data rate, P_r is RRH r's maximum allowable transmit power, and γ_u = Γ_m(2^{R_u/B} − 1) (B is the channel bandwidth, and Γ_m is the SNR gap depending on the modulation). Compared to the single-BS association approach, this scheme is shown to satisfy the users' demands better when the amount of demand is high; and compared to the fully coordinated association approach, this scheme consumes less power.

Sun et al. [58] proposed a DL-based wireless resource allocation scheme. This scheme puts the resources into a "black box" and trains a DNN in such a way that the power allocation for each transmitter is optimized and the system throughput is maximized. The input layer of the DNN is fully connected, the multiple hidden layers use the Rectified Linear Unit (ReLU), max(x, 0), as the activation function, and the output layer uses min(max(x, 0), P_max) to enforce the resource constraints, where x is the input of a neural node and P_max is the power budget of each transmitter. The proposed DL-based power allocation scheme is tested on the Gaussian interference channel (IC) and the multi-cell interfering multiple-access channel (IMAC), respectively. It is shown that, compared to random power allocation and maximum power allocation, the DL-based scheme provides much higher throughput; and compared to WMMSE [59], the throughput of the DL-based scheme is close to that of WMMSE while the computation time is much shorter.
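To make the DNN structure described for [58] more concrete, the Keras sketch below builds a fully connected network whose hidden layers use ReLU activations and whose output layer clips its values to [0, P_max], as described above. The number of transmitter-receiver pairs K, the layer widths, the value of P_MAX, the input representation (flattened channel gains), and the supervised training setup are illustrative assumptions for this sketch, not the exact settings of [58].

```python
import tensorflow as tf

K = 10        # number of transmitter-receiver pairs (assumed)
P_MAX = 1.0   # per-transmitter power budget (assumed)

# Input: flattened K x K channel-gain matrix (assumed representation);
# output: K transmit power levels.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(200, activation="relu", input_shape=(K * K,)),  # ReLU = max(x, 0)
    tf.keras.layers.Dense(80, activation="relu"),
    tf.keras.layers.Dense(K),
    # Output layer applies min(max(x, 0), P_max) to respect the power budget.
    tf.keras.layers.Lambda(lambda x: tf.clip_by_value(x, 0.0, P_MAX)),
])

# Supervised training against precomputed allocations (an assumption for this sketch):
# model.compile(optimizer="adam", loss="mse")
# model.fit(channel_gains, target_powers, epochs=10, batch_size=128)
```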
TABLE IV
COMPARISON OF DEEP LEARNING APPLICATIONS IN DATA LINK LAYER
To predict the traffic for a cell, the traffic of both the cell itself and its neighbors is input into the LSAE and GSAE. The size of the neighboring region should be carefully balanced between the prediction accuracy and the computation. In the simulations in [62], an 11 × 11 square is used as the neighboring region, i.e., for each cell, the traffic of its 120 neighboring cells is considered. Another tricky issue of traffic prediction is the temporal correlation, which may have periodicity in the range of a day, week, month, or year.

C. DL for Link Evaluation

Due to the huge size and complex structure of modern networks (such as multi-layer structures, heterogeneous characteristics, hybrid network resources, etc.), the scale of the network optimization problem tends to be enormous. Therefore, reducing the computational complexity is a critical problem. For the link-evaluation-based optimization problem, Liu et al. [66] proposed to reduce the problem size instead of reducing the algorithm complexity. In their scheme, one possible status of all virtual links is defined as a network pattern (denoted as set A), and the goal is to minimize the overall power consumption by scheduling all the patterns appropriately. This optimization goal can be achieved by solving a Linear Programming (LP) problem, for which the objective function is min E = Σ_{a∈A} P_a t_a, where P_a is the power consumption of pattern a, and t_a is its active time. However, the problem scale is huge due to the large number of virtual links. To reduce the problem size, the authors observe that many virtual links of the network would not be scheduled, or would merely carry a small amount of traffic. If these links are excluded from the LP problem, the computation will be significantly decreased without much degradation of the optimization objective. Therefore, a Deep Belief Network (DBN) [67] is first used before the LP model to evaluate the link quality. The input of the DBN is flow information, which is represented by a flow demand vector X = (x_1, x_2, ..., x_N), where N is the total number of network nodes. For a flow with a demand of d_c travelling from the source node n_s to the destination node n_d, the elements of X are x_i = d_c if i = n_s, x_i = −d_c if i = n_d, and x_i = 0 otherwise (i = 1, 2, ..., N). The output of the DBN is the evaluation values of all links, Y = (y_1, y_2, ..., y_M), where M is the number of links in the entire network. Each element of Y, denoted as y_i, indicates the probability of the input flow belonging to link i (i = 1, 2, ..., M). Apparently, the higher the value of y_i, the more likely that link i will be used by the flow. Based on the evaluation results, the links that are not likely to be scheduled for a flow are excluded from the link optimization process. This approach efficiently reduces the problem size of link optimization. Simulation results show that the scheme reduces the computation cost by at least 50% without decreasing the optimization performance.

D. A Brief Discussion on DL Application in Data Link Layer

The applications of DL in the data link layer mostly focus on resource allocation, traffic prediction, and link evaluation problems, and they yield promising performance improvements, as shown in Table IV. Considering the large size of modern networks, a DL system usually needs to read tremendous amounts of DLL parameters to make a decision. Therefore, how to limit the computation and data size is a huge challenge for deep learning applications in the DLL. Meanwhile, accurate estimation of the channel conditions is crucial for the deep learning system to make accurate DLL decisions, which is challenging due to the fast channel variations and the time limit of the decision-making process.

V. ROUTING LAYER

Modern routing protocols developed for wireless networks are basically categorized into four types: routing-table-based proactive protocols, on-demand reactive protocols, geographical protocols, and ML/DL-based routing protocols. DL-based routing protocols have been extensively studied in the past several years due to their superior performance for complex networks.

A. Lifetime-Aware Routing Based on RL

Underwater sensor networks usually confront two big challenges, i.e., a large propagation delay due to the use of acoustic channels and stringent power usage due to the high power consumption and the inconvenience of battery charging. To deal with these challenges, a balanced routing protocol that distributes traffic evenly among all sensors was suggested in [68], which proposed an adaptive, energy-efficient, and lifetime-aware routing scheme, called QELAR, based on the Q-learning algorithm. For the Q-learning model [S, A, P_a(s, s'), R_a(s, s')], where S, A, P and R are the sets of states, actions, state transition probabilities and rewards, the value of taking action a in state s under a policy π, Q_π(s, a), is defined as

Q_π(s, a) = E_π{R_t | s_t = s, a_t = a} = E_π{ Σ_{k=0}^{∞} γ^k r_{t+k} | s_t = s, a_t = a }.   (11)

The optimal value of state s is defined as V*(s) = max_a Q*(s, a). To consider the nodes' energy condition, the scheme assumes that the residual energy of a node is E_res(s), the initial energy of a node is E_init(s), and the average residual energy in the group including the node is Ē(s). If a packet is successfully transferred from node s to node s', the reward is

R_a(s, s') = −g − α_1[c(s) + c(s')] + α_2[d(s) + d(s')],   (12)

where c(s) = 1 − E_res(s)/E_init(s) and d(s) = (2/π) arctan(E_res(s) − Ē(s)) are residual-energy-related rewards, α_1 and α_2 are their weights, and g is a punishment coefficient due to the power consumption when a node attempts to forward a packet. On the other hand, if the packet forwarding from node s to node s' fails, the reward is

R_a(s, s) = −g − β_1 c(s) + β_2 d(s),   (13)

where β_1 and β_2 are weights. Then the overall reward r_t is

r_t = P_a(s, s') R_a(s, s') + P_a(s, s) R_a(s, s),   (14)

where P_a(s, s') is the transition probability from node s to node s' with action a, and P_a(s, s) is the transition probability from node s to node s with action a (failed data forwarding).

For instance, in the network shown in Fig. 12, node s_1 wants to send packets to node s_4. Initially, all the Q values and V values are set to 0, and let γ = 0.5 and g = 1. If the nodes' residual energy is not considered, α_1 = α_2 = 0. For node s_1, since its immediate neighbors are nodes s_2 and s_3, it calculates the following Q values:

Q(s_1, a_2) = r_t + γ(P^{a_2}_{s_1 s_2} V(s_2) + P^{a_2}_{s_1 s_1} V(s_1)) = −1 + 0.5 V(s_2) = −1,
Q(s_1, a_3) = r_t + γ(P^{a_3}_{s_1 s_3} V(s_3) + P^{a_3}_{s_1 s_1} V(s_1)) = −1 + 0.5 V(s_3) = −1.   (15)

Thus, node s_1 updates its V value as V(s_1) = max_a Q(s_1, a) = −1 by choosing either node s_2 or node s_3, since Q(s_1, a_2) = Q(s_1, a_3). Node s_2 then forwards the packets to node s_4 and updates its V value through the same procedure. The path of the first packet and the V values of each node (after the packet has been delivered) are shown in Fig. 12(a).

For the second packet, node s_1 calculates the Q values of its neighbors as follows:

Q(s_1, a_2) = r_t + γ(P^{a_2}_{s_1 s_2} V(s_2) + P^{a_2}_{s_1 s_1} V(s_1)) = −1 + 0.5(−1) = −1.5,
Q(s_1, a_3) = r_t + γ(P^{a_3}_{s_1 s_3} V(s_3) + P^{a_3}_{s_1 s_1} V(s_1)) = −1 + 0.5(0) = −1.   (16)

Therefore, node s_1 updates its V value as V(s_1) = max_a Q(s_1, a) = −1, and chooses the node with the larger Q value, which is s_3, to forward the packet. In this way, the previous packet forwarding conducted by node s_2 acts as a 'penalty', which causes node s_1 to choose node s_3 to forward the current packet. Node s_3 then calculates the Q values of its neighbors as Q(s_3, a_1) = −1.5, Q(s_3, a_2) = −1.5 and Q(s_3, a_5) = −1. Thus node s_3 forwards the packet to node s_5 and updates its V value as V(s_3) = −1. This procedure is repeated for each packet. Finally, the V values of each node converge to a stable status, as shown in Fig. 12(c). To balance the tasks among nodes, the residual energy of each node should be considered; in that case, in (12) and (13), α_1, α_2, β_1, β_2 ∈ (0, 1]. In this circumstance, the V value of each node may converge as shown in Fig. 12(d), where the number tagged to each node represents the residual energy. Compared to the Vector-Based Forwarding (VBF) scheme [69], which is a popular routing protocol for underwater sensor networks, the lifetime of QELAR is 20% longer.

QELAR forms the routing topology based on the task balance among nodes, which significantly increases the batteries' lifetime if the link conditions are perfect. However, in a real-life network, many factors may deteriorate link quality, such as a large queue in a node's data sending buffer, high mobility, weak signal strength, interference, etc. The deteriorated link quality may decrease the end-to-end transmission quality and increase packet retransmissions. As a consequence, the batteries' lifetime is shortened. Therefore, considering other factors together with task balance might be a good routing strategy, especially when the size of the wireless network is large or the payload is heavy.

B. DL for Routing Path Search

In many networks the conditions of routers vary from time to time, including saturated caches, overloaded routers, malfunctioning hardware, etc. All these factors may cause the deterioration of the routers' performance. In a network with numerous routers, the data forwarding capability varies in each local network region at each time slot. Since a communication session may involve many hops from the source to the destination, the routing algorithms confront magnificent challenges in terms of finding a global optimal path among many candidate nodes in a highly dynamic network environment, and some nodes may provide good local transmission performance while deteriorating the global end-to-end routing performance.

Finding a global optimal path demands a heavy computation load. DL can be an efficient approach to relieve the path
Fig. 12. Reward Value Variation of QELAR Scheme. (a) first packet (without energy consideration) (b) second packet (without energy consideration)
(c) converged V values (without energy consideration) (d) converged V values (with energy consideration).
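The arithmetic behind the worked example in (15) and (16) can be reproduced with a few lines of Python. The sketch below is illustrative only: it assumes perfect links (a forwarding success probability of 1, which is what the numbers in the example imply) and ignores the energy terms, exactly as in the first two packets of the example.

```python
# Q-value update used in the QELAR example:
# Q(s, a) = r + gamma * (P_fwd * V(s') + P_stay * V(s)).
def q_value(r, gamma, p_fwd, v_next, p_stay, v_self):
    return r + gamma * (p_fwd * v_next + p_stay * v_self)

gamma, r = 0.5, -1.0                    # g = 1 and alpha1 = alpha2 = 0, so r = -1
V = {"s1": 0.0, "s2": 0.0, "s3": 0.0}   # initial V values

# First packet at s1: both neighbors give Q = -1, so V(s1) becomes -1 (Eq. (15)).
q12 = q_value(r, gamma, 1.0, V["s2"], 0.0, V["s1"])   # -1.0
q13 = q_value(r, gamma, 1.0, V["s3"], 0.0, V["s1"])   # -1.0
V["s1"] = max(q12, q13)

V["s2"] = -1.0   # s2 relayed the first packet, so its own value dropped to -1

# Second packet at s1: s3 now has the larger Q value and is chosen (Eq. (16)).
q12 = q_value(r, gamma, 1.0, V["s2"], 0.0, V["s1"])   # -1.5
q13 = q_value(r, gamma, 1.0, V["s3"], 0.0, V["s1"])   # -1.0
print(q12, q13, max(q12, q13))          # -1.5 -1.0 -1.0
```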
the connectivity of a node, is first evaluated using a DL algorithm. Following that, a virtual route is generated by the Viterbi algorithm [74] with the consideration of node degree. Then, an IP-based routing procedure is implemented to establish the route in the hybrid network. This scheme increases the reachability compared to the AODV, OLSR, and ZRP routing protocols. Note that this scheme requires a Route Information Server (RIS), which determines the node degree using a deep learning algorithm and particular hardware.

Stampa et al. [75] used deep reinforcement learning to optimize the routing performance with the aim of reducing transmission delay. The DRL network uses the traffic matrix as the state, a path from the source to the destination as the action, and the mean of the end-to-end delays as the reward. Note that the scheme only considers the traffic matrix, i.e., the bandwidth requests of the traffic flows, as the state and does not consider other network factors, such as nodes' queue sizes, link quality, etc. The routing results may be further optimized if more conditions are considered. To test the performance, they used the OMNeT++ discrete event simulator [76], [77] to collect transmission delays with given traffic and routing parameters [78]. Their experimental results showed a significant improvement in transmission delay with various traffic intensities, compared to the benchmark routing scheme.

Valadarsky et al. [79] applied machine learning and DRL, respectively, to network routing. In the DRL approach, the environment of the network is described by the demand matrix (DM), whose element d_ij indicates the traffic demand between the source node i and the destination node j (i, j = 1, 2, ..., N, where N is the number of nodes in the entire network). The reward of the DRL is the link utilization rate. In each time slot, the agent chooses a routing scheme based on the routing strategy and the DMs. Then the DRL system learns a mapping from DMs to routing schemes in such a manner that the discounted reward is maximized. Their simulations use the open-source implementation of TRPO [80], [81], and it is shown that learning from the historical routing schemes, with the consideration of the demand matrix and link utilization, provides an efficient approach for the agent to smartly choose the optimal routing topology for future data forwarding.

Valadarsky's scheme has some similarities with Stampa's scheme, but uses a different reward objective. From their work we see that how to choose the reward function is a crucial issue for DL applications in the routing layer. According to the network characteristics and environmental features, designers choose the most important attribute to optimize, which could be throughput, end-to-end delay, link utilization, flow completion time, etc.

D. A Brief Discussion on DL Application in Routing Layer

Centralized routing versus distributed routing is a tricky choice for DL-based routing schemes. This is because the deep learning process demands tremendous amounts of parameters as input to make a decision, as well as huge computation to train the neural network.

If a centralized routing strategy is adopted, three main issues should be carefully addressed. First, a large amount of network environment data, such as nodes' energy conditions, queue sizes, signal strengths, etc., has to be sent to the central controller. In this circumstance, the transmission load yielded by the environment data is huge, and too much overhead decreases the goodput of the network. Second, the routing topology needs to be built up within a limited time. However, the channel environment data may be delayed when transmitted to the central controller, thereby causing a delay in the routing formation. Third, there is less flexibility, i.e., a central node running the DL algorithm is not always available. For instance, the routing method proposed in [73] adopts a centralized DL strategy, which trains the DL model in the base station and classifies nodes' connectivity levels thereby. However, for some ad hoc networks, it is difficult to find a central server that has huge computation power as well as an appropriate geographic location.

On the other hand, if a distributed routing strategy is used, each node (or each source node) has to train several DL models. Therefore, huge computation power and storage are needed for every node. For instance, the distributed DL strategy has been adopted by QELAR [68] and Kato's scheme [70], where the source node triggers the DL process and generates the routing topology using the trained models.

Therefore, for DL-based routing design, a sophisticated choice between the centralized and distributed strategies is crucial, which should be made based on plenty of considerations, such as the network structure and size, the routing algorithm, the deep learning method, etc.

VI. DL FOR OTHER NETWORK FUNCTIONS

A. Vehicle Network Scheduling

A Vehicular Ad-Hoc Network (VANET) provides a fully connected network among vehicles and infrastructures, and is the foundation for establishing an intelligent transportation system. There are two types of communications in VANETs, i.e., Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I), where the infrastructures are usually composed of Road-Side Units (RSUs). The primary communications in VANETs are Driving-Safety-Related (DSR) services. However, in V2I communications, there are many non-DSR services, such as Web browsing and online games. To guarantee QoS performance, Atallah et al. [82] proposed a DRL-based scheduling scheme among the RSUs, which targets reducing the energy consumption of the road-side units while providing a safe driving environment. The DRL agent is deployed at the RSUs and interacts with the VANET environment. Assuming that there are M vehicles in an RSU's coverage, at time slot i the action a_i taken by the agent is either to receive DSR messages from the vehicles, represented as a_i = 0, or to send non-DSR messages to a vehicle upon a download service request, represented as a_i = j, where j = 1, 2, ..., M indicates which vehicle is downloading non-DSR messages. If the RSU chooses to transmit data to a vehicle, the reward is measured by the number of transmitted bits, and the cost is composed of two parts: 1) the power consumed by the RSU, and 2) the waiting time of a DSR message that may occur during this non-DSR communication period. On the other hand, if the RSU chooses to receive DSR messages, the induced cost is the power consumption
indicates that the agent schedules task i for a specific resource. Lotfollahi et al. [101] proposed a DL-based traffic clas-
Simulation shows that the DL-based resource allocation out- sification method, namely, deep packet, which not only
performs the popular methods, such as the Shortest Job First distinguishes traffic type (such as streaming, P2P, etc.)
(SJF) scheme, Packer and Tetris in terms of the average task but also classifies application types (such as Spotify, Bit
slowdown value. Torrent, etc.). The ‘ISCX VPN-nonVPN traffic dataset’ [95]
was adopted in their experiments. Two DL methods, i.e.,
convolutional NN and stacked autoencoder NN, are applied.
D. Network Security The simulation platform is built based on Keras library [102]
Traffic inference and intrusion detection are crucial issues and Tensorflow, and the scheme is shown to achieve 97.0%
for cyber security. The decision-making process of these prob- precision for traffic type classification and 95.4% accuracy for
lems requires an analysis of a large number of network features application type classification, both outperforming the general
and an abstraction towards the attack-related characteristics. ML-based schemes [95], [103]. A comprehensive comparison
DL schemes have shown promising performances in these upon the traffic classification performances among four
tasks. ANNs, i.e., backpropagation-based multilayer perceptron
1) Traffic Identification: Flow inference aims to describe (BB-MLP), resilient-backpropagation-based multilayer per-
the original flow features generated at the transmitter side ceptron (RBB-MLP), recurrent neural network (RNN), and
according to the received packets at the receiver side. It is deep learning stacked autoencoder (SAE), has been presented
crucial for intrusion detection, traffic monitoring, queue man- by Oliveira et al. [104].
agement, etc. An easy way to identify traffic is to classify them Discussion: To identify traffic accurately, two crucial issues
by port numbers. However, many recent applications, such as should be carefully considered. First, researchers need to decide
P2P traffic and video calls, may use port numbers that are on which layer the DL algorithm is implemented. Many traffic
initially assigned to other traffic types. Therefore, more accu- identification schemes are implemented upon transport layer
rate ways are required to identify traffic types. In the past or IP layer, such as [100]. However, some DL-based schemes
several years, traffic identification methods using statistical analyze MAC layer or application layer features to identify the
models [91], [92] or machine learning [93]–[95] have been traffic type. For instance, Gwon and Kung’s scheme [96] uses
extensively studied, for which the traffic features such as time the runs-and-gaps model upon MAC layer as the input of the DL
interval between packets, packet size, etc., are exploited to ana- model and identifies the traffic type thereby. Another example
lyze traffic types. Due to the complexity of the networks, the is Lotfollahi et al.’s scheme [101], which provides both traffic
patterns of the received flows may have nonlinear alterations, characterization and application identification to meet various
which makes flow inference challenging. requirements. The second issue is to determine what features
In 2014, Gwon and Kung [96] proposed a DL-based flow are used for DL analysis. The choice of data features used
inference scheme, which classifies the received packet pat- for DL models significantly impacts the accuracy of traffic
terns and infers the original properties (e.g., burst size and identification. For instance, [100] uses the scaled values of the
inter-burst gaps). Note that although the flow inference is TCP flow data, especially the first 25 values of the payload,
achieved by exploiting MAC layer parameters, it analyzes as the input of the DL network. Although from intuition and
the TCP/UDP/IP flows. The inference system is composed experience, these payload values indicate the traffic type to a
of two independent layers, each of which comprises a fea- certain extent, more intrinsic features, such as the correlation
ture extractor (FE) and a classifier (CL). For each layer, the and distribution features of the payload values, may further
sparse coding [97] is used to extract features from time-series improve the identification performances of the DL network.
data, and the max pooling [98] is used to reduce the num- 2) Intrusion Detection: Network Intrusion Detection (NID)
ber of features for the purpose of computation reduction. protects networks from malicious attacks by detecting software
The two-layer structure allows both the local features (such intrusions. Traditional NID schemes are mostly based on user
as run and gap sizes) and the global features (such as peri- signatures. However, this method needs the administration cen-
odicity) to be extracted by the learning system. Simulations ter to maintain a large number of user’s signatures. Currently,
show that the deep learning based flow inference scheme has the anomaly-based detection is extensively studied, which ana-
high true positive rate and low false positive rate, compared lyzes network activities and marks out abnormal data access as
to ARMAX-least squares [99], Naive Bayes classifiers and an intrusion. Since the utter goal of NID is to classify network
Gaussian mixture models. traffics into many categories (i.e., normal traffic and various
In 2015, Wang [100] proposed a DL-based traffic identi- abnormal traffic types) according to numerous traffic features,
fication scheme. In this scheme, the TCP flow is used for it is an ideal choice to use DL approaches to detect network
traffic identification, since the bytes of different protocol pay- intrusions by learning traffic features [105].
loads represent different distributions. Therefore, the bytes Niyaz et al. [106] proposed a flow-based, Self-taught
of TCP sessions are first normalized from integers (rang- Learning (STL) [107] approach to detect network intrusion.
ing from 0-255) to decimals (ranging from 0-1). Then the In their scheme, the NSL-KDD dataset [108], [109], a bench-
normalized data is sent to ANN as the input for traffic identi- mark for network intrusion, is used for training and testing.
fication. Simulation shows that this scheme can distinguish 25 The network traffic provided by NSL-KDD dataset includes
most popular protocols, such as SSL, HTTP-Connect, MySQL, normal flows and various anomalous flows, including Denial-
SMB, etc., most of which have a precision of higher than 95%. of-Service (DoS) attack flow, Remote-to-Local (R2L) attack
Authorized licensed use limited to: BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE. Downloaded on November 01,2020 at 07:47:46 UTC from IEEE Xplore. Restrictions apply.
2612 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 20, NO. 4, FOURTH QUARTER 2018
Fig. 15. Deep Learning Networks Used by Intrusion Detection. (a) Deep Auto-encoder (b) Deep Boltzmann Machine (c) Deep Believe Network.
flow, User-to- Root (U2R) attack flow, Probe attack flow, etc. and classification to a large extent, and one of the most chal-
For each traffic, forty-one features are provided, including the lenging topics is to select appropriate features to balance the
average packet number per flow, average time duration per detection accuracy and computation cost.
flow, protocol types (e.g., TCP, UDP), etc. The scheme in [106] Another type of DL network used for intrusion detec-
chooses 22 out of 41 features for the DL process, which tion is Deep Boltzmann Machine (DBM) [111], in which
consists of two stages: 1) an Unsupervised Feature Learning each node is bidirectionally connected with the nodes of
(UFL) process, which is based on sparse auto-encoder, and other layers, as shown in Fig. 15(b). To decrease the com-
2) a supervised learning process with the goal of classifi- putation cost of the gradient, the intra-layer links (the red
cation. Auto-encoder is a feedforward non-recurrent neural dotted lines in Fig. 15(b)) are abandoned to use, yielding a
network with an input layer, an output layer and one or sev- Restricted Boltzmann Machine (RBM) [112]. As a matter of
eral hidden layers, as shown in Fig. 15(a). Specifically, the fact, in many real applications, the network detectors may
node number of the input layer and the output layer is the not know what features the anomalous traffic possesses. Thus,
same, which is larger than the node number of the hidden Fiore et al. [113] proposed a Discriminative RBM (DRBM)
layer(s). The goal of the output layer is to reconstruct the based intrusion detection method, which is a semi-supervised
input. Therefore, the cost function is composed of an average learning system, i.e., the system is trained only by normal
of sum-of-square errors upon all the inputs, a weight decay traffic data. The trained network is tested by real-world traffic
term to avoid over-fitting, and a sparsity penalty term to main- collected from a workstation for 24 hours and KDD CUP 1999
tain a low activation values. Using the trained DL network, the dataset, respectively, both of which include normal and anoma-
testing traffic is classified as two types, i.e., normal traffic and lous traffic. Simulation results show that, when the learning
anomalous traffic. Simulations showed that the STL scheme system is trained and tested with the real-world data, a high
achieves 88.39% accuracy for 2-class detection (normal and accuracy (about 94%) is obtained. However, when the DBRM
anomaly), and 79.1% accuracy for 5-class detection (normal is trained with KDD dataset and tested with real-world data,
and four different attack categories), which are higher than the accuracy is as low as 84% around.
the accuracies achieved by the Soft-Max Regression (SMR) If we limit the node connections only between the layers,
scheme. the DBM is transformed to Deep Believe Network (DBN).
To reduce the number of features the learning system uses, Alternatively, DBN can be formed by cascading a stack of
Tang et al. [110] proposed a DL-based NID scheme for Restricted Boltzmann Machine in serial. Furthermore, one or
software defined networking (SDN). The data used to train more additional layers can be added to perform classifica-
the learning network is also chosen from NSL-KDD dataset. tion after a supervised learning process. Therefore, DBN is
However, only six features of each traffic flow are considered often pre-trained by unlabeled data (unsupervised learning)
in order to reduce the computation cost, i.e., flow’s duration, and then fine-tuned by labeled data (supervised learning), as
protocol type (e.g., TCP, UDP), number of data bytes from shown in Fig. 15(c). Gao et al. [114] used a DBN to detect
source to destination, number of data bytes from destination network intrusion, for which the learning network is trained
to source, number of connections to the same host, and num- with KDD CUP 1999 dataset via three stages. The first stage
ber of connections to the same service. A deep neural network pre-processes data, for which the features of the traffic are dig-
with three hidden layers is adopted. Although there are four itized and normalized. The second stage pre-trains the DBN,
malicious traffic types in the dataset, this scheme only clas- i.e., the weights of a stack of RBMs were learned through an
sifies the traffic as two types (normal and anomalous), and unsupervised greedy contrastive divergence algorithm. Finally,
it achieved an accuracy of 74.67%. Compared the schemes the weights of the entire DBN are fine-tuned through the back-
in [106] and [110], it can be seen that the detection accuracy propagation of error derivatives by the labeled data. In their
depends on the traffic features the system used for training simulations, 41 features of each KDD CUP 1999 traffic flow
Authorized licensed use limited to: BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE. Downloaded on November 01,2020 at 07:47:46 UTC from IEEE Xplore. Restrictions apply.
MAO et al.: DL FOR INTELLIGENT WIRELESS NETWORKS: COMPREHENSIVE SURVEY 2613
TABLE V
C OMPARISON OF D EEP L EARNING A PPLICATIONS IN I NTRUSION D ETECTION
106
110
113
114
116
are first mapped to 122 attributes, then several DBNs with two-type detection (normal and anomaly) and 79.1% accuracy
different structures are established. Each DBN in their simula- with five-type detection (normal and four anomaly types).
tions has 122 input elements and 5 output elements (1 normal Thus, designers need to carefully balance the detection accu-
traffic and 4 different attack traffics). However, the hidden racy and the number of detection types. 2) How to select the
layer structures are different, varying from as shallow as 122-5 network features as the input of the DL network. Many cur-
(no hidden layer) to as deep as 122-150-90-50-5 (three hid- rent DL-based intrusion detection schemes use KDD dataset,
den layers with 150, 90, 50 nodes respectively). Apparently, which provides 41 features for each traffic flow. Apparently,
the deeper the DBN becomes, the better detection accuracy can be achieved. For the 122-150-90-50-5 DBN, the accuracy reaches 93.49%, outperforming SVM (86.82%) and NN (82.3%). Besides, Alom et al. [115] applied a similar method to the NSL-KDD dataset.
Using a similar DBN structure as shown in Fig. 15(c), Kang and Kang [116] proposed an intrusion detection scheme for the In-Vehicle Controller Area Network (CAN). Each CAN packet includes 12 bits of arbitration field, 6 bits of control field, 0-8 bytes of data field, etc. Kang and Kang's scheme utilizes the data field as the learning object. The data field is composed of mode information (such as controlling the wheels) and value information (such as the wheel angle) of the Electronic Control Unit (ECU), and the two yield different bit distributions. Since there are different attack scenarios, the learning system first uses the mode information to identify the attack scenario, and then trains a learning network for each scenario. The DBN has fewer than 64 input nodes (each node corresponds to a bit of the data field, with the number of bits reduced by exploiting semantic redundancy), several hidden layers, and 2 output nodes (indicating normal and anomalous scenarios). In the testing phase, the attack scenario of each CAN packet is first determined by matching the mode information, and then the corresponding trained model is used to determine whether the packet is normal or anomalous. Experiments show that the scheme achieves 97.8% accuracy, outperforming the Support Vector Machine (SVM) and Artificial Neural Network (ANN).
Discussion: A comparison of the DL-based intrusion detection schemes is shown in Table V. Note that for the output classification number, the number 2 indicates normal traffic and anomalous traffic, and a number n (n > 2) indicates normal traffic plus n − 1 different anomalous traffic types. From the comparison we see several challenges in this topic. 1) How many intrusion types the DL network detects. Apparently, the more types the network detects, the lower the achievable accuracy. For instance, Niyaz et al.'s scheme [106] achieves 88.39% accuracy with two-type detection, and the accuracy drops when more attack categories must be distinguished. 2) How many features are selected for the DL analysis. Using all the 41 features for DL analysis yields a huge burden from the computation's point of view. Therefore, many NID schemes select a subset of features to detect intrusions. For instance, Niyaz et al.'s scheme [106] uses 22 features while Tang et al.'s scheme [110] uses only 6 features. As a consequence, Niyaz et al.'s scheme achieves 88.39% accuracy with two-type detection, while Tang et al.'s scheme yields 74.67% accuracy. 3) What dataset is used for DL network training. As we see from Table V, most current schemes use the NSL-KDD dataset, which is an improved version of the KDD Cup 99 dataset (proposed in 1999). The NSL-KDD dataset removes some redundant records of the KDD Cup 99 dataset, which makes the sizes of the training set and testing set reasonable. The NSL-KDD dataset includes normal traffic and four attack categories, i.e., DoS, U2R, R2L, and probing. However, with the accelerated development of network attack techniques, new intrusion methods appear with astonishing speed. DL models trained on the KDD dataset may therefore yield deteriorated performance when detecting real-world data, as shown in [113].
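To make the above comparison more concrete, the following is a minimal sketch, written with the Keras API, of a fully connected intrusion classifier for NSL-KDD-style records. It is not taken from any of the surveyed schemes: the feature count, layer widths, and label encoding are illustrative assumptions, and the random training arrays are placeholders that would be replaced by encoded NSL-KDD records (e.g., one-hot encoded symbolic fields plus min-max scaled numeric fields).

# Minimal sketch (not from any surveyed paper): a fully connected classifier for
# NSL-KDD-style records. Assumes each record has been encoded into 41 normalized
# features and labeled 0..4 (normal, DoS, U2R, R2L, probing).
import numpy as np
from tensorflow import keras

num_features, num_classes = 41, 5     # illustrative; use 2 classes for normal/anomaly only
model = keras.Sequential([
    keras.layers.Dense(90, activation="relu", input_shape=(num_features,)),
    keras.layers.Dense(50, activation="relu"),
    keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Placeholder data with the expected shapes; replace with encoded NSL-KDD records.
x_train = np.random.rand(1000, num_features).astype("float32")
y_train = np.random.randint(0, num_classes, size=1000)
model.fit(x_train, y_train, epochs=2, batch_size=128, verbose=0)

Reducing num_features to a selected subset (e.g., 22 or 6 features) only changes the input dimension, which is how the feature-selection trade-off discussed above is typically explored.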
VII. DL-BASED WIRELESS PLATFORM IMPLEMENTATION
There are abundant DL implementation methods, and some of them have already been applied in wireless networks. In the following, we summarize the DL implementations that have been used in wireless testbeds.
• 1) MATLAB Neural Network Toolbox: This toolbox includes the most popular DL algorithms, such as ANN, CNN, DBN, SAE, and convolutional autoencoders (CAE). The input layer takes the original raw data. The hidden layers perform convolution, pooling, or ReLU functions upon the raw data. The convolution operation is composed of a set of convolutional filters, which extract certain features from the input data. The pooling operations perform nonlinear down-sampling upon the outputs of the convolutional filters, reducing the number of parameters and controlling the complexity of the deep learning network. The ReLUs map negative values to zero and keep positive values, thereby improving the training efficiency. By repeating these three functions in the hidden layers and training their parameters, specific features are extracted efficiently for the classification purpose. Then, the output layer performs classification upon the features; a softmax function is typically adopted for this step (a minimal Python sketch of this generic pipeline is given after this list).
• 2) TensorFlow [117]: It is an open-source software library originally developed by the Google Brain team. TensorFlow is written in Python, C++, and CUDA, and it is supported by Linux, macOS, Windows, and Android systems. It is a symbolic math library whose programs are composed of nodes and edges: the nodes in the graph represent mathematical operations, and the edges represent the connections (tensors) between nodes. TensorFlow is a flexible, flow-based programming model. Although it was originally developed for ML and deep neural network algorithms, TensorFlow is capable of supporting many other flow-based implementations.
• 3) Caffe (Convolutional Architecture for Fast Feature Embedding) [43]: It is an open-source software tool developed by Berkeley AI Research (BAIR) and community contributors. Caffe is a DL framework targeting image classification and segmentation. It has the features of expressive architecture, extensible code, high speed, and modularity. Caffe supports CNN, RCNN, LSTM, and fully connected neural network structures. There are a variety of functions to choose from when building a DL network using Caffe, including convolution, pooling, inner products, ReLU, logistic units, local response normalization, element-wise operations, softmax, and hinge.
• 4) Theano [118]: It is an open-source Python library that allows users to symbolically define, optimize, and evaluate mathematical expressions such as multi-dimensional arrays. Users can use Theano to implement and train NN models on fast concurrent graphics processing unit (GPU) architectures. The network is built from Apply nodes and Variable nodes, which represent mathematical operations and tensors, respectively.
• 5) Keras [102]: It is an open-source neural network library that can run on top of TensorFlow, CNTK, Theano, etc. Keras was originally developed by a Google engineer, Francois Chollet. It provides neural network elements such as layers, objectives, optimizers, and activation functions, and it supports convolutional networks, recurrent networks, and combinations of the two types. In 2017, Google's TensorFlow team decided to support Keras in TensorFlow's core library.
• 6) WILL [119]: It is an open-source neural network toolkit created by Prevision Limited Company (Hong Kong). It supports convolution, pooling layers, full connection, and some popular functions such as ReLU, sigmoid, tanh, and softmax.
• 7) Customized models: Many DL-based networks use special functions or neural network structures, and some systems run upon embedded platforms. For those applications, developers implemented the algorithms with customized DL systems. The language used to develop these systems varies from C, C++, MATLAB, and Python to Java, depending on the features of the learning system, the library used by the learning process, and the communication patterns with other simulators (such as wireless network simulators).
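As referenced in item 1) above, the following is a minimal Python/Keras sketch of the generic convolution-pooling-ReLU-softmax pipeline that these toolboxes provide. The input shape, filter counts, and class count are assumptions for illustration and do not correspond to any specific surveyed design.

# Generic conv / pool / ReLU / softmax pipeline, expressed with Keras on top of
# TensorFlow (items 2) and 5) in the list above). All sizes are illustrative.
from tensorflow import keras

model = keras.Sequential([
    # convolutional filters + ReLU: extract local features from the raw input
    keras.layers.Conv2D(16, kernel_size=3, activation="relu", input_shape=(64, 64, 1)),
    # pooling: nonlinear down-sampling that reduces the parameter dimensions
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Flatten(),
    # softmax output layer performs the final classification
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()

The same network could be expressed almost line for line in the MATLAB toolbox, Caffe, or Theano listed above; the choice between them is mostly driven by the target platform and by how the learning process must interact with the wireless simulator or testbed.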
In addition, there are lots of other popular deep learning software packages, such as MXNet [120], developed by the Distributed Machine Learning Community, Torch [121], and the Microsoft Cognitive Toolkit [122], developed by Microsoft Research. However, few of them can be found in network applications. Table VI presents a comparison of the deep learning software platforms used in wireless networks.
[TABLE VI: COMPARISON OF DEEP LEARNING APPLICATIONS IN INTRUSION DETECTION (table body omitted)]
VIII. FUTURE RESEARCH TRENDS
In order to help researchers identify unsolved issues in this important field, in this section we explain 10 challenging issues and point out the corresponding future research trends. Although those 10 issues do not represent all the unsolved research topics on DL-based wireless networking, they have long-term, dominant impacts on today's popular wireless infrastructures, including cognitive radio networks, software-defined networks, dew/cloud computing, big data networks, etc.
A. (Challenge 1) DL for Transport Layer Optimizations
Congestion control is the main function of the transport layer. However, the existing congestion control methods are mostly based on end-to-end ACK or NACK feedback to indirectly deduce congestion occurrences. For example, TCP uses ACK feedback to infer a congestion event. The most accurate way is to directly analyze the queues in each node of the routing path to pinpoint exactly which node's queue has an overflow event, which indicates congestion in that node. Apparently, a single node's queue cannot reflect the congestion distribution over the entire path. It is important to perform multi-queue co-modeling between different nodes to detect 'congestion propagation'. For example, one node may have very light congestion (i.e., overflow occurs sparsely) in an earlier stage; however, multiple sparse congestions could accumulate into serious congestion later at another node. Multi-queue co-modeling can help in finding such an accumulation pattern.
Particularly, DL can be used to perform large-scale network queueing analysis. Assume that each node reports its queue status (such as size, input traffic rate, outgoing traffic rate, etc.) to a central node via an out-of-band control channel; the central node can then run DL to analyze the queues' data accumulation status. For instance, it can find out whether traffic accumulates in a particular node's queue and may cause overflow with high probability. DL can also help to find an optimal solution to relieve the congestion. For example, it can find a node whose queue is relatively small most of the time, and that node may accept a higher incoming traffic rate; or, it may find another set of nodes near the RED zone (i.e., a network area with congested queues), to …
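As an illustration of the queue-report analysis sketched above, a central controller could feed a sliding window of per-node queue reports into a recurrent model that outputs, for each node, the probability of a queue overflow in the next reporting interval. This is an assumed design rather than one proposed in the surveyed literature; the topology size, window length, and report format below are made-up parameters.

# Hypothetical sketch: predict per-node queue overflow from reported queue status.
# Each report carries (queue length, input rate, output rate) for every node.
import numpy as np
from tensorflow import keras

num_nodes, window, feats = 20, 16, 3          # assumed topology and report format

model = keras.Sequential([
    keras.layers.LSTM(64, input_shape=(window, num_nodes * feats)),
    keras.layers.Dense(num_nodes, activation="sigmoid"),   # per-node overflow probability
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Synthetic placeholder data, only to show the expected tensor shapes.
x = np.random.rand(256, window, num_nodes * feats).astype("float32")
y = (np.random.rand(256, num_nodes) > 0.9).astype("float32")   # 1 = overflow in next interval
model.fit(x, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(x[:1]).round(2))          # predicted overflow probabilities for one window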
[TABLE VII: NETWORK PARAMETERS CONSIDERED IN CROSS-LAYER DESIGN (table body omitted)]
DL is a perfect tool to fuse the above various cross-layer metrics and extract the intrinsic network patterns for protocol optimization. By using the big-tensor concept, we can carefully arrange the above metrics into multi-type tensor records, and then apply tensor decomposition to extract the essential patterns. Those patterns can tell us whether the network will have significant packet loss in the near future, and classify the network topology into hotspots and light-traffic areas. The patterns can also indicate the link interference distribution across the whole network, and help us to avoid the high-interference areas.
Based on the DL pattern extraction results, different layers should cooperate with each other to perform cross-layer optimization. For example, if the DL indicates that a group of nodes forms a high-packet-loss 'dark hole', the MAC layer should use much stronger FEC to overcome the bit errors in that area; the routing layer can re-establish a new path to detour around such a hole; and the transport layer can use a much smaller congestion control window size.
Moreover, DL outputs the suggested performance goal change in the application layer (AL). For example, if the network can only provide an end-to-end delay larger than 100 ms, it will suggest that the AL use different video coding methods to meet the network limits.
Additionally, DL can be directly used to improve AL performance. For example, it can be used to analyze webpage display performance (refresh rate, display speed, image resolution, etc.). It can also be used to perform cyber security analysis to detect spam emails and malicious Web sites.
The challenging issue here is to define a low-complexity DL model based on the AL performance goal or application profile data, and to solve the DL problem so as to generate a series of useful results that can be interpreted by the lower layers for protocol control purposes. For example, if the AL hosts a video streaming application, how do we define the AL model so that the QoS/QoE performance goals are translated into concrete congestion control and routing parameters? How does the AL classify different applications into the various cross-layer protocol design options?
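To make the "big tensor" arrangement discussed at the beginning of this subsection more tangible, the toy sketch below stacks cross-layer measurements into a node x metric x time array and extracts a low-rank per-node signature via SVD of its mode-1 unfolding. A real design would likely use a proper CP or Tucker decomposition; the sizes and metric choices here are assumptions for illustration only.

# Toy sketch of arranging cross-layer metrics as a 3-way array and extracting
# low-rank node signatures. Sizes and the metric list are illustrative only.
import numpy as np

nodes, metrics, time_steps = 50, 6, 100
# assumed metrics: delay, loss rate, SNR, MAC retries, queue length, hop count
X = np.random.rand(nodes, metrics, time_steps)

X1 = X.reshape(nodes, metrics * time_steps)        # mode-1 unfolding: one row per node
U, s, Vt = np.linalg.svd(X1, full_matrices=False)
rank = 3
node_signatures = U[:, :rank] * s[:rank]           # compressed cross-layer pattern per node

# Nodes with similar signatures behave similarly across layers; outliers are
# candidate hotspots or high-interference areas for the MAC/routing layers to treat.
print(node_signatures.shape)                       # (50, 3)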
… as Zigbee-based systems). The fog computing infrastructure consists of a series of long-distance wireless relays, such as WiMAX nodes or cell phone towers, to deliver the aggregated dew computing data to a cloud server.
From the security viewpoint, the above dew-fog-cloud architecture exposes many attack opportunities to adversaries. For example, one can 'pollute' part of the dew computing data by falsifying the sensor data, or mislead the routing path selection in the fog computing segments by claiming a better path, and so on.
To handle the large-scale dew computing sources and the concurrent fog computing routing topology, DL is a natural choice to parse all the wireless node/link parameters and deduce the possible attack positions and types. For example, we can use all the dew nodes' sensor data as samples and run a DL-based data clustering test to find a potential data-sample poisoning attack. The challenging issue here is to clearly define a DL model with the proper input/output layer interpretations based on a particular network security problem. Different security/privacy problems mean that the DL should have different input/output/gradient parameter updating structures. For example, privacy preservation emphasizes the protection of sensitive data attributes (such as patients' names), and various ID-hiding models can be used to define the DL gradient weight updating process.
I. (Challenge 9) From DL to DRL: Applications for Cognitive Radio Network Control
DL focuses on 'passive' data learning to recognize the intrinsic patterns hidden in the data. However, it does not have concrete 'reactions' for each of the extracted data patterns. Deep reinforcement learning uses Markov decision models to guide the choice of different 'actions' based on the state transition models. Therefore, in many practical applications DRL plays a more important role than DL algorithms.
Here we emphasize the benefits of DRL for cognitive radio network (CRN) control. The CRN is an important type of wireless network due to its flexible spectrum access, i.e., the nodes can grab any available (free) channel to send out data, and can timely vacate the channel if the primary user (PU) comes back to use the channel again.
DRL can be used to control the following CRN operations: (1) Spectrum sensing: the DRL model can be used to determine the channel scanning order. Channels with a higher chance of being idle should be scanned first. Note that spectrum scanning is a time-consuming process if thousands of channels need to be scanned and analyzed. By first checking the likely-free channels, we can save spectrum sensing time. (2) Spectrum handoff: When the PU comes back, the node should switch to another channel. The channel switching timing, and which channel to switch to, are two critical issues to be solved. Should we wait for the PU's arrival to decide the channel switching operation, or can we predict the timing of the PU's transmissions and search for backup channels beforehand? Obviously, the latter offers better communication quality and can avoid some packet loss events.
Proper DRL models need to be clearly defined based on the various CRN operation requirements. For example, the DRL may need to be integrated with queuing models to determine the spectrum handoff delay, i.e., how long a user can occupy the existing channel based on the PU traffic analysis, when the user should start to look for a new channel, and so on.
J. (Challenge 10) Efficient DL/DRL Implementations in Practical Wireless Platforms
The above DL/DRL algorithms eventually need to be implemented in practical wireless network products. However, the pure theoretical understandings cannot simply be programmed into wireless devices, due to the following challenges:
(1) Difficulty of collecting network parameters for the DL input layers: All DL algorithms require training and testing phases. In each phase, the input layer of the deep neural network consists of the data samples' parameters. The more complete the samples are (in terms of data attributes), the more accurately the DL can recognize the network features. Many network parameters come from the MAC and routing layers, which involve many relay nodes' responses. However, those nodes may not provide fast feedback about their communication status due to unpredictable link delays and radio interference. Therefore, the DL models should be designed to tolerate certain missing parameters or data errors in the input layers.
(2) The resource limits of wireless devices: Many wireless products have limited memory and CPU capabilities. They do not allow complex algorithms to be programmed into their existing protocols. Since DL has an iterative execution nature, it may elongate the system response time. The DL algorithms should minimize the intermediate computation parameters to save memory space, and they should be optimized to reduce the execution time.
(3) Incomplete training sample collections: DL requires complete or nearly complete training samples to accurately recognize the network patterns. However, the training samples may be very limited due to the difficulty of collecting enough data points for each possible network status. This requires that the DL should have the capability of adding new samples after it fails to recognize a new pattern. The newly added samples can improve the accuracy of the DL models.
In addition, network engineers/programmers should carefully define the DL data formats, since different network parameters have very different data attributes and formatting requirements. Proper numerical representations and data normalization methods should be clearly defined in order to aggregate multiple network parameters into the same DL input layer.
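A small sketch of the input-formatting step just discussed is given below. The field names, value ranges, and the neutral fill-in for missing reports are illustrative assumptions rather than a prescribed format.

# Hypothetical encoder: map heterogeneous network parameters into one normalized
# DL input vector, tolerating missing values as discussed above.
import numpy as np

LINK_TYPES = ["wifi", "lte", "zigbee"]                      # assumed categorical attribute

def encode_report(report):
    """report: dict with keys 'delay_ms', 'loss_rate', 'snr_db', 'link_type'; any may be None."""
    def scale(value, lo, hi, default=0.5):
        if value is None:
            return default                                  # neutral fill-in for a missing parameter
        return (min(max(value, lo), hi) - lo) / (hi - lo)   # clip, then min-max normalize
    vec = [
        scale(report.get("delay_ms"), 0.0, 500.0),
        scale(report.get("loss_rate"), 0.0, 1.0),
        scale(report.get("snr_db"), -10.0, 40.0),
    ]
    one_hot = [1.0 if report.get("link_type") == t else 0.0 for t in LINK_TYPES]
    return np.array(vec + one_hot, dtype="float32")

print(encode_report({"delay_ms": 120, "loss_rate": 0.02, "snr_db": None, "link_type": "lte"}))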
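Returning to the CRN control problem in Challenge 9, the toy loop below learns which channels tend to be idle and should therefore be scanned first. For brevity it uses a bandit-style value update against a simulated PU occupancy model rather than a full deep Q-network; a DRL agent would replace the value table with a neural network driven by a longer sensing history. Every parameter here is invented for illustration.

# Toy channel-ranking loop for spectrum sensing (bandit-style, not a full DRL agent).
import numpy as np

rng = np.random.default_rng(0)
num_channels = 8
p_idle = rng.uniform(0.2, 0.9, size=num_channels)   # hidden per-channel idle probability (simulated PUs)

Q = np.zeros(num_channels)        # estimated value of sensing each channel
alpha, epsilon = 0.1, 0.1         # learning rate and exploration rate

def sense(channel):
    """Reward 1 if the sensed channel is idle (transmission opportunity), else 0."""
    return 1.0 if rng.random() < p_idle[channel] else 0.0

for step in range(5000):
    if rng.random() < epsilon:
        a = int(rng.integers(num_channels))          # occasionally explore other channels
    else:
        a = int(np.argmax(Q))                        # otherwise sense the most promising channel first
    Q[a] += alpha * (sense(a) - Q[a])                # incremental value update

print("learned scanning order:", np.argsort(-Q))
print("true idle probabilities:", np.round(p_idle, 2))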
IX. CONCLUSION
This paper has comprehensively reviewed the methodologies of applying DL schemes for wireless network performance enhancement. In a nutshell: (1) DL/DRL is very useful for intelligent wireless network management due to its human-brain-like pattern recognition capability. With the hardware performance improvements of today's wireless products, its adoption becomes easier. (2) It plays critical roles in multiple protocol layers. We have summarized its applications in the physical, MAC, and routing layers. It enables the network to more intelligently recognize changes of the entire topology and link conditions, and helps to generate more appropriate protocol parameter controls. (3) It can be integrated with today's various wireless networking schemes, including CRNs, SDNs, etc., to achieve either centralized or distributed resource allocation and traffic balancing functions. This article also lists ten important research issues that need to be solved in the near future in this field. They cover some promising wireless applications such as network swarming, CRN spectrum handoff, SDN flow table updates, dew/fog computing security, etc. This paper will help readers to understand the state-of-the-art of DL-enhanced wireless networking protocols and find interesting and challenging research topics to pursue in this critical field.
REFERENCES
[1] J. Patterson and A. Gibson, Deep Learning: A Practitioner’s Approach. Sebastopol, CA, USA: O’Reilly Media, 2017.
[2] G. Hinton et al., “Deep neural networks for acoustic modeling in speech recognition,” IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, Nov. 2012.
[3] T. N. Sainath, A.-R. Mohamed, B. Kingsbury, and B. Ramabhadran, “Deep convolutional neural networks for LVCSR,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), Vancouver, BC, Canada, May 2013, pp. 8614–8618.
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. 25th Int. Conf. Neural Inf. Process. Syst. (NIPS), vol. 1, Dec. 2012, pp. 1097–1105.
[5] C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning hierarchical features for scene labeling,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1915–1929, Oct. 2013.
[6] J. Tompson, A. Jain, Y. LeCun, and C. Bregler, “Joint training of a convolutional network and a graphical model for human pose estimation,” in Proc. 27th Int. Conf. Neural Inf. Process. Syst. (NIPS), vol. 1, Montreal, QC, Canada, Dec. 2014, pp. 1799–1807.
[7] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015.
[8] R. Collobert et al., “Natural language processing (almost) from scratch,” J. Mach. Learn. Res., vol. 12, pp. 2493–2537, Aug. 2011.
[9] A. Bordes, J. Weston, and S. Chopra, “Question answering with subgraph embeddings,” in Proc. Conf. Empir. Methods Nat. Lang. Process. (EMNLP), Doha, Qatar, Oct. 2014, pp. 615–620.
[10] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Proc. 27th Int. Conf. Neural Inf. Process. Syst. (NIPS), Montreal, QC, Canada, Dec. 2014, pp. 3104–3112.
[11] S. Jean, K. Cho, R. Memisevic, and Y. Bengio, “On using very large target vocabulary for neural machine translation,” in Proc. 53rd Annu. Meeting Assoc. Comput. Linguist. (ACL), Beijing, China, Jul. 2015, pp. 1–10.
[12] D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016.
[13] J. Ma, R. P. Sheridan, A. Liaw, G. E. Dahl, and V. Svetnik, “Deep neural nets as a method for quantitative structure-activity relationships,” J. Chem. Inf. Model., vol. 55, no. 2, pp. 263–274, Jan. 2015.
[14] M. Helmstaedter et al., “Connectomic reconstruction of the inner plexiform layer in the mouse retina,” Nature, vol. 500, no. 7461, pp. 168–174, Oct. 2014.
[15] M. K. K. Leung, H. Y. Xiong, L. J. Lee, and B. J. Frey, “Deep learning of the tissue-regulated splicing code,” Bioinformatics, vol. 30, no. 12, pp. 121–129, Jun. 2014.
[16] H. Y. Xiong, B. Alipanahi, L. J. Lee, H. Bretschneider, and D. Merico, “The human splicing code reveals new insights into the genetic determinants of disease,” Science, vol. 347, no. 6218, pp. 144–151, Jan. 2015.
[17] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1998.
[18] M. A. Alsheikh, S. Lin, D. Niyato, and H.-P. Tan, “Machine learning in wireless sensor networks: Algorithms, strategies, and applications,” IEEE Commun. Surveys Tuts., vol. 16, no. 4, pp. 1996–2018, 4th Quart., 2014.
[19] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, “Machine learning for wireless networks with artificial intelligence: A tutorial on neural networks,” eprint arXiv:1710.02913, Oct. 2017.
[20] J. Chen, U. Yatnalli, and D. Gesbert, “Learning radio maps for UAV-aided wireless networks: A segmented regression approach,” in Proc. IEEE Int. Conf. Commun. (ICC) Signal Process. Commun. Symp., Paris, France, May 2017, pp. 1–6.
[21] Y. Xiao, Z. Han, D. Niyato, and C. Yuen, “Bayesian reinforcement learning for energy harvesting communication systems with uncertainty,” in Proc. IEEE Int. Conf. Commun. (ICC) Next Gener. Netw. Symp., London, U.K., Jun. 2015, pp. 5398–5403.
[22] M. Bennis and D. Niyato, “A Q-learning based approach to interference avoidance in self-organized femtocell networks,” in Proc. IEEE Glob. Commun. Conf. (GLOBECOM) Workshops Femtocell Netw., Miami, FL, USA, Dec. 2010, pp. 706–710.
[23] M. Chen et al., “Caching in the sky: Proactive deployment of cache-enabled unmanned aerial vehicles for optimized quality-of-experience,” IEEE J. Sel. Areas Commun., vol. 35, no. 5, pp. 1046–1061, May 2017.
[24] M. Chen, W. Saad, C. Yin, and M. Debbah, “Echo state networks for proactive caching in cloud-based radio access networks with mobile users,” IEEE Trans. Wireless Commun., vol. 16, no. 6, pp. 3520–3535, Jun. 2017.
[25] T. Serre, L. Wolf, and T. Poggio, “Object recognition with features inspired by visual cortex,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), CA, USA, Jun. 2005, pp. 994–1000.
[26] T. V. Maia, “Reinforcement learning, conditioning, and the brain: Successes and challenges,” Cognit. Affective Behav. Neurosci., vol. 9, no. 4, pp. 343–364, Dec. 2009.
[27] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015.
[28] C. J. C. H. Watkins and P. Dayan, “Q-learning,” Mach. Learn., vol. 8, nos. 3–4, pp. 279–292, May 1992.
[29] S. S. Sonawane and P. A. Kulkarni, “Graph based representation and analysis of text document: A survey of techniques,” Int. J. Comput. Appl., vol. 96, no. 19, pp. 1–8, Jun. 2014.
[30] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, “Geometric deep learning: Going beyond Euclidean data,” IEEE Signal Process. Mag., vol. 34, no. 4, pp. 18–42, May 2017.
[31] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and deep locally connected networks on graphs,” in Proc. 2nd Int. Conf. Learn. Represent. (ICLR), Banff, AB, Canada, Apr. 2014, pp. 1–14.
[32] Y. Hechtlinger, P. Chakravarti, and J. Qin, “A generalization of convolutional neural networks to graph-structured data,” eprint arXiv:1704.08165, Apr. 2017.
[33] M. Henaff, J. Bruna, and Y. LeCun, “Deep convolutional networks on graph-structured data,” eprint arXiv:1506.05163, Jun. 2015.
[34] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Proc. Conf. Adv. Neural Inf. Process. Syst. (NIPS), vol. 29, Barcelona, Spain, Dec. 2016, pp. 3837–3845.
[35] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in Proc. 5th Int. Conf. Learn. Represent. (ICLR), Toulon, France, Apr. 2017, pp. 1–14.
[36] J. Lee, H. Kim, J. Lee, and S. Yoon, “Transfer learning for deep learning on graph-structured data,” in Proc. 31st AAAI Conf. Artif. Intell. (AAAI), San Francisco, CA, USA, Feb. 2017, pp. 1–7.
[37] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[38] V. R. Cadambe and S. A. Jafar, “Interference alignment and degrees of freedom of the K-user interference channel,” IEEE Trans. Inf. Theory, vol. 54, no. 8, pp. 3425–3441, Aug. 2008.
[39] N. Zhao, F. R. Yu, M. Jin, Q. Yan, and V. C. M. Leung, “Interference alignment and its applications: A survey, research issues, and challenges,” IEEE Commun. Surveys Tuts., vol. 18, no. 3, pp. 1779–1803, 3rd Quart., 2016.
[40] Y. He, C. Liang, F. R. Yu, N. Zhao, and H. Yin, “Optimization of cache-enabled opportunistic interference alignment wireless networks: A big data deep reinforcement learning approach,” in Proc. IEEE Int. Conf. Commun. (ICC), Paris, France, May 2017, pp. 1–6.
[41] G. Han, L. Xiao, and H. V. Poor, “Two-dimensional anti-jamming communication based on deep reinforcement learning,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), New Orleans, LA, USA, Mar. 2017, pp. 2087–2091.
[42] S. Peng, H. Jiang, H. Wang, H. Alwageed, and Y.-D. Yao, “Modulation classification using convolutional neural network based deep learning model,” in Proc. 26th Wireless Opt. Commun. Conf. (WOCC), Newark, NJ, USA, Apr. 2017, pp. 1–5.
[43] Y. Jia et al., “Caffe: Convolutional architecture for fast feature embedding,” in Proc. 22nd ACM Int. Conf. Multimedia (MM), Orlando, FL, USA, Nov. 2014, pp. 675–678.
[44] E. Nachmani, Y. Be’ery, and D. Burshtein, “Learning to decode linear codes using deep learning,” in Proc. 54th Annu. Allerton Conf. Commun. Control Comput. (Allerton), Monticello, IL, USA, Sep. 2016, pp. 341–346.
[45] E. Nachmani, E. Marciano, D. Burshtein, and Y. Be’ery, “RNN decoding of linear block codes,” eprint arXiv:1702.07560, Feb. 2017.
[46] T. Gruber, S. Cammerer, J. Hoydis, and S. Brink, “On deep learning-based channel decoding,” in Proc. IEEE 51st Annu. Conf. Inf. Sci. Syst. (CISS), Baltimore, MD, USA, Mar. 2017, pp. 1–6.
[47] S. Cammerer, T. Gruber, J. Hoydis, and S. T. Brink, “Scaling deep learning-based decoding of polar codes via partitioning,” in Proc. IEEE Glob. Commun. Conf. (GLOBECOM), Singapore, Dec. 2017, pp. 1–6.
[48] N. Samuel, T. Diskin, and A. Wiesel, “Deep MIMO detection,” in Proc. IEEE 18th Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), Sapporo, Japan, Jul. 2017, pp. 1–5.
[49] Y.-S. Jeon, S.-N. Hong, and N. Lee, “Blind detection for MIMO systems with low-resolution ADCs using supervised learning,” in Proc. IEEE Int. Conf. Commun. (ICC), Paris, France, May 2017, pp. 1–6.
[50] N. Farsad and A. Goldsmith, “Detection algorithms for communication systems using deep learning,” eprint arXiv:1705.08044, Jul. 2017.
[51] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
[52] U. Challita, L. Dong, and W. Saad, “Deep learning for proactive resource allocation in LTE-U networks,” in Proc. 23rd Eur. Wireless Conf. (Eur. Wireless), Dresden, Germany, May 2017, pp. 1–6.
[53] W. Zaremba, I. Sutskever, and O. Vinyals, “Recurrent neural network regularization,” in Proc. Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, May 2015, pp. 1–8.
[54] R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Mach. Learn., vol. 8, nos. 3–4, pp. 229–256, May 1992.
[55] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, “Policy gradient methods for reinforcement learning with function approximation,” in Proc. 12th Int. Conf. Neural Inf. Process. Syst. (NIPS), Denver, CO, USA, Dec. 1999, pp. 1057–1063.
[56] M. Balazinska and P. Castro, “Characterizing mobility and network usage in a corporate wireless local-area network,” in Proc. 1st Int. Conf. Mobile Syst. Appl. Services, San Francisco, CA, USA, May 2003, pp. 303–316.
[57] Z. Xu, Y. Wang, J. Tang, J. Wang, and M. C. Gursoy, “A deep reinforcement learning based framework for power-efficient resource allocation in cloud RANs,” in Proc. IEEE Int. Conf. Commun. (ICC), Paris, France, May 2017, pp. 1–6.
[58] H. Sun et al., “Learning to optimize: Training deep neural networks for wireless resource management,” in Proc. IEEE 18th Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), Sapporo, Japan, Jul. 2017, pp. 1–6.
[59] Q. Shi, M. Razaviyayn, Z.-Q. Luo, and C. He, “An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel,” IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4331–4340, Apr. 2011.
[60] O. Naparstek and K. Cohen, “Deep multi-user reinforcement learning for distributed dynamic spectrum access,” eprint arXiv:1704.02613, Apr. 2017.
[61] K. Cohen, A. Leshem, and E. Zehavi, “Game theoretic aspects of the multi-channel ALOHA protocol in cognitive radio networks,” IEEE J. Sel. Areas Commun., vol. 31, no. 11, pp. 2276–2288, Nov. 2013.
[62] J. Wang et al., “Spatiotemporal modeling and prediction in cellular networks: A big data enabled deep learning approach,” in Proc. IEEE Conf. Comput. Commun. (INFOCOM), Atlanta, GA, USA, May 2017, pp. 1–9.
[63] R. J. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice. Heathmont, VIC, Australia: OTexts, Sep. 2014.
[64] H. Drucker, C. J. C. Burges, L. Kaufman, A. Smola, and V. Vapnik, “Support vector regression machines,” in Proc. 9th Int. Conf. Neural Inf. Process. Syst. (NIPS), Denver, CO, USA, Dec. 1996, pp. 155–161.
[65] J. A. K. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural Process. Lett., vol. 9, no. 3, pp. 293–300, Jun. 1999.
[66] L. Liu, Y. Cheng, L. Cai, S. Zhou, and Z. Niu, “Deep learning based optimization in wireless network,” in Proc. IEEE Int. Conf. Commun. (ICC), Paris, France, May 2017, pp. 21–25.
[67] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Comput., vol. 18, no. 7, pp. 1527–1554, Jul. 2006.
[68] T. Hu and Y. Fei, “QELAR: A machine-learning-based adaptive routing protocol for energy-efficient and lifetime-extended underwater sensor networks,” IEEE Trans. Mobile Comput., vol. 9, no. 6, pp. 796–809, Jun. 2010.
[69] P. Xie, J.-H. Cui, and L. Lao, “VBF: Vector-based forwarding protocol for underwater sensor networks,” in Proc. 5th Int. IFIP-TC6 Conf. Netw. Technol. Services Protocols (NETWORKING), Coimbra, Portugal, May 2006, pp. 1216–1221.
[70] N. Kato et al., “The deep learning vision for heterogeneous network traffic control: Proposal, challenges, and future perspective,” IEEE Wireless Commun., vol. 24, no. 3, pp. 146–153, Jun. 2017.
[71] I. J. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, Nov. 2016.
[72] B. Mao et al., “Routing or computing? The paradigm shift towards intelligent computer network packet transmission based on deep learning,” IEEE Trans. Comput., vol. 66, no. 11, pp. 1946–1960, Nov. 2017.
[73] Y. Lee, “Classification of node degree based on deep learning and routing method applied for virtual route assignment,” Ad Hoc Netw., vol. 58, pp. 70–85, Apr. 2017.
[74] Q. You, Y. Li, M. S. Rahman, and Z. Chen, “A near optimal routing scheme for multi-hop relay networks based on Viterbi algorithm,” in Proc. IEEE Int. Conf. Commun. (ICC), Ottawa, ON, Canada, Jun. 2012, pp. 4531–4536.
[75] G. Stampa, M. Arias, D. Sanchez-Charles, V. Muntes-Mulero, and A. Cabellos, “A deep-reinforcement learning approach for software-defined networking routing optimization,” eprint arXiv:1709.07080, Sep. 2017.
[76] A. Varga, “The OMNeT++ discrete event simulation system,” in Proc. 15th Eur. Simulat. Multiconf. (ESM), Prague, Czech Republic, Jun. 2001, pp. 1–7.
[77] A. Varga and R. Hornig, “An overview of the OMNeT++ simulation environment,” in Proc. 1st Int. Conf. Simulat. Tools Techn. Commun. Netw. Syst. Workshops (Simutools), Marseille, France, Mar. 2008, pp. 1–10.
[78] A. Mestres et al., “Knowledge-defined networking training datasets,” Universitat Politecnica de Catalunya, Barcelona, Spain, Oct. 2017. [Online]. Available: http://knowledgedefinednetworking.org
[79] A. Valadarsky, M. Schapira, D. Shahaf, and A. Tamar, “A machine learning approach to routing,” eprint arXiv:1708.03074, Aug. 2017.
[80] J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel, “Trust region policy optimization,” in Proc. 32nd Int. Conf. Mach. Learn. (ICML), Lille, France, Jul. 2015, pp. 1–9.
[81] Y. Duan, X. Chen, R. Houthooft, J. Schulman, and P. Abbeel, “Benchmarking deep reinforcement learning for continuous control,” in Proc. 33rd Int. Conf. Mach. Learn. (ICML), New York, NY, USA, Apr. 2016, pp. 1329–1338.
[82] R. Atallah, C. Assi, and M. Khabbaz, “Deep reinforcement learning-based scheduling for roadside communication networks,” in Proc. 15th Int. Symp. Model. Optim. Mobile Ad Hoc Wireless Netw. (WiOpt), Paris, France, May 2017, pp. 1–8.
[83] B. Sun, H. Feng, K. Chen, and X. Zhu, “A deep learning framework of quantized compressed sensing for wireless neural recording,” IEEE Access, vol. 4, pp. 5169–5178, 2016.
[84] T. Tieleman and G. Hinton, “Lecture 6.5-RMSProp: Divide the gradient by a running average of its recent magnitude,” COURSERA Neural Netw. Mach. Learn., vol. 4, no. 2, pp. 26–31, Oct. 2012.
[85] S. O. Haykin, Neural Networks and Learning Machines, 3rd ed. Harlow, U.K.: Pearson Higher Educ., Nov. 2008.
[86] L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proc. 19th Int. Conf. Comput. Stat. (COMPSTAT), Paris, France, Sep. 2010, pp. 177–187.
[87] J. Zhang et al., “An efficient and compact compressed sensing microsystem for implantable neural recordings,” IEEE Trans. Biomed. Circuits Syst., vol. 8, no. 4, pp. 485–496, Aug. 2014.
[88] L. Jacques, D. K. Hammond, and J. M. Fadili, “Dequantizing compressed sensing: When oversampling and non-Gaussian constraints combine,” IEEE Trans. Inf. Theory, vol. 57, no. 1, pp. 559–571, Jan. 2011.
[89] Z. Yang, L. Xie, and C. Zhang, “Variational Bayesian algorithm for quantized compressed sensing,” IEEE Trans. Signal Process., vol. 61, no. 11, pp. 2815–2824, Jun. 2013.
[90] H. Mao, M. Alizadeh, I. Menache, and S. Kandula, “Resource management with deep reinforcement learning,” in Proc. 15th ACM Workshop Hot Topics Netw. (HotNets), Atlanta, GA, USA, Nov. 2016, pp. 50–56.
[91] M. Crotti, M. Dusi, F. Gringoli, and L. Salgarelli, “Traffic classification through simple statistical fingerprinting,” ACM SIGCOMM Comput. Commun. Rev., vol. 37, no. 1, pp. 5–16, Jan. 2007.
[92] X. Wang and D. J. Parish, “Optimised multi-stage TCP traffic classifier based on packet size distributions,” in Proc. 3rd Int. Conf. Commun. Theory Rel. Qual. Service (CTRQ), Glyfada, Greece, Jun. 2010, pp. 98–103.
[93] R. Sun et al., “Traffic classification using probabilistic neural networks,” in Proc. 6th Int. Conf. Nat. Comput. (ICNC), Yantai, China, Aug. 2010, pp. 1914–1919.
[94] H. Ting, W. Yong, and T. Xiaoling, “Network traffic classification based on kernel self-organizing maps,” in Proc. Int. Conf. Intell. Comput. Integr. Syst. (ICISS), Guilin, China, Oct. 2010, pp. 310–314.
[95] A. H. Lashkari, G. D. Gil, M. S. I. Mamun, and A. A. Ghorbani, “Characterization of encrypted and VPN traffic using time-related features,” in Proc. 2nd Int. Conf. Inf. Syst. Security Privacy (ICISSP), Rome, Italy, Feb. 2016, pp. 407–414.
[96] Y. L. Gwon and H. T. Kung, “Inferring origin flow patterns in Wi-Fi with deep learning,” in Proc. 11th Int. Conf. Auton. Comput. (ICAC), Philadelphia, PA, USA, Jun. 2014, pp. 73–83.
[97] B. A. Olshausen and D. J. Field, “Emergence of simple-cell receptive field properties by learning a sparse code for natural images,” Nature, vol. 381, no. 6583, pp. 607–609, Jun. 1996.
[98] Y. L. Boureau, J. Ponce, and Y. LeCun, “A theoretical analysis of feature pooling in visual recognition,” in Proc. 27th Int. Conf. Mach. Learn. (ICML), Haifa, Israel, Jun. 2010, pp. 111–118.
[99] L. Ljung, System Identification: Theory for the User, 2nd ed. Englewood Cliffs, NJ, USA: Prentice-Hall, Jan. 1999.
[100] Z. Wang, “The applications of deep learning on traffic identification,” in Proc. Black Hat USA, Las Vegas, NV, USA, Aug. 2015, pp. 1–10.
[101] M. Lotfollahi, R. S. H. Zade, M. J. Siavoshani, and M. Saberian, “Deep packet: A novel approach for encrypted traffic classification using deep learning,” eprint arXiv:1709.02656, Sep. 2017.
[102] F. Chollet et al. Keras: Deep Learning for Humans. Accessed: Jun. 2018. [Online]. Available: https://github.com/fchollet/keras
[103] B. Yamansavascilar, M. A. Guvensan, A. G. Yavuz, and M. E. Karsligil, “Application identification via network traffic classification,” in Proc. Int. Conf. Comput. Netw. Commun. (ICNC), Santa Clara, CA, USA, Jan. 2017, pp. 843–848.
[104] T. P. Oliveira, J. S. Barbar, and A. S. Soares, “Computer network traffic prediction: A comparison between traditional and deep learning neural networks,” Int. J. Big Data Intell., vol. 3, no. 1, pp. 28–37, Jan. 2016.
[105] E. Hodo, X. Bellekens, A. Hamilton, C. Tachtatzis, and R. Atkinson, “Shallow and deep networks intrusion detection system: A taxonomy and survey,” eprint arXiv:1701.02145, Jan. 2017.
[106] Q. Niyaz, W. Sun, A. Y. Javaid, and M. Alam, “A deep learning approach for network intrusion detection system,” in Proc. 9th EAI Int. Conf. Bio Inspired Inf. Commun. Technol. (BICT), New York, NY, USA, Dec. 2015, pp. 21–26.
[107] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, “Self-taught learning: Transfer learning from unlabeled data,” in Proc. 24th Int. Conf. Mach. Learn. (ICML), Corvallis, OR, USA, Jun. 2007, pp. 759–766.
[108] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed analysis of the KDD CUP 99 data set,” in Proc. 2nd IEEE Symp. Comput. Intell. Security Defence Appl. (CISDA), Ottawa, ON, Canada, Jul. 2009, pp. 1–6.
[109] (2017). NSL-KDD Dataset. [Online]. Available: http://nsl.cs.unb.ca/nsl-kdd/
[110] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, “Deep learning approach for network intrusion detection in software defined networking,” in Proc. Int. Conf. Wireless Netw. Mobile Commun. (WINCOM), Fes, Morocco, Oct. 2016, pp. 1–6.
[111] R. Salakhutdinov and G. Hinton, “Deep Boltzmann machines,” Artif. Intell., vol. 5, no. 2, pp. 448–455, Jan. 2009.
[112] G. E. Hinton, “A practical guide to training restricted Boltzmann machines,” Momentum, vol. 9, no. 1, pp. 1–21, Aug. 2010.
[113] U. Fiore, F. Palmieri, A. Castiglione, and A. D. Santis, “Network anomaly detection with the restricted Boltzmann machine,” Neurocomputing, vol. 122, pp. 13–23, Dec. 2013.
[114] N. Gao, L. Gao, Q. Gao, and H. Wang, “An intrusion detection model based on deep belief networks,” in Proc. 2nd Int. Conf. Adv. Cloud Big Data (CBD), Nov. 2014, pp. 247–252.
[115] M. Z. Alom, V. Bontupalli, and T. M. Taha, “Intrusion detection using deep belief networks,” in Proc. Nat. Aerosp. Electron. Conf. (NAECON), Dayton, OH, USA, Jun. 2015, pp. 339–344.
[116] M.-J. Kang and J.-W. Kang, “Intrusion detection system using deep neural network for in-vehicle network security,” PLoS ONE, vol. 11, no. 6, pp. 1–17, Jun. 2016.
[117] M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” eprint arXiv:1603.04467, Mar. 2016.
[118] R. Al-Rfou et al., “Theano: A Python framework for fast computation of mathematical expressions,” eprint arXiv:1605.02688, May 2016.
[119] WILL API. Accessed: Jan. 2018. [Online]. Available: https://scarsty.gitbooks.io/will/content/
[120] T. Chen et al., “MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems,” in Proc. NIPS Workshop Mach. Learn. Syst. (NIPS), Barcelona, Spain, Dec. 2016, pp. 1–6.
[121] R. Collobert, K. Kavukcuoglu, and C. Farabet, “Torch7: A MATLAB-like environment for machine learning,” in Proc. BigLearn NIPS Workshop (NIPS), Granada, Spain, Dec. 2011, pp. 1–6.
[122] Comparison of Deep Learning Software. Accessed: Jun. 2018. [Online]. Available: https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software
Qian Mao received the B.S. degree from the Nanjing University of Aeronautics and Astronautics, Jiangsu, China, in 2000, the M.E. degree from Shanghai Ship and Shipping Research Institute, Shanghai, China, in 2003, and the Ph.D. degree in traffic information engineering and control from Tongji University, Shanghai, in 2006. She is currently pursuing the Ph.D. degree with the University of Alabama, AL, USA. She was an Assistant Professor with the University of Shanghai for Science and Technology from 2006 to 2015. Her research interests include big data, cyber-physical system security, deep learning, and wireless networks.
Fei Hu (M’01) received the first Ph.D. degree in signal processing from Tongji University, Shanghai, China, in 1999, and the second Ph.D. degree in electrical and computer engineering from Clarkson University, New York, NY, USA, in 2002. He is currently a Professor with the Department of Electrical and Computer Engineering, University of Alabama, Tuscaloosa, AL, USA. He has published over 200 journal/conference papers and book (chapters) in the field of wireless networks and machine learning. His research interests are wireless networks, machine learning, big data, network security and their applications. His research has been supported by U.S. NSF, DoE, DoD, Cisco, and Sprint.
Qi Hao received the B.E. and M.E. degrees in electrical and computer engineering from Shanghai Jiao Tong University, Shanghai, China, in 1994 and 1997, respectively, and the Ph.D. degree in electrical and computer engineering from Duke University, Durham, NC, USA, in 2006. He was a Post-Doctoral Fellow with the Center for Visualization and Virtual Environment, University of Kentucky, Lexington, KY, USA. From 2007 to 2014, he was an Assistant Professor with the Department of Electrical and Computer Engineering, University of Alabama, Tuscaloosa, AL, USA. He is currently an Associate Professor with the Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China. His current research interests include smart sensors, machine learning, and autonomous systems.