Federated Deep Reinforcement Learning for RIS-Assisted Indoor Multi-Robot Communications
Abstract—Indoor multi-robot communications face two key challenges: one is the severe signal strength degradation caused by blockages (e.g., walls) and the other is the dynamic environment ...

... as the sudden collision, efficiency reduction and operation restriction. To avoid these potential problems, the reconfigurable intelligent surface (RIS) can be deployed to create a ...
... transmit power, RIS phase shifts, and robot trajectory in a semi-distributed manner. The reduction of the control dimension greatly accelerates the convergence at the training stage. Benefiting from the decentralized implementation, F-DRL can easily adapt to changes in the robot number.

3) We conduct numerical experiments to show the superiority of the proposed F-DRL. Compared to the centralized DRL, our method takes about 86% less training time and is more robust to the dynamic multi-robot environment. Simulation results also show that the designed F-DRL can outperform benchmarks in terms of energy efficiency.

II. SYSTEM MODEL AND PROBLEM FORMULATION

A. System Model

Fig. 1: RIS-assisted indoor multi-robot communications

As illustrated in Fig. 1, we consider an indoor multi-robot communication system aided by an RIS having $M$ passive reflecting elements. Using downlink NOMA techniques, the AP serves $K$ mobile robots¹, denoted by $\mathcal{K} = \{1, 2, \ldots, K\}$. To complete given tasks, we require the k-th robot to move from a starting position $q_{S,k}$ to a destination $q_{D,k}$ within a given deadline $T_{\max}$. We define $q_k^t$ as the position of the k-th robot at the t-th time slot, where $t \in \mathcal{T}_k = \{1, 2, \ldots, T_k\}$ and $T_k \le T_{\max}$ is the total traveling time at the speed $v$. For brevity, the time index $t$ is omitted in some parameters. We assume that robots update their trajectories each time slot.

¹ With the results obtained in this paper, the considered system can be easily extended to multi-antenna cases, which will be included in our future work.
The RIS is divided into $N$ sub-surfaces, denoted by $\mathcal{N} = \{1, 2, \ldots, N\}$. Let $\theta_n \in [0, 2\pi)$ denote the phase shift of the n-th sub-surface. Then, the RIS reflection matrix is denoted by $\Theta = \mathrm{diag}(\boldsymbol{\theta}_{N \times 1} \otimes \mathbf{1}_{(M/N) \times 1}) = \mathrm{diag}(e^{j\theta_1}, \ldots, e^{j\theta_M}) \in \mathbb{C}^{M \times M}$ with $M = M_R \times M_R$, where $M_R$ is the element number in the vertical or horizontal direction. In view of the hardware implementation, we consider a practical RIS with limited $N_R = 2^b$ phase shifts [14], where $\theta_m \in \mathcal{R} = \{\tfrac{1}{2}\Delta_R, \ldots, \tfrac{2N_R-1}{2}\Delta_R\}$, $\forall m$, and $\Delta_R = 2\pi/N_R$ is the phase resolution [15].

Let $\bar{h}_k \in \mathbb{C}^{1 \times 1}$, $\mathbf{h}_k \in \mathbb{C}^{1 \times M}$ and $\mathbf{g} \in \mathbb{C}^{M \times 1}$ denote the channel coefficients from the AP to the k-th robot, from the RIS to the k-th robot, and from the AP to the RIS, respectively. Then the combined channel coefficient experienced by the k-th robot is given by $h_k = \mathbf{h}_k \Theta \mathbf{g} + \bar{h}_k$. Thus, the received signal at the k-th robot is given by

$y_k = h_k \sqrt{p_k}\, s_k + h_k \sum_{i \neq k} \sqrt{p_i}\, s_i + n_k, \quad \forall k, \qquad (1)$

where $s_k$ is the transmit symbol for the k-th robot, $p_k > 0$ is the downlink power allocated to the k-th robot, and $n_k \sim \mathcal{CN}(0, \sigma^2)$ is the additive white Gaussian noise.
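To make the RIS model concrete, the short Python sketch below (our own illustration, not code from the paper) builds the quantized phase-shift set $\mathcal{R}$, the reflection matrix $\Theta$ with $N$ sub-surfaces of $M/N$ elements each, and the combined coefficient $h_k = \mathbf{h}_k \Theta \mathbf{g} + \bar{h}_k$. The random channel draws and parameter values are purely illustrative assumptions.

```python
import numpy as np

def phase_codebook(b):
    """Quantized set R = {Delta/2, ..., (2*N_R - 1)/2 * Delta} with N_R = 2**b levels."""
    n_r = 2 ** b
    delta = 2 * np.pi / n_r
    return (np.arange(n_r) + 0.5) * delta

def reflection_matrix(theta_sub, M):
    """Theta = diag(theta_{Nx1} kron 1_{(M/N)x1}): each sub-surface repeats one phase over M/N elements."""
    N = len(theta_sub)
    per_element = np.kron(theta_sub, np.ones(M // N))
    return np.diag(np.exp(1j * per_element))

def combined_channel(h_bar_k, h_k, g, Theta):
    """Combined AP-to-robot coefficient h_k = h_k Theta g + h_bar_k, cf. the system model."""
    return h_k @ Theta @ g + h_bar_k

# toy usage with random (illustrative) channel realizations
rng = np.random.default_rng(0)
M, N, b = 16, 4, 2                                   # M elements, N sub-surfaces, N_R = 2^b levels
theta_sub = rng.choice(phase_codebook(b), size=N)    # one discrete phase per sub-surface
Theta = reflection_matrix(theta_sub, M)
h_bar_k = (rng.normal() + 1j * rng.normal()) / np.sqrt(2)                 # direct AP-robot link
h_k = (rng.normal(size=M) + 1j * rng.normal(size=M)) / np.sqrt(2)        # RIS-robot link (1 x M)
g = (rng.normal(size=M) + 1j * rng.normal(size=M)) / np.sqrt(2)          # AP-RIS link (M x 1)
print(abs(combined_channel(h_bar_k, h_k, g, Theta)))
```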
To alleviate the interference among robots, we apply the successive interference cancellation (SIC) technique. Without loss of optimality, the channel coefficients of all robots are ranked as $|h_K| \le \cdots \le |h_2| \le |h_1|$. Then, to perform SIC successfully, the transmit power at the AP satisfies the following constraint:

$\Delta_k = p_k |h_{k-1}|^2 - \sum_{i=1}^{k-1} p_i |h_{k-1}|^2 \ge \rho_{\min}, \quad \forall k \ge 2, \qquad (2)$

where $\rho_{\min} > 0$ is the required gap to distinguish the decoded signal. When the above power constraint is met, the achievable downlink data rate at the k-th robot can be obtained by

$R_k = \log_2\!\left(1 + \frac{|h_k|^2 p_k}{|h_k|^2 \sum_{i=1}^{k-1} p_i + \sigma^2}\right), \quad \forall k. \qquad (3)$
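The SIC power-gap check of (2) and the rate expression (3) translate directly into a few lines of code. The sketch below is a minimal illustration; the powers, channels, noise variance and threshold are assumed values and do not come from the paper.

```python
import numpy as np

def sic_gaps(p, h, rho_min):
    """Power gaps Delta_k of (2) for the ordering |h_K| <= ... <= |h_1| (index 0 = strongest)."""
    gaps = []
    for k in range(1, len(p)):                       # paper's k = 2, ..., K in 0-based indexing
        gap = p[k] * abs(h[k - 1]) ** 2 - sum(p[:k]) * abs(h[k - 1]) ** 2
        gaps.append(gap)
    return np.array(gaps), bool(np.all(np.array(gaps) >= rho_min))

def noma_rates(p, h, sigma2):
    """Achievable downlink rates R_k of (3) under SIC."""
    rates = []
    for k in range(len(p)):
        interference = abs(h[k]) ** 2 * sum(p[:k])   # residual signals of stronger-channel robots
        rates.append(np.log2(1 + abs(h[k]) ** 2 * p[k] / (interference + sigma2)))
    return np.array(rates)

# illustrative values: 3 robots, channels sorted strongest first, weaker channels get more power
h = np.array([1.0, 0.6, 0.3])
p = np.array([0.02, 0.06, 0.12])
gaps, feasible = sic_gaps(p, h, rho_min=0.01)
print(gaps, feasible, noma_rates(p, h, sigma2=1e-3))
```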
Since the energy consumed by motion is much larger than that consumed by communication, this paper mainly focuses on the motion energy cost. Therefore, the total motion energy consumed by the k-th robot is expressed as [2]

$E_k = E_1 T_k v + E_2 T_k, \quad \forall k, \qquad (4)$

where $E_1$ and $E_2$ are two constants related to the mechanical output power and the transforming loss, respectively [16]. Their values depend on the exact robot motion model.

B. Problem Formulation

By optimizing the transmit power at the AP, the phase shifts of the RIS, and the trajectory of robots, this paper aims to maximize the total energy efficiency of all robots during the mission. Subject to the constraints of transmit power, phase shifts and robot mobility, a long-term optimization problem is formulated as
$\max_{\Theta, Q, \mathbf{p}} \ \frac{1}{T_k} \sum_{t=1}^{T_k} \sum_{k=1}^{K} \frac{R_k^t}{E_k} \qquad (5a)$

$\text{s.t.} \quad q_k^1 = q_{S,k}, \ q_k^{T_k} = q_{D,k}, \ \forall k, \qquad (5b)$

$|h_K^t| \le \cdots \le |h_2^t| \le |h_1^t|, \ \forall t, \qquad (5c)$

$\Delta_k^t \ge \rho_{\min}, \ p_k^t > 0, \ \forall k, \forall t, \qquad (5d)$

$x_{\min} \le x_k^t \le x_{\max}, \ \forall k, \forall t, \qquad (5e)$

$y_{\min} \le y_k^t \le y_{\max}, \ \forall k, \forall t, \qquad (5f)$

$\sum_{k=1}^{K} p_k^t \le P_{\max}, \ \forall t, \qquad (5g)$

$\theta_n^t \in \mathcal{R}, \ \forall n, \forall t, \qquad (5h)$
where $Q = [q_1, q_2, \ldots, q_K]^T$ denotes the trajectory design of all robots and $\mathbf{p} = [p_1, p_2, \ldots, p_K]^T$ is the power allocation strategy at the AP. However, the formulated problem (5) is difficult to solve by existing optimization methods and is also challenging to solve optimally, due to the following reasons. First, the multiple optimization variables $\{\Theta, Q, \mathbf{p}\}$ are closely coupled in the objective function (5a). Second, the achievable data rate $R_k^t$ is not a continuous function, due to the discrete phase shifts and the position-dependent channel coefficients. Third, the simultaneous motion of multiple robots also makes problem (5) hard to solve even if only the subproblem of trajectory design is considered. To sum up, traditional one-shot optimization methods do not apply to this dynamic system.
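Before turning to the learning design, the following sketch makes the objective concrete: it evaluates (5a) from per-slot rates and the motion energy of (4). It is a simplified illustration (identical travel time for all robots, rates assumed pre-computed from (3)); the function and parameter names are ours, not the paper's.

```python
import numpy as np

def motion_energy(T_k, v, E1, E2):
    """Motion energy E_k = E1 * T_k * v + E2 * T_k of (4)."""
    return E1 * T_k * v + E2 * T_k

def energy_efficiency_objective(rates_per_slot, T, v, E1, E2):
    """Objective (5a): time-averaged sum over robots and slots of R_k^t / E_k.

    rates_per_slot: array of shape (T, K) with R_k^t already computed for the chosen
    phase shifts, powers and positions. All robots are assumed to travel for T slots here.
    """
    T_slots, K = rates_per_slot.shape
    E = np.array([motion_energy(T, v, E1, E2) for _ in range(K)])
    return (rates_per_slot / E).sum() / T_slots

# illustrative numbers only (not the paper's settings): 40 slots, 3 robots
rng = np.random.default_rng(1)
rates = rng.uniform(1.0, 4.0, size=(40, 3))
print(energy_efficiency_objective(rates, T=40, v=0.5, E1=10.0, E2=2.0))
```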
III. PROPOSED F-DRL APPROACH

In this section, we develop an F-DRL approach that is capable of accelerating the training process and obtaining high performance in terms of energy efficiency. As shown in Fig. 2, the F-DRL approach is split into two stages: the global stage for RIS configuration and the local stage for joint robot trajectory and transmit power control.

Fig. 2: Proposed F-DRL approach for communication-aware trajectory design

A. Global Decision Stage

At the global decision stage, the AP adjusts the RIS configuration with global state information. Specifically, we define the phase shift design problem as a Markov decision process (MDP), denoted by a transition tuple having three elements, $\langle S_G, A_G, R_G \rangle$, where $S_G$ is the state space, $A_G$ is the action space, and $R_G$ is the reward.

• State space: Let $s_G^t \in S_G$, $\forall t$. Since the combined channel coefficients $(h_k)_{k \in \mathcal{K}}$ remain unknown before the RIS phase shifts are designed, the coefficients of the AP-robot links $(\bar{h}_k)_{k \in \mathcal{K}}$ are considered as the channel features. Thus, the global state is defined as

$s_G^t = \{q_k^t, \bar{h}_k^t \mid \forall k \in \mathcal{K}\}, \quad \forall t, \qquad (6)$

where the position $q_k^t$ can be obtained by the simultaneous localization and mapping algorithm [17]. Meanwhile, the continuous 2D map is discretized into grids with the length of $\Delta_S$, while the sampling positions are in the center of each grid and satisfy the constraints in (5e) and (5f).²

• Action space: Let $a_G^t \in A_G$, $\forall t$. Then, the action for RIS phase shift design is defined as

$a_G^t = \{\theta_n^t \mid \forall n \in \mathcal{N}\}, \quad \forall t, \qquad (7)$

where $\theta_n^t \in \mathcal{R}$ is the discrete phase shift adopted by the n-th RIS sub-surface.

• Reward: With the aim of maximizing the sum rate, the reward is defined as

$r_G^t = \tau_1 \sum_{k=1}^{K} R_k^t, \quad \forall t, \qquad (8)$

where $\tau_1$ is a constant. We let $r_G^t < 0$ to avoid robots wandering. Additionally, it is inappropriate to put the sum of the combined channel coefficients $(h_k)_{k \in \mathcal{K}}$ into the reward function, because it is necessary for NOMA to maintain the distinctness among different signals.

² Using the default track curve [18], the discrete sampling points can be reconstructed into continuous curves.
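A minimal sketch of how the global tuple $\langle S_G, A_G, R_G \rangle$ of (6)-(8) could be encoded is given below. The class name, the state flattening, and the negative value of $\tau_1$ (our reading of the requirement $r_G^t < 0$) are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

class GlobalRISAgentView:
    """Illustrative encoding of the AP-side MDP <S_G, A_G, R_G> in (6)-(8)."""

    def __init__(self, codebook, num_subsurfaces, tau1=-0.1):
        self.codebook = codebook              # discrete set R of phase shifts
        self.N = num_subsurfaces
        self.tau1 = tau1                      # tau1 < 0 keeps r_G^t negative, cf. (8)

    def state(self, positions, h_bar):
        """Global state s_G^t = {q_k^t, h_bar_k^t | k in K}: robot positions + direct-link features."""
        return np.concatenate([np.ravel(positions), np.abs(h_bar)])

    def random_action(self, rng):
        """Action a_G^t = {theta_n^t}: one discrete phase per sub-surface (exploration branch)."""
        return rng.choice(self.codebook, size=self.N)

    def reward(self, rates):
        """Reward r_G^t = tau1 * sum_k R_k^t."""
        return self.tau1 * float(np.sum(rates))

# toy usage
rng = np.random.default_rng(2)
agent = GlobalRISAgentView(codebook=np.array([np.pi/4, 3*np.pi/4, 5*np.pi/4, 7*np.pi/4]),
                           num_subsurfaces=1)
s = agent.state(positions=np.array([[1, 2], [3, 4], [5, 6]]), h_bar=np.array([0.3 + 0.1j, 0.2, 0.5]))
print(s.shape, agent.random_action(rng), agent.reward(rates=np.array([2.0, 1.5, 1.1])))
```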
B. Local Decision Stage

At the local decision stage, each robot determines its trajectory and downlink transmit power with local state information. Because the control dimension of centralized DRL multiplies with the increase of robots, we propose to train the robots locally and then aggregate a global model in a federated manner. The MDP with the local transition tuple $\langle S_{L,k}, A_{L,k}, R_{L,k} \rangle$ maintained by the k-th robot is defined as follows.

• State space: Let $s_{L,k}^t \in S_{L,k}$, $\forall t$. Then, the local state is defined as

$s_{L,k}^t = \{q_k^t, \bar{h}_{A,k}^t\}, \quad \forall k, \forall t, \qquad (9)$

where the local state $s_{L,k}^t$ is a part of the global state $s_G^t$.

• Action space: Let $a_{L,k}^t \in A_{L,k}$, $\forall k, \forall t$. Then, the local action for trajectory design and transmit power control is defined as

$a_{L,k}^t = \{o_k^t, p_k^t\}, \quad \forall k, \forall t, \qquad (10)$

where the k-th robot orientation $o_k \in \{n, s, e, w\}$ indicates that robots move in four directions, i.e., north, south, east or west. To satisfy the constraints in (5c), (5d) and (5g), the first robot must guarantee $p_1 < P_{\max}/2^{K-1}$. Inspired by discrete power control, we have $p_k \in \{P_{\max}/2, \ldots, P_{\max}/2^{N_P}\}$ and $N_P \ge K$.

• Reward: To maximize the energy efficiency, we define the local reward $r_{L,k}^t$ as

$r_{L,k}^t = \phi R_k^t + \psi R_{D,k}^t + R_{\mathrm{time}} + R_{\mathrm{goal}}, \quad \forall k, \forall t, \qquad (11)$

where the guidance reward $R_{D,k}^t = d_{D,k}^{t-1} - d_{D,k}^t$ for $t \ge 2$, and $d_{D,k}^t$ is the distance between the k-th robot and its destination at the t-th time slot. The guidance reward $R_{D,k}^t$ leads the k-th robot to reach its destination. Moreover, the time cost $R_{\mathrm{time}}$ is a constant and $R_{\mathrm{time}} < 0$. If the k-th robot arrives at its destination, it gains a positive reward $R_{\mathrm{goal}}$; otherwise we have $R_{\mathrm{goal}} = 0$. In this paper, the parameter $\phi$ must guarantee $R_{\mathrm{time}} + \phi R_k^t < 0$ in most cases to prevent robots from wandering.
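The local reward (11) can be sketched as follows. The specific values of $\phi$, $\psi$, $R_{\mathrm{time}}$ and $R_{\mathrm{goal}}$ are illustrative choices, only meant to respect the stated condition $R_{\mathrm{time}} + \phi R_k^t < 0$ in typical slots.

```python
def guidance_reward(d_prev, d_now):
    """R_{D,k}^t = d_{D,k}^{t-1} - d_{D,k}^t: positive when the robot moves closer to its goal."""
    return d_prev - d_now

def local_reward(rate, d_prev, d_now, reached, phi=0.01, psi=1.0, r_time=-0.5, r_goal=10.0):
    """Local reward (11): r = phi*R_k^t + psi*R_{D,k}^t + R_time + R_goal.

    phi is kept small so that R_time + phi*R_k^t stays negative in most slots,
    matching the paper's anti-wandering condition; the numbers themselves are ours.
    """
    return phi * rate + psi * guidance_reward(d_prev, d_now) + r_time + (r_goal if reached else 0.0)

# one illustrative step: the robot moved 0.5 m closer and achieved 3 bit/s/Hz
print(local_reward(rate=3.0, d_prev=4.0, d_now=3.5, reached=False))
```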
C. Global Aggregation

Take training a deep Q-network (DQN) as an example. All agents collaboratively build a shared DQN, where the replay memory $\mathcal{D}$ and the $\epsilon$-greedy policy are considered. For each DQN agent $i \in \mathcal{K} \cup \{k_G\}$, the online Q-network and the target Q-network are defined as $Q(s_i^t, a_i^t; w_i^t)$ and $Q(s_i^t, a_i^t; \hat{w}_i^t)$, respectively. To update the online Q-network, each agent performs a gradient descent step with a learning rate $\alpha > 0$ on the loss function. Meanwhile, the target Q-network is reset as $\hat{w}_i^t = w_i^t$ every $N_Q$ time steps.
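As a reference point, the following PyTorch-style sketch shows a standard DQN update with fixed Q-targets: a gradient step on the temporal-difference loss and a copy of the online weights into the target network every $N_Q$ steps. It is a generic implementation under our own naming, not the paper's code.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small fully connected Q-network Q(s, a; w): one value per discrete action."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions))
    def forward(self, s):
        return self.net(s)

def dqn_step(online, target, optimizer, batch, gamma=0.9):
    """One gradient-descent step on the TD loss using the fixed target network."""
    s, a, r, s_next = batch                                        # a: long tensor of action indices
    q_sa = online(s).gather(1, a.unsqueeze(1)).squeeze(1)          # Q(s, a; w)
    with torch.no_grad():
        q_next = target(s_next).max(dim=1).values                  # max_a' Q(s', a'; w_hat)
    loss = nn.functional.mse_loss(q_sa, r + gamma * q_next)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def maybe_sync_target(online, target, step, n_q):
    """Fixed Q-targets: copy w into w_hat every N_Q time steps."""
    if step % n_q == 0:
        target.load_state_dict(online.state_dict())

# usage: online = QNet(9, 4); target = QNet(9, 4); target.load_state_dict(online.state_dict())
#        opt = torch.optim.Adam(online.parameters(), lr=1e-3)
```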
Besides, the k-th robot trains its networks locally and uploads the relevant weights $w_{L,k}$, $\hat{w}_{L,k}$ every $N_F$ time steps during the local decision stage. At each aggregation step, all robots upload their local weights to the AP at the t-th time slot, and the AP aggregates the global weights $w_L^t$ and $\hat{w}_L^t$ as

$w_L^t = \frac{1}{K}\sum_{k=1}^{K} w_{L,k}^t, \quad \hat{w}_L^t = \frac{1}{K}\sum_{k=1}^{K} \hat{w}_{L,k}^t, \quad \forall k, \forall t. \qquad (12)$

Then, the updated global weights are sent back to the local robots at the next time step until convergence.
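The aggregation in (12) is a plain parameter-wise average of the uploaded weights, applied to both the online and the target networks. A sketch under our assumptions (PyTorch state dicts, floating-point parameters only, equal weighting as in (12)) is given below.

```python
import torch

def federated_average(state_dicts):
    """Aggregate local weights as in (12): w_L = (1/K) * sum_k w_{L,k}, parameter by parameter."""
    K = len(state_dicts)
    return {name: sum(sd[name].float() for sd in state_dicts) / K for name in state_dicts[0]}

def aggregation_round(local_online_nets, local_target_nets, global_online, global_target):
    """Every N_F steps: robots upload (w_{L,k}, w_hat_{L,k}); the AP averages and broadcasts back."""
    global_online.load_state_dict(federated_average([n.state_dict() for n in local_online_nets]))
    global_target.load_state_dict(federated_average([n.state_dict() for n in local_target_nets]))
    for n in local_online_nets:
        n.load_state_dict(global_online.state_dict())
    for n in local_target_nets:
        n.load_state_dict(global_target.state_dict())
```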
Compared to traditional optimization algorithms, the proposed intelligent approach can adapt to the uncertainty and dynamics of indoor systems. Moreover, due to the semi-distributed training and decentralized execution, the proposed F-DRL approach can significantly reduce the communication overhead and effectively alleviate privacy leakage.
1) Overall Training Methodology: As shown in Fig. 2, the proposed F-DRL approach has four steps. (1) State observation: agents observe the environmental states. (2) RIS action execution: the AP controls the RIS phase shifts $a_G^t$ according to $Q_G(s_G, a_G; w_G)$ obtained at the global decision stage, and determines the NOMA decoding order. (3) Robot action execution: the k-th robot decides its action $a_{L,k}^t$ of the orientation and downlink transmit power based on $Q_{L,k}(s_{L,k}, a_{L,k}; w_{L,k})$. (4) Experience storage: agents obtain rewards and store transitions. Algorithm 1 shows the detailed training procedure of the proposed F-DRL approach. On account of the interaction between the local agents and the global agent, the proposed F-DRL approach operates in a semi-distributed manner.
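Putting the four steps together, one training time step of the semi-distributed loop could look like the sketch below. The environment object and all of its hooks (global_state, apply_ris_phases, step, and so on) are hypothetical placeholders introduced only for illustration, not an interface defined in the paper.

```python
def fdrl_time_step(env, global_agent, local_agents, replay_global, replay_locals, epsilon, rng):
    """One interaction step of the semi-distributed F-DRL loop (steps (1)-(4) above)."""
    # (1) State observation
    s_g = env.global_state()
    s_l = [env.local_state(k) for k in range(len(local_agents))]

    # (2) RIS action execution at the AP (epsilon-greedy), then NOMA decoding order
    a_g = global_agent.act(s_g, epsilon, rng)
    env.apply_ris_phases(a_g)
    env.update_decoding_order()

    # (3) Robot action execution: orientation + downlink power from each local Q-network
    a_l = [agent.act(s_l[k], epsilon, rng) for k, agent in enumerate(local_agents)]
    r_locals, r_g, s_g_next, s_l_next = env.step(a_l)   # r_g = tau1 * sum_k R_k^t, cf. (8)

    # (4) Experience storage for later gradient steps
    replay_global.append((s_g, a_g, r_g, s_g_next))
    for k, mem in enumerate(replay_locals):
        mem.append((s_l[k], a_l[k], r_locals[k], s_l_next[k]))
```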
2) Complexity Analysis: By reducing the control dimension, the complexity of F-DRL is lower than that of centralized learning. More precisely, the complexity for the DQN using a one-dimensional replay memory is $O(1)$. The computational complexity of each agent mainly depends on the transition and back-propagation, which can be calculated by $O(|\mathcal{D}| + abE|\mathcal{D}_0|)$, where $a$, $b$ and $E$ denote the number of layers, the transitions in each layer and the number of episodes, respectively. Moreover, the action space sizes of F-DRL at the global and local decision stages are $(N_R)^N$ and $(4N_P)^K$, respectively, but that of centralized DRL is $(4N_P)^K \times (N_R)^N$. Therefore, the proposed F-DRL has a lower complexity compared to centralized DRL. The theoretical analysis of F-DRL convergence has been completed in [19]. A detailed proof is omitted here for brevity. In the following, we conduct experiments to show the convergence behavior of F-DRL.
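As a worked example with the parameter values used later in the numerical results ($N_R = 4$, $N_P = 6$, $N = 1$, $K = 3$), the decomposition reduces the per-step decision space from a single joint action set to two much smaller ones:

$(N_R)^N = 4^1 = 4, \qquad (4N_P)^K = 24^3 = 13\,824, \qquad (4N_P)^K \times (N_R)^N = 55\,296.$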
IV. NUMERICAL RESULTS

In this section, we verify the efficiency and robustness of the proposed F-DRL approach for the considered communication system. In the simulation, the robots are randomly located, while the AP and the RIS are located at (15, 30, 2) and (30, 7.5, 2), respectively. The maximum transmit power of the AP is $P_{\max} = 20$ dBm and the noise power spectral density is $N_0 = -100$ dBm/Hz. The channel model is the same as the settings in [20]. Other parameters are given in Table I. For comparison, we consider the following baselines:

(a) $M_R = 0$   (b) $M_R = 20$   (c) $M_R = 30$   (d) $M_R = 40$

Fig. 4: Trajectory of robots under different values of $M_R$, where the red, blue and yellow points denote the robot trajectory using the QoS-based energy efficiency (EE) policy, and the black markers denote the trajectory using the QoE-based EE policy.
Fig. 4 demonstrates the trajectory of robots versus different $M_R$, where the performance of the QoS-based energy efficiency (EE) policy and the QoE-based EE policy is compared. The parameters are set as $N_R = 4$, $N_P = 6$, $N = 1$ and $K = 3$. The background in Fig. 4 reflects the communication quality of the downlink channels. As expected, we find that the RIS enhances the channel conditions, especially alleviating the severe signal strength degradation caused by the walls. The QoS-based EE policy maintains better channel conditions than Baseline 3, especially when $M_R > 20$. This is because Baseline 3 cares more about the bad channel coefficients, while the QoS-based EE policy cares more about the sum of the channel conditions. Moreover, the results show that the QoS-based EE policy in the considered system can achieve higher energy efficiency, while the robot with the worst channel condition always maintains a required data rate in NOMA-based systems under Baseline 3, because the logarithmic function is more sensitive to small data rate changes.

In Fig. 5, the energy efficiency under different environmental parameters is illustrated. When $N_R = 4$ and $N_P = 6$, the energy efficiency is evaluated versus $P_{\max}$ by changing the number of robots $K$, the multiple access technology $z \in \{\mathrm{NOMA}, \mathrm{OMA}\}$, and the number of RIS elements $M_R$. We find that the RIS is helpful to obtain higher energy efficiency. This is mainly because the RIS can overcome signal blockage by adjusting the radio environment. Meanwhile, when $K = 3$, the energy efficiency significantly increases with $M_R$, and shows a smaller improvement for $20 \le M_R \le 30$. Nevertheless, the energy efficiency increases over $0 \le M_R \le 30$ when $K = 4$. This phenomenon reveals that there exists a suitable transmit power budget $P_{\max}$ and number of RIS elements $M_R$ satisfying the communication demands with lower values. Moreover, the NOMA-RIS-based system gains higher energy efficiency than the OMA-RIS-based benchmarks, because NOMA signals are superimposed in the same time-frequency resources and obtain enhanced bandwidth efficiency. In addition, fewer robots and a smaller $P_{\max}$ lead to lower energy efficiency.

V. CONCLUSION

We studied a long-term energy efficiency maximization problem for RIS-assisted indoor multi-robot systems. By training agents in a semi-distributed manner, we developed a novel methodology for the communication-aware design problem, controlling the trajectory and downlink transmit power at the local robots and designing the RIS phase shifts at the AP. Owing to the decentralized nature of the proposed F-DRL, the dynamics of such a multi-robot system can be well handled. Numerical simulations demonstrated that our designed F-DRL converges faster than the centralized method and adapts to changes in the number of robots, while maintaining high performance in the NOMA-RIS design.

REFERENCES

[1] M. Afrin, J. Jin, A. Rahman, A. Rahman, J. Wan, and E. Hossain, "Resource allocation and service provisioning in multi-agent cloud robotics: A comprehensive survey," IEEE Commun. Surveys Tuts., vol. 23, no. 2, pp. 842–870, 2nd Quart. 2021.
[2] Y. Yan and Y. Mostofi, "To go or not to go: On energy-aware and communication-aware robotic operation," IEEE Trans. Control Netw. Syst., vol. 1, no. 3, pp. 218–231, Sep. 2014.
[3] X. Mu, Y. Liu, L. Guo, J. Lin, and R. Schober, "Intelligent reflecting surface enhanced indoor robot path planning: A radio map-based approach," IEEE Trans. Wireless Commun., vol. 20, no. 7, pp. 4732–4747, Jul. 2021.
[4] B. Di, H. Zhang, L. Song, Y. Li, Z. Han, and H. V. Poor, "Hybrid beamforming for reconfigurable intelligent surface based multi-user communications: Achievable rates with limited discrete phase shifts," IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1809–1822, Aug. 2020.
[5] H. Yang, Z. Xiong, J. Zhao, D. Niyato, Q. Wu, H. V. Poor, and M. Tornatore, "Intelligent reflecting surface assisted anti-jamming communications: A fast reinforcement learning approach," IEEE Trans. Wireless Commun., vol. 20, no. 3, pp. 1963–1974, Mar. 2021.
[6] J. Park, S. Samarakoon, A. Elgabli, J. Kim, M. Bennis, S.-L. Kim, and M. Debbah, "Communication-efficient and distributed learning over wireless networks: Principles and applications," Proc. IEEE, vol. 109, no. 5, pp. 796–819, Feb. 2021.
[7] W. Ni, Y. Liu, Z. Yang, H. Tian, and X. Shen, "Federated learning in multi-RIS-aided systems," IEEE Internet Things J., vol. 9, no. 12, pp. 9608–9624, Jun. 2022.
[8] W. Ni, Y. Liu, Y. C. Eldar, Z. Yang, and H. Tian, "STAR-RIS integrated non-orthogonal multiple access and over-the-air federated learning: Framework, analysis, and optimization," IEEE Internet Things J., Jul. 2022, early access, doi: 10.1109/JIOT.2022.3188544.
[9] W. Ni, X. Liu, Y. Liu, H. Tian, and Y. Chen, "Resource allocation for multi-cell IRS-aided NOMA networks," IEEE Trans. Wireless Commun., vol. 20, no. 7, pp. 4253–4268, Jul. 2021.
[10] H. Yang, A. Alphones, Z. Xiong, D. Niyato, J. Zhao, and K. Wu, "Artificial-intelligence-enabled intelligent 6G networks," IEEE Netw., vol. 34, no. 6, pp. 272–280, Nov. 2020.
[11] H. Yang, Z. Xiong, J. Zhao, D. Niyato, L. Xiao, and Q. Wu, "Deep reinforcement learning-based intelligent reflecting surface for secure wireless communications," IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 375–388, Jan. 2020.
[12] Y. Nie, J. Zhao, F. Gao, and F. R. Yu, "Semi-distributed resource management in UAV-aided MEC systems: A multi-agent federated reinforcement learning approach," IEEE Trans. Veh. Technol., vol. 70, no. 12, pp. 13162–13173, Dec. 2021.
[13] M. Chen, Z. Yang, W. Saad, C. Yin, H. V. Poor, and S. Cui, "A joint learning and communications framework for federated learning over wireless networks," IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 269–283, Jan. 2021.
[14] H. Zhang, B. Di, L. Song, and Z. Han, "Reconfigurable intelligent surfaces assisted communications with limited phase shifts: How many phase shifts are enough?" IEEE Trans. Veh. Technol., vol. 69, no. 4, pp. 4498–4502, Apr. 2020.
[15] W. Ni, Y. Liu, Z. Yang, H. Tian, and X. Shen, "Integrating over-the-air federated learning and non-orthogonal multiple access: What role can RIS play?" IEEE Trans. Wireless Commun., Jun. 2022, early access, doi: 10.1109/TWC.2022.3181214.
[16] Y. Mei, Y.-H. Lu, Y. C. Hu, and C. G. Lee, "Deployment of mobile robots with energy and timing constraints," IEEE Trans. Robot., vol. 22, no. 3, pp. 507–522, Jun. 2006.
[17] X. Gao, Y. Liu, and X. Mu, "SLARM: Simultaneous localization and radio mapping for communication-aware connected robot," in Proc. ICC Workshops, Virtual, Jun. 2021, pp. 1–6.
[18] D. Rau, J. Rodina, and F. Štec, "Generating instant trajectory of an indoor UAV with respect to its dynamics," in Proc. ISMCR, Budapest, Hungary, Oct. 2020, pp. 1–5.
[19] X. Wang, C. Wang, X. Li, V. C. Leung, and T. Taleb, "Federated deep reinforcement learning for internet of things with decentralized cooperative edge caching," IEEE Internet Things J., vol. 7, no. 10, pp. 9441–9455, Apr. 2020.
[20] R. Luo, H. Tian, and W. Ni, "Communication-aware path design for indoor robots exploiting federated deep reinforcement learning," in Proc. PIMRC, Helsinki, Finland, Sep. 2021, pp. 1197–1202.
[21] T. T. Nguyen, N. D. Nguyen, and S. Nahavandi, "Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications," IEEE Trans. Cybern., vol. 50, no. 9, pp. 3826–3839, Sep. 2020.
[22] X. Liu, Y. Liu, Y. Chen, and H. V. Poor, "RIS enhanced massive non-orthogonal multiple access networks: Deployment and passive beamforming design," IEEE J. Sel. Areas Commun., vol. 39, no. 4, pp. 1057–1071, Apr. 2021.