Federated Deep Reinforcement Learning for RIS-Assisted Indoor Multi-Robot Communication Systems
Ruyu Luo, Wanli Ni, Graduate Student Member, IEEE, Hui Tian, Senior Member, IEEE,
and Julian Cheng, Senior Member, IEEE

Abstract—Indoor multi-robot communications face two key challenges: one is the severe signal strength degradation caused by blockages (e.g., walls), and the other is the dynamic environment caused by robot mobility. To address these issues, we consider a reconfigurable intelligent surface (RIS) to overcome the signal blockage and assist the trajectory design among multiple robots. Meanwhile, non-orthogonal multiple access (NOMA) is adopted to cope with the scarcity of spectrum and enhance the connectivity of robots. Considering the limited battery capacity of robots, we aim to maximize the energy efficiency by jointly optimizing the transmit power of the access point (AP), the phase shifts of the RIS, and the trajectories of the robots. A novel federated deep reinforcement learning (F-DRL) approach is developed to solve this challenging problem with one dynamic long-term objective. Since each robot plans its own path and downlink power, the AP only needs to determine the phase shifts of the RIS, which significantly reduces the computation overhead owing to the reduced training dimension. Simulation results reveal the following findings: i) the proposed F-DRL reduces convergence time by at least 86% compared to centralized DRL; ii) the designed algorithm adapts to an increasing number of robots; and iii) compared to traditional OMA-based benchmarks, NOMA-enhanced schemes achieve higher energy efficiency.

Index Terms—Federated deep reinforcement learning, indoor robot communication, reconfigurable intelligent surface.

Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

This work was supported by the National Natural Science Foundation of China under Grant 62071068. (Corresponding author: Hui Tian.)

R. Luo, W. Ni and H. Tian are with the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China (e-mail: [email protected]; [email protected]; [email protected]). J. Cheng is with the School of Engineering, The University of British Columbia, Kelowna, BC V1V 1V7, Canada (e-mail: [email protected]).

I. INTRODUCTION

Owing to their prominent features of flexible deployment and high efficiency, intelligent robots have gained widespread popularity and large-scale adoption in indoor environments, e.g., healthcare surveillance, package delivery, house cleaning and automated industrial production [1]. So far, it is still impractical to deploy all intelligent applications on mobile indoor robots with limited resources such as computing, storage, and batteries [2]. Besides, the indoor environment presents several challenges in designing energy-efficient trajectories for robots. On the one hand, line-of-sight (LoS) paths may be severely shielded by obstacles that are likely to have non-analytic shapes [3]. The resulting signal strength degradation can lead to undesirable effects such as sudden collisions, efficiency reduction and operational restrictions. To avoid these potential problems, a reconfigurable intelligent surface (RIS) can be deployed to create a smart propagation environment in an enclosed room [4], while reducing the hardware cost and system complexity compared with active relays [5]. On the other hand, due to the simultaneous motion of multiple robots, a traditional deterministic strategy can hardly maintain satisfactory performance in such a highly dynamic system [6]. Furthermore, non-orthogonal multiple access (NOMA) has been deemed a promising technique for enhancing robot connectivity and throughput under limited spectrum resources [7], [8]. Since user signals are superimposed at different power levels, it is of great significance to jointly optimize the power allocation for interference reduction in NOMA networks [9], while the incorporation of mobile robots and RIS leads to a challenging energy efficiency maximization problem.

Recently, artificial intelligence has played a critical role in realizing smart resource management and automatic network control in 6G networks [10]. To deal with uncertainty and dynamics, deep reinforcement learning (DRL) is acknowledged as a promising method with a high level of intelligence in wireless communications [11]. However, the ever-increasing network scale brings huge communication overhead and unbearable training delay to centralized methods. To speed up training and leverage computing capabilities at the network edge, an innovative paradigm is to implement DRL in a federated manner [12], which can protect user privacy and alleviate traffic transmission by exchanging only model parameters over wireless networks. However, the quality of federated training is affected by the channel conditions, since all training parameters are transmitted over wireless networks; thus the wireless network needs to be reliable over the limited spectrum and power resources [13]. Meanwhile, the distributed method may obtain a worse solution due to the loss of global information. Therefore, it is necessary to develop an intelligent method to maximize energy efficiency in dynamic RIS-assisted wireless systems.

In this paper, we focus on the energy efficiency problem of an RIS-assisted indoor system with multiple mobile robots. By jointly optimizing the transmit power at the AP, the phase shifts of the RIS, and the trajectories of the robots, a time-coupled resource allocation problem is formulated. Considering the trade-off between performance and scalability, a federated deep reinforcement learning (F-DRL) approach is proposed, which can accelerate convergence and is robust to the number of robots. To the best of the authors' knowledge, this is the first semi-distributed F-DRL algorithm that combines centralized RIS configuration with federated robotic communications.
The main contributions of this paper can be summarized as follows:
1) We incorporate an RIS into indoor robot communication systems to overcome signal blockage and avoid motion collisions. To maximize the energy efficiency of all robots, a non-convex problem is formulated for communication-aware trajectory design. Its time-coupled and discrete nature makes this problem challenging to solve directly.
2) We develop an F-DRL method to optimize the AP transmit power, RIS phase shifts, and robot trajectories in a semi-distributed manner. The reduction of the control dimension greatly accelerates convergence at the training stage. Benefiting from the decentralized implementation, F-DRL can easily adapt to changes in the number of robots.
3) We conduct numerical experiments to show the superiority of the proposed F-DRL. Compared to centralized DRL, our method takes about 86% less training time and is more robust to the dynamic multi-robot environment. Simulation results also show that the designed F-DRL outperforms benchmarks in terms of energy efficiency.

Fig. 1: RIS-assisted indoor multi-robot communications.

II. SYSTEM MODEL AND PROBLEM FORMULATION

A. System Model

As illustrated in Fig. 1, we consider an indoor multi-robot communication system aided by an RIS having M passive reflecting elements. Using downlink NOMA techniques, the AP serves K mobile robots¹, denoted by K = {1, 2, ..., K}. To complete given tasks, we require the k-th robot to move from a starting position q_{S,k} to a destination q_{D,k} within a given deadline T_max. We define q_k^t as the position of the k-th robot at the t-th time slot, where t ∈ T_k = {1, 2, ..., T_k} and T_k ≤ T_max is the total traveling time at speed v. For brevity, the time index t is omitted in some parameters. We assume that robots update their trajectories every time slot.

¹With the results obtained in this paper, the considered system can be easily extended to multi-antenna cases, which will be included in our future work.

The RIS is divided into N sub-surfaces, denoted by N = {1, 2, ..., N}. Let θ_n ∈ [0, 2π) denote the phase shift of the n-th sub-surface. Then, the RIS reflection matrix is denoted by Θ = diag(θ_{N×1} ⊗ 1_{(M/N)×1}) = diag(e^{jθ_1}, ..., e^{jθ_M}) ∈ C^{M×M} with M = M_R × M_R, where M_R is the number of elements in the vertical or horizontal direction. In view of the hardware implementation, we consider a practical RIS with a limited number N_R = 2^b of phase shifts [14], where θ_m ∈ R = {Δ_R/2, ..., (2N_R − 1)Δ_R/2}, ∀m, and Δ_R = 2π/N_R is the phase resolution [15].

Let h̄_k ∈ C^{1×1}, h_k ∈ C^{1×M} and g ∈ C^{M×1} denote the channel coefficients from the AP to the k-th robot, from the RIS to the k-th robot, and from the AP to the RIS, respectively. Then the combined channel coefficient experienced by the k-th robot is given by h_k = h_k Θ g + h̄_k. Thus, the received signal at the k-th robot is given by
$$y_k = h_k \sqrt{p_k}\, s_k + h_k \sum_{i \neq k} \sqrt{p_i}\, s_i + n_k, \quad \forall k, \qquad (1)$$
where s_k is the transmit symbol for the k-th robot, p_k > 0 is the downlink power allocated to the k-th robot, and n_k ~ CN(0, σ²) is the additive white Gaussian noise.
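For concreteness, the following NumPy sketch builds the quantized phase codebook R, the sub-surface-wise reflection matrix Θ = diag(θ ⊗ 1), and the combined channel h_k = h_k Θ g + h̄_k defined above. The randomly drawn channel vectors are placeholders for illustration only (the simulations in Section IV use the channel model of [20]), and the variable names simply mirror the notation of this section.

```python
import numpy as np

rng = np.random.default_rng(0)

M_R, N, b = 4, 2, 2            # 4x4 RIS, N sub-surfaces, b-bit phase control
M, N_R = M_R * M_R, 2 ** b     # M elements, N_R discrete phase levels
delta_R = 2 * np.pi / N_R      # phase resolution

# Discrete phase codebook R = {Δ_R/2, ..., (2N_R - 1)Δ_R/2}
codebook = (2 * np.arange(N_R) + 1) / 2 * delta_R

# One discrete phase per sub-surface, repeated over its M/N elements (θ ⊗ 1)
theta_sub = rng.choice(codebook, size=N)
theta = np.kron(theta_sub, np.ones(M // N))
Theta = np.diag(np.exp(1j * theta))          # Θ = diag(e^{jθ_1}, ..., e^{jθ_M})

# Illustrative channels (random placeholders, not the paper's channel model)
h_bar_k = (rng.normal() + 1j * rng.normal()) / np.sqrt(2)           # AP -> robot k
h_k = (rng.normal(size=M) + 1j * rng.normal(size=M)) / np.sqrt(2)   # RIS -> robot k
g = (rng.normal(size=M) + 1j * rng.normal(size=M)) / np.sqrt(2)     # AP -> RIS

# Combined channel h_k = h_k Θ g + h̄_k
h_comb = h_k @ Theta @ g + h_bar_k
print(abs(h_comb))
```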
To alleviate the interference among robots, we apply the successive interference cancellation (SIC) technique. Without loss of optimality, the channel coefficients of all robots are ranked by |h_K| ≤ ... ≤ |h_2| ≤ |h_1|. Then, to perform SIC successfully, the transmit power at the AP satisfies the following constraint:
$$\Delta_k = p_k |h_{k-1}|^2 - \sum_{i=1}^{k-1} p_i |h_{k-1}|^2 \ge \rho_{\min}, \quad \forall k \ge 2, \qquad (2)$$
where ρ_min > 0 is the required gap to distinguish the decoded signal. When the above power constraint is met, the achievable downlink data rate at the k-th robot can be obtained by
$$R_k = \log_2\!\left(1 + \frac{|h_k|^2 p_k}{|h_k|^2 \sum_{i=1}^{k-1} p_i + \sigma^2}\right), \quad \forall k. \qquad (3)$$
Since the energy consumed by motion is much larger than that consumed by communication, this paper mainly focuses on the motion energy cost. Therefore, the total motion energy consumed by the k-th robot is expressed as [2]
$$E_k = E_1 T_k v + E_2 T_k, \quad \forall k, \qquad (4)$$
where E_1 and E_2 are two constants related to the mechanical output power and the transforming loss, respectively [16]. Their values depend on the exact robot motion model.
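As a sanity check on the NOMA model, the sketch below orders the robots by channel strength, verifies the SIC power-gap constraint (2), and evaluates the rates (3) and the motion energy (4). The channel gains and powers in the example call are made-up illustrative numbers, while E1, E2 and v reuse the values listed later in Table I.

```python
import numpy as np

def noma_rates_and_energy(h, p, sigma2, rho_min, E1=7.4, E2=0.29, v=0.5, T=None):
    """Per-robot NOMA rates (3), SIC power-gap check (2) and motion energy (4).

    h: complex combined channels h_k, p: downlink powers p_k (same robot order),
    sigma2: noise power, T: traveling time slots per robot (defaults to ones).
    """
    h, p = np.asarray(h), np.asarray(p)
    order = np.argsort(-np.abs(h))          # index 0 = strongest channel (robot 1)
    h, p = h[order], p[order]
    K = len(h)

    # SIC feasibility: Δ_k = p_k|h_{k-1}|^2 - sum_{i<k} p_i|h_{k-1}|^2 >= ρ_min
    for k in range(1, K):
        gain_prev = np.abs(h[k - 1]) ** 2
        delta_k = p[k] * gain_prev - np.sum(p[:k]) * gain_prev
        assert delta_k >= rho_min, f"SIC power gap violated for robot {k + 1}"

    # Achievable rates (3): robot k is interfered only by the stronger robots i < k
    gains = np.abs(h) ** 2
    interference = gains * np.concatenate(([0.0], np.cumsum(p)[:-1]))
    rates = np.log2(1 + gains * p / (interference + sigma2))

    # Motion energy (4): E_k = E1*T_k*v + E2*T_k
    T = np.ones(K) if T is None else np.asarray(T)
    energy = E1 * T * v + E2 * T
    return rates, energy

# toy example: 3 robots, decreasing channel strength, powers satisfying (2)
r, e = noma_rates_and_energy(h=[1.0, 0.6, 0.3], p=[0.01, 0.04, 0.15],
                             sigma2=1e-3, rho_min=0.0)
print(r / e)   # per-robot energy efficiency terms R_k / E_k
```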
B. Problem Formulation

By optimizing the transmit power at the AP, the phase shifts of the RIS, and the trajectories of the robots, this paper aims to maximize the total energy efficiency of all robots during the mission. Subject to the constraints on transmit power, phase shifts and robot mobility, a long-term optimization problem is formulated as
$$\max_{\mathbf{\Theta}, \mathbf{Q}, \mathbf{p}} \ \frac{1}{T_k}\sum_{t=1}^{T_k}\sum_{k=1}^{K}\frac{R_k^t}{E_k} \qquad (5a)$$
$$\text{s.t.} \quad \mathbf{q}_k^1 = \mathbf{q}_{S,k}, \ \mathbf{q}_k^{T_k} = \mathbf{q}_{D,k}, \quad \forall k, \qquad (5b)$$
$$|h_K^t| \le \cdots \le |h_2^t| \le |h_1^t|, \quad \forall t, \qquad (5c)$$
$$\Delta_k^t \ge \rho_{\min}, \ p_k^t > 0, \quad \forall k, \forall t, \qquad (5d)$$
$$x_{\min} \le x_k^t \le x_{\max}, \quad \forall k, \forall t, \qquad (5e)$$
$$y_{\min} \le y_k^t \le y_{\max}, \quad \forall k, \forall t, \qquad (5f)$$
$$\sum_{k=1}^{K} p_k^t \le P_{\max}, \quad \forall t, \qquad (5g)$$
$$\theta_n^t \in \mathcal{R}, \quad \forall n, \forall t, \qquad (5h)$$
where Q = [q_1, q_2, ..., q_K]^T denotes the trajectory design of all robots and p = [p_1, p_2, ..., p_K]^T is the power allocation strategy at the AP. However, the formulated problem (5) is difficult to solve with existing optimization methods, and is also challenging to solve optimally, due to the following reasons. First, multiple optimization variables, {Θ, Q, p}, are closely coupled in the objective function (5a). Second, the achievable data rate R_k^t is not a continuous function, owing to the discrete phase shifts and the position-dependent channel coefficients. Third, the simultaneous motion of multiple robots also makes problem (5) hard to solve, even if only the subproblem of trajectory design is considered. To sum up, traditional one-shot optimization methods do not apply to this dynamic problem with a time-coupled objective. Thus, it is necessary to develop an intelligent method to address this challenging problem in an efficient manner.
III. PROPOSED F-DRL APPROACH

In this section, we develop an F-DRL approach that is capable of accelerating the training process and obtaining high performance in terms of energy efficiency. As shown in Fig. 2, the F-DRL approach is split into two stages: the global stage for RIS configuration and the local stage for joint robot trajectory and transmit power control.

Fig. 2: Proposed F-DRL approach for communication-aware trajectory design.

A. Global Decision Stage

At the global decision stage, the AP adjusts the RIS configuration with global state information. Specifically, we define the phase shift design problem as a Markov decision process (MDP), denoted by a transition tuple having three elements, ⟨S_G, A_G, R_G⟩, where S_G is the state space, A_G is the action space, and R_G is the reward.
• State space: Let s_G^t ∈ S_G, ∀t. Since the combined channel coefficients (h_k)_{k∈K} remain unknown before the RIS phase shifts are designed, the coefficients of the AP-robot links (h̄_k)_{k∈K} are considered as the channel features. Thus, the global state is defined as
$$s_G^t = \{\mathbf{q}_k^t, \bar{h}_k^t \mid \forall k \in \mathcal{K}\}, \quad \forall t, \qquad (6)$$
where the position q_k^t can be obtained by the simultaneous localization and mapping algorithm [17]. Meanwhile, the continuous 2D map is discretized into grids with side length Δ_S, while the sampled positions lie at the center of each grid cell and satisfy constraints (5e) and (5f).²
• Action space: Let a_G^t ∈ A_G, ∀t. Then, the action for RIS phase shift design is defined as
$$a_G^t = \{\theta_n^t \mid \forall n \in \mathcal{N}\}, \quad \forall t, \qquad (7)$$
where θ_n^t ∈ R is the discrete phase shift adopted by the n-th RIS sub-surface.
• Reward: With the aim of maximizing the sum rate, the reward is defined as
$$r_G^t = \tau_1 \sum_{k=1}^{K} R_k^t, \quad \forall t, \qquad (8)$$
where τ_1 is a constant. We let r_G^t < 0 to avoid robots wandering. Additionally, it is inappropriate to put the sum of the combined channel coefficients (h_k)_{k∈K} into the reward function, because it is necessary for NOMA to maintain the distinctness among different signals.

²Using the default track curve [18], the discrete sampling points can be reconstructed into continuous curves.
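The discrete RIS action space described above has size (N_R)^N, so a single DQN output index must be mapped back to the N sub-surface phases. The decoding below is one straightforward way to do this and is our own illustration; only the codebook R and the sum-rate reward (8) come from the paper.

```python
import numpy as np

def decode_ris_action(action_index, N, N_R, delta_R):
    """Map a DQN output index in {0, ..., N_R**N - 1} to the phase shifts of
    the N sub-surfaces (action (7)). The indexing scheme is an assumption;
    the paper only states that the global action space has size (N_R)^N."""
    phases = []
    for _ in range(N):
        level = action_index % N_R
        action_index //= N_R
        phases.append((2 * level + 1) / 2 * delta_R)   # value from the codebook R
    return np.array(phases)

def global_reward(rates, tau1=0.1):
    """Sum-rate reward (8): r_G^t = τ1 * Σ_k R_k^t (τ1 = 0.1 as in Table I)."""
    return tau1 * np.sum(rates)

# example: N = 2 sub-surfaces, N_R = 4 phase levels -> 16 joint actions
theta = decode_ris_action(action_index=11, N=2, N_R=4, delta_R=2 * np.pi / 4)
print(theta, global_reward(rates=[1.2, 0.8, 0.5]))
```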
B. Local Decision Stage

At the local decision stage, each robot determines its trajectory and downlink transmit power with local state information. Because the control dimension of centralized DRL multiplies with the number of robots, we propose to train robots locally and then aggregate a global model in a federated manner. The MDP with the local transition tuple ⟨S_{L,k}, A_{L,k}, R_{L,k}⟩ maintained by the k-th robot is defined as follows.
• State space: Let s_{L,k}^t ∈ S_{L,k}, ∀t. Then, the local state is defined as
$$s_{L,k}^t = \{\mathbf{q}_k^t, \bar{h}_{A,k}^t\}, \quad \forall k, \forall t, \qquad (9)$$
where the local state s_{L,k}^t is a part of the global state s_G^t.
• Action space: Let a_{L,k}^t ∈ A_{L,k}, ∀k, ∀t. Then, the local action for trajectory design and transmit power control is defined as
$$a_{L,k}^t = \{o_k^t, p_k^t\}, \quad \forall k, \forall t, \qquad (10)$$
where the orientation of the k-th robot, o_k ∈ {n, s, e, w}, indicates that robots move in one of four directions, i.e., north, south, east or west. To satisfy constraints (5c), (5d) and (5g), the first robot must guarantee p_1 < P_max/2^{K−1}. Inspired by discrete power control, we have p_k ∈ {P_max/2, ..., P_max/2^{N_P}} and N_P ≥ K.
• Reward: To maximize the energy efficiency, we define the local reward r_{L,k}^t as
$$r_{L,k}^t = \phi R_k^t + \psi R_{D,k}^t + R_{\mathrm{time}} + R_{\mathrm{goal}}, \quad \forall k, \forall t, \qquad (11)$$
where the guidance reward R_{D,k}^t = d_{D,k}^{t−1} − d_{D,k}^t for t ≥ 2, and d_{D,k}^t is the distance between the k-th robot and its destination at the t-th time slot. The guidance reward R_{D,k}^t leads the k-th robot toward its destination. Moreover, the time cost R_time is a constant and R_time < 0. If the k-th robot arrives at its destination, it gains a positive reward R_goal; otherwise R_goal = 0. In this paper, the parameter φ must guarantee R_time + φR_k^t < 0 in most cases to prevent robots from wandering.
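A minimal sketch of the shaped local reward (11) is given below. The weights phi and psi are assumed example values (the paper fixes only R_time = −1 and R_goal ∈ {0, 100} in Table I); phi is chosen small enough that R_time + φR_k^t < 0 in typical states, as required above.

```python
def local_reward(rate_k, dist_prev, dist_now, at_goal,
                 phi=0.05, psi=1.0, R_time=-1.0, R_goal=100.0):
    """Shaped local reward (11) for robot k; phi and psi are placeholder weights."""
    guidance = dist_prev - dist_now                 # R_{D,k}^t = d^{t-1} - d^t
    goal_bonus = R_goal if at_goal else 0.0
    return phi * rate_k + psi * guidance + R_time + goal_bonus

# one step toward the destination with a 2 bit/s/Hz rate, not yet at the goal
print(local_reward(rate_k=2.0, dist_prev=3.5, dist_now=3.0, at_goal=False))
```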
Algorithm 1 Proposed F-DRL Approach
1: Initialize the environment E and the DQN agents.
2: for episode e = 1 : N_e do
3:   for time step t = 1 : T_e do
4:     Select RIS phase shifts a_G^t with Q_G(s_G^t, a_G^t; w_G^t).
5:     Each robot k ∈ K selects a_{L,k}^t with Q_{L,k}(s_{L,k}^t, a_{L,k}^t; w_{L,k}^t).
6:     for DQN agent i ∈ K ∪ {k_G} do
7:       Execute the action, get the reward and reach the next state.
8:       Store the transition in the replay memory D_i.
9:       Sample a random mini-batch D_{i,0} from D_i.
10:      Perform the gradient descent step to update w_i^t.
11:      Reset ŵ_i^t = w_i^t every N_Q time steps.
12:    end for
13:    for each robot k ∈ K do
14:      Upload w_{L,k}^t, ŵ_{L,k}^t to the AP every N_F time steps.
15:      if new global weights w_L^t, ŵ_L^t are received then
16:        Download the weights: w_{L,k}^t = w_L^t, ŵ_{L,k}^t = ŵ_L^t.
17:      end if
18:    end for
19:    The AP aggregates the global weights every N_F time steps.
20:  end for
21: end for
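For orientation, the following is a minimal structural skeleton of Algorithm 1 in Python. The DQN agents, replay memories and the environment are replaced by random-policy stubs (StubAgent is our own placeholder, not the paper's implementation), so only the semi-distributed control flow is shown: one global RIS agent at the AP, K local robot agents, and a federated averaging step every N_F time slots.

```python
import numpy as np

rng = np.random.default_rng(1)

class StubAgent:
    """Stand-in for a DQN agent: a random policy plus a NumPy weight vector.
    Only the control flow of Algorithm 1 is illustrated; the actual Q-networks,
    replay memories and the environment are abstracted away."""
    def __init__(self, n_actions, dim=8):
        self.n_actions, self.w = n_actions, rng.normal(size=dim)
    def act(self, state):
        return rng.integers(self.n_actions)
    def train_step(self):
        self.w -= 0.001 * rng.normal(size=self.w.shape)  # placeholder update

K, N, N_R, N_P, N_F = 3, 1, 4, 6, 25
ap_agent = StubAgent(n_actions=N_R ** N)                          # global RIS agent
robot_agents = [StubAgent(n_actions=4 * N_P) for _ in range(K)]   # local agents

for episode in range(2):
    for t in range(1, 51):
        a_G = ap_agent.act(state=None)                             # RIS phase shifts
        a_L = [agent.act(state=None) for agent in robot_agents]    # (o_k, p_k)
        # ... environment step (rewards r_G, r_{L,k} and next states) elided ...
        for agent in [ap_agent] + robot_agents:
            agent.train_step()                                     # local gradient step
        if t % N_F == 0:                                           # federated aggregation (12)
            w_L = np.mean([agent.w for agent in robot_agents], axis=0)
            for agent in robot_agents:
                agent.w = w_L.copy()
print(ap_agent.w[:3], robot_agents[0].w[:3])
```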

C. Global Aggregation

Take training a deep Q-network (DQN) as an example. All agents collaboratively build a shared DQN, where the replay memory D and the ε-greedy policy are considered. For each DQN agent i ∈ K ∪ {k_G}, the online Q-network and the target Q-network are defined as Q(s_i^t, a_i^t; w_i^t) and Q(s_i^t, a_i^t; ŵ_i^t), respectively. To update the online Q-network, each agent performs a gradient descent step with learning rate α > 0 on the loss function. Meanwhile, the target Q-network is reset as ŵ_i^t = w_i^t every N_Q time steps.

Besides, the k-th robot trains its networks locally and uploads the relevant weights w_{L,k}, ŵ_{L,k} every N_F time steps during the local decision stage. At each aggregation step, all robots upload their local weights to the AP at the t-th time slot, and the AP aggregates the global weights w_L^t and ŵ_L^t as
$$w_L^t = \frac{1}{K}\sum_{k=1}^{K} w_{L,k}^t, \quad \hat{w}_L^t = \frac{1}{K}\sum_{k=1}^{K} \hat{w}_{L,k}^t, \quad \forall t. \qquad (12)$$
Then, the updated global weights are sent back to the local robots at the next time step, until convergence.
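The aggregation rule (12) is plain element-wise averaging of the robots' network weights. Below is a minimal sketch, using dictionaries of NumPy arrays as a stand-in for actual framework state dictionaries.

```python
import numpy as np

def federated_average(local_weights):
    """Element-wise average (12) of the robots' local network weights.

    local_weights: list of dicts mapping layer names to NumPy arrays,
    one dict per robot (a simplified stand-in for a framework state_dict).
    """
    K = len(local_weights)
    layer_names = local_weights[0].keys()
    return {name: sum(w[name] for w in local_weights) / K for name in layer_names}

# toy example with two robots and two layers
w1 = {"fc1": np.ones((2, 2)), "fc2": np.zeros(2)}
w2 = {"fc1": 3 * np.ones((2, 2)), "fc2": np.ones(2)}
print(federated_average([w1, w2])["fc1"])   # -> all entries equal to 2.0
```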
Compared to traditional optimization algorithms, the proposed intelligent approach can adapt to the uncertainty and dynamics of indoor systems. Moreover, owing to the semi-distributed training and decentralized execution, the proposed F-DRL approach can significantly reduce the communication overhead and effectively alleviate privacy leakage.

1) Overall Training Methodology: As shown in Fig. 2, the proposed F-DRL approach has four steps. (1) State observation: agents observe the environmental states. (2) RIS action execution: the AP controls the RIS phase shifts a_G^t according to Q_G(s_G, a_G; w_G) obtained at the global decision stage, and determines the NOMA decoding order. (3) Robot action execution: the k-th robot decides its action a_{L,k}^t, consisting of the orientation and downlink transmit power, based on Q_{L,k}(s_{L,k}, a_{L,k}; w_{L,k}). (4) Experience storage: agents obtain rewards and store transitions. Algorithm 1 shows the detailed training procedure of the proposed F-DRL approach. On account of the interaction between the local agents and the global agent, the proposed F-DRL approach operates in a semi-distributed manner.

2) Complexity Analysis: By reducing the control dimension, the complexity of F-DRL is lower than that of centralized learning. More precisely, the complexity of the DQN using a one-dimensional replay memory is O(1). The computational complexity of each agent mainly depends on the transition and back-propagation, which can be calculated as O(|D| + abE|D_0|), where a, b and E denote the number of layers, the number of transitions in each layer, and the number of episodes, respectively. Moreover, the action space sizes of F-DRL at the global and local decision stages are (N_R)^N and (4N_P)^K, respectively, whereas that of centralized DRL is (4N_P)^K × (N_R)^N. Therefore, the proposed F-DRL has a lower complexity compared to centralized DRL. The theoretical convergence analysis of F-DRL has been completed in [19]; a detailed proof is omitted here for brevity. In the following, we conduct experiments to show the convergence behavior of F-DRL.

IV. NUMERICAL RESULTS

In this section, we verify the efficiency and robustness of the proposed F-DRL approach for the considered communication system. In the simulation, the robots are randomly located, while the AP and the RIS are located at (15, 30, 2) and (30, 7.5, 2), respectively. The maximum transmit power of the AP is P_max = 20 dBm and the noise power spectral density is N_0 = −100 dBm/Hz. The channel model follows the settings in [20]. Other parameters are given in Table I.

TABLE I: Parameter Settings
  Parameter   Value      Parameter   Value
  Δ_S         0.5 m      α           0.0001
  |D_0|       128        N_F         25
  R_time      −1         R_goal      {0, 100}
  τ_1         0.1        v           0.5 m/s
  E_1         7.4        E_2         0.29

For comparison, we consider the following baselines:
• Baseline 1 (Centralized DRL [21]): All decisions of problem (5) are output by a centralized DQN. The global state is defined as s_DQN^t = s_G^t, the global action is a_DQN^t = {Θ^t, (o_k^t, p_k^t) | ∀k ∈ K}, and the reward is set as r_DQN^t = Σ_{k=1}^{K} r_{L,k}^t + r_G^t/10 to prevent robots from wandering.
• Baseline 2 (OMA-RIS-based scheme [3]): In this scheme, orthogonal multiple access (OMA) is considered between the robots and the AP. The entire bandwidth is equally divided among the robots, and R_k = (1/K) log_2(1 + |h_k|^2 p_k / (σ^2/K)) is the downlink data rate of the k-th robot.
• Baseline 3 (QoE-based energy efficiency policy [22]): Using the quality of experience (QoE) metric to evaluate the performance of each robot, we have η_k = C_1 lg(R_k) + C_2, where C_1 and C_2 are constants. Meanwhile, we replace R_k with η_k in the reward returned to each agent (a compact sketch of both baseline metrics follows this list).
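For reference, the per-robot metrics used by Baselines 2 and 3 can be written compactly as follows; C1 and C2 are unspecified constants in the paper, so the values below are placeholders.

```python
import numpy as np

def oma_rate(h_k, p_k, sigma2, K):
    """Baseline 2: per-robot OMA rate with a 1/K bandwidth share,
    R_k = (1/K) * log2(1 + |h_k|^2 * p_k / (sigma^2 / K))."""
    return (1.0 / K) * np.log2(1 + (abs(h_k) ** 2) * p_k / (sigma2 / K))

def qoe_metric(R_k, C1=1.0, C2=0.0):
    """Baseline 3: QoE score η_k = C1 * lg(R_k) + C2.
    C1 and C2 are left unspecified in the paper; 1 and 0 are placeholders."""
    return C1 * np.log10(R_k) + C2

print(oma_rate(h_k=0.6, p_k=0.05, sigma2=1e-3, K=3), qoe_metric(R_k=2.0))
```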
Fig. 3: Convergence comparison versus episodes.

In Fig. 3, the convergence performance of the proposed F-DRL is shown, where the total rewards versus training episodes under different schemes are compared. We consider the system with M_R = 30, N_R = 4, N_P = 6, N = 1, but different K. When K = 2, we find that the proposed F-DRL takes at least 86% less training time than Baseline 1. More significantly, the performance gain of the proposed F-DRL grows with K, while Baseline 1 cannot work when K > 3. This is because the size of the global action space a_DQN^t increases exponentially with K. In contrast, F-DRL is robust to changes in the number of robots. On the whole, compared with Baseline 1, one can observe that our proposed F-DRL converges faster and obtains higher rewards with smaller fluctuations during the training process.
Fig. 4: Trajectory of robots under different values of M_R, where the red, blue and yellow points denote the robot trajectories under the QoS-based energy efficiency (EE) policy, and the black markers denote the trajectories under the QoE-based EE policy. (a) M_R = 0. (b) M_R = 20. (c) M_R = 30. (d) M_R = 40.

Fig. 4 demonstrates the trajectories of the robots under different values of M_R, where the performance of the QoS-based energy efficiency (EE) policy and the QoE-based EE policy is compared. The parameters are set as N_R = 4, N_P = 6, N = 1 and K = 3. The background in Fig. 4 reflects the communication quality of the downlink channels. As expected, we find that the RIS enhances the channel conditions, especially alleviating the severe signal strength degradation caused by the walls. The QoS-based EE policy maintains better channel conditions than Baseline 3, especially when M_R > 20. This is because Baseline 3 cares more about the poor channel coefficients, while the QoS-based EE policy cares more about the sum of the channel conditions. Moreover, the results show that the QoS-based EE policy can achieve higher energy efficiency in the considered system, while the robot with the worst channel condition always maintains a required data rate in NOMA-based systems under Baseline 3, because the logarithmic function is more sensitive to small data rate changes.

Fig. 5: Energy efficiency versus power budget P_max.

In Fig. 5, the energy efficiency under different environmental parameters is illustrated. When N_R = 4 and N_P = 6, the energy efficiency is evaluated versus P_max by changing the number of robots K, the multiple access technology z ∈ {NOMA, OMA}, and the number of RIS elements M_R. We find that the RIS helps to obtain higher energy efficiency. This is mainly because the RIS can overcome signal blockage by adjusting the radio environment. Meanwhile, when K = 3, the energy efficiency significantly increases with M_R, with a smaller improvement for 20 ≤ M_R ≤ 30. Nevertheless, the energy efficiency increases over 0 ≤ M_R ≤ 30 when K = 4. This phenomenon reveals that there exist suitable values of the transmit power budget P_max and the number of RIS elements M_R that satisfy the communication demands at lower cost. Moreover, the NOMA-RIS-based system gains higher energy efficiency than the OMA-RIS-based benchmarks, because NOMA signals are superimposed on the same time-frequency resources and thus obtain enhanced bandwidth efficiency. In addition, fewer robots and a smaller P_max lead to lower energy efficiency.

V. CONCLUSION

We studied a long-term energy efficiency maximization problem for RIS-assisted indoor multi-robot systems. By training agents in a semi-distributed manner, we developed a novel methodology for the communication-aware design problem, controlling the trajectories and downlink transmit power at the local robots and designing the RIS phase shifts at the AP. Owing to the decentralized nature of the proposed F-DRL, the dynamics of such a multi-robot system can be well handled. Numerical simulations demonstrated that our designed F-DRL converges faster than the centralized method and adapts to changes in the number of robots, while maintaining high performance in the NOMA-RIS design.

REFERENCES

[1] M. Afrin, J. Jin, A. Rahman, A. Rahman, J. Wan, and E. Hossain, "Resource allocation and service provisioning in multi-agent cloud robotics: A comprehensive survey," IEEE Commun. Surveys Tuts., vol. 23, no. 2, pp. 842–870, 2nd Quart. 2021.
[2] Y. Yan and Y. Mostofi, "To go or not to go: On energy-aware and communication-aware robotic operation," IEEE Trans. Control Netw. Syst., vol. 1, no. 3, pp. 218–231, Sep. 2014.
[3] X. Mu, Y. Liu, L. Guo, J. Lin, and R. Schober, "Intelligent reflecting surface enhanced indoor robot path planning: A radio map-based approach," IEEE Trans. Wireless Commun., vol. 20, no. 7, pp. 4732–4747, Jul. 2021.
[4] B. Di, H. Zhang, L. Song, Y. Li, Z. Han, and H. V. Poor, "Hybrid beamforming for reconfigurable intelligent surface based multi-user communications: Achievable rates with limited discrete phase shifts," IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1809–1822, Aug. 2020.
[5] H. Yang, Z. Xiong, J. Zhao, D. Niyato, Q. Wu, H. V. Poor, and M. Tornatore, "Intelligent reflecting surface assisted anti-jamming communications: A fast reinforcement learning approach," IEEE Trans. Wireless Commun., vol. 20, no. 3, pp. 1963–1974, Mar. 2021.
[6] J. Park, S. Samarakoon, A. Elgabli, J. Kim, M. Bennis, S.-L. Kim, and M. Debbah, "Communication-efficient and distributed learning over wireless networks: Principles and applications," Proc. IEEE, vol. 109, no. 5, pp. 796–819, Feb. 2021.
[7] W. Ni, Y. Liu, Z. Yang, H. Tian, and X. Shen, "Federated learning in multi-RIS-aided systems," IEEE Internet Things J., vol. 9, no. 12, pp. 9608–9624, Jun. 2022.
[8] W. Ni, Y. Liu, Y. C. Eldar, Z. Yang, and H. Tian, "STAR-RIS integrated non-orthogonal multiple access and over-the-air federated learning: Framework, analysis, and optimization," IEEE Internet Things J., Jul. 2022, early access, doi: 10.1109/JIOT.2022.3188544.
[9] W. Ni, X. Liu, Y. Liu, H. Tian, and Y. Chen, "Resource allocation for multi-cell IRS-aided NOMA networks," IEEE Trans. Wireless Commun., vol. 20, no. 7, pp. 4253–4268, Jul. 2021.
[10] H. Yang, A. Alphones, Z. Xiong, D. Niyato, J. Zhao, and K. Wu, "Artificial-intelligence-enabled intelligent 6G networks," IEEE Netw., vol. 34, no. 6, pp. 272–280, Nov. 2020.
[11] H. Yang, Z. Xiong, J. Zhao, D. Niyato, L. Xiao, and Q. Wu, "Deep reinforcement learning-based intelligent reflecting surface for secure wireless communications," IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 375–388, Jan. 2020.
[12] Y. Nie, J. Zhao, F. Gao, and F. R. Yu, "Semi-distributed resource management in UAV-aided MEC systems: A multi-agent federated reinforcement learning approach," IEEE Trans. Veh. Technol., vol. 70, no. 12, pp. 13162–13173, Dec. 2021.
[13] M. Chen, Z. Yang, W. Saad, C. Yin, H. V. Poor, and S. Cui, "A joint learning and communications framework for federated learning over wireless networks," IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 269–283, Jan. 2021.
[14] H. Zhang, B. Di, L. Song, and Z. Han, "Reconfigurable intelligent surfaces assisted communications with limited phase shifts: How many phase shifts are enough?" IEEE Trans. Veh. Technol., vol. 69, no. 4, pp. 4498–4502, Apr. 2020.
[15] W. Ni, Y. Liu, Z. Yang, H. Tian, and X. Shen, "Integrating over-the-air federated learning and non-orthogonal multiple access: What role can RIS play?" IEEE Trans. Wireless Commun., Jun. 2022, early access, doi: 10.1109/TWC.2022.3181214.
[16] Y. Mei, Y.-H. Lu, Y. C. Hu, and C. G. Lee, "Deployment of mobile robots with energy and timing constraints," IEEE Trans. Robot., vol. 22, no. 3, pp. 507–522, Jun. 2006.
[17] X. Gao, Y. Liu, and X. Mu, "SLARM: Simultaneous localization and radio mapping for communication-aware connected robot," in Proc. ICC Workshops, Virtual, Jun. 2021, pp. 1–6.
[18] D. Rau, J. Rodina, and F. Štec, "Generating instant trajectory of an indoor UAV with respect to its dynamics," in Proc. ISMCR, Budapest, Hungary, Oct. 2020, pp. 1–5.
[19] X. Wang, C. Wang, X. Li, V. C. Leung, and T. Taleb, "Federated deep reinforcement learning for Internet of Things with decentralized cooperative edge caching," IEEE Internet Things J., vol. 7, no. 10, pp. 9441–9455, Apr. 2020.
[20] R. Luo, H. Tian, and W. Ni, "Communication-aware path design for indoor robots exploiting federated deep reinforcement learning," in Proc. PIMRC, Helsinki, Finland, Sep. 2021, pp. 1197–1202.
[21] T. T. Nguyen, N. D. Nguyen, and S. Nahavandi, "Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications," IEEE Trans. Cybern., vol. 50, no. 9, pp. 3826–3839, Sep. 2020.
[22] X. Liu, Y. Liu, Y. Chen, and H. V. Poor, "RIS enhanced massive non-orthogonal multiple access networks: Deployment and passive beamforming design," IEEE J. Sel. Areas Commun., vol. 39, no. 4, pp. 1057–1071, Apr. 2021.