
Federated Online Prediction from Experts with Differential Privacy: Separations and Regret Speed-ups

Fengyu Gao, Ruiquan Huang, Jing Yang
School of EECS, The Pennsylvania State University, USA
{fengyugao, rzh5514, yangjing}@psu.edu

Abstract

We study the problems of differentially private federated online prediction from
experts against both stochastic adversaries and oblivious adversaries. We aim to
minimize the average regret on m clients working in parallel over time horizon T
with explicit differential privacy (DP) guarantees. With stochastic adversaries, we
propose a Fed-DP-OPE-Stoch algorithm that achieves a √m-fold speed-up of the
per-client regret compared to the single-player counterparts under both pure DP and
approximate DP constraints, while maintaining logarithmic communication costs.
With oblivious adversaries, we establish non-trivial lower bounds indicating that
collaboration among clients does not lead to regret speed-up with general oblivious
adversaries. We then consider a special case of the oblivious adversaries setting,
where there exists a low-loss expert. We design a new algorithm Fed-SVT and show
that it achieves an m-fold regret speed-up under both pure DP and approximate
DP constraints over the single-player counterparts. Our lower bound indicates that
Fed-SVT is nearly optimal up to logarithmic factors. Experiments demonstrate the
effectiveness of our proposed algorithms. To the best of our knowledge, this is the
first work examining differentially private online prediction from experts in the
federated setting.

1 Introduction

Federated Learning (FL) (McMahan et al., 2017) is a distributed machine learning framework in which
numerous clients collaboratively train a model by exchanging model updates through a server. Owing
to its advantages in protecting the privacy of local data and reducing communication overheads,
FL is gaining increased attention in the research community, particularly in the online learning
framework (Mitra et al., 2021; Park et al., 2022; Kwon et al., 2023; Gauthier et al., 2023). Notable
advancements include various algorithms in federated multi-armed bandits (Shi et al., 2021; Huang
et al., 2021; Li and Wang, 2022; Yi and Vojnovic, 2022, 2023), federated online convex optimization
(Patel et al., 2023; Kwon et al., 2023; Gauthier et al., 2023), etc.
Meanwhile, differential privacy (DP) has been integrated into online learning, pioneered by Dwork
et al. (2010). Recently, Asi et al. (2022b) studied different types of adversaries, developing some of
the best existing algorithms and establishing lower bounds. Within the federated framework, although
differentially private algorithms have been proposed for stochastic bandits (Li et al., 2020b; Zhu et al.,
2021; Dubey and Pentland, 2022) and linear contextual bandits (Dubey and Pentland, 2020; Li et al.,
2022a; Zhou and Chowdhury, 2023; Huang et al., 2023), to the best of our knowledge, federated
online learning in the adversarial setting with DP considerations remains largely unexplored.

In this work, we focus on federated online prediction from experts (OPE) with rigorous differential
privacy (DP) guarantees.¹ OPE (Arora et al., 2012) is a classical online learning problem in which
a player chooses one out of a set of experts at each time slot and an adversary chooses a
loss function. The player incurs a loss based on its choice and observes the loss function. With all
previous observations, the player needs to decide which expert to select each time to minimize the
cumulative expected loss. We consider two types of adversaries in the context of OPE. The first
type, the stochastic adversary, chooses a distribution over loss functions and samples a loss function
independently and identically distributed (IID) from this distribution at each time step. The second
type, the oblivious adversary, chooses a sequence of loss functions in advance.²
When extending the OPE problem to the federated setting, we assume that the system consists of a
central server and m local clients, where each client chooses from d experts to face an adversary at
each time step over time horizon T . The server coordinates the behavior of the clients by aggregating
the clients’ updates to form a global update (predicting a new global expert), while the clients use
the global expert prediction to update their local expert selections and compute local updates. The
local updates are sent to the server periodically. In the Federated OPE framework, clients face
either stochastic adversaries, receiving loss functions from the same distribution in an IID fashion,
or oblivious adversaries, which arbitrarily select loss functions for each client at each time step
beforehand (Yi and Vojnovic, 2023). Specifically, we aim to answer the following question:

Can we design differentially private federated OPE algorithms to achieve regret speed-up against
both stochastic and oblivious adversaries?

In this paper, we give definitive answers to the question. Our contributions are summarized as follows.
• Speed-up for stochastic adversaries. We develop a communication-efficient algorithm Fed-DP-
OPE-Stoch for stochastic adversaries with DP guarantees. The algorithm features the following
elements in its design: 1) Local loss function gradient estimation for global expert determination.
To reduce communication cost, we propose to estimate the gradient of each client’s previous loss
functions locally, and only communicate these estimates to the server instead of all previous loss
functions. 2) Local privatization process. Motivated by the need for private communication in FL,
we add noise to client communications (local gradient estimates) adhering to the DP principles,
thereby building differentially private algorithms.

We show that Fed-DP-OPE-Stoch achieves a 1/√m-fold reduction of the per-client regret compared
to the single-player counterparts (Asi et al., 2022b) under both pure DP and approximate DP
constraints, while maintaining logarithmic communication costs.
• Lower bounds for oblivious adversaries. We establish new lower bounds for federated OPE
with oblivious adversaries. Our findings reveal a critical insight: collaboration among clients does
not lead to regret speed-up in this context. Moreover, these lower bounds highlight a separation
between oblivious adversaries and stochastic adversaries, as the latter is necessary to reap the
benefits of collaboration in this framework.
Formulating an instance of oblivious adversaries for the corresponding lower bounds is a non-trivial
challenge, because it requires envisioning a scenario where the collaborative nature of FL does not
lead to the expected improvements in regret minimization. To deal with this challenge, we propose
a policy reduction approach in FL. By defining an “average policy” among all clients against a
common loss function (identical across clients) generated by an oblivious adversary, we reduce the federated problem to a
single-player one, highlighting the equivalence of per-client and single-player regret. Our lower
bounds represent the first of their kind for differentially private federated OPE problems.
• Speed-up for oblivious adversaries under realizability assumption. We design a new algorithm
Fed-SVT that obtains near-optimal regret when there is a low-loss expert. We show that Fed-SVT
achieves an m-fold speed-up in per-client regret when compared to single-player models (Asi et al.,
2023). This shows a clear separation from the general setting where collaboration among clients
does not yield benefits in regret reduction when facing oblivious adversaries. Furthermore, we
establish a new lower bound in the realizable case. This underlines that our upper bound is nearly
optimal up to logarithmic factors.
¹We provide some applications of differentially private federated OPE in Appendix B.
²We do not consider adaptive adversaries in this work because they can force DP algorithms to incur a linear
regret, specifically when ε ≤ 1/10 for ε-DP (Theorem 10 in Asi et al. (2022b)).

Table 1: Comparisons for Online Prediction from Experts under DP Constraints

| Adversaries | Algorithm (Reference) | Model | DP | Regret | Communication cost |
|---|---|---|---|---|---|
| Stochastic | Limited Updates (Asi et al., 2022b) | SING | ε-DP, (ε, δ)-DP | O( √(T log d) + log d log T / ε ) | - |
| Stochastic | Fed-DP-OPE-Stoch (Corollary 1) | FED | ε-DP, (ε, δ)-DP | O( √(T log d / m) + log d log T / (√m ε) ) | O(md log T) |
| Oblivious (Realizable) | Sparse-Vector (Asi et al., 2023) | SING | ε-DP | O( (log T log d + log² d) / ε ) | - |
| Oblivious (Realizable) | Fed-SVT (Theorem 4) | FED | ε-DP | O( (log T log d + log² d) / (mε) + N log d ) | O(mdT / N) |
| Oblivious (Realizable) | Sparse-Vector (Asi et al., 2023) | SING | (ε, δ)-DP | O( (log T log d + log^{3/2} d) / ε ) | - |
| Oblivious (Realizable) | Fed-SVT (Theorem 4) | FED | (ε, δ)-DP | O( (log T log d + log^{3/2} d) / (mε) + N log d ) | O(mdT / N) |
| Oblivious (Realizable) | Lower bound (Theorem 3) | FED | ε-DP, (ε, δ)-DP | Ω( log d / (mε) ) | - |
| Oblivious | Lower bound (Theorem 2) | FED | ε-DP, (ε, δ)-DP | Ω( min{ log d / ε, T } ) | - |

m: number of clients; T: time horizon; d: number of experts; ε, δ: DP parameters; SING and FED stand for single-client and federated
settings, respectively.

A summary of our main results and how they compare with the state of the art is shown in Table 1. A
comprehensive literature review is provided in Appendix A.

2 Preliminaries
Federated online prediction from experts. Federated OPE consists of a central server, m clients
and an interactive T -round game between an adversary and an algorithm. At time step t, each
client i ∈ [m] first selects an expert xi,t ∈ [d], and then, the adversary releases a loss function
li,t . Stochastic adversaries choose a distribution over loss functions and sample a sequence of loss
functions l1,1 , . . . , lm,T in an IID fashion from this distribution, while oblivious adversaries choose a
sequence of loss functions l1,1 , . . . , lm,T at the beginning of the game.
For federated OPE, the utility of primary interest is the expected cumulative regret among all clients,
defined as:

Reg(T, m) = (1/m) E[ Σ_{i=1}^{m} Σ_{t=1}^{T} l_{i,t}(x_{i,t}) − min_{x⋆∈[d]} Σ_{i=1}^{m} Σ_{t=1}^{T} l_{i,t}(x⋆) ].
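To make this concrete, here is a minimal Python sketch of computing this empirical per-client regret from realized losses; the array layout and function name are our own illustration, not part of the paper:

```python
import numpy as np

def per_client_regret(losses: np.ndarray, selections: np.ndarray) -> float:
    """Empirical per-client regret.

    losses:     shape (m, T, d); losses[i, t, k] is client i's loss for
                expert k at time t.
    selections: integer array of shape (m, T); selections[i, t] is the
                expert chosen by client i at time t.
    """
    m, T, _ = losses.shape
    # Total loss actually incurred by all clients over the horizon.
    incurred = losses[np.arange(m)[:, None], np.arange(T)[None, :], selections].sum()
    # Loss of the best fixed expert in hindsight, summed over clients and time.
    best_fixed = losses.sum(axis=(0, 1)).min()
    return (incurred - best_fixed) / m
```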

Differential privacy. We define differential privacy in the online setting following Dwork et al.
(2010). If an adversary chooses a loss sequence S = (l_{1,1}, ..., l_{m,T}), we denote by A(S) =
(x_{1,1}, ..., x_{m,T}) the output of the interaction between the federated online algorithm A and the
adversary. We say S = (l_{1,1}, ..., l_{m,T}) and S′ = (l′_{1,1}, ..., l′_{m,T}) are neighboring datasets if S and
S′ differ in one element.³
Definition 1 ((ε, δ)-DP). A randomized federated algorithm A is (ε, δ)-differentially private against
an adversary if, for all neighboring datasets S and S ′ and for all events O in the output space of A,
we have
P[A(S) ∈ O] ≤ e^ε P[A(S′) ∈ O] + δ.
Communication model. Our setup involves a central server that facilitates periodic, zero-latency
communication with all clients. Specifically, the clients can send “local model updates” to the central
server, which then aggregates and broadcasts the updated “global model” to the clients. We assume
full synchronization between clients and the server (McMahan et al., 2017). Following Wang et al.
(2019), we define the communication cost of an algorithm as the number of scalars (integers or real
numbers) communicated between the server and clients.

3 Federated OPE with Stochastic Adversaries


In this section, we aim to design an algorithm for Fed-DP-OPE with stochastic adversaries that
achieves regret speed-up compared to the single-player setting under DP constraints with low
communication cost. We consider the loss functions l_{1,1}(·), ..., l_{m,T}(·) to be convex, α-Lipschitz
and β-smooth w.r.t. ∥·∥₁ in this section.

³We note that our definition is slightly different from that in Asi et al. (2022b), where the loss functions
explicitly depend on a sequence of variables z_1, ..., z_T ∈ Z. This is because we focus on stochastic and
oblivious adversaries in this work, and our definition is essentially equivalent to that in Asi et al. (2022b) for
those cases.

3.1 Intuition behind Algorithm Design

To gain a better understanding of our algorithm design, we first elaborate on the difficulties encountered
when extending prevalent OPE models to the FL setting under DP constraints. It is worth noting
that all current OPE models with stochastic adversaries rely on gradient-based optimization methods.
The central task in designing OPE models with stochastic adversaries lies in leveraging past loss
functions to guide the generation of expert predictions. Specifically, we focus on the prominent
challenges associated with the widely adopted Frank-Wolfe-based methods (Asi et al., 2022b). This
algorithm iteratively moves the expert selection x_t towards a point that minimizes the gradient
estimate derived from the past loss functions l_1, ..., l_{t−1} over the decision space X, where X =
Δ_d = { x ∈ R^d : x_i ≥ 0, Σ_{i=1}^{d} x_i = 1 }, and each x ∈ X represents a probability distribution over
d experts. With DP constraints, a tree-based mechanism is used for private aggregation of the gradients
of loss functions (Asi et al., 2021b).
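For reference, a single (non-private, single-player) Frank-Wolfe step over Δ_d can be sketched as follows; this is an illustrative simplification, with the step size chosen to match the η_{i,p,k} = 2/(k+1) used later in Equation (2):

```python
import numpy as np

def frank_wolfe_step(x: np.ndarray, grad_estimate: np.ndarray, k: int) -> np.ndarray:
    """One Frank-Wolfe update over the probability simplex Δ_d.

    The linear minimizer over Δ_d is always a vertex, i.e., the one-hot
    vector on the coordinate with the smallest gradient entry.
    """
    w = np.zeros_like(x)
    w[np.argmin(grad_estimate)] = 1.0   # arg min over the vertices of Δ_d
    eta = 2.0 / (k + 1)                 # step size, as in Equation (2) below
    return (1.0 - eta) * x + eta * w
```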
In the federated setting, it is infeasible for the central server to have full access to past loss functions
due to the high communication cost. To overcome this, we use a design of local loss function gradient
estimation for global expert determination. Our solution involves locally estimating the gradient of
each client’s loss functions, and then communicating these estimates to the server, which globally
generates a new prediction. This strategy bypasses the need for full access to all loss functions,
reducing the communication overhead while maintaining efficient expert selection.
To enhance privacy in the federated system, we implement a local privatization process. When
communication is triggered, clients send “local estimates” of the gradients of their loss functions to
the server. These local estimates include strategically added noise, adhering to DP principles. The
privatization method is crucial as it extends beyond privatizing the selection of experts; it ensures the
privacy of all information exchanged between the central server and clients within the FL framework.

3.2 Algorithm Design

To address the aforementioned challenges, we propose the Fed-DP-OPE-Stoch algorithm. The
Fed-DP-OPE-Stoch algorithm works in phases. In total, it has P phases, and each phase p ∈ [P]
contains 2^{p−1} time indices. Fed-DP-OPE-Stoch contains a client-side subroutine (Algorithm 1) and a
server-side subroutine (Algorithm 2). The framework of our algorithm is outlined as follows.
At the initialization phase, the server selects an arbitrary point z ∈ X and broadcasts it to all clients.
Subsequently, each client initializes its expert selection x_{i,1} = z, pays cost l_{i,1}(x_{i,1}) and observes
the loss function.

Starting at phase p = 2, each client uses loss functions from the last phase to update its local loss
function gradient estimate, and then coordinates with the server to update its expert selection. After
that, it sticks with its decision throughout the current phase and observes the loss functions. We
elaborate on this procedure as follows.
Private Local Loss Function Gradient Estimation. At the beginning of phase p, each client privately
estimates the gradient using local loss functions from the last phase, B_{i,p} = {l_{i,2^{p−2}}, ..., l_{i,2^{p−1}−1}}.
We employ the tree mechanism for the private aggregation at each client, as in the DP-FW algorithm
from Asi et al. (2021b). Roughly speaking, DP-FW involves constructing binary trees and allocating
sets of loss functions to each vertex. The gradient at each vertex is then estimated using the loss
functions of that vertex and the gradients along the path to the root. Specifically, we run a DP-FW
subroutine at each client i, with sample set B_{i,p}, parameter T_1 and batch size b. In DP-FW, each
vertex s in the binary tree j corresponds to a gradient estimate v_{i,j,s}. DP-FW iteratively updates
gradient estimates by visiting the vertices of binary trees. The details of DP-FW can be found in
Algorithm 3 in Appendix C.1.
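For intuition, the classic tree (binary counting) mechanism for private aggregation under continual observation (Dwork et al., 2010) can be sketched as follows for private prefix sums; this is the generic primitive underlying this style of aggregation, not the exact DP-FW allocation, and all names are illustrative:

```python
import numpy as np

def private_prefix_sums(values: np.ndarray, eps: float, rng=np.random.default_rng(0)):
    """ε-DP prefix sums of values in [0, 1] via the tree mechanism.

    Each input lands in exactly one node per level, so adding Laplace(levels/eps)
    noise to every node makes the whole release ε-DP by basic composition; any
    prefix sum is then read off from at most `levels` noisy dyadic blocks.
    """
    T = len(values)
    levels = int(np.ceil(np.log2(T))) + 1 if T > 1 else 1
    noisy = []
    for h in range(levels):                      # level h holds blocks of size 2**h
        size = 2 ** h
        blocks = [values[k * size:(k + 1) * size].sum()
                  for k in range(int(np.ceil(T / size)))]
        noisy.append(np.array(blocks) + rng.laplace(0.0, levels / eps, len(blocks)))
    out = np.zeros(T)
    for t in range(1, T + 1):                    # greedy dyadic decomposition of [0, t)
        s, pos, h = 0.0, 0, levels - 1
        while pos < t:
            while 2 ** h > t - pos:              # largest aligned block that fits
                h -= 1
            s += noisy[h][pos // (2 ** h)]
            pos += 2 ** h
        out[t - 1] = s
    return out
```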
Intuitively, in the DP-FW subroutine, reaching a leaf vertex marks the completion of gradient
refinement for a specific tree path. At these critical points, we initiate communication between the
central server and individual clients. Specifically, as the DP-FW subroutine reaches a leaf vertex
s of tree j, each client sends the server a set of noisy inner products of each decision-set vertex
c_n with its gradient estimate v_{i,j,s}. In other words, each client communicates with the server by
sending {⟨c_n, v_{i,j,s}⟩ + ξ_{i,n}}_{n∈[d]}, where c_1, ..., c_d represent the d vertices of the decision set X = Δ_d,
ξ_{i,n} ∼ Lap(λ_{i,j,s}), and λ_{i,j,s} = 4α·2^j / (bε).
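A minimal sketch of this client-side message (names are ours); note that for X = Δ_d the vertices c_1, ..., c_d are the standard basis vectors, so each inner product ⟨c_n, v⟩ is simply the n-th coordinate of the local gradient estimate:

```python
import numpy as np

def client_message(v: np.ndarray, alpha: float, b: int, eps: float, j: int,
                   rng=np.random.default_rng()) -> np.ndarray:
    """Noisy inner products a client sends when DP-FW reaches a leaf of tree j.

    For X = Δ_d the vertices c_1, ..., c_d are standard basis vectors, so
    <c_n, v> is the n-th coordinate of the local gradient estimate v.
    """
    lam = 4.0 * alpha * (2 ** j) / (b * eps)   # noise scale λ_{i,j,s}
    return v + rng.laplace(0.0, lam, size=v.shape)
```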
Private Global Expert Prediction. After receiving {⟨c_n, v_{i,j,s}⟩ + ξ_{i,n}}_{n∈[d]} from all clients, the
central server privately predicts a new expert:

    w̄_{j,s} = argmin_{c_n : 1≤n≤d} (1/m) Σ_{i=1}^{m} ( ⟨c_n, v_{i,j,s}⟩ + ξ_{i,n} ).    (1)

Subsequently, the server broadcasts the “global prediction” w̄_{j,s} to all clients.
Local Expert Selection Updating. Denote the index of the leaf s of tree j as k. Then, upon receiving
the global expert selection w̄_{j,s}, each client i updates its expert prediction for leaf k + 1, denoted as
x_{i,p,k+1} ∈ X, as follows:

    x_{i,p,k+1} = (1 − η_{i,p,k}) x_{i,p,k} + η_{i,p,k} w̄_{j,s},    (2)

where η_{i,p,k} = 2/(k+1).
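The corresponding server step of Equation (1), sketched under the same assumptions as the client sketch above:

```python
import numpy as np

def server_global_prediction(noisy_products: np.ndarray) -> np.ndarray:
    """Equation (1): average the m clients' noisy inner products and return
    the simplex vertex (one-hot expert) minimizing the average.

    noisy_products: shape (m, d); row i is client i's message.
    """
    avg = noisy_products.mean(axis=0)
    w_bar = np.zeros(noisy_products.shape[1])
    w_bar[np.argmin(avg)] = 1.0
    return w_bar
```

Each client then folds w̄_{j,s} into its iterate via the convex combination in Equation (2) (cf. frank_wolfe_step above).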

After updating all leaf vertices of the trees, the client obtains x_{i,p,K}, the final state of the expert
prediction in phase p. Then, each client i sticks with expert selection x_{i,p,K} throughout phase p and
collects loss functions l_{i,2^{p−1}}, ..., l_{i,2^p−1}.
Remark 1 (Comparison with Asi et al. (2022b)). The key difference between Fed-DP-OPE-Stoch and
non-federated algorithms (Asi et al., 2022b) lies in our innovative approach to centralized coordination
and communication efficiency. Unlike the direct application of DP-FW in each phase in Asi et al.
(2022b), our algorithm employs local loss function gradient estimation, global expert prediction
and local expert selection updating. Additionally, our strategic communication protocol, where
clients communicate with the server only when the DP-FW subroutine reaches leaf vertices, significantly
reduces communication costs. Moreover, the integration of DP at the local level in our algorithm
distinguishes it from non-federated approaches.

Algorithm 1 Fed-DP-OPE-Stoch: Client i

1: Input: Phases P, trees T_1, decision set X = Δ_d with vertices {c_1, ..., c_d}, batch size b.
2: Initialize: Set x_{i,1} = z ∈ X and pay cost l_{i,1}(x_{i,1}).
3: for p = 2 to P do
4:   Set B_{i,p} = {l_{i,2^{p−2}}, ..., l_{i,2^{p−1}−1}}
5:   Set k = 1 and x_{i,p,1} = x_{i,p−1,K}
6:   {v_{i,j,s}}_{j∈[T_1], s∈{0,1}^{≤j}} = DP-FW(B_{i,p}, T_1, b)
7:   for all leaf vertices s reached in DP-FW do
8:     Communicate to server: {⟨c_n, v_{i,j,s}⟩ + ξ_{i,n}}_{n∈[d]}, where ξ_{i,n} ∼ Lap(λ_{i,j,s})
9:     Receive from server: w̄_{j,s}
10:    Update x_{i,p,k} according to Equation (2)
11:    Update k = k + 1
12:  end for
13:  Final iterate outputs x_{i,p,K}
14:  for t = 2^{p−1} to 2^p − 1 do
15:    Receive loss l_{i,t} : X → R and pay cost l_{i,t}(x_{i,p,K})
16:  end for
17: end for

3.3 Theoretical Guarantees

Now we are ready to present theoretical guarantees for Fed-DP-OPE-Stoch.


Theorem 1. Assume that loss function l_{i,t}(·) is convex, α-Lipschitz, β-smooth w.r.t. ∥·∥₁. Setting
λ_{i,j,s} = 4α·2^j/(bε), b = 2^{p−1}/(p−1)² and T_1 = (1/2) log( bεβ√m / (α log d) ), Fed-DP-OPE-Stoch (i) satisfies ε-DP and (ii)
achieves the per-client regret of

    O( (α + β) √( T log d / m ) · log T + √(αβ) · log d · √T · log T / ( m^{1/4} √ε ) )
Algorithm 2 Fed-DP-OPE-Stoch: Central server
1: Input: Phases P, number of clients m, decision set X = Δ_d with vertices {c_1, ..., c_d}.
2: Initialize: Pick any z ∈ X and broadcast to clients.
3: for p = 2 to P do
4:   Receive from clients: {⟨c_n, v_{i,j,s}⟩ + ξ_{i,n}}_{n∈[d]}
5:   Update w̄_{j,s} according to Equation (1)
6:   Communicate to clients: w̄_{j,s}
7: end for

with (iii) a communication cost of O( m^{5/4} d √( Tεβ / (α log d) ) ).

A similar result characterizing the performance of Fed-DP-OPE-Stoch under (ε, δ)-DP constraint is
shown in Theorem 6.

Proof sketch of DP guarantees and communication costs. Given neighboring datasets S and S′ differing
in l_{i_1,t_1} or l′_{i_1,t_1}, we first note that l_{i_1,t_1} or l′_{i_1,t_1} is used in only one phase, denoted as p_1.
Furthermore, note that l_{i_1,t_1} or l′_{i_1,t_1} is used in the gradient computation for at most 2^{j−|s|} iterations,
corresponding to descendant leaf nodes. This insight allows us to set noise levels λ_{i,j,s} sufficiently
large to ensure each of these iterations is ε/2^{j−|s|}-DP with respect to the output {⟨c_n, v_{i_1,j,s}⟩}_{n∈[d], j∈[T_1], |s|=j}.
By basic composition and post-processing, we can make sure the final output is ε-DP.

The communication cost is obtained by observing that there are P phases and, within each phase,
O(2^{T_1}) leaf vertices. Thus, the communication frequency scales in O( Σ_p 2^{T_1} ) and the
communication cost scales in O( md Σ_p 2^{T_1} ).

Proof sketch of regret upper bound. We first give bounds for the total regret in phase p ∈ [P]. We
can show that for every t in phase p, we have

L_t(x_{i,p,k+1}) ≤ L_t(x_{i,p,k}) + η_{i,p,k} [L_t(x⋆) − L_t(x_{i,p,k})] + η_{i,p,k} ∥∇L_t(x_{i,p,k}) − v̄_{p,k}∥_∞ + η_{i,p,k} ( ⟨v̄_{p,k}, w̄_{p,k}⟩ − min_{w∈X} ⟨v̄_{p,k}, w⟩ ) + (β/2) η_{i,p,k}².

To upper bound the regret, we bound the following two quantities separately.

Step 1: bound ∥∇L_t(x_{i,p,k}) − v̄_{p,k}∥_∞ in Lemma 3. We show that every index of the d-dimensional
vector ∇L_t(x_{i,p,k}) − v̄_{p,k} is O( (α² + β²)/(bm) )-sub-Gaussian by induction on the depth of the vertex in Lemma 2.
Therefore, E[∥∇L_t(x_{i,p,k}) − v̄_{p,k}∥_∞] is upper bounded by O( (α + β) √( log d / (bm) ) ).

Step 2: bound ⟨v̄_{p,k}, w̄_{p,k}⟩ − min_{w∈X} ⟨v̄_{p,k}, w⟩ in Lemma 5. We establish that ⟨v̄_{p,k}, w̄_{p,k}⟩ −
min_{w∈X} ⟨v̄_{p,k}, w⟩ is upper bounded by (2/m) max_{n: 1≤n≤d} | Σ_{i=1}^{m} ξ_{i,n} |. This bound incorporates the
maximum absolute sum of m IID Laplace random variables ξ_{i,n}. To quantify the bound on the
sum of m IID Laplace random variables, we refer to Lemma 4. Consequently, we have
E[ ⟨v̄_{p,k}, w̄_{p,k}⟩ − min_{w∈X} ⟨v̄_{p,k}, w⟩ ] upper bounded by O( λ_{i,j,s} ln d / √m ).

With the two steps above, we can show that for each time step t in phase p, E[L_t(x_{i,p,K}) − L_t(x⋆)]
is upper bounded by O( (α + β) √( log d / (2^p m) ) + √(αβ) log d / ( m^{1/4} √(2^p ε) ) ). Summing up all the phases gives us the final
upper bound. The full proof of Theorem 1 can be found in Appendix C.2.

When β is small, Theorem 1 reduces to the following corollary.

Corollary 1. If β = O( log d / (√m · T · ε) ), then Fed-DP-OPE-Stoch (i) satisfies ε-DP and (ii) achieves the
per-client regret of

    Reg(T, m) = O( √( T log d / m ) + log T log d / ( √m ε ) ),

with (iii) a communication cost of O(md log T).

The full proof of Corollary 1 can be found in Appendix C.2.


Remark 2. We note that the regret for the single-player counterpart (Asi et al., 2022b) scales in
O( √(T log d) + log d log T / ε ) when β = O( log d / (Tε) ). Compared with this upper bound, Fed-DP-OPE-Stoch
achieves a √m-fold speed-up, with a communication cost of O(md log T). This observation
underscores the learning performance acceleration due to collaboration. Furthermore, our approach
extends beyond the mere privatization of selected experts; we ensure the privacy of all information
exchanged between the central server and clients.
Remark 3. We remark that Fed-DP-OPE-Stoch can be slightly modified into a centrally differentially
private algorithm, assuming client-server communication is secure. Specifically, we change the local
privatization process to a global privatization process on the server side. This mechanism results in
less noise added and thus better utility performance. The detailed design is deferred to Appendix C.4
and the theoretical guarantees are provided in Theorem 7.

4 Federated OPE with Oblivious Adversaries: Lower bounds


In this section, we shift our attention to the more challenging oblivious adversaries. We establish new
lower bounds for general oblivious adversaries.
To provide a foundational understanding, we start with some intuition behind FL. FL can potentially
speed up the learning process by collecting more data at the same time to gain better insights into
future predictions. In the stochastic setting, the advantage of collaboration lies in the ability to collect
more observations from the same distribution, which leads to variance reduction. However, when
facing oblivious adversaries, the problem changes fundamentally. Oblivious adversaries can select
loss functions arbitrarily, meaning that having more data does not necessarily help with predicting
their future selections.
The following theorem formally establishes the aforementioned intuition.
Theorem 2. For any federated OPE algorithm against oblivious adversaries, the per-client regret is
lower bounded by Ω( √(T log d) ). Let ε ∈ (0, 1] and δ = o(1/T); for any (ε, δ)-DP federated OPE
algorithm, the per-client regret is lower bounded by Ω( min{ log d / ε, T } ).
Remark 4. Theorem 2 states that with oblivious adversaries, the per-client regret under any federated
OPE algorithm is fundamentally independent of the number of clients m, indicating that the collaborative
effort among multiple clients does not yield a reduction in the regret lower bound. This is contrary to
typical scenarios where collaboration can lead to shared insights and improved overall performance.
The varied and unpredictable nature of oblivious adversaries nullifies the typical advantages of
collaboration. Theorem 2 also emphasizes the influence of the DP guarantees. To the best of our
knowledge, our lower bounds represent the first non-trivial impossibility results for federated OPE.

Proof sketch. We examine the case where all clients receive the same loss function from the oblivious
adversary at each time step, i.e., l_{i,t} = l′_t. Within this framework, we define the “average policy”
among all clients, i.e., p′_t(k) = (1/m) Σ_{i=1}^{m} p_{i,t}(k), ∀k ∈ [d]. This leads us to express the per-client
regret as: Reg(T, m) = Σ_{t=1}^{T} Σ_{k=1}^{d} p′_t(k) · l′_t(k) − Σ_{t=1}^{T} l′_t(x⋆). Note that p′_t(k) is defined by
p_{1,t}(k), ..., p_{m,t}(k), which in turn are determined by l_{1,1}, ..., l_{m,t−1}. According to our choice
of l_{i,t} = l′_t, p′_t(k) is determined by l′_1, l′_2, ..., l′_{t−1}. Therefore, p′_1, p′_2, ..., p′_T are generated by a
legitimate algorithm for online learning with expert advice. Through our policy reduction
approach in FL, we can reduce the federated problem to a single-player setting, showing the
equivalence of per-client and single-player regret against the oblivious adversary we construct, thus
obtaining the regret lower bound. We believe that this technique will be useful in future analyses of
other FL algorithms. Incorporating DP constraints, we refer to Lemma 9, which transforms the DP
online learning problem into a well-examined DP batch model. The full proof of Theorem 2 can be
found in Appendix D.

5 Federated OPE with Oblivious Adversaries: Realizable Setting


Given the impossibility results in the general oblivious adversaries setting, one natural question we
aim to answer is: are there any special scenarios where we can still harness the power of federation
even in the presence of oblivious adversaries? Towards this end, in this section, we focus on the
(near-)realizable setting, formally defined below.

Definition 2 (Realizability). A federated OPE problem is realizable if there exists a feasible solution
x⋆ ∈ [d] such that Σ_{t=1}^{T} l_{i,t}(x⋆) = 0, ∀i ∈ [m]. If the best expert achieves small loss L⋆ ≪ T, i.e.,
there exists x⋆ ∈ [d] such that Σ_{t=1}^{T} l_{i,t}(x⋆) ≤ L⋆, ∀i ∈ [m], the problem is near-realizable.

Intuitively, collaboration is notably advantageous in this context, as all clients share the same goal of
reaching the zero-loss solution x⋆. As more clients participate, the shared knowledge pool expands,
making the identification of the optimal solution more efficient. In the following, we first provide
regret lower bounds to quantify this benefit formally, and then show that a sparse-vector-based
federated algorithm can almost achieve these lower bounds and is thus nearly optimal.

5.1 Lower Bound

Theorem 3. Let ε ≤ 1 and δ ≤ ε/d. For any (ε, δ)-DP federated OPE algorithm against oblivious
adversaries in the realizable setting, the per-client regret is lower bounded by Ω( log(d) / (mε) ).
 
Remark 5. In the single-player setting, the regret lower bound is Ω( log(d) / ε ) (Asi et al., 2023). In the
federated setting, our results imply that the per-client regret is reduced to 1/m times the single-player
regret. This indicates a possible m-fold speed-up in the federated setting.

Proof sketch. To begin, we consider a specific oblivious adversary. We introduce two prototype
loss functions: l⁰(x) = 0 for all x ∈ [d], and, for j ∈ [d], l^j(x) = 0 if x = j and l^j(x) = 1 otherwise.
An oblivious adversary picks one of the d sequences S¹, ..., S^d uniformly at random,
where S^j = (l^j_{1,1}, ..., l^j_{m,T}) such that l^j_{i,t} = l⁰ if t = 1, ..., T − k and l^j_{i,t} = l^j otherwise, with
k = log d / (2mε). Assume there exists an algorithm such that the per-client regret is upper bounded by log(d)/(32mε).
This would imply that for at least d/2 of S¹, ..., S^d, the per-client regret is upper bounded by log(d)/(16mε).
Assume without loss of generality these sequences are S¹, ..., S^{d/2}. We let B_j be the set of expert
selections that has low regret on S^j. We can show that B_j and B_{j′} are disjoint for j ≠ j′, implying that
choosing any expert j leads to low regret for loss sequence S^j but high regret for S^{j′}. By group
privacy, we have P[ A(S^j) ∈ B_{j′} ] ≥ e^{−mkε} P[ A(S^{j′}) ∈ B_{j′} ] − mkδ, leading to a contradiction
when d ≥ 32. The full proof of Theorem 3 can be found in Appendix E.1.

5.2 Algorithm Design

We develop a new algorithm Fed-SVT. Our algorithm operates as follows:


Periodic communication. We adopt a fixed communication schedule in our federated setting,
splitting the time horizon T into T/N phases, each of length N. In Fed-SVT, every client selects
the same expert, i.e., x_{i,t} = x_t at each time step t. Initially, each client starts with a randomly chosen
expert x_1. At the beginning of each phase n, each client sends the accumulated loss of the last phase,
Σ_{t′=(n−1)N}^{nN−1} l_{i,t′}(x), to the central server.

Global expert selection. The server, upon receiving Σ_{t′=(n−1)N}^{nN−1} l_{i,t′}(x) from all clients, decides
whether to continue with the current expert or switch to a new one. This decision is grounded in the
Sparse-Vector algorithm (Asi et al., 2023), where the accumulated loss from all clients over a phase is
treated as a single loss instance in the Sparse-Vector algorithm. Based on the server’s expert decision,
clients update their experts accordingly. The full algorithm is provided in Appendix E.2.
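A minimal sketch of an AboveThreshold-style sparse-vector decision rule of this flavor is given below; the exact thresholds, noise calibration, and restart logic of Fed-SVT are specified in Appendix E.2, so treat this as illustrative only (all names are ours):

```python
import numpy as np

def sparse_vector_switch(phase_losses, eps: float, threshold: float, seed: int = 0):
    """AboveThreshold-style check: keep the current expert while its aggregated
    per-phase loss (summed over all m clients) stays below a noisy threshold;
    return the index of the phase that triggers a switch, else None."""
    rng = np.random.default_rng(seed)
    rho = rng.laplace(0.0, 2.0 / eps)        # one-time noise on the threshold
    for n, q in enumerate(phase_losses):     # one scalar query per phase
        nu = rng.laplace(0.0, 4.0 / eps)     # fresh noise for each query
        if q + nu > threshold + rho:
            return n                         # server signals an expert switch
    return None
```

On a switch signal, the server would move to a new candidate expert and restart the mechanism with a fresh noisy threshold.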

5.3 Theoretical Guarantees
Theorem 4. Let l_{i,t} ∈ [0, 1]^d be chosen by an oblivious adversary under the near-realizability
assumption. Fed-SVT is ε-DP, its communication cost scales in O(mdT/N), and, with probability
at least 1 − O(ρ), the per-client regret is upper bounded by

    O( ( log²(d) + log( T/(N²ρ) ) · log(d/ρ) ) / (mε) + (N + L⋆) log(d/ρ) ).

Moreover, Fed-SVT is (ε, δ)-DP, its communication cost scales in O(mdT/N), and, with probability
at least 1 − O(ρ), the per-client regret is upper bounded by

    O( ( log^{3/2}(d) · √(log(1/δ)) + log( T/(N²ρ) ) · log(d/ρ) ) / (mε) + (N + L⋆) log(d/ρ) ).
 
Remark 6. Note that when N + L⋆ = O( log T / (mε) ), then, under Fed-SVT, the per-client regret upper
bound scales in O( (log² d + log T log d) / (mε) ) for ε-DP and O( (log T log d + log^{3/2} d · √(log(1/δ))) / (mε) ) for (ε, δ)-DP.
Compared to the best upper bounds O( (log² d + log T log d) / ε ) and O( (log T log d + log^{3/2} d · √(log(1/δ))) / ε ) for
the single-player scenario (Asi et al., 2023), our results indicate an m-fold regret speed-up. Note that
our lower bound (Theorem 3) scales in Ω( log d / (mε) ). Our upper bound matches the lower bound in
terms of m and ε, and is nearly optimal up to logarithmic factors.
The proof of Theorem 4 is presented in Appendix E.3.

6 Numerical Experiments
Fed-DP-OPE-Stoch. We conduct experiments in a synthetic environment to validate the theoretical
performance of Fed-DP-OPE-Stoch and compare it with its single-player counterpart, Limited
Updates (Asi et al., 2022b).

We first generate true class distributions for each client i ∈ [m] and timestep t ∈ [T], which are
sampled IID from Gaussian distributions with means and variances sampled uniformly. Following
Gaussian sampling, we apply a softmax transformation and normalization to ensure the outputs
are valid probability distributions. Then we generate a sequence of IID loss functions l_{1,1}, ..., l_{m,T}
for m clients over T timesteps using cross-entropy between true class distributions and predictions.
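A minimal sketch of this synthetic data generation, under one natural reading where the loss of expert k is the cross-entropy term −log p(k) of the true class distribution (the paper's exact hyperparameters may differ):

```python
import numpy as np

def generate_losses(m: int, T: int, d: int, seed: int = 0) -> np.ndarray:
    """Synthetic IID losses: losses[i, t, k] is the cross-entropy style loss
    -log p_{i,t}(k), where p_{i,t} is client i's true class distribution at
    time t, built by softmax-normalizing Gaussian scores."""
    rng = np.random.default_rng(seed)
    mu = rng.uniform(-1.0, 1.0, size=d)           # per-class Gaussian means
    sigma = rng.uniform(0.5, 1.5, size=d)         # per-class Gaussian std devs
    scores = rng.normal(mu, sigma, size=(m, T, d))
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(scores)
    p /= p.sum(axis=-1, keepdims=True)            # softmax + normalization
    return -np.log(p + 1e-12)                     # loss of committing to expert k
```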
We set T_1 = 1 in Fed-DP-OPE-Stoch, so the communication cost scales in O(md log T). We
set m = 10, T = 2^{14}, ε = 10, δ = 0 and d = 100. The per-client cumulative regret as a function of
T is plotted in Figure 1. We see that Fed-DP-OPE-Stoch outperforms Limited Updates significantly,
indicating the regret speed-up due to collaboration. Results with different seeds are provided in
Appendix F.

[Figure 1: Per-client regret.]
Fed-SVT. We conduct experiments in a synthetic environment, comparing Fed-SVT with the
single-player model Sparse-Vector (Asi et al., 2023). We generate random losses for each expert at
every timestep and for each client, ensuring that one expert always has zero loss to simulate an optimal
choice. We set m = 10, T = 2^9, ε = 10, δ = 0 and d = 100. In Fed-SVT, we experiment with
communication intervals N = 1, 30, 50, where the communication cost scales in O(mdT/N). The
per-client cumulative regret as a function of T is plotted in Figure 2. Our results show that
Fed-SVT significantly outperforms Sparse-Vector, highlighting the benefits of collaborative
expert selection in regret speed-up, even at lower communication costs (notably in the N = 50 case).
Results with different seeds are provided in Appendix F. Additionally, we evaluate the performance
of Fed-SVT on the MovieLens-1M dataset (Harper and Konstan, 2015) in Appendix G.

[Figure 2: Per-client regret.]

7 Conclusions
In this paper, we have advanced the state of the art of differentially private federated online prediction
from experts, addressing both stochastic and oblivious adversaries. Our Fed-DP-OPE-Stoch algorithm
showcases a significant √m-fold regret speed-up compared to single-player models with stochastic
adversaries, while effectively maintaining logarithmic communication costs. For oblivious adversaries,
we established non-trivial lower bounds, highlighting the limited benefits of client collaboration.
Additionally, our Fed-SVT algorithm demonstrates an m-fold speed-up, indicating near-optimal
performance in settings with low-loss experts. One limitation of this work is the lack of experiments
on real-world federated learning scenarios such as recommender systems and healthcare. We leave
this exploration to future work.

References
Agarwal, A., Foster, D. P., Hsu, D. J., Kakade, S. M., and Rakhlin, A. (2011). Stochastic convex
optimization with bandit feedback. Advances in Neural Information Processing Systems, 24.
Agarwal, N., Kale, S., Singh, K., and Thakurta, A. (2023a). Differentially private and lazy online
convex optimization. In The Thirty Sixth Annual Conference on Learning Theory, pages 4599–4632.
PMLR.
Agarwal, N., Kale, S., Singh, K., and Thakurta, A. G. (2023b). Improved differentially private and
lazy online convex optimization. arXiv preprint arXiv:2312.11534.
Agarwal, N. and Singh, K. (2017). The price of differential privacy for online learning. In Interna-
tional Conference on Machine Learning, pages 32–40. PMLR.
Altschuler, J. and Talwar, K. (2018). Online learning over a finite action set with limited switching.
In Conference On Learning Theory, pages 1569–1573. PMLR.
Arora, S., Hazan, E., and Kale, S. (2012). The multiplicative weights update method: a meta-algorithm
and applications. Theory of computing, 8(1):121–164.
Asi, H., Chadha, K., Cheng, G., and Duchi, J. (2022a). Private optimization in the interpolation
regime: faster rates and hardness results. In International Conference on Machine Learning, pages
1025–1045. PMLR.
Asi, H., Duchi, J., Fallah, A., Javidbakht, O., and Talwar, K. (2021a). Private adaptive gradient
methods for convex optimization. In International Conference on Machine Learning, pages
383–392. PMLR.
Asi, H., Feldman, V., Koren, T., and Talwar, K. (2021b). Private stochastic convex optimization:
Optimal rates in l1 geometry. In International Conference on Machine Learning, pages 393–403.
PMLR.
Asi, H., Feldman, V., Koren, T., and Talwar, K. (2022b). Private online prediction from experts:
Separations and faster rates. arXiv preprint arXiv:2210.13537.
Asi, H., Feldman, V., Koren, T., and Talwar, K. (2023). Near-optimal algorithms for private online
optimization in the realizable regime. arXiv preprint arXiv:2302.14154.
Azize, A. and Basu, D. (2022). When privacy meets partial information: A refined analysis of
differentially private bandits. Advances in Neural Information Processing Systems, 35:32199–
32210.
Bar-On, Y. and Mansour, Y. (2019). Individual regret in cooperative nonstochastic multi-armed
bandits. Advances in Neural Information Processing Systems, 32.
Bassily, R., Feldman, V., Talwar, K., and Guha Thakurta, A. (2019). Private stochastic convex
optimization with optimal rates. Advances in neural information processing systems, 32.
Bassily, R., Smith, A., and Thakurta, A. (2014). Private empirical risk minimization: Efficient
algorithms and tight error bounds. In 2014 IEEE 55th annual symposium on foundations of
computer science, pages 464–473. IEEE.
Bubeck, S., Li, Y., Luo, H., and Wei, C.-Y. (2019). Improved path-length regret bounds for bandits.
In Conference On Learning Theory, pages 508–528. PMLR.

Buccapatnam, S., Eryilmaz, A., and Shroff, N. B. (2014). Stochastic bandits with side observations on
networks. In The 2014 ACM international conference on Measurement and modeling of computer
systems, pages 289–300.
Caron, S., Kveton, B., Lelarge, M., and Bhagat, S. (2012). Leveraging side observations in stochastic
bandits. arXiv preprint arXiv:1210.4839.
Cesa-Bianchi, N., Gentile, C., Mansour, Y., and Minora, A. (2016). Delay and cooperation in
nonstochastic bandits. In Conference on Learning Theory, pages 605–622. PMLR.
Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, learning, and games. Cambridge university
press.
Cesa-Bianchi, N., Mansour, Y., and Stoltz, G. (2007). Improved second-order bounds for prediction
with expert advice. Machine Learning, 66:321–352.
Charisopoulos, V., Esfandiari, H., and Mirrokni, V. (2023). Robust and private stochastic linear
bandits. In International Conference on Machine Learning, pages 4096–4115. PMLR.
Chen, L., Yu, Q., Lawrence, H., and Karbasi, A. (2020). Minimax regret of switching-constrained
online convex optimization: No phase transition. Advances in Neural Information Processing
Systems, 33:3477–3486.
Cheng, D., Zhou, X., and Ji, B. (2023). Understanding the role of feedback in online learning with
switching costs. arXiv preprint arXiv:2306.09588.
Chowdhury, S. R. and Zhou, X. (2022a). Distributed differential privacy in multi-armed bandits.
arXiv preprint arXiv:2206.05772.
Chowdhury, S. R. and Zhou, X. (2022b). Shuffle private linear contextual bandits. arXiv preprint
arXiv:2202.05567.
Cotter, A., Shamir, O., Srebro, N., and Sridharan, K. (2011). Better mini-batch algorithms via
accelerated gradient methods. Advances in neural information processing systems, 24.
Dubey, A. and Pentland, A. (2020). Differentially-private federated linear bandits. Advances in
Neural Information Processing Systems, 33:6003–6014.
Dubey, A. and Pentland, A. (2022). Private and byzantine-proof cooperative decision-making. arXiv
preprint arXiv:2205.14174.
Duchi, J., Hazan, E., and Singer, Y. (2011). Adaptive subgradient methods for online learning and
stochastic optimization. Journal of machine learning research, 12(7).
Duchi, J. C., Chaturapruek, S., and Ré, C. (2015). Asynchronous stochastic convex optimization.
arXiv preprint arXiv:1508.00882.
Duchi, J. C., Lafferty, J., Zhu, Y., et al. (2016). Local minimax complexity of stochastic convex
optimization. Advances in Neural Information Processing Systems, 29.
Dwork, C., Naor, M., Pitassi, T., and Rothblum, G. N. (2010). Differential privacy under continual
observation. In Proceedings of the forty-second ACM symposium on Theory of computing, pages
715–724.
Dwork, C., Roth, A., et al. (2014). The algorithmic foundations of differential privacy. Foundations
and Trends® in Theoretical Computer Science, 9(3–4):211–407.
Feldman, V., Koren, T., and Talwar, K. (2020). Private stochastic convex optimization: optimal
rates in linear time. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of
Computing, pages 439–449.
Fichtenberger, H., Henzinger, M., and Upadhyay, J. (2023). Constant matters: Fine-grained error
bound on differentially private continual observation. In International Conference on Machine
Learning, pages 10072–10092. PMLR.

Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an
application to boosting. Journal of computer and system sciences, 55(1):119–139.
Garcelon, E., Chaudhuri, K., Perchet, V., and Pirotta, M. (2022). Privacy amplification via shuffling
for linear contextual bandits. In International Conference on Algorithmic Learning Theory, pages
381–407. PMLR.
Gauthier, F., Gogineni, V. C., Werner, S., Huang, Y.-F., and Kuh, A. (2023). Asynchronous online
federated learning with reduced communication requirements. arXiv preprint arXiv:2303.15226.
Geulen, S., Vöcking, B., and Winkler, M. (2010). Regret minimization for online buffering problems
using the weighted majority algorithm. In COLT, pages 132–143. Citeseer.
Golowich, N. and Livni, R. (2021). Littlestone classes are privately online learnable. Advances in
Neural Information Processing Systems, 34:11462–11473.
Gonen, A., Hazan, E., and Moran, S. (2019). Private learning implies online learning: An efficient
reduction. Advances in Neural Information Processing Systems, 32.
Guha Thakurta, A. and Smith, A. (2013). (nearly) optimal algorithms for private online learning in
full-information and bandit settings. Advances in Neural Information Processing Systems, 26.
Hanna, O. A., Girgis, A. M., Fragouli, C., and Diggavi, S. (2022). Differentially private stochastic
linear bandits:(almost) for free. arXiv preprint arXiv:2207.03445.
Harper, F. M. and Konstan, J. A. (2015). The movielens datasets: History and context. Acm
transactions on interactive intelligent systems (tiis), 5(4):1–19.
He, J., Wang, T., Min, Y., and Gu, Q. (2022). A simple and provably efficient algorithm for
asynchronous federated contextual linear bandits. Advances in neural information processing
systems, 35:4762–4775.
Hong, S. and Chae, J. (2021). Communication-efficient randomized algorithm for multi-kernel online
federated learning. IEEE transactions on pattern analysis and machine intelligence, 44(12):9872–
9886.
Hu, B. and Hegde, N. (2022). Near-optimal thompson sampling-based algorithms for differentially
private stochastic bandits. In Uncertainty in Artificial Intelligence, pages 844–852. PMLR.
Hu, C., Pan, W., and Kwok, J. (2009). Accelerated gradient methods for stochastic optimization and
online learning. Advances in Neural Information Processing Systems, 22.
Huang, R., Wu, W., Yang, J., and Shen, C. (2021). Federated linear contextual bandits. Advances in
neural information processing systems, 34:27057–27068.
Huang, R., Zhang, H., Melis, L., Shen, M., Hejazinia, M., and Yang, J. (2023). Federated linear
contextual bandits with user-level differential privacy. In International Conference on Machine
Learning, pages 14060–14095. PMLR.
Jain, P., Kothari, P., and Thakurta, A. (2012). Differentially private online learning. In Conference on
Learning Theory, pages 24–1. JMLR Workshop and Conference Proceedings.
Jain, P., Raskhodnikova, S., Sivakumar, S., and Smith, A. (2023). The price of differential privacy
under continual observation. In International Conference on Machine Learning, pages 14654–
14678. PMLR.
Jain, P. and Thakurta, A. G. (2014). (near) dimension independent risk bounds for differentially
private learning. In International Conference on Machine Learning, pages 476–484. PMLR.
Kalai, A. and Vempala, S. (2005). Efficient algorithms for online decision problems. Journal of
Computer and System Sciences, 71(3):291–307.
Kaplan, H., Mansour, Y., Moran, S., Nissim, K., and Stemmer, U. (2023a). Black-box differential
privacy for interactive ml. In Thirty-seventh Conference on Neural Information Processing Systems.

Kaplan, H., Mansour, Y., Moran, S., Nissim, K., and Stemmer, U. (2023b). On differentially private
online predictions. arXiv preprint arXiv:2302.14099.
Kwon, D., Park, J., and Hong, S. (2023). Tighter regret analysis and optimization of online federated
learning. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Li, C. and Wang, H. (2022). Asynchronous upper confidence bound algorithms for federated linear
bandits. In International Conference on Artificial Intelligence and Statistics, pages 6529–6553.
PMLR.
Li, C., Zhou, P., Xiong, L., Wang, Q., and Wang, T. (2018). Differentially private distributed online
learning. IEEE Transactions on Knowledge and Data Engineering, 30(8):1440–1453.
Li, F., Zhou, X., and Ji, B. (2022a). Differentially private linear bandits with partial distributed
feedback. In 2022 20th International Symposium on Modeling and Optimization in Mobile, Ad
hoc, and Wireless Networks (WiOpt), pages 41–48. IEEE.
Li, F., Zhou, X., and Ji, B. (2023). (private) kernelized bandits with distributed biased feedback.
Proceedings of the ACM on Measurement and Analysis of Computing Systems, 7(1):1–47.
Li, S., Chen, W., Wen, Z., and Leung, K.-S. (2020a). Stochastic online learning with probabilistic
graph feedback. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34,
pages 4675–4682.
Li, T., Song, L., and Fragouli, C. (2020b). Federated recommendation system via differential privacy.
In 2020 IEEE international symposium on information theory (ISIT), pages 2592–2597. IEEE.
Li, W., Song, Q., Honorio, J., and Lin, G. (2022b). Federated x-armed bandit. arXiv preprint
arXiv:2205.15268.
Littlestone, N. and Warmuth, M. K. (1994). The weighted majority algorithm. Information and
computation, 108(2):212–261.
Liu, C. and Belkin, M. (2018). Accelerating sgd with momentum for over-parameterized learning.
arXiv preprint arXiv:1810.13395.
M Ghari, P. and Shen, Y. (2022). Personalized online federated learning with multiple kernels.
Advances in Neural Information Processing Systems, 35:33316–33329.
Ma, S., Bassily, R., and Belkin, M. (2018). The power of interpolation: Understanding the effec-
tiveness of sgd in modern over-parametrized learning. In International Conference on Machine
Learning, pages 3325–3334. PMLR.
Mahdavi, M., Yang, T., and Jin, R. (2013). Stochastic convex optimization with multiple objectives.
Advances in neural information processing systems, 26.
McMahan, B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. (2017). Communication-
efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics,
pages 1273–1282. PMLR.
Mishra, N. and Thakurta, A. (2015). (nearly) optimal differentially private stochastic multi-arm
bandits. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence,
pages 592–601.
Mitra, A., Hassani, H., and Pappas, G. J. (2021). Online federated learning. In 2021 60th IEEE
Conference on Decision and Control (CDC), pages 4083–4090. IEEE.
Park, J., Kwon, D., et al. (2022). Ofedqit: Communication-efficient online federated learning via
quantization and intermittent transmission. arXiv preprint arXiv:2205.06491.
Patel, K. K., Wang, L., Saha, A., and Srebro, N. (2023). Federated online and bandit convex
optimization.
Rakhlin, A., Sridharan, K., and Tewari, A. (2011). Online learning: Stochastic, constrained, and
smoothed adversaries. Advances in neural information processing systems, 24.

Ren, W., Zhou, X., Liu, J., and Shroff, N. B. (2020). Multi-armed bandits with local differential
privacy. arXiv preprint arXiv:2007.03121.
Sajed, T. and Sheffet, O. (2019). An optimal private stochastic-mab algorithm based on optimal
private stopping rule. In International Conference on Machine Learning, pages 5579–5588. PMLR.
Shalev-Shwartz, S. et al. (2012). Online learning and online convex optimization. Foundations and
Trends® in Machine Learning, 4(2):107–194.
Shalev-Shwartz, S., Shamir, O., Srebro, N., and Sridharan, K. (2009). Stochastic convex optimization.
In COLT, volume 2, page 5.
Shariff, R. and Sheffet, O. (2018). Differentially private contextual linear bandits. Advances in
Neural Information Processing Systems, 31.
Sherman, U. and Koren, T. (2021). Lazy oco: Online convex optimization on a switching budget. In
Conference on Learning Theory, pages 3972–3988. PMLR.
Shi, C. and Shen, C. (2021). Federated multi-armed bandits. In Proceedings of the AAAI Conference
on Artificial Intelligence, volume 35, pages 9603–9611.
Shi, C., Shen, C., and Yang, J. (2021). Federated multi-armed bandits with personalization. In
International conference on artificial intelligence and statistics, pages 2917–2925. PMLR.
Srebro, N., Sridharan, K., and Tewari, A. (2010). Smoothness, low noise and fast rates. Advances in
neural information processing systems, 23.
Steinhardt, J. and Liang, P. (2014). Adaptivity and optimism: An improved exponentiated gradient
algorithm. In International conference on machine learning, pages 1593–1601. PMLR.
Tao, Y., Wu, Y., Zhao, P., and Wang, D. (2022). Optimal rates of (locally) differentially private heavy-
tailed multi-armed bandits. In International Conference on Artificial Intelligence and Statistics,
pages 1546–1574. PMLR.
Tenenbaum, J., Kaplan, H., Mansour, Y., and Stemmer, U. (2021). Differentially private multi-armed
bandits in the shuffle model. Advances in Neural Information Processing Systems, 34:24956–
24967.
Tossou, A. and Dimitrakakis, C. (2016). Algorithms for differentially private multi-armed bandits. In
Proceedings of the AAAI Conference on Artificial Intelligence, volume 30.
Tossou, A., Dimitrakakis, C., and Dubhashi, D. (2017). Thompson sampling for stochastic bandits
with graph feedback. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31.
Vaswani, S., Bach, F., and Schmidt, M. (2019). Fast and faster convergence of sgd for over-
parameterized models and an accelerated perceptron. In The 22nd international conference on
artificial intelligence and statistics, pages 1195–1204. PMLR.
Wang, C.-H., Li, W., Cheng, G., and Lin, G. (2022a). Federated online sparse decision making. arXiv
preprint arXiv:2202.13448.
Wang, H., Zhao, D., and Wang, H. (2022b). Dynamic global sensitivity for differentially private
contextual bandits. In Proceedings of the 16th ACM Conference on Recommender Systems, pages
179–187.
Wang, H., Zhao, Q., Wu, Q., Chopra, S., Khaitan, A., and Wang, H. (2020). Global and local
differential privacy for collaborative bandits. In Proceedings of the 14th ACM Conference on
Recommender Systems, pages 150–159.
Wang, Y., Hu, J., Chen, X., and Wang, L. (2019). Distributed bandit learning: Near-optimal regret
with efficient communication. arXiv preprint arXiv:1904.06309.
Wei, C.-Y. and Luo, H. (2018). More adaptive algorithms for adversarial bandits. In Conference On
Learning Theory, pages 1263–1291. PMLR.

Woodworth, B. E. and Srebro, N. (2021). An even more optimal stochastic optimization algorithm:
minibatching and interpolation learning. Advances in Neural Information Processing Systems,
34:7333–7345.
Wu, Y., György, A., and Szepesvári, C. (2015). Online learning with gaussian payoffs and side
observations. Advances in Neural Information Processing Systems, 28.
Wu, Y., Zhou, X., Tao, Y., and Wang, D. (2023). On private and robust bandits. arXiv preprint
arXiv:2302.02526.
Yi, J. and Vojnovic, M. (2022). On regret-optimal cooperative nonstochastic multi-armed bandits.
arXiv preprint arXiv:2211.17154.
Yi, J. and Vojnovic, M. (2023). Doubly adversarial federated bandits. In International Conference on
Machine Learning, pages 39951–39967. PMLR.
Zheng, K., Cai, T., Huang, W., Li, Z., and Wang, L. (2020). Locally differentially private (contextual)
bandits learning. Advances in Neural Information Processing Systems, 33:12300–12310.
Zhou, X. and Chowdhury, S. R. (2023). On differentially private federated linear contextual bandits.
arXiv preprint arXiv:2302.13945.
Zhu, Z., Zhu, J., Liu, J., and Liu, Y. (2021). Federated bandit: A gossiping approach. Proceedings of
the ACM on Measurement and Analysis of Computing Systems, 5(1):1–29.
Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In
Proceedings of the 20th international conference on machine learning (icml-03), pages 928–936.

A Related Work

Online learning with stochastic adversaries. Online learning with stochastic adversaries has been
extensively studied (Rakhlin et al., 2011; Duchi et al., 2011; Hu et al., 2009; Li et al., 2020a; Caron
et al., 2012; Buccapatnam et al., 2014; Tossou et al., 2017; Wu et al., 2015). This problem is also
closely related to stochastic convex optimization (Shalev-Shwartz et al., 2009; Mahdavi et al., 2013;
Duchi et al., 2015; Agarwal et al., 2011; Duchi et al., 2016) since online learning with stochastic
adversaries can be transformed into the stochastic convex optimization problem using online-to-
batch conversion. In the federated setting, Mitra et al. (2021) proposed a federated online mirror
descent method. Patel et al. (2023) presented a federated projected online stochastic gradient descent
algorithm. Research on online learning with stochastic adversaries incorporating DP has also seen
significant advancements (Bassily et al., 2014, 2019; Feldman et al., 2020; Asi et al., 2021a,b, 2022b).
Online learning with non-stochastic adversaries. Online learning with non-stochastic adversaries
has been extensively studied (Cesa-Bianchi and Lugosi, 2006; Littlestone and Warmuth, 1994; Freund
and Schapire, 1997; Zinkevich, 2003). Federated online learning with non-stochastic adversaries, a
more recent development, was introduced by Hong and Chae (2021). Mitra et al. (2021) developed
a federated online mirror descent method. Park et al. (2022) presented a federated online gradient
descent algorithm. Kwon et al. (2023) studied data heterogeneity. Furthermore, Li et al. (2018) and
Patel et al. (2023) explored distributed online and bandit convex optimization. Gauthier et al. (2023)
focused on the asynchronous settings. Research into online learning with limited switching has also
been explored by Kalai and Vempala (2005); Geulen et al. (2010); Altschuler and Talwar (2018);
Chen et al. (2020); Sherman and Koren (2021); He et al. (2022); Li et al. (2022b); Cheng et al. (2023).
Differentially private online learning with non-stochastic adversaries was pioneered by Dwork
et al. (2010). Several studies have explored OPE problem with DP constraints (Jain et al., 2012;
Guha Thakurta and Smith, 2013; Jain and Thakurta, 2014; Agarwal and Singh, 2017; Asi et al.,
2022b). Specifically, Jain and Thakurta (2014) and Agarwal and Singh (2017) focused on methods
based on follow the regularized leader. Asi et al. (2022b) proposed an algorithm using the shrinking
dartboard method.
Studies addressing differentially private online learning with non-stochastic adversaries in other
contexts have also made significant strides. Kaplan et al. (2023b) focused on the online classification
problem with joint DP constraints. Fichtenberger et al. (2023) introduced a constant-factor improvement.
Gonen et al. (2019) and Kaplan et al. (2023a) studied the relationship between private learning and
online learning. Agarwal et al. (2023a) and Agarwal et al. (2023b) studied the DP online convex
optimization problem. To the best of our knowledge, research on differentially private federated
online learning with non-stochastic adversaries is very limited.
Online learning under realizability assumption. Several works have studied online learning in the
realizable setting. Srebro et al. (2010) studied mirror descent for online optimization. Shalev-Shwartz
et al. (2012) studied the weighted majority algorithm. There is also a line of work studying stochastic
convex optimization in the realizable setting (Cotter et al., 2011; Ma et al., 2018; Vaswani et al.,
2019; Liu and Belkin, 2018; Woodworth and Srebro, 2021; Asi et al., 2022a). Additionally, some
researchers have recently focused on the variance or the path length of the best expert (Cesa-Bianchi
et al., 2007; Steinhardt and Liang, 2014; Wei and Luo, 2018; Bubeck et al., 2019). When the DP
constraint is considered, Asi et al. (2023) introduced algorithms using the sparse vector technique.
Golowich and Livni (2021) studied DP online classification.
Multi-armed bandits with DP. Research on DP multi-armed bandits has developed along various
lines (Mishra and Thakurta, 2015; Tossou and Dimitrakakis, 2016; Sajed and Sheffet, 2019; Azize
and Basu, 2022). Further, Hu and Hegde (2022) proposed a Thompson-sampling-based approach,
Tao et al. (2022) explored the heavy-tailed rewards scenario, and Chowdhury and Zhou (2022a)
achieved optimal regret in a distributed setting. Additionally, Ren et al. (2020) and Zheng et al. (2020)
investigated local DP constraints. For linear contextual bandits with DP, significant studies were
conducted by Shariff and Sheffet (2018); Wang et al. (2020, 2022b) and Hanna et al. (2022).
Furthermore, Hanna et al. (2022); Chowdhury and Zhou (2022b); Garcelon et al. (2022) and Tenenbaum
et al. (2021) focused on the shuffle model. Li et al. (2023) studied kernelized bandits with distributed
biased feedback. Wu et al. (2023) and Charisopoulos et al. (2023) tackled privacy and robustness
simultaneously.

Federated multi-armed bandits. Federated bandits have been studied extensively recently. Shi
and Shen (2021) and Shi et al. (2021) investigated federated stochastic multi-armed bandits without
and with personalization, respectively. Wang et al. (2019) considered the distributed setting. Huang
et al. (2021), Wang et al. (2022a) and Li and Wang (2022) studied federated linear contextual bandits.
Li et al. (2022b) focused on the federated $\mathcal{X}$-armed bandits problem. Additionally, Cesa-Bianchi et al.
(2016), Bar-On and Mansour (2019), Yi and Vojnovic (2022) and Yi and Vojnovic (2023) have studied
the cooperative multi-armed bandits problem with data exchange among neighbors. M Ghari and Shen
(2022) have explored online model selection where each client learns a kernel-based model, utilizing
the specific characteristics of kernel functions.
When data privacy is explicitly considered, Li et al. (2020b) and Zhu et al. (2021) studied federated
bandits with DP guarantee. Dubey and Pentland (2022) investigated private and byzantine-proof
cooperative decision-making in the bandits setting. Dubey and Pentland (2020) and Zhou and
Chowdhury (2023) considered the linear contextual bandit model with joint DP guarantee. Li et al.
(2022a) studied private distributed linear bandits with partial feedback. Huang et al. (2023) studied
federated linear contextual bandits with user-level DP guarantee.

B Applications of Differentially Private Federated OPE


Differentially private federated online prediction has many important real-world applications. We
provide three examples below.
Personalized Healthcare: Consider a federated online prediction setting where patients’ wearable
devices collect and process health data locally, and the central server aggregates privacy-preserving
updates from devices to provide health recommendations or alerts. DP federated online prediction
can speed up the learning process and improve prediction accuracy without exposing individual users’
health data, thus ensuring patient privacy.
Financial Fraud Detection: DP federated online prediction can also enhance fraud detection systems
across banking and financial services. Each client device (e.g. PC) locally analyzes transaction
patterns and flags potential fraud without revealing sensitive transaction details to the central server.
The server’s role is to collect privacy-preserving updates from these clients to improve the global
fraud detection model. This method ensures that the financial company can dynamically adapt to new
fraudulent tactics, improving detection rates while safeguarding customers’ financial privacy.
Personalized Recommender Systems: Each client (e.g., a smartphone) can personalize content
recommendations by analyzing user interactions and preferences locally. The central server (e.g., the
company) aggregates privacy-preserving updates from all clients to refine the recommendation model.
Thus, DP federated online prediction improves the overall performance of the recommender system
while maintaining each client's privacy.

C Algorithms and Proofs for Fed-DP-OPE-Stoch


C.1 DP-FW

In Fed-DP-OPE-Stoch, we run a DP-FW subroutine (Asi et al., 2021b) at each client $i$ in each
phase $p$. DP-FW maintains $T_1$ binary trees indexed by $1 \le j \le T_1$, each with depth $j$, where $T_1$
is a predetermined parameter. An example of the tree structure is shown below. We introduce the
notation $s \in \{0,1\}^{\le j}$ to denote vertices within binary tree $j$, and $\emptyset$ signifies the tree's root. For any
$s, s' \in \{0,1\}^{\le j}$, if $s = s'0$, then $s$ is the left child of $s'$; conversely, when $s = s'1$, $s$ is the right
child of $s'$. For each client $i$, each vertex $s$ in binary tree $j$ corresponds to a parameter $x_{i,j,s}$ and a
gradient estimate $v_{i,j,s}$. Each client iteratively updates parameters and gradient estimates by visiting
the vertices of a binary tree in Depth-First Search (DFS) order: when it visits a left child vertex $s$, the
algorithm keeps the parameter $x_{i,j,s}$ and the gradient estimate $v_{i,j,s}$ identical to those of the parent
vertex $s'$, i.e., $v_{i,j,s} = v_{i,j,s'}$ and $x_{i,j,s} = x_{i,j,s'}$. When it proceeds to a right child vertex, it uniformly
selects $2^{-|s|}b$ loss functions from the set $B_{i,p} = \{l_{i,2^{p-2}}, \ldots, l_{i,2^{p-1}-1}\}$ into a subset $B_{i,j,s}$ without
replacement, where $b$ denotes the predetermined batch size and $|s|$ the depth of vertex $s$. Then the
algorithm improves the gradient estimate $v_{i,j,s}$ at the current vertex $s$ of tree $j$ using the estimate
$v_{i,j,s'}$ at the parent vertex $s'$, i.e.,
$$v_{i,j,s} = v_{i,j,s'} + \nabla l(x_{i,j,s}; B_{i,j,s}) - \nabla l(x_{i,j,s'}; B_{i,j,s}). \quad (3)$$

Algorithm 3 DP-FW at client $i$ (Asi et al., 2021b)
1: Input: Sample set $B$, number of trees $T_1$, batch size $b$.
2: for $j = 1$ to $T_1$ do
3:   Set $x_{i,j,\emptyset} = x_{i,j-1,L_{j-1}}$
4:   Uniformly select $b$ samples into $B_{i,j,\emptyset}$
5:   $v_{i,j,\emptyset} = \nabla l(x_{i,j,\emptyset}; B_{i,j,\emptyset})$
6:   for $s \in \mathrm{DFS}[j]$ do
7:     Let $s = s'a$ where $a \in \{0,1\}$
8:     if $a == 0$ then
9:       $v_{i,j,s} = v_{i,j,s'}$; $x_{i,j,s} = x_{i,j,s'}$
10:    else
11:      Uniformly select $2^{-|s|}b$ samples into $B_{i,j,s}$
12:      Update $v_{i,j,s}$ according to Equation (3)
13:    end if
14:  end for
15: end for
16: Return $\{v_{i,j,s}\}_{j\in[T_1], s\in\{0,1\}^{\le j}}$

where $\nabla l(\cdot; B_{i,j,s}) = \frac{1}{|B_{i,j,s}|}\sum_{l\in B_{i,j,s}}\nabla l(\cdot)$, and $B_{i,j,s}$ is the subset of loss functions at vertex $s$ in
binary tree $j$ for client $i$. The full algorithm is shown in Algorithm 3.

Figure 3: Binary tree with depth $j = 2$ in Algorithm 3. The root $\emptyset$ has children 0 and 1, whose children are the leaves 00, 01, 10, 11.

C.2 Proof of Theorem 1 and Corollary 1 (for pure DP)

Theorem 5 (Restatement of Theorem 1). Assume that the loss function $l_{i,t}(\cdot)$ is convex, $\alpha$-Lipschitz,
and $\beta$-smooth w.r.t. $\|\cdot\|_1$. Setting $\lambda_{i,j,s} = \frac{4\alpha 2^j}{b\varepsilon}$, $b = \frac{2^{p-1}}{(p-1)^2}$ and $T_1 = \frac{1}{2}\log\big(\frac{b\varepsilon\beta\sqrt{m}}{\alpha\log d}\big)$, Fed-DP-OPE-Stoch
(i) satisfies $\varepsilon$-DP and (ii) achieves the per-client regret of
$$O\left((\alpha+\beta)\log T\sqrt{\frac{T\log d}{m}} + \frac{\sqrt{\alpha\beta}\log d\,\sqrt{T}\log T}{m^{1/4}\sqrt{\varepsilon}}\right),$$
with (iii) a communication cost of $O\Big(m^{5/4}d\sqrt{\frac{T\varepsilon\beta}{\alpha\log d}}\Big)$.

We define the population loss as $L_t(x) = \mathbb{E}\big[\frac{1}{m}\sum_{i=1}^m l_{i,t}(x)\big]$. The per-client regret can be expressed as
$\mathbb{E}\big[\sum_{t=1}^T L_t(x_{i,t}) - \min_{x^\star\in\mathcal{X}}\sum_{t=1}^T L_t(x^\star)\big]$. We use $v_{i,p,k}$, $w_{i,p,k}$, $x_{i,p,k}$ and $\eta_{i,p,k}$ to denote quantities
corresponding to phase $p$, iteration $k$, and client $i$. We also introduce the average quantities
$\bar{v}_{p,k} = \frac{1}{m}\sum_{i=1}^m v_{i,p,k}$ and $\bar{w}_{p,k} = \frac{1}{m}\sum_{i=1}^m w_{i,p,k}$.
To prove the theorem, we start with Lemma 1, which gives the pure privacy guarantee.
Lemma 1. Assume that $2^{T_1} \le b$. Setting $\lambda_{i,j,s} = \frac{4\alpha 2^j}{b\varepsilon}$, Fed-DP-OPE-Stoch is $\varepsilon$-DP.

Proof. Let $S$ and $S'$ be two neighboring datasets and assume that they differ in sample $l_{i_1,t_1}$ or
$l'_{i_1,t_1}$, where $2^{p_1-1} \le t_1 < 2^{p_1}$. Let $B_{i_1,j,s}$ be the set that contains $l_{i_1,t_1}$ or $l'_{i_1,t_1}$. Recall that
$|B_{i_1,j,s}| = 2^{-|s|}b$. The key point is that this set is used in the calculation of $v_{i_1,p_1,k}$ for at most $2^{j-|s|}$
iterates, i.e., the leaves that are descendants of the vertex. Let $k_0$ and $k_1$ be the first and last iterates
such that $B_{i_1,j,s}$ is used for the calculation of $v_{i_1,p_1,k}$; hence $k_1 - k_0 + 1 \le 2^{j-|s|}$.
First, we show that $(\langle c_n, v_{1,1,1}\rangle, \ldots, \langle c_n, v_{m,P,K}\rangle) \approx_{(\varepsilon,0)} (\langle c_n, v'_{1,1,1}\rangle, \ldots, \langle c_n, v'_{m,P,K}\rangle)$ holds for
$1 \le n \le d$. For our purpose, it suffices to establish that $(\langle c_n, v_{i_1,p_1,k_0}\rangle, \ldots, \langle c_n, v_{i_1,p_1,k_1}\rangle) \approx_{(\varepsilon,0)}$
$(\langle c_n, v'_{i_1,p_1,k_0}\rangle, \ldots, \langle c_n, v'_{i_1,p_1,k_1}\rangle)$ holds for $1 \le n \le d$, since $l_{i_1,t_1}$ or $l'_{i_1,t_1}$ is only used at client
$i_1$, in phase $p_1$ and iterations $k_0, \ldots, k_1$. Therefore it is enough to show that $\langle c_n, v_{i_1,p_1,k}\rangle \approx_{(\varepsilon/2^{j-|s|},0)}$
$\langle c_n, v'_{i_1,p_1,k}\rangle$ holds for $k_0 \le k \le k_1$ and $1 \le n \le d$, because $k_1 - k_0 + 1 \le 2^{j-|s|}$. Note
that $|\langle c_n, v_{i_1,p_1,k} - v'_{i_1,p_1,k}\rangle| \le \frac{4\alpha}{2^{-|s|}b}$. Setting $\lambda_{i,j,s} = \frac{4\alpha 2^j}{b\varepsilon}$ and applying standard results on the
Laplace mechanism (Dwork et al. (2014)) leads to the intended result. Finally, we can show that
$(x_{1,1,K}, \ldots, x_{m,P,K}) \approx_{(\varepsilon,0)} (x'_{1,1,K}, \ldots, x'_{m,P,K})$ by post-processing.
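As an illustration of the local privatization step analyzed above, the following Python sketch adds Laplace noise at the scale $\lambda_{i,j,s} = 4\alpha 2^j/(b\varepsilon)$ of Lemma 1 to the inner products a client reports; all function and variable names here are our own, and the snippet is a sketch rather than the paper's code.

```python
import numpy as np

def privatize_scores(v, C, alpha, b, eps, j, rng):
    """Add Laplace noise to the inner products <c_n, v> a client reports,
    using lambda = 4 * alpha * 2**j / (b * eps) as in Lemma 1.
    v: (d,) gradient estimate; C: (d, d) matrix whose rows are the
    vertices c_1, ..., c_d of the simplex (standard basis here)."""
    lam = 4 * alpha * 2**j / (b * eps)
    scores = C @ v                      # <c_n, v> for n = 1, ..., d
    return scores + rng.laplace(scale=lam, size=scores.shape)

rng = np.random.default_rng(0)
d = 5
C = np.eye(d)                           # vertices of the simplex Delta_d
noisy = privatize_scores(rng.normal(size=d), C,
                         alpha=1.0, b=64, eps=1.0, j=3, rng=rng)
```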

To help prove the upper bound of the regret, we first introduce some lemmas.
Lemma 2. Let $t$ be the time step, $p$ the index of the phase, $i$ the index of the client, and $(j,s)$ a
vertex. For every index $1 \le k \le d$ of the vectors, we have
$$\mathbb{E}\left[\exp\big\{c\big(\bar{v}_{p,j,s,k} - \nabla L_{t,k}(x_{i,p,j,s})\big)\big\}\right] \le \exp\left(\frac{O(1)c^2(\alpha^2+\beta^2)}{bm}\right),$$
where $\bar{v}_{p,j,s} = \frac{1}{m}\sum_{i=1}^m v_{i,p,j,s}$.

Proof. Let us fix $p$, $t$, $k$ and $i$ for simplicity and let $A_{j,s} = \bar{v}_{p,j,s,k} - \nabla L_{t,k}(x_{i,p,j,s})$. We prove the
lemma by induction on the depth of the vertex, i.e., $|s|$. If $|s| = 0$, then $v_{i,p,j,\emptyset} = \nabla l(x_{i,p,j,\emptyset}; B_{i,p,j,\emptyset})$,
where $B_{i,p,j,\emptyset}$ is a sample set of size $b$. Therefore we have
$$\begin{aligned}
\mathbb{E}[\exp cA_{j,s}] &= \mathbb{E}[\exp c(\bar{v}_{p,j,\emptyset,k} - \nabla L_{t,k}(x_{i,p,j,\emptyset}))]\\
&= \mathbb{E}\left[\exp c\left(\frac{1}{mb}\sum_{i=1}^m\sum_{s\in B_{i,p,j,\emptyset}}\nabla l_k(x_{i,p,j,\emptyset}; s) - \nabla L_{t,k}(x_{i,p,j,\emptyset})\right)\right]\\
&= \prod_{s\in B_{i,p,j,\emptyset}}\prod_{i\in[m]}\mathbb{E}\left[\exp\frac{c}{bm}\big(\nabla l_k(x_{i,p,j,\emptyset}; s) - \nabla L_{t,k}(x_{i,p,j,\emptyset})\big)\right]\\
&\le \exp\left(\frac{c^2\alpha^2}{2bm}\right),
\end{aligned}$$
where the last inequality holds because for a random variable $X \in [-\alpha,\alpha]$ we have
$\mathbb{E}[\exp c(X - \mathbb{E}[X])] \le \exp\frac{c^2\alpha^2}{2}$.
Now consider a vertex of depth $|s| \ge 1$ and let $s = s'a$ where $a \in \{0,1\}$. If $a = 0$, the lemma clearly
holds. If $a = 1$, recall that $v_{i,p,j,s} = v_{i,p,j,s'} + \nabla l(x_{i,p,j,s}; B_{i,p,j,s}) - \nabla l(x_{i,p,j,s'}; B_{i,p,j,s})$; then
$$A_{j,s} = A_{j,s'} + \frac{1}{m}\sum_{i=1}^m\nabla l_k(x_{i,p,j,s}; B_{i,p,j,s}) - \frac{1}{m}\sum_{i=1}^m\nabla l_k(x_{i,p,j,s'}; B_{i,p,j,s}) - \nabla L_{t,k}(x_{i,p,j,s}) + \nabla L_{t,k}(x_{i,p,j,s'}).$$
Let $B_{i,p,<(j,s)} = \cup_{(j_1,s_1)<(j,s)}B_{i,p,j_1,s_1}$ be the set containing all the samples used up to vertex $(j,s)$
in phase $p$ at client $i$. Conditioning on $B_{i,p,<(j,s)}$, we have
$$\mathbb{E}[\exp cA_{j,s}] = \mathbb{E}\bigg[\mathbb{E}\big[\exp(cA_{j,s'})\,\big|\,B_{i,p,<(j,s)}\big] \cdot \mathbb{E}\Big[\exp c\Big(\tfrac{1}{m}\textstyle\sum_{i=1}^m\nabla l_k(x_{i,p,j,s}; B_{i,p,j,s}) - \tfrac{1}{m}\sum_{i=1}^m\nabla l_k(x_{i,p,j,s'}; B_{i,p,j,s}) - \nabla L_{t,k}(x_{i,p,j,s}) + \nabla L_{t,k}(x_{i,p,j,s'})\Big)\,\Big|\,B_{i,p,<(j,s)}\Big]\bigg].$$
Since $l_{i,t}(\cdot; s)$ is $\beta$-smooth w.r.t. $\|\cdot\|_1$, we have
$$|\nabla l_k(x_{i,p,j,s}; B_{i,p,j,s}) - \nabla l_k(x_{i,p,j,s'}; B_{i,p,j,s})| \le \beta\|x_{i,p,j,s} - x_{i,p,j,s'}\|_1.$$
Since vertex $(j,s)$ is the right child of vertex $(j,s')$, the number of updates between $x_{i,p,j,s}$ and $x_{i,p,j,s'}$
is at most the number of leaves visited between these two vertices, i.e., $2^{j-|s|}$. Therefore we have
$\|x_{i,p,j,s} - x_{i,p,j,s'}\|_1 \le \eta_{i,p,j,s'}2^{j-|s|} \le 2^{2-|s|}$. By using similar arguments to the case $|s| = 0$, we can get
$$\mathbb{E}\Big[\exp c\Big(\tfrac{1}{m}\textstyle\sum_{i=1}^m\nabla l_k(x_{i,p,j,s}; B_{i,p,j,s}) - \tfrac{1}{m}\sum_{i=1}^m\nabla l_k(x_{i,p,j,s'}; B_{i,p,j,s}) - \nabla L_{t,k}(x_{i,p,j,s}) + \nabla L_{t,k}(x_{i,p,j,s'})\Big)\,\Big|\,B_{i,p,<(j,s)}\Big] \le \exp\left(\frac{O(1)c^2\beta^2 2^{-2|s|}}{m|B_{i,p,j,s}|}\right) \le \exp\left(\frac{O(1)c^2\beta^2 2^{-|s|}}{bm}\right).$$
Then we get
$$\mathbb{E}[\exp cA_{j,s}] \le \mathbb{E}[\exp cA_{j,s'}]\exp\left(\frac{O(1)c^2\beta^2 2^{-|s|}}{bm}\right).$$
Applying this inductively, we have for every index $1 \le k \le d$,
$$\mathbb{E}[\exp cA_{j,s}] \le \exp\left(\frac{O(1)c^2(\alpha^2+\beta^2)}{bm}\right).$$

Lemma 3 upper bounds the variance of the average gradient.
Lemma 3. At phase $p$, for each vertex $(j,s)$ and $2^{p-1} \le t < 2^p - 1$, we have
$$\mathbb{E}\big[\|\bar{v}_{p,j,s} - \nabla L_t(x_{i,p,j,s})\|_\infty\big] \le (\alpha+\beta)\,O\left(\sqrt{\frac{\log d}{bm}}\right),$$
where $\bar{v}_{p,j,s} = \frac{1}{m}\sum_{i=1}^m v_{i,p,j,s}$.
Proof. Lemma 2 implies that $\bar{v}_{p,j,s,k} - \nabla L_{t,k}(x_{i,p,j,s})$ is $O\big(\frac{\alpha^2+\beta^2}{bm}\big)$-sub-Gaussian for every index
$1 \le k \le d$ of the vectors. Applying standard results on the maximum of $d$ sub-Gaussian random
variables, we get
$$\mathbb{E}\big[\|\bar{v}_{p,j,s} - \nabla L_t(x_{i,p,j,s})\|_\infty\big] \le O\left(\sqrt{\frac{\alpha^2+\beta^2}{bm}}\right)\sqrt{\log d}.$$

Lemma 4 gives the tail bound of the sum of i.i.d. random variables following the Laplace distribution.
Lemma 4. Let $\xi_{i,n}$ be i.i.d. random variables following the distribution $\mathrm{Lap}(\lambda_{i,j,s})$. Then we have
$$\mathbb{E}\left[\max_{n:1\le n\le d}\sum_{i=1}^m\xi_{i,n}\right] \le O(\sqrt{m}\,\lambda_{i,j,s}\ln d).$$
Proof. The $\xi_{i,n}$'s are $md$ i.i.d. random variables following the distribution $\mathrm{Lap}(\lambda_{i,j,s})$. We note that
$$\mathbb{E}\big[\exp(u\xi_{i,n})\big] = \frac{1}{1-\lambda_{i,j,s}^2u^2}, \qquad |u| < \frac{1}{\lambda_{i,j,s}}.$$
Since $\frac{1}{1-\lambda_{i,j,s}^2u^2} \le 1+2\lambda_{i,j,s}^2u^2 \le \exp(2\lambda_{i,j,s}^2u^2)$ when $|u| \le \frac{1}{2\lambda_{i,j,s}}$, $\xi_{i,n}$ is sub-exponential
with parameter $(4\lambda_{i,j,s}^2, 2\lambda_{i,j,s})$. Applying standard results on linear combinations of sub-exponential
random variables, we conclude that $\sum_{i=1}^m\xi_{i,n}$, denoted $Y_n$, is sub-exponential with parameter
$(4m\lambda_{i,j,s}^2, 2\lambda_{i,j,s})$. From standard tail bounds for sub-exponential random variables, we have
$$\mathbb{P}(|Y_n| \ge c) \le 2\exp\left(-\frac{c^2}{8m\lambda_{i,j,s}^2}\right) \text{ if } 0 \le c \le 2m\lambda_{i,j,s}, \qquad \mathbb{P}(|Y_n| \ge c) \le 2\exp\left(-\frac{c}{4\lambda_{i,j,s}}\right) \text{ if } c \ge 2m\lambda_{i,j,s}.$$
Since $\mathbb{P}\big(\max_{n:1\le n\le d}|Y_n| \ge c\big) \le \sum_{n=1}^d\mathbb{P}(|Y_n| \ge c)$, we have
$$\mathbb{P}\Big(\max_{n:1\le n\le d}|Y_n| \ge c\Big) \le 2d\exp\left(-\frac{c^2}{8m\lambda_{i,j,s}^2}\right) \text{ if } 0 \le c \le 2m\lambda_{i,j,s}, \qquad \mathbb{P}\Big(\max_{n:1\le n\le d}|Y_n| \ge c\Big) \le 2d\exp\left(-\frac{c}{4\lambda_{i,j,s}}\right) \text{ if } c \ge 2m\lambda_{i,j,s}.$$
Then we have
$$\begin{aligned}
\mathbb{E}\Big[\max_{n:1\le n\le d}|Y_n|\Big] &= \int_0^\infty\mathbb{P}\Big(\max_{n:1\le n\le d}|Y_n| \ge c\Big)\,dc\\
&= \sqrt{8m\ln d}\,\lambda_{i,j,s} + \int_{\sqrt{8m\ln d}\,\lambda_{i,j,s}}^{2m\lambda_{i,j,s}}\mathbb{P}\Big(\max_n|Y_n| \ge c\Big)\,dc + \int_{2m\lambda_{i,j,s}}^{\infty}\mathbb{P}\Big(\max_n|Y_n| \ge c\Big)\,dc\\
&\le \sqrt{8m\ln d}\,\lambda_{i,j,s} + \int_{\sqrt{8m\ln d}\,\lambda_{i,j,s}}^{2m\lambda_{i,j,s}}2\exp\Big(\ln d - \frac{c^2}{8m\lambda_{i,j,s}^2}\Big)\,dc + \int_{2m\lambda_{i,j,s}}^{\infty}2\exp\Big(\ln d - \frac{c}{4\lambda_{i,j,s}}\Big)\,dc\\
&\le \sqrt{8m\ln d}\,\lambda_{i,j,s} + \sqrt{8m}\,\lambda_{i,j,s}\int_{-\infty}^{+\infty}\exp(-u^2)\,du + 8\lambda_{i,j,s}\ln d + 8\lambda_{i,j,s}\int_0^{\infty}\exp(-u)\,du\\
&= \sqrt{8m\ln d}\,\lambda_{i,j,s} + \sqrt{8m\pi}\,\lambda_{i,j,s} + 8\lambda_{i,j,s}\ln d + 8\lambda_{i,j,s}\\
&= O(\sqrt{m}\,\lambda_{i,j,s}\ln d).
\end{aligned}$$
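Lemma 4 can also be sanity-checked numerically. The short simulation below compares the empirical expectation of the maximum against the $\sqrt{m}\,\lambda\ln d$ rate; the parameter values are arbitrary illustrative choices of ours.

```python
import numpy as np

# Empirical check of Lemma 4: the expected maximum over d coordinates of
# sums of m i.i.d. Lap(lam) variables grows like sqrt(m) * lam * log d.
rng = np.random.default_rng(1)
m, d, lam, trials = 100, 1000, 0.5, 200
xi = rng.laplace(scale=lam, size=(trials, m, d))
emp = np.abs(xi.sum(axis=1)).max(axis=1).mean()   # E[max_n |sum_i xi_{i,n}|]
print(emp, np.sqrt(m) * lam * np.log(d))          # same order of magnitude
```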

Lemma 5. Setting $\lambda_{i,j,s} = \frac{4\alpha 2^j}{b\varepsilon}$, we have
$$\mathbb{E}[\langle\bar{v}_{p,k}, \bar{w}_{p,k}\rangle] \le \mathbb{E}\left[\min_{w\in\mathcal{X}}\langle\bar{v}_{p,k}, w\rangle\right] + O\left(\frac{\alpha 2^j\ln d}{b\varepsilon\sqrt{m}}\right).$$
Proof. Since $\bar{w}_{p,k} = \arg\min_{c_n:1\le n\le d}\left[\frac{1}{m}\sum_{i=1}^m\big(\langle c_n, v_{i,p,k}\rangle + \xi_{i,n}\big)\right]$, where $\xi_{i,n}\sim\mathrm{Lap}(\lambda_{i,j,s})$,
we denote $\bar{w}_{p,k}$ as $c_{n^\star}$ and we have
$$\begin{aligned}
\langle\bar{w}_{p,k}, \bar{v}_{p,k}\rangle &= \langle c_{n^\star}, \bar{v}_{p,k}\rangle\\
&= \min_{n:1\le n\le d}\left(\langle c_n, \bar{v}_{p,k}\rangle + \frac{1}{m}\sum_{i=1}^m\xi_{i,n}\right) - \frac{1}{m}\sum_{i=1}^m\xi_{i,n^\star}\\
&\le \min_{n:1\le n\le d}\langle c_n, \bar{v}_{p,k}\rangle + \frac{1}{m}\max_{n:1\le n\le d}\sum_{i=1}^m\xi_{i,n} - \frac{1}{m}\min_{n:1\le n\le d}\sum_{i=1}^m\xi_{i,n}\\
&\le \min_{n:1\le n\le d}\langle c_n, \bar{v}_{p,k}\rangle + \frac{2}{m}\max_{n:1\le n\le d}\left|\sum_{i=1}^m\xi_{i,n}\right|.
\end{aligned}$$
Applying Lemma 4, we get
$$\mathbb{E}[\langle\bar{v}_{p,k}, \bar{w}_{p,k}\rangle] \le \mathbb{E}\left[\min_{w\in\mathcal{X}}\langle\bar{v}_{p,k}, w\rangle\right] + O\left(\frac{\lambda_{i,j,s}\ln d}{\sqrt{m}}\right).$$

With the lemmas above, we are ready to prove Theorem 1.

Proof. Lemma 1 implies the claim about privacy. We proceed to prove the regret upper bound. We have
$$\begin{aligned}
L_t(x_{i,p,k+1}) &\overset{(a)}{\le} L_t(x_{i,p,k}) + \langle\nabla L_t(x_{i,p,k}),\,x_{i,p,k+1}-x_{i,p,k}\rangle + \beta\frac{\|x_{i,p,k+1}-x_{i,p,k}\|_1^2}{2}\\
&\overset{(b)}{\le} L_t(x_{i,p,k}) + \eta_{i,p,k}\langle\nabla L_t(x_{i,p,k}),\,\bar{w}_{p,k}-x_{i,p,k}\rangle + \frac{1}{2}\beta\eta_{i,p,k}^2\\
&= L_t(x_{i,p,k}) + \eta_{i,p,k}\langle\nabla L_t(x_{i,p,k}),\,x^\star-x_{i,p,k}\rangle + \eta_{i,p,k}\langle\nabla L_t(x_{i,p,k})-\bar{v}_{p,k},\,\bar{w}_{p,k}-x^\star\rangle + \eta_{i,p,k}\langle\bar{v}_{p,k},\,\bar{w}_{p,k}-x^\star\rangle + \frac{1}{2}\beta\eta_{i,p,k}^2\\
&\overset{(c)}{\le} L_t(x_{i,p,k}) + \eta_{i,p,k}\big[L_t(x^\star)-L_t(x_{i,p,k})\big] + \eta_{i,p,k}\|\nabla L_t(x_{i,p,k})-\bar{v}_{p,k}\|_\infty + \eta_{i,p,k}\Big(\langle\bar{v}_{p,k},\bar{w}_{p,k}\rangle - \min_{w\in\mathcal{X}}\langle\bar{v}_{p,k},w\rangle\Big) + \frac{1}{2}\beta\eta_{i,p,k}^2,
\end{aligned}$$
where (a) is due to $\beta$-smoothness, (b) follows from the updating rule of $x_{i,p,k}$, and (c) follows from
the convexity of the loss function and Hölder's inequality.
Subtracting $L_t(x^\star)$ from both sides and taking expectations, we have
$$\mathbb{E}[L_t(x_{i,p,k+1})-L_t(x^\star)] \le (1-\eta_{i,p,k})\mathbb{E}[L_t(x_{i,p,k})-L_t(x^\star)] + \eta_{i,p,k}\mathbb{E}\big[\|\nabla L_t(x_{i,p,k})-\bar{v}_{p,k}\|_\infty\big] + \eta_{i,p,k}\mathbb{E}\Big[\langle\bar{v}_{p,k},\bar{w}_{p,k}\rangle - \min_{w\in\mathcal{X}}\langle\bar{v}_{p,k},w\rangle\Big] + \frac{1}{2}\beta\eta_{i,p,k}^2.$$
Applying Lemma 5 and Lemma 3, we have
$$\mathbb{E}[L_t(x_{i,p,k+1})-L_t(x^\star)] \le (1-\eta_{i,p,k})\mathbb{E}[L_t(x_{i,p,k})-L_t(x^\star)] + \eta_{i,p,k}(\alpha+\beta)O\left(\sqrt{\frac{\log d}{bm}}\right) + \frac{\eta_{i,p,k}\alpha 2^j}{b\varepsilon}O\left(\frac{\ln d}{\sqrt{m}}\right) + \frac{1}{2}\beta\eta_{i,p,k}^2.$$
Let $\alpha_k = \eta_{i,p,k}(\alpha+\beta)O\big(\sqrt{\log d/(bm)}\big) + \frac{\eta_{i,p,k}\alpha 2^j}{b\varepsilon}O\big(\frac{\ln d}{\sqrt{m}}\big) + \frac{1}{2}\beta\eta_{i,p,k}^2$, and abbreviate $\eta_{i,p,k}$ as $\eta_k$.
Then, with $\eta_k = \frac{2}{k+1}$, we have
$$\mathbb{E}[L_t(x_{i,p,K})-L_t(x^\star)] \le \sum_{k=1}^{K}\alpha_k\prod_{k'>k}(1-\eta_{k'}) = \sum_{k=1}^{K}\alpha_k\frac{(k+1)k}{K(K-1)} \le \sum_{k=1}^{K}\alpha_k\frac{(k+1)^2}{(K-1)^2}.$$
Since $K = 2^{T_1}$, simple algebra implies that
$$\mathbb{E}[L_t(x_{i,p,K})-L_t(x^\star)] \le O\left((\alpha+\beta)\sqrt{\frac{\log d}{bm}} + \frac{\alpha 2^{T_1}\ln d}{b\varepsilon\sqrt{m}} + \frac{\beta}{2^{T_1}}\right). \quad (4)$$
At iteration $2^{p-1} \le t < 2^p$, setting $b = \frac{2^{p-1}}{(p-1)^2}$ and $T_1 = \frac{1}{2}\log\big(\frac{b\varepsilon\beta\sqrt{m}}{\alpha\log d}\big)$ in Equation (4), we have
$$\mathbb{E}[L_t(x_{i,p,K})-L_t(x^\star)] \le O\left((\alpha+\beta)p\sqrt{\frac{\log d}{2^pm}} + \frac{\sqrt{\alpha\beta}\,p\log d}{m^{1/4}\sqrt{2^p\varepsilon}}\right).$$
Therefore, the total regret from time step $2^p$ to $2^{p+1}-1$ is at most
$$\mathbb{E}\left[\sum_{t=2^p}^{2^{p+1}-1}L_t(x_{i,t}) - \min_{u\in\mathcal{X}}\sum_{t=2^p}^{2^{p+1}-1}L_t(u)\right] \le O\left((\alpha+\beta)p\sqrt{\frac{2^p\log d}{m}} + \frac{\sqrt{\alpha\beta}\,p\log d\,\sqrt{2^p}}{m^{1/4}\sqrt{\varepsilon}}\right).$$
Summing over $p$, we can get
$$\begin{aligned}
\mathbb{E}\left[\sum_{t=1}^{T}L_t(x_{i,t}) - \min_{u\in\mathcal{X}}\sum_{t=1}^{T}L_t(u)\right] &\le \sum_{p=1}^{\log T}O\left((\alpha+\beta)p\sqrt{\frac{2^p\log d}{m}} + \frac{\sqrt{\alpha\beta}\,p\log d\,\sqrt{2^p}}{m^{1/4}\sqrt{\varepsilon}}\right)\\
&\le O\left((\alpha+\beta)\sqrt{\frac{\log d}{m}}\sum_{p=1}^{\log T}p\sqrt{2^p} + \frac{\sqrt{\alpha\beta}\log d}{m^{1/4}\sqrt{\varepsilon}}\sum_{p=1}^{\log T}p\sqrt{2^p}\right)\\
&\le O\left((\alpha+\beta)\log T\sqrt{\frac{T\log d}{m}} + \frac{\sqrt{\alpha\beta}\log d\,\sqrt{T}\log T}{m^{1/4}\sqrt{\varepsilon}}\right).
\end{aligned}$$
Now we turn to the communication cost. Since there are $\log T$ phases, and within each phase there
are $O(2^{T_1})$ leaf vertices at which communication is initiated, the communication frequency scales
as $O\big(\sum_p 2^{T_1}\big)$. Therefore, the communication cost scales as $O\big(m^{5/4}d\sqrt{T\varepsilon\beta/(\alpha\log d)}\big)$.

Corollary 2 (Restatement of Corollary 1). If $\beta = O\big(\frac{\log d}{\sqrt{m}\,T\varepsilon}\big)$, then Fed-DP-OPE-Stoch (i) satisfies
$\varepsilon$-DP and (ii) achieves the per-client regret of
$$\mathrm{Reg}(T,m) = O\left(\sqrt{\frac{T\log d}{m}} + \frac{\log T\log d}{\sqrt{m}\,\varepsilon}\right),$$
with (iii) a communication cost of $O(md\log T)$.

Proof. Similar to the proof of Theorem 1, we can show that
$$\mathbb{E}[L_t(x_{i,p,K})-L_t(x^\star)] \le O\left((\alpha+\beta)\sqrt{\frac{\log d}{bm}} + \frac{\alpha 2^{T_1}\ln d}{b\varepsilon\sqrt{m}} + \frac{\beta}{2^{T_1}}\right). \quad (5)$$
At iteration $2^{p-1} \le t < 2^p$, setting $b = 2^{p-1}$ and $T_1 = 1$ in Equation (5), we have
$$\mathbb{E}[L_t(x_{i,p,K})-L_t(x^\star)] \le O\left((\alpha+\beta)\sqrt{\frac{\log d}{2^pm}} + \frac{\alpha\ln d}{2^p\varepsilon\sqrt{m}} + \beta\right).$$
Since $\beta = O\big(\frac{\log d}{\sqrt{m}\,T\varepsilon}\big)$, we have
$$\mathbb{E}[L_t(x_{i,p,K})-L_t(x^\star)] \le O\left((\alpha+\beta)\sqrt{\frac{\log d}{2^pm}} + \frac{\alpha\ln d}{2^p\varepsilon\sqrt{m}}\right).$$
Therefore, the total regret from time step $2^p$ to $2^{p+1}-1$ is at most
$$\mathbb{E}\left[\sum_{t=2^p}^{2^{p+1}-1}L_t(x_{i,t}) - \min_{u\in\mathcal{X}}\sum_{t=2^p}^{2^{p+1}-1}L_t(u)\right] \le O\left(\sqrt{\frac{2^p\log d}{m}} + \frac{\alpha\log d}{\sqrt{m}\,\varepsilon}\right).$$
Summing over $p$, we can get
$$\begin{aligned}
\mathbb{E}\left[\sum_{t=1}^{T}L_t(x_{i,t}) - \min_{u\in\mathcal{X}}\sum_{t=1}^{T}L_t(u)\right] &\le \sum_{p=1}^{\log T}O\left(\sqrt{\frac{2^p\log d}{m}} + \frac{\alpha\log d}{\sqrt{m}\,\varepsilon}\right)\\
&\le O\left(\sqrt{\frac{T\log d}{m}} + \frac{\log d\log T}{\sqrt{m}\,\varepsilon}\right).
\end{aligned}$$
The proof of the DP guarantee and the communication cost follows that of Theorem 1.

C.3 Theorem 6 (for approximate DP)

Theorem 6. Let $\delta \le 1/T$. Assume that the loss function $l_{i,t}(\cdot)$ is convex, $\alpha$-Lipschitz, and $\beta$-smooth
w.r.t. $\|\cdot\|_1$, and that $\varepsilon \le \frac{\alpha\log(2^{p-1}/\delta)\log d}{(\beta 2^{p-1})^{1/4}}$. Setting $\lambda_{i,j,s} = \frac{\alpha 2^{T_1/2}\sqrt{\log(2^{p-1}/\delta)}}{b\varepsilon}$, $b = \frac{2^{p-1}}{(p-1)^2}$, and
$T_1 = \frac{2}{3}\log\Big(\frac{b\varepsilon\sqrt{m}\,\beta}{\alpha\sqrt{\log(2^{p-1}/\delta)}\,\log d}\Big)$, Fed-DP-OPE-Stoch is $(\varepsilon,\delta)$-DP, the communication cost scales as
$O\big(md\,2^{T_1}\log T\big)$, and the per-client regret is upper bounded by
$$O\left((\alpha+\beta)\log T\sqrt{\frac{T\log d}{m}} + \frac{\alpha^{2/3}\beta^{1/3}\log^{2/3}(d)\log^{2/3}(1/\delta)\,T^{1/3}\log^{4/3}(T)}{m^{1/3}\varepsilon^{2/3}}\right).$$

Proof. We apply the following lemma to prove privacy in this setting.
Lemma 6 (Asi et al. (2021b), Lemma 4.4). Let $b \ge 2^{T_1}$, $\delta \le 1/T$ and $\varepsilon \le 2^{-T_1}\sqrt{\log(1/\delta)}$. Setting
$\lambda_{i,j,s} = \frac{\alpha 2^{T_1/2}\sqrt{\log(2^{p-1}/\delta)}}{b\varepsilon}$, Fed-DP-OPE is $(O(\varepsilon), O(\delta))$-DP.
Theorem 6 then follows using similar arguments to the proof of Theorem 1.

C.4 Extension to the Central DP Setting

In this section, we extend our Fed-DP-OPE-Stoch algorithm, with some modifications, to the setting
where client-server communication is secure, achieving better utility performance.
We present the algorithm design in Algorithm 4 and Algorithm 5. We change the local privatization
process to a global privatization process on the server side. Specifically, in Line 8 of Algorithm 4,
when communication is triggered, each client sends $\{\langle c_n, v_{i,j,s}\rangle\}_{n\in[d]}$ to the server, where $c_1,\ldots,c_d$
are the $d$ vertices of the decision set $\mathcal{X} = \Delta_d$. In Line 5 of Algorithm 5, after receiving
$\{\langle c_n, v_{i,j,s}\rangle\}_{n\in[d]}$ from all clients, the central server privately predicts a new expert:
$$\bar{w}_{j,s} = \arg\min_{c_n:1\le n\le d}\left[\frac{1}{m}\sum_{i=1}^m\langle c_n, v_{i,j,s}\rangle + \zeta_n\right], \quad (6)$$
where $\zeta_n \sim \mathrm{Lap}(\mu_{j,s})$ and $\mu_{j,s} = \frac{4\alpha 2^j}{bm\varepsilon}$. The other components remain the same.
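The following Python sketch illustrates the server-side selection of Equation (6) as a report-noisy-min over the averaged scores, with $\mu_{j,s} = 4\alpha 2^j/(bm\varepsilon)$; the function name and data layout are our own illustrative assumptions, not the paper's code.

```python
import numpy as np

def private_expert_selection(scores, alpha, b, m, eps, j, rng):
    """Report-noisy-min of Equation (6): average the clients' reported
    scores <c_n, v_{i,j,s}> and return the index with the smallest noisy
    value. `scores` has shape (m, d): one row of d scores per client."""
    mu = 4 * alpha * 2**j / (b * m * eps)
    avg = scores.mean(axis=0)                  # (1/m) sum_i <c_n, v_i>
    noisy = avg + rng.laplace(scale=mu, size=avg.shape)
    return int(np.argmin(noisy))               # index n* defining w_bar

rng = np.random.default_rng(0)
n_star = private_expert_selection(rng.random((10, 5)), alpha=1.0, b=64,
                                  m=10, eps=1.0, j=3, rng=rng)
```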

Algorithm 4 Fed-DP-OPE-Stoch (CDP): Client $i$
1: Input: Phases $P$, trees $T_1$, decision set $\mathcal{X} = \Delta_d$ with vertices $\{c_1,\ldots,c_d\}$, batch size $b$.
2: Initialize: Set $x_{i,1} = z \in \mathcal{X}$ and pay cost $l_{i,1}(x_{i,1})$.
3: for $p = 2$ to $P$ do
4:   Set $B_{i,p} = \{l_{i,2^{p-2}},\ldots,l_{i,2^{p-1}-1}\}$
5:   Set $k = 1$ and $x_{i,p,1} = x_{i,p-1,K}$
6:   $\{v_{i,j,s}\}_{j\in[T_1],s\in\{0,1\}^{\le j}} = \text{DP-FW}(B_{i,p}, T_1, b)$
7:   for all leaf vertices $s$ reached in DP-FW do
8:     Communicate to server: $\{\langle c_n, v_{i,j,s}\rangle\}_{n\in[d]}$
9:     Receive from server: $\bar{w}_{j,s}$
10:    Update $x_{i,p,k}$ according to Equation (2)
11:    Update $k = k+1$
12:  end for
13:  Output the final iterate $x_{i,p,K}$
14:  for $t = 2^{p-1}$ to $2^p - 1$ do
15:    Receive loss $l_{i,t}: \mathcal{X} \to \mathbb{R}$ and pay cost $l_{i,t}(x_{i,p,K})$
16:  end for
17: end for

Algorithm 5 Fed-DP-OPE-Stoch (CDP): Central server
1: Input: Phases $P$, number of clients $m$, decision set $\mathcal{X} = \Delta_d$ with vertices $\{c_1,\ldots,c_d\}$.
2: Initialize: Pick any $z \in \mathcal{X}$ and broadcast to clients.
3: for $p = 2$ to $P$ do
4:   Receive from clients: $\{\langle c_n, v_{i,j,s}\rangle\}_{n\in[d]}$
5:   $\bar{w}_{j,s} = \arg\min_{c_n:1\le n\le d}\left[\frac{1}{m}\sum_{i=1}^m\langle c_n, v_{i,j,s}\rangle + \zeta_n\right]$
6:   Communicate to clients: $\bar{w}_{j,s}$
7: end for

Now we present the theoretical guarantees in this setting.
Theorem 7. Assume that the loss function $l_{i,t}(\cdot)$ is convex, $\alpha$-Lipschitz, and $\beta$-smooth w.r.t. $\|\cdot\|_1$.
Fed-DP-OPE-Stoch (i) satisfies $\varepsilon$-DP and (ii) achieves the per-client regret of
$$O\left((\alpha+\beta)\log T\sqrt{\frac{T\log d}{m}} + \frac{\sqrt{\alpha\beta}\log d\,\sqrt{T}\log T}{\sqrt{m}\sqrt{\varepsilon}}\right),$$
with (iii) a communication cost of $O\Big(m^{3/2}d\sqrt{\frac{T\varepsilon\beta}{\alpha\log d}}\Big)$.

To prove the theorem, we start with Lemma 7, which gives the pure privacy guarantee.
Lemma 7. Assume that $2^{T_1} \le b$. Setting $\mu_{j,s} = \frac{4\alpha 2^j}{bm\varepsilon}$, Fed-DP-OPE-Stoch is $\varepsilon$-DP.

Proof. Let $S$ and $S'$ be two neighboring datasets and assume that they differ in sample $l_{i_1,t_1}$ or
$l'_{i_1,t_1}$, where $2^{p_1-1} \le t_1 < 2^{p_1}$. The algorithm is $\varepsilon$-DP if we have $(x_{1,1,K},\ldots,x_{m,P,K}) \approx_{(\varepsilon,0)}$
$(x'_{1,1,K},\ldots,x'_{m,P,K})$.
Let $B_{i_1,j,s}$ be the set that contains $l_{i_1,t_1}$ or $l'_{i_1,t_1}$. Recall that $|B_{i_1,j,s}| = 2^{-|s|}b$. The key point is
that this set is used in the calculation of $v_{i_1,p_1,k}$ for at most $2^{j-|s|}$ iterates, i.e., the leaves that are
descendants of the vertex. Let $k_0$ and $k_1$ be the first and last iterates such that $B_{i_1,j,s}$ is used for the
calculation of $v_{i_1,p_1,k}$; hence $k_1 - k_0 + 1 \le 2^{j-|s|}$. For a sequence $a_i,\ldots,a_j$, we use the shorthand
$a_{i:j} = \{a_i,\ldots,a_j\}$.
Step 1: $x_{1:m,p_1,k_0:k_1} \approx_{(\varepsilon,0)} x'_{1:m,p_1,k_0:k_1}$ and $\bar{w}_{p_1,k_0:k_1} \approx_{(\varepsilon,0)} \bar{w}'_{p_1,k_0:k_1}$ by basic composition,
post-processing and report noisy max.
We will show that $(x_{1,p_1,k_0},\ldots,x_{m,p_1,k_1})$ and $(x'_{1,p_1,k_0},\ldots,x'_{m,p_1,k_1})$ are $\varepsilon$-indistinguishable. Since
$B_{i_1,j,s}$ is used to calculate $v_{i_1,p_1,k}$ for at most $2^{j-|s|}$ iterates, i.e., $k_1 - k_0 + 1 \le 2^{j-|s|}$, it is enough
to show that $\bar{w}_{p_1,k} \approx_{(\varepsilon/2^{j-|s|},0)} \bar{w}'_{p_1,k}$ for $k_0 \le k \le k_1$ and then apply basic composition and post-processing.
Note that for every $k_0 \le k \le k_1$, the sensitivity $|\langle c_n, v_{i_1,p_1,k} - v'_{i_1,p_1,k}\rangle| \le \frac{4\alpha}{2^{-|s|}b}$;
therefore, $\big|\frac{1}{m}\sum_{i=1}^m\langle c_n, v_{i,p_1,k}\rangle - \frac{1}{m}\sum_{i=1}^m\langle c_n, v'_{i,p_1,k}\rangle\big| \le \frac{4\alpha}{2^{-|s|}mb}$. Using the privacy guarantee of
report noisy max (Lemma 15), we have $\bar{w}_{p_1,k} \approx_{(\varepsilon/2^{j-|s|},0)} \bar{w}'_{p_1,k}$ for $k_0 \le k \le k_1$ with $\mu_{j,s} = \frac{4\alpha 2^j}{bm\varepsilon}$.
Step 2: $x_{1:m,1:P,K} \approx_{(\varepsilon,0)} x'_{1:m,1:P,K}$ by post-processing.
In order to show $(x_{1,1,K},\ldots,x_{m,P,K}) \approx_{(\varepsilon,0)} (x'_{1,1,K},\ldots,x'_{m,P,K})$, we only need to prove that
$(x_{1,p_1,K},\ldots,x_{m,p_1,K}) \approx_{(\varepsilon,0)} (x'_{1,p_1,K},\ldots,x'_{m,p_1,K})$ and apply post-processing. It is enough to
show that the iterates $(x_{1,p_1,1},\ldots,x_{m,p_1,K})$ and $(x'_{1,p_1,1},\ldots,x'_{m,p_1,K})$ are $\varepsilon$-indistinguishable.
The iterates $x_{1:m,p_1,1:k_0-1}$ and $x'_{1:m,p_1,1:k_0-1}$ do not depend on $l_{i_1,t_1}$ or $l'_{i_1,t_1}$ and are hence
0-indistinguishable. Moreover, given that $(x_{1,p_1,k_0},\ldots,x_{m,p_1,k_1})$ and $(x'_{1,p_1,k_0},\ldots,x'_{m,p_1,k_1})$ are
$\varepsilon$-indistinguishable, it is clear that $(x_{1,p_1,k_1+1},\ldots,x_{m,p_1,K})$ and $(x'_{1,p_1,k_1+1},\ldots,x'_{m,p_1,K})$ are
$\varepsilon$-indistinguishable by post-processing.

To help prove the upper bound of the regret, we first introduce a lemma.
Lemma 8. Setting $\mu_{j,s} = \frac{4\alpha 2^j}{mb\varepsilon}$, we have
$$\mathbb{E}[\langle\bar{v}_{p,k}, \bar{w}_{p,k}\rangle] \le \mathbb{E}\left[\min_{w\in\mathcal{X}}\langle\bar{v}_{p,k}, w\rangle\right] + O\left(\frac{\alpha 2^j\ln d}{mb\varepsilon}\right).$$
Proof. Since $\bar{w}_{p,k} = \arg\min_{c_n:1\le n\le d}\left[\frac{1}{m}\sum_{i=1}^m\langle c_n, v_{i,p,k}\rangle + \zeta_n\right]$, where $\zeta_n \sim \mathrm{Lap}(\mu_{j,s})$, we
denote $\bar{w}_{p,k}$ as $c_{n^\star}$ and we have
$$\begin{aligned}
\langle\bar{w}_{p,k}, \bar{v}_{p,k}\rangle &= \langle c_{n^\star}, \bar{v}_{p,k}\rangle\\
&= \min_{n:1\le n\le d}\big(\langle c_n, \bar{v}_{p,k}\rangle + \zeta_n\big) - \zeta_{n^\star}\\
&\le \min_{n:1\le n\le d}\langle c_n, \bar{v}_{p,k}\rangle + 2\max_{n:1\le n\le d}|\zeta_n|.
\end{aligned}$$
Standard results for the expectation of the maximum of $d$ i.i.d. Laplace random variables imply that
$\mathbb{E}\big[\max_{n:1\le n\le d}|\zeta_n|\big] \le O(\mu_{j,s}\ln d)$. Therefore,
$$\mathbb{E}[\langle\bar{v}_{p,k}, \bar{w}_{p,k}\rangle] \le \mathbb{E}\left[\min_{w\in\mathcal{X}}\langle\bar{v}_{p,k}, w\rangle\right] + O(\mu_{j,s}\ln d).$$

Then we are ready to prove Theorem 7.

Proof. Lemma 7 implies the claim about privacy. Following the same arguments as in the proof of
Theorem 1, we can get
$$\mathbb{E}[L_t(x_{i,p,k+1})-L_t(x^\star)] \le (1-\eta_{i,p,k})\mathbb{E}[L_t(x_{i,p,k})-L_t(x^\star)] + \eta_{i,p,k}\mathbb{E}\big[\|\nabla L_t(x_{i,p,k})-\bar{v}_{p,k}\|_\infty\big] + \eta_{i,p,k}\mathbb{E}\Big[\langle\bar{v}_{p,k},\bar{w}_{p,k}\rangle - \min_{w\in\mathcal{X}}\langle\bar{v}_{p,k},w\rangle\Big] + \frac{1}{2}\beta\eta_{i,p,k}^2.$$
Applying Lemma 8 and Lemma 3, we have
$$\mathbb{E}[L_t(x_{i,p,k+1})-L_t(x^\star)] \le (1-\eta_{i,p,k})\mathbb{E}[L_t(x_{i,p,k})-L_t(x^\star)] + \eta_{i,p,k}(\alpha+\beta)O\left(\sqrt{\frac{\log d}{bm}}\right) + \frac{\eta_{i,p,k}\alpha 2^j}{b\varepsilon}O\left(\frac{\ln d}{m}\right) + \frac{1}{2}\beta\eta_{i,p,k}^2.$$
Let $\alpha_k = \eta_{i,p,k}(\alpha+\beta)O\big(\sqrt{\log d/(bm)}\big) + \frac{\eta_{i,p,k}\alpha 2^j}{b\varepsilon}O\big(\frac{\ln d}{m}\big) + \frac{1}{2}\beta\eta_{i,p,k}^2$, and abbreviate $\eta_{i,p,k}$ as $\eta_k$.
Then, with $\eta_k = \frac{2}{k+1}$, we have
$$\mathbb{E}[L_t(x_{i,p,K})-L_t(x^\star)] \le \sum_{k=1}^{K}\alpha_k\prod_{k'>k}(1-\eta_{k'}) = \sum_{k=1}^{K}\alpha_k\frac{(k+1)k}{K(K-1)} \le \sum_{k=1}^{K}\alpha_k\frac{(k+1)^2}{(K-1)^2}.$$
Since $K = 2^{T_1}$, simple algebra implies that
$$\mathbb{E}[L_t(x_{i,p,K})-L_t(x^\star)] \le O\left((\alpha+\beta)\sqrt{\frac{\log d}{bm}} + \frac{\alpha 2^{T_1}\ln d}{b\varepsilon m} + \frac{\beta}{2^{T_1}}\right). \quad (7)$$
At iteration $2^{p-1} \le t < 2^p$, setting $b = \frac{2^{p-1}}{(p-1)^2}$ and $T_1 = \frac{1}{2}\log\big(\frac{b\varepsilon\beta m}{\alpha\log d}\big)$, we have
$$\mathbb{E}[L_t(x_{i,p,K})-L_t(x^\star)] \le O\left((\alpha+\beta)p\sqrt{\frac{\log d}{2^pm}} + \frac{\sqrt{\alpha\beta}\,p\log d}{\sqrt{m}\sqrt{2^p\varepsilon}}\right).$$
Summing over all the time steps, we have
$$\mathbb{E}\left[\sum_{t=1}^{T}L_t(x_{i,t}) - \min_{u\in\mathcal{X}}\sum_{t=1}^{T}L_t(u)\right] \le O\left((\alpha+\beta)\log T\sqrt{\frac{T\log d}{m}} + \frac{\sqrt{\alpha\beta}\log d\,\sqrt{T}\log T}{\sqrt{m\varepsilon}}\right).$$
The proof of the communication cost is similar to that in the proof of Theorem 1.

D Proof of Lower Bounds


Theorem 8 (Restatement of Theorem 2). For any federated OPE algorithm against oblivious adversaries,
the per-client regret is lower bounded by $\Omega(\sqrt{T\log d})$. Let $\varepsilon \in (0,1]$ and $\delta = o(1/T)$. For any
$(\varepsilon,\delta)$-DP federated OPE algorithm, the per-client regret is lower bounded by $\Omega\big(\min\big(\frac{\log d}{\varepsilon}, T\big)\big)$.

Proof. Consider the case where all clients receive the same loss function from the oblivious adversary
at each time step, i.e., $l_{i,t} = l'_t$. Define the average policy among all clients as $p'_t(k) =
\frac{1}{m}\sum_{i=1}^m p_{i,t}(k)$ for all $k \in [d]$. Now the regret is
$$\begin{aligned}
\mathbb{E}\left[\frac{1}{m}\sum_{i=1}^m\sum_{t=1}^T l_{i,t}(x_{i,t}) - \frac{1}{m}\sum_{i=1}^m\sum_{t=1}^T l_{i,t}(x^\star)\right] &= \mathbb{E}\left[\frac{1}{m}\sum_{i=1}^m\sum_{t=1}^T l'_t(x_{i,t})\right] - \sum_{t=1}^T l'_t(x^\star)\\
&= \frac{1}{m}\sum_{i=1}^m\sum_{t=1}^T\sum_{k=1}^d p_{i,t}(k)\,l'_t(k) - \sum_{t=1}^T l'_t(x^\star)\\
&= \sum_{t=1}^T\sum_{k=1}^d\left(\frac{1}{m}\sum_{i=1}^m p_{i,t}(k)\right)l'_t(k) - \sum_{t=1}^T l'_t(x^\star)\\
&= \sum_{t=1}^T\sum_{k=1}^d p'_t(k)\,l'_t(k) - \sum_{t=1}^T l'_t(x^\star).
\end{aligned}$$
Note that $p'_t(k)$ is defined by $p_{1,t}(k),\ldots,p_{m,t}(k)$, which in turn are determined by $l_{1,1},\ldots,l_{m,t-1}$.
According to our choice $l_{i,t} = l'_t$, $p'_t(k)$ is determined by $l'_1,l'_2,\ldots,l'_{t-1}$. Therefore $p'_1,p'_2,\ldots,p'_t$
are generated by a legitimate algorithm for the online learning with expert advice problem.
There exists a sequence of losses $l'_1,l'_2,\ldots,l'_t$ such that for any algorithm for the online learning with
expert advice problem, the expected regret satisfies (Cesa-Bianchi and Lugosi (2006), Theorem 3.7)
$$\sum_{t=1}^T\sum_{k=1}^d p'_t(k)\,l'_t(k) - \sum_{t=1}^T l'_t(x^\star) \ge \Omega(\sqrt{T\log d}).$$
Therefore, we have
$$\mathbb{E}\left[\frac{1}{m}\sum_{i=1}^m\sum_{t=1}^T l_{i,t}(x_{i,t}) - \frac{1}{m}\sum_{i=1}^m\sum_{t=1}^T l_{i,t}(x^\star)\right] \ge \Omega(\sqrt{T\log d}).$$
From Lemma 9, if $\varepsilon \in (0,1]$ and $\delta = o(1/T)$, then there exists a sequence of losses $l'_1,l'_2,\ldots,l'_t$
such that for any $(\varepsilon,\delta)$-DP algorithm for the online learning with expert advice problem against
oblivious adversaries, the expected regret satisfies
$$\sum_{t=1}^T\sum_{k=1}^d p'_t(k)\,l'_t(k) - \sum_{t=1}^T l'_t(x^\star) \ge \Omega\left(\min\left(\frac{\log d}{\varepsilon}, T\right)\right).$$
Therefore, we have for any $(\varepsilon,\delta)$-DP algorithm,
$$\mathbb{E}\left[\frac{1}{m}\sum_{i=1}^m\sum_{t=1}^T l_{i,t}(x_{i,t}) - \frac{1}{m}\sum_{i=1}^m\sum_{t=1}^T l_{i,t}(x^\star)\right] \ge \Omega\left(\min\left(\frac{\log d}{\varepsilon}, T\right)\right).$$

Lemma 9. Let $\varepsilon \in (0,1]$ and $\delta = o(1/T)$. There exists a sequence of losses $l_1,\ldots,l_T$ such that any
$(\varepsilon,\delta)$-DP algorithm $\mathcal{A}$ for DP-OPE against oblivious adversaries satisfies
$$\sum_{t=1}^T l_t(x_t) - \min_{x^\star\in[d]}\sum_{t=1}^T l_t(x^\star) \ge \Omega\left(\min\left(\frac{\log d}{\varepsilon}, T\right)\right).$$

Proof. Let $n, d \in \mathbb{N}$. Define $y \in \mathcal{Y}^n$ containing $n$ records, where $\mathcal{Y} = \{0,1\}^d$. The function
$\text{1-Select}_d: \mathcal{Y}^n \to [d]$ corresponds to selecting a coordinate $b \in [d]$ in the batch model.
Then we define the regret of $\text{1-Select}_d$. For a batched algorithm $\mathcal{M}$ with input dataset $y$ and output
$b \in [d]$, define
$$\mathrm{Reg}_{\text{1-Select}_d}(\mathcal{M}(y)) = \frac{1}{n}\left[\sum_{i=1}^n y_i(b) - \min_{x^\star\in[d]}\sum_{i=1}^n y_i(x^\star)\right].$$
Let $\mathcal{A}$ be an $(\varepsilon,\delta)$-DP algorithm $\big(\{0,1\}^d\big)^T \to ([d])^T$ for DP-OPE against oblivious adversaries
with regret $\sum_{t=1}^T l_t(x_t) - \min_{x^\star\in[d]}\sum_{t=1}^T l_t(x^\star) \le \alpha$. Setting $T = n$, we can use $\mathcal{A}$ to construct
an $(\varepsilon,\delta)$-DP algorithm $\mathcal{M}$ for $\text{1-Select}_d$ in the batch model. The details of the algorithm appear in
Algorithm 6.

Algorithm 6 Batch algorithm $\mathcal{M}$ for 1-Select (Jain et al. (2023), Algorithm 2 with $k = 1$)
1: Input: $y = (y_1,\ldots,y_n) \in \mathcal{Y}^n$, where $\mathcal{Y} = \{0,1\}^d$, and black-box access to a DP-OPE algorithm $\mathcal{A}$ for oblivious adversaries.
2: Construct a stream $z \leftarrow y$ with $n$ records.
3: for $t = 1$ to $n$ do
4:   Send the record $z_t$ to $\mathcal{A}$ and get the corresponding output $x_t$.
5: end for
6: Output $b = x_n$.
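A minimal Python sketch of the reduction in Algorithm 6 follows, assuming an online algorithm exposed through hypothetical predict/update methods (the interface is our own; privacy of the output is inherited from the online algorithm by post-processing).

```python
def one_select(y, online_alg):
    """Batch 1-Select via Algorithm 6: stream the n records of y to a
    DP-OPE algorithm and output its final prediction. `online_alg` is
    any object with predict() -> expert index and update(loss_vector)."""
    x = None
    for record in y:             # each record is a loss vector in {0,1}^d
        x = online_alg.predict()
        online_alg.update(record)
    return x                     # b = x_n, the last prediction
```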

Let $\varepsilon > 0$, $\alpha \in \mathbb{R}^+$, and $T, d, n \in \mathbb{N}$, where $T = n$. If a DP-OPE algorithm $\mathcal{A}: \big(\{0,1\}^d\big)^T \to ([d])^T$
for oblivious adversaries is $(\varepsilon,\delta)$-DP and its regret is upper bounded by $\alpha$, i.e., $\sum_{t=1}^T l_t(x_t) -
\min_{x^\star\in[d]}\sum_{t=1}^T l_t(x^\star) \le \alpha$, then by Lemma 10 the batch algorithm $\mathcal{M}$ for $\text{1-Select}_d$ is
$(\varepsilon,\delta)$-DP and $\mathrm{Reg}_{\text{1-Select}_d}(\mathcal{M}) \le \frac{\alpha}{n}$.
If $\delta = o(1/T)$, then $n = \Omega\big(\frac{n\log d}{\varepsilon\alpha}\big)$ (Lemma 11). We thus have $\alpha = \min\big(\Omega\big(\frac{\log d}{\varepsilon}\big), n\big) =
\min\big(\Omega\big(\frac{\log d}{\varepsilon}\big), T\big)$, so $\alpha \ge \Omega\big(\min\big(\frac{\log d}{\varepsilon}, T\big)\big)$. Therefore, if an algorithm for DP-OPE against
oblivious adversaries is $(\varepsilon,\delta)$-differentially private and $\sum_{t=1}^T l_t(x_t) - \min_{x^\star\in[d]}\sum_{t=1}^T l_t(x^\star) \le \alpha$
holds, then $\alpha \ge \Omega\big(\min\big(\frac{\log d}{\varepsilon}, T\big)\big)$. This means that there exists a sequence of loss functions
$l_1,\ldots,l_T$ such that $\sum_{t=1}^T l_t(x_t) - \min_{x^\star\in[d]}\sum_{t=1}^T l_t(x^\star) \ge \Omega\big(\min\big(\frac{\log d}{\varepsilon}, T\big)\big)$.

Lemma 10. Let $\mathcal{M}$ be the batch algorithm for $\text{1-Select}_d$. For all $\varepsilon > 0$, $\delta \ge 0$, $\alpha \in \mathbb{R}^+$,
and $T, d, n \in \mathbb{N}$, where $T = n$: if a DP-OPE algorithm $\mathcal{A}: \big(\{0,1\}^d\big)^T \to ([d])^T$ for
oblivious adversaries is $(\varepsilon,\delta)$-differentially private and its regret is upper bounded by $\alpha$, i.e.,
$\sum_{t=1}^T l_t(x_t) - \min_{x^\star\in[d]}\sum_{t=1}^T l_t(x^\star) \le \alpha$, then the batch algorithm $\mathcal{M}$ for $\text{1-Select}_d$ is $(\varepsilon,\delta)$-differentially
private and $\mathrm{Reg}_{\text{1-Select}_d}(\mathcal{M}) \le \frac{\alpha}{n}$.

Proof. DP guarantee: Fix neighboring datasets $y$ and $y'$ that are inputs to algorithm $\mathcal{M}$. According
to the algorithm design for $\text{1-Select}_d$, we stream $y$ and $y'$ to the DP-OPE algorithm $\mathcal{A}$. Since $\mathcal{A}$ is
$(\varepsilon,\delta)$-DP, and $\mathcal{M}$ only post-processes the outputs received from $\mathcal{A}$, $\mathcal{M}$ is $(\varepsilon,\delta)$-DP.
Regret upper bound: Fix a dataset $y$. Note that if $\alpha \ge n$, the accuracy guarantee for $\mathcal{M}$ is vacuous.
Now assume $\alpha < n$. Let $\gamma$ be the regret of $\mathcal{M}$, that is,
$$\gamma = \mathrm{Reg}_{\text{1-Select}_d}(\mathcal{M}) = \frac{1}{n}\left[\sum_{i=1}^n y_i(b) - \min_{x^\star\in[d]}\sum_{i=1}^n y_i(x^\star)\right],$$
where $b$ is the output of $\mathcal{M}$.
The main observation in the regret analysis is that if $\alpha$ is small, so is $\gamma$. Specifically, the regret of $\mathcal{M}$
is at most $\frac{\alpha}{n}$; therefore, $\gamma \le \frac{\alpha}{n}$.

 1
Lemma 11 (Jain et al. (2023), Lemma 4.2). For all d, n ∈ N, ε ∈ (0, 1], δ = o(1/n), γ ∈ 0, 20 ,
 n
and (ε,δ)-DP 1-Selectd algorithms M: {0, 1}d → [d] with Reg1-Selectd (M) ≤ γ, we have
log d
n=Ω εγ .

E Algorithms and Proofs for Fed-SVT
E.1 Proof of Theorem 3

Theorem 9 (Restatement of Theorem 3). Let $\varepsilon \le 1$ and $\delta \le \varepsilon/d$. For any $(\varepsilon,\delta)$-DP federated
OPE algorithm against oblivious adversaries in the realizable setting, the per-client regret is lower
bounded by $\Omega\big(\frac{\log d}{m\varepsilon}\big)$.

Proof. We first introduce two prototype loss functions: let $l^0(x) = 0$ for all $x \in [d]$, and for $j \in [d]$ let
$l^j$ be the function with $l^j(x) = 0$ for $x = j$ and $l^j(x) = 1$ otherwise. We then define $d$ loss
sequences $S^j = (l^j_{1,1},\ldots,l^j_{m,T})$ such that
$$l^j_{i,t} = \begin{cases} l^0 & \text{if } t = 1,\ldots,T-k,\\ l^j & \text{otherwise;}\end{cases}$$
where $k = \frac{\log d}{2m\varepsilon}$ and $j \in [d]$.
The oblivious adversary picks one of the $d$ sequences $S^1,\ldots,S^d$ uniformly at random.
Assume towards a contradiction that $\mathbb{E}\big[\sum_{i=1}^m\sum_{t=1}^T l_{i,t}(x_{i,t}) - \sum_{i=1}^m\sum_{t=1}^T l_{i,t}(x^\star)\big] \le
\frac{\log d}{32\varepsilon}$. This implies that there exist $d/2$ sequences on which the expected regret satisfies
$\mathbb{E}\big[\sum_{i=1}^m\sum_{t=1}^T l^j_{i,t}(x_{i,t}) - \sum_{i=1}^m\sum_{t=1}^T l^j_{i,t}(x^\star)\big] \le \frac{\log d}{16\varepsilon}$. Assume without loss of generality that these
sequences are $S^1,\ldots,S^{d/2}$. Let $B_j$ be the set of outputs that has low regret on $S^j$, that is,
$$B_j = \left\{(x_{1,1},\ldots,x_{m,T}) \in [d]^{mT} : \sum_{i=1}^m\sum_{t=T-k+1}^{T}l^j(x_{i,t}) \le \frac{\log d}{8\varepsilon}\right\}.$$
Note that $B_j \cap B_{j'} = \emptyset$ for $j \ne j'$, since if $x_{1:m,1:T} \in B_j$ then among the $mk$ outputs $x_{1:m,T-k+1:T}$ at least
$\frac{3mk}{4} = \frac{3\log d}{8\varepsilon}$ of them must be equal to $j$. Now Markov's inequality implies that
$$\mathbb{P}\big(\mathcal{A}(S^j) \in B_j\big) \ge 1/2.$$
Moreover, group privacy gives
$$\mathbb{P}\big(\mathcal{A}(S^j) \in B_{j'}\big) \ge e^{-mk\varepsilon}\,\mathbb{P}\big(\mathcal{A}(S^{j'}) \in B_{j'}\big) - mk\delta \ge \frac{1}{2\sqrt{d}} - \frac{\log d}{2\varepsilon}\delta \ge \frac{1}{4\sqrt{d}},$$
where the last inequality is due to $\delta \le \varepsilon/d$. Overall we get that
$$\frac{d/2-1}{4\sqrt{d}} \le \mathbb{P}\big(\mathcal{A}(S^j) \notin B_j\big) \le \frac{1}{2},$$
which is a contradiction for $d \ge 32$.

E.2 Algorithm Design of Fed-SVT

Fed-SVT contains a client-side subroutine (Algorithm 7) and a server-side subroutine (Algorithm 8).


Algorithm 7 Fed-SVT: Client $i$
1: Input: Number of iterations $T$
2: Initialize: Set current expert $x_0 = \mathrm{Unif}[d]$.
3: for $t = 1$ to $T$ do
4:   if $t == nN$ for some integer $n \ge 1$ then
5:     Communicate to server: $\sum_{t'=t-N}^{t-1}l_{i,t'}(x)$
6:     Receive from server: $x_t$
7:   else
8:     Set $x_t = x_{t-1}$
9:   end if
10:  Each client receives local loss $l_{i,t}: [d] \to [0,1]$ and pays cost $l_{i,t}(x_t)$
11: end for
Algorithm 8 Fed-SVT: Central server
1: Input: Number of iterations $T$, number of clients $m$, optimal loss $L^\star$, switching budget $\kappa$, sampling parameter $\eta > 0$, threshold parameter $L$, failure probability $\rho$, privacy parameter $\varepsilon$
2: Initialize: Set $k = 0$, $\tau = 0$ and $\hat{L} = L + \mathrm{Lap}(4/\varepsilon)$
3: while not reaching the time horizon $T$ do
4:   if $t == nN$ for some integer $n \ge 1$ then
5:     Receive from clients: $\sum_{t'=t-N}^{t-1}l_{i,t'}(x)$
6:     if $k < \kappa$ then
7:       Server defines a new query $q_t = \sum_{i=1}^m\sum_{t'=\tau}^{t-1}l_{i,t'}(x_{t'})$
8:       Let $\gamma_t = \mathrm{Lap}(8/\varepsilon)$
9:       if $q_t + \gamma_t \le \hat{L}$ then
10:        Communicate to clients: $x_t = x_{t-1}$
11:      else
12:        Sample $x_t$ with scores $s_t(x) = \max\big(\sum_{i=1}^m\sum_{t'=1}^{t-1}l_{i,t'}(x),\, mL^\star\big)$ for $x \in [d]$: $\mathbb{P}(x_t = x) \propto e^{-\eta s_t(x)/2}$
13:        Communicate to clients: $x_t$
14:        Set $k = k+1$, $\tau = t$ and $\hat{L} = L + \mathrm{Lap}(4/\varepsilon)$
15:      end if
16:    else
17:      Server broadcasts $x_t = x_{t-1}$ to all the clients
18:    end if
19:  end if
20: end while
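The Python sketch below condenses one server round of Algorithm 8, combining the AboveThreshold test with the exponential-mechanism resampling; the variable names, the stability shift by the minimum score, and the return-value layout are our own illustrative choices rather than the paper's implementation.

```python
import numpy as np

def server_step(q_t, L, L_hat, cum_loss, m, L_star, eta, eps, x_prev, rng):
    """One communication round of Algorithm 8 (sketch). q_t: loss
    accumulated since the last switch; L: threshold from Theorem 10;
    L_hat: its current noisy copy; cum_loss: length-d vector of total
    per-expert losses. Returns (expert, threshold, switched)."""
    if q_t + rng.laplace(scale=8 / eps) <= L_hat:     # AboveThreshold test
        return x_prev, L_hat, False                   # keep current expert
    s = np.maximum(cum_loss, m * L_star)              # clipped scores s_t(x)
    p = np.exp(-eta * (s - s.min()) / 2)              # exponential mechanism;
    p /= p.sum()                                      # shift only rescales
    x_t = int(rng.choice(len(cum_loss), p=p))
    return x_t, L + rng.laplace(scale=4 / eps), True  # resample threshold
```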

E.3 Proof of Theorem 4

Theorem 10 (Restatement of Theorem 4). Let $l_{i,t} \in [0,1]^d$ be chosen by an oblivious adversary under
the near-realizability assumption. Set $0 < \rho < 1/2$, $\kappa = O(\log(d/\rho))$, $L = mL^\star +
\frac{8\log(2T^2/(N^2\rho))}{\varepsilon} + 4/\eta$, and $\eta = \varepsilon/2\kappa$. Then the algorithm is $\varepsilon$-DP, the communication cost
scales as $O(mdT/N)$, and with probability at least $1 - O(\rho)$, the per-client regret is upper bounded by
$$O\left(\frac{\log^2(d) + \log\big(\frac{T^2}{N^2\rho}\big)\log\big(\frac{d}{\rho}\big)}{m\varepsilon} + (N + L^\star)\log\Big(\frac{d}{\rho}\Big)\right).$$
Moreover, setting $\eta = \varepsilon/\sqrt{\kappa\log(1/\delta)}$, the algorithm is $(\varepsilon,\delta)$-DP, the communication cost
scales as $O(mdT/N)$, and with probability at least $1 - O(\rho)$, the per-client regret is upper bounded by
$$O\left(\frac{\log^{3/2}(d)\sqrt{\log\big(\frac{1}{\delta}\big)} + \log\big(\frac{T^2}{N^2\rho}\big)\log\big(\frac{d}{\rho}\big)}{m\varepsilon} + (N + L^\star)\log\Big(\frac{d}{\rho}\Big)\right).$$

Proof. DP guarantee: There are $\kappa$ applications of the exponential mechanism with privacy parameter
$\eta$. Moreover, the sparse vector technique is applied over each sample once; hence the $\kappa$ applications
of sparse-vector are $\varepsilon/2$-DP. Overall, the algorithm is $(\varepsilon/2 + \kappa\eta)$-DP and $\big(\varepsilon/2 + \sqrt{2\kappa\log(1/\delta)}\,\eta +
\kappa\eta(e^\eta - 1),\,\delta\big)$-DP (Lemma 14). Setting $\eta = \varepsilon/2\kappa$ results in $\varepsilon$-DP, and $\eta = \varepsilon/\sqrt{\kappa\log(1/\delta)}$ results
in $(\varepsilon,\delta)$-DP.
Communication cost: The number of communications between the central server and the clients scales
as $O(mT/N)$. Moreover, within each communication, the number of scalars exchanged scales as
$O(d)$. Therefore the communication cost is $O(mdT/N)$.
Regret upper bound: We define a potential at phase $n \in [T/N]$:
$$\phi_n = \sum_{x\in[d]}e^{-\eta L_n(x)/2},$$
where $L_n(x) = \max\big(\sum_{i=1}^m\sum_{t'=1}^{nN-1}l_{i,t'}(x),\, mL^\star\big)$. Note that $\phi_0 = de^{-\eta mL^\star/2}$ and $\phi_n \ge
e^{-\eta mL^\star/2}$ for all $n \in [T/N]$, since there is $x \in [d]$ such that $\sum_{i=1}^m\sum_{t=1}^T l_{i,t}(x) \le mL^\star$. We
split the iterates into $s = \lceil\log d\rceil$ rounds $n_0N, n_1N, \ldots, n_sN$, where $n_p$ is the largest $n \in [T/N]$ such
that $\phi_{n_p} \ge \phi_0/2^p$. Let $Z_p$ be the number of switches in $[n_pN, (n_{p+1}-1)N]$ (the number of times the
exponential mechanism is used to pick $x_t$), and let $Z = \sum_{p=0}^{s-1}Z_p$ be the total number of switches. Note
that $Z \le 3s + \sum_{p=0}^{s-1}\max(Z_p - 3, 0)$, and Lemma 12 implies that $\max(Z_p - 3, 0)$ is upper bounded
by a geometric random variable with success probability $1/3$. Therefore, using concentration of
geometric random variables (Lemma 13), we get that
$$\mathbb{P}(Z \ge 3s + 24\log(1/\rho)) \le \rho.$$
Since $\kappa \ge 3s + 24\log(1/\rho)$, the algorithm does not exhaust the switching budget with probability
$1 - O(\rho)$, so the total number of switches scales as $O(\log(d/\rho))$. Now we analyze the regret. Define
$T_1N, \ldots, T_CN$ as the switching time steps with $T_CN = T$, where $C = O(\log(d/\rho))$. Lemma 16
implies that with probability at least $1 - \rho$,
$$\begin{aligned}
\sum_{t=1}^T\sum_{i=1}^m l_{i,t}(x_{i,t}) &= \sum_{c=1}^C\sum_{t=T_{c-1}N+1}^{T_cN}\sum_{i=1}^m l_{i,t}(x_{i,t})\\
&= \sum_{c=1}^C\left(\sum_{t=T_{c-1}N+1}^{T_cN-N}\sum_{i=1}^m l_{i,t}(x_{i,t}) + \sum_{t=T_cN-N+1}^{T_cN}\sum_{i=1}^m l_{i,t}(x_t)\right)\\
&\le \sum_{c=1}^C\left(L + \frac{8\log(2T^2/(N^2\rho))}{\varepsilon} + mN\right)\\
&= \sum_{c=1}^C\left(mL^\star + \frac{16\log(2T^2/(N^2\rho))}{\varepsilon} + 4/\eta + mN\right)\\
&= O\left(mL^\star\log(d/\rho) + \frac{\log(T^2/(N^2\rho))\log(d/\rho)}{\varepsilon} + \frac{4\log(d/\rho)}{\eta} + mN\log(d/\rho)\right). \quad (8)
\end{aligned}$$
Case 1: Setting $\eta = \varepsilon/2\kappa$ in Equation (8), we have
$$\sum_{i=1}^m\sum_{t=1}^T l_{i,t}(x_{i,t}) \le O\left(mL^\star\log d + \frac{\log^2 d + \log(T^2/(N^2\rho))\log(d/\rho)}{\varepsilon} + mN\log(d/\rho)\right).$$
Case 2: Setting $\eta = \varepsilon/\sqrt{\kappa\log(1/\delta)}$ in Equation (8), we have
$$\sum_{i=1}^m\sum_{t=1}^T l_{i,t}(x_{i,t}) \le O\left(mL^\star\log d + \frac{\log^{3/2}d\,\sqrt{\log(1/\delta)} + \log(T^2/(N^2\rho))\log(d/\rho)}{\varepsilon} + mN\log(d/\rho)\right).$$

Lemma 12. Fix $0 \le p \le s-1$. Then for any $1 \le k \le T/N$, it holds that
$$\mathbb{P}(Z_p = k+3) \le (2/3)^{k+2} < (2/3)^{k-1}(1/3).$$

Proof. Let $n_pN \le nN \le n_{p+1}N$ be a time step at which a switch happens (the exponential mechanism
is used to pick $x_t$). Note that $\phi_{n_{p+1}} \ge \phi_n/2$. We prove that the probability that $x_t$ is
switched between $nN$ and $n_{p+1}N$ is at most $2/3$. To this end, note that if $x_t$ is switched before
$n_{p+1}N$ then $\sum_{i=1}^m\sum_{t'=nN}^{n_{p+1}N-1}l_{i,t'}(x_{t'}) \ge L - \frac{8\log(2T^2/(N^2\rho))}{\varepsilon}$; therefore $L_{n_{p+1}}(x) - L_n(x) \ge
L - \frac{8\log(2T^2/(N^2\rho))}{\varepsilon} \ge 4/\eta$. Thus we have that
$$\begin{aligned}
\mathbb{P}(x_t \text{ is switched before } n_{p+1}N) &\le \sum_{x\in[d]}\mathbb{P}(x_t = x)\,\mathbf{1}\big\{L_{n_{p+1}}(x) - L_n(x) \ge 4/\eta\big\}\\
&= \sum_{x\in[d]}\frac{e^{-\eta L_n(x)/2}}{\phi_n}\cdot\mathbf{1}\big\{L_{n_{p+1}}(x) - L_n(x) \ge 4/\eta\big\}\\
&\le \sum_{x\in[d]}\frac{e^{-\eta L_n(x)/2}}{\phi_n}\cdot\frac{1 - e^{-\eta(L_{n_{p+1}}(x) - L_n(x))/2}}{1 - e^{-2}}\\
&\le \frac{4}{3}\big(1 - \phi_{n_{p+1}}/\phi_n\big)\\
&\le 2/3,
\end{aligned}$$
where the second inequality follows from the fact that $\mathbf{1}\{a \ge b\} \le \frac{1 - e^{-\eta a}}{1 - e^{-\eta b}}$ for $a, b, \eta \ge 0$, and the last
inequality holds since $\phi_{n_{p+1}}/\phi_n \ge 1/2$. This argument shows that after the first switch inside the range
$[n_pN, n_{p+1}N-1]$, each additional switch happens with probability at most $2/3$. So we have
$$\mathbb{P}(Z_p = k+3) \le (2/3)^{k+2} < (2/3)^{k-1}(1/3).$$

Lemma 13 (Asi et al. (2023), Lemma A.2). Let $W_1,\ldots,W_n$ be i.i.d. geometric random variables
with success probability $p$. Let $W = \sum_{i=1}^n W_i$. Then for any $k \ge n$,
$$\mathbb{P}(W > 2k/p) \le \exp(-k/4).$$

F Supplementary Experimental Details

The simulations were conducted on a system with a 2.3 GHz Dual-Core Intel Core i5, Intel Iris Plus
Graphics 640 with 1536 MB, and 16 GB of 2133 MHz LPDDR3 RAM. Approximately 10 minutes
are required to reproduce the experiments.
We present our numerical results with different seeds in Figure 4 and Figure 5.

Figure 4: Comparison between Fed-DP-OPE-Stoch and Limited Updates with different random seeds.

Figure 5: Comparison between Fed-SVT and Sparse-Vector with different random seeds.

G Experiments on MovieLens-1M

We use the MovieLens-1M dataset (Harper and Konstan, 2015) to evaluate the performance of Fed-SVT,
comparing it with the single-player baseline Sparse-Vector (Asi et al., 2023). We first compute the
rating matrix of 6040 users over 18 movie genres (experts), $R = [r_{u,g}] \in \mathbb{R}^{6040\times18}$, and then calculate
$L = [\max(0, r_{u,g^\star} - r_{u,g})] \in \mathbb{R}^{6040\times18}$, where $g^\star = \arg\max_g\frac{1}{6040}\sum_{u=1}^{6040}r_{u,g}$. We generate
the sequence of loss functions $\{l_u\}_{u\in[6040]}$, where $l_u = L_{u,:}$. In our experiments, we set $m = 10$,
$T = 604$, $\varepsilon = 10$, $\delta = 0$ and run 10 trials. In Fed-SVT, we experiment with communication intervals
$N = 1, 30, 50$, where the communication cost scales as $O(mdT/N)$. The per-client cumulative regret
as a function of $T$ is plotted in Figure 6. Our results show that Fed-SVT significantly outperforms
Sparse-Vector with low communication costs (notably in the $N = 50$ case). These results demonstrate
the effectiveness of our algorithm in real-world applications.

Figure 6: Regret performance with MovieLens dataset. Shaded area indicates the standard deviation.
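A sketch of this preprocessing is given below, assuming a hypothetical CSV with (user, genre, rating) columns; the file name and column layout are our own assumptions, not part of the released data format.

```python
import numpy as np
import pandas as pd

# Sketch of the loss construction in Appendix G.
ratings = pd.read_csv("ml1m_user_genre.csv")        # hypothetical path
R = ratings.pivot_table(index="user", columns="genre",
                        values="rating", fill_value=0).to_numpy()
g_star = R.mean(axis=0).argmax()                    # best genre on average
L = np.maximum(0.0, R[:, [g_star]] - R)             # L[u,g] = max(0, r_{u,g*} - r_{u,g})
losses = [L[u] for u in range(L.shape[0])]          # l_u = L_{u,:}
```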


H Background on Differential Privacy


H.1 Advanced Composition

Lemma 14 (Advanced composition, Dwork et al. (2014)). If $A_1,\ldots,A_k$ are randomized algorithms,
each of which is $(\varepsilon,\delta)$-DP, then their composition $(A_1(S),\ldots,A_k(S))$ is $\big(\sqrt{2k\log(1/\delta')}\,\varepsilon + k\varepsilon(e^\varepsilon - 1),\,
\delta' + k\delta\big)$-DP.
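For concreteness, here is a small Python helper (ours, added for illustration) that evaluates the composed parameters of Lemma 14:

```python
import numpy as np

def advanced_composition(eps, delta, k, delta_prime):
    """Composed privacy parameters from Lemma 14 for k mechanisms
    that are each (eps, delta)-DP."""
    eps_total = np.sqrt(2 * k * np.log(1 / delta_prime)) * eps \
                + k * eps * (np.exp(eps) - 1)
    return eps_total, delta_prime + k * delta

print(advanced_composition(0.1, 1e-6, 50, 1e-5))
```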

H.2 Report Noisy Max

The "Report Noisy Max" mechanism can be used to privately identify the counting query among m
queries with the highest value. This mechanism achieves this by adding Laplace noise independently
generated from Lap(∆/ε) to each count and subsequently determining the index corresponding to

34
Figure 6: Regret performance with MovieLens dataset. Shaded area indicates the standard deviation.

the largest noisy count (we ignore the possibility of a tie), where ∆ is the sensitivity of the queries.
Report noisy max gives us the following guarantee.
Lemma 15 (Dwork et al. (2014), Claim 3.9). The Report Noisy Max algorithm is ε-differentially
private.
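A minimal sketch of the mechanism, under our own naming conventions:

```python
import numpy as np

def report_noisy_max(counts, sensitivity, eps, rng):
    """Report Noisy Max (Lemma 15): add Lap(sensitivity/eps) noise to
    each count and return the index of the largest noisy count."""
    noisy = counts + rng.laplace(scale=sensitivity / eps, size=len(counts))
    return int(np.argmax(noisy))
```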

H.3 Sparse Vector Technique

We recall the sparse vector technique here. Given an input $S = (z_1,\ldots,z_n) \in \mathcal{Z}^n$, the algorithm
takes a stream of queries $q_1, q_2, \ldots, q_T$ in an online manner. We assume that each $q_t$ is 1-sensitive,
i.e., $|q_t(S) - q_t(S')| \le 1$ for neighboring datasets $S, S' \in \mathcal{Z}^n$ that differ in a single element. We
have the following guarantee.
Lemma 16 (Dwork et al. (2014), Theorem 3.24). Let $S = (z_1,\ldots,z_n) \in \mathcal{Z}^n$. For a threshold $L$
and $\rho > 0$, there is an $\varepsilon$-DP algorithm (AboveThreshold) that halts at time $k \in [T+1]$ such that,
for $\alpha = \frac{8(\log T + \log(2/\rho))}{\varepsilon}$, with probability at least $1 - \rho$ we have $q_t(S) \le L + \alpha$ for all $t < k$, and
$q_k(S) \ge L - \alpha$ or $k = T + 1$.
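The following sketch implements the textbook AboveThreshold variant with noise scales $\mathrm{Lap}(2/\varepsilon)$ and $\mathrm{Lap}(4/\varepsilon)$ (Dwork et al. (2014)); Algorithm 8 instead uses scales $4/\varepsilon$ and $8/\varepsilon$ to fit its $\varepsilon/2$ budget. Names and conventions are ours.

```python
import numpy as np

def above_threshold(queries, S, L, eps, rng):
    """Generic AboveThreshold (cf. Lemma 16) for a stream of
    1-sensitive queries: halts at the first query whose noisy value
    exceeds the noisy threshold. `queries` is a list of callables."""
    L_hat = L + rng.laplace(scale=2 / eps)
    for t, q in enumerate(queries):
        if q(S) + rng.laplace(scale=4 / eps) >= L_hat:
            return t                     # halting time k
    return len(queries)                  # k = T + 1 (never exceeded)
```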

I Broader Impacts
This work improves online learning and decision-making through collaboration among multiple users
without exposing personal information. It helps balance the benefits of big data with the need to
protect individual privacy, promoting ethical data usage and fostering societal trust in an increasingly
data-driven world.
