
High Dimensional Channel Estimation Using Deep Generative Networks

Eren Balevi, Akash Doshi, Ajil Jalal, Alexandros Dimakis, Jeffrey G. Andrews

arXiv:2006.13494v1 [eess.SP] 24 Jun 2020

Abstract

This paper presents a novel compressed sensing (CS) approach to high dimensional wireless channel

estimation by optimizing the input to a deep generative network. Channel estimation using generative networks

relies on the assumption that the reconstructed channel lies in the range of a generative model. Channel

reconstruction using generative priors outperforms conventional CS techniques and requires fewer pilots. It also

eliminates the need of a priori knowledge of the sparsifying basis, instead using the structure captured by the

deep generative model as a prior. Using this prior, we also perform channel estimation from one-bit quantized

pilot measurements, and propose a novel optimization objective function that attempts to maximize the

correlation between the received signal and the generator’s channel estimate while minimizing the rank of the

channel estimate. Our approach significantly outperforms sparse signal recovery methods such as Orthogonal

Matching Pursuit (OMP) and Approximate Message Passing (AMP) algorithms such as EM-GM-AMP for

narrowband mmWave channel reconstruction, and its execution time is not noticeably affected by the increase

in the number of received pilot symbols.

Index Terms

MIMO channel estimation, Generative Adversarial Networks (GAN), compressed sensing, one-bit receivers

The authors are with the University of Texas at Austin, TX, USA. Contact Author Email: [email protected]. This work was supported in part by Intel. This paper was presented in part at the 21st IEEE Signal Processing Advances in Wireless Communications Workshop, May 2020 in the Special Session for Machine Learning in Communications [1].

I. INTRODUCTION

A. Motivation

To meet the demand for extremely high bit rates and much lower energy consumption per bit, future

wireless systems are trending to bandwidths larger than 1 GHz and carrier frequencies above 100 GHz.

As an example, a future communication system (6G and beyond) may operate at a carrier frequency

approaching 300 GHz with well over 10,000 cross-polarized antenna elements at each transceiver,

and antenna spacings on the order of 1-2 mm [2], [3]. For channel estimation in many antenna

systems, typically the number of pilots is assumed to be larger than the number of transmit antennas,

resulting in significant training overhead which does not scale well to such high dimensional future

communication systems. Using sparsity with a compressed sensing method to alleviate this problem

requires solving a complex optimization problem at every coherence interval, whose complexity

scales with the number of antennas and becomes infeasible. Thus, existing approaches to channel

estimation will not scale to this regime in terms of complexity, power consumption, or pilot overhead,

and fundamentally new methods are needed. The key to simplifying channel estimation in such high

dimensional systems is to exploit stronger prior knowledge of the channel structure. In this paper we

propose a novel unsupervised learning-based approach using deep generative networks for channel

estimation.

B. Related Work

Traditional training-based channel estimators such as least-squares (LS) are optimal Maximum

Likelihood estimators for rich multipath channels. Furthermore, for Gaussian signal recovery with a

known correlation matrix, minimum mean-squared error (MMSE) estimators find the signal estimate x

that maximizes the a posteriori probability p(x|y) and outperform LS [4]. However, recent channel

measurements conducted for mmWave and THz cellular systems have indicated that, due to clustering

of the paths into small, relatively narrowbeam clusters, high dimensional channels are often very

sparse in their beamspace representation [2] or their spatial covariance matrix is low rank [5]. Among

the first papers to highlight the need for exploiting these sparse structures, that LS and MMSE cannot

exploit, was [6], which used channel sparsity in the beamspace representation of a multi-antenna

channel to formulate channel estimation as a CS problem, while [7] also highlighted how to exploit

sparsity in the delay-Doppler domain. MmWave channel estimation is made difficult by the low

received SNR due to high omnidirectional path loss, and to combat this path loss, large antenna arrays

are used to obtain beamforming gain. In [8], a sparse formulation of the mmWave channel estimation

problem was given by expressing the sensing matrix Ψ as a function of the transmit and receive

antenna array response vectors, in addition to the training precoders and combiners. An open loop

strategy for downlink mmWave channel estimation and design of precoders/combiners that minimize

the coherence of Ψ, while incorporating hybrid constraints at the transceiver, was presented in [9],

enabling reconstruction from a small number of measurements. In [8] and [9], Orthogonal Matching

Pursuit (L0 norm minimization) and Basis Pursuit Denoising (L1 norm minimization) were employed

for sparse channel reconstruction. Approximate Message Passing (AMP) is another robust class of

techniques for compressed sensing [10], and variants such as EM-GM-AMP [11] and VAMP [12]

outperform OMP and BPDN for a large class of sensing matrices. AMP has been widely advocated

for MIMO channel estimation in the research community, especially for low resolution receivers

[13], [14]. AMP has even been extended to adaptively learn the clustered structure in the angle-delay

domain in [15].

However, real world channels are never exactly sparse in the DFT-basis, nor do we know the basis

that would yield the most sparse representation. Moreover, all these techniques involve solving a

complex optimization problem at each interval, and require a large number of pilots, especially in low

resolution receivers. These are some of the reasons why CS-based methods are still not employed in

conventional WiFi receivers for channel estimation, which typically employ LS channel estimation

with frequency domain smoothing that leverages the coherence bandwidth of the channel [16].

Meanwhile, there has been a rapid advancement in the application of techniques from deep learning

to channel estimation for massive MIMO and mmWave systems. One of the approaches taken was

to perform Joint Channel Estimation and Signal Detection (JCESD) [17] [18], thus performing

channel estimation implicitly. Not recovering the channel estimate prevents precoder and/or combiner

optimization, and these techniques call for extensive signal processing changes at the transceiver.

One obvious way to recover the channel estimate is to train a Neural Network (NN) in a supervised

manner, so that it takes as input the pilot measurements and outputs the channel matrix.

This approach is taken in many recent papers [19]–[23]. In particular, [19] also appends the LS

channel estimate of the current and previous received pilot signal to the NN’s input to improve its

performance. In [20], a variant of the AMP technique called LDAMP is unfolded into a NN, by

making the parameters of LDAMP learnable. Exploiting the inherent structure in a spatial channel

matrix, making its estimation analogous to image reconstruction, [21] and [22] employ Convolutional

Neural Networks (CNNs) in place of Fully Connected NNs to learn a channel estimator. A novel

refinement called SIP-DNN was proposed in [23], which estimates the channel at all the receive

antennas using only the signal received by the high-resolution ADC antennas.

However, building such labeled channel datasets for a supervised task is time-consuming, and

most of these techniques would not perform well if the received signal was corrupted by hardware

impairments and/or transient effects such as shadow fading. A few papers recently have been using

techniques from unsupervised learning to overcome this limitation of having to build a huge labeled

dataset. In [24] and [25], they combine an LS estimator with an underparameterized CNN-based

denoiser called Deep Decoder [26] to exploit correlation in the channel estimate to improve its quality.

In [27], they train an autoencoder to learn a compressed representation for the channel that could

immensely reduce channel state information (CSI) feedback overhead in massive MIMO, while [28]

trains a CRNet with the same objective in multi-resolution receivers.

C. Contributions

In summary, most of the proposed deep learning techniques are discriminative, meaning that, unlike

generative models, they do not exploit a priori information, and they often call for drastic changes

in transceiver signal processing, while the existing signal processing techniques are designed for

sparse signal recovery. As mentioned before, there is no explicit way to determine the basis that will

generate the sparse channel representation with the fewest non-zero entries, which would allow perfect

recovery for a wider range of sensing matrices with the same number of measurements. This is where

compressed sensing using generative models proves useful. By finding an approximate solution in the

span of a generative model, [29] shows how to achieve CS guarantees without employing sparsity.

The authors of [29] present a simple gradient descent based algorithm that enables signal recovery for

inherently sparse or structured signals from compressed measurements by exploiting the prior learnt

by a generative model. In this paper, we draw inspiration from the approach presented in [29] to

perform the estimation of high dimensional wireless channels from compressive pilot measurements.

Our contributions are elaborated below:

Training a GAN to learn the channel distribution: The underlying probability distribution of

spatial channel matrices for a particular environment can be very complex, and analytically intractable.

We describe how to train a Wasserstein GAN [30] using a set of simulated channel realizations, such

that it learns a generator model that is capable of drawing samples from the underlying channel

distribution.

Full resolution channel estimation: The trained generative model will output channel realizations

for different input vectors. We describe a procedure to find the optimal input vector such that we can

use the prior of the trained generator to find the channel estimate from a low number of noisy pilot

measurements. Moreover, the optimization problem defined by the generative network operates in a

low-dimensional subspace, whose dimensionality is independent of the number of received pilots,

and achieves significant reduction in computational complexity. Simultaneously, our technique also

helps to develop a channel representation that drastically reduces CSI feedback overhead, and to learn a

prior that enables it to significantly outperform conventional CS techniques, without knowledge of

the sparsifying basis.

One-bit quantized channel estimation: We design a custom loss function that aims to find

the channel estimate, in the range of the generator’s output, that has low rank while maximizing

correlation with the received one-bit measurements. We compare its performance with state-of-the-art

CS techniques such as EM-GM-AMP [11], and find that it significantly improves the quality of the

channel estimate, while still requiring only a limited number of pilots. We validate the improvement

in the channel estimate by evaluating the throughput for a hybrid precoded data transmission, where

the RF and baseband precoders were designed using OMP [31].

The paper is organized as follows. The system model is outlined in Section II. The generative channel

estimator is explained in detail in Section III, the NN architecture details, simulation benchmarks and

results are outlined in Sections IV and V respectively, and the conclusions are highlighted in Section

VI.

II. SYSTEM MODEL

A. Training based channel estimation

Consider a point-to-point downlink (DL) MIMO setup, where the base station is equipped with Nt

transmit antennas and the User Equipment (UE) is equipped with Nr receive antennas. For simplicity,

the exposition that follows considers only a single narrowband frequency channel but can easily be

extended to multiple (Nf > 1) subcarriers. We consider hybrid beamformers and combiners, and

present a training-based channel estimation approach.



In the DL channel estimation phase, the BS uses a training beamformer p ∈ CNt ×1 to transmit a

symbol s ∈ C. To simplify analysis, we set s = 1 in all experiments, but retain it in the equations for

ease of understanding. The UE employs Nr RF chains, hence for each beamforming vector p, Nr

measurements are produced at the UE. We assume that the training combiner q_i ∈ C^{N_r × 1}, i ∈ [N_r],

is a 1-sparse vector with 1 at the ith position. As explained in [9], the number of measurements

per time instant at the UE does not depend on the number of RF chains employed at the BS. The

total number of measurements is M = N_r N_p, where N_p is the number of distinct beamforming vectors p

employed by the BS during training. We denote this sequence as P = [p_1 ... p_{N_p}] ∈ C^{N_t × N_p}. It is

assumed that the channel coherence time is greater than Np T , where T is the symbol period, hence

the spatial channel matrix H ∈ CNr ×Nt remains constant over the Np time slots. Hence the received

training signal Y ∈ CNr ×Np at the UE can be written as [8]

Y = HPs + N, (1)

where the elements of N ∈ C^{N_r × N_p} are independent and identically distributed complex Gaussian

random variables with mean 0 and variance σ². To have more compact expressions, the matrices are

defined as vectors by concatenating the columns, yielding

y = HPs + n, (2)

where y, HPs, n ∈ C^{N_r N_p × 1}. Writing HP as I_{N_r} H P and utilizing the identity vec(ABC) = (C^T ⊗ A) vec(B),

we have

y = (P^T ⊗ I_{N_r}) H s + n,   (3)

where T denotes the transpose operator and ⊗ denotes the Kronecker product. Clearly, the system of

equations represented by (3) does not have a unique solution if Np < Nt . In other words, the LS

channel estimate Ĥ given by

Ĥ = arg min_{H ∈ C^{N_r N_t × 1}} ||y − (P^T ⊗ I_{N_r}) H s||_2   (4)

has multiple solutions. Thus, in the low density pilot regime, one cannot directly use LS channel

estimation. If H is inherently sparse or structured in a known basis, this can be exploited by CS

algorithms, and is explained as part of the baselines for comparison in Section IV-E. The above

notation also extends easily to the case where the received signal is quantized, with (3) being rewritten

as

y = Q_n((P^T ⊗ I_{N_r}) H s + n),   (5)

where Q_n denotes the n-bit quantization operator.
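To make the measurement model concrete, the following NumPy sketch constructs (3) and (5) for random QPSK beamformers; the dimensions follow Table I later in the paper, the channel draw is a stand-in for a CDL-D realization, and all variable names are our own rather than the paper's.

import numpy as np

rng = np.random.default_rng(0)
Nt, Nr, Np, s, sigma = 64, 16, 16, 1.0, 0.1

# Random QPSK training beamformers P in C^{Nt x Np}, i.i.d. entries
P = (rng.choice([-1.0, 1.0], (Nt, Np)) + 1j * rng.choice([-1.0, 1.0], (Nt, Np))) / np.sqrt(2)

# Stand-in channel draw; the paper uses CDL-D realizations instead
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
h = H.flatten(order='F')                       # vec(H): columns stacked

A = np.kron(P.T, np.eye(Nr))                   # (P^T ⊗ I_{N_r}), shape (Nr*Np, Nr*Nt)
n = sigma / np.sqrt(2) * (rng.standard_normal(Nr * Np)
                          + 1j * rng.standard_normal(Nr * Np))

y_full = A @ h * s + n                                        # eq. (3)
y_1bit = np.sign(y_full.real) + 1j * np.sign(y_full.imag)     # eq. (5) with n = 1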

B. Hybrid Precoding for Data Transmission

In Section II-A, we presented a training-based channel estimation approach, hence the training

beamformers used were random sequences of QPSK symbols (one can also use a random subset of

the columns of the DFT matrix or the Zadoff-Chu sequences). Now we move from the training stage

to the data transmission phase, where to obtain a higher throughput, one performs optimization of

the precoder matrices FRF and FBB at the BS, in which P = FRF FBB . To achieve this, the channel

estimate recovered at the UE is conveyed to the BS, to maximize the information-theoretic capacity,

while incorporating the hardware and power constraints imposed on the entries of FRF and FBB . As

outlined in [31], we utilize spatially sparse precoding via Orthogonal Matching Pursuit to find the

optimum FRF and FBB and evaluate the throughput.

III. GENERATIVE CHANNEL ESTIMATOR

A. Training a GAN to learn the channel distribution

We use Generative Adversarial Networks (GANs) for training a generative model. Despite the

extensive recent application of deep learning to wireless communications, few communication papers

have employed GANs, owing to their perceived training instability [32]. In [33], the authors proposed

the use of variational GANs to accurately learn the channel distribution. However, they restricted

themselves to additive noise, and did not consider fading or MIMO. In [34], they employ a conditional

GAN that is trained to output the received signal when the transmitted signal and the received pilot

information are appended to the input of the GAN. However, when extending it to fading channels,

they assumed that the real channel response was available as input to the GAN. Moreover, none of

these papers exploit the compressed representation that the generator of a GAN learns for a given

output signal. We now give an overview of the training procedure for GANs in the context of spatial

channel matrix generation.

A GAN [35] consists of two feed-forward neural networks, a generator G(z; θg ) and a discriminator

D(x; θd ) engaging in an iterative two-player minimax game with the value function V (G, D):

min_G max_D V(G, D) = E_{x∼P_r(x)}[h_D(D(x; θ_d))] + E_{z∼P_z(z)}[h_G(D(G(z; θ_g); θ_d))],   (6)

where G(z) represents a mapping from the input noise variable z ∼ Pz (z) to the data space

x ∼ Pr (x), while D(x) represents the probability that x came from the data rather than G. The

exact form of h(.) depends on the choice of loss function. In [35], hD (D(x)) = log D(x) whereas

hG (D(G(z))) = log (1 − D(G(z))). On the other hand, in the Wasserstein GAN proposed in [30],

hD (D(x)) = D(x) and hG (D(G(z))) = −D(G(z)). Given z ∈ Rd and G(z) ∈ Rn , typically

z ∼ N(0, I_d) and d ≪ n. For example, when a GAN is trained on an image dataset, d can be 100,

while n = 64 × 64 × 3 = 12288 (where 64 represents the image height and width in pixels and

3 represents the RGB color triplet). G is said to implicitly learn the distribution Pg (stored in its

weights θg ), which on convergence, should approach Pr .

Since the seminal paper [35], numerous variants of GAN have been published, differing in the

architecture and/or training procedure of G and D or the loss function used for penalizing the output

of D [36], [30]. However, GANs are known to be difficult to train, one of the reasons being that they

are subject to mode collapse. That is, they learn to characterize only a few modes of the distribution

[32]. The objective of training a GAN is that by varying the weights θg and θd of G(z; θg ) and

D(x; θ_d), we want P_g → P_r. In [30], the Wasserstein-1 (EM) distance is shown to be much weaker

than KL (Kullback-Leibler) or JS (Jensen-Shannon) divergences¹, such that simple sequences of

probability distributions converge under EM but not KL or JS. Using the continuous and differentiable

EM distance as the loss function for the output of D during training, together with weight clipping, eliminates

the need for a careful balance in the training of D and G and for careful NN architecture design. It also drastically reduces

mode collapse since we can train D to optimality. Hence in this paper, we employ the Wasserstein

GAN [30] for learning the spatial channel distribution. An outline of the procedure for training a

Wasserstein GAN in the context of spatial channel matrix generation is given in Alg. 1 (adapted from

[30]) and depicted in Fig. 1.

Algorithm 1: Minibatch stochastic gradient descent training of Wasserstein GANs for spatial
channel matrix generation with n_d = 5 and c = 0.01

D should output 1 for a true channel realization x ∼ P_r(x) and 0 for a generated fake
channel realization G(z) ∼ P_g when z is sampled from P_z.
for number of training iterations do
    for n_d iterations do
        • Sample a minibatch of m noise samples {z_1, ..., z_m} ∼ P_z. Update D by
          ascending its stochastic gradient
              ∇_{θ_d} (1/m) Σ_{i=1}^{m} −D(G(z_i))
        • Sample a minibatch of m channel realizations {x_1, ..., x_m} ∼ P_r. Update D by
          ascending its stochastic gradient
              ∇_{θ_d} (1/m) Σ_{i=1}^{m} D(x_i)
        • θ_d = clip(θ_d, −c, c)
    end
    Sample a minibatch of m noise samples {z_1, ..., z_m} ∼ P_z. Update G by
    descending its stochastic gradient
              ∇_{θ_g} (1/m) Σ_{i=1}^{m} −D(G(z_i))
end

¹A set of probability distributions P_n is said to converge to P_∞ under a distance metric ρ if ρ(P_n, P_∞) → 0 as n → ∞. By "weaker", we mean that the set of convergent sequences under EM is a superset of the sequences convergent under KL or JS. The original paper [30] refers to the discriminator as a critic and uses n_critic = 5, which we refer to as n_d.

Fig. 1. Training a GAN for spatial channel matrices
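For readers who prefer code, the following PyTorch sketch condenses Algorithm 1. It is a minimal rendition under our own assumptions: G and D are already-built generator and critic networks, and loader yields minibatches of normalized channel realizations.

import torch

def train_wgan(G, D, loader, d=35, n_d=5, c=0.01, lr=5e-5, iters=3000):
    opt_g = torch.optim.RMSprop(G.parameters(), lr=lr)
    opt_d = torch.optim.RMSprop(D.parameters(), lr=lr)
    batches = iter(loader)
    for _ in range(iters):
        for _ in range(n_d):                         # n_d critic updates per G update
            x = next(batches, None)
            if x is None:                            # restart the epoch
                batches = iter(loader)
                x = next(batches)
            z = torch.randn(x.shape[0], d)           # z ~ P_z = N(0, I_d)
            # Critic ascends E[D(x)] - E[D(G(z))]; equivalently we descend the negative
            loss_d = D(G(z).detach()).mean() - D(x).mean()
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()
            for p in D.parameters():                 # weight clipping to [-c, c]
                p.data.clamp_(-c, c)
        z = torch.randn(x.shape[0], d)
        loss_g = -D(G(z)).mean()                     # G descends -E[D(G(z))]
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()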

B. CS-based channel estimation using generative networks

Consider a noisy compressive measurement y of an image x∗ such that y = Ax∗ + n. A simple

gradient descent based approach for compressed sensing using generative networks was proposed

in [29] to find the low dimensional representation z ∗ of the given input image x∗ such that the

reconstructed image G(z^*) has small measurement error ||y − AG(z)||_2^2. While this is a non-convex

objective to optimize (since G(z) is a non-convex function of z), gradient descent was found

empirically to work well. To reconstruct the image, [29] solves the following optimization problem:

z^* = arg min_z f(y, AG(z)),   (7)

where y is the vector of received samples, G is a generative model, A is a measurement matrix, and

f is a loss function. For example, we could have f(y, AG(z)) = ||y − AG(z)||_2^2. Here, we minimize

the loss function over the input variable to the generator z. The reconstructed image is then G(z ∗ ).

As long as gradient descent finds a good approximate solution to (7), [29] gives a theoretical proof

to show that G(z ∗ ) will be almost as close to the true x∗ as the closest possible point in the range

of G, when the entries of the sensing matrix A are sub-Gaussian².


²A random variable X ∈ R is said to be sub-Gaussian with variance proxy σ² if E[X] = 0 and its moment generating function satisfies E[exp(sX)] ≤ exp(σ²s²/2) for all s ∈ R.

To adapt the framework presented in [29] for channel estimation, we first train a Wasserstein GAN

[30] using a set of realistic channel realizations H (details of channel parameters presented in Section

IV) as defined in (1). We then extract the trained generator G. The trained generator, having implicitly

learned the underlying probability distribution of the channel matrices, will output channel realizations

G(z) for a given L2 bounded input vector z. In the testing phase, we will be given the noisy pilot

measurements y as defined in (3). We consider two possible cases: when the measurements are full

resolution and when they are one-bit quantized. For each case, we have heuristically developed loss

functions, that define the optimization problem to be solved at every coherence interval using gradient

descent. An illustration of the framework is shown in Fig 2 and the approach is summarized in Alg.

2. We refer to this framework as the Generative Channel Estimator (GCE).

Fig. 2. Generative Channel Estimator Framework

Full Resolution Channel Estimation: Replacing the sensing matrix A by PT ⊗ INr as derived in

(3), and imposing an L2 bound on z via regularization, we attempt to solve the following non-convex

optimization problem:

z^* = arg min_{z ∈ R^d} ||y − (P^T ⊗ I_{N_r}) G(z) s||_2^2 + λ_reg ||z||_2^2,   (8)

where d is the dimension of the input vector to the GAN and λreg serves as a regularization parameter.

The reconstructed channel estimate is then simply G(z ∗ ). Note that the entries in the training

precoder P were chosen i.i.d. from QPSK symbols. As a consequence, all the entries of the matrix

A = P^T ⊗ I_{N_r} are bounded (being either 0 or QPSK symbols) with mean 0, and from Hoeffding's

Lemma³ applied separately to the real and imaginary parts, it follows that each entry of A will be

sub-Gaussian.
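A minimal PyTorch sketch of this full-resolution estimator, using the Adam settings reported later in Section IV-D, is given below. A_ri and y_ri denote real-valued stackings of (P^T ⊗ I_{N_r}) and y that we introduce to avoid complex arithmetic, and the de-normalization step (14) is omitted for brevity.

import torch

def gce_full_resolution(G, A_ri, y_ri, d=35, lam_reg=1e-3, lr=1e-2, iters=100):
    z = torch.randn(d, requires_grad=True)       # initial point z_0 ~ P_z
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(iters):
        h_ri = G(z).flatten()                    # real-valued vec of G(z)
        loss = ((y_ri - A_ri @ h_ri) ** 2).sum() + lam_reg * (z ** 2).sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return G(z).detach()                         # channel estimate G(z*)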

Quantized Channel Estimation: We now consider the case where the received signal is 1-bit

quantized. As a result, MIMO channel estimation even in the noiseless setting with sufficient pilot

symbols is an under-determined problem. In [37], they exploit the low-rank nature of mmWave

channels (due to clustering in the propagation environment) to constrain the space of channel estimates

to matrices H with low nuclear norm ||H||∗ (a relaxation of the low-rank constraint). In [38], the

authors solve the same optimization problem as (7) with the measurements y being one-bit quantized,

and under certain assumptions on the measurement matrix (A in (7)) and the architecture of the GAN,

design a custom loss function to solve for z ∗ . We draw inspiration from the approach taken in [37]

and [38] to design the following non-convex optimization problem for recovery in one-bit setting:
z^* = arg min_{z ∈ R^d} − λ_reg Σ_{i=1}^{N_p N_r} Q_1(y_i) ⟨(P^T ⊗ I_{N_r})_i, G(z)⟩ s + ||G(z)||_*.   (9)

This heuristically designed loss function attempts to minimize the nuclear norm ||G(z)||∗ of the

output of the generator G(z) while maximizing the correlation between Q1 (y) (which is ±1) and

⟨(P^T ⊗ I_{N_r})_i, G(z)⟩. The summation in (9) should be interpreted as the sum over the real and imaginary

parts, separately,
Np Nr Np Nr
X X
T
Q1 (yi,real )h(P ⊗ INr )i,real , G(z)real is + Q1 (yi,imag )h(PT ⊗ INr )i,imag , G(z)imag is (10)
i=1 i=1

C. Beamforming using the Generative Channel Estimator

Having recovered the channel estimate G(z ∗ ) from the compressed pilot measurements at the UE,

we now use this channel estimate to design the optimum RF and baseband precoder FRF and FBB .

The optimal latent input vector z ∗ of the generator G provides a compressed representation of the
³Hoeffding's Lemma states that for any random variable X with E[X] = 0 such that a ≤ X ≤ b w.p. 1, for all s ∈ R, E[exp(sX)] ≤ exp(s²(b − a)²/8). Hence X is sub-Gaussian with variance proxy (b − a)²/4.

Algorithm 2: Channel Estimation using Deep Generative Networks

1. Train a GAN using a set of realistic channel realizations.
2. Extract the trained generator G(z).
3. Given the noisy pilot measurements y, reconstruct the channel y encodes by solving
   the following optimization problem using gradient descent:
   • For full resolution pilot measurements:
        z^* = arg min_{z ∈ R^d} ||y − (P^T ⊗ I_{N_r}) G(z) s||_2^2 + λ_reg ||z||_2^2
   • For quantized pilot measurements:
        z^* = arg min_{z ∈ R^d} − λ_reg Σ_{i=1}^{N_p N_r} Q_1(y_i) ⟨(P^T ⊗ I_{N_r})_i, G(z)⟩ s + ||G(z)||_*
   The initial point z_0 for gradient descent is sampled from P_z.
4. The reconstructed channel estimate is then G(z^*), which is of dimensions N_r × N_t.

channel. If we could convey the weights and architecture of the generator from the UE to the BS

during the initial access phase, then in subsequent data transmissions, the CSI overhead would be

considerably reduced. At every coherence time, we would simply feedback z ∗ to the BS and use

G(z ∗ ) as the channel estimate to design the precoder matrices FRF and FBB .

IV. SIMULATION DETAILS & BENCHMARKS

The performance metric is normalized mean square error (NMSE), defined as

NMSE = E[ ||H − Ĥ||_2^2 / ||H||_2^2 ],   (11)
where H and Ĥ are column vectors that specify the actual and estimated channel taps in the frequency

domain over all antennas, respectively.

A. Data Generation

Channel realizations have been generated using the 5G Toolbox in MATLAB in accordance with

the 3GPP specifications TR 38.901⁴. The channel simulation parameters are listed in Table I. In

order to generate structure in the channel realizations, some degree of correlation is required between

neighbouring antennas at the BS and the UE. To generate this correlation, the antenna element spacing
⁴https://www.etsi.org/deliver/etsi_tr/138900_138999/138901/14.00.00_60/tr_138901v140000p.pdf

in the uniform linear arrays (ULA) at the BS and UE were assumed to be λ/10. This reduced antenna

spacing is a crucial assumption, and we will justify its requirement in Section V-D. Each channel

realization generated in MATLAB was of dimension (Nf , 12, Nr , Nt ), the first and second dimension

being the number of subcarriers and number of OFDM symbols respectively. To focus on exploitation

of the spatial structure of the channel matrices, we simply extract the (Nr , Nt ) matrix corresponding

to the first subcarrier and first OFDM symbol for the purpose of these simulations.

Delay Profile: CDL-D
Subcarrier Spacing: 15 kHz
Nt: 64
Nr: 16
Antenna Array Type: ULA
Antenna Spacing: λ/10
Sampling Rate: 15.36 MHz
Carrier Frequency: 40 GHz
Delay Spread: 30 ns
Doppler Shift: 5 Hz
Nf: 14

TABLE I: Simulation Parameters

B. Data Pre-processing

Note that G(z) has dimensions (Nt , Nr , 2), where the last entry corresponds to the real and

imaginary part. Thus, in the training dataset, H has to be split up into its real and imaginary part and

concatenated to obtain HG ∈ RNr ×Nt ×2 , while G(z) has to be reshaped as a complex-valued matrix

before being utilized for optimization in (8) or (9). Before using the data for training the GAN, we

normalize the channel matrices HG ∈ RNr ×Nt ×2 element-wise as

µ_i = E[H_{G,i}],   σ_i² = E[(H_{G,i} − µ_i)²]   (12)

H_{G,i,norm} = (H_{G,i} − µ_i) / σ_i,   (13)

where i ∈ [2Nt Nr ] and subscript i is used to denote the ith element in the array. While testing, we

do not have access to the element-wise mean and variance, hence we continue to use the training

mean and variance. This implies that G(z) in (8) is replaced by

G(z)_i ← µ_i + σ_i G(z)_i.   (14)

We performed a simulation to ascertain the impact of this artifact, and found it was negligible. The

need for normalization arises from empirical evidence that the GAN is unable to learn mean-shifted

distributions [32].
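The normalization pipeline can be sketched in a few lines of NumPy; the random placeholder dataset below merely stands in for the 3654 CDL-D realizations described in Section IV-A.

import numpy as np

H_train = np.random.randn(3654, 16, 64, 2)    # placeholder for the real-valued (Nr, Nt, 2) dataset
mu = H_train.mean(axis=0)                     # element-wise mean, eq. (12)
sigma = H_train.std(axis=0)                   # element-wise standard deviation, eq. (12)
H_norm = (H_train - mu) / sigma               # eq. (13): this is what the GAN is trained on

def denormalize(G_z):
    # Eq. (14): training statistics are reused at test time inside (8) and (9)
    return mu + sigma * G_z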

C. NN Architecture of Generator

The GAN was implemented in Keras⁵ and PyTorch⁶, with the basic implementation given online⁷.

The generator and discriminator employed in the Wasserstein GAN were Deep Convolutional NNs.

While the discriminator architecture was adopted from [30], the generator was fine-tuned to improve

its ability to learn the underlying probability distribution and its architecture is described next.

The generator G takes an input z ∈ R^d, passes it through a dense layer with output size 128 N_t N_r / 16,

and reshapes it to an output size of (Nt /4, Nr /4, 128). This latent representation is then passed through

k = 2 layers, each consisting of the following units: upsampling, 2D Convolution with a kernel

size of 4 and Batch Normalization. At each stage, 2 × 2 upsampling is performed, i.e. the input is

reshaped from (Nt /n, Nr /n, 128) to (2Nt /n, 2Nr /n, 128) by replicating the corresponding values.

The performance of the generator is sensitive to this choice of sampling factor, with oversampling

of 4 and above preventing the generator from learning the channel distribution. Similarly, a kernel

size of 4 corresponds to using a 4 × 4 filter in the first two dimensions to replace each value by

a weighted average of the neighboring values that are within a 4 × 4 square surrounding it. Both

upsampling and 2D convolution thus model the local correlations in a spatial channel matrix, with

larger upsampling and size of kernel filter corresponding to a greater estimated spatial correlation. It

is finally passed through a 2D Convolutional layer with a kernel size of 4 and linear activation to

obtain G(z), the Nr × Nt channel estimate.
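A Keras sketch of this generator is given below for d = 35, Nt = 64, Nr = 16, and k = 2. The 'same' padding and the ReLU nonlinearity inside the blocks are our assumptions, since the description above only fixes the final linear activation.

from tensorflow.keras import layers, models

def build_generator(d=35, Nt=64, Nr=16):
    m = models.Sequential()
    m.add(layers.Dense(128 * (Nt // 4) * (Nr // 4), input_dim=d))  # output size 128*Nt*Nr/16
    m.add(layers.Reshape((Nt // 4, Nr // 4, 128)))
    for _ in range(2):                                  # k = 2 upsampling/convolution blocks
        m.add(layers.UpSampling2D(size=(2, 2)))         # 2x2 upsampling by value replication
        m.add(layers.Conv2D(128, kernel_size=4, padding='same'))
        m.add(layers.BatchNormalization())
        m.add(layers.ReLU())                            # assumed block nonlinearity
    m.add(layers.Conv2D(2, kernel_size=4, padding='same', activation='linear'))
    return m                                            # output shape (Nt, Nr, 2)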


⁵https://github.com/fchollet/keras  ⁶https://github.com/pytorch/pytorch  ⁷https://github.com/eriklindernoren/Keras-GAN

D. GAN Training Details & GCE

The training and test parameters for the Wasserstein GAN are specified in Table II. The generator

thus obtained is utilized in the GCE, to find the optimal z ∗ for each channel realization in the test

dataset. To minimize the loss function in (8) or (9), as the case may be, we use two approaches.

A derivative-free optimization procedure known as Powell’s conjugate direction method [39], with

a relative error tolerance of ε = 10⁻⁵ was employed in minimizing (8) and (9) for the generative

model trained in Keras, since a trained Keras model does not provide for differentiation of the loss

function in (8) with respect to the input vector z. However, as explained in [40], PyTorch allows

automatic differentiation, and hence an Adam [41] optimizer with a learning rate of η = 10⁻² and

iteration count of 100 is utilized in minimizing (8), for the generative model trained in PyTorch.

Training dataset size: 3654
Test dataset size: 12
Optimizer: RMSProp⁸
Learning Rate: 0.00005
Batch size: 200
Epochs: 3000
λ_reg: 0.001

TABLE II: GAN Training Parameters

⁸https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

E. Compressed Sensing Based DL Channel Estimation

In this subsection, we describe the baselines used for assessing the performance of GCE. Since

we consider the narrow-band clustered channel model, we can use the virtual channel model [42] to

obtain a sparse representation of the channel matrix in the DFT basis. More specifically, assuming

uniformly spaced linear arrays at the transmitter and receiver, the array response matrices are given by

the unitary DFT matrices A_T ∈ C^{N_t × N_t} and A_R ∈ C^{N_r × N_r}. Then we can represent H in terms of a

K-sparse matrix H_v ∈ C^{N_r × N_t}:

H = A_R H_v A_T^H,    H = ((A_T^H)^T ⊗ A_R) H_v.   (15)

Therefore, the received signal at the UE y is given by

y = ((A_T^H P)^T ⊗ A_R) H_v s + n.   (16)

Denoting A_sp = ((A_T^H P)^T ⊗ A_R), as explained in [9], the reconstruction of the channel can be

formulated as a non-convex combinatorial problem

minimize_{H_v ∈ C^{N_r N_t}} ||H_v||_0   subject to   ||y − A_sp H_v s||_2 ≤ σ.   (17)
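The beamspace sensing matrix can be assembled directly; a short NumPy sketch follows, where the unitary DFT matrices stand in for the ULA array response matrices and the QPSK precoder is drawn as in Section II.

import numpy as np
from scipy.linalg import dft

rng = np.random.default_rng(0)
Nt, Nr, Np = 64, 16, 16

A_T = dft(Nt, scale='sqrtn')                  # unitary DFT matrix, C^{Nt x Nt}
A_R = dft(Nr, scale='sqrtn')                  # unitary DFT matrix, C^{Nr x Nr}

P = np.exp(1j * np.pi / 2 * rng.integers(0, 4, (Nt, Np)))   # random QPSK precoder

A_sp = np.kron((A_T.conj().T @ P).T, A_R)     # ((A_T^H P)^T kron A_R), as in (16)
# The received pilots then read y = A_sp @ h_v * s + n, with h_v the sparse beamspace channel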

A variety of Matching Pursuit (MP) and Approximate Message Passing (AMP) algorithms have been

proposed to solve (17). In particular, we consider three approaches:

i) Orthogonal Matching Pursuit (OMP): We directly solve (17) using OMP, as described in [9].

The stopping criterion for OMP is based on the power of the residual error. We stop when the energy

in the residual is smaller than a given threshold⁹, which is chosen to be σ².

ii) Lasso Baseline: Consider the L1 convex relaxation of (17), and use Basis Pursuit Denoising to

solve this problem. In its Lagrangian form, it can be written as:

minimize_{H_v ∈ C^{N_r N_t}} ||H_v||_1 + λ_sp ||y − A_sp H_v s||_2.   (18)

However, all the norms and matrices involved are complex-valued. Hence, an L1 norm minimization

problem gets converted into a second order conic programming (SOCP) problem [43], and can be

solved by standard convex solvers such as CVXPY [44].
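As a hedged illustration, (18) can be posed with CVXPY's complex-variable support in a few lines; CVXPY reduces the problem to an SOCP internally, and lam_sp is a tuning parameter of our choosing.

import cvxpy as cp

def lasso_baseline(y, A_sp, lam_sp=0.1, s=1.0):
    h_v = cp.Variable(A_sp.shape[1], complex=True)        # vectorized beamspace channel
    obj = cp.Minimize(cp.norm(h_v, 1) + lam_sp * cp.norm(y - A_sp @ h_v * s, 2))
    cp.Problem(obj).solve()
    return h_v.value                                      # Lasso estimate of H_v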

iii) EM-GM-AMP: Approximate Message Passing algorithms such as EM-GM-AMP [11] are well-

established Bayesian techniques for sparse signal recovery from noisy compressive linear measurements

that are known to hold for a large class of sensing matrices. Using the EM-GM-AMP implementation

described in [11], we input y and Asp and recover the channel estimate Hv , which is then used to
⁹The maximum number of iterations is set to 100. If OMP is allowed to run further, it fits to the noise at low SNR and the NMSE increases.

recover H using the array response matrices AT and AR . It is to be noted that assuming an antenna

spacing of λ/10, with the columns of AT as well as AR being independent, leads to the entries of Hv

being correlated. This correlation is however not exploited by EM-GM-AMP. Improved benchmarking

comparisons with algorithms such as EMturboGAMP [45] that attempt to exploit structured sparsity

in non-i.i.d. signals are left for future work.

These are the three sparse signal recovery baselines, each requiring knowledge of the sparsifying

basis, that we use to assess the performance of the proposed GCE. It should be noted that the beamspace

sparsity that is exploited by the CS algorithms is in no way utilized by GCE.

V. RESULTS

The first experiment performed is to determine the optimal latent dimension d of the input z to the

generator. Ideally, CS techniques would determine d in the absence of noise, hence we fix the SNR

at a high value of 40 dB, and evaluate the NMSE as a function of the number of pilot measurements

Np for varying values of d as shown in Fig 3 using full resolution measurements.

[Plot omitted: curves CS-GAN d = 25, d = 35, d = 45; axes NMSE (dB) vs. α = Np/Nt.]

Fig. 3. NMSE vs. α = Np /Nt for varying dimension d of the input z to the generator G

From Fig 3, we can see that d = 35 appears sufficient with Np /Nt = 0.4. Increasing the number

of pilot measurements Np /Nt beyond 0.4 does not have any measurable impact on the NMSE. This

indicates that any more measurements would not improve the accuracy of the channel prediction. More

importantly, it highlights that there exists a compressed representation for the channel in an unknown

basis, but using the optimal latent input vector z ∗ defined in (8), we can recover the channel prediction

perfectly without knowing, for example, that the channel is sparse in the DFT basis. We obtain a

nearly 50x compressed representation of the channel, with under 40 parameters needed to represent a

16 × 64 channel matrix realization (= 2048 real values). While current mmWave channel estimation

techniques focus on the optimal design of training precoders and combiners under the assumption of

either virtual channel models [42] or UIU models [46], among others, the GCE minimizes the need

for their optimal design and provides a model-free approach for representing inherently sparse or

structured channels. This may prove valuable for future deployments at progressively higher carrier

frequencies, where these models may not hold. With d = 35, we now vary the SNR, and observe the

NMSE vs. SNR for varying α = Np /Nt in the case of full-resolution and one-bit quantized pilot

measurements. The OMP, Lasso and EM-GM-AMP baselines are also plotted.

A. Full Resolution Channel Estimation

As shown in Fig 4, the GCE offers large improvement in NMSE, of at least 5 dB at an SNR of

-10 dB and up to 8 dB at an SNR of 15 dB for α = 0.2 over the EM-GM-AMP baseline. The GCE’s

performance also does not change significantly as α increases from 0.4 to 0.75, indicating that the

prior learnt by the generator G is informative enough to require only 40% of the total number of

pilots that would have been needed by a well-posed channel estimation problem to reconstruct the

channel. Moreover, the improvement in NMSE offered by the GCE decreases as α increases from

0.2 to 1, with the gap between EM-GM-AMP and GCE being reduced to 2 dB at an SNR of 15 dB

and α = 1. However, at low and medium SNR, the GCE continues to outperform all CS-based

methods significantly.

[Plots omitted: four panels of NMSE (dB) vs. SNR (dB), one per α ∈ {0.2, 0.4, 0.75, 1}, each comparing GCE, Lasso, OMP, and EM-GM-AMP.]

Fig. 4. NMSE vs. SNR for various values of α = Np /Nt . The α values are [0.2, 0.4, 0.75, 1]. The Lasso curve is omitted for α = 1
since CVXPY [44] takes too long to converge due to the large number of optimization variables.

B. One-bit Quantized Channel Estimation

The NMSE for the case of 1-bit quantized pilot measurements is defined slightly differently, since

in one-bit measurements, we cannot determine the relative scaling factor for the reconstructed channel

matrices:

NMSE = E[ ||H − κĤ||_2^2 / ||H||_2^2 ],   (19)

where κ = arg min_κ ||H − κĤ||_2^2 for a given H and Ĥ. Note that though this may seem genie-aided,

precoder optimization that finally determines the achievable rate is not affected by this scaling factor.
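Since the minimizing κ is a least-squares scale fit, it has the closed form ⟨Ĥ, H⟩ / ||Ĥ||²; a short NumPy sketch of the scale-invariant NMSE follows.

import numpy as np

def nmse_one_bit(H, H_hat):
    h, h_hat = H.flatten(), H_hat.flatten()
    kappa = np.vdot(h_hat, h) / np.vdot(h_hat, h_hat)   # argmin_k ||h - k h_hat||_2^2
    return np.linalg.norm(h - kappa * h_hat) ** 2 / np.linalg.norm(h) ** 2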

The dependence of NMSE on SNR for one-bit measurements with varying number of pilots is shown

in Fig. 5, and contrasted with the performance of EM-GM-AMP on the same measurements. As one

can clearly see, the GCE brings about an immense improvement in NMSE, and this can be attributed

to the rich prior stored in the weights of the generator.

Fig. 5. NMSE v/s SNR as a function of α = Np /Nt with one-bit quantization.

C. Hybrid Precoding for Quantized Channel Estimation

To validate the improvement in channel estimate quality postulated in Section V-B, we calculate

the spectral efficiency obtained using hybrid precoding in the data transmission phase. We assume

Ns = min(Nt , Nr ) = 16 and optimal unconstrained combiners are employed at the UE. The RF

and baseband precoders FRF and FBB are computed as explained in Section II using OMP. Three

different channel estimates are used for designing these precoders: the estimate returned by the GCE,

the AMP algorithm EM-GM-AMP [11] and the ground truth channel realization (for computing the

perfect CSI curve). The spectral efficiency vs. SNR plots are shown in Fig. 6 for varying α. As is

evident, the GCE channel estimate enables design of precoders that support higher capacity data

transmissions than EM-GM-AMP.



[Plot omitted: Spectral Efficiency (b/s/Hz) vs. SNR (dB) for Perfect CSI, GCE, and EM-GM-AMP at α ∈ {0.2, 0.4, 0.75}.]

Fig. 6. Spectral Efficiency v/s SNR as a function of α = Np /Nt with one-bit quantization using OMP-based precoding

D. Explanations & Caveats

The benefit obtained from the GCE is clear in the low pilot density and low SNR regime. As the

number of pilot symbols increases, the performance of standard CS-based methods gets closer to the

GCE, and would be similar to that of the GCE for Np ≥ Nt . At low SNR, the pilot measurements

received are of very poor quality, hence CS-based methods do not perform well, but the GCE utilizes

its prior to obtain performance that cannot be achieved by the CS-based methods. This is clearly

evident in the one-bit quantized case (Fig. 5), where the GCE curves are roughly parallel to the

EM-GM-AMP curves with the constant gap being the generative prior gain. It can be expected that

as the number of antennas packed onto a planar array increases with the move toward THz carrier

frequencies, sending an adequate number of pilots would lead to an unsustainable overhead, and

recovering the channel estimate from an insufficient number of pilots will become critical. While the

GCE outperforms the three CS-based methods, it is important to note the following caveats:

High Spatial Correlation: GCE required a reduced antenna spacing of λ/10, rather than λ/2, to

successfully learn the channel distribution. As shown in Fig. 7(a), the singular value profile of a

λ/2 channel realization has a higher effective rank than a λ/10 realization, due to its lower spatial

correlation. As a consequence, the generator of a GAN trained on λ/2 channel realizations was unable

to learn the underlying probability distribution and the resulting performance of GCE was poor as

shown on the right in Fig. 7(b) for α = 0.75. Since a GAN was originally designed to learn the prior

for image datasets, which have extremely high spatial correlation, the GCE was also found to work

in a similar domain. However, as thousands of antennas get deployed at a transmitter or receiver

due to their tiny size, it is expected that such singular value profiles will become more commonly

observed and only the maximum eigenvector will be needed to achieve capacity in this regime.

A recent paper [47] shows how metamaterial antennas can be used for wireless communications,

including LTE and WiFi. Conventional antennas that are very small compared to the wavelength

reflect most of the signal back to the source. However, a metamaterial antenna steps up the antenna's

radiated power and behaves as if it were much larger than its actual size, because its novel structure

stores and re-radiates energy, which could lead to the deployment of sub-wavelength antennas.

[Plots omitted: (a) magnitude of singular values vs. singular value index for ULA spacing λ/2 and λ/10; (b) NMSE (dB) vs. SNR (dB) for the GAN with α = 0.75 at the two spacings.]

(a) Singular values of channel realizations in descending order of magnitude. (b) NMSE v/s SNR for the two datasets of channel realizations with antenna spacing λ/2 and λ/10.

Fig. 7. The left figure shows the singular value profile of a channel realization with an antenna spacing of λ/10 and λ/2. The higher
correlation in the λ/10 realization enables the generator to learn a rich prior and the GCE to obtain a significantly lower NMSE as
shown in the figure on the right.

Rich Generative Prior: The weights θg of the generator G(z; θg ) encode a probability distribution

over the space of permissible spatial channel matrices, such that by inputting z, we can draw samples

from that distribution. Conventional CS techniques have no such prior knowledge of the distribution

of the channel matrices, however they capitalize on the sparsity of the beamspace representation of

the channel, which the GCE does not utilize. The results seem to indicate that the generative prior

is much more informative than the sparsifying basis, but we have no means of quantifying this yet.

Recent efforts in theoretical machine learning [48] have attempted to quantify the information in the

weights of a NN in terms of the impact that perturbing a weight has on the cross-entropy loss. Such

work could prove very useful in quantifying the information gain of a generative prior.

Training on Simulated Channel Realizations: We have currently trained a GAN using simulated

channel realizations, since obtaining realistic channel data has not proven possible, even with our

industry partners. One can only hope to recover the channel estimate based on pilot measurements

from current transceiver chips. It remains to be seen if the GAN can succeed in learning the channel

distribution even from these noisy channel estimates. The original GAN proposed in [35] is known to

learn discriminators with poor generalization capabilities, and many recent works [30], [32], [49] have

taken different approaches to justifying design of custom objective functions for the discriminator that

would help the generator to better approximate the target distribution, and improve the generalization

capability of the discriminator.

E. Timing Analysis

Using the PyTorch based generative model, optimization of (8) involves only performing gradient

descent with respect to z ∈ Rd , with d = 35 in our case. Hence one would expect each iteration to be

computationally inexpensive. To determine the computational advantage of using GCE, we perform a

comparison of its execution time per iteration against the CS baselines, and the results are

tabulated in Table III. The number of iterations required to achieve the NMSE results in Fig. 4 for

each method are also given in Table III. The evaluation of the first three methods was performed

on an Intel i9-8950HK CPU @ 2.90GHz. The results for GCE are given both when performed on

the Intel i9-8950HK CPU without acceleration as well as when accelerated using a Nvidia GeForce

GTX 2070 GPU. As expected, a GPU does speed up backpropagation through the NN immensely as

required for computing z ∗ in (8).

TABLE III: Comparison of execution time per iteration (in milliseconds) for OMP, Lasso, EM-GM-
AMP and GCE on a single channel realization at an SNR of -10 dB.

The most important finding from Table III is that the execution time of GCE is not noticeably

affected by the increase in the number of pilot symbols, while the execution time of CS baselines

increases with α. The gradient of (8) with respect to z is given by

∇_z f(y, AG(z)) = 2(A^T(y − AG(z)) ∇_z G(z) + λ_reg z),   (20)

where each row of the matrix ∇z G(z) is d = 35 dimensional. This involves only direct matrix

multiplications of A. On the other hand, for OMP, one of the steps involves inverting columns of

A_sp having maximum inner product with y, whose complexity scales as O(N_p^m) with 2 ≤ m < 3.

Similarly, the Lasso and EM-GM-AMP optimization problems have complexity scaling with N_p^m.

Moreover, as explained in Section IV-E, Lasso involves solving an SOCP, hence takes much longer

than the other algorithms. The impact of Np on the execution time of GCE will only be seen at

much higher values of Np , unlike the CS based algorithms for whom the impact of increasing Np

is immediately apparent10 . Note however that the complexity of computing ∇z G(z) is quite high

owing to the large number of weights θg in the trained generator G, hence the execution time of

GCE is comparable to OMP in the low pilot density regime.


¹⁰Each entry of a matrix multiplication can be computed in parallel, but is limited by the number of threads available on the CPU/GPU. Matrix inversion cannot be parallelized in the absence of its LU factorization.

VI. CONCLUSION

We presented a compressed sensing-based channel estimation approach using deep generative

networks that achieves a significant performance gain over prior techniques for sparse signal recovery,

when applied to CDL channel models. Notable aspects of this approach are that it does not require

knowledge of the sparsifying basis of the channel and immensely reduces the number of pilots required

to achieve the same NMSE as Lasso/OMP/EM-GM-AMP channel estimation, even in the case of

one-bit quantized pilot measurements. Importantly, as a consequence of the gradient computation

of (8) requiring only matrix multiplications, its execution time is approximately independent of the

number of received pilot symbols Np when Np is small.

VII. ACKNOWLEDGEMENTS

The authors would like to thank Nitin Myers for discussions on low resolution quantization and

Shilpa Talwar, Nageen Himayat, Ariela Zeira at Intel for their invaluable support and technical advice

and feedback.

REFERENCES

[1] A. Doshi, E. Balevi, and J. G. Andrews, “Compressed representation of high dimensional channels using deep generative networks,”

in Proc. IEEE Signal Proc. Adv. in Wireless Comm. (SPAWC), May 2020.

[2] T. S. Rappaport, Y. Xing, O. Kanhere, S. Ju, A. Madanayake, S. Mandal, A. Alkhateeb, and G. C. Trichopoulos, “Wireless

communications and applications above 100 GHz: Opportunities and challenges for 6G and beyond,” IEEE Access, vol. 7, pp.

78729–78757, Jun. 2019.

[3] H. Elayan, O. Amin, R. M. Shubair, and M.-S. Alouini, “Terahertz communication: The opportunities of wireless technology

beyond 5G,” in IEEE Intl. Conf. on Advanced Comm. Technologies and Networking (CommNet), Apr. 2018, pp. 1–5.

[4] E. Björnson, J. Hoydis, L. Sanguinetti et al., “Massive MIMO networks: Spectral, energy, and hardware efficiency,” Foundations

and Trends in Signal Processing, Now Publishers, Inc., Nov. 2017.

[5] P. A. Eliasi, S. Rangan, and T. S. Rappaport, “Low-rank spatial channel estimation for millimeter wave cellular systems,” IEEE

Trans. on Wireless Communications, vol. 16, no. 5, pp. 2748–2759, Apr. 2017.

[6] W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak, “Compressed channel sensing: A new approach to estimating sparse

multipath channels,” Proc. IEEE, vol. 98, no. 6, pp. 1058–1076, Jun. 2010.

[7] W. U. Bajwa, A. Sayeed, and R. Nowak, “Sparse multipath channels: Modeling and estimation,” in IEEE 13th Digital Signal

Processing Workshop and 5th IEEE Signal Processing Education Workshop, 2009, pp. 320–325.

[8] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, “Channel estimation and hybrid precoding for millimeter wave cellular

systems,” IEEE J. Sel. Topics Sig. Process., vol. 8, no. 5, pp. 831–846, Oct. 2014.

[9] R. Méndez-Rial, C. Rusu, N. González-Prelcic, A. Alkhateeb, and R. W. Heath, “Hybrid MIMO architectures for millimeter

wave communications: Phase shifters or switches?” IEEE Access, vol. 4, pp. 247–267, Jan. 2016.

[10] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” in IEEE Intl. Symposium on

Information Theory Proceedings, Jul. 2011, pp. 2168–2172.

[11] J. P. Vila and P. Schniter, “Expectation-maximization Gaussian-mixture approximate message passing,” IEEE Trans. on Signal

Processing, vol. 61, no. 19, pp. 4658–4672, Jul. 2013.

[12] S. Rangan, P. Schniter, and A. K. Fletcher, “Vector approximate message passing,” IEEE Trans. on Info. Theory, vol. 65, no. 10,

pp. 6664–6684, May 2019.

[13] J. Mo, P. Schniter, N. G. Prelcic, and R. W. Heath, “Channel estimation in millimeter wave MIMO systems with one-bit

quantization,” in 48th Asilomar Conference on Signals, Systems and Computers, Nov. 2014, pp. 957–961.

[14] J. Mo, P. Schniter, and R. W. Heath, “Channel estimation in broadband millimeter wave MIMO systems with few-bit ADCs,”

IEEE Trans. on Signal Processing, vol. 66, no. 5, pp. 1141–1154, Dec. 2017.

[15] X. Lin, S. Wu, C. Jiang, L. Kuang, J. Yan, and L. Hanzo, “Estimation of broadband multiuser millimeter wave massive

MIMO-OFDM channels by exploiting their sparse structure,” IEEE Transactions on Wireless Communications, vol. 17, no. 6, pp.

3959–3973, June 2018.

[16] D. Katselis, C. R. Rojas, M. Bengtsson, and H. Hjalmarsson, “Frequency smoothing gains in preamble-based channel estimation

for multicarrier systems,” Signal Processing, vol. 93, no. 9, pp. 2777–2782, Sep. 2013.

[17] H. Ye, G. Y. Li, and B.-H. Juang, “Power of deep learning for channel estimation and signal detection in OFDM systems,” IEEE

Wireless Communications Letters, vol. 7, no. 1, pp. 114–117, Feb. 2018.

[18] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Model-driven deep learning for joint MIMO channel estimation and signal detection,”

arXiv preprint arXiv:1907.09439, Feb. 2019.

[19] Y. Yang, F. Gao, X. Ma, and S. Zhang, “Deep learning-based channel estimation for doubly selective fading channels,” IEEE

Access, vol. 7, pp. 36579–36589, Mar. 2019.

[20] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Deep learning-based channel estimation for beamspace mmWave massive MIMO

systems,” IEEE Wireless Communications Letters, vol. 7, no. 5, pp. 852–855, Oct. 2018.

[21] X. Ru, L. Wei, and Y. Xu, “Model-driven channel estimation for OFDM systems based on image super-resolution network,”

arXiv preprint arXiv:1911.13106, Nov. 2019.

[22] P. Dong, H. Zhang, G. Y. Li, I. S. Gaspar, and N. NaderiAlizadeh, “Deep CNN-based channel estimation for mmWave Massive

MIMO systems,” IEEE J. Sel. Topics Sig. Process., vol. 13, no. 5, pp. 989–1000, Jul. 2019.

[23] S. Gao, P. Dong, Z. Pan, and G. Y. Li, “Deep learning based channel estimation for massive MIMO with mixed-resolution

ADCs,” arXiv preprint arXiv:1908.06245, Feb. 2019.

[24] E. Balevi and J. G. Andrews, “Deep learning-based channel estimation for high-dimensional signals,” arXiv preprint

arXiv:1904.09346, 2019.

[25] E. Balevi, A. Doshi, and J. G. Andrews, “Massive MIMO Channel Estimation with an Untrained Deep Neural Network,” IEEE

Trans. on Wireless Communications, vol. 19, no. 3, pp. 2079–2090, Jan. 2020.

[26] R. Heckel and P. Hand, “Deep Decoder: Concise Image Representations from Untrained Non-convolutional Networks,” in Proc.

ICLR, Feb. 2019.

[27] C.-K. Wen, W.-T. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,” IEEE Wireless Communications Letters,

vol. 7, no. 5, pp. 748–751, Mar. 2018.

[28] Z. Lu, J. Wang, and J. Song, “Multi-resolution CSI feedback with deep learning in Massive MIMO system,” arXiv preprint

arXiv:1910.14322, Oct. 2019.

[29] A. Bora, A. Jalal, E. Price, and A. G. Dimakis, “Compressed sensing using generative models,” in Intl. Conf. on Machine

Learning (ICML), Aug. 2017, pp. 537–546.

[30] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Intl. Conf. on Machine Learning

(ICML), Dec. 2017, pp. 214–223.

[31] O. El Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, “Spatially sparse precoding in millimeter wave MIMO systems,”

IEEE Trans. on Wireless Communications, vol. 13, no. 3, pp. 1499–1513, Jan. 2014.

[32] A. Srivastava, L. Valkov, C. Russell, M. U. Gutmann, and C. Sutton, “Veegan: Reducing mode collapse in GANs using implicit

variational learning,” in Adv. in Neural Info. Process. Systems, Dec. 2017, pp. 3308–3318.

[33] T. J. O'Shea, T. Roy, and N. West, “Approximating the void: Learning stochastic channel models from observation with variational

generative adversarial networks,” in IEEE Intl. Conf. on Computing, Net. and Comm., Apr. 2019, pp. 681–686.

[34] H. Ye, G. Y. Li, B.-H. F. Juang, and K. Sivanesan, “Channel agnostic end-to-end learning based communication systems with

conditional GAN,” in IEEE GC Wkshps, Dec. 2018, pp. 1–5.

[35] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial

nets,” in Adv. in Neural Info. Process. Systems, Dec. 2014, pp. 2672–2680.

[36] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial

networks,” in Proc. ICLR, Nov. 2015.

[37] N. J. Myers, K. N. Tran, and R. W. Heath Jr, “Low-rank mmWave MIMO channel estimation in one-bit receivers,” arXiv preprint

arXiv:1910.09141, Oct. 2019.

[38] S. Qiu, X. Wei, and Z. Qiu, “Robust One-Bit Recovery via ReLU Generative Networks: Improved Statistical Rates and Global

Landscape Analysis,” in NeurIPS Deep Inverse Workshop, Dec. 2019.



[39] M. J. Powell, “An efficient method for finding the minimum of a function of several variables without calculating derivatives,”

The Computer Journal, vol. 7, no. 2, pp. 155–162, Jan. 1964.

[40] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic

differentiation in pytorch,” Neural Info. Process. Systems (NIPS) Workshop Autodiff, Oct. 2017.

[41] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. ICLR, Dec. 2014.

[42] A. M. Sayeed, “Deconstructing multiantenna fading channels,” IEEE Trans. Signal Process., vol. 50, no. 10, pp. 2653–2579, Oct.

2002.

[43] S. Winter, H. Sawada, and S. Makino, “On real and complex valued ℓ1-norm minimization for overcomplete blind source

separation,” in IEEE Wkshp on Appl. of Sig. Process. to Audio and Acoustics, Nov. 2005, pp. 86–89.

[44] S. Diamond and S. Boyd, “CVXPY: A Python-embedded modeling language for convex optimization,” Journal of Machine

Learning Research, vol. 17, no. 83, pp. 1–5, Apr. 2016.

[45] P. Schniter, “Turbo reconstruction of structured sparse signals,” in 2010 44th Annual Conference on Information Sciences and

Systems (CISS), Mar. 2010, pp. 1–6.

[46] A. M. Tulino, A. Lozano, and S. Verdú, “Capacity-achieving input covariance for single-user multi-antenna channels,” IEEE

Trans. on Wireless Communications, vol. 5, no. 3, pp. 662–671, Mar. 2006.

[47] M. M. Hasan, M. R. I. Faruque, and M. T. Islam, “Dual band metamaterial antenna for LTE/Bluetooth/WiMAX system,” Scientific

reports, vol. 8, no. 1, pp. 1–17, Jan. 2018.

[48] A. Achille and S. Soatto, “Where is the information in a deep neural network?” arXiv preprint arXiv:1905.12213, May 2019.

[49] H. Thanh-Tung, T. Tran, and S. Venkatesh, “Improving generalization and stability of generative adversarial networks,” in Proc.

ICLR, May 2019.
