
DOCTORAL THESIS

Cooling Control Strategies in Data Centers
for Energy Efficiency and Heat Recovery

Riccardo Lucchese

Automatic Control
Department of Computer Science, Electrical and Space Engineering
Luleå University of Technology
Luleå, Sweden
Printed by Luleå University of Technology, Graphic Production 2019

ISSN 1402-1544
ISBN 978-91-7790-437-3 (print)
ISBN 978-91-7790-438-0 (pdf)
Luleå 2019
www.ltu.se
To my family.
Acknowledgments

I would like to express warm and sincere thanks to my advisors, Andreas Johansson, Wolfgang
Birk, and Khalid Atta. I am grateful to them for their insight and for the guidance they
provided throughout the doctoral program. A grateful acknowledgment goes to the many
academic bodies at Luleå University of Technology for creating a third-cycle platform that,
while perfectible, enabled me to grow personally and professionally. A special mention goes
to my collaborators for their valuable support over the course of the last three
years. Finally, I feel obliged to thank my family and friends for being a source of endless
encouragement.
Preface

My interest in the main research topic of this thesis took shape about three and a half years
ago. Since then, I have regarded the design of environmental control systems for data centers
as a remarkable challenge.

Data centers are large scale infrastructures whose state may comprise a formidable number
of interacting variables. Their energy intensive nature makes it difficult, or even impossible,
for a researcher to have complete supervision of an entire facility. This, in turn, complicates
the design, development, and testing of mathematical descriptions of the physical phenomena
of interest and, clearly, of the corresponding control strategies. Little operational data is
freely available and, in this respect, it is fair to say that much of the data center knowledge is
anecdotal and qualitative. Adding to this state of affairs, the mechanical design of most current
data center cooling infrastructures does not take control into account, creating a disproportion
between the complexity of the control structures required to enable improvements and the
benefits they produce.
This thesis suggests directions and takes initial steps toward a reorganization of the control
hierarchies in data centers and their co-design with the mechanical aspects of cooling and
coolant distribution. At the center of focus is the development of model-based tools that
support online optimal control decisions. Bringing this work to completion will require further
effort to develop on top of the building blocks that are proposed here.

The format adopted for this document follows the Nordic custom of article theses. There
are two parts: the first one provides an overview of the research setting and highlights the
core questions that drive the research effort; the second part collects the already published and
submitted research manuscripts that address these questions.
Abstract

Data centers are facilities dedicated to the processing, storage, and relay of large amounts of
digital information. As a whole, they form an energy intensive industry, characterized by a
sizable carbon footprint and a short-term exponential growth rate. At a macroscopic level, their
operation requires balancing the supply and demand of computational, cooling, and electrical
power resources. The computational workload is influenced by external factors such as the
end-users’ activity, while the overall run-time costs depend on the weather conditions and the
fluctuating pricing of electricity. In this context, the adoption of optimizing control strategies
and co-design methodologies that address simultaneously both the mechanical and control
aspects has the potential to unlock more sustainable designs. Improvements in the overall
energetic efficiency open the way to larger-scale deployments in less favorable geographical
locations. Recovery systems addressing the vast amounts of by-product heat can support other
heat intensive processes such as district heating networks, wood drying, greenhouses, and food
processing.
This work focuses on how to adapt the provisioning of the cooling resources to the cooling
demand, without compromising the computational throughput. We devise top-down designs
that address unexplored control possibilities in existing deployments. We moreover apply a
bottom-up perspective, modeling and studying co-designed cooling setups that bring significant
simplifications to data center level optimal provisioning problems. The analysis targets the
different levels of the data center infrastructure hierarchy, and provides answers to centerpiece
questions such as: i) what are the optimal flow provisioning policies at the different levels of a
data center? ii) how can simple but effective control strategies be designed that address the
complexity induced by the large scales? iii) what exhaust heat properties can be expected in
air-cooled and liquid-cooled data centers? Exploiting a model-centric approach, we demonstrate
the effectiveness of tailored control strategies in achieving both better cooling efficiency and a
higher quality of the heat harvest.
This thesis presents opportunities to simplify data center control structures while retaining
or improving their performance. Furthermore, it lays out modeling and control methodologies
toward the holistic control-oriented treatment of the computing, cooling, and power distribution
infrastructures. The results have a practical character and the model-based analysis establishes
important development directions, confirming existing trends. Enabling intelligent data center
management systems need not imply more complex tools; rather, a co-design effort might yield
control systems that are both simpler and more effective.

Keywords: data center control policies, environmental control, dynamic flow provisioning,
data center control architectures, energy efficiency, heat recovery.
Contents

Part I

1 Introduction 3
1.1 Societal and economic aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Energetic and environmental impact . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Scales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Sustainability and performance metrics . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Data Center Infrastructure Management supervisor classes . . . . . . . . . . . . 10
1.6 Environmental control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.6.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6.3 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.7 Scope of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.8 Thesis structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2 Greening data center cooling 17


2.1 Cooling and energy efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.1 Strategies at the server level . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.2 Strategies at the group level . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1.3 Strategies at the data center level . . . . . . . . . . . . . . . . . . . . . . 24
2.2 Cooling and heat recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3 Contributions 29
3.1 Other work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4 Conclusions and future directions 39


4.1 Future directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

References 43

Part II

Paper A. Energy savings in data centers: A framework for modelling and control of servers’ cooling 55
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2 Thermodynamics inside the server’s enclosure . . . . . . . . . . . . . . . . . . . 59
2.1 A control-oriented static air flow model . . . . . . . . . . . . . . . . . . . 62
2.2 Modeling the dynamics of the temperature xfj of the flow crossing the single thermal component j . . . . . . . . . . . . . 64
2.3 Modeling the dynamics of the temperature xcj of the single thermal component j . . . . . . . . . . . . . 64
3 Estimation of the parameters of the air flow model from CFD trials . . . . . . . 65
4 Minimum cost fan control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.1 Assessment of the identification methodology . . . . . . . . . . . . . . . . 69
5.2 Assessment of the control methodology . . . . . . . . . . . . . . . . . . . 70
6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

Paper B. On Energy Efficient Flow Provisioning in Air-Cooled Data Servers 75


1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2 Thermal management of server enclosures . . . . . . . . . . . . . . . . . . . . . 79
3 Thermal networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.1 The airflow model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.2 The inflow and outflow temperatures . . . . . . . . . . . . . . . . . . . . 85
3.3 Heat generation inside the chips . . . . . . . . . . . . . . . . . . . . . . . 85
4 Minimizing the overall cost of cooling . . . . . . . . . . . . . . . . . . . . . . . . 86
4.1 Static and dynamic constraints . . . . . . . . . . . . . . . . . . . . . . . 87
4.2 Control cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.1 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.2 Thermal network model of the Windmill V2 test bed . . . . . . . . . . . 89
5.2.1 Dynamics of the CPUs . . . . . . . . . . . . . . . . . . . . . . . 92
5.2.2 Dynamics of the heat exchangers . . . . . . . . . . . . . . . . . 92
5.3 System identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.4 Controlled cooling experiments . . . . . . . . . . . . . . . . . . . . . . . 97
6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7 Appendix: Polynomial functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

Paper C. Controlled Direct Liquid Cooling of Data Servers 103


1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
1.1 Statement of contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 107
1.2 Organization of this manuscript . . . . . . . . . . . . . . . . . . . . . . . 107
2 A thermal modeling framework for controlled liquid cooling . . . . . . . . . . . . 108
2.1 The heat conduction overlay . . . . . . . . . . . . . . . . . . . . . . . . . 108
2.2 The heat convection overlay . . . . . . . . . . . . . . . . . . . . . . . . . 109
2.3 The nodes’ thermal model . . . . . . . . . . . . . . . . . . . . . . . . . . 111
2.3.1 Modeling the inflow temperature xij (t) . . . . . . . . . . . . . . 112
2.3.2 Dynamics of the local temperature xcj (t) . . . . . . . . . . . . . 112
2.3.3 Modeling the outflow temperature xoj (t) . . . . . . . . . . . . . 113
3 A library of standard models at the server level . . . . . . . . . . . . . . . . . . 114
3.1 The transport nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
3.1.1 Joint nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
3.1.2 Splitter nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
3.1.3 Supply nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3.1.4 Collector nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3.2 The thermal nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3.2.1 Environmental nodes . . . . . . . . . . . . . . . . . . . . . . . . 116
3.2.2 Heat exchanger nodes . . . . . . . . . . . . . . . . . . . . . . . 116
3.2.3 Active nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4 Controlled liquid cooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.1 Discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.2 Static and dynamical constraints . . . . . . . . . . . . . . . . . . . . . . 122
4.3 The cost function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4.4 The RHC problem formulation . . . . . . . . . . . . . . . . . . . . . . . 123
5 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
5.1 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
5.2 The thermal network model . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.3 Identification of the full dynamics . . . . . . . . . . . . . . . . . . . . . . 128
5.4 Assessment of the control performance . . . . . . . . . . . . . . . . . . . 129
6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

Paper D. On server cooling policies for heat recovery: exhaust air properties of an Open Compute Windmill V2 platform 137
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
2 The experimental test bed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
3 The thermal model of the server . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
4 A cooling controller to optimize the exhaust heat quality . . . . . . . . . . . . . 145

5 The experimental study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148


5.1 Airflow rate and temperature analysis . . . . . . . . . . . . . . . . . . . . 149
5.2 Exergy analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
6 The numerical study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

Paper E. A study of fine and coarse actuation capabilities in air-cooled server racks: control strategies and cost analysis 155
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
1.1 Literature review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
1.2 Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
1.3 Statement of contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 161
1.4 Organization of this manuscript . . . . . . . . . . . . . . . . . . . . . . . 162
2 Mass and heat transfer models at the rack level . . . . . . . . . . . . . . . . . . 162
2.1 The thermal model of the rack . . . . . . . . . . . . . . . . . . . . . . . . 162
2.2 The thermal model of the servers . . . . . . . . . . . . . . . . . . . . . . 165
2.3 Model validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
3 Flow-provisioning controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
3.1 Definition of the cost objectives . . . . . . . . . . . . . . . . . . . . . . . 170
3.2 The local FP controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
3.3 The cooperative FP controllers . . . . . . . . . . . . . . . . . . . . . . . 173
3.4 The global FP controller . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
4 Numerical analysis at steady state conditions . . . . . . . . . . . . . . . . . . . . 174
4.1 The steady-state computational load model . . . . . . . . . . . . . . . . 174
4.2 Experimental plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
4.3 Analysis of the operating temperatures . . . . . . . . . . . . . . . . . . . 175
4.4 Analysis of the airflow supply rates . . . . . . . . . . . . . . . . . . . . . 176
4.5 Analysis of the cooling costs . . . . . . . . . . . . . . . . . . . . . . . . . 176
5 Numerical analysis under dynamic workload conditions . . . . . . . . . . . . . . 179
5.1 Discrete-time RHC form of the controllers . . . . . . . . . . . . . . . . . 180
5.2 Time-varying computational workloads . . . . . . . . . . . . . . . . . . . 181
5.3 Analysis of the flow supply rate . . . . . . . . . . . . . . . . . . . . . . . 182
5.4 Analysis of the cooling cost . . . . . . . . . . . . . . . . . . . . . . . . . 183
6 Implications on fully ducted and fanless setups . . . . . . . . . . . . . . . . . . . 184
7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187

Paper F. On economic cooling of contained server racks using an indirect adiabatic air handler 191
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
2 Literature review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
3 Mathematical model of the cooling setup . . . . . . . . . . . . . . . . . . . . . . 195
3.1 Air distribution model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
3.2 Model of the adiabatic humidifier . . . . . . . . . . . . . . . . . . . . . . 198
3.3 Model of the heat exchanger . . . . . . . . . . . . . . . . . . . . . . . . . 199
3.4 Dynamical model of a single server unit . . . . . . . . . . . . . . . . . . . 200
4 Model-based economic cooling control . . . . . . . . . . . . . . . . . . . . . . . . 201
4.1 Operation cost model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
4.2 The two-variable controller with constant room-side airflow rate . . . . . 203
4.3 A holistic three-variable provisioning controller . . . . . . . . . . . . . . . 204
4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
5 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
5.1 Control performance of OFP2 and OFP3 . . . . . . . . . . . . . . . . . . 205
5.2 The role of leakage as a cost term . . . . . . . . . . . . . . . . . . . . . . 206
5.3 Implications on the design of efficient IAAH cooling setups . . . . . . . . 208
6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

Paper G. Newton-like phasor extremum seeking control with application to cooling data centers 213
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
2 Background material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
3 Multi-variable phasor-based derivative estimator . . . . . . . . . . . . . . . . . . 218
3.1 Estimating the Fourier coefficients . . . . . . . . . . . . . . . . . . . . . . 219
3.2 Estimating the gradient and the Hessian . . . . . . . . . . . . . . . . . . 221
4 The Newton-like ESC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
5 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

Paper H. ColdSpot: A thermal supervisor aimed at server rooms implementing a raised plenum cooling setup 231
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
2 Problem description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
3 Overview of ColdSpot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
4 Modeling the airflow through the perforated tiles . . . . . . . . . . . . . . . . . 239
5 The thermal supervisor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240

5.1 The local flow requirement controllers . . . . . . . . . . . . . . . . . . . . 240


5.2 The global flow provisioning controller . . . . . . . . . . . . . . . . . . . 241
6 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
6.1 Airflow models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
6.2 Control experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
6.3 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247

Paper I. Computing the allowable uncertainty of sparse control configurations 249


1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
1.1 Statement of contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 253
1.2 Organization of this manuscript . . . . . . . . . . . . . . . . . . . . . . . 253
1.3 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
2 An abstract setting for Control Configuration Selection . . . . . . . . . . . . . . 254
2.1 Interaction Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
3 Modeling plant uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
4 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
5 Approximate solutions using randomized sampling . . . . . . . . . . . . . . . . . 259
5.1 Intuition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
5.2 The uniform sampling strategy . . . . . . . . . . . . . . . . . . . . . . . 262
5.3 Tuning of the termination condition . . . . . . . . . . . . . . . . . . . . . 263
6 Benchmark examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
6.1 2 × 2 model of a distillation process . . . . . . . . . . . . . . . . . . . . . 268
6.2 The Wood and Berry 2 × 2 distillation column . . . . . . . . . . . . . . . 269
6.3 Ogunnaike’s 3 × 3 binary distillation column . . . . . . . . . . . . . . . . 270
6.4 Johansson’s quadruple tank setup . . . . . . . . . . . . . . . . . . . . . . 271
7 A case study on flow provisioning in computer rooms . . . . . . . . . . . . . . . 272
7.1 Modeling the raised-floor air distribution setup . . . . . . . . . . . . . . 272
7.2 Validation of the tile airflow models . . . . . . . . . . . . . . . . . . . . . 274
7.3 Input/output pairing using distance based tile grouping rules . . . . . . . 274
7.3.1 Analysis for tile-set A . . . . . . . . . . . . . . . . . . . . . . . 277
7.3.2 Analysis for tile-set B . . . . . . . . . . . . . . . . . . . . . . . 277
8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
List of Figures

1.1 Primary and support infrastructures in data centers . . . . . . . . . . . . . . . . 4


1.2 Capacity growth by operator type . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Forecasted annual electricity use of U.S.’s data centers . . . . . . . . . . . . . . . 6
1.4 U.S.’ data center electricity usage by infrastructure . . . . . . . . . . . . . . . . 6
1.5 Two examples of data center scales: containerized and hyperscale facilities . . . 7
1.6 U.S.’ data center electricity usage categorized by deployment type . . . . . . . . 8
1.7 Data center supervisory controller classes . . . . . . . . . . . . . . . . . . . . . . 12
1.8 Rack-level computing heat load trends . . . . . . . . . . . . . . . . . . . . . . . 12

2.1 Supervisory control architecture for the computing infrastructure . . . . . . . . 18


2.2 Schematization of a generic data center cooling infrastructure . . . . . . . . . . 20
2.3 Photo of a commercial adiabatic free-cooling unit . . . . . . . . . . . . . . . . . 20
2.4 Photo of an Open Compute twin-server tray . . . . . . . . . . . . . . . . . . . . 21
2.5 Data center heat recovery concept by Facebook . . . . . . . . . . . . . . . . . . 27
List of Tables

1.1 2011 ASHRAE thermal guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.1 Overview of application scenarios and research focus . . . . . . . . . . . . . . . . 29


Abbreviations

ACU Air Cooling Unit


AIC Akaike Information Criterion
ANN Artificial Neural Network
ASHRAE American Society of Heating, Refrigerating and Air-Conditioning Engineers
BMC Baseboard Management Controller
CACSD Computer-Aided Control System Design
CAGR Compound Annual Growth Rate
CCS Control Configuration Selection
CDF Cumulative Distribution Function
CDU Coolant Distribution Unit
CFD Computational Fluid Dynamics
COP Coefficient of Performance
CPS Cyber-Physical System
CPU Central Processing Unit
CRAC Computer Room Air Conditioner
CRAH Computer Room Air Handler
CS ColdSpot
CT Cooling Technology
CUE Carbon Usage Effectiveness
DAG Directed Acyclic Graph
DCIM Data Center Infrastructure Management
DCIS Data Center Information System
DIMM Dual In-line Memory Module
DVFS Dynamic Voltage and Frequency Scaling
EDC Edge Data Center
EPA Environmental Protection Agency
ERE Energy Reuse Effectiveness
ESC Extremum Seeking Control
FEG Fan Efficiency Grading
FMEG Fan and Motor Efficiency Grading

FP Flow Provisioning
FPC Flow Provisioning Capability
FPDD First-Principles Data-Driven
GDP Gross Domestic Product
GHG Greenhouse gas
GLR Generalized Likelihood Ratio
GP Gaussian Process
HDU Heat Disposal Unit
HEN Heat Exchanger Network
HPC High Performance Computing
HRU Heat Recovery Unit
HVAC Heating, ventilation, and air conditioning
IAAH Indirect Adiabatic Air Handler
IAC Indirect Adiabatic Cooling
ICT Information and Communications Technology
IM Interaction Measure
IP Intellectual Property
IPMI Intelligent Platform Management Interface
IT Information Technology
LS Least Squares
MAE Mean Absolute Error
MBRO Model-Based Repeated Optimization
MCC Minimum Cooling Cost
MFC Minimum Fan Cost
MIMO Multiple Input Multiple Output
MMC Manifold Micro-Channel
MPC Model Predictive Control
MPE Mean Percentage Error
MTTF Mean Time to Failure
NTU Number of Transfer Unit
OCP Open Compute Project
OFP Optimal Flow Provisioning
PCH Platform Controller Hub
PDF Probability Density Function
PDU Power Distribution Unit
PE Prediction Error
PID Proportional Integral Derivative
PRBS Pseudo-Random Binary Sequence
PSO Particle Swarm Optimization
PSU Power Supply Unit

PUE Power Usage Effectiveness


QoS Quality of Service
RDE Riccati Differential Equation
RGA Relative Gain Array
RHC Receding Horizon Control
ROI Return on Investment
RPM Revolutions per Minute
TCO Total Cost of Ownership
TDP Thermal Design Power
TF Transfer Function
TH thermo-hygrometric
WUE Water Usage Effectiveness
Part I
Chapter 1
Introduction

Data centers are facilities (including space and equipment) dedicated to the processing, storage,
and relay of large amounts of digital data. Their operation requires orchestrating heterogeneous
cyber and physical resources across a wide array of technologies and infrastructures (Glinkowski
2013a; Clipp et al. 2014). The cyber and physical domains are continuously interacting,
inducing multi-objective optimal control problems and coupling the supervisory control policies
across systems (Parolini, Sinopoli, Krogh, and Wang 2012).
Broadly speaking, the physical data center aggregates a primary computing infrastructure
(that includes the processing nodes, storage, and networking equipment) and two support
elements (also known as facility systems) that provide the environmental control and power
distribution means (R. Brown et al. 2007; see Figure 1.1). Software stacks, operating over the
computing equipment, reply to the external demand for digital services. This process turns
virtually the whole of its electrical power consumption into low-grade heat, and produces an
internal demand for both electricity and cooling resources that is nonuniform in both time
and space. Simultaneously, decisions affecting the provisioning of the cooling and power
resources will reflect on the optimal allocation of the computing workload (Mukherjee et al. 2009;
Beloglazov, Abawajy, and Buyya 2012).
Data center operations reach into different sources of unpredictability, beyond the physical
space of the facility. For instance, within the cyber domain, both the quantity and type of
the demand for computing resources may fluctuate, depending on the time of day, the day of the
week, and so on (Minet et al. 2018). Within the physical domain, power grids induce availability
constraints and optimization objectives in connection to demand response scenarios (Liu et al.
2015; Nadjaran Toosi et al. 2017) and the time-varying pricing of electricity (Parolini, Sinopoli,
and Krogh 2011). Moreover, different weather conditions affect the overall cooling capacity and
the attainable energetic performance.
Recently built data center facilities deploy increasingly higher computing density and capacity.
These latter trends exacerbate efficiency issues (see, for instance, Patterson and Fenwick
2008; Glinkowski 2013a and references therein) and bring a number of technological challenges
in connection to the design of the electronics, its packaging and thermal management (Agostini
et al. 2007).

Figure 1.1: Primary and support infrastructures in data centers.

Optimizing operations in energy-intensive data centers produces significant financial savings
while reducing their ecological footprints (GeSI and BCG 2012). In this respect, the
development of dynamic coolant provisioning policies is among the leading recommendations
toward addressing the large overhead costs sustained by the cooling infrastructure (Bash, Patel,
and Sharma 2003; Patel et al. 2003). However, developments in cooling technology are met
by the intrinsic difficulty of predicting the performance of a complex Cyber-Physical System
(CPS)¹ that involves a formidable number of variables and parameters. Moreover, cyber and
physical operations require accounting for domain specific indexes that reflect the heterogeneous
nature of the provisioned resources, resulting in a spectrum of conflicting objectives (R. Brown
et al. 2007). In practice, these difficulties are addressed by employing multiple Data Center
Infrastructure Management (DCIM) systems². Each DCIM supervisor applies a specific lens,
narrowly focusing operations and managing only a subset of all resources, while striving to
attain local and global objectives (Alaraifi, Molla, and Deng 2012; Glinkowski 2013a).
Opposing the state of practice, recent literature suggests that more holistic control strategies
have the potential to outperform control decisions based on local information alone (Parolini,
Sinopoli, Krogh, and Wang 2012). However, a higher degree of coordination among supervisory
controllers (and the underlying requirement of plant self-awareness) induces challenging modeling
problems and difficult-to-treat nonlinear optimization tasks. Coping with these difficulties
calls for the development of co-designed mechanical and control systems: novel technological
directions should be applied to render simpler modeling abstractions and control problems that
can be treated directly, taming the complexity of global coordination scenarios while attaining
the performance and robustness of holistic strategies.

¹ Paraphrasing Khaitan and McCalley 2015, a CPS is a system of systems in which complex and heterogeneous parts interact in a continuous manner, and whose proper regulation necessitates a careful co-design of the overall architecture.
² Also referred to as Data Center Information Systems (DCISs) or supervisory controllers in the literature.

1.1 Societal and economic aspects


Data centers form “the backbone of the global digital infrastructure” (Clipp et al. 2014). They
support disparate services directed toward businesses, institutions (governmental, educational,
welfare, and financial) and the general public, satisfying a variety of interactive and batch
computing needs. The high degree of reliance on these technologies extends their impact to
the way in which individuals and the collective “live, work, learn and play” (The Climate Group
2008). Today, digital information services can be regarded as utilities (Carr 2005).

The momentum behind this industry is reflected by different indicators. For instance, modern
data centers are required to be highly dependable, inducing the need for business-continuity
and disaster recovery planning (Glinkowski 2013b). Typical availability agreements limit the
yearly downtime to figures ranging from one day down to less than half an hour. Furthermore,
infrastructural redundancy levels, addressing resource capacity and resource distribution
topologies, classify the reliability of deployments from basic to mission critical (Telecommunications
Industry Association 2012).
The global data center capacity (measured in computing floor area) has experienced a 10
percent Compound Annual Growth Rate (CAGR) over the last decade (see Figure 1.2).
Paralleling this expansion, worldwide Internet traffic is forecasted to grow at a 26 percent
CAGR until the year 2022 (Cisco 2019). While higher density deployments promise a lower
Total Cost of Ownership (TCO) for a given capacity, the increase in computing demand keeps
driving higher equipment operation costs. In some cases, the latter may approach or even
exceed the infrastructure acquisition costs (Clipp et al. 2014).
The Information and Communications Technology (ICT) industry is at the center of attention
of key governmental policies supporting the digitalization of economies³. The latter are
intended both as a means to stimulate innovation and to enhance productivity and growth (Clipp
et al. 2014). Overall, the economic impact of the digital economy is placed at about 5 percent
of the global Gross Domestic Product (GDP) (Bukht and Heeks 2017; GSMA 2019). At the
national scale, the internet economy contributed 7.8 percent to the Swedish GDP in 2013; a
share forecasted to sustain growth at a 7.5 percent CAGR (Clipp et al. 2014).

³ This phenomenon, difficult to capture via a short definition, can be seen as an increased reliance on “that part of the economic output derived solely or primarily from digital technologies with a business model based on digital goods or services”. For a survey of possible definitions, we refer to (Bukht and Heeks 2017).

1.2 Energetic and environmental impact


A report of the U.S. Environmental Protection Agency (EPA) placed the 2006 annual electricity
use of the country’s data centers at 61.4 billion kWh, up from the 28.2 billion kWh used in the
year 2000 (see Figure 1.3; R. Brown et al. 2007). This swift growth raised significant concerns
about how to source and distribute the increasing electrical power demand, and eventually
instigated the development of more efficient cooling technologies and a widespread, voluntary
adoption of improved operation policies (Shehabi et al. 2016; Acton et al. 2018). During the same
period of time, the energy efficiency of computing units improved steadily, doubling every 18
months (Koomey et al. 2011). The EPA re-estimated the electricity usage at 70 billion kWh
for the year 2014, and forecasted a CAGR below 1 percent until the year 2020 (Shehabi et al.
2016).

Figure 1.2: Capacity growth by operator type. Source: Clipp et al. 2014.

Figure 1.3: Forecasted annual electricity use of U.S.’s data centers (including cooling and power
provisioning) under different technology adoption scenarios. Source: Shehabi et al. 2016.

Figure 1.4: U.S.’ data center electricity usage categorized by infrastructure. Source: Shehabi
et al. 2016.

Figure 1.5: Two examples of data center scales. a) A modular, containerized facility (Render:
Courtesy of International Business Machines Corporation, © 2016 International Business
Machines Corporation). b) Google’s facility in St. Ghislain, Belgium (Photo: Google).
At a global scale, data centers drive about 2 percent of the electricity production, implying
a significant carbon footprint. In 2009, about 0.7 percent of the global anthropogenic CO2
emissions were attributable to data centers, and this figure has been estimated to keep growing
at a 7 percent CAGR until 2020 (G. I. Meijer 2010; GeSI and BCG 2012). Over the last
two decades, the primary and support data center infrastructures have been characterized
by comparable electrical power consumption (see Figure 1.4). Notably, the typical energetic
overhead incurred due to the cooling systems ranges between 24 and 61 percent of the electrical
power budget for a medium-sized facility (Ni and Bai 2017; Uptime Institute 2019). While
these figures are in agreement with the average 40 percent overhead obtained from the
EPA estimates shown in Figure 1.4, significant deviations from this average performance are
observed among operators, depending on the age of the deployed technology, the geographical
location and climate conditions, and the scale of the deployment.
Finally, we stress that whereas an accurate impact assessment should pursue the application
of a life cycle perspective (taking into account a chain of processes such as material acquisition,
manufacturing, distribution, and disposal), current estimates suggest that an overwhelming
contribution originates from the operation phase (Malmodin et al. 2010).

1.3 Scales
The wording “data center” encompasses a broad range of facilities, including both small
computer rooms covering a few square meters and hyperscale settings managing several thousand
square meters of deployed computing floor area (see Figure 1.5). Different sizes address specific
business needs, inducing electrical power requirements that range from a few kilowatts to
hundreds of megawatts (Glinkowski 2013a; R. Brown et al. 2007), and implying different overhead
trade-offs (Patterson, Costello, et al. 2007).

Figure 1.6: U.S.’ data center electricity usage categorized by deployment type. In year 2020, a
decreasing share of about 30 percent of the overall electricity usage will account for small and
medium scale facilities (with floor capacity up to 500 m²); the remaining 70 percent share will
be driven by larger enterprise and hyperscale data centers. Source: Shehabi et al. 2016.
A slow, ongoing, paradigm shift has led data center services to be considered as utilities
rather than assets to manage (Carr 2005). To support these economies of scale, smaller facilities
have been consolidated into larger and higher density deployments. Consolidation offers a range
of practical advantages: from an increased flexibility with respect to managing peak workload
demands, to a more cost-effective deployment of physical barriers protecting the privacy of the
immaterial transactions. Moreover, consolidated deployments can be geographically located in
favorable climates, allowing lower cooling costs and access to renewable energy sources.
Paralleling this trend, the industry now witnesses the emergence of containerized Edge
Data Center (EDC) solutions, characterized by significantly lower computing capacity but a
comparable heat load density⁴. Distributed networks of EDC deployments are planned to
support the big-data streams of new cloud technologies and services, and to allow for fast and
proximal edge caches within the upcoming 5G networks (TIA 2018).

A mixture of different scales will continue to be operated in the foreseeable future (see
Figure 1.6), requiring both efficiency developments across all platform sizes and a focus on
extracting further value from the higher capacity scenarios (Brunschwiler, Smith, et al. 2009).

⁴ This latter property is considered an invariant of energy efficient facilities (Patterson, Costello, et al. 2007).

1.4 Sustainability and performance metrics


Several metrics have been proposed to direct these development efforts, addressing both existing
deployments by aiding in the tuning of operations, and future deployments by quantifying the
sustainability of different technologies and locations. Performance indexes of this kind can be
divided in two groups: life cycle metrics and operational metrics (see, for example, Avelar,
Azevedo, and French 2012; Patterson, Azevedo, et al. 2011; Belady et al. 2010; E. Brown 2012;
Reddy et al. 2017 and references therein).
From the perspective of the control system designer, the greater interest is placed on
the latter group. Here, we find productivity metrics that address the cyber aspect of the
primary computing infrastructure and thus relate to the cyber domain of data centers.
In a broad sense, these aim to capture Quality of Service (QoS) aspects such as availability,
reliability, and responsiveness. In principle, these measures may be designed as aggregates
of historical usage trends for the Central Processing Unit (CPU), storage, and network, or of the
operating frequency of the electronics where Dynamic Voltage and Frequency Scaling (DVFS)
techniques apply (Rivoire et al. 2007; Reddy et al. 2017). In practice, however, the heterogeneous
nature of computing services challenges the wide adoption of generic and abstract metrics,
rendering them ill-suited to represent specialized workloads.
Sustainability indexes are typically defined as overhead metrics that quantify the efficiency
of operating the same cyber-domain components as a function of the physical-domain tunables.
There are four main criteria supporting metrics in this category: overall and partial energetic
efficiency, rate of heat recovery, water usage, and carbon emissions. Perhaps the most widely
recognized energy metric is the Power Usage Effectiveness (PUE), an index proposed in 2007
by a consortium of data center operators. Its formal definition corresponds to the ratio between
the average facility power consumption and the average power consumption of the computing
infrastructure alone (including computing nodes, network and storage), aggregated over the
length of one year:

$$\mathrm{PUE} \doteq \frac{\text{Total facility energy}}{\text{Computing equipment energy}} \quad [\text{adim.}]. \tag{1.1}$$

See (Avelar, Azevedo, and French 2012) for details. Minimizing the PUE corresponds then to
minimizing the energetic overhead induced by the supporting infrastructures. As a practical
example, Facebook’s hyperscale deployments in Luleå, Sweden, incur a yearly average PUE
of 1.05, placing the cooling costs at 9.7 million SEK/year, or about 2 million SEK for each
hundredth of PUE above the perfect score of 1 (Clipp et al. 2014).
A different site-based metric is the Energy Reuse Effectiveness (ERE). This indicator has
been introduced to account for both the energy supplied to the data center and the energy that
the facility provides to external users for use outside the data center (and thus not accounted
for by the PUE; Patterson, Tschudi, et al. 2010). Formally, we have the following definition
(adopting the preferred notation of the previous reference):

$$\mathrm{ERE} \doteq \mathrm{PUE} - \frac{\text{Reuse}}{\text{Computing equipment energy}} \quad [\text{adim.}]. \tag{1.2}$$

Reducing the ERE for a given PUE corresponds to attaining better energy recovery performance.
This can come about, for instance, as a consequence of tuning operations so as to
increase the quality of the heat harvest.
The Carbon Usage Effectiveness (CUE) is a source-based metric quantifying the data center’s
operational carbon footprint (Belady et al. 2010):

$$\mathrm{CUE} \doteq \mathrm{CEF} \cdot \mathrm{PUE} \quad [\text{kg CO}_{2,\mathrm{eq}}/\text{kWh}]. \tag{1.3}$$

The aggregate carbon emission factor (denoted above with the acronym CEF) is intended to
measure the weighted Greenhouse gas (GHG) impact resulting from drawing electrical energy
from different renewable and non-renewable sources. Similarly, the Water Usage Effectiveness
(WUE) quantifies the usage efficiency of water resources (Patterson, Azevedo, et al. 2011).
The WUE metric includes both off-site usage (for instance, at the electrical power plants) and
on-site usage (for instance, humidifying sprinklers in evaporative or adiabatic cooling units)
to provide an account of how the regional context, different sources of energy, and the local
climate will affect the sustainability of operations.
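
To make the bookkeeping concrete, the following minimal sketch (in Python; the function names and the numeric aggregates are purely illustrative, not taken from the cited sources) evaluates the three indexes (1.1)–(1.3) from yearly energy aggregates:

```python
def pue(total_facility_kwh, it_kwh):
    """Power Usage Effectiveness, Eq. (1.1)."""
    return total_facility_kwh / it_kwh

def ere(total_facility_kwh, it_kwh, reuse_kwh):
    """Energy Reuse Effectiveness, Eq. (1.2): PUE minus the reused share."""
    return pue(total_facility_kwh, it_kwh) - reuse_kwh / it_kwh

def cue(cef_kg_per_kwh, total_facility_kwh, it_kwh):
    """Carbon Usage Effectiveness, Eq. (1.3): CEF times PUE."""
    return cef_kg_per_kwh * pue(total_facility_kwh, it_kwh)

# Hypothetical yearly aggregates for a facility running at PUE = 1.05:
it_energy = 100e6        # computing equipment energy [kWh]
facility_energy = 105e6  # total facility energy [kWh]
reused_energy = 10e6     # heat exported to an external sink [kWh]

print(pue(facility_energy, it_energy))                 # 1.05
print(ere(facility_energy, it_energy, reused_energy))  # 0.95
print(cue(0.05, facility_energy, it_energy))           # 0.0525 kg CO2,eq/kWh
```

Note how, at a fixed computing equipment energy, each hundredth of PUE corresponds to a fixed increment of overhead energy; this is the arithmetic behind the figure of roughly 2 million SEK per hundredth of PUE quoted above for the Luleå deployment.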
Overall, the intent of these metrics is to aid the comparison of different control policies
and operating locations. However, their opaqueness offers little support both to the offline
analysis of the underlying cooling mechanisms and technologies, and to the online decision
making process of selecting the optimal controls. Moreover, the focus on quantifying the relative
weight of the considered overhead variable over large periods of time renders them unsuitable
for direct inclusion in online optimal control objectives. Finally, a more accurate accounting
of the cooling costs should be sought, one that includes both the flow provisioning costs within
the computing equipment and important temperature-dependent contributions such as leakage
losses within the electronics.

1.5 Data Center Infrastructure Management supervisor classes


Supervising operations in data centers requires the orchestration of the available cyber and
physical resources. The goal is to satisfy both external and internal demands while managing
the overall provisioning cost. Three main supervision tasks can be associated with each of the
main data center infrastructures: “Dynamic workload placement”, “Environmental control”,
and “Power management” (see Figure 1.7; Alaraifi, Molla, and Deng 2012). Generic supervisory
controllers may then exploit different degrees of coordination when implementing these tasks:

• Class I: these are supervisory controllers that focus the operation of a single infrastruc-
ture and implement solely the corresponding supervision task. Only local information,
capturing the state of the same infrastructure, is used to form control decisions online.
For instance, Class I environmental controllers operate without a precise knowledge of
electricity pricing forecasts or the current workload placement policy;

• Class II: control strategies in this class monitor and supervise jointly two among the
primary and support infrastructures. By drawing information from both systems, a Class
II controller can potentially infer better decisions, improving the overall energetic perfor-
mance of the facility compared to the fully uncoordinated scenarios;

• Class III: these are fully holistic control strategies that avail of the measured state
of the whole data center and potentially of forecasts for all variables that may affect the
operations of the primary and support infrastructures. A Class III control strategy is able
to participate in demand response schemes while jointly optimizing the cooling operations
and the placement of the computational workload.
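
As a minimal illustration of the widening information scope across the three classes, consider the following sketch (in Python; all type and method names are ours and purely illustrative):

```python
from dataclasses import dataclass

@dataclass
class CoolingState:
    """Local state of the cooling infrastructure."""
    temperatures: list  # operating temperatures
    flow_rates: list    # coolant provisioning rates

@dataclass
class WorkloadState:
    """Local state of the computing infrastructure."""
    utilization: list   # per-node computational load

@dataclass
class PowerState:
    """Local state of the power distribution infrastructure."""
    electricity_price: float  # current electricity price

class ClassISupervisor:
    """Decides using the state of its own infrastructure only."""
    def decide(self, cooling: CoolingState): ...

class ClassIISupervisor:
    """Coordinates two infrastructures, e.g. cooling and workload placement."""
    def decide(self, cooling: CoolingState, workload: WorkloadState): ...

class ClassIIISupervisor:
    """Holistic: full facility state plus forecasts of exogenous variables."""
    def decide(self, cooling: CoolingState, workload: WorkloadState,
               power: PowerState, weather_forecast: list): ...
```

The signatures alone convey the trade-off: each step up the class hierarchy admits better-informed decisions at the price of a wider state to model, measure, and trust.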

Class I is representative of existing data center supervisors, where the resource management
is largely hierarchical and uncoordinated: the end-users determine the computational workload
which, in turn, drives the cooling and power requirements. The pitfalls of Class I control
strategies are qualitatively understood. For example, workload schedulers that operate without
awareness of the spatial thermal effects can lead to localized hotspots (within the computer
room) that require a disproportionate cooling effort. Coordinated Class III supervisors move
away from purely hierarchical management approaches, and aim to reduce the operating ex-
penses by exploiting new synergies (Parolini, Sinopoli, Krogh, and Wang 2012). However, the
challenges met by Class III supervisors are substantial and, indeed, paralleled by the difficulty
of formalizing accurate control-oriented descriptions of the many physical phenomena of in-
terest, including the coupling interactions. This lack of self-awareness prevents the design of
systematic approaches to making optimal coordinated decisions.

The focus of this thesis is on enabling Class I optimizing supervisors for the cooling in-
frastructure. To this aim we treat flow provisioning problems posed at different levels of the
mechanical cooling hierarchy, from the modeling of the heat production process within the
single computing node, to the provisioning of chilled air within computer rooms. To motivate
our line of development and this prioritization of the cooling infrastructure we consider the
following points:

i) The cooling infrastructure supports a significant energetic overhead, directly affecting
widely recognized performance metrics such as the PUE and CUE;

ii) Historically, conservative flow provisioning policies have been prioritized over the secondary
objective of achieving energetic efficiency. This lack of focus leaves a large unexplored
space of promising solutions that exploit the co-design of both mechanical and control
aspects;

iii) The life time of data center facilities is one order of magnitude longer than that of the
computing equipment (years for the latter and decades for the former). As new higher-density
technologies replace the old equipment, it is of paramount importance that the cooling
provisioning controllers become aware of the heat load context in which they operate and
capable of adapting to the cooling demand;

iv) Finally, the formalization of control-oriented models aimed at the cooling infrastructure
is instrumental to the development of Class II and Class III supervisory controllers that
account for the thermal effects and predicted efficiency of provisioning the cooling resources.

Figure 1.7: Implementations of data center supervisory controllers in Class I, II, and III.

Figure 1.8: Rack-level computing heat load trends. Source: ASHRAE 2012.

1.6 Environmental control


With the continuous growth of demand for computing capacity, data center deployments have
seen an increase in both total power loads (up to tens of MW) and power densities (up to tens
of kW per square meter, see Figure 1.8; R. Brown et al. 2007). Adding to these trends, the
upcoming generation of high-end CPUs will dissipate in excess of 400 watts across the surface
area of a credit card (Intel 2019). This state of practice brings increased cooling costs and
presents a number of technological challenges concerning the design of the electronic equipment,
its packaging and its thermal management (Agostini et al. 2007; Kogge et al. 2008). A large data
center facility incurs cooling costs in the order of tens of millions of SEK per year, instigating a
natural interest in more effective control strategies that can adapt to the time-varying demand
and operating conditions. At the same time, the need to collect and discard vast amounts of
waste heat opens up repurposing scenarios that allow facilities to produce further value.

1.6.1 Requirements
Operating the electronic equipment safely requires managing its temperatures within thermal
envelopes provided by manufacturers or standardization bodies. Moreover, the full
thermohygrometric state of the computer room environment must be actively regulated to decrease the
risk of static discharge in dry air conditions and of moisture condensation when humidity is too
high.
Each computing unit that is operated requires factoring the corresponding safety and
reliability requirements into the control policy. The thermal manager needs to operate continuously:
indeed, overheating the electronics can trigger the throttling of the computing throughput
(affecting the overall computing capacity), incur added costs due to temperature-dependent
leakage phenomena, and increase failure rates (by exposing the equipment to a decreased Mean
Time to Failure (MTTF)); consequently, it can increase operating costs through breached uptime
agreements, maintenance, and material replacement.
The focus point of environmental control guidelines is typically placed at the servers, since
this is where the bulk of the heat is produced and the biggest safety concerns arise. Widely
acknowledged guidelines are published by the American Society of Heating, Refrigerating and
Air-Conditioning Engineers (ASHRAE) for both air and liquid cooled equipment (ASHRAE
2011; ASHRAE 2014; European Commission 2019; see also Table 1.1). Recommendations of
this kind opportunely address both safety and energetic concerns. However, only upper and
lower limits on temperatures and humidity levels are provided, allowing generic facilities to
operate far from the cost-optimal conditions.

Class                   Dry-bulb temperature [°C]   Relative humidity [%]            Maximum dew point [°C]   Elevation [m]
Recommended (A1 to A4)  18 to 27                    5.5°C DP to 60% RH and 15°C DP   -                        -
A1                      15 to 32                    20 to 80                         17                       3050
A2                      10 to 35                    20 to 80                         21                       3050
A3                      5 to 40                     −12°C DP & 8% RH to 85% RH       24                       3050
A4                      5 to 45                     −12°C DP & 8% RH to 90% RH       24                       3050
B                       5 to 35                     8 to 80                          28                       3050
C                       5 to 40                     8 to 80                          28                       3050

Table 1.1: 2011 ASHRAE thermal guidelines for air cooled equipment.
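
As an illustration of how such envelopes enter a thermal supervisor, the following minimal sketch (in Python; the limits are transcribed from Table 1.1, the helper names are ours, and the low-humidity dew-point bounds of classes A3 and A4 are omitted for brevity) checks an operating point against an allowable class:

```python
# Simplified per-class limits from Table 1.1:
# dry-bulb temperature [°C], relative humidity [%], maximum dew point [°C].
ENVELOPES = {
    "A1": {"t": (15.0, 32.0), "rh": (20.0, 80.0), "dp_max": 17.0},
    "A2": {"t": (10.0, 35.0), "rh": (20.0, 80.0), "dp_max": 21.0},
}

def within_envelope(ashrae_class, t_dry_bulb, rel_humidity, dew_point):
    """True when the operating point respects the (simplified) class limits."""
    env = ENVELOPES[ashrae_class]
    t_lo, t_hi = env["t"]
    rh_lo, rh_hi = env["rh"]
    return (t_lo <= t_dry_bulb <= t_hi
            and rh_lo <= rel_humidity <= rh_hi
            and dew_point <= env["dp_max"])

# A server inlet at 28 °C, 45% RH and 15 °C dew point sits inside class A1,
# while 34 °C dry-bulb exceeds the A1 limit but is still admissible in A2:
assert within_envelope("A1", 28.0, 45.0, 15.0)
assert not within_envelope("A1", 34.0, 45.0, 15.0)
assert within_envelope("A2", 34.0, 45.0, 15.0)
```

A controller that merely keeps the operating point inside such a box satisfies the safety requirement; where in the box it sits is what the energetic and heat recovery objectives discussed next discriminate.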

1.6.2 Objectives
The main aim of environmental control is to reliably satisfy to the time-varying cooling demand.
Assuming sufficient capacity, the basic task of provisioning the cooling resources can in principle
be fulfilled by regulating opportune temperatures across relevant heat exchange surfaces. On the one hand, the qualitative effect of changes in the manipulable variables on these temperatures is well understood, and stabilizing the system at an a priori fixed state is relatively straightforward: increasing the control effort results in lower coolant temperatures, and vice versa. On the other hand, reducing data center flow provisioning problems to regulation problems with fixed set-points can significantly degrade the energetic performance.
Greening data center operations calls instead for taking into account optimization objectives that relate to the economic cost of operating the equipment. Moreover, Return on Investment (ROI) plans for heat recovery systems are becoming increasingly alluring (R. Brown et al. 2007; Facebook 2019). This supports the formalization of a different set of utility-aware objectives that relate to the heat recovery performance of the facility.

1.6.3 Challenges
Maximizing the efficiency or the heat recovery utility of the cooling infrastructure amounts to solving an optimal control problem that accounts for the complete thermal state
of the facility, together with a large number of unknown exogenous variables corresponding
to the future weather conditions and the computational workload (Patterson and Fenwick
2008; Schmidt, Cruz, and Iyengar 2005). The lack of determinism deriving from the uncertain
future scenarios introduces a robustness requirement into the control strategy. Uncertainty
must be taken into account online in order to avoid overall poorly performing or even brittle
designs (E. A. Lee 2008).
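Schematically, and with generic placeholder symbols rather than a formulation taken from the cited works, such a problem can be stated as

\[
\min_{u(\cdot)}\; \mathbb{E}_{w}\!\left[\int_{t_0}^{t_0+T} c\big(x(t),u(t)\big)\,\mathrm{d}t\right]
\quad \text{s.t.} \quad \dot{x}=f(x,u,w), \qquad g\big(x(t)\big)\le 0, \qquad u(t)\in\mathcal{U},
\]

where x collects the thermal state of the facility, u the coolant provisioning rates, w the exogenous weather and workload inputs, g the thermal envelope constraints, and c the energetic or heat recovery cost accrued over the horizon T.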
Whereas modularity allows for easily up-scaling a deployment, it also induces a large number of coupled state variables. Every computational unit that is deployed adds to the global control problem its local temperature state, its local manipulable variables (that is, the rate at which the coolant is provisioned within the enclosure), its local outputs, and the corresponding thermal envelope constraints. Moreover, whenever a control action is applied, the effects reverberate across several decades of temporal and spatial scales, both up from and down to the chip level (at which strict temperature constraints must be enforced).
Formulating predictions of the turbulent flow properties that are characteristic of air-cooled
facilities requires tracking tens of thousands of flow variables that interact nonlinearly (Rambo
and Joshi 2007). Although accurate results can be obtained by adopting Computational Fluid Dynamics (CFD), these techniques produce models that are too computationally demanding for online use, severely limiting their viability.
Finally, the relationship between the operating temperatures and the coolant flow provi-
sioning rates is typically unknown and nonlinear. This induces optimization surfaces for the
global optimization problem that are also nonlinear, challenging to estimate, and numerically
intensive to explore when searching for the optimal controls. We stress that, since the optimal operating point can vary greatly in time, the hypothesis of a nearly optimal “local” strategy has to be discarded.

1.7 Scope of the thesis


In this thesis, we pursue a control-oriented treatment of flow provisioning problems in data
centers. The focus is on Class I supervisory controllers that dynamically adjust the provisioning
of the coolant to attain objectives relating to energetic efficiency and heat recovery. The core
research questions in our line of development are summarized as follows:
• R.Q. 1: The lifetime of a data center facility is longer than the lifetime of the computing equipment that it deploys. How can adaptive coolant provisioning strategies be designed to cope with continuous upgrades, including changes in the total heat load and heat load density?

• R.Q. 2: Knowledge of the exhaust heat properties produced by the computing equipment
is often qualitative. However, characterizations of this heat production process are critical
toward heat recovery analyses. What are the attainable exhaust heat properties in air-
cooling and liquid cooling scenarios? How do dynamic flow provisioning strategies affect
the trade-off between the quantity (the flow rate of coolant) and quality (the temperature)
of the exhaust?

• R.Q. 3: Data centers’ optimal control problems must in principle account for hundreds
of thousands of state variables (as in hyperscale facilities). Which abstractions deriving
from co-designed mechanical and control architectures could yield numerically workable
optimal control problems (producing a reduction in the dimension of the control space)
while retaining, or improving, the performance attained by existing flow actuation means?

• R.Q. 4: The coolant distribution topologies and static policies that are often adopted
by existing data center deployments severely limit the attainable performance. Which
model-based indications can be formulated toward improving the operation in existing
facilities? And, what are the technology development guidelines that emerge from these
analyses?

1.8 Thesis structure


The present chapter identifies and motivates our focus, placing our effort in the socio-economic
landscape. Chapter 2 surveys the scientific literature on technologies and control strategies
aimed at greening data center operation, giving particular attention to the cooling infrastruc-
ture. Chapter 3 highlights our contributions toward the topic of this thesis and other contributions produced within the scope of the doctoral program. Chapter 4 collects concluding remarks
and future directions. Finally, the collection of manuscripts supporting this thesis is appended
in Part II.
Chapter 2
Greening data center cooling

Historically, efficiency improvements in the operation of data centers have been achieved through
advances in the hardware (see G. I. Meijer 2010; Koomey et al. 2011; Waldrop 2016; Rong et
al. 2016, and references therein). Upgrading to best-in-class hardware technologies leads to
structurally greener facilities that incur a smaller nominal energetic overhead (R. Brown et al.
2007; Acton et al. 2018). For instance, the deployment of high efficiency continuity and condi-
tioning equipment and the implementation of best practice topologies can curb electric power
distribution losses below 10 percent of the total facility budget (Frachtenberg 2012; Schärer
2013).
More recently, increasingly sophisticated supervisory controllers have been deployed to improve performance by acting on software-only components1. The aim is to operate the existing hardware efficiently, extending power proportionality2 beyond the traditional focus point placed at the computing units. Implementations of adaptive resource provisioning behaviors depend on online metering information, and a varying degree of a priori plant knowledge, to manipulate a broad range of cyber and physical domain tunables (Patel et al. 2003; Singh, Korupolu, and Mohapatra 2008; Shanahan 2013). Turning off idle equipment, DVFS, and power gating of the electronics have been investigated extensively, addressing servers, storage, and networking equipment, under different capacity assumptions and reliability guarantees (Brooks
and Martonosi 2001; Skadron et al. 2004; Leverich et al. 2010; Abts et al. 2010; Bianzino
et al. 2012). Profiling of the computational workload allows supervisory controllers to exploit
flexibilities in the placement (or scheduling) of the computing workload (see Figure 2.1). Con-
solidation and spreading decisions can be performed in an energy-aware fashion as in (Moore
et al. 2005; Yiyu Chen et al. 2005; Gandhi et al. 2009; Mukherjee et al. 2009; Tang, Gupta,
and Varsamopoulos 2008; Beloglazov, Abawajy, and Buyya 2012; Vanderster, Baniasadi, and
Dimopoulos 2007). Workload scheduling problems addressing contexts with multiple geographically sparse facilities are investigated in (Qureshi et al. 2009; Liu et al. 2015; Nadjaran Toosi et al. 2017).

1. We notice that other, complementary, software strategies exist that do not depend on a continuous feedback from variables relating to energetic performance (see, for instance, Demaine et al. 2016). From the perspective applied by the present discussion, these improvements exhibit rather the characteristics of hardware upgrades.
2. Broadly speaking, this is the notion that the electrical power consumption at the computing units should be linearly dependent on the computing workload. See (Barroso, Clidaras, and Hölzle 2013).

Figure 2.1: Overview of a supervisory control architecture for the computing infrastructure
(Image: IEEE 2018). Source: Andreadis et al. 2019.

Cooling supervisory controllers are software tools focusing on the physical-domain aspects of provisioning the cooling resources. They may operate at different levels of the cooling infrastructure, taking control decisions on the basis of either local or global state information (Tang, Gupta, and Varsamopoulos 2007; Wang et al. 2009; Parolini, Sinopoli, Krogh, and Wang 2012). The control community has contributed to the topic by proposing different model-based and model-free control strategies. The control objectives are aligned with the industry’s need to improve the cooling efficiency (Schmidt, Cruz, and Iyengar 2005) or, more recently, to improve the exhaust heat quality in reuse applications (Brunschwiler, Smith, et al. 2009). The
remainder of this chapter is dedicated to surveying key contributions treating supervisory con-
trol strategies for the cooling infrastructure.

2.1 Cooling and energy efficiency


The average power densities characterizing data centers can exceed those of typical office build-
ings by more than one order of magnitude (Rambo and Joshi 2007). Since the heat load is
predominantly collected using forced convection, maintaining an adequate supply of chilled
coolant requires continuous operation. Data centers employ different cooling system designs that are tailored to best cope with specific heat load magnitudes and to exploit favorable local
climatic conditions (Frachtenberg et al. 2012; Capozzoli and Primiceri 2015). At a high level
of abstraction, the data center heat cycle encompasses three main steps: production, recovery,
and disposal (see Figure 2.2). The heat generated by the computing equipment is harvested
into a heat-transfer medium such as air, or less commonly water, oil, and synthetic refrigerants.
A Heat Recovery Unit (HRU) processes the warm coolant drawn from the equipment’s exhaust
before recirculating it back to the computer room’s distribution setup at a lower tempera-
ture. In general, the HRU’s operation is supported by a secondary cooling loop that extends
outside the physical boundaries of the computer room to a Heat Disposal Unit (HDU). The HDU is tasked with supplying facility-wide cooling resources. A number of alternative designs are
possible (R. Brown et al. 2007; Evans 2012; ASHRAE 2014; Capozzoli and Primiceri 2015).
In air-cooled environments, the recovery step is typically performed using Computer Room
Air Conditioners (CRACs) or Computer Room Air Handlers (CRAHs) located near the com-
puting equipment. The collected heat is eventually disposed of using air-to-air, liquid-to-air,
or liquid-to-liquid heat exchangers within cooling towers or compressor-based chillers (Breen,
Walsh, Punch, Shah, and Bash 2010; Bhat et al. 2013). Facilities availing of economizers (also
referred to as free-cooling units) draw an external airflow directly to the HRU, bypassing the
secondary cooling circuit. In indirect free-cooling solutions, the HRU is implemented as an
air-to-air heat exchanger: the external airflow rate modulates the transfer of heat from the
primary internal airflow circulating within the computer room. In these scenarios, the HRU
performance can be enhanced by adopting pre-cooling adiabatic humidifiers or evaporative sys-
tems (see Figure 2.3; Beghi, Dalla Mana, et al. 2017; Rampazzo, Lionello, Beghi, et al. 2019). In
direct free-cooling solutions, the HRU is implemented as a mixing chamber with no separation
between the external and internal airflows (Frachtenberg et al. 2012). By dispensing with me-
chanical refrigeration cycles, direct free-cooling solutions achieve significant cost benefits, but
their deployment is limited to locations with favorable climates. Ongoing trends, moreover, see liquid cooling loops extending ever closer to the equipment, as in rear-door cooling setups, direct on-chip liquid cooling, and immersion liquid cooling (Nemati et al. 2016; G. I.
Meijer 2010; Haywood et al. 2015). Extending the liquid cooling circuit to the electronic chip
is required to address high density thermal loads and can provide significant efficiency and heat
recovery benefits, decreasing the overall facility TCO (Patterson and Fenwick 2008). However,
the perceived hazards in provisioning electrically conductive fluids near the equipment, and an
overall lack of standardization, have prevented a broader adoption of these technologies in a
traditionally risk-averse industry.
Like the computing and power infrastructures, the cooling systems are endowed with their own hierarchy (Zhu et al. 2008; Raghavendra et al. 2008; Parolini, Sinopoli, Krogh, and Wang
2012). A layering perspective that is widely adopted in the literature addresses the spatial and
temporal scales that characterize the data center heat cycle (Rambo and Joshi 2006; Gong and
Cox 2016; Palomar et al. 2016):
• the server level: capturing the smallest spatial scale and the fastest time dynamics of cooling operations. This level encompasses a single rack-shelved enclosure such as networking and storage equipment, and server computing units;


Figure 2.2: Cooling the computing equipment in data centers involves i) the recovery of heat
produced within the computer room and ii) its eventual disposal into the external environment
or injection into a heat sink for reuse.

Figure 2.3: Commercial adiabatic free-cooling HRU. An independent module can recover in
excess of 2 MW of heat and occupy a volume of up to 100 cubic meters (depending on the
model). Photo: Vertiv.


Figure 2.4: An Open Compute twin-server deploying two identical Windmill servers (high-
lighted in yellow). The missing plastic cover in the center slot reveals the placement of the
CPUs, the Dual In-line Memory Modules (DIMMs), and companion chips. A third low-power
slot on the far right is dedicated to the Power Supply Unit (PSU) and the storage bay. Each
server is cooled by two high velocity fans placed at the tray’s outlet face.


• the group level: addressing groups of server level components that are physically close to
each other. This level encompasses a range of spatial scales to represent a small number
of tightly packaged blade server units or several fully shelved racking cabinets;

• the data center level: comprising the full range of spatial and time scales, including
the thermal dynamics and transport phenomena connected with the operation of the HRU
and HDU in Figure 2.2. In hierarchical control architectures, supervisory controllers at
this level form decisions on the optimal thermal boundary conditions for the group and
server levels.

The literature on controlled data center cooling can be naturally organized according to the
previous taxonomy.

2.1.1 Strategies at the server level


The server level is where the bulk of the data center heat load is produced. Flow provisioning strategies focusing on single equipment enclosures are then of interest, since the primary means for
rejecting the heat is forced convection. As an illustrative example, Figure 2.4 shows an Open
Compute data server, highlighting its geometry and parts’ layout. The heat loads are localized
at the electrical components where virtually all the absorbed electrical power is converted into
heat. Cooling is performed by regulating the local flow provisioning rate, which in turn affects
the rate of convective heat transfer at each component. The control policies acting at the server
level affect both the local energetic efficiency and the closed loop performance of the overall
cooling system3 (Breen, Walsh, Punch, Shah, and Bash 2010; Breen, Walsh, Punch, Shah,
Bash, et al. 2012; Ovaska, Dragseth, and Hanssen 2016).
In practice, on-off provisioning schemes and variable rate strategies are designed as simple
fuzzy systems (Rajamani et al. 2010; Zapater, Ayala, et al. 2013) or using the server’s inlet
temperature as a feed-forward signal (J. Chen et al. 2014, and references therein). Static
feedback mappings of the current CPU utilization and workload type are considered in, for
example, (Shin et al. 2009; Zapater, Ayala, et al. 2013). In these studies, an optimal fan speed
is pre-determined by measuring the consumption envelope of the server while running different
workloads.
(Kim et al. 2014) considers instead a dynamic strategy based on adaptive Proportional
Integral Derivative (PID) regulators with parameters dependent on the past control input. A
similar approach is considered in (C. Lee and R. Chen 2015), where a PID controller is trained
online by a neural network assuming a linear lumped parameter model of the thermal network
inside the server. Cascaded PID control loops (Fu et al. 2010) and state machines (Ayoub,
Nath, and T. Rosing 2012; Chan et al. 2012) have been designed to treat scenarios in which
also the local rate of heat generation can be manipulated.
(Rajamani et al. 2010) considers a model-free asymmetric strategy where a proportional controller regulates increases in the fan speed while speed decreases are commanded by fuzzy logic. Regulation strategies based on standard building blocks such as PIDs can improve the local energetic efficiency of cooling to a varying degree but nevertheless lead to overprovisioning. In particular, the set-point for the coolant rate needs to be over-estimated in order to let the electronics operate within the safe thermal envelope at all times. (Huang et al. 2011; Pradelle et al. 2014) propose adaptive hill-climbing strategies that seek the minimum-cost flow rate online. However, they require stable workloads that are typically found only in High Performance Computing (HPC). The previous reactive strategies either neglect to account explicitly for changes in the physical environment, such as the inlet air temperature, or fail to account for the volatility of compute loads.
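For concreteness, the following is a minimal sketch (in Python, with illustrative gains and limits not taken from the cited works) of the kind of fan-speed regulator surveyed above; note how the temperature set-point must be chosen conservatively below the envelope limit, which is precisely the overprovisioning discussed in the text:

```python
# Minimal discrete-time PI fan-speed regulator. All parameters are
# illustrative; the set-point (70 degC) is deliberately below a typical
# envelope limit (e.g., 85 degC) to absorb workload transients.
class FanPI:
    def __init__(self, kp=0.08, ki=0.01, t_set=70.0, u_min=0.2, u_max=1.0):
        self.kp, self.ki, self.t_set = kp, ki, t_set
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0

    def step(self, t_cpu: float, dt: float) -> float:
        err = t_cpu - self.t_set              # positive when running hot
        self.integral += err * dt             # (no anti-windup, for brevity)
        u = self.kp * err + self.ki * self.integral
        return min(self.u_max, max(self.u_min, u))  # clamp fan duty cycle

pi = FanPI()
print(pi.step(t_cpu=76.0, dt=1.0))            # fan command in [0.2, 1.0]
```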
To remedy these shortcomings, equipment at the server level has been subject to a substan-
tial modeling effort. Flow and thermal models at the server level have been addressed using
CFD (Choi et al. 2008), linear and nonlinear system identification concepts (Parolini, Sinopoli,
Krogh, and Wang 2012; Zapater, Risco-martín, et al. 2016), and genetic programming (Za-
pater, Risco-martín, et al. 2016). Control-oriented models of individual electronic chips have
been devised in (Skadron et al. 2004), where resistor-capacitor networks are used to capture heat conduction, generation, and storage within a single package. Temperature-dependent leakage dissipation is modeled in (Liao, He, and Lepak 2005). (Zheng et al. 2018) develops a nonlinear
temperature model to capture the interacting dynamics of flash storage disks and CPUs within
a single enclosure. The model is then used for the offline tuning of an Active Disturbance Re-
jection strategy and to pair different fan actuators with different thermal zones within the unit.
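To make the lumped resistor-capacitor descriptions above concrete, the following toy two-node network (die and heat sink; all parameters are illustrative and not taken from the cited works) integrates the heat balance with explicit Euler steps:

```python
import numpy as np

# Toy lumped RC thermal network: a die node conducting into a heat sink
# node, which convects into the inlet air. Parameters are illustrative.
C = np.array([5.0, 50.0])    # thermal capacitances [J/K]
R_COND = 0.1                 # die-to-sink conduction resistance [K/W]
R_CONV = 0.25                # sink-to-air convection resistance [K/W]

def step(T, p_die, t_air, dt=0.1):
    q_cond = (T[0] - T[1]) / R_COND        # die -> sink heat flow [W]
    q_conv = (T[1] - t_air) / R_CONV       # sink -> air heat flow [W]
    return T + dt * np.array([(p_die - q_cond) / C[0],
                              (q_cond - q_conv) / C[1]])

T = np.array([40.0, 35.0])
for _ in range(600):                       # one minute at 100 ms steps
    T = step(T, p_die=80.0, t_air=25.0)
print(T)  # approaches [53, 45]: t_air + 80*R_CONV = 45, plus 80*R_COND = 53
```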
3. According to (Wang et al. 2009), the “peak power usage by fans in certain blade servers can be as high as 2000 W”, corresponding to roughly 23% of the system’s power consumption.

Other works have cast flow provisioning problems as minimum-cost model-based control problems. For instance, a nonlinear temperature dynamics is adopted in (Wang et al. 2009) to drive a one-step look-ahead predictive strategy. The corresponding optimal control problems then involve an objective relating to the actuation cost, the model of a uni-CPU blade server, and temperature and actuation constraints over a horizon of length one. (Parolini, Sinopoli, Krogh, and Wang 2012) considers multi-CPU systems, extending the modeling of (Wang et al. 2009) so as to also account for the thermal dependencies among components. However, time-constant heat convection coefficients are assumed. A limitation of these model-based control strategies is that they address minimum-cost flow provisioning objectives while disregarding temperature-dependent leakage dissipation at the electronics. This potentially leads to higher operating temperatures and lower energetic efficiency as a result of leakage.

2.1.2 Strategies at the group level


A more systematic analysis of data center physical boundaries and flow interfaces that in-
volve heat and mass exchanges is among the centerpiece recommendations toward improving
efficiency (Patel et al. 2003). Despite this fact, the vast majority of controlled cooling contri-
butions focus on either the lowest or the highest level of abstraction, namely the server and data center levels. Within the specific context of air cooled data centers, the main obstacle toward formally treating cooling problems at the group level is the widespread lack of mechanical means to precisely direct the chilled air. Consequently, there is a lack of control-oriented models that capture the complex airflow behaviors and thermal properties across the free space of computer rooms.
Tightly packaged groups of 16 blade servers with shared air cooling resources are treated in
(Wang et al. 2009). The authors propose a model predictive strategy and design a minimum
actuation cost control policy that jointly manipulates the speed of 10 high velocity fans. More
commonly, however, thermal management techniques at the group level operate within the
cyber domain. In these latter scenarios, the typical control action is to consolidate multi-
tier virtualized environments in order to improve resource sharing within the group, while at
the same time enabling other units to save energy by entering low-power states (see Ahmad
et al. 2015, and references therein). Other, local, group level thermal management strategies
involve upgrading the coolant distribution topology or the cooling technology. For instance,
in hybrid air-and-liquid cooling problems, a portion of the compute capacity is upgraded with
direct liquid cooling to address computer room hotspots. Reducing the overall cooling cost
corresponds then to 1) an offline selection of the machines to be retrofitted and 2) the online
optimal job allocation over the mix of air and liquid cooled platforms (Li et al. 2014; Li et al.
2015).
Energetic and viability analyses have been carried out for air distribution setups adopting
aisle containment (Arghode et al. 2019), and rear door or overhead cooling systems (Nemati
et al. 2016; Silva-Llanca et al. 2019). Overall, however, flow provisioning interfaces at the group
level, and the potential interactions with the low level cooling controllers operating at the server
level, remain largely unexplored from a controlled cooling perspective.

2.1.3 Strategies at the data center level


Cooling strategies aimed at the data center level typically focus on either the inside or the outside of computer rooms. Strategies in the first class can moreover be categorized on the basis of
the means and capacities by which they adapt to the spatial distribution of the heat load
throughout the room.
(Song, Murray, and Sammakia 2013) develops a linear model capturing the average tem-
perature of different room zones for a small data center facility with a single cooling unit. The
chilled air is supplied through a raised plenum and a CFD environment is used to generate
the ground truth for estimating the model. An Artificial Neural Network (ANN) based PID
controller is trained to regulate the zonal temperatures by acting on the cooling unit, show-
ing promising results. However, the feasibility of a linear zonal model needs to be validated
in scenarios with more realistic air distribution topologies. A conceptually similar modeling approach is discussed in (VanGilder et al. 2011), where the focus is to capture a flow model,
including mass rates and temperatures, within the raised plenum and at the perforated tiles.
For numerical investigations of the accuracy and efficiency of these modeling techniques in the
context of data centers we refer to (Vangilder and Shrivastava 2006; Toulouse et al. 2009; Song,
Murray, and Sammakia 2014). We also report on works that focus on the design problem of
optimizing the placement and geometry of the individual perforated tiles (Srinarayana et al.
2014; Song, Murray, and Sammakia 2013). Model-based analysis has proven particularly use-
ful toward understanding relevant heat and mass transfer phenomena in data centers (Rambo
and Joshi 2006; Patankar 2010), and instrumental in aiding decision making about promising
development directions (Breen, Walsh, Punch, Shah, and Bash 2010; Dayarathna, Wen, and
Fan 2016). However, by their offline nature, these techniques are non-adaptive with respect to online heat load changes.
A model-free zonal methodology is evaluated experimentally by (Bash, Patel, and Sharma 2006), where the authors exploit metering information to design sparse control structures based on the volume of influence of the different cooling units. (Yuan Chen et al. 2010) builds on this strategy and proposes a coordinated power and workload aware supervisory controller that dynamically provisions the computing and cooling resources in order to ameliorate hotspot phenomena due to warm-air recirculation. Adaptive mechanisms that account for the spatial heat load distribution have been devised in (Tang, Gupta, and Varsamopoulos 2007; Parolini, Sinopoli, Krogh, and Wang 2012), among others. (Tang, Gupta, and Varsamopoulos 2007) first assumes a static linear airflow recirculation model within the room and estimates its parameters from a numerical CFD campaign. It then designs a workload allocation policy that aims at minimizing the maximum inlet temperature predicted at any computing enclosure.
(Parolini, Garone, et al. 2010; Parolini 2012; Parolini, Sinopoli, Krogh, and Wang 2012) devise
a comprehensive data center control framework that accounts for both the compute and cooling
resources. The modeling approach of (Tang, Gupta, and Varsamopoulos 2007) is extended by
including Coefficient of Performance (COP) based models of the cooling units and linear models
to capture the temperature dynamics of the servers. The resulting description is then exploited
in an online model predictive scheme that aims at minimizing the actuation cost due to cooling alone (Parolini, Sinopoli, Krogh, and Wang 2012), or the total cost of operating data centers participating in smart grids with a time-varying pricing of the electricity. We stress, however, that optimizing controllers based on knowledge of the airflow model within the room depend on the unrealistic assumption of constant heat convection rates at the equipment, limiting their viability and the attainable performance. Higher-order numerical temperature models exist but are not ready for use in real-time predictive schemes (Boer et al. 2018).
(Lazic et al. 2018) addresses a hard floor air cooling scenario and designs a model predictive control strategy that first learns a linear temperature model of the computer room and then minimizes online a quadratic objective relating to the true actuation cost associated with a cooling tower and multiple air handlers within the room. The strategy is shown to outperform a decentralized control structure using tuned PIDs. However, the applicability of the procedure appears limited to flooded supply and ducted return air distribution setups. In these settings, linear interaction models might be sufficient to capture the flow mixing behaviors. (Le et al. 2019) considers a reinforcement learning strategy in which the reward weights the power consumption of the cooling infrastructure and the thermal envelope violations resulting from applying a given cooling policy. The strategy is shown to perform favorably against a model-based optimizing controller in simulation. However, further analysis is required to assess the overall performance. It is worth noting that generic model-based frameworks exist for environmental control in the built environment. This latter line of work, however, places a lesser focus on advection phenomena, and practical implementations are limited to linear and bilinear models (Oldewurtel et al. 2012; Sturzenegger et al. 2016). Moreover, an accurate modeling of the mechanical cooling plant is required to attain the performance in a data center application.
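As a minimal sketch of the identification step in strategies of this kind (synthetic data and generic least squares; not the procedure of the cited paper), a discrete-time linear room model x[k+1] = A x[k] + B u[k] can be fitted to logged temperatures and cooling commands as follows:

```python
import numpy as np

# Fit x[k+1] = A x[k] + B u[k] by least squares on synthetic logs.
rng = np.random.default_rng(0)
n, m, N = 3, 2, 500                       # states, inputs, samples
A_true = np.array([[0.90, 0.05, 0.00],
                   [0.02, 0.92, 0.03],
                   [0.00, 0.04, 0.88]])   # stable zonal interaction matrix
B_true = rng.uniform(-0.2, 0.0, (n, m))   # cooling lowers temperatures
X = np.zeros((N + 1, n)); X[0] = rng.uniform(20.0, 30.0, n)
U = rng.uniform(0.0, 1.0, (N, m))         # excitation-rich input log
for k in range(N):
    X[k + 1] = A_true @ X[k] + B_true @ U[k] + 0.01 * rng.standard_normal(n)

Z = np.hstack([X[:-1], U])                # regressors [x[k], u[k]]
theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
A_hat, B_hat = theta[:n].T, theta[n:].T
print(np.max(np.abs(A_hat - A_true)))     # small identification error
```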
Other strategies focus on optimizing the process-side operations of the cooling units. Op-
timizing model-based controllers have been proposed recently to address indirect free-cooling
units (Ogawa et al. 2015; Beghi, Mana, et al. 2017). In particular, (Beghi, Mana, et al. 2017)
proposes to use Model-Based Repeated Optimization (MBRO) to minimize the process-side
operation cost of an Indirect Adiabatic Air Handler (IAAH).
We also report on gradient-free optimizing controllers such as extremum seeking strategies. These approaches aim at learning the optimal controls online in a model-free fashion (Rampazzo, Lionello, Panebianco, et al. 2018; Beghi, Lionello, and Rampazzo 2019). Their effectiveness, however, comes at the cost of slow convergence. Moreover, they require continuously perturbing the controlled plant with a dither signal, typically introducing the need for continuously manipulable control variables and a requirement for small disturbances to correctly reconstruct the local gradient of the objective. How these assumptions reflect on realistic deployments awaits experimental validation.

To the best of our knowledge, there is a lack of attention in the literature on the modeling and control of the complete cooling system, which includes both the cooling unit and the servers under dynamic provisioning policies. In particular, a large body of work in the literature neglects important output constraints induced by the servers’ thermal envelope.

2.2 Cooling and heat recovery


Repurposing waste data center heat has attracted a significant amount of interest over the
last few years. The potential benefits of implementing recovery systems have been investigated by many authors, among others (Brunschwiler, Smith, et al. 2009; Zimmermann et al. 2012; Haywood et al. 2015; Paludetto and Lorente 2016). Viable application scenarios include
supplying the basic heat load needs to indoor complexes (Brunschwiler, I. G. Meijer, et al.
2010), greenhouses (Campen, Bot, and Zwart 2003), district heating (Brunschwiler, Smith,
et al. 2009; Wahlroos et al. 2017), desalination and refrigeration processes (Zimmermann et
al. 2012; Ebrahimi, Jones, and Fleischer 2014), and preheating of boiler feed water in power
plants (Marcinichen, Olivier, Lamaison, et al. 2016). On-site generation and reuse opportuni-
ties are considered in (Araya, Jones, and Fleischer 2018; Ebrahimi, Jones, and Fleischer 2015).
In particular, (Ebrahimi, Jones, and Fleischer 2015) evaluates an absorption refrigeration cycle
to harvest heat from the exhaust of a first group of servers and produce new cooling resources
for a second group. Exploitation of data center waste heat resources can be done opportunisti-
cally, by tapping from the exhaust coolant of small co-located facilities (Woodruff et al. 2014),
or within the framework of hyperscale projects (Facebook 2019; see also Figure 2.5).
From the perspective of heat recovery, a source of thermal energy can be graded in terms
of quantity and quality (that is, temperature; Cengel and Boles 2015). Different trade-offs
between temperature and flow rate of the exhaust coolant lead to different recovery efficiencies.
In particular, implementing cost-effective heat recovery systems depends on the data center’s
ability to systematically act as a stable source of high quality heat, that is, to sustain outlet
flows with high temperatures (Marcinichen, Olivier, and Thome 2012; Marcinichen, Olivier,
Lamaison, et al. 2016). In this respect, liquid cooling solutions offer a number of practical
advantages. Indeed, air cooling operates with high temperature gradients between air and
the electronic components. This both produces the necessity to pre-cool the air (increasing the energetic overhead) and results in low exergetic gains at the server outlet, which hinder the repurposing of waste heat. On the contrary, liquid coolants exhibit both higher thermal capacitance and lower thermal resistance than air, allowing compact designs that match higher power densities and result in generally smaller rates of exergy destruction. For example, using hot water coolant enables heat recovery systems with efficiencies (up to 85 percent) that are not possible in air-cooled settings (Zimmermann et al. 2012). Direct on-chip liquid cooling allows a reduced temperature gap between the electronics and the coolant, which enables hot inlet water (Zimmermann et al. 2012), exhaust coolant temperatures above 60 °C (Druzhinin et al. 2016), and thermodynamic heat pump cycles with high COP for heat sinks between 70 °C and 110 °C (Arpagaus et al. 2018) that match the upcoming fourth generation of district heating networks (Lund et al. 2018).
Figure 2.5: Data center heat recovery concept for Facebook’s facility in Odense, Denmark. The
project aims to inject 100 000 MWh/year of thermal energy in a local district heating network
by tapping from more than 50 000 square meters of deployed computing infrastructure. Original
concept design: Facebook.

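The quantity/quality trade-off discussed above can be made concrete with a back-of-the-envelope computation: the recovered heat rate is Q = ṁ c_p (T_out − T_in), while a Carnot-style factor 1 − T_ref/T_out grades its quality. The sketch below (with illustrative numbers) contrasts the same 10 kW load harvested in air and in water:

```python
# Recovered heat rate and a Carnot-style quality factor for an exhaust
# stream; all numbers are illustrative.
def heat_and_quality(mdot, cp, t_in_c, t_out_c, t_ref_c=20.0):
    q = mdot * cp * (t_out_c - t_in_c)               # heat rate [W]
    carnot = 1.0 - (t_ref_c + 273.15) / (t_out_c + 273.15)
    return q, carnot

# Air: large flow, modest temperature rise; water: small flow, hot outlet.
print(heat_and_quality(mdot=0.80, cp=1005.0, t_in_c=25.0, t_out_c=37.4))
print(heat_and_quality(mdot=0.12, cp=4186.0, t_in_c=40.0, t_out_c=59.9))
# Both streams carry ~10 kW, but the water stream's quality factor is
# roughly twice that of the air stream.
```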
Whereas several modeling and control contributions have targeted flow provisioning problems in the context of reducing the energetic overhead of cooling, there is a lack of works focusing on dynamic, control-oriented strategies for heat recovery. Moreover, little attention has been dedicated to the potential benefits of dynamic provisioning strategies in both liquid cooled and air cooled equipment within the latter context. This is despite both the predominant role played by air-cooling technology in the industry and the ongoing large scale projects aiming to tap into this vast heat resource (Facebook 2019).
Building blocks toward a control-oriented treatment of heat recovery policies are, for example, (Brunschwiler, Smith, et al. 2009; Rubenstein et al. 2010; Marcinichen, Olivier, and Thome 2012; Ovaska, Dragseth, and Hanssen 2016), where the authors seek to quantify the prospective efficiency gains of direct liquid cooling over other liquid-cooling and air-cooling technologies. Convective liquid cooling has been investigated for 3D stacked architectures where using air as the advection medium becomes inadequate due to the manifold increase in the power consumption (Brunschwiler, Michel, et al. 2009; Coşkun, Ayala, Atienza, and T. S. Rosing 2011; Coşkun, Ayala, Atienza, and T. Rosing 2011). However, only single packages are considered, disregarding any dynamical interactions with other passive and active components. (Koeln et al. 2016) lays out a graph-based framework aimed at describing thermal and hydrodynamic systems, but the experimental scenario is not tailored to data center cooling applications.
The vast majority of data center heat recovery studies discuss energetic steady-state characterizations. Different levels of the cooling hierarchy have been addressed, including chip
level analyses, the server and the data center levels (Wälchli et al. 2010; Kasten et al. 2010;
Druzhinin et al. 2016; Zimmermann et al. 2012). The focus of the previous works reflects their
originating community. While their significance to the development of dynamical provisioning
strategies is enormous, the application of a control-oriented lens is required to treat optimizing cooling policies targeting heat recovery.


Chapter 3
Contributions

This chapter summarizes the specific scope and contribution of each manuscript submitted in
the collection of Part II. An overview of the connections between the manuscripts and the
different application scenarios and research questions is given in Table 3.1.

Paper   Cooling medium   Infrastructure level   Research questions
A       air              server                 1, 3
B       air              server                 1, 3
C       liquid           server                 1, 2
D       air              server                 1, 2
E       air              rack                   1, 3, 4
F       air              data center            1, 3
G       air              data center            1, 3
H       air              room                   1, 4
I       air              room                   1, 4

Table 3.1: Overview of application scenarios and research focus for each manuscript.

Paper A. Energy savings in data centers: A framework for modeling and control
of servers’ cooling

Published as: Riccardo Lucchese, Jesper Olsson, Anna-Lena Ljung, Winston Garcia-
Gabin, and Damiano Varagnolo, “Energy savings in data centers: A
framework for modeling and control of servers’ cooling”, IFAC World
Congress, 2017

Summary
Paper A focuses attention on improving the local energetic efficiency of provisioning cooling
resources to air cooled computing units. A model-based methodology is applied. The Paper
first shows how to capture the temperature dynamics of interest from numerical Computational
Fluid Dynamics (CFD) trials. Then, it discusses the formulation of a dynamic optimizing con-
troller that aims at minimizing the flow actuation cost online. A key stress point of the Paper
is the applicability of control-oriented modeling principles to multi-component platforms with
complex layouts.

Contribution
Paper A demonstrates the efficacy of control-oriented, semi-empirical, modeling approaches
in the context of air cooled data center equipment. It moreover shows the viability of these
modeling efforts in forming the building blocks of local flow provisioning controllers. The
proposed modeling framework adopts a nonlinear description of the direct and recirculation
airflow rates inside the enclosure, allowing it to capture the thermal interactions among the
components due to advection. A least squares system identification procedure for estimating
the model parameters from steady state experiments is detailed and applied to CFD simulation
data, further validating our modeling framework. Model-based control strategies are shown
to be able to provision the local cooling resources dynamically while respecting the thermal
envelope constraints imposed on the temperatures of the electronic components.

Paper B. On energy efficient flow provisioning in air-cooled data servers

Published as: Riccardo Lucchese and Andreas Johansson, “On energy efficient flow pro-
visioning in air-cooled data servers”, Elsevier Control Engineering Prac-
tice, 2019

Summary
Paper B considers the problem of provisioning the cooling resources dynamically at the
server level. A refined modeling and control framework is first proposed and then applied ex-
perimentally. An air cooled Open Compute server platform is used as a test bed. The Paper
discusses and evaluates a model-predictive control strategy in which the objective to be mini-
mized includes the temperature-dependent leakage dissipation at the CPUs.

Contribution
Paper B contributes a generic and flexible modeling framework to describe the temperature
dynamics of computing units, improving on the results of Paper A. The resulting models are
given in the form of state space dynamics that involves a low dimensional vector of thermal state
variables. The predominant heat transfer and storage phenomena are accounted for, yielding
descriptions that demonstrate excellent reconstruction accuracy. Paper B, moreover, proposes
a more accurate accounting of the local cooling cost by considering the predominant power
consumption terms that are affected by the active cooling policy. In particular, it is argued
that leakage dissipation at the CPUs should be addressed as a cooling cost. The viability and
effectiveness of the methodology is demonstrated in full experimentally. The trade-offs realized by the dynamical controller when accounting for leakage are discussed, showing that higher coolant flow rates can improve the local provisioning efficiency. This characterization of the
performance at the level of the single computing enclosure can be seen as a building block for
the analysis of flow provisioning problems involving multiple units.
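As a schematic illustration of this cost accounting (under simplifying assumptions, not the Paper’s exact objective), the cooling cost over a receding horizon T may be written as

\[
J = \int_{t}^{t+T} \Big( c_{\mathrm{fan}}\, u(\tau)^{3} \;+\; \alpha + \beta\, T_{\mathrm{cpu}}(\tau) \Big)\,\mathrm{d}\tau,
\]

where the cubic term approximates the fan power through the affinity laws and the affine term approximates the temperature-dependent leakage dissipation over the operating range. Increasing the flow rate u raises the first term but lowers T_cpu, and hence the second, which is the trade-off discussed above.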

Paper C. Controlled Direct Liquid Cooling of Data Servers

Published as: Riccardo Lucchese, Damiano Varagnolo, and Andreas Johansson, “Con-
trolled Direct Liquid Cooling of Data Servers”, IEEE Transactions on
Control Systems Technology, 2019

Summary
Paper C considers the problem of producing high quality heat harvests in direct-on-chip
liquid cooled data center equipment. A dynamical model-based flow provisioning controller is
designed that operates locally, at the level of the single enclosure. Experimental results per-
formed on a retrofitted Open Compute server platform demonstrate the benefit of dynamical
flow provisioning strategies across a wide range of relevant operating conditions.

Contribution
Paper C extends the modeling and control framework of Paper B and applies it to direct
liquid cooled computing units. A graph-oriented formalism is proposed to capture the temper-
ature dynamics of the cooled platform. Two graph overlays are used to encode the topology of
heat exchanges, including which passive and active components interact with the liquid cool-
ing loop or the surrounding environment. Experimental trials performed on the liquid cooled
test bed are used to identify its thermal network. The validation results demonstrate excellent
accuracy in capturing the unit’s temperature dynamics. Paper C moreover details the design
of model-based flow provisioning controllers targeting high temperature heat harvests. The
performance of the proposed feedback control strategy is compared to a static law in which the
coolant is provisioned at a constant rate. Experimental trials demonstrate the effectiveness and
quantify the overall benefit of dynamical provisioning strategies for producing higher quality
heat harvests. Characterizations of the exhaust flow rate and temperature are provided for a
range of operating conditions, supporting the development of heat recovery analyses for data
centers using on-chip liquid cooling solutions.

Paper D. On server cooling policies for heat recovery: exhaust air properties of an
Open Compute Windmill V2 platform

Published as: Riccardo Lucchese and Andreas Johansson, “On server cooling policies
for heat recovery: exhaust air properties of an Open Compute Windmill
V2 platform”, The 3rd IEEE Conference On Control Technology And
Applications, 2019

Summary
Paper D focuses its attention on the production of high quality heat in air cooled comput-
ing units. The modeling and control framework developed in Paper B is tailored to the new
scenario and evaluated experimentally on an Open Compute test bed. Paper D demonstrates
and quantifies the benefits of designing optimizing controllers with objectives that relate to the
exhaust air temperature produced by the computing unit.

Contribution
Paper D complements the analysis of Paper C by addressing the design of model-based flow
provisioning controllers that target heat recovery applications in the context of air cooled data
center equipment. The dynamical model developed in Paper B is validated with respect to its
accuracy at predicting the exhaust air temperature of the test bed under varying operating
conditions. An optimizing flow provisioning controller is designed that aims at maximizing
the average exhaust air quality while taking into account the safety constraints induced by
the equipment’s thermal envelope of operation. An experimental campaign demonstrates the
benefit of tailoring the controller for heat recovery by comparing its performance against that
of the default air provisioning strategy running on the platform. Paper D reports on key mea-
surements quantifying the trade-off between the exhaust coolant flow rate and its quality, for
the given Open Compute platform, across a range of operating conditions. The experimental
analysis is extended in silico to address higher inlet temperature scenarios and different upper
bounds on the maximum operating temperature of the on-board components.

Paper E. A study of fine and coarse actuation capabilities in air-cooled server racks:
control strategies and cost analysis

Submitted as: Riccardo Lucchese, Andreas Johansson, and Wolfgang Birk, “A study of
fine and coarse actuation capabilities in air-cooled server racks: control
strategies and cost analysis”, IEEE Transactions on Control Systems
Technology

Summary
Paper E introduces the concepts of fine and coarse Flow Provisioning Capabilities (FPCs)
in the context of air cooled data center equipment. For each FPC scenario, it considers how to adaptively and economically provision the cooling airflow through a fully shelved rack of Open Compute Windmill servers. Novel local, collaborative, and global provisioning strategies are
devised to address a range of implementation complexity and performance trade-offs. In silico
results corresponding to a wide range of operating conditions are presented, supporting the
viability of coarse FPC assumptions.

Contribution
Paper E uses the dynamical model developed in Paper B to capture in silico cooling scenar-
ios involving a fully shelved rack of Open Compute Windmill servers. The aggregate dynamics
of the rack is then used as a virtual plant to evaluate and compare different control strategies
and different FPC assumptions. In particular, Paper E identifies and treats two capability sce-
narios: fine FPC, in which each computing unit can adjust the local rate at which the cooling
is provisioned independently from other servers in the rack; and coarse FPC, in which the only
manipulable flow rate is defined at the rack-level and, consequently, all the shelved enclosures
share the same flow rate at all times. Local and global flow provisioning controllers are designed
by assuming access to either local information alone or the global thermal state of the shelved
computing units. A third cooperative control strategy is proposed that uses the local optimal controls to derive an overall well-performing provisioning strategy targeting coarse FPC scenarios. An extensive in silico campaign demonstrates that coarse FPC scenarios incur at most a
small performance loss while allowing for a dramatic reduction in the complexity of the control
architecture.
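One simple way to reduce fine FPC solutions to a coarse, rack-level command (an illustration, not necessarily Paper E’s exact rule) is to serve the most demanding enclosure, so that every local thermal envelope remains satisfied at the cost of overcooling the rest:

```python
# Derive a single rack-level flow rate from per-server optimal rates by
# serving the worst-case demand. Illustrative only.
def coarse_from_fine(local_optimal_rates):
    """local_optimal_rates: flow rates [m^3/s] computed by the per-server
    controllers; returns the shared rack-level rate."""
    return max(local_optimal_rates)

print(coarse_from_fine([0.012, 0.020, 0.009, 0.016]))  # -> 0.02
```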

Paper F. On economic cooling of contained server racks using an indirect adiabatic air handler

Submitted as: Riccardo Lucchese, Michele Lionello, Mirco Rampazzo, and Andreas Jo-
hansson, “On economic cooling of contained server racks using an indirect
adiabatic air handler”, SEMI-THERM 2020 Symposium

Summary
Paper F studies the economic operation of a free-cooling setup in which an Indirect Adi-
abatic Air Handler (IAAH) recovers heat from an array of server racks placed in a contained
aisle. Two different control strategies are presented and compared: the first one optimizes only
the process-side operations of the IAAH; the second strategy considers a more holistic approach
that simultaneously optimizes both the process-side and room-side operations of the unit. An in
silico experimental plan compares the performance of the two control strategies across different
external air temperature and humidity conditions, and varying heat loads within the computer
room.

Contribution
Paper F applies the results and concepts from Paper B and Paper E to the modeling of a
complete computer room cooling system. The calibrated model of an Open Compute Windmill
server platform is used to simulate the heat load produced by an ensemble of 504 computing
units shelved over 4 Open Compute Triplet racks. The provisioning of the cooling resources is
captured using the calibrated model of an indirect free-cooling unit using adiabatic humidifica-
tion. Two different control strategies are designed and compared, in which the process-side and room-side operations are either coordinated or uncoordinated. An extensive in silico energetic study
is developed considering different external air temperature and humidity conditions and a range
of computing workloads at the servers. The numerical results highlight the potential benefits
of adopting the coordinated strategy and provide a model-based basis to develop indications
addressing existing deployments operating in close conditions. Overall, the results reinforce the
role of model-based analysis as a means to inform technology development directions.

Paper G. Newton-like phasor extremum seeking control with application to cooling data centers

Published as: Riccardo Lucchese, Michele Lionello, Mirco Rampazzo, Martin Guay,
and Khalid Atta, “Newton-like phasor extremum seeking control with
application to cooling data centers”, IFAC Nonlinear Control Systems
Conference, 2019

Summary
Paper G presents a Newton-like multivariable Extremum Seeking Control (ESC) strategy
and applies it to the economic operation of a data center IAAH. The formal design of the
controller and its stability properties are discussed in detail. A simulated cooling scenario is
used to evaluate the effectiveness of the extremum-seeking controller in minimizing the cooling
cost despite the minimal a priori knowledge on the plant.

Contribution
Paper G extends an existing derivative-estimation strategy used in ESC to a multivariable plant setting. The derivative-estimation block is then used to formulate a Newton-like update law for the plant parameters that need to be optimized online.
A proof of the local asymptotic stability of the strategy is outlined for the averaged dynamics.
Paper G then applies this gradient-free optimization strategy to the economic provisioning of
the cooling resources in which an IAAH supplies chilled air to a computer room. The strategy is
shown to reject environmental and computing workload disturbances while continuously steer-
ing the plant toward the optimal operating conditions. Paper G opens new research directions
with respect to addressing explicitly the output constraints in the controller structure.

Paper H. ColdSpot: A thermal supervisor aimed at server rooms implementing a raised plenum cooling setup

Published as: Riccardo Lucchese and Andreas Johansson, “ColdSpot: A thermal su-
pervisor aimed at server rooms implementing a raised plenum cooling
setup”, IEEE American Control Conference, 2019

Summary
Paper H considers how to provision the cooling airflow in computer rooms that deploy a
raised plenum and perforated tiles setups. The Paper presents in detail ColdSpot: an optimiz-
ing thermal supervisor that regulates the cooling airflow across the floor plane, adaptively with
respect to the spatial distribution of the heat load. ColdSpot forms its control decisions by
combining a model-free estimation strategy of the flow requirements at each tile, and a model-
based optimization step to determine the minimum cost controls satisfying these requirements.
The Paper validates the tile flow modeling approach and flow requirements estimation strategy
in silico.

Contribution
Paper H targets a multivariable flow provisioning problem for a popular class of data cen-
ter computer room cooling setups. Its contribution is threefold. First, Paper H suggests a
methodology for control system design addressing flooded computer room environments. A
model-based strategy is adopted to explain the airflow rates at the perforated tiles in terms of
the Air Cooling Units (ACUs)’ working points. A model-free robust strategy is used to cope
with the complex flow behavior in the space above the floor. Secondly, Paper H suggests and
validates Gaussian Processes (GPs) as a model structure to capture the flow behavior at the
tiles. A state-of-art CFD simulation tool targeting data centers is used to generate the tile
flow models ground truth. The numerical campaign data is used to estimate and validate the
GP model at each tile, demonstrating good reconstruction accuracy and the overall fitness of
the model structure. The analysis in the Paper points out the limits of zonal flow provisioning
control strategies that pair the set of ACUs and temperature monitoring points (across the
room space) based on intuitive proximity considerations.
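A minimal sketch of such a tile-flow surrogate (with synthetic data in place of the CFD ground truth, and scikit-learn as an assumed toolbox) is the following:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Regress the airflow rate at one perforated tile on the ACU working
# points. The data below is synthetic; in Paper H the ground truth comes
# from a CFD campaign.
rng = np.random.default_rng(1)
X = rng.uniform(0.3, 1.0, size=(80, 4))          # 4 ACU fan set-points
y = 0.6 * X[:, 0] + 0.25 * X[:, 2] ** 2 + 0.01 * rng.standard_normal(80)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)
                              + WhiteKernel(noise_level=1e-4),
                              normalize_y=True).fit(X, y)
mu, sd = gp.predict(np.array([[0.8, 0.5, 0.7, 0.4]]), return_std=True)
print(mu[0], sd[0])                              # tile flow and uncertainty
```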

Paper I. Computing the allowable uncertainty of sparse control configurations

Submitted as: Riccardo Lucchese and Wolfgang Birk, “Computing the allowable un-
certainty of sparse control configurations”, Elsevier Journal of Process
Control

Summary
Paper I addresses Control Configuration Selection (CCS) problems in the presence of para-
metric plant uncertainty. A novel randomized search algorithm is proposed to assess and quan-
tify the robustness of the nominal control configuration. Different benchmark examples are
discussed in some detail, providing improved robustness characterizations with respect to the
existing literature. A data center flow provisioning problem inspired by Paper H is considered.
The application of formal CCS tools to this latter setting sheds light on the performance of zonal control strategies in raised plenum and perforated tiles setups.

Contribution
The contribution of Paper I is to design a generally applicable strategy to address uncertain
CCS problems. An algorithmic randomized strategy is proposed to quantify the amount of un-
certainty that may be tolerated by a nominal plant model. The proposed estimation algorithm
can be applied as long as the control configuration protocol can be sampled efficiently at a finite
(but a priori unknown) number of points in the domain of uncertain parameters. Within the
scope of the thesis, Paper I contributes an analysis of popular, computer room, zonal control
structures that are designed on the basis of intuitive distance-based considerations. The novelty
is then in recognizing the nature of these problems as CCS problems and in their analysis using
formal control theoretic tools. The numerical case study highlights the limitations of zonal con-
trol strategies of the previous kind, and supports, instead, more systematic flow provisioning
architectures such as the one of Paper H.
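In the spirit of this randomized assessment, a hedged sketch is given below: parameter perturbations of growing magnitude are sampled, and the largest radius at which a configuration validity test still passes on all samples is returned. The RGA-based validity test is a stand-in placeholder, not the Paper’s actual criterion:

```python
import numpy as np

G0 = np.array([[2.0, 0.4],            # nominal steady-state gain matrix
               [0.3, 1.5]])
rng = np.random.default_rng(2)

def config_is_valid(G):
    # Placeholder test: the diagonal pairing stays dominant in the RGA.
    if abs(np.linalg.det(G)) < 1e-9:
        return False
    rga = G * np.linalg.inv(G).T
    return bool(np.all(np.diag(rga) > 0.5))

def allowable_radius(n_samples=2000, radii=np.linspace(0.0, 1.0, 51)):
    for r in radii[::-1]:             # try the largest radius first
        perturbs = rng.uniform(-r, r, size=(n_samples, 2, 2))
        if all(config_is_valid(G0 + d) for d in perturbs):
            return r                  # all samples passed at this radius
    return 0.0

print(allowable_radius())             # empirical allowable perturbation size
```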

3.1 Other work


Within the broader scope of the doctoral program, other manuscripts have addressed automatic
control problems relating to randomized distributed computations and extremum seeking con-
trol. The list below collects those supplementary works that are already published or accepted.
• K. Atta, M. Guay, R. Lucchese (2019). “A geometric phasor extremum seeking control
approach with measured constraints”. In: IEEE Conference on Decision and Control.

• R. Lucchese, D. Varagnolo (2016). “A Tight Bound on the Bernoulli Trials Network Size
Estimator”. In: IEEE Conference on Decision and Control.

• R. Lucchese, D. Varagnolo (2015). “Average consensus via max consensus”. In: IFAC Workshop on Distributed Estimation and Control in Networked Systems.

• R. Lucchese, D. Varagnolo, J.-C. Delvenne, J. Hendrickx (2015). “Network cardinality estimation using max consensus: the case of Bernoulli trials”. In: IEEE Conference on Decision and Control.

• R. Lucchese, D. Varagnolo (2015). “Networks cardinality estimation using order statistics”. In: IEEE American Control Conference.
Chapter 4
Conclusions and future directions

This thesis considers adaptive flow provisioning problems in data centers. We present novel modeling frameworks and control strategies to support the design and co-design of energy-aware and utility-aware cooling policies. Our effort targets uncoordinated controllers belonging to Class I in the taxonomy of Section 1.5. However, the research outcomes as a whole can be seen as a library of building blocks (including models, control tools, and design methodologies) toward enabling the higher degree of process self-awareness required by Class II and Class III supervisory controllers. Our line of development proceeds both top-down, targeting existing control problems in currently deployed systems, and bottom-up, laying a model-based foundation on which thermal-aware policies can be developed further. The boundaries of this research have been shaped by two guiding objectives: i) improving the efficiency of provisioning the cooling resources by adopting dynamic control policies, and ii) enabling higher-quality heat harvests, which are instrumental in enhancing ROI and sustainability indexes for the upcoming wave of applications exploiting data center heat recovery.

A significant portion of the thesis is dedicated to the development and exploitation of semi-empirical models of the temperature dynamics of data center equipment, with a particular focus on the computing units. Both air-cooled and liquid-cooled platforms are considered. This novel understanding and characterization of the thermal generation and harvesting process enables the computer-aided design of dynamic flow provisioning strategies, targeting either cost-effective operations or enhanced heat harvest quality. Leakage dissipation at the electronics is investigated from the perspective of flow provisioning controllers, demonstrating how, on one hand, accounting for temperature-dependent leakage captures the cost of cooling the computing equipment more accurately and, on the other hand, leveraging these effects introduces some degree of flexibility into the control strategies.
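As a minimal illustration of how a temperature-dependent leakage term enters such a lumped description, consider the sketch below. The single-mass structure, the affine-exponential leakage fit, and every constant are assumptions made for this example; they do not reproduce the semi-empirical models identified in the thesis.

```python
import numpy as np

C_TH   = 120.0   # lumped thermal capacitance [J/K]        (assumed)
K_CONV = 4.0     # convective gain per unit airflow [W/K]  (assumed)
P_DYN  = 150.0   # dynamic, workload-driven power [W]      (assumed)
ALPHA, BETA = 5.0, 0.04   # leakage fit parameters          (assumed)

def p_leak(T):
    # Leakage power grows (roughly exponentially) with silicon temperature.
    return ALPHA * np.exp(BETA * (T - 25.0))

def dT_dt(T, f, T_in):
    # Energy balance: heat generated minus heat removed by the airflow f.
    return (P_DYN + p_leak(T) - K_CONV * f * (T - T_in)) / C_TH

# Forward-Euler rollout toward the thermal steady state.
T, dt = 60.0, 0.5
for _ in range(2400):
    T += dt * dT_dt(T, f=8.0, T_in=22.0)
print(f"steady temperature ~ {T:.1f} C, leakage ~ {p_leak(T):.1f} W")
```

Raising the flow f lowers the steady temperature and hence the leakage power itself; this coupling is precisely what makes temperature-dependent leakage relevant when pricing the cooling effort.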
The adoption of optimizing flow provisioning strategies is considered a key enabler for recovering data center heat. The thesis discusses both air- and liquid-cooled platforms in detail, highlighting the existing trade-offs between the quantity and quality of the exhaust flows, and thus providing experimental characterizations for the further analysis of heat recovery applications. Building on these developments, we introduce the concepts of fine and coarse FPC. Through the analysis of different rack-level flow provisioning policies, we show that feedback controllers operating under the coarse FPC scenario can achieve nearly optimal cost performance while allowing a dramatic reduction in the number of flow rate variables that need to be optimized online. These results consolidate the significance of coarse FPC assumptions and reflect current aisle pressurization trends.
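A toy optimization problem can make the dimensionality argument concrete: under fine FPC each rack slot has its own flow variable, while under coarse FPC a single scalar (e.g., an aisle pressurization level) drives all slots. The steady-state outlet model, the cubic fan-power cost, and all numbers below are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import minimize

# n rack slots with heterogeneous IT power; steady-state outlet model
# T_i = T_in + P_i / (H * f_i); cubic fan-power cost sum(C * f_i**3).
P = np.array([120.0, 180.0, 90.0, 210.0])   # per-slot IT power [W] (assumed)
T_IN, T_MAX, H, C = 22.0, 35.0, 4.0, 1e-3   # assumed constants

def cost(f):
    return np.sum(C * f ** 3)

def temp_margin(f):
    return T_MAX - (T_IN + P / (H * f))      # must be >= 0 slot-wise

# Fine FPC: one flow variable per slot.
fine = minimize(cost, x0=np.full(4, 5.0),
                constraints={"type": "ineq", "fun": temp_margin},
                bounds=[(0.5, 20.0)] * 4)

# Coarse FPC: a single scalar flow shared by all slots.
coarse = minimize(lambda s: cost(np.full(4, s[0])), x0=[5.0],
                  constraints={"type": "ineq",
                               "fun": lambda s: temp_margin(np.full(4, s[0]))},
                  bounds=[(0.5, 20.0)])

print("fine-FPC cost:  ", fine.fun)    # 4 decision variables
print("coarse-FPC cost:", coarse.fun)  # 1 decision variable
```

In this toy instance the coarse solution overprovisions the lightly loaded slots, so its cost gap with respect to fine FPC grows with the heterogeneity of the per-slot powers; the rack-level analyses summarized above indicate that, for the policies studied in the thesis, this gap remains small in practice.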

Representative of our contribution in the top-down line of development, we have evaluated flow provisioning problems at the computer room level, proposing optimizing control strategies for operating an indirect free-cooling unit and for the adaptive spatial provisioning of cooling resources within computer rooms. The first case study, treating a computer room serviced by an Indirect Adiabatic Air Handler (IAAH), revealed crucial insights into the optimal power allocation among the IAAH’s actuators on the process and room sides, highlighting the importance of manipulable internal airflow rates. The analysis of the second case study, which considers the spatial provisioning of chilled air in raised-plenum and perforated-tile setups, shows that adaptive provisioning on the room side is of critical importance to reduce the overhead incurred by the cooling infrastructure. The application of formal Control Configuration Selection (CCS) tools to flow provisioning control structures within computer rooms suggests that the zonal control architectures adopted in practice induce preferred control configurations that change drastically with the operating point.
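One classical instrument in the CCS toolbox, the Relative Gain Array (RGA), suffices to illustrate how a preferred pairing can flip with the operating point; the thesis applies more general CCS machinery, and the gain matrices below are invented for illustration.

```python
import numpy as np

def rga(G):
    # Relative Gain Array: element-wise product of G and inv(G).T
    return G * np.linalg.inv(G).T

# Hypothetical steady-state gains from two ACU flows to two zone
# temperatures, identified at a low- and a high-load operating point.
G_low  = np.array([[-1.0, -0.2],
                   [-0.3, -0.9]])
G_high = np.array([[-0.5, -0.8],
                   [-0.9, -0.4]])

print(np.round(rga(G_low), 2))   # ~[[1.07, -0.07], [-0.07, 1.07]]: diagonal pairing
print(np.round(rga(G_high), 2))  # ~[[-0.38, 1.38], [1.38, -0.38]]: off-diagonal pairing
```

A pairing chosen once, by proximity, at commissioning time thus has no guarantee of remaining the preferred one across load conditions, which is the limitation that the numerical case study exposes.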

4.1 Future directions


The contributions of this thesis should be assessed in view of the practical value of leveraging predictions from the proposed ancillary building blocks to inform cooling decisions. Directions of immediate interest therefore encompass the implementation and assessment of the proposed strategies in larger-scale test beds.
Moreover, at the center of upcoming efforts we place the co-design of both the mechanical and control aspects of future data center infrastructures. As highlighted in one of our case studies, optimal provisioning controllers can yield significant cost savings that are unattainable through improvements in mechanical refrigeration technology alone. It is then of interest to ask which mechanical cooling designs can enable simple control architectures while retaining the performance of model-based, global, optimizing controllers.
The further development of control-oriented models will enable coordinated control strategies in Classes II and III that build on self-awareness of the data center’s thermal dynamics. A formalization and analysis of thermal-aware workload schedulers that exploit an accurate description of the heat generation process is envisioned.
Finally, as future data centers will place even more emphasis on renewables and heat recovery systems, the experimental investigation of the coupling between data center infrastructures and potential heat recovery and storage applications is of particular interest.
