Trends in Computing Architecture
CMSC828E
Ramani Duraiswami
Several slides taken from a Microway/NVIDIA webinar
Some figures adapted from web sources
Problem sizes in simulation and data processing are increasing
• Change in paradigm in science
– Simulate, then test
– Fidelity demands larger simulations
– Problems being simulated are also much more complex
• Sensors are becoming more varied and cheaper, and storage is getting cheaper
– Cameras, microphones
• Other large data
– Text (all the newspapers, books, technical papers)
– Genome data
– Medical/biological data (X-ray, PET, MRI, ultrasound, electron microscopy, …)
– Climate (temperature, salinity, pressure, wind, oxygen content, …)
Ways to attack problem size growth
• Faster algorithms with better asymptotic complexity
• Faster processors
– “Moore’s law will take care of it”
• Go parallel!
– Clusters of computers
– New data-parallel chips (multicore processors, GPUs)
“Moore’s Law will take care of it”
• Not a law but an observation made by Gordon Moore in the 1960s
• Number of transistors doubles every 18 months
• Has generally been taken to mean that the “standard computer”’s performance improves exponentially, with a doubling time of 18 months
Refuting the Moore’s law argument
• Argument:
– Moore’s law: processor speed doubles every 18 months
– If we wait long enough, the computer will get fast enough to let my inefficient algorithm tackle the problem
• Is this true?
– Yes, for algorithms with linear asymptotic complexity
– No!! For algorithms with worse asymptotic complexity
– Most scientific algorithms are O(N²) or O(N³)
– For a million variables, an O(N²) algorithm does a factor of N = 10⁶ more work than an O(N) algorithm, so we would need about 20 generations of Moore’s law (log₂ 10⁶ ≈ 20 doublings) before the O(N²) algorithm was comparable
• Did no one tell you that Moore’s law is dead?
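The doubling arithmetic above can be checked with a short script (a sketch; the 18-month doubling period and the N = 10⁶ problem size are the slide’s own figures):

```python
import math

def generations_to_close_gap(n):
    """Doublings of hardware speed needed before an O(N^2) algorithm on
    future hardware matches an O(N) algorithm on today's hardware.
    The O(N^2) algorithm does a factor of N more work, so we need a
    factor-of-N speedup, i.e. log2(N) doublings."""
    return math.log2(n)

def years_to_close_gap(n, doubling_months=18):
    # Convert doublings into calendar time at one doubling per period.
    return generations_to_close_gap(n) * doubling_months / 12.0

n = 10**6
print(round(generations_to_close_gap(n)))  # 20 generations
print(round(years_to_close_gap(n)))        # 30 years at 18 months per doubling
```

Waiting roughly three decades for hardware to rescue a quadratic algorithm is rarely an option, which is why better asymptotics and parallelism both matter.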
Moore’s Law is dead:
“Issues at small scales”
– Lithography not possible
– 2D electrostatics harder to control
– “Parasitic resistance” degrades performance
– Device-to-device variations will be larger
– Ultra-thin bodies and hyper-abrupt junctions make manufacturing difficult
Moore’s Law is dead!
• Feature sizes and clock speeds on commodity chips have been stagnant over the past four years
– ~3 GHz and 45 nm
• All manufacturers are going multicore to maintain performance
– Core 2, Core 2 Duo, quad-core, …
• Shared-memory multiprocessing
– Intel has demoed several many-core systems
• Graphics processors and gaming consoles have already been on the multicore path for a decade!
Gamer Power
• Sony PlayStation 3: 2.18 teraflops, <$400; difficult to program
• Microsoft Xbox 360: 1.04 teraflops, <$300; difficult to program
• Multicore Intel box with 3 GeForce 8800 GTX GPUs in slots: ~1 teraflop for <$3000 (shown with 1 GPU)
Programming on the GPU
• GPU organized as groups of multiprocessors (8 relatively slow processors each) with a small amount of their own local memory and access to a common shared memory
• Factor-of-hundreds difference in speed as one goes up the memory hierarchy
• To achieve gains, problems must fit the GPU programming paradigm and manage memory
• Fortunately, many practically important tasks do map well, and there is ongoing work on converting others
– Image and audio processing
– Some types of linear algebra
– Many machine learning algorithms
• Research issues:
– Identifying important tasks and mapping them to the architecture
– Making it convenient for programmers to call GPU code from host code
[Figure: memory hierarchy — local memory ~50 kB, GPU shared memory ~1 GB, host memory ~2-32 GB]
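The data-parallel paradigm described above, where the same small kernel is applied independently to each element, can be sketched in plain Python (a toy stand-in for a GPU grid launch; the function names here are illustrative, not a real GPU API):

```python
def brighten_kernel(src, gain, dst, idx):
    # Body of one logical GPU "thread": it touches only element idx,
    # so all threads could run at once with no coordination.
    dst[idx] = min(255, int(src[idx] * gain))

def launch(kernel, n, *args):
    # Stand-in for a GPU grid launch: one logical thread per element.
    # A real GPU would execute these iterations in parallel.
    for idx in range(n):
        kernel(*args, idx)

pixels = [10, 100, 200, 250]
out = [0] * len(pixels)
launch(brighten_kernel, len(pixels), pixels, 2.0, out)
print(out)  # [20, 200, 255, 255]
```

Problems that decompose into many such independent per-element updates are the ones that “fit the paradigm”; those needing tight coordination between elements map less well.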
What is GPU Computing?
Computing with CPU + GPU: heterogeneous computing
[Figure: a 4-core CPU alongside a many-core GPU]
Not 2x or 3x: speedups are 20x to 150x
146x  Medical Imaging       U of Utah
36x   Molecular Dynamics    U of Illinois, Urbana
18x   Video Transcoding     Elemental Tech
50x   Matlab Computing      AccelerEyes
100x  Astrophysics          RIKEN
149x  Financial Simulation  Oxford
47x   Linear Algebra        Universidad Jaime
20x   3D Ultrasound         Techniscan
130x  Quantum Chemistry     U of Illinois, Urbana
30x   Gene Sequencing       U of Maryland
Accelerating Time to Discovery
[Chart: run times, CPU only vs. with GPU: 4.6 days vs. 27 minutes; 2.7 days vs. 30 minutes; 8 hours vs. 13 minutes; 3 hours vs. 16 minutes]
Molecular Dynamics
Available MD software:
NAMD / VMD (alpha release)
HOOMD
ACE-MD
Ongoing work:
LAMMPS
CHARMM
GROMACS
AMBER
Source: Stone, Phillips, Hardy, Schulten
Source: Anderson, Lorenz, Travesset
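The inner loop of many MD codes is an O(N²) pairwise force evaluation, which maps naturally onto one GPU thread per particle. A minimal 1D Lennard-Jones sketch (illustrative only, not code from any of the packages above):

```python
def lj_force(r, eps=1.0, sig=1.0):
    # Lennard-Jones force magnitude at separation r; it is zero at the
    # potential minimum r = 2**(1/6) * sig, repulsive (positive) closer in.
    sr6 = (sig / r) ** 6
    return 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r

def forces(xs):
    # O(N^2) all-pairs loop; each particle's force sum is independent
    # of the others, so a GPU can assign one thread per particle.
    n = len(xs)
    f = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                r = xs[j] - xs[i]
                # Positive lj_force pushes particle i away from particle j.
                f[i] += -lj_force(abs(r)) * (1.0 if r > 0 else -1.0)
    return f
```

This brute-force loop is exactly the kind of regular, compute-heavy kernel that GPUs accelerate; production codes add neighbor lists and cutoffs on top of it.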
Quantum Chemistry
Ongoing work:
Q-Chem
Gaussian
GAMESS
Source: Ufimtsev, Martinez
Source: Yasuda
Computational Fluid Dynamics (CFD)
Ongoing work:
Navier-Stokes
Lattice Boltzmann
3D Euler solver
Weather and ocean modeling
Source: Thibault, Senocak
Source: Tolke, Krafczyk
Electromagnetics / Electrodynamics
FDTD solvers:
Acceleware
EM Photonics
CUDA Tutorial
Ongoing work:
Maxwell equation solver
Ring oscillator (FDTD)
Particle beam dynamics simulator
FDTD acceleration using GPUs
Source: Acceleware
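FDTD fits GPUs well because every grid point is updated by the same stencil using only its immediate neighbors. A minimal 1D Yee-style update in plain Python (an illustrative sketch in normalized units, not code from any of the solvers above):

```python
import math

def fdtd_1d(steps, n=200, c=0.5):
    # ez: electric field, hy: magnetic field on a staggered 1D grid.
    ez = [0.0] * n
    hy = [0.0] * n
    for t in range(steps):
        # Each per-point update depends only on fixed neighbors, so a
        # GPU can assign one thread per grid point for each half-step.
        for i in range(n - 1):
            hy[i] += c * (ez[i + 1] - ez[i])
        for i in range(1, n):
            ez[i] += c * (hy[i] - hy[i - 1])
        # Soft Gaussian source injected at the middle of the grid.
        ez[n // 2] += math.exp(-((t - 30) / 10.0) ** 2)
    return ez

field = fdtd_1d(100)
```

The two inner loops are embarrassingly parallel within each time step; only the step-to-step dependence is sequential, which matches the GPU execution model closely.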
Weather, Atmospheric, & Ocean Modeling
CUDA-accelerated WRF available
Other kernels in WRF being ported
Ongoing work:
Tsunami modeling
Ocean modeling
Several CFD codes
Source: Michalakes, Vachharajani
Source: Matsuoka, Akiyama, et al.
Computational Finance
Financial computing software vendors:
SciComp: derivatives pricing modeling
Hanweck: options pricing & risk analysis
Aqumin: 3D visualization of market data
Exegy: high-volume tickers & risk analysis
QuantCatalyst: pricing & hedging engine
Oneye: algorithmic trading
Arbitragis Trading: trinomial options pricing
Source: SciComp
Ongoing work:
LIBOR Monte Carlo market model
Callable swaps and continuous time
Source: CUDA SDK
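Monte Carlo pricing is a natural GPU workload: every simulated path is independent, so paths can be farmed out one per thread. A minimal single-asset European call sketch under geometric Brownian motion (an illustrative stand-in, far simpler than the LIBOR market model mentioned above):

```python
import math
import random

def mc_call_price(s0, k, r, sigma, t, n_paths, seed=0):
    # Price a European call by averaging discounted payoffs over
    # independently simulated terminal prices under GBM.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        # Each path is independent: on a GPU, one thread per path.
        z = rng.gauss(0.0, 1.0)
        st = s0 * math.exp((r - 0.5 * sigma ** 2) * t
                           + sigma * math.sqrt(t) * z)
        total += max(st - k, 0.0)
    return math.exp(-r * t) * total / n_paths

price = mc_call_price(100.0, 100.0, 0.05, 0.2, 1.0, 50000)
# The Black-Scholes value for these parameters is about 10.45
```

Because the only cross-path interaction is the final average (a reduction), this structure keeps GPU threads busy with almost no synchronization, which is why financial Monte Carlo shows some of the largest reported speedups.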