Cell Processor Architecture Overview

The document discusses the history and architecture of the Cell processor, which was developed by Sony, Toshiba, and IBM to provide high performance for specialized tasks like graphics processing. It describes the Cell's key components - the PowerPC Processing Element (PPE) which acts as the main processor, and 8 Synergistic Processing Elements (SPEs) which handle most of the computational workload in a parallel manner. The Element Interconnect Bus allows communication between the SPEs and PPE. Comparisons are made to other architectures like x86 and GPUs. The Cell is intended to be highly scalable and configurable for different applications.

The Cell Processor

From conception to deployment

Presented by Nathan Lemieux


November 16, 2005

Created for CS625a @ UWO


Overview
• Brief history of the Cell's conception
• Cell's architecture
• Comparisons to other architectures
• Design decisions
• Conclusions
• Extra tidbits
History
• Idea generated by SCEI in 1999, after the release of the PS2
• STI group formed in 2000
• First design center opened in the US in 2001
• US patent released in fall 2002
• Since then, prototypes have been developed and clocked at over 4.5 GHz
• Final architecture revealed to the public in February 2005
• In 2005 it was announced that the first commercial Cell product would be released in 2006
Sony Toshiba IBM Group (STI)
• Sony
   ◦ Leading manufacturer of consumer and professional audio and video products. Includes SCEI, which produces the PlayStation consoles
• Toshiba
   ◦ A leader in the development of consumer electronics such as HDTV and other devices
• IBM
   ◦ Proven track record as a leader in manufacturing state-of-the-art microprocessors
STI
• Each brings different knowledge
• Each has different requirements and expectations
   ◦ Power consumption
   ◦ Size
   ◦ Performance
   ◦ Scalability
   ◦ Cost
Cell Architecture Overview
Cell Architecture Overview Continued

• Intended to be configurable
• Basic configuration consists of:
   ◦ 1 PowerPC Processing Element (PPE)
   ◦ 8 Synergistic Processing Elements (SPEs)
   ◦ Element Interconnect Bus (EIB)
   ◦ Rambus Memory Interface Controller (MIC)
   ◦ Rambus FlexIO interface
   ◦ 512 KB system Level 2 cache
Power Processing Element (PPE)
• Acts as the host processor and performs scheduling for the SPEs
• 64-bit processor based on the IBM POWER architecture (Performance Optimization With Enhanced RISC)
• Dual-threaded, in-order execution
• 32 KB Level 1 cache, connected to the 512 KB system Level 2 cache
• Contains a VMX (AltiVec) unit and IBM hypervisor technology, which allows two operating systems to run concurrently (such as Linux and a real-time OS for gaming)
Synergistic Processing Unit (SPU)
• SIMD vector processor that acts independently
• Handles most of the computational workload
• Again in-order execution, but dual issue
• Contains 256 KB of local store memory
• Contains 128 registers, each 128 bits wide
Synergistic Processing Unit (SPU) Continued

• SPEs operate on registers, which are read from or written to the local stores.
• SPEs cannot act directly on main memory; they have to move data to and from the local stores.
• A DMA device in each SPE handles moving data between main memory and the local store.
• Local store addresses are aliased in the PPE address map, and transfers between a local store and memory (including other local stores) are coherent in the system.
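The local store model described above can be sketched as a small simulation. This is only an illustration in Python, not Cell SDK code: the `LocalStore` class, its `dma_get`/`dma_put` methods, and the 16 KB chunk size are all hypothetical names and values chosen for the example. The point it shows is that the compute loop only ever touches the local store, and every byte of main memory moves through an explicit transfer.

```python
LOCAL_STORE_BYTES = 256 * 1024  # local store size quoted on the SPU slide


class LocalStore:
    """Toy model of an SPE local store (hypothetical class, not a real API)."""

    def __init__(self, size=LOCAL_STORE_BYTES):
        self.data = bytearray(size)

    def dma_get(self, main_memory, src, dst, nbytes):
        """Copy nbytes from main memory into the local store."""
        if dst + nbytes > len(self.data) or src + nbytes > len(main_memory):
            raise ValueError("transfer out of range")
        self.data[dst:dst + nbytes] = main_memory[src:src + nbytes]

    def dma_put(self, main_memory, src, dst, nbytes):
        """Copy nbytes from the local store back to main memory."""
        if src + nbytes > len(self.data) or dst + nbytes > len(main_memory):
            raise ValueError("transfer out of range")
        main_memory[dst:dst + nbytes] = self.data[src:src + nbytes]


def increment_all(main_memory, chunk=16 * 1024):
    """Add 1 (mod 256) to every byte, touching main memory only via DMA.

    The chunk size is an arbitrary choice for this sketch.
    """
    ls = LocalStore()
    for offset in range(0, len(main_memory), chunk):
        n = min(chunk, len(main_memory) - offset)
        ls.dma_get(main_memory, offset, 0, n)   # main memory -> local store
        for i in range(n):                      # compute on local data only
            ls.data[i] = (ls.data[i] + 1) % 256
        ls.dma_put(main_memory, 0, offset, n)   # local store -> main memory


# 51,200 bytes standing in for main memory
memory = bytearray(range(256)) * 200
increment_all(memory)
```

On real hardware the transfers are asynchronous, so a program would typically overlap the next transfer with the current computation; the sketch keeps them sequential for clarity.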
Element Interconnect Bus (EIB)
• Contains 4 channels.
• Each channel can transfer 24 bytes per cycle (16 bytes data + 8 bytes tag), for a total of 96 bytes/cycle.
• Enables communication between the SPEs and the PPE, and is also connected to the Level 2 cache, memory controller and FlexIO
• The design allows for different configurations
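As a quick check of the bus arithmetic, the per-cycle figure can be reproduced from the channel count and the data/tag split given on the slide. The function and constant names are made up for this sketch; the numbers are the slide's own.

```python
CHANNELS = 4      # channels on the bus, per the slide
DATA_BYTES = 16   # payload bytes per channel per cycle
TAG_BYTES = 8     # tag bytes per channel per cycle


def eib_bytes_per_cycle(channels=CHANNELS, data=DATA_BYTES, tag=TAG_BYTES):
    """Total bytes moved per bus cycle across all channels."""
    return channels * (data + tag)


total = eib_bytes_per_cycle()        # data plus tag
payload = CHANNELS * DATA_BYTES      # data only
print(total, payload)
```

Under these figures, two thirds of the per-cycle total is payload and the rest is tag overhead.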
Rambus Contributions
• Memory controller
   ◦ Dual-channel Rambus XDR controller
   ◦ Peak memory bandwidth is 25.6 GB/s (2 channels × 2 devices per channel × 2 bytes per device × 3.2 GHz)
• I/O controller
   ◦ Rambus FlexIO is capable of running from 400 MHz to 8 GHz
   ◦ Contains 12 lanes (5 inbound, 7 outbound) for a theoretical peak I/O bandwidth of 76.8 GB/s (44.8 GB/s out, 32 GB/s in, i.e. 6.4 GB/s per lane)
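The bandwidth figures on this slide follow from simple products, which the sketch below reproduces. The helper names are hypothetical; the inputs are the slide's numbers. Dividing the quoted inbound and outbound totals by their lane counts gives the per-lane rate those totals imply.

```python
def xdr_peak_gb_per_s(channels=2, devices_per_channel=2,
                      bytes_per_device=2, rate_ghz=3.2):
    """Peak memory bandwidth in GB/s from the slide's four factors."""
    return channels * devices_per_channel * bytes_per_device * rate_ghz


def per_lane_gb_per_s(total_gb_per_s, lanes):
    """Per-lane rate implied by a quoted total and a lane count."""
    return total_gb_per_s / lanes


memory_bw = xdr_peak_gb_per_s()
out_lane = per_lane_gb_per_s(44.8, 7)   # outbound lanes
in_lane = per_lane_gb_per_s(32.0, 5)    # inbound lanes
io_total = 44.8 + 32.0

print(round(memory_bw, 1), round(out_lane, 1), round(in_lane, 1), round(io_total, 1))
```

Both directions work out to the same per-lane rate, so the quoted totals are consistent with 12 identical lanes.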
Processing Power
• 8 SPEs × 4 GHz × 4 (32-bit words in a vector) × 2 (a multiply-add counts as 2 operations) = 256 SP GFLOPS
• Each SPE is capable of 32 SP GFLOPS
• Each SPE can produce 2 DP FMADD operations every 7 cycles: ~2.3 DP GFLOPS per SPE, ~18.3 in total
• These calculations do not include the processing power of the PPE
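The peak figures above can be recomputed from their factors. This is a sketch with hypothetical function names; it assumes, as the slide does, a 4 GHz clock and that each fused multiply-add counts as two floating-point operations. The double-precision total comes out near 18.3 when computed without intermediate rounding.

```python
def sp_gflops(spes, clock_ghz, lanes=4, flops_per_fma=2):
    """Single-precision peak: one 4-wide multiply-add per SPE per cycle."""
    return spes * clock_ghz * lanes * flops_per_fma


def dp_gflops(spes, clock_ghz, fma_per_window=2, window_cycles=7,
              flops_per_fma=2):
    """Double-precision peak: 2 multiply-adds per SPE every 7 cycles."""
    return spes * clock_ghz * fma_per_window * flops_per_fma / window_cycles


sp_total = sp_gflops(8, 4.0)
sp_per_spe = sp_gflops(1, 4.0)
dp_per_spe = dp_gflops(1, 4.0)
dp_total = dp_gflops(8, 4.0)

print(sp_total, sp_per_spe, round(dp_per_spe, 2), round(dp_total, 2))
```

The gap between the two precisions is the product of two effects: half as many lanes per vector, and one issue every 7 cycles instead of every cycle.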
Architecture Wrap Up
• The Cell needs to be configured for different uses
   ◦ Allows for a variable number of PPEs and SPEs with different memory configurations
• Newer-generation Cells will be compatible with older generations
• Cells are designed to work together, even distributed over a network
Architecture Wrap Up Continued

• Tasks are divided into SPE and PPE “modules” or jobs.
• Different resource allocation schemes are available
   ◦ PPE scheduling: the PPE maintains a job queue
   ◦ SPE self-scheduling: scheduling is distributed across the SPEs; the PPE still maintains the job queue
   ◦ Stream processing: each SPE runs a distinct program, and the SPEs are chained together
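The three allocation schemes can be contrasted with a toy model. Everything here is an assumption made for illustration: the function names, the round-robin policy used for PPE scheduling, and the made-up job costs. The slides do not specify how the real scheduler assigns work.

```python
from collections import deque


def ppe_scheduling(jobs, num_spes):
    """The PPE owns the queue and hands each job to an SPE (round-robin here)."""
    queue = deque(jobs)
    assignments = {spe: [] for spe in range(num_spes)}
    spe = 0
    while queue:
        name, _cost = queue.popleft()
        assignments[spe].append(name)
        spe = (spe + 1) % num_spes
    return assignments


def spe_self_scheduling(jobs, num_spes):
    """Each SPE pulls the next job from the shared queue when it becomes free."""
    queue = deque(jobs)
    free_at = [0] * num_spes                 # time at which each SPE is idle
    assignments = {spe: [] for spe in range(num_spes)}
    while queue:
        spe = min(range(num_spes), key=lambda s: free_at[s])  # earliest free
        name, cost = queue.popleft()
        assignments[spe].append(name)
        free_at[spe] += cost
    return assignments


def stream_processing(stages, items):
    """Each SPE runs one distinct stage; each stage's output feeds the next."""
    for stage in stages:
        items = [stage(x) for x in items]
    return items


# Jobs are (name, cost) pairs; the costs are invented for the example.
jobs = [("a", 3), ("b", 1), ("c", 1), ("d", 1)]
static = ppe_scheduling(jobs, 2)
dynamic = spe_self_scheduling(jobs, 2)
streamed = stream_processing([lambda x: x + 1, lambda x: x * 2], [1, 2, 3])
print(static, dynamic, streamed)
```

With one long job in the queue, self-scheduling routes the short jobs to the idle SPE, while the fixed round-robin assignment does not take job length into account.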
Processing Power Continued

• Supercomputer rankings are based on double-precision calculations
• The BlueGene/L supercomputer, developed by IBM, has a theoretical peak performance of 183,500 GFLOPS but has only achieved 136,800 GFLOPS. It has 65,536 processors, giving each processor a theoretical peak performance of approximately 2.8 DP GFLOPS
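The per-processor figure is the quoted peak divided by the processor count, and the same numbers give the fraction of peak that was achieved. The variable names below are chosen for this sketch; the inputs are the slide's figures.

```python
PEAK_GFLOPS = 183_500       # theoretical peak quoted on the slide
ACHIEVED_GFLOPS = 136_800   # achieved performance quoted on the slide
PROCESSORS = 65_536

per_processor = PEAK_GFLOPS / PROCESSORS     # DP GFLOPS per processor
efficiency = ACHIEVED_GFLOPS / PEAK_GFLOPS   # fraction of peak achieved

print(round(per_processor, 2), round(efficiency, 3))
```

So a single BlueGene/L processor has a peak double-precision rate in the same range as the per-SPE figure given on the Processing Power slide.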
Comparison To Other Architectures
• x86
   ◦ CISC
   ◦ Contains multiple levels of cache and OOO hardware
   ◦ Current trend is a dual-core approach
• GPU
   ◦ Specific purpose
   ◦ Contains vertex/pixel units, which are similar to the SPEs
   ◦ Connected to its own high-speed memory
Design Decisions
• STI members each have different expectations, but power consumption and performance are prerequisites shared among them
• Different techniques (OOO execution, branch prediction units and large caches) have been developed to increase performance, but the trade-off is increased complexity, power consumption, size and heat.
• Because of the heat issue, manufacturers are moving toward dual-core processors.
Design Decisions Continued

• STI removed and/or modified the techniques other manufacturers have used to increase performance, which reduced complexity, power consumption and space
• To combat the reduced performance, they looked at the memory latency issue and introduced local store memory that is closer to the execution units, used the extra space to insert more execution units, and introduced a large register file
• Uses a multi-core approach that is easily scalable to multiple Cells
• Since power consumption and heat generation are reduced, the Cell's clock frequency can be increased
Conclusions
• 9-core processor with a revolutionary design
• Very scalable in design and flexible in its uses
• Programming will most likely be difficult at first, but future compilers will hopefully make things simpler
• Current POWER apps will port easily to the Cell
• Will perform exceptionally well in its niche markets but may never be seen in a desktop PC
What’s Apple Doing?
• Recently announced that they will no longer use IBM's PowerPC
• Cell design changed from the previous design to include a larger PPE with a more advanced VMX (AltiVec) unit
• Giving up the chance to be the distributor of Cell-based desktops in favor of power-hungry Intel chips
Reasons?
• PPC970FX failing to reach 3 GHz?
• Shortages of PPC?
• Higher cost of PPC processor?
• Strategic alliance?
Sony’s PS3
PS3 Specs
• Cell processor @ 3.2 GHz
• 8 SPEs, of which 7 are functional (redundancy?)
• Total 218 SP GFLOPS
• nVidia RSX GPU (1.8 TFLOPS)
• 256 MB XDR RAM
• 256 MB GDDR3 VRAM
• Up to 7 Bluetooth controllers
• Backwards compatible; WiFi connectivity with the PSP
?
