
Architecture Overview: Introducing The Cell BE Installing Linux SIMD Programming in C/C++ Asynchronous Data Transfer With The DMA

This document provides an overview of programming with Linux on the Playstation 3. It discusses the Cell Broadband Engine architecture, including the PowerPC core and 8 SPE cores. It introduces SIMD programming in C/C++ using intrinsics and libspe2. It also covers asynchronous data transfer between main memory and SPE local stores using DMA and double buffering techniques.

Uploaded by

Jeremey Zamecnik
Copyright
© Attribution Non-Commercial (BY-NC)

Programming with Linux on the Playstation 3

FOSDEM 2008 [email protected]

Architecture overview: introducing the Cell BE
Installing Linux
SIMD programming in C/C++
Asynchronous data transfer with the DMA

Who am I

Java / Python developer at Nuxeo (FOSS document management server)
Interested in Artificial Intelligence (and need fast Support Vector Machines)
Slides to be published at: http://oliviergrisel.name

PS3 architecture overview

CPU: IBM Cell/[email protected]
  218 GFLOPS
Main RAM: 256 MB XDR ([email protected])

GPU: Nvidia RSX
  1.8 TFLOPS (SP) / 356 GFLOPS programmable
VRAM: 256 MB GDDR3 (2 x 128b @ 700 MHz)

System Bus: 2.5 GB/s

The Cell Broadband Engine

[email protected]

64-bit hyperthreaded PowerPC
512 KB L2 cache
128-bit SIMD optimized
256 KB SRAM

[email protected]

PS3 Clusters

Cheap cluster for academic researchers
Carolina State U. and U. Massachusetts at D.
8+1 cluster with ssh and MPI

PS3 GRID Computing

PS3GRID project
  based on BOINC
  30,000 atoms simulation

Folding@Home
  1 PFLOPS with 800 TFLOPS from PS3s
  BlueGene == 280 TFLOPS

Linux on the PS3

Lv1 Hypervisor shipped with the default firmware
Partition utility in the Sony Game OS menu
Choose your favorite distro:

Install a powerpc64-smp or ps3 kernel
Install gcc-spu + libspe2

Programming the Cell/BE in C

Program the PPE as a chief conductor to spread the numerical code to SPEs
Use POSIX threads to start SPE subroutines in parallel
Use SPE intrinsics to perform vector instructions
Eliminate branches as much as possible in SPE code
Align your data to 16 bytes
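
The last point, aligning data to 16 bytes, can be illustrated in portable C11. This is only a sketch that compiles on any C11 toolchain, not SPU-specific code; the names static_buf, is_aligned16 and alloc_floats16 are hypothetical helpers introduced for the illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Statically allocated buffer placed on a 16-byte boundary (C11 _Alignas). */
static _Alignas(16) float static_buf[8];

/* Returns 1 when p sits on a 16-byte boundary, 0 otherwise. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Allocates room for n floats on a 16-byte boundary.
   aligned_alloc expects a size that is a multiple of the alignment,
   so the byte count is rounded up to the next multiple of 16.
   The caller releases the memory with free(). */
static float *alloc_floats16(size_t n)
{
    size_t bytes = (n * sizeof(float) + 15u) & ~(size_t)15u;
    return aligned_alloc(16, bytes);
}
```

A pointer that passes is_aligned16 can safely be reinterpreted as a pointer to 128-bit vectors, which is what the SPE code later in these slides relies on.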


Introduction to SIMD programming

128-bit registers (SSE2, Altivec, SPE)

2 x double
4 x float
4 x int

introduce new vector types
1 vector float operation == 4 float operations
logical (and, or, cmp, ...), arithmetic (+, *, abs, ...), shuffling
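
To make "1 vector float operation == 4 float operations" concrete without Cell hardware, here is a minimal sketch using the generic vector extension of GCC and Clang. It is an assumption that such a compiler is available; the type float4 and the function madd4 are hypothetical names, not part of the SPE toolchain.

```c
/* GCC/Clang vector extension: four floats packed in one 16-byte value. */
typedef float float4 __attribute__((vector_size(16)));

/* One vector expression performs four multiply-adds, lane by lane:
   result[i] = a[i] * b[i] + c[i] for i = 0..3. */
static float4 madd4(float4 a, float4 b, float4 c)
{
    return a * b + c;
}
```

On the SPE the same operation is spelled spu_madd(a, b, c), as shown in the following slides.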


SIMD programming: the big picture

Not always SIMDizable

SIMD programming with libspe2 and gcc-spu

#include <spu_intrinsics.h>
avoid scalar types, use:

vec_float4 vec_double2 vec_char16 ...

d = spu_and(a, b); e = spu_madd(a, b, c);
spu-gcc pure_spe_prog.c -o pure_spe_prog.elf

Branch elimination

avoid branching (if / else)

c = spu_sel(a, b, spu_cmpgt(a, d));
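
What the spu_sel / spu_cmpgt pair computes can be emulated on a single 32-bit lane in portable C. The sketch below is not SPU code; select_bits, cmpgt_mask and select_if_greater are hypothetical names, and unsigned lanes are assumed to keep the arithmetic fully defined.

```c
#include <stdint.h>

/* Bitwise select: take the bit from b where mask is 1, from a where it is 0.
   Same argument order as spu_sel(a, b, mask). */
static uint32_t select_bits(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

/* All ones when a > d, all zeros otherwise, like one lane of spu_cmpgt. */
static uint32_t cmpgt_mask(uint32_t a, uint32_t d)
{
    return 0u - (uint32_t)(a > d);
}

/* Branch-free equivalent of: c = (a > d) ? b : a; */
static uint32_t select_if_greater(uint32_t a, uint32_t b, uint32_t d)
{
    return select_bits(a, b, cmpgt_mask(a, d));
}
```

The comparison produces a mask instead of steering control flow, so both outcomes are computed and the mask picks one of them.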

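A sample SPE program

    #include <spu_intrinsics.h>

    /* The union gives scalar access to the four lanes of the accumulator. */
    volatile union {
        vec_float4 vec;
        float part[4];
    } sum;

    /* xp and yp must be 16-byte aligned; only the first size / 4 * 4
       elements are summed, so size should be a multiple of 4. */
    float dot_product(const float *xp, const float *yp, const int size) {
        sum.vec = (vec_float4){0, 0, 0, 0};
        const vec_float4 *xvp = (const vec_float4 *)xp;
        const vec_float4 *yvp = (const vec_float4 *)yp;
        const vec_float4 *xvp_end = xvp + size / 4;
        /* __builtin_expect marks the loop condition as usually true */
        while (__builtin_expect(xvp < xvp_end, 1)) {
            sum.vec = spu_madd(*xvp, *yvp, sum.vec);
            xvp++;
            yvp++;
        }
        /* horizontal sum of the four partial results */
        return sum.part[0] + sum.part[1] + sum.part[2] + sum.part[3];
    }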
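
The sample above steps through the input four floats at a time, so any size % 4 leftover elements are not summed. A portable scalar sketch of the same accumulation pattern with an added tail loop could look as follows. It is an illustration only, not the SPE version, and dot_product_tail is a hypothetical name.

```c
#include <stddef.h>

/* Dot product with four independent accumulators, one per "lane",
   followed by a scalar loop for the size % 4 leftover elements. */
static float dot_product_tail(const float *x, const float *y, size_t size)
{
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;

    /* Main loop: each pass mimics one 4-wide multiply-add. */
    for (; i + 4 <= size; i += 4)
        for (size_t lane = 0; lane < 4; lane++)
            acc[lane] += x[i + lane] * y[i + lane];

    /* Horizontal sum of the four partial results. */
    float sum = acc[0] + acc[1] + acc[2] + acc[3];

    /* Tail: elements that do not fill a whole group of four. */
    for (; i < size; i++)
        sum += x[i] * y[i];
    return sum;
}
```

On the SPE the main loop would stay vectorized with spu_madd and only the tail would fall back to scalar code.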

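DMA with the SPUs' Memory Flow Controllers

    #include <spu_mfcio.h>

    /* main memory -> local store */
    mfc_get(&local_data, main_mem_data_ea, sizeof(local_data), DMA_TAG, 0, 0);

    /* local store -> main memory */
    mfc_put(&local_data, main_mem_data_ea, sizeof(local_data), DMA_TAG, 0, 0);

    /* get with barrier: ordered after earlier commands of the same tag group */
    mfc_getb(&local_data, main_mem_data_ea, sizeof(local_data), DMA_TAG, 0, 0);

    /* select the tag group, then block until all its transfers complete */
    mfc_write_tag_mask(1 << DMA_TAG);
    spu_mfcstat(MFC_TAG_UPDATE_ALL);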


Double buffering: the problem

Double buffering: the big picture

Double buffering with MFC

1. SPU queues MFC GET to fill buffer #1
2. SPU queues MFC GET to fill buffer #2
3. SPU waits for buffer #1 to finish filling
4. SPU processes buffer #1
5. SPU queues MFC PUT back content of buffer #1
6. SPU queues MFC GETB to refill buffer #1
7. SPU waits for buffer #2 to finish filling
8. SPU processes buffer #2 (...)
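
The control flow of the sequence above can be sketched in portable C, with memcpy standing in for the MFC commands. This is an assumption-laden illustration: fake_get, fake_put, process_double_buffered and CHUNK are hypothetical names, and because memcpy completes immediately the "wait" steps are no-ops here, so the sketch shows the buffer rotation only and not the actual overlap of transfer and computation.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 4 /* elements per buffer; hypothetical size for this sketch */

/* Stand-in for mfc_get: copy one chunk from "main memory" to a local buffer. */
static void fake_get(float *local, const float *main_mem)
{
    memcpy(local, main_mem, CHUNK * sizeof(float));
}

/* Stand-in for mfc_put: copy one chunk from a local buffer back to "main memory". */
static void fake_put(float *main_mem, const float *local)
{
    memcpy(main_mem, local, CHUNK * sizeof(float));
}

/* Doubles every element of data[0 .. nchunks * CHUNK) in place,
   alternating between two local buffers. */
static void process_double_buffered(float *data, size_t nchunks)
{
    float buf[2][CHUNK];
    if (nchunks == 0)
        return;

    fake_get(buf[0], data);                   /* fill the first buffer */
    for (size_t i = 0; i < nchunks; i++) {
        size_t cur = i & 1;                   /* buffer to process now */
        size_t next = cur ^ 1;                /* buffer to refill */

        if (i + 1 < nchunks)                  /* start fetching the next chunk */
            fake_get(buf[next], data + (i + 1) * CHUNK);

        for (size_t k = 0; k < CHUNK; k++)    /* process the current chunk */
            buf[cur][k] *= 2.0f;

        fake_put(data + i * CHUNK, buf[cur]); /* write the results back */
    }
}
```

With real MFC commands the refill of a buffer has to be ordered after the PUT that drains it, which is why step 6 uses GETB rather than a plain GET.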


Some resources

Cell BE Programming Tutorial (ibm.com, 190 pages)
IBM developerworks short programming tutorials
  Search for articles by Jonathan Bartlett
Barcelona Supercomputing Center (software)
  http://www.bsc.es/projects/deepcomputing/linuxoncell/
PS3 programming workshops (videos)
  http://www.cc.gatech.edu/~bader/CellProgramming.html
#ps3dev on freenode

Thanks, credits, licensing

Most schemas from excellent GFDL'd tutorial by Geoff Levand (Sony Corp)
  http://www.kernel.org/pub/linux/kernel/people/geoff/cell
Pictures and trademarks belong to their respective owners (Sony, IBM, Universities, Folding@Home, PS3GRID, ...)
All remaining work is GFDL

7 differences
