Programming with Linux on the PlayStation 3
FOSDEM 2008 [email protected]
Architecture overview: introducing the Cell BE
Installing Linux
SIMD programming in C/C++
Asynchronous data transfer with the DMA
Who am I
Java/Python developer at Nuxeo (FOSS document management server)
Interested in Artificial Intelligence (and need fast Support Vector Machines)
Slides to be published at: http://oliviergrisel.name
PS3 architecture overview
CPU: IBM Cell/[email protected]
  218 GFLOPS
  Main RAM: 256 MB XDR ([email protected])
GPU: Nvidia RSX
  1.8 TFLOPS (SP) / 356 GFLOPS programmable
  VRAM: 256 MB GDDR3 (2x128b @ 700 MHz)
System Bus: 2.5 GB/s
The Cell Broadband Engine
[email protected]
64-bit hyperthreaded PowerPC
512 KB L2 cache
128-bit SIMD optimized
256 KB SRAM
[email protected]
PS3 Clusters
Cheap cluster for academic researchers
Carolina State U. and U. Massachusetts at D.
8+1 cluster with ssh and MPI
PS3 GRID Computing
PS3GRID project
  based on BOINC
  30,000 atoms simulation
Folding@Home
  1 PFLOPS with 800 TFLOPS from PS3s
  BlueGene == 280 TFLOPS
Linux on the PS3
Lv1 Hypervisor shipped with the default firmware
Partition utility in the Sony Game OS menu
Choose your favorite distro:
Install a powerpc64 smp or ps3 kernel
Install gcc-spu + libspe2
Programming the Cell/BE in C
Program the PPE as a chief conductor to spread the numerical code to SPEs
Use POSIX threads to start SPE subroutines in parallel
Use SPE intrinsics to perform vector instructions
Eliminate branches as much as possible in SPE code
Align your data to 16 bytes
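The conductor pattern above can be sketched in portable C. This is only an illustration under stated assumptions: the worker below is an ordinary thread standing in for an SPE subroutine, and the names work_item, worker, parallel_sum and NUM_WORKERS are hypothetical. On a real Cell, each thread would instead block inside libspe2's spe_context_run() while its SPE executes the numerical kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 6  /* assumption: number of workers picked for illustration */

/* Hypothetical work descriptor handed to each worker thread. */
typedef struct {
    const float *data;  /* start of this worker's slice */
    size_t count;       /* number of floats in the slice */
    float result;       /* partial sum written by the worker */
} work_item;

/* Stand-in for the SPE subroutine: sums one slice of the input. */
static void *worker(void *arg) {
    work_item *w = arg;
    float acc = 0.0f;
    for (size_t i = 0; i < w->count; i++)
        acc += w->data[i];
    w->result = acc;
    return NULL;
}

/* The "conductor": split the input, start one thread per worker,
   wait for all of them, then combine the partial results. */
float parallel_sum(const float *data, size_t n) {
    assert(n % NUM_WORKERS == 0);  /* keeps the sketch simple */
    pthread_t threads[NUM_WORKERS];
    work_item items[NUM_WORKERS];
    size_t chunk = n / NUM_WORKERS;

    for (int i = 0; i < NUM_WORKERS; i++) {
        items[i].data = data + (size_t)i * chunk;
        items[i].count = chunk;
        items[i].result = 0.0f;
        int rc = pthread_create(&threads[i], NULL, worker, &items[i]);
        assert(rc == 0);
        (void)rc;
    }

    float total = 0.0f;
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(threads[i], NULL);
        total += items[i].result;
    }
    return total;
}
```

Calling parallel_sum(data, 12) on an array of twelve floats gives each of the six workers a slice of two elements.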
Introduction to SIMD programming
128-bit registers (SSE2, AltiVec, SPE)
2 x double
4 x float
4 x int
introduce new vector types
1 vector float operation == 4 float operations
logical (and, or, cmp, ...), arithmetic (+, *, abs, ...), shuffling
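To make "1 vector float operation == 4 float operations" concrete, here is a minimal sketch that runs on an ordinary GCC or Clang toolchain. It assumes the compiler's generic vector extension (the vector_size attribute) as a stand-in for the SPE vector types; the names float4 and saxpy4 are hypothetical and not part of any Cell SDK.

```c
#include <assert.h>
#include <string.h>

/* Assumption: GCC/Clang vector extension, 4 floats in 128 bits. */
typedef float float4 __attribute__((vector_size(16)));

/* Computes y[i] = a * x[i] + y[i], four floats per loop iteration.
   n must be a multiple of 4. */
void saxpy4(float a, const float *x, float *y, int n) {
    assert(n % 4 == 0);
    float4 va = { a, a, a, a };
    for (int i = 0; i < n; i += 4) {
        float4 vx, vy;
        /* memcpy keeps the loads and stores valid for any alignment. */
        memcpy(&vx, x + i, sizeof vx);
        memcpy(&vy, y + i, sizeof vy);
        vy = va * vx + vy;  /* one vector expression, four multiply-adds */
        memcpy(y + i, &vy, sizeof vy);
    }
}
```

With 16-byte-aligned data, the memcpy calls can be replaced by direct vector pointer dereferences.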
SIMD programming: the big picture
Not always SIMDizable
SIMD programming with libspe2 and gcc-spu
#include <spu_intrinsics.h>
avoid scalar types, use:
vec_float4 vec_double2 vec_char16 ...
d = spu_and(a, b); e = spu_madd(a, b, c);
spu-gcc pure_spe_prog.c -o pure_spe_prog.elf
Branch elimination
avoid branching (if / else)
c = spu_sel(a, b, spu_cmpgt(a, d));
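The compare-then-select idiom can be emulated one lane at a time in portable C to show what it computes: the comparison produces an all-ones or all-zeros mask, and the select picks bits of b where the mask is set and bits of a elsewhere, so c ends up as b where a > d and as a otherwise. The helpers cmpgt_lane, sel_lane and select_gt below are hypothetical stand-ins written for this sketch, not SPU intrinsics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical one-lane stand-in for spu_cmpgt:
   all ones if a > d, all zeros otherwise. */
static uint32_t cmpgt_lane(float a, float d) {
    return (uint32_t)0 - (uint32_t)(a > d);
}

/* Hypothetical one-lane stand-in for spu_sel:
   bits of b where mask is set, bits of a elsewhere. */
static float sel_lane(float a, float b, uint32_t mask) {
    uint32_t ua, ub, uc;
    float c;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    uc = (ua & ~mask) | (ub & mask);
    memcpy(&c, &uc, sizeof c);
    return c;
}

/* c[i] = (a[i] > d[i]) ? b[i] : a[i], with no data-dependent branch
   in the loop body. */
void select_gt(const float *a, const float *b, const float *d,
               float *c, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        c[i] = sel_lane(a[i], b[i], cmpgt_lane(a[i], d[i]));
}
```

Passing the same array for b and d turns this into a branch-free clamp of a to an upper limit.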
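A sample SPE program
#include <spu_intrinsics.h>

/* Accumulator viewed both as one vector and as four scalar lanes. */
volatile union {
    vec_float4 vec;
    float part[4];
} sum;

/* Dot product of two 16-byte-aligned float arrays.
   size is expected to be a multiple of 4. */
float dot_product(const float *xp, const float *yp, const int size) {
    sum.vec = (vec_float4){0, 0, 0, 0};
    vec_float4 *xvp = (vec_float4 *)xp;
    vec_float4 *yvp = (vec_float4 *)yp;
    vec_float4 *xvp_end = xvp + size / 4;
    /* Hint to the compiler that the loop condition is usually true. */
    while (__builtin_expect(xvp < xvp_end, 1)) {
        sum.vec = spu_madd(*xvp, *yvp, sum.vec);  /* sum += x * y, 4 lanes */
        xvp++;
        yvp++;
    }
    /* Horizontal add of the four partial sums. */
    return sum.part[0] + sum.part[1] + sum.part[2] + sum.part[3];
}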
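DMA with the SPUs' Memory Flow Controllers
#include <spu_mfcio.h>

/* Fetch from main memory (effective address) into the local store. */
mfc_get(&local_data, main_mem_data_ea, sizeof(local_data), DMA_TAG, 0, 0);
/* Write the local store buffer back to main memory. */
mfc_put(&local_data, main_mem_data_ea, sizeof(local_data), DMA_TAG, 0, 0);
/* Get with barrier: ordered after earlier commands in the same tag group. */
mfc_getb(&local_data, main_mem_data_ea, sizeof(local_data), DMA_TAG, 0, 0);
/* Select the tag group to wait on, then block until its transfers complete. */
mfc_write_tag_mask(1 << DMA_TAG);
spu_mfcstat(MFC_TAG_UPDATE_ALL);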
Double buffering: the problem
Double buffering: the big picture
Double buffering with MFC
1. SPU queues MFC GET to fill buffer #1
2. SPU queues MFC GET to fill buffer #2
3. SPU waits for buffer #1 to finish filling
4. SPU processes buffer #1
5. SPU queues MFC PUT back content of buffer #1
6. SPU queues MFC GETB to refill buffer #1
7. SPU waits for buffer #2 to finish filling
8. SPU processes buffer #2 (...)
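The control flow of these steps can be sketched in portable C. This is a simulation under loud assumptions: fake_get and fake_put are hypothetical stand-ins built on memcpy, which completes immediately, whereas real MFC transfers are asynchronous and need a tag-status wait before a buffer is touched. CHUNK, process and double_buffered_scale are likewise names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* floats per buffer; assumption chosen for illustration */

/* Hypothetical stand-in for mfc_get: main memory -> local buffer. */
static void fake_get(float *local, const float *main_mem) {
    memcpy(local, main_mem, CHUNK * sizeof(float));
}

/* Hypothetical stand-in for mfc_put: local buffer -> main memory. */
static void fake_put(const float *local, float *main_mem) {
    memcpy(main_mem, local, CHUNK * sizeof(float));
}

/* Placeholder computation: double every value in the buffer. */
static void process(float *buf) {
    for (int i = 0; i < CHUNK; i++)
        buf[i] *= 2.0f;
}

/* Scales n_chunks * CHUNK floats in place, alternating two buffers. */
void double_buffered_scale(float *main_mem, size_t n_chunks) {
    float buf[2][CHUNK];
    assert(main_mem != NULL || n_chunks == 0);
    if (n_chunks == 0)
        return;

    fake_get(buf[0], main_mem);  /* prime the first buffer */
    for (size_t i = 0; i < n_chunks; i++) {
        int cur = (int)(i & 1);
        int next = cur ^ 1;
        /* Start fetching the next chunk into the idle buffer. With real
           DMA this would be a GETB so it is ordered after the earlier
           PUT that used the same buffer. */
        if (i + 1 < n_chunks)
            fake_get(buf[next], main_mem + (i + 1) * CHUNK);
        /* Real code would wait here for buf[cur]'s transfer to finish. */
        process(buf[cur]);
        fake_put(buf[cur], main_mem + i * CHUNK);
    }
}
```

On real hardware the benefit comes from the fetch of the next chunk overlapping with process() on the current one; the simulation only reproduces the ordering of the steps.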
Some resources
Cell BE Programming Tutorial (ibm.com, 190 pages)
IBM developerWorks short programming tutorials
  Search for articles by Jonathan Bartlett
Barcelona Supercomputing Center (software)
  http://www.bsc.es/projects/deepcomputing/linuxoncell/
PS3 programming workshops (videos)
  http://www.cc.gatech.edu/~bader/CellProgramming.html
#ps3dev on freenode
Thanks, credits, licensing
Most schemas from excellent GFDL'd tutorial by Geoff Levand (Sony Corp)
http://www.kernel.org/pub/linux/kernel/people/geoff/cell
Pictures and trademarks belong to their respective owners (Sony, IBM, Universities, Folding@Home, PS3GRID, ...)
All remaining work is GFDL
7 differences