Cloud Computing Notes
A Generic Cloud Architecture
Architecture
The architecture consists of 3 tiers:
◦ Cloud Deployment Model
◦ Cloud Service Model
◦ Essential Characteristics of Cloud Computing
Essential Characteristics 1
On-demand self-service.
◦ A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically, without requiring human interaction with a service provider.
Essential Characteristics 2
Broad network access.
◦ Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs) as well as other traditional or cloud-based software services.
Essential Characteristics 3
Resource pooling.
◦ The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand.
Essential Characteristics 4
Rapid elasticity.
◦ Capabilities can be rapidly and elastically provisioned - in some cases automatically - to quickly scale out, and rapidly released to quickly scale in.
◦ To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
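As a rough illustration of scaling out and scaling in, the sketch below computes how many instances a service would need for a given load. The function name, the requests-per-second unit, and the minimum and maximum limits are assumptions made for this example; they are not part of the NIST definition.

```python
import math


def desired_instances(load_rps: float, capacity_per_instance: float,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Return the instance count needed for the current load (hypothetical policy)."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Round up so that the provisioned capacity always covers the load.
    needed = math.ceil(load_rps / capacity_per_instance)
    # Clamp to the configured limits: scale out up to max, scale in down to min.
    return max(min_instances, min(max_instances, needed))
```

Calling the function again as the load changes gives the new target: a rising load scales out, a falling load scales in.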
Essential Characteristics 5
Measured service.
◦ Cloud systems automatically control and optimize resource usage by leveraging a metering capability at some level of abstraction appropriate to the type of service.
◦ Resource usage can be monitored, controlled, and reported - providing transparency for both the provider and consumer of the service.
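A metering capability can be pictured as a small ledger that records usage per consumer and per resource and can report it back. The sketch below is only illustrative: the class name, method names, and resource labels are hypothetical and do not correspond to any provider's API.

```python
from collections import defaultdict


class UsageMeter:
    """Toy usage ledger keyed by (consumer, resource)."""

    def __init__(self) -> None:
        self._usage = defaultdict(float)

    def record(self, consumer: str, resource: str, amount: float) -> None:
        """Add a usage sample, e.g. CPU hours or gigabytes stored."""
        if amount < 0:
            raise ValueError("usage amount cannot be negative")
        self._usage[(consumer, resource)] += amount

    def report(self, consumer: str) -> dict:
        """Return the usage totals that are visible to one consumer."""
        return {res: amt for (c, res), amt in self._usage.items() if c == consumer}
```

Because both sides can read the same totals, the report gives the transparency for provider and consumer described above.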
3.2 NIST (National Institute of Standards and Technology) Background
The goal is to accelerate the federal government’s adoption of secure and effective cloud computing to reduce costs and improve services.
Cloud Computing Reference Architecture:
Interactions between the Actors in Cloud Computing
Example Usage Scenario 1:
A cloud consumer may request service from a cloud broker instead of contacting a cloud provider directly.
The cloud broker may create a new service by combining multiple services or by enhancing an existing service.
Usage Scenario - Cloud Brokers
In this example, the actual cloud providers are invisible to the cloud consumer.
The cloud consumer interacts directly with the cloud broker.
Example Usage Scenario 2
Cloud carriers provide the connectivity and transport of cloud services from cloud providers to cloud consumers.
A cloud provider participates in and arranges for two unique service level agreements (SLAs), one with a cloud carrier (e.g. SLA2) and one with a cloud consumer (e.g. SLA1).
Usage Scenario for Cloud Carriers
⮚ A cloud provider arranges service level agreements (SLAs) with a cloud carrier.
⮚ The provider may request dedicated and encrypted connections to ensure delivery of the cloud services.
Example Usage Scenario 3
• For a cloud service, a cloud auditor conducts independent assessments of the operation and security of the cloud service implementation.
• The audit may involve interactions with both the Cloud Consumer and the Cloud Provider.
Cloud Consumer
The consumers of SaaS can be organizations that provide their members with access to software applications, end users, or software application administrators.
SaaS consumers can be billed based on the number of end users, the time of use, the network bandwidth consumed, the amount of data stored, or the duration of stored data.
Cloud consumers of PaaS can employ the tools and execution resources provided by cloud providers to develop, test, deploy and manage applications.
PaaS consumers can be application developers or application testers who run and test applications in cloud-based environments.
PaaS consumers can be billed according to the processing, database storage and network resources consumed.
Consumers of IaaS have access to virtual computers, network-accessible storage and network infrastructure components.
The consumers of IaaS can be system developers, system administrators and IT managers.
IaaS consumers are billed according to the amount or duration of the resources consumed, such as CPU hours used by virtual computers and the volume and duration of data stored.
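The IaaS billing rule (amount or duration of resources consumed) can be worked through as a small calculation. The rates below are invented for illustration, and the function name is hypothetical; real providers publish their own price lists and billing granularity.

```python
from decimal import Decimal

# Hypothetical prices, chosen only to make the arithmetic easy to follow.
CPU_HOUR_RATE = Decimal("0.05")   # price per CPU hour
GB_MONTH_RATE = Decimal("0.02")   # price per gigabyte stored per month


def iaas_bill(cpu_hours, stored_gb, months_stored) -> Decimal:
    """Charge for compute by CPU hours and for storage by volume times duration."""
    compute = Decimal(str(cpu_hours)) * CPU_HOUR_RATE
    storage = Decimal(str(stored_gb)) * Decimal(str(months_stored)) * GB_MONTH_RATE
    return compute + storage
```

For example, 100 CPU hours plus 50 GB kept for 2 months comes to 100 x 0.05 + 50 x 2 x 0.02 = 7.00 under these assumed rates. Decimal is used rather than float so that currency amounts stay exact.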
Cloud Provider
A cloud provider is a person or an organization.
It is the entity responsible for making a service available to interested parties.
A Cloud Provider acquires and manages the computing infrastructure required for providing the services.
It runs the cloud software that provides the services.
It makes arrangements to deliver the cloud services to the Cloud Consumers through network access.
Cloud Provider - Major Activities
Cloud Auditor
A cloud auditor is a party that can perform an independent examination of cloud service controls.
Audits are performed to verify conformance to standards through review of objective evidence.
A cloud auditor can evaluate the services provided by a cloud provider in terms of security controls, privacy impact, performance, etc.
Cloud Broker
Integration of cloud services can be too complex for cloud consumers to manage.
A cloud consumer may request cloud services from a cloud broker, instead of contacting a cloud provider directly.
A cloud broker is an entity that manages the use, performance and delivery of cloud services, and negotiates relationships between cloud providers and cloud consumers.
Services of cloud broker
Service Intermediation:
A cloud broker enhances a given service by improving some specific capability and providing value-added services to cloud consumers.
Service Aggregation:
A cloud broker combines and integrates multiple services into one or more new services.
The broker provides data integration and ensures secure data movement between the cloud consumer and multiple cloud providers.
Service Arbitrage:
Service arbitrage is similar to service aggregation except that the services being aggregated are not fixed.
Service arbitrage means a broker has the flexibility to choose services from multiple agencies.
E.g.: The cloud broker can use a credit-scoring service to measure and select an agency with the best score.
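The arbitrage idea of choosing among agencies at request time can be sketched as a selection over scored offers. The `Offer` type and `choose_provider` function are hypothetical names for this example, and how the score is computed is left open, as in the notes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Offer:
    provider: str   # name of the agency or provider making the offer
    score: float    # higher is better; the scoring method is an assumption


def choose_provider(offers: Sequence[Offer]) -> Offer:
    """Pick the offer with the best score from whichever offers are available now."""
    if not offers:
        raise ValueError("no offers to choose from")
    return max(offers, key=lambda offer: offer.score)
```

Because the list of offers is passed in on every call, the set of candidate services is not fixed, which is the difference from service aggregation.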
Cloud Carrier
A cloud carrier acts as an intermediary that provides connectivity and transport of cloud services between cloud consumers and cloud providers.
Cloud carriers provide access to consumers through the network.
The distribution of cloud services is normally provided by network and telecommunication carriers or a transport agent.
A transport agent refers to a business organization that provides physical transport of storage media such as high-capacity hard drives and other access devices.
Scope of Control between Provider and Consumer
The Cloud Provider and Cloud Consumer share the control of resources in a cloud system.
The application layer includes software applications targeted at end users or programs.
The applications are used by SaaS consumers, or installed/managed/maintained by PaaS consumers, IaaS consumers and SaaS providers.
The middleware layer provides software building blocks (e.g., libraries, database, and Java virtual machine) for developing application software in the cloud.
The middleware is used by PaaS consumers, installed/managed/maintained by IaaS consumers or PaaS providers, and hidden from SaaS consumers.
The OS layer includes the operating system and drivers, and is hidden from SaaS consumers and PaaS consumers.
An IaaS cloud allows one or multiple guest OSs to run virtualized on a single physical host.
The IaaS consumers should assume full responsibility for the guest OS, while the IaaS provider controls the host OS.
3.3 Cloud Deployment Model
Public Cloud
Private Cloud
Hybrid Cloud
Community Cloud
3.3.1 Public cloud
A public cloud is one in which the cloud infrastructure and computing resources are made available to the general public over a public network.
A public cloud is meant to serve a multitude (huge number) of users, not a single customer.
A fundamental characteristic of public clouds is multitenancy.
Multitenancy allows multiple users to work in a software environment at the same time, each with their own resources.
Built over the Internet (i.e., the service provider offers resources, applications and storage to the customers over the Internet) and can be accessed by any user.
Owned by service providers and accessible through a subscription.
Best option for small enterprises, which are able to start their businesses without large up-front (initial) investment.
By renting the services, customers are able to dynamically upsize or downsize their IT according to the demands of their business.
Services are offered on a price-per-use basis.
Promotes standardization and preserves capital investment.
Public clouds have geographically dispersed datacenters to share the load of users and better serve them according to their locations.
The provider is in control of the infrastructure.
Examples:
o Amazon EC2 is a public cloud that provides Infrastructure as a Service.
o Google App Engine is a public cloud that provides Platform as a Service.
o SalesForce.com is a public cloud that provides Software as a Service.
Advantage
Offers unlimited scalability - on-demand resources are available to meet your business needs.
Lower costs - no need to purchase hardware or software, and you pay only for the service you use.
No maintenance - the service provider provides the maintenance.
Offers reliability - a vast number of resources are available, so failure of a system will not interrupt service.
Services like SaaS, PaaS and IaaS are easily available on a public cloud platform, as it can be accessed from anywhere through any Internet-enabled device.
Location independent - the services can be accessed from any location.
Disadvantage
No control over privacy or security.
Cannot be used for sensitive applications (government and military agencies will not consider a public cloud).
Lacks complete flexibility (since it depends on the provider).
No stringent (strict) protocols regarding data management.
3.3.2 Private Cloud
Cloud services are used by a single organization and are not exposed to the public.
Services are always maintained on a private network, and the hardware and software are dedicated only to a single organization.
A private cloud is physically located at
● the organization’s premises [On-site private clouds] (or)
● a third party to which it is outsourced (given) [Outsourced private clouds]
It may be managed either by
● the cloud consumer organization (or)
● a third party
Private clouds are used by
● government agencies
● financial institutions
● mid-size to large-size organisations.
On-site private clouds
Fig: On-site private clouds
Out-sourced Private Cloud
Supposed to deliver a more efficient and convenient cloud.
Offers higher efficiency, resiliency (ability to recover quickly), security, and privacy.
Customer information protection: In-house security is easier to maintain and rely on.
● Follows its own (the private organization's) standard procedures and operations (whereas in a public cloud the standard procedures and operations of the service provider are followed)
Advantage
Offers greater security and privacy
Organization has control over resources
Highly reliable
Saves money by virtualizing the resources
Disadvantage
Expensive when compared to public cloud
Requires IT expertise to maintain resources.
3.3.3 Hybrid Cloud
Built with both public and private clouds.
It is a heterogeneous cloud resulting from the combination of private and public clouds.
Private clouds are used for
● sensitive applications, which are kept inside the organization’s network
● business-critical operations like financial reporting
Public clouds are used for
● other services, which are kept outside the organization’s network
● high volumes of data
● lower-security needs such as web-based email (Gmail, Yahoo Mail etc.)
Public cloud resources or services are temporarily leased for the time required and then released. This practice is also known as cloud bursting.
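Cloud bursting can be pictured as a simple split of demand: the private cloud serves what it can, and only the overflow is leased from the public cloud. The function below is a minimal sketch; its name and the abstract "units" of capacity are assumptions for this example.

```python
from typing import Tuple


def split_load(demand: int, private_capacity: int) -> Tuple[int, int]:
    """Return (units served privately, units burst to the public cloud)."""
    if demand < 0 or private_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    private_units = min(demand, private_capacity)
    # Anything above the private capacity is leased temporarily from the public cloud.
    public_units = demand - private_units
    return private_units, public_units
```

When demand falls back under the private capacity, the public share returns to zero, which corresponds to releasing the leased resources.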
Fig: Hybrid Cloud
Advantage
It is scalable
Offers better security
Flexible - additional resources are availed in the public cloud when needed
Cost-effectiveness - we have to pay for extra resources only when needed.
Control - the organisation can maintain a private infrastructure for sensitive applications
Disadvantage
Infrastructure dependency
Possibility of security breach (violation) through the public cloud
Difference Public Private Hybrid
These models are offered based on various SLAs between providers and users.
The SLA of cloud computing covers
o service availability
o performance
o data protection
o security
3.4.1 Software as a Service (SaaS) (Complete software offering on the cloud)
SaaS is a licensed software offering on the cloud, paid per use.
SaaS is a software delivery methodology that provides licensed multi-tenant access to software and its functions remotely as a Web-based service.
◦ Usually billed based on usage
◦ Usually multitenant environment
◦ Highly scalable architecture
Customers do not invest in software application programs.
The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure.
The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email).
The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, data or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
On the customer side, there is no upfront investment in servers or software licensing.
It is a “one-to-many” software delivery model, whereby an application is shared across multiple users.
Characteristics of an Application Service Provider (ASP)
o The product sold to the customer is application access.
o The application is centrally managed by the service provider.
o The service is delivered one-to-many to customers.
o Services are delivered under contract.
E.g. Gmail and Docs, Microsoft SharePoint, and CRM (Customer Relationship Management) software
SaaS providers
Google’s Gmail, Docs, Talk etc.
Microsoft’s Hotmail, SharePoint
SalesForce
Yahoo
Facebook
3.4.2 Infrastructure as a Service (IaaS) (Hardware offerings on the cloud)
IaaS is the delivery of technology infrastructure (mostly hardware) as an on-demand, scalable service.
◦ Usually billed based on usage
◦ Usually multitenant virtualized environment
◦ Can be coupled with Managed Services for OS and application support
◦ The user can choose the OS, storage, deployed applications and networking components
◦ The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources.
◦ The consumer is able to deploy and run arbitrary software, which may include operating systems and applications.
◦ The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage and deployed applications.
In Platform as a Service (PaaS), customers are provided with an execution platform for developing applications.
The execution platform includes the operating system, programming language execution environment, database, web server, hardware etc.
This acts as middleware on top of which applications are built.
The user is freed from managing the cloud infrastructure.
Application management is the core functionality of the middleware.
It provides the runtime (execution) environment.
Developers design their applications in the execution environment.
Developers need not be concerned about hardware (physical or virtual), operating systems, and other resources.
The PaaS core middleware manages the resources and the scaling of applications on demand.
PaaS offers
o Execution environment and hardware resources (infrastructure) (or)
o software that is installed on the user premises
PaaS: The service provider provides the execution environment and hardware resources (infrastructure).
Characteristics of PaaS
Runtime framework: Executes end-user code according to the policies set by the user and the provider.
Abstraction: PaaS helps to deploy (install) and manage applications on the cloud.
Automation: Automates the process of deploying applications to the infrastructure; additional resources are provided when needed.
Cloud services: Help developers to simplify the creation and delivery of cloud applications.
PaaS providers
Google App Engine
◦ Python, Java, Eclipse
Microsoft Azure
◦ .Net, Visual Studio
SalesForce
◦ Apex, Web wizard
TIBCO
VMware
Zoho
Cloud Computing - Services
❖ Software as a Service - SaaS
❖ Platform as a Service - PaaS
❖ Infrastructure as a Service - IaaS
Category   Description                                  Product Type                              Vendors and Products
PaaS-I     Execution platform is provided along with    Middleware + Infrastructure               Force.com, Longjump
           hardware resources (infrastructure)
PaaS-II    Execution platform is provided with          Middleware + Infrastructure, Middleware   Google App Engine
           additional components
3.5 Architectural Design Challenges
Challenge 1: Service Availability and Data Lock-in Problem
Service Availability
Service availability in the cloud might be affected because of
o Single Point of Failure
o Distributed Denial of Service
Single Point of Failure
o Depending on a single service provider might result in failure.
o In the case of a single service provider, even if the company has multiple data centres located in different geographic regions, it may have common software infrastructure and accounting systems.
Solution:
o Multiple cloud providers may provide more protection from failures, and they provide High Availability (HA).
o Using multiple cloud providers protects against the loss of all data.
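The multi-provider remedy can be sketched as a client that tries each provider in turn and falls back when one is unreachable. This is an illustrative pattern, not a specific product feature; the function name and the idea of passing each provider as a `(name, fetch)` pair are assumptions for this example.

```python
from typing import Callable, Sequence, Tuple


def fetch_with_failover(providers: Sequence[Tuple[str, Callable[[str], str]]],
                        key: str) -> Tuple[str, str]:
    """Try each provider in order; return (provider name, data) from the first that works."""
    failures = []
    for name, fetch in providers:
        try:
            return name, fetch(key)
        except (ConnectionError, TimeoutError) as exc:
            # Remember the failure and move on to the next provider.
            failures.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(failures))
```

The sketch assumes the data has already been replicated to every provider in the list; keeping those copies consistent is a separate problem that it does not address.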
Distributed Denial of Service (DDoS) attacks
o Cybercriminals attack target websites and online services and make the services unavailable to users.
o A DDoS attack tries to overwhelm a service, making it unavailable to users, by sending more traffic than the server or network can accommodate.
Solution:
o Some SaaS providers provide the opportunity to defend against DDoS attacks by using quick scale-ups.
Data Lock-in
Data lock-in is a situation in which a customer using the service of one provider cannot move to another service provider, because the technologies used by that provider are incompatible with those of other providers.
Customers cannot easily extract their data and programs from one site to run on another.
This makes a customer dependent on a vendor for services and unable to use the service of another vendor.
Solution:
o Have standardization (in technologies) among service providers so that customers can deploy (install) services and data across multiple cloud providers and easily move from one service provider to another.
Challenge 2: Data Privacy and Security Concerns
Cloud services are prone to attacks because they are accessed through the Internet.
Security is provided by
o Storing encrypted data in the cloud.
o Firewalls and filters.
Cloud environment attacks include
o Guest hopping
o Hijacking
o VM rootkits.
Guest Hopping: Virtual machine hyper jumping (VM jumping) is an attack method that exploits a weakness in the hypervisor, allowing one virtual machine (VM) to be accessed from another.
Hijacking: Hijacking is a type of network security attack in which the attacker takes control of a communication.
VM Rootkit: A collection of malicious (harmful) computer software, designed to enable access to a computer that is not otherwise allowed.
A man-in-the-middle (MITM) attack is a form of eavesdropping (spying) where communication between two users is monitored and modified by an unauthorized party.
o A man-in-the-middle attack may take place during VM migration [virtual machine (VM) migration - a VM is moved from one physical host to another host].
Passive attacks steal sensitive data or passwords.
Active attacks may manipulate (control) kernel data structures, which will cause major damage to cloud servers.
Challenge 3: Unpredictable Performance and Bottlenecks
Multiple VMs can share CPUs and main memory in cloud computing, but I/O sharing is problematic.
Internet applications continue to become more data-intensive (handling huge amounts of data).
Handling huge amounts of data (data-intensive work) is a bottleneck in the cloud environment.
Weak servers that do not transfer data properly must be removed from the cloud environment.
Challenge 4: Distributed Storage and Widespread Software Bugs
The database is always growing in cloud applications.
There is a need to create a storage system that meets this growth.
This demands the design of efficient distributed SANs (Storage Area Networks of storage devices).
Data centres must meet
o Scalability
o Data durability
o HA (High Availability)
o Data consistency
A bug is an error in software. Debugging must be done in data centres.
Challenge 5: Cloud Scalability, Interoperability and Standardization
Cloud Scalability
Cloud resources are scalable. Cost increases when storage and network bandwidth are scaled (increased).
Interoperability
Open Virtualization Format (OVF) describes an open, secure, portable, efficient, and extensible format for the packaging and distribution of VMs.
OVF defines a transport mechanism for VMs that can be applied to different virtualization platforms.
Standardization
Cloud standardization should give a virtual machine the ability to run on any virtual platform.
Challenge 6: Software Licensing and Reputation Sharing
Cloud providers can use both pay-for-use and bulk-use licensing schemes to widen the business coverage.
Cloud providers must create reputation-guarding services similar to the “trusted e-mail” services.
Cloud providers want legal liability to remain with the customer, and vice versa.
3.6 Cloud Storage
Cloud storage means storing your data on the storage of a cloud service provider rather than on a local system.
Data stored on the cloud is accessed through the Internet.
The cloud service provider provides Storage as a Service.
3.6.1 Storage as a Service
A third-party provider rents space on their storage to cloud users.
Customers move to cloud storage when they lack the budget for having their own storage.
Storage service providers take responsibility for current backup, replication, and disaster recovery needs.
Small and medium-sized businesses can make use of cloud storage.
Storage is rented from the provider using a
o cost-per-gigabyte-stored (or)
o cost-per-data-transferred
model.
The end user doesn’t have to pay for infrastructure (resources); they pay only for how much they transfer and save on the provider’s storage.
5.2Providers
Google Docs allows users to upload documents, spreadsheets, and presentations to Google’s data servers.
Those files can then be edited using a Google application.
Web email providers like Gmail, Hotmail, and Yahoo! Mail store email messages on their own servers.
Users can access their email from computers and other devices connected to the Internet.
Flickr and Picasa host millions of digital photographs. Users can create their own online photo albums.
YouTube hosts millions of user-uploaded video files.
Hostmonster and GoDaddy store files and data for many client websites.
Facebook and MySpace are social networking sites and allow members to post pictures and other content. That content is stored on the company’s servers.
MediaMax and Strongspace offer storage space for any kind of digital data.
3.6.2 Data Security
To secure data, most systems use a combination of techniques:
o Encryption
o Authentication
o Authorization
Encryption
o Algorithms are used to encode information. To decode the information, keys are required.
Authentication processes
o This requires a user to create a name and password.
Authorization practices
o The client lists the people who are authorized to access information stored on the cloud system.
If information is stored on the cloud, the head of the IT department might have complete and free access to everything.
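The authentication and authorization steps can be sketched with Python's standard library. This is a minimal illustration, not how any particular storage provider implements them: the function names, the iteration count, and the dictionary-based access list are assumptions. Encryption is left out because the standard library has no general-purpose cipher; a vetted cryptography library would be needed for that step.

```python
import hashlib
import hmac
import os
from typing import Dict, Optional, Set, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a salted hash so the plain password is never stored."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Authentication: recompute the hash and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


def is_authorized(user: str, resource: str, acl: Dict[str, Set[str]]) -> bool:
    """Authorization: the client-supplied list says who may access each resource."""
    return user in acl.get(resource, set())
```

Authentication answers who the user is; authorization then decides what that user may access, so the two checks are kept as separate functions.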
Reliability
Service providers give reliability for data through redundancy (maintaining multiple copies of data).
Reputation is important to cloud storage providers. If there is a perception that the provider is unreliable, they won’t have many clients.
Advantages
Cloud storage providers balance server loads.
They move data among various datacenters, ensuring that information is stored close to where it is used and is therefore available quickly.
Cloud storage allows the data to be protected in case there’s a disaster.
Some products are agent-based, and the application automatically transfers information to the cloud via FTP.
Cautions
Don’t commit everything to the cloud, but use it for a few, noncritical purposes.
Large enterprises might have difficulty with vendors like Google or Amazon.
They may be forced to rewrite solutions for their applications.
Lack of portability.
Theft (Disadvantage)
User data could be stolen or viewed by those who are not authorized to see it.
Whenever user data leaves the user's own datacenter, there is a risk from a security point of view.
If users store data on the cloud, they should make sure the data is encrypted and that data in transit is secured with technologies like SSL.
3.7 Cloud Storage Providers
Amazon Simple Storage Service (S3)
The best-known cloud storage service is Amazon’s Simple Storage Service (S3), launched in 2006.
Amazon S3 is designed to make computing easier for developers.
Amazon S3 provides an interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the Web.
Amazon S3 is intentionally built with a minimal feature set that includes the following functionality:
• Write, read, and delete objects containing from 1 byte to 5 gigabytes of data each. The number of objects that can be stored is unlimited.
• Each object is stored and retrieved via a unique developer-assigned key.
• Objects can be made private or public, and rights can be assigned to specific users.
• Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.
Design Requirements
Amazon built S3 to fulfill the following design requirements:
• Scalable: Amazon S3 can scale in terms of storage, request rate, and users to support an unlimited number of web-scale applications.
• Reliable: Store data durably, with 99.99 percent availability. Amazon says it does not allow any downtime.
• Fast: Amazon S3 was designed to be fast enough to support high-performance applications. Server-side latency must be insignificant relative to Internet latency. Any performance bottlenecks can be fixed by simply adding nodes to the system.
• Inexpensive: Amazon S3 is built from inexpensive commodity hardware components. As a result, frequent node failure is the norm and must not affect the overall system. It must be hardware-agnostic, so that savings can be captured as Amazon continues to drive down infrastructure costs.
• Simple: Building highly scalable, reliable, fast, and inexpensive storage is difficult. Doing so in a way that makes it easy to use for any application anywhere is more difficult. Amazon S3 must do both.
Design Principles
Amazon used the following principles of distributed system design to meet Amazon S3 requirements:
• Decentralization: It uses fully decentralized techniques to remove scaling bottlenecks and single points of failure.
• Autonomy: The system is designed such that individual components can make decisions based on local information.
• Local responsibility: Each individual component is responsible for achieving its consistency; this is never the burden of its peers.
• Controlled concurrency: Operations are designed such that no or limited concurrency control is required.
• Failure toleration: The system considers the failure of components to be a normal mode of operation and continues operation with no or minimal interruption.
• Controlled parallelism: Abstractions used in the system are of such granularity that parallelism can be used to improve performance and robustness of recovery or the introduction of new nodes.
• Small, well-understood building blocks: Do not try to provide a single service that does everything for everyone, but instead build small components that can be used as building blocks for other services.
• Symmetry: Nodes in the system are identical in terms of functionality, and require no or minimal node-specific configuration to function.
• Simplicity: The system should be made as simple as possible, but no simpler.
How S3 Works
Amazon keeps its lips pretty tight about how S3 works, but according to Amazon, S3’s design aims to provide scalability, high availability, and low latency at commodity costs. S3 stores arbitrary objects at up to 5 GB in size, and each is accompanied by up to 2 KB of metadata. Objects are organized by buckets. Each bucket is owned by an AWS account, and the buckets are identified by a unique, user-assigned key.
Buckets and objects are created, listed, and retrieved using either a REST-style or SOAP interface.
Objects can also be retrieved using the HTTP GET interface or via BitTorrent. Requests are authorized using an access control list associated with each bucket and object, which restricts who can access the data. Bucket names and keys are formulated so that objects can be accessed using HTTP, for instance:
http://s3.amazonaws.com/examplebucket/examplekey
http://examplebucket.s3.amazonaws.com/examplekey
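The two URL forms above can be assembled from a bucket name and a key. The helper names below are hypothetical and exist only to show the two layouts; they simply reproduce the patterns listed in the notes.

```python
from urllib.parse import quote

S3_HOST = "s3.amazonaws.com"


def path_style_url(bucket: str, key: str) -> str:
    """Bucket appears as the first path segment."""
    return f"http://{S3_HOST}/{bucket}/{quote(key)}"


def virtual_hosted_url(bucket: str, key: str) -> str:
    """Bucket appears as a subdomain of the S3 host."""
    return f"http://{bucket}.{S3_HOST}/{quote(key)}"
```

`quote` percent-encodes characters such as spaces in the key while leaving `/` intact, so keys that look like paths still form valid URLs.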
The Amazon AWS Authentication tools allow the bucket owner to create an authenticated URL that is valid for a set amount of time.
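The idea of a time-limited authenticated URL can be illustrated with a generic HMAC signature over the URL and an expiry timestamp. This is not Amazon's actual signing scheme: the query parameter names, the message layout, and both function names are assumptions made for this sketch.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode


def _signature(base_url: str, expires: int, secret: bytes) -> str:
    """HMAC over the URL and expiry time; only holders of the secret can produce it."""
    message = f"{base_url}\n{expires}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, secret: bytes, ttl_seconds: int,
             now: Optional[int] = None) -> str:
    """Return the URL with an expiry time and signature appended as query parameters."""
    now = int(time.time()) if now is None else int(now)
    expires = now + ttl_seconds
    query = urlencode({"expires": expires,
                       "signature": _signature(base_url, expires, secret)})
    return f"{base_url}?{query}"


def verify_url(signed_url: str, secret: bytes, now: Optional[int] = None) -> bool:
    """Accept the URL only if the signature matches and the expiry has not passed."""
    now = int(time.time()) if now is None else int(now)
    base_url, _, query = signed_url.partition("?")
    params = parse_qs(query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    return hmac.compare_digest(signature, _signature(base_url, expires, secret))
```

Because the expiry time is part of the signed message, a holder of the URL cannot extend its lifetime by editing the `expires` parameter without invalidating the signature.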