Distributed Computing
Distribution, Part II: The History of Distributed Computing
In the beginning...
●
The first thing you probably think of is
Mainframe Computing
– That’s distributed, right?
– The computer’s over there, my terminal is over here…
– There are many terminals, gotta be distributing something, right?
●
But this isn’t distributed computing, as all the
compute is in one place.
In the beginning...
●
Distribution first arose when a single organisation could have multiple computers.
●
The problem was one of resource sharing (on ARPANET, circa 1976, no less).
●
Actually predates the TCP/IP stack.
– Used NCP, the Network Control Program.
●
Most RPC stacks were hack jobs for single-purpose systems.
Scalability? Who needs that?
Xerox PARC
●
Special projects research lab owned by Xerox (you’ll
likely know them for their printers)
●
Invented a Xerox-specific RPC system for Xerox machines.
●
This was based on Xerox’s understanding that the
future of computing would have many computers per
organisation.
●
They also invented the first GUI.
– A little company called Apple stole it, though.
Sun ONC RPC
●
The year is 1984, and Sun Microsystems has
not invented Java yet.
●
They do have very cool Unix workstations, though (Motorola 68000-based at this point; Sun’s own SPARC RISC architecture arrived in 1987).
●
And they have a problem:
– Hey, I need that file from over there, but I don’t want
to pay for it to be shipped in floppy disk format to
me. Surely I can use the company network to get it,
right?
Sun ONC RPC
●
Sadly no, you could not, as remote file mounts had
not been invented.
– Actually, remote anything was a bit of a far-fetched idea.
●
So naturally, Sun invented the Network File System
(NFS)
– The descendant of this system runs the home directory shares in the labs!
●
Under the hood, NFS ran on a general-purpose remote procedure call mechanism called Open Network Computing Remote Procedure Call (ONC RPC).
Sun ONC RPC
●
ONC RPC was wildly popular, as it was open source (BSD), generic, and well structured.
●
The problem was that it mainly defined an RPC protocol; the widely available library that actually did it (along with the rpcgen tool) was written in C.
●
So unless you were using C, you were going to
have a bad time.
DCE/RPC
●
In the early 1990s, IBM got around to doing RPC properly
– Of course they couldn’t do it alone, so they got HP, DEC, and others together to help out.
●
The Open Software Foundation (OSF) defined a new framework called the Distributed Computing Environment (DCE), with DCE/RPC at its core.
DCE/RPC
●
DCE was super cool:
– Included a common way of doing authentication
– Had one of the first built-in distributed time services
– Integrated DNS
– Distributed File System
– And a Remote Procedure Call system
●
Wait a minute… That reminds me of Windows Domains…
DCE/RPC
●
DCE is still just a set of guidelines, though
– There wasn’t really anything beyond a C implementation
– That said, these were the days of Unix…
●
It was hugely popular with larger organisations, especially now that the IBM PC was gaining serious traction.
CORBA
●
Common Object Request Broker Architecture
●
What if you’re not using C?
●
What if you don’t like using these big, bloated
frameworks?
●
What if you just want two dang programs to
communicate on two computers?
●
You use CORBA, that’s what you do!
CORBA
●
Directly competed with DCE
●
Didn’t have any of that fancy-pants authentication/time/file system add-ons
●
Just let you define interfaces for computer programs to use.
●
It was actually an integrated system for doing so (not just a set of guidelines and some C integrations)
CORBA
●
CORBA is built around the idea of Object
Request Brokers (or ORBs)
●
ORBs are a middleware service that lets programs written in different languages communicate over the network
●
ORBs are designed to be cross-compatible, regardless of architecture or underlying language.
●
Remote objects are represented as ordinary local objects, which allows the system to hide the nasty networking code inside classes.
CORBA
●
Objects that define interfaces to the network? That sounds like a component!
– And CORBA agreed… eventually
– It added support for all the bloat features of DCE, but they were optional.
●
CORBA had ORBs for each OO language
– C++, Java, etc.
– Your connection objects simply inherited from
whichever ORB class was present.
CORBA
●
Why was this all so cool?
– There was still no standardised format for passing
data around the internet.
– CORBA provided one that was language and
system independent.
– It also provided a language-independent means of writing the interfaces, meaning clients and servers were implementation independent!
– CORBA is still around today (although not popular)
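To see what an ORB actually automates, here is a hand-written stub and skeleton in Java for a calculator add call. The stub turns a method call into bytes in a fixed, language-neutral layout, and the skeleton turns those bytes back into a call on the real object. This is a simplified sketch under stated assumptions: the wire format (operation name as a length-prefixed string, then two big-endian 32-bit integers) is made up for illustration and is not CORBA’s real GIOP/CDR encoding, the “network” is an in-memory function call, and all class names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.UnaryOperator;

// The interface both sides agree on (in CORBA this would be generated from IDL).
interface Calculator {
    int add(int a, int b);
}

// The real implementation, living "on the server".
class CalculatorServant implements Calculator {
    public int add(int a, int b) { return a + b; }
}

// Server side: unmarshal a request, call the servant, marshal the reply.
class CalculatorSkeleton {
    private final Calculator servant;

    CalculatorSkeleton(Calculator servant) { this.servant = servant; }

    byte[] dispatch(byte[] request) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(request))) {
            String operation = in.readUTF();
            if (!operation.equals("add")) {
                throw new IllegalArgumentException("unknown operation: " + operation);
            }
            int result = servant.add(in.readInt(), in.readInt()); // arguments read left to right
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(result);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Client side: looks like a normal Calculator, but every call becomes bytes on the "wire".
class CalculatorStub implements Calculator {
    private final UnaryOperator<byte[]> network; // sends a request, returns the reply

    CalculatorStub(UnaryOperator<byte[]> network) { this.network = network; }

    public int add(int a, int b) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeUTF("add"); // operation name
                out.writeInt(a);     // arguments as big-endian 32-bit integers
                out.writeInt(b);
            }
            byte[] reply = network.apply(bytes.toByteArray());
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(reply))) {
                return in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class MarshallingDemo {
    public static void main(String[] args) {
        CalculatorSkeleton skeleton = new CalculatorSkeleton(new CalculatorServant());
        // The "network" here is just a method call; a real ORB would use a socket.
        Calculator calculator = new CalculatorStub(skeleton::dispatch);
        System.out.println(calculator.add(2, 3));
    }
}
```

Because the byte layout is fixed, a skeleton written in C++ (or anything else) could answer the same requests, which is the language-independence point above.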
CORBA IDL
CORBA IDL Process
CORBA Today
●
Good idea, but there were problems
– The spec was hugely complicated, and the ORBs were written by different vendors…
●
…who charged a lot
●
ORBs turned out not to be as interoperable as promised
– Competing, less expensive frameworks pushed CORBA to the sidelines
●
Java RMI was free and did the same thing by 1999
●
Also, Microsoft
We’ve forgotten someone important
Microsoft is Distributed Computing
●
Microsoft has dominated distributed computing since the mid-1990s.
●
This is because Microsoft has built its entire OS line around the idea of many computers in enormous distributed systems, ever since Windows for Workgroups 3.11
●
This idea has been the key to Microsoft’s
success throughout the years.
DLLs
●
The DLL is the fundamental building block of
modern Windows systems
●
Very similar to Unix Shared Object libraries
– They are linked at run time
– Can also be linked at compile time
– Language neutral
●
However, DLLs support Late Binding.
DLL Late Binding
●
The “killer feature” of DLLs is that functions can be
bound by name
– At run time, the OS can search the DLL for a specific
function name
●
This means that applications can check for missing DLLs and DLL compatibility issues at run time. This can avoid crashes and allows for dynamic, plugin-style code.
●
However, this is slower and there are no compile-time checks.
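Java has no DLLs, but its reflection API gives a rough feel for late binding: a function is looked up by its name string at run time, a missing function is detected and handled at run time rather than at compile time, and nothing checks the call until it happens. This is an analogy only; it does not touch the Windows API (where the native equivalent is GetProcAddress), and CalculatorLibrary is a hypothetical stand-in for a DLL.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Optional;

// Hypothetical "library" whose functions we only know by name at run time.
class CalculatorLibrary {
    public static int add(int a, int b) {
        return a + b;
    }
}

class LateBinding {
    // Look up a static (int, int) function by name; empty if it is missing.
    static Optional<Method> bind(Class<?> library, String functionName) {
        try {
            return Optional.of(library.getMethod(functionName, int.class, int.class));
        } catch (NoSuchMethodException e) {
            // Missing function: detected at run time instead of crashing.
            return Optional.empty();
        }
    }

    // Call a late-bound function; the compiler cannot check that it returns an int.
    static int call(Method function, int a, int b) {
        try {
            return (Integer) function.invoke(null, a, b);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("call to " + function.getName() + " failed", e);
        }
    }

    public static void main(String[] args) {
        Optional<Method> add = bind(CalculatorLibrary.class, "add");
        Optional<Method> multiply = bind(CalculatorLibrary.class, "multiply");
        System.out.println(add.map(m -> call(m, 2, 3)).orElse(-1));
        System.out.println(multiply.isPresent() ? "multiply found" : "multiply missing");
    }
}
```

The trade-off from the slide shows up directly: misspelling "add" compiles fine and only fails when bind runs.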
DLL Functions in C or C++
●
Exported declarations in a DLL are prefixed with __declspec(dllexport)
– This applies to classes as well as functions
●
An alternative is a module-definition (.def) file
– This allows functions to be exported by ordinal position
– But this is rarely used these days
DLL Definitions in C#
●
Are just class libraries
– i.e. groups of classes that work together
●
These have no special rules and can simply be
compiled via Visual Studio.
Calling Functions in DLLs
●
Using C/C++, compile against the header file and link against the .lib (import library) file
– The .lib file contains a stub to perform the DLL lookup
●
Otherwise you need to use the Windows API (LoadLibrary and GetProcAddress)
– Different languages do this differently
– COM DLLs must be handled differently
– .NET DLLs need the .NET Common Language Runtime
DLL pros
●
Exe files are smaller as DLLs are incorporated
at run time
– Disk space use is less too as you only need one
DLL for many applications
●
In-memory DLL code can be shared amongst all applications that use the DLL
●
Upgrading a DLL upgrades all client
applications
DLL cons
●
Versions of the DLLs used by an application
must be compatible with each other and the
application
– Bad upgrades can break every app that uses it
●
Dependencies are outside of the compiled
application
●
Security issues exist with “by name” access
– Name clashes?
DLLs today
●
Very old by component standards
– They have existed since the early Windows and OS/2 days.
●
More a component container system
●
DLLs can be normal, COM, or .NET
components.
– Modern .NET systems allow all compiled code to
act as DLLs. Even EXEs!
●
So you will probably use DLLs in industry.
What is COM?
●
COM: Component Object Model
●
Also known by its cool rebranded name ActiveX
●
Developed out of Microsoft’s Object Linking and Embedding architecture (OLE)
– OLE allowed one application to host objects from
another
– This is what lets you embed Excel spreadsheets in
Word.
What is COM?
●
COM is OLE extended along CORBA lines
– Interfaces defined in Microsoft’s IDL, MIDL
– Interface-based RPC (called DCOM)
– Name server (the Windows registry)
●
Allows for lookup by GUID rather than name
●
This is hideous, but allows for unique component lookup by version/system/machine.
●
E.g.: f943b44a-0d95-45e3-90c5-34e841c531b2
●
Separated into interface GUIDs (IIDs) and class GUIDs (CLSIDs)
COM GUIDs
●
Interfaces, identified by their IIDs, are unbreakable contracts
– This guarantees that clients can rely on them
forever.
●
Problem: Interfaces change all the time
– Every change of any kind needs a new IID.
– This results in huge logistical problems in COM
projects.
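A rough Java imitation of the IID idea: clients ask a component for an interface by its GUID, and changing a contract means publishing a new interface with a new GUID rather than editing the old one. ICalculator, ICalculator2 and CalculatorComponent are hypothetical, queryInterface only loosely mimics COM’s IUnknown::QueryInterface, and the GUIDs are derived from the interface names with UUID.nameUUIDFromBytes purely so the example is deterministic (real COM IIDs are generated once and then fixed, not derived from names).

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Version 1 of the contract. Once published, it is never edited.
interface ICalculator {
    // Hypothetical IID, derived from the name only to keep the demo deterministic.
    UUID IID = UUID.nameUUIDFromBytes("ICalculator".getBytes(StandardCharsets.UTF_8));

    int add(int a, int b);
}

// Adding a method means a new interface with a new IID, not an edit to ICalculator.
interface ICalculator2 extends ICalculator {
    UUID IID = UUID.nameUUIDFromBytes("ICalculator2".getBytes(StandardCharsets.UTF_8));

    int subtract(int a, int b);
}

// A component that answers "do you support interface X?" by IID, not by name.
class CalculatorComponent implements ICalculator2 {
    private final Map<UUID, Object> supported = new HashMap<>();

    CalculatorComponent() {
        supported.put(ICalculator.IID, this);
        supported.put(ICalculator2.IID, this);
    }

    // Loosely modelled on QueryInterface: null means "interface not supported".
    <T> T queryInterface(UUID iid, Class<T> type) {
        Object candidate = supported.get(iid);
        return candidate == null ? null : type.cast(candidate);
    }

    @Override
    public int add(int a, int b) {
        return a + b;
    }

    @Override
    public int subtract(int a, int b) {
        return a - b;
    }
}

class GuidDemo {
    public static void main(String[] args) {
        CalculatorComponent component = new CalculatorComponent();
        // An old client only knows the original IID and keeps working after the upgrade.
        ICalculator oldClient = component.queryInterface(ICalculator.IID, ICalculator.class);
        // A newer client asks for the extended contract by its own IID.
        ICalculator2 newClient = component.queryInterface(ICalculator2.IID, ICalculator2.class);
        System.out.println(oldClient.add(2, 3) + " " + newClient.subtract(5, 3));
        System.out.println(ICalculator.IID + " vs " + ICalculator2.IID);
    }
}
```

The logistical pain from the slide is visible here: every tweak to a contract means another interface, another IID, and another entry every component must keep supporting.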
DCOM
●
Distributed computing was added to COM
– COM was initially just for OLE use.
●
DCOM works much like COM; it just uses DCE/RPC to perform COM requests over a network interface.
●
DCOM completely dominated DCE via Microsoft’s ever-popular “embrace, extend, extinguish” (EEE) approach.
●
This is still the underlying system behind much of Windows networking today.
COM GUIs
●
Microsoft used COM to allow users to embed
GUI elements into other applications.
●
This allows for really easy extensibility of
Microsoft programs, without needing to know
how the underlying code works.
●
This could be generalised to any component in
a container.
●
This was eventually renamed to ActiveX
ActiveX
●
ActiveX directly competed with Java applets.
●
Microsoft allowed ActiveX integration with Internet Explorer (IE)
– This was a terrible, terrible idea.
●
ActiveX controls implement standard component interfaces
– IOleObject: defines parameters of GUI controls
– IDispatch: allows functions to be called by name.
●
This was also a terrible idea.
COM Today
●
Still the core of Windows networks.
●
Very outdated; .NET is the king of the Windows environment these days.
– However, lots of COM still exists, so .NET and COM have a very well-defined interop layer
●
Microsoft continues to push .NET and the general concept of Web Services out into the world.
– However, Google and Amazon have stolen their ideas and taken their crown.
Java RMI
●
In the mid-1990s, Java arrived, and brought with it Sun Microsystems’ RPC knowledge.
●
Java had a thing called RMI.
– It has been part of standard Java since JDK 1.1 (1997), not just Enterprise Java
●
Remote Method Invocation allows for RPC calls without any tools from outside Java.
Java RMI
●
Like CORBA, uses a defined interface.
●
Unlike CORBA, this is entirely defined in Java
– Using a… Java interface.
– It needs to extend the java.rmi.Remote interface.
– Then generate stub classes from it (with the rmic tool before Java 5; generated automatically since), and follow the CORBA process from that point.
Java RMI
●
Like CORBA, inheritance is used to hide the nasty
stuff.
– Server object inherits from UnicastRemoteObject
– Again, no IDL file required.
●
Java also has a name service for finding components
– Called rmiregistry.
– It’s a command-line program.
●
Problem: RMI has no built-in security integration.
Java RMI Today
●
Java RMI is still used today
●
It works pretty well, and provides an all-in-one, no-frills approach to component distribution.
●
The only problem is, it’s Java.
– And therefore kind of stands alone.
.NET
●
Microsoft very much liked the idea of Java’s VM-based, universally compatible features.
– Microsoft tried to make a Java implementation in 1996.
– Sun actually sued Microsoft for not following the spec.
●
Eventually, though, Microsoft decided to build its own Java-like system.
– This was named .NET, and its native language C#
The .NET CLR
●
Works like the Java VM
– Source code is compiled to machine-independent bytecode (the Common Intermediate Language, CIL)
– Performs memory management and integrates with the underlying OS.
– Converts bytecode into platform-specific executable code via a JIT (Just-In-Time) compiler.
– Both allow multiple languages, provided they compile to the VM’s bytecode (CIL for .NET).
CLR CIL
●
Code that compiles to CIL is called managed code and is managed by the .NET framework
– Better security because there are no raw pointers
– Platform independence via the .NET VM
– However, slower due to JIT compilation
●
This is barely a problem these days, thanks to much-improved JIT compilers and ahead-of-time compilation options.
CLR Non-CIL
●
Code that does not compile to CIL is called unmanaged code (also unsafe or native code)
– Less security
– In practice, generally limited to C++
– Both C++ and C# allow managed and unmanaged code in the same application
●
Although this is discouraged and will be penalised if you do it in this unit.
●
Basically, there should be almost no reason to do this.
.NET Remoting
●
.NET Remoting is a system that essentially
replaces DCOM for .NET
●
It is, unsurprisingly, very similar to RMI
●
However, there is no IDL or visible proxy code
– It’s all hidden in the .NET backend.
– Remotely-callable server objects must derive from MarshalByRefObject.
– The server object’s public methods are the RPC
interface. (very cool)
.NET Remoting
●
The client must reference the server assembly
(EXE/DLL)
– The client needs access to the metadata of the
object (kind of like IDL).
– .NET does this by referencing the server assembly.
●
This is kind of like including a header file, but with a lot of
background magic
– This can be avoided with class factories.
.NET Remoting Today
●
Mostly a legacy system, as Microsoft has a newer, Web Services-compatible .NET RPC framework called WCF.
●
Remoting is still relevant because:
– Remoting does not require a web server
– Remoting supports binary message formats (which are generally more compact and faster to process than XML/JSON)
●
WCF combines Remoting with Web Services
– And a healthy dose of automagic coding.
.NET WCF
●
The Windows Communication Foundation (WCF) is the successor to .NET Remoting.
●
It is more like RMI, as it uses an interface.
●
MarshalByRefObject now replaced by
[ServiceContract] and [OperationContract]
attributes.
●
Tons more automatic code generation.
●
Still pretty much the same as older RPC
frameworks.
Examples!
●
For completeness’ sake, let’s look at some examples.
●
These could be useful in a tutorial or something…
What are we building?
●
A Calculator!
– More specifically, a calculator add function.
●
Why on earth are we distributing this?
– This may be dumb, but it makes the code simple and lets us focus on the similarities and differences
– It also gives you an idea of how easy it is.
●
Examples of code are very useful as you
progress through industry! Keep these
somewhere!
Some Generic IDL
C++ Server DLL
C++ Client
COM Component
●
We’re not going to include COM.
●
COM is, for all practical purposes, legacy technology
– New development has favoured .NET since the early 2000s.
– It’s very ugly in implementation
– We’ll be using .NET exclusively… Soooo…
●
Moving on.
CORBA – Java (Server)
CORBA – Java (Client)
Java RMI Interface
Java RMI Server
Java RMI Client
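A minimal sketch of the calculator add example in Java RMI, assuming Java 5 or later (stubs are generated dynamically, so there is no rmic step). To keep it self-contained it starts the registry inside the program with LocateRegistry.createRegistry on port 1099 (Registry.REGISTRY_PORT, the same default port rmiregistry uses) and runs client and server in one JVM; the call still travels over RMI’s TCP transport on loopback. The binding name "Calculator" is an arbitrary choice, not part of any API.

```java
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// The remote interface: extends java.rmi.Remote, and every method declares RemoteException.
interface Calculator extends Remote {
    int add(int a, int b) throws RemoteException;
}

// The server object: inheriting from UnicastRemoteObject exports it for remote calls.
class CalculatorServer extends UnicastRemoteObject implements Calculator {
    private static final long serialVersionUID = 1L;

    CalculatorServer() throws RemoteException {
        super(); // exported on an anonymous port
    }

    @Override
    public int add(int a, int b) {
        return a + b;
    }
}

class RmiDemo {
    static final String NAME = "Calculator"; // hypothetical binding name

    // Server binds the object; client looks it up by name and calls it over loopback TCP.
    static int run(int a, int b) throws RemoteException, NotBoundException {
        // Advertise loopback in stubs so the demo does not depend on the machine's hostname.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        // In-process stand-in for running the rmiregistry command.
        Registry registry = LocateRegistry.createRegistry(Registry.REGISTRY_PORT);
        CalculatorServer server = new CalculatorServer();
        try {
            registry.rebind(NAME, server);

            // Client side: get a registry reference, look up the stub, call through it.
            Registry clientView = LocateRegistry.getRegistry("127.0.0.1", Registry.REGISTRY_PORT);
            Calculator calculator = (Calculator) clientView.lookup(NAME);
            return calculator.add(a, b);
        } finally {
            // Unexport both objects so RMI's background threads let the JVM exit.
            UnicastRemoteObject.unexportObject(server, true);
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(2, 3));
    }
}
```

In a real deployment the server and client would be separate programs, with the server binding into a registry started by the rmiregistry command and the client only calling LocateRegistry.getRegistry and lookup.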
Fun fact about RMI
●
Java RMI’s biggest problem is that it is super tightly integrated with Java
●
For example:
– The RMI client doesn’t necessarily have the stub code for the server.
– Instead, it can download it (from a codebase URL the server advertises) on first connect.
●
Client and server Java versions and class definitions must closely match.
●
This has implications for security too, as the client must trust the code it downloads.
.NET Remoting Server
.NET Remoting Client
Fun facts about .NET Remoting
●
You may have noticed that we didn’t explicitly create
an instance of the server object.
●
Instead, we quite lazily register the server’s class.
●
.NET loves the idea of making object creation an RPC too!
●
This is cool and all, but it can result in code errors where you create a client-side version of the server-side object.
– This is very hard to detect
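In .NET Remoting terms this is a server-activated object: the server registers a type (WellKnownObjectMode.SingleCall makes a fresh object per call, Singleton shares one), and instances are created only when requests arrive. Below is a Java sketch of the same pattern, not the .NET API: ActivationHost, its Mode values and the URIs are hypothetical. The last lines show the pitfall from the slide: a plain new on the client compiles happily but gives you a local object that never talks to the server.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

interface Calculator {
    int add(int a, int b);
}

class CalculatorImpl implements Calculator {
    static final AtomicInteger instancesCreated = new AtomicInteger();

    CalculatorImpl() { instancesCreated.incrementAndGet(); }

    public int add(int a, int b) { return a + b; }
}

// Hypothetical host: the server registers how to make an object, not an object.
class ActivationHost {
    enum Mode { SINGLE_CALL, SINGLETON }

    private final Map<String, Supplier<Calculator>> factories = new HashMap<>();
    private final Map<String, Mode> modes = new HashMap<>();
    private final Map<String, Calculator> singletons = new HashMap<>();

    void register(String uri, Supplier<Calculator> factory, Mode mode) {
        factories.put(uri, factory);
        modes.put(uri, mode);
    }

    // Called per incoming request: the target object is created on the server, on demand.
    <R> R invoke(String uri, Function<Calculator, R> call) {
        Supplier<Calculator> factory = factories.get(uri);
        if (factory == null) throw new IllegalArgumentException("nothing registered at " + uri);
        Calculator target = modes.get(uri) == Mode.SINGLETON
                ? singletons.computeIfAbsent(uri, k -> factory.get())
                : factory.get(); // SINGLE_CALL: a fresh object for every request
        return call.apply(target);
    }
}

class ActivationDemo {
    public static void main(String[] args) {
        ActivationHost host = new ActivationHost();
        host.register("calc-per-call", CalculatorImpl::new, ActivationHost.Mode.SINGLE_CALL);
        host.register("calc-shared", CalculatorImpl::new, ActivationHost.Mode.SINGLETON);
        System.out.println(CalculatorImpl.instancesCreated.get()); // 0: nothing created yet

        host.invoke("calc-per-call", c -> c.add(1, 2));
        host.invoke("calc-per-call", c -> c.add(3, 4));
        host.invoke("calc-shared", c -> c.add(5, 6));
        host.invoke("calc-shared", c -> c.add(7, 8));
        System.out.println(CalculatorImpl.instancesCreated.get()); // 3: two per-call + one shared

        // The pitfall: this compiles fine, but it is a purely local object, not a remote call.
        Calculator oops = new CalculatorImpl();
        System.out.println(oops.add(1, 1)); // 2
    }
}
```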
.NET WCF Server
.NET WCF Client
Some useful things for WCF
●
You’ll have noticed the [ServiceContract], [OperationContract], and [ServiceBehavior] attributes.
– Just remember: Contracts for the interface, Behavior for the implementation.
●
You need to build a class factory to use a WCF interface (on the client, ChannelFactory<T>)
– Factories are classes that build other objects
– Really just here because Microsoft found it was a popular approach to RPC.
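On the client side, WCF’s factory is ChannelFactory<T>, which builds a proxy object implementing the service contract interface. Here is a Java sketch of the same factory idea using java.lang.reflect.Proxy. ServiceProxyFactory and createChannel are hypothetical names chosen to echo WCF, not copied from its API, and the transport is just a function standing in for the network.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// The service contract the client codes against.
interface Calculator {
    int add(int a, int b);
}

// Hypothetical factory: builds client objects for any contract interface T.
class ServiceProxyFactory<T> {
    private final Class<T> contract;
    // Stand-in for the network: (operation name, arguments) -> result.
    private final BiFunction<String, Object[], Object> transport;

    ServiceProxyFactory(Class<T> contract, BiFunction<String, Object[], Object> transport) {
        this.contract = contract;
        this.transport = transport;
    }

    // Every method call on the returned object is forwarded to the transport.
    T createChannel() {
        InvocationHandler handler = (proxy, method, callArgs) -> transport.apply(method.getName(), callArgs);
        return contract.cast(Proxy.newProxyInstance(
                contract.getClassLoader(), new Class<?>[] {contract}, handler));
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        List<String> wireLog = new ArrayList<>();
        // "Server" end of the fake transport: dispatch on the operation name.
        BiFunction<String, Object[], Object> transport = (operation, callArgs) -> {
            wireLog.add(operation);
            if (operation.equals("add")) {
                return (Integer) callArgs[0] + (Integer) callArgs[1];
            }
            throw new UnsupportedOperationException(operation);
        };

        ServiceProxyFactory<Calculator> factory = new ServiceProxyFactory<>(Calculator.class, transport);
        Calculator calculator = factory.createChannel();
        System.out.println(calculator.add(2, 3)); // 5
        System.out.println(wireLog);              // [add]
    }
}
```

The client never names an implementation class; it asks the factory for “something that speaks Calculator”, which is the point of the pattern.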
Some More Useful Things for WCF
●
ServiceBehavior has a lot of properties.
●
What we’re doing is overriding Microsoft’s default single-threaded, automatically synchronised system (ConcurrencyMode.Single).
– Why? Because it’s really inefficient. And because we like taking our lives into our hands.
●
Basically, Microsoft will often assume that you mean single-threaded by default
– This is very important, as a lot of programmers come to
Windows first.
– But it sucks for us, so we’ll be overriding a lot.
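WCF’s default ConcurrencyMode.Single lets only one call into a service instance at a time; ConcurrencyMode.Multiple lets calls overlap, and then the service must protect its own shared state. Below is a Java analogy of that trade-off, not WCF’s actual threading model: a single-thread executor plays the part of Single and a thread pool plays Multiple. CallCounter is a hypothetical service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical service with shared state that every request touches.
class CallCounter {
    private long plainCount = 0;                           // only safe if calls never overlap
    private final AtomicLong atomicCount = new AtomicLong(); // safe under concurrent calls

    void handleSerialised() { plainCount++; }
    void handleConcurrent() { atomicCount.incrementAndGet(); }
    long serialisedCount() { return plainCount; }
    long concurrentCount() { return atomicCount.get(); }
}

class ConcurrencyDemo {
    // Submit `calls` requests to the dispatcher, wait for all of them, then shut it down.
    static void dispatch(ExecutorService dispatcher, Runnable request, int calls) {
        List<Future<?>> pending = new ArrayList<>();
        for (int i = 0; i < calls; i++) {
            pending.add(dispatcher.submit(request));
        }
        try {
            for (Future<?> f : pending) f.get(); // also makes the tasks' writes visible here
            dispatcher.shutdown();
            dispatcher.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CallCounter counter = new CallCounter();
        // Like ConcurrencyMode.Single: one thread, so requests queue up and run one at a time.
        dispatch(Executors.newSingleThreadExecutor(), counter::handleSerialised, 10_000);
        // Like ConcurrencyMode.Multiple: requests run in parallel, so the state must be thread-safe.
        dispatch(Executors.newFixedThreadPool(8), counter::handleConcurrent, 10_000);
        System.out.println(counter.serialisedCount() + " " + counter.concurrentCount());
    }
}
```

Pass counter::handleSerialised to the thread pool instead and the plain counter will usually come up short, because unsynchronised increments race. That is the “taking our lives into our hands” part.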
More WCF stuff?
●
Also, you can’t pass RPC objects by reference.
– Why? Because WCF is service-oriented, and so it wants to force you as the client to come to it.
– This fixes a lot of OO problems over the network.
– Objects can be passed by value, though.
●
These aren’t server objects, though; they’re data objects (marked with [DataContract]).
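Passing by value means the other side receives a serialised copy, so any changes it makes never flow back to the sender. WCF does this with [DataContract] types; the Java sketch below uses built-in object serialisation as a stand-in for the wire. CalculationRequest and sendOverWire are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// A plain data object: just fields, no behaviour that needs to live on the server.
class CalculationRequest implements Serializable {
    private static final long serialVersionUID = 1L;
    int a;
    int b;
    String note;

    CalculationRequest(int a, int b, String note) {
        this.a = a;
        this.b = b;
        this.note = note;
    }
}

class PassByValue {
    // Simulates sending an object over the network: serialise it, then deserialise a copy.
    static <T extends Serializable> T sendOverWire(T value, Class<T> type) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return type.cast(in.readObject());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CalculationRequest clientCopy = new CalculationRequest(2, 3, "from client");
        CalculationRequest serverCopy = sendOverWire(clientCopy, CalculationRequest.class);

        // The server edits its copy...
        serverCopy.note = "seen by server";
        // ...but the client's object is untouched: a value was sent, not a reference.
        System.out.println(clientCopy.note);             // from client
        System.out.println(serverCopy.a + serverCopy.b); // 5
        System.out.println(clientCopy == serverCopy);    // false
    }
}
```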
Why do people hate old systems?
●
Why are these older systems falling out of
favor?
– Firewalls (which block a lot of ports to stop hackers)
– Configuration overheads (gotta tell clients where servers are, and COM’s GUIDs make changes very expensive)
– Proprietary
– And because of the Internet
●
Seriously, why don’t we just use HTTP?
Next Week
●
The tiering system of basic distributed systems!
– You will have some idea of this from this week’s
tutorial
●
Asynchronous Communications
●
Statelessness!