
Overview of MOSIX2

Prof. Amnon Barak
Department of Computer Science
The Hebrew University


http://www.MOSIX.org
July 2009

Copyright Amnon Barak 2009



Background
Clusters, multi-clusters (intra-organizational Grids) and Clouds are popular platforms for HPC. Typically, users need to run multiple jobs with minimal concern for how the resources are managed.

They prefer not to:
- Modify applications
- Copy files or log in to different nodes
- Lose jobs when some nodes are disconnected

Users don't know (and don't care about):
- The configuration, status and locations of the nodes
- The availability of resources, e.g. CPU speed, load, free memory, etc.

Traditional management packages


Most cluster management packages are batch dispatchers that place the burden of management on the users. For example, these packages:
- Use static assignment of jobs to nodes
- May lose jobs when nodes are disconnected
- Are not transparent to applications
- May require linking applications with special libraries
- View the cluster as a set of independent nodes
- Allow one user per node, partitioning the cluster for multiple users

Traditional management packages


[Figure: a dispatcher assigns application jobs one-way, with no feedback, to independent Linux workstations and servers (dual 4-core, 2-core and 4-core nodes), one of which has failed.]

What is MOSIX (Multi-computer OS)


An operating system-like management system for distributed-memory architectures, such as clusters and multi-clusters, including remote clusters on Clouds.

Main feature: a Single-System Image (SSI)
- Users can log in on any node and need not know where their programs run
- Automatic resource discovery
  - Continuous monitoring of the state of the resources
- Dynamic workload distribution by process migration
  - Automatic load-balancing
  - Automatic migration from slower to faster nodes and from nodes that run out of free memory

MOSIX is a unifying management layer


[Figure: MOSIX as a transparent management layer between the applications and the nodes, providing an SSI and continuous feedback about the state of the resources; all the active nodes run like one server with many CPUs.]

MOSIX Version 1
- Can manage a single cluster
- Main features:
  - Provides an SSI by process migration
  - Supports scalable file systems
- 9 major releases, developed for Unix, BSD, BSDI, Linux-2.2 and Linux-2.4
- Production installations since 1989; based on Linux since 1998

MOSIX Version 2 (MOSIX2)


- Can manage clusters and multi-clusters, with some tools for running applications on Clouds
- Developed for Linux-2.6
- Geared for High Performance Computing (HPC), especially for applications with moderate amounts of I/O
- Main features:
  - Provides an SSI by process migration, within a cluster and among different clusters
  - Secure run-time environment (sandbox) for guest processes
  - Live queuing: queued jobs preserve their full generic Linux environment
  - Supports batch jobs, checkpoint and recovery

Running applications in a MOSIX cluster


MOSIX recognizes two types of processes:
- Linux processes, which are not affected by MOSIX:
  - Usually administrative tasks that are not suitable for migration
  - Processes that use features not supported by MOSIX, e.g. threads
- MOSIX processes, usually applications that can benefit from migration:
  - All such processes are created by the "mosrun" command
  - They are started from standard Linux executables, but run in an environment that allows each process to migrate from one node to another
  - Each MOSIX process has a unique home-node, which is usually the node on which the process was created
- Linux processes created by the "mosrun -E" command can still benefit from MOSIX, e.g. be assigned to the least-loaded node

Examples: running interactive jobs


Possible ways to run myprog:
> myprog                     run as a Linux process on the local node
> mosrun myprog              run as a MOSIX process in the local cluster
> mosrun -b myprog           assign the process to the least-loaded node
> mosrun -b -m700 myprog     assign the process only to nodes with at least 700MB of free memory
> mosrun -E -b -m700 myprog  run as a native Linux job
> mosrun -M -b -m700 myprog  run a MOSIX job whose home node can be any node in the local cluster

Running batch jobs


To run 2000 instances of myprog on a multi-cluster:
> mosrun -G -b -m700 -q -S64 myfile
  -G      assign the jobs to nodes in other clusters as well
  -q      place the jobs in the queue
  -S64    run up to 64 jobs at a time from the queue
  myfile  a file with the list of the 2000 jobs (an example of generating it follows)
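Each line of myfile is an ordinary command line, one per job. As a minimal sketch (the per-instance argument scheme here is invented, not part of MOSIX), such a file could be generated with a short Python script:

    # Write "myfile": one command line per job, 2000 instances of myprog.
    with open("myfile", "w") as f:
        for i in range(2000):
            f.write(f"./myprog --seed {i}\n")  # hypothetical per-instance argument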


How does it work


- Automatic resource discovery by a gossip algorithm provides each node with the latest information about the cluster/multi-cluster resources (e.g. free nodes)
- All the nodes disseminate information about relevant resources: speed, load, memory, local/remote I/O, IPC
- Information is exchanged in a random fashion, to support scalable configurations and to overcome failures
- Useful for high-volume transaction processing
  - Example: a compilation farm, assigning the next compilation to the least-loaded node (see the sketch below)
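To make the dissemination idea concrete, here is a toy sketch (this is not MOSIX code; the record fields and the newest-record merge rule are assumptions): every node keeps a timestamped view of the resources of the nodes it knows about, and periodically pushes that view to one randomly chosen peer, which keeps the fresher record for each node.

    import random
    import time

    class GossipNode:
        """Toy model of randomized dissemination of resource information."""
        def __init__(self, name, peers=()):
            self.name = name
            self.peers = list(peers)  # other GossipNode objects
            self.view = {}            # node name -> (timestamp, load, free_mem_mb)

        def update_self(self, load, free_mem_mb):
            self.view[self.name] = (time.time(), load, free_mem_mb)

        def gossip_once(self):
            """Push our view to one random peer; it keeps the newest record per node."""
            peer = random.choice(self.peers)
            for node, rec in self.view.items():
                if node not in peer.view or rec[0] > peer.view[node][0]:
                    peer.view[node] = rec

        def least_loaded(self):
            """E.g. a compilation farm: pick the known node with the lowest load."""
            return min(self.view.items(), key=lambda kv: kv[1][1])[0]

With every node calling update_self and gossip_once periodically, fresh information spreads through the configuration in a logarithmic number of rounds with high probability, and no node depends on a central server.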


Dynamic workload distribution


A set of algorithms that match the required resources with the available ones, geared to maximize performance:
- Initial allocation of processes to the best available nodes in the user's private cluster, not to nodes outside the private cluster
- Automatic load-balancing
- Automatic migration from slower to faster nodes
- Multi-cluster-wide process migration: authorized processes move to idle nodes in other clusters

Outcome: users need not know the current state of the cluster and the multi-cluster resources. A minimal sketch of the matching idea follows.
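This is a hand-wavy sketch only (the fields, weights and thresholds below are invented for illustration, not MOSIX's actual algorithms): discard nodes without enough free memory, prefer the user's private cluster, and place or migrate the process where its expected run time is shortest.

    def pick_node(nodes, mem_needed_mb, allow_other_clusters=False):
        """nodes: dicts with 'name', 'speed', 'load', 'free_mem_mb', 'private'.
        Returns the best candidate node, or None."""
        candidates = [n for n in nodes
                      if n["free_mem_mb"] >= mem_needed_mb
                      and (n["private"] or allow_other_clusters)]
        if not candidates:
            return None
        # Smaller (load + 1) / speed ~ shorter expected run time on that node.
        return min(candidates, key=lambda n: (n["load"] + 1) / n["speed"])

    def should_migrate(here, there, mem_needed_mb, gain=0.8):
        """Leave a node that ran out of free memory, or move to a node that is
        sufficiently faster or less loaded (the 0.8 hysteresis factor is arbitrary)."""
        if here["free_mem_mb"] < mem_needed_mb:
            return True
        return (there["load"] + 1) / there["speed"] < gain * (here["load"] + 1) / here["speed"]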

Core technologies
- Process migration: move the process context to a remote node
- OS virtualization layer: allows migrated processes to run in remote nodes, away from their creation (home) nodes

[Figure: a migrated (guest) process runs on a remote node on top of the OS virtualization layer; the MOSIX link reroutes its system calls to the OS virtualization layer on its home node, with local Linux kernels running beneath both layers.]

The OS virtualization layer


Provides the necessary support for migrated processes, by intercepting most system calls and forwarding them to the home node (sketched below).

Result: migrated processes seem to be running in their respective home nodes:
- The user's home-node environment is preserved
- No need to change applications, copy files, log in to remote nodes, or link applications with any library
- Migrated processes run in a sandbox

Outcome: users get the illusion of running on one node.

Drawback: increased communication and virtualization overheads, which are reasonable relative to the added cluster/multi-cluster services (see the table below).
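The real mechanism lives inside the kernel, but its shape can be sketched in user space (all names, the wire format and the two example calls below are invented): a stub on the remote node traps a call and ships it over a socket, standing in for the MOSIX link, to a deputy on the home node, which executes it in the home environment and returns the result.

    import os, pickle, socket, threading, time

    HOME_ADDR = ("127.0.0.1", 9999)  # stand-in for the MOSIX link to the home node

    def deputy():
        """Home-node side: execute forwarded calls in the home environment."""
        handlers = {"getcwd": os.getcwd,
                    "read_file": lambda path: open(path, "rb").read()}
        srv = socket.socket()
        srv.bind(HOME_ADDR)
        srv.listen(1)
        conn, _ = srv.accept()
        name, args = pickle.loads(conn.recv(1 << 20))
        conn.sendall(pickle.dumps(handlers[name](*args)))
        conn.close()

    def forwarded_call(name, *args):
        """Remote-node side: intercept the call and forward it home."""
        s = socket.socket()
        s.connect(HOME_ADDR)
        s.sendall(pickle.dumps((name, args)))
        result = pickle.loads(s.recv(1 << 20))
        s.close()
        return result

    threading.Thread(target=deputy, daemon=True).start()
    time.sleep(0.2)                  # let the deputy start listening
    print(forwarded_call("getcwd"))  # answered with the home node's working directory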


Reasonable overhead:
Linux vs. migrated MOSIX process times (sec.), 1Gbit-Ethernet

Application                                 RC       SW       JEL      BLAT
Local Linux process (sec)                   723.4    627.9    601.2    611.6
Total I/O (MB)                              0        90       206      476
Migrated process, same cluster (sec)        725.7    637.1    608.2    620.1
  Slowdown                                  0.32%    1.47%    1.16%    1.39%
Migrated process, across 1Km campus (sec)   727.0    639.5    608.3    621.8
  Slowdown                                  0.5%     1.85%    1.18%    1.67%

Sample applications: RC = CPU-bound job, SW = protein sequences, JEL = electron motion, BLAT = protein alignments.
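The slowdown rows are just the relative increase in run time; a quick check of the table's arithmetic for the same-cluster case:

    local    = {"RC": 723.4, "SW": 627.9, "JEL": 601.2, "BLAT": 611.6}
    migrated = {"RC": 725.7, "SW": 637.1, "JEL": 608.2, "BLAT": 620.1}
    for app in local:
        slowdown = (migrated[app] - local[app]) / local[app] * 100
        print(f"{app}: {slowdown:.2f}%")  # RC: 0.32%, SW: 1.47%, JEL: 1.16%, BLAT: 1.39%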

Main multi-cluster features


- Administrating a multi-cluster
- Priorities among different clusters
- Scheduling and monitoring
- Support for batch jobs, checkpoint and recovery
- Support for disruptive configurations
- MOSIX Reach the Clouds (MRC)

Administrating a multi-cluster
- A federation of x86 (both 32-bit and 64-bit) clusters, servers and workstations whose owners wish to cooperate from time to time
- Collectively administrated:
  - Each owner maintains its private cluster and determines its priorities vs. other clusters
  - Clusters can join or leave the multi-cluster at any time
  - Dynamic partition of nodes into private virtual clusters
- Users of a group access the multi-cluster via their private clusters and workstations
- Process migration among the different clusters

Outcome: each cluster and the whole multi-cluster perform like a single computer with multiple processors.
Why an intra-organizational Grid? Due to trust.

The priority scheme


- Cluster owners can assign priorities to processes from other clusters
- Local and higher-priority processes force out lower-priority processes (a toy sketch follows below)
- Pairs of clusters can be shared symmetrically (C1-C2) or asymmetrically (C3-C4)
- A cluster can be shared (C6) among other clusters (C5, C7) or blocked for migration from other clusters (C7)
- Dynamic partition of nodes into private virtual clusters

[Figure: seven example clusters, C1-C7: C1 and C2 share symmetrically, C3 and C4 asymmetrically; C6 is shared among C5 and C7; C7 is blocked for migration from other clusters.]

Outcome: flexible use of nodes in shared clusters.
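As a toy sketch of the force-out rule (the process records and the exact form of the rule are assumptions, with a lower value meaning a higher priority, as with mosrun -q{pri}): when a local or higher-priority process arrives at a node, guests with a lower priority are selected to migrate away.

    procs = [
        {"pid": 101, "guest": False, "priority": 0},   # local process
        {"pid": 202, "guest": True,  "priority": 50},  # e.g. a guest from another cluster
        {"pid": 303, "guest": True,  "priority": 70},  # e.g. a lower-priority guest
    ]

    def force_out(resident, arriving_priority):
        """Guests with lower priority (larger value) than the arrival must move
        back home or to another willing cluster."""
        return [p for p in resident if p["guest"] and p["priority"] > arriving_priority]

    print([p["pid"] for p in force_out(procs, 50)])  # -> [303]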

When priorities are needed


- Scenario 1: one cluster, where some users run many jobs, depriving other users of their fair share
  - Solution: partition the cluster into several sub-clusters and allow each user to log in to only one sub-cluster
  - Users in each sub-cluster can still benefit from idle nodes in the other sub-clusters
  - Processes of local users (in each sub-cluster) have higher priority than guest processes from other sub-clusters
- Scenario 2: some users run long jobs, while other users need to run (from time to time) short jobs
- Scenario 3: several groups use a shared cluster
  - The sysadmin can assign a different priority to each group

Scheduling and monitoring


- Batch jobs run as Linux processes on different nodes
- Checkpoint & recovery: on a time basis, manually, or initiated by the program
- Live queuing: queued jobs maintain an organic connection with their Unix environment
- Queue management provides means for tracing jobs, changing priorities and the order of execution, and for running parallel (e.g. MPI) jobs
- Queued jobs are released gradually, in a manner that prevents flooding the local cluster or other clusters
- Built-in on-line monitor for the local cluster resources
- On-line web monitor of the multi-cluster and each cluster: http://www.mosix.org/webmon

Example: queuing
- With the -q flag, mosrun places the job in a queue
- Jobs from all the nodes in each cluster share one queue
- Queue policy: first-come-first-serve, with several exceptions (a toy model follows the examples below)
- Users can assign priorities to their jobs using the -q{pri} option
  - The lower the value of pri, the higher the priority
  - The default priority is 50; it can be changed by the sysadmin
  - Running jobs with pri < 50 should be coordinated with the cluster's manager
- Out-of-order and fair-share options allow a fixed number of jobs per user to start instantly, overriding the queue

Examples:
> mosrun -q -b -m1000 myprog    queue a MOSIX program to run in the cluster
> mosrun -q60 -G -b -J1 myprog  queue a low-priority job to run in a different cluster
> mosrun -q30 -E -m500 myprog   queue a high-priority batch job
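A toy model of this policy (the fair-share details are assumptions): jobs are ordered by priority value and then by arrival, and each user may start a fixed number of jobs immediately, out of order.

    import heapq
    from itertools import count

    class QueueModel:
        """Illustrative first-come-first-serve queue with priorities;
        a lower pri value runs first, and 50 is the default."""
        def __init__(self, instant_jobs_per_user=1):
            self.heap = []                 # (pri, arrival order, job)
            self.arrivals = count()
            self.instant = instant_jobs_per_user
            self.started = {}              # user -> jobs started out of order

        def submit(self, user, job, pri=50):
            if self.started.get(user, 0) < self.instant:
                self.started[user] = self.started.get(user, 0) + 1
                return f"start now: {job}"  # out-of-order / fair-share start
            heapq.heappush(self.heap, (pri, next(self.arrivals), job))
            return f"queued: {job}"

        def release_next(self):
            """Called gradually, so released jobs do not flood the clusters."""
            return heapq.heappop(self.heap)[2] if self.heap else None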


mosq: view and control the queue

> mosq list                   list the jobs waiting in the queue
> mosq listall                list jobs already running from the queue, as well as jobs waiting in the queue
> mosq delete {pid}           delete a waiting job from the queue
> mosq run {pid}              run a waiting job now
> mosq cngpri {newpri} {pid}  change the priority of a waiting job
> mosq advance {pid}          move a waiting job to the head of its priority group within the queue
> mosq retard {pid}           move a waiting job to the end of its priority group within the queue

More options in the mosq manual.
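Because mosq is an ordinary command, the queue can also be driven from scripts; a small wrapper using only the subcommands listed above (the pid 12345 is a hypothetical waiting job):

    import subprocess

    def mosq(*args):
        """Run a mosq subcommand and return its standard output."""
        return subprocess.run(["mosq", *args],
                              capture_output=True, text=True).stdout

    print(mosq("list"))            # jobs waiting in the queue
    mosq("cngpri", "30", "12345")  # raise the priority of a waiting job
    mosq("advance", "12345")       # move it to the head of its priority group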



Disruptive configurations
When a cluster is disconnected:
- All guest processes move out, to available remote nodes or to their home clusters
- All migrated processes from that cluster move back
  - Returning processes are frozen (their images stored) on disks
  - Frozen processes are reactivated gradually

Outcome:
- Long-running processes are preserved
- Nodes are not overloaded

MOSIX Reach the Clouds (MRC)


MRC is a tool that allows applications to run on remote nodes in Clouds, without pre-copying files to those nodes.
Main features:
- Runs on both MOSIX clusters and Linux computers (with an unmodified kernel)
- No need to pre-copy files to remote clusters
- Applications can access both local and remote files
- Supports file sharing among different computers
- Stdin/stdout/stderr are preserved locally
- Can be combined with "mosrun" on remote MOSIX clusters

Hebrew University multi-cluster campus Grid (HUGI)


- 17 production MOSIX clusters: ~350 nodes, ~750 CPUs
- In Life Sciences, the Medical School, Chemistry and Computer Science
- Sample applications that our users are running:
  - Nano-technology
  - Molecular dynamics
  - Protein folding, genomics (BLAT, SW)
  - Weather forecasting
  - Navier-Stokes equations and turbulence (CFD)
  - CPU simulation of new hardware designs (SimpleScalar)

Priorities among HUGI clusters


[Figure: the HUGI clusters (CS Student Farm, CS Theory group cluster, CS General cluster, Biology1, Biology2), with arrows labeled by the priorities at which each cluster accepts processes from the others.]

Priorities for accepting processes at two of the clusters (lower value = higher priority):

From cluster   Accepted by CS General   Accepted by Biology2
Theory         20                       50
Student Farm   Blocked                  Blocked
Biology1       Blocked                  20
Biology2       50                       -
CS General     -                        70

Day use: idle shared nodes allocated to users


[Figure: the HUGI multi-cluster (Computer Science, Chemistry and Life Sciences), with student farms, a Group 1 cluster and Group 2 clusters; during the day, student and guest processes run on the student farms, and guest processes from Group 1 run on idle nodes of the Group 2 clusters.]

Night use: most nodes are allocated to one group


[Figure: the same HUGI multi-cluster at night, when most nodes (e.g. the Computer Science student farms and the Group 2 clusters) are allocated to one group.]

Web monitor: www.MOSIX.org/webmon

Display:
- Total number of nodes/CPUs
- Number of nodes in each cluster
- Average load

Zooming in on each cluster

Display:
- Load
- Free/used memory
- Swap space
- Uptime
- Users

Conclusions
- MOSIX2 is a comprehensive set of tools for automatic management of Linux clusters and multi-clusters
  - Self-management algorithms for dynamic allocation of system-wide resources
  - Cross-cluster performance is nearly identical to that of a single cluster
- Many supporting tools for ease of use
- MRC for running applications on Clouds
- Includes an installation script and manuals
- Can run in native mode or on top of virtual machine packages, e.g. VMware, Xen, MS Virtual Server, over an unmodified OS (Linux, Windows, OS X)

How to obtain a copy of MOSIX


- A free, unlimited trial copy is provided to faculty, staff and researchers for use in academic, research and non-profit organizations
- A free, limited evaluation copy is provided for non-profit use
- Non-academic copies are available

Details at http://www.MOSIX.org
