Kabul University
Faculty of Computer Science
Department of Information Technology
Clustering in Open Source
Lecturer: Ass.Pro Sebqatullah Aslamzai
Prepared by: Nazifa Kazimi
8th Semester
Table of Contents
1. Behind the Clustering
2. What is Clustering
3. Types of Clustering
4. Project Goal
5. Configuration
Behind the Clustering
* To introduce clustering, consider the following scenario.
An organization provides many services to its clients and runs many servers, such as HTTP servers and FTP servers.
* The organization's services should be 100% available to its customers, who expect to be able to reach them at any time.
Being online 24/7 is important, even essential, for many services, but it is not an easy goal to achieve.
* One approach is hardware redundancy, which protects against hardware failures. For example, instead of one physical disk we use two disks in a RAID array, two NICs instead of one using NIC teaming, and two power supplies instead of one to provide redundant power.
* But what about the server operating system?
We cannot provide 100% availability with redundant hardware alone, because the operating system itself may fail. If a server goes down, the result is a loss of both money and reputation, so it is always necessary to take precautions to prevent such incidents.
* Clustering
The best approach is clustering.
What is clustering?
A cluster is a group of servers and other resources that act as a single system and enable high
availability and, in some cases, load balancing and parallel processing.
In this project we implement clustering with the Pacemaker stack, built around the Pacemaker cluster resource manager.
Types of Cluster Systems
1. High Availability
Also called an Active-Passive cluster. High-availability clusters (also known as HA clusters or failover clusters) are groups of computers that support server applications which can be reliably used with a minimum of downtime. They use high-availability software to harness redundant computers in groups, or clusters, that provide continued service when system components fail. Without clustering, if a server running a particular application crashes, the application is unavailable until the crashed server is fixed.
HA clustering remedies this situation by detecting hardware/software faults, and immediately
restarting the application on another system without requiring administrative intervention, a process
known as failover.
As part of this process, clustering software may configure the node before starting the application on
it. For example, appropriate file systems may need to be imported and mounted, network hardware
may have to be configured, and some supporting applications may need to be running as well.
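To make the failover idea concrete, here is a toy Python model of an Active-Passive pair. It is only an illustration built on simplifying assumptions. The Node and ActivePassiveCluster names are hypothetical, node health is reduced to a simple flag, and this is not how Pacemaker is implemented. Pacemaker learns about failures from Corosync membership and from resource monitor operations.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    healthy: bool = True


@dataclass
class ActivePassiveCluster:
    """Toy model: one service runs on exactly one healthy node at a time."""
    nodes: List[Node]
    active: Optional[str] = None

    def _node(self, name: Optional[str]) -> Optional[Node]:
        return next((n for n in self.nodes if n.name == name), None)

    def set_health(self, name: str, healthy: bool) -> None:
        node = self._node(name)
        if node is None:
            raise KeyError(name)
        node.healthy = healthy

    def reconcile(self) -> Optional[str]:
        """Keep the service where it is while that node is healthy; otherwise fail over."""
        current = self._node(self.active)
        if current is not None and current.healthy:
            return self.active
        # Failover: restart the service on the first healthy node, with no admin action.
        standby = next((n for n in self.nodes if n.healthy), None)
        self.active = standby.name if standby else None
        return self.active


cluster = ActivePassiveCluster([Node("node1"), Node("node2")])
print(cluster.reconcile())         # node1 starts the service
cluster.set_health("node1", False)
print(cluster.reconcile())         # node2 takes over
cluster.set_health("node1", True)
print(cluster.reconcile())         # the service stays on node2
```

When node1 recovers, the model leaves the service on node2 to avoid a second interruption. In Pacemaker, this choice is governed by the resource-stickiness setting.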
2. Load Balancing
Also called an Active-Active cluster. Load balancing scales the performance of server-based programs, such as a web server, by distributing client requests across multiple servers. In computing, load balancing improves the distribution of workloads across multiple computing resources, such as computers, a computer cluster, network links, central processing units, or disk drives. Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overloading any single resource. Using multiple components with load balancing instead of a single component may also increase reliability and availability through redundancy.
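Round-robin is one of the simplest strategies a load balancer can use to spread client requests. The sketch below uses three hypothetical backend names (web1, web2, web3) and ignores server health and load. Real balancers such as HAProxy or Apache's mod_proxy_balancer also take health and load into account.

```python
from collections import Counter
from itertools import cycle
from typing import Iterable


class RoundRobinBalancer:
    """Send each new request to the next backend server in a fixed rotation."""

    def __init__(self, backends: Iterable[str]):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(backends)

    def pick(self) -> str:
        return next(self._rotation)


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
assignments = [balancer.pick() for _ in range(7)]
print(assignments)           # ['web1', 'web2', 'web3', 'web1', 'web2', 'web3', 'web1']
print(Counter(assignments))  # Counter({'web1': 3, 'web2': 2, 'web3': 2})
```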
3. High Performance
Also known as a High Performance Computing (HPC) cluster. As the name says, this type of cluster is used to obtain high performance: tasks are divided into smaller chunks, which are then computed on different nodes in parallel.
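The divide-and-compute idea can be sketched in Python. Here a thread pool stands in for the compute nodes, which is an assumption made for illustration only. A real HPC cluster sends each chunk to a separate machine, commonly through MPI, and in CPython threads do not run CPU-bound work truly in parallel. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_chunks(data: Sequence[int], n_chunks: int) -> List[Sequence[int]]:
    """Split data into n_chunks contiguous pieces whose sizes differ by at most one."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk: Sequence[int]) -> int:
    """The work each 'node' performs on its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data: Sequence[int], workers: int = 4) -> int:
    chunks = split_into_chunks(data, workers)
    # Each worker stands in for one compute node; partial results are combined at the end.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(sum_of_squares, chunks))
    return sum(partial_results)


print(split_into_chunks(list(range(10)), 3))         # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(parallel_sum_of_squares(list(range(1, 101))))  # 338350
```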
4. Storage Clusters
All nodes provide a single cluster file system that clients use to read and write data simultaneously. A storage cluster uses two or more storage servers working together to increase performance, capacity, and reliability.
Project Goal: As mentioned above, there are many types of cluster systems. In this project I configure a High Availability (HA) cluster with CentOS servers, in which each node must be able to take over the services during a failover or any other problem.
Two-Node Cluster Configuration
Our goal is to build a two-node cluster with open-source software on CentOS servers. CentOS is a distribution of the Linux operating system that is available in both desktop and server editions.
I used two CentOS server nodes running on virtual machines to create a two-node virtual cluster.
1. Installation requirements:
a. Installing the virtual machines
b. Installing a CentOS server on each of the two virtual machines
2. After completing the installation, display the IP address of each node with the ip address command.
Note that each node must be on the same network as your host operating system, because later we want to test the cluster from a browser on the host.
Host IP address: 192.168.43.156/24
Node1 IP address: 192.168.43.210/24
Node2 IP address: 192.168.43.7/24
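Before going further, it is worth confirming that the host and both nodes really share one network. Below is a minimal sketch using Python's standard ipaddress module and the addresses listed above. The function name common_network is hypothetical.

```python
import ipaddress

# Addresses from this setup, written as address/prefix.
MACHINES = {
    "host": "192.168.43.156/24",
    "node1": "192.168.43.210/24",
    "node2": "192.168.43.7/24",
}


def common_network(interfaces):
    """Return the shared network if every address/prefix is on the same one, else None."""
    networks = {ipaddress.ip_interface(i).network for i in interfaces}
    return networks.pop() if len(networks) == 1 else None


print(common_network(MACHINES.values()))  # 192.168.43.0/24
```

With a /24 prefix the first three octets must match. That is why 192.168.43.7 and 192.168.43.210 are on the same network as the host.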
3. Setting Up Name Resolution
Both hosts must be able to resolve the names of the two cluster nodes. Do the following on both servers: open the /etc/hosts file with the vi editor using the sudo vi /etc/hosts command, and add the following lines:
192.168.43.210 cluster.node1 node1
192.168.43.7 cluster.node2 node2
Once you have added the lines, save and close the file.
Note that cluster.node1 and cluster.node2 are the hostnames of the nodes, and node1 and node2 are their aliases.
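Editing /etc/hosts by hand is easy to get wrong when it is repeated on several nodes. The following sketch uses the addresses and names of this setup. It builds the two node entries and only appends those whose hostname is not already present, so running it twice changes nothing. The merge_hosts name is hypothetical.

```python
from typing import Iterable, Tuple

# (IP address, hostname, alias) for each node, taken from this setup.
NODES = [
    ("192.168.43.210", "cluster.node1", "node1"),
    ("192.168.43.7", "cluster.node2", "node2"),
]


def merge_hosts(existing: str, nodes: Iterable[Tuple[str, str, str]] = NODES) -> str:
    """Return /etc/hosts text with a line for every node whose hostname is not yet listed."""
    known_names = set()
    for line in existing.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore comments, split on whitespace
        known_names.update(fields[1:])           # everything after the IP is a name
    lines = existing.splitlines()
    for ip, hostname, alias in nodes:
        if hostname not in known_names:
            lines.append(f"{ip}\t{hostname}\t{alias}")
    return "\n".join(lines) + "\n"


print(merge_hosts("127.0.0.1   localhost\n"), end="")
```

On each node, as root, you can write the result back with Path("/etc/hosts").write_text(merge_hosts(Path("/etc/hosts").read_text())) after importing Path from pathlib.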
4. Installing Apache
Apache is the service that our Active-Passive cluster will keep available on CentOS 7, so what we are going to build is an Apache web server cluster.
You have to install the Apache web server on both servers, following the same steps on each.
To install Apache, use the following command on both servers.
5. Editing the /etc/sysconfig/selinux File
Open the /etc/sysconfig/selinux file with the vi editor and root permission by running the following command, and edit it as follows: change SELINUX=enforcing to SELINUX=permissive.
After editing the /etc/sysconfig/selinux file, save and exit.
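You can also script the same SELinux edit, which helps when it must be repeated on both nodes. This sketch assumes the usual KEY=value layout of /etc/sysconfig/selinux. On CentOS 7 this file is a symbolic link to /etc/selinux/config. set_selinux_mode is a hypothetical helper. The file is only read at boot, while setenforce 0 switches the running system to permissive mode immediately.

```python
import re

VALID_MODES = {"enforcing", "permissive", "disabled"}


def set_selinux_mode(config_text: str, mode: str = "permissive") -> str:
    """Return config_text with its SELINUX= line set to `mode`; other lines are untouched."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown SELinux mode: {mode!r}")
    # ^SELINUX= does not match SELINUXTYPE=, so only the mode line changes.
    new_text, count = re.subn(r"(?m)^SELINUX=.*$", f"SELINUX={mode}", config_text)
    if count == 0:  # no SELINUX= line at all: append one
        new_text = config_text.rstrip("\n") + f"\nSELINUX={mode}\n"
    return new_text
```

As root: p = Path("/etc/sysconfig/selinux"); p.write_text(set_selinux_mode(p.read_text())). Writing through the Path follows the symbolic link, so the real /etc/selinux/config is updated.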
Run the following commands on both servers
6. Configuring the Firewall
Run the following two commands on both nodes
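Below is a sketch of the usual firewalld commands for this step, written as Python argument lists so they can be reviewed before running. It assumes firewalld's predefined high-availability service, which opens the ports used by pcsd, Corosync and Pacemaker. As an added assumption, it also opens the http service so the browser test on the host can reach Apache.

```python
import shlex
from typing import List, Sequence

# Assumption: "http" is opened as well so the browser test on the host can reach Apache.
CLUSTER_SERVICES = ("high-availability", "http")


def firewall_commands(services: Sequence[str] = CLUSTER_SERVICES) -> List[List[str]]:
    """firewalld commands that permanently open each service, then reload the rules."""
    commands = [["firewall-cmd", "--permanent", f"--add-service={s}"] for s in services]
    commands.append(["firewall-cmd", "--reload"])  # apply the permanent rules now
    return commands


for argv in firewall_commands():
    print(shlex.join(argv))
# On each node, as root: subprocess.run(argv, check=True) for every argv above.
```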
7. Installing Pacemaker
A fencing device must be configured for each node in the cluster.
Install Pacemaker and the fencing agents by running the following command on both nodes.
What is Pacemaker?
Pacemaker is an open-source cluster resource manager (CRM): a system that coordinates the resources and services that a cluster manages and makes highly available. In essence, Corosync enables the servers to communicate as a cluster, while Pacemaker provides the ability to control how the cluster behaves.
* You can now see that a user named hacluster has been created.
Check the /etc/passwd file by running the tail /etc/passwd command.
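You can also do the tail check programmatically. On the node itself, Python's standard pwd.getpwnam("hacluster") returns the entry directly. The hypothetical helper below instead parses /etc/passwd text, which makes the seven colon-separated fields explicit.

```python
from typing import Dict, Optional

PASSWD_FIELDS = ("name", "password", "uid", "gid", "gecos", "home", "shell")


def passwd_entry(passwd_text: str, user: str) -> Optional[Dict[str, str]]:
    """Return the named fields of `user`'s /etc/passwd line, or None if the user is absent."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) == len(PASSWD_FIELDS) and fields[0] == user:
            return dict(zip(PASSWD_FIELDS, fields))
    return None


# On a node: passwd_entry(open("/etc/passwd").read(), "hacluster")
```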
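* Set a password for the hacluster user
Run the passwd hacluster command on both nodes and set the password. Use the same password on both nodes so that a single authentication step works for both.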
8. Starting the Pacemaker Service
Run these two commands on both nodes.
9. Authentication
You have to set up authentication between the two nodes. Execute the following command on one node, for example node1 (it does not matter which node you choose).
You will get the following output.
Next, generate the Corosync configuration and synchronize it to the nodes with the following command.
Here, Cluster is the name of the cluster.
You can confirm that the two-node cluster has been created by running the following command.
Start the cluster by running the following commands.
You can see the state of the nodes and resources by running the pcs status command.
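For reference, below is a sketch of the pcs command sequence that usually covers the authentication, setup and start steps. It assumes the pcs 0.9 syntax shipped with CentOS 7; pcs 0.10 and later replace cluster auth with host auth and drop the --name option. The commands are built as argument lists and printed. bootstrap_commands is a hypothetical name.

```python
import shlex
from typing import List, Sequence


def bootstrap_commands(cluster_name: str, nodes: Sequence[str],
                       user: str = "hacluster") -> List[List[str]]:
    """pcs 0.9 (CentOS 7) commands, run on one node, to create and start the cluster."""
    return [
        # Authenticate pcsd on every node; pcs prompts for the hacluster password.
        ["pcs", "cluster", "auth", *nodes, "-u", user],
        # Generate corosync.conf and push it to all nodes.
        ["pcs", "cluster", "setup", "--name", cluster_name, *nodes],
        # Start Corosync and Pacemaker on all nodes.
        ["pcs", "cluster", "start", "--all"],
        # Optional: start the cluster services automatically at boot.
        ["pcs", "cluster", "enable", "--all"],
        ["pcs", "status"],
    ]


for argv in bootstrap_commands("Cluster", ["node1", "node2"]):
    print(shlex.join(argv))
```

Run each printed command on one node as root, or pass it to subprocess.run(argv, check=True).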
10. Configuring the Virtual IP Address
From this step on, we use pcs to interact with the cluster. All commands are executed on only one node, and it can be either node.
The Pacemaker cluster is now up and running, so we can start adding resources.
The first resource we add is the virtual IP address (192.168.43.200 in this setup).
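Below is a sketch of the usual command for a floating IP resource. It assumes the ocf:heartbeat:IPaddr2 resource agent, the address 192.168.43.200 used later for testing, and a hypothetical resource name, virtual_ip. Before building the command, the helper checks that the address lies in the nodes' network and is not already assigned to one of the machines.

```python
import ipaddress
import shlex
from typing import Iterable, List


def vip_command(vip: str, network: str, in_use: Iterable[str],
                name: str = "virtual_ip", interval: str = "30s") -> List[str]:
    """Build the pcs command for an IPaddr2 virtual IP after basic sanity checks."""
    net = ipaddress.ip_network(network)
    addr = ipaddress.ip_address(vip)
    if addr not in net:
        raise ValueError(f"{vip} is outside {net}")
    if addr in {ipaddress.ip_address(a) for a in in_use}:
        raise ValueError(f"{vip} is already assigned to a machine")
    return ["pcs", "resource", "create", name, "ocf:heartbeat:IPaddr2",
            f"ip={vip}", f"cidr_netmask={net.prefixlen}",
            "op", "monitor", f"interval={interval}"]


print(shlex.join(vip_command("192.168.43.200", "192.168.43.0/24",
                             ["192.168.43.156", "192.168.43.210", "192.168.43.7"])))
```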
11. Adding the Second Resource
Use the following command to create the second resource for the cluster, the Apache web server.
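A common pattern for the Apache resource is the ocf:heartbeat:apache agent plus two constraints. One keeps Apache on the same node as the virtual IP; the other starts Apache after the IP. The sketch below assumes that pattern, the default CentOS 7 httpd configuration path, and the hypothetical resource names webserver and virtual_ip. The apache agent monitors httpd by requesting statusurl, so the /server-status page (mod_status) has to be enabled in the Apache configuration.

```python
import shlex
from typing import List


def webserver_commands(name: str = "webserver", vip_name: str = "virtual_ip",
                       config: str = "/etc/httpd/conf/httpd.conf",
                       status_url: str = "http://127.0.0.1/server-status") -> List[List[str]]:
    """pcs commands for an Apache resource tied to the virtual IP (hypothetical names)."""
    return [
        # The ocf:heartbeat:apache agent starts httpd and polls status_url to monitor it.
        ["pcs", "resource", "create", name, "ocf:heartbeat:apache",
         f"configfile={config}", f"statusurl={status_url}",
         "op", "monitor", "interval=1min"],
        # Always run Apache on the node that currently holds the virtual IP ...
        ["pcs", "constraint", "colocation", "add", name, "with", vip_name, "INFINITY"],
        # ... and bring the IP up before Apache starts.
        ["pcs", "constraint", "order", vip_name, "then", name],
    ]


for argv in webserver_commands():
    print(shlex.join(argv))
```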
You can see the configured resources and services by running the pcs status command.
The resources may still be stopped; if so, run the following commands.
Then check the status again by running the pcs status command.
Now create a test file on each node with the following command.
On node1 add the text “Node1 is running...”, and on node2 add “Node2 is running...”.
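You can create the test page with a small helper. This assumes Apache's default document root, /var/www/html. render_test_page and write_test_page are hypothetical names.

```python
from pathlib import Path
from typing import Tuple

DOCROOT = "/var/www/html"  # Apache's default DocumentRoot on CentOS 7


def render_test_page(node_label: str, docroot: str = DOCROOT) -> Tuple[Path, str]:
    """Return the index.html path and the text identifying which node served it."""
    return Path(docroot) / "index.html", f"{node_label} is running...\n"


def write_test_page(node_label: str, docroot: str = DOCROOT) -> Path:
    path, content = render_test_page(node_label, docroot)
    path.write_text(content)  # needs root for the default docroot
    return path


# On node1 (as root): write_test_page("Node1"); on node2: write_test_page("Node2")
```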
The configuration is now complete. Open a browser on the host and enter the virtual IP address you created.
You can see that the default node (Node1) is serving the page.
Now I stop node1.
When I open 192.168.43.200 again, the page shows “Node2 is running...”.
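You can automate the browser check. This sketch polls the page on the virtual IP until its text changes, treating connection errors as the moment when the address is moving between nodes. fetch_page and wait_for_failover are hypothetical names, and the address is the one used in this setup.

```python
import time
import urllib.request
from typing import Callable


def fetch_page(vip: str, timeout: float = 3.0) -> str:
    """Return the text served at http://<vip>/, e.g. 'Node1 is running...'."""
    with urllib.request.urlopen(f"http://{vip}/", timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def wait_for_failover(fetch: Callable[[], str], before: str,
                      attempts: int = 30, delay: float = 1.0) -> str:
    """Poll fetch() until it returns something other than `before`; return that answer."""
    for _ in range(attempts):
        try:
            answer = fetch()
        except OSError:  # URLError and timeouts are OSErrors; the VIP may be moving
            answer = None
        if answer is not None and answer != before:
            return answer
        time.sleep(delay)
    raise TimeoutError(f"still seeing {before!r} after {attempts} attempts")


# After stopping node1:
# print(wait_for_failover(lambda: fetch_page("192.168.43.200"), "Node1 is running..."))
```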
END