Sun Cluster 3.1 cheat sheet
Daemons
clexecd       Used by cluster kernel threads to execute userland commands (such as the
              run_reserve and dofsck commands). It is also used to run cluster commands
              remotely (such as the cluster shutdown command). This daemon registers with
              failfastd so that a failfast device driver will panic the kernel if the daemon
              is killed and not restarted within 30 seconds.
cl_ccrad      Provides access from userland management applications to the CCR. It is
              automatically restarted if it is stopped.
cl_eventd     The cluster event daemon registers and forwards cluster events (such as nodes
              entering and leaving the cluster). There is also a protocol by which user
              applications can register to receive cluster events. The daemon is
              automatically respawned if it is killed.
cl_eventlogd  The cluster event log daemon logs cluster events into a binary log file. At
              the time of writing there is no published interface to this log. It is
              automatically restarted if it is stopped.
failfastd     The failfast proxy server. The failfast daemon allows the kernel to panic if
              certain essential daemons have failed.
rgmd          The resource group management daemon, which manages the state of all
              cluster-unaware applications. A failfast driver panics the kernel if this
              daemon is killed and not restarted within 30 seconds.
rpc.fed       The fork-and-exec daemon, which handles requests from rgmd to spawn methods
              for specific data services. A failfast driver panics the kernel if this
              daemon is killed and not restarted within 30 seconds.
rpc.pmfd      The process monitoring facility. It is used as a general mechanism to
              initiate restarts and failure action scripts for some cluster framework
              daemons (in the Solaris 9 OS), and for most application daemons and
              application fault monitors (in the Solaris 9 and 10 OS). A failfast driver
              panics the kernel if this daemon is stopped and not restarted within 30
              seconds.
pnmd          The public network management daemon manages network status information
              received from the local IPMP daemon running on each node and facilitates
              application failovers caused by complete public network failures on a node.
              It is automatically restarted if it is stopped.
scdpmd        The disk path monitoring daemon monitors the status of disk paths, so that
              they can be reported in the output of the cldev status command. It is
              automatically restarted if it is stopped.
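All of these daemons should normally be present on every cluster node. A quick sanity check (a minimal sketch; the egrep pattern below is only illustrative, not an exhaustive list):

# ps -ef | egrep 'clexecd|cl_ccrad|cl_eventd|failfastd|rgmd|rpc.fed|rpc.pmfd|pnmd|scdpmd' | grep -v egrep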
File locations
man pages /usr/cluster/man
log files /var/cluster/logs and /var/adm/messages
sccheck logs /var/cluster/sccheck/report.<date>
CCR files /etc/cluster/ccr
Cluster infrastructure file /etc/cluster/ccr/infrastructure
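For quick troubleshooting it is often enough to scan the system log for cluster messages (a generic sketch; the grep pattern is only an example):

# grep -i cluster /var/adm/messages | tail -20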
SCSI Reservations
Display reservation keys
    scsi2: /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2
    scsi3: /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2
Determine the device owner
    scsi2: /usr/cluster/lib/sc/pgre -c pgre_inresv -d /dev/did/rdsk/d4s2
    scsi3: /usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2
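For example, to see which nodes currently hold SCSI-3 keys and the reservation on a quorum disk (d4 is a placeholder DID device; confirm the DID name first with scdidadm -L):

# scdidadm -L | grep d4
# /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2
# /usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2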
Cluster information
Quorum info    scstat -q
Cluster components    scstat -pv
Resource/resource group status    scstat -g
IP network multipathing status    scstat -i
Status of all nodes    scstat -n
Disk device groups    scstat -D
Transport info    scstat -W
Detailed resource/resource group info    scrgadm -pv
Cluster configuration info    scconf -p
Installation info (prints packages and version)    scinstall -pv
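Running scstat with no options prints all of the above sections in one report, which is a convenient first check after any change:

# scstat | more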
Cluster Configuration
Integrity check    sccheck
Configure the cluster (add nodes, add data services, etc.)    scinstall
Cluster configuration utility (quorum, data services, resource groups, etc.)    scsetup
Add a node    scconf -a -T node=<host>
Remove a node    scconf -r -T node=<host>
Prevent new nodes from entering    scconf -a -T node=.
Put a node into maintenance state    scconf -c -q node=<node>,maintstate
    Note: use the scstat -q command to verify the node is in maintenance state;
    its vote count should be zero.
Take a node out of maintenance state    scconf -c -q node=<node>,reset
    Note: use the scstat -q command to verify; the node's vote count should be one again.
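Example round trip for node maintenance (node2 is a placeholder hostname and must not be an active cluster member when it is placed in maintenance state):

# scconf -c -q node=node2,maintstate    (run from an active cluster node)
# scstat -q                             (node2's vote count should now be 0)
# scconf -c -q node=node2,reset         (after node2 is ready to rejoin)
# scstat -q                             (node2's vote count should be back to 1)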
Admin Quorum Device
Quorum votes come from nodes and from quorum disk devices, so the total quorum is all node
votes and device votes added together. You can use the interactive scsetup utility to add or
remove quorum devices, or use the commands below.
Adding a device to the quorum    scconf -a -q globaldev=d11
    Note: if you get the error message "unable to scrub device", run scgdevs to add the
    device to the global device namespace first.
Removing a device from the quorum    scconf -r -q globaldev=d11
Removing the last quorum device
    evacuate all nodes, then put the cluster into install mode
        #scconf -c -q installmode
    remove the quorum device
        #scconf -r -q globaldev=d11
    check the quorum devices
        #scstat -q
Resetting quorum info    scconf -c -q reset
    Note: this will bring all offline quorum devices online
Bring a quorum device into maintenance mode
    obtain the device number
        #scdidadm -L
        #scconf -c -q globaldev=<device>,maintstate
Bring a quorum device out of maintenance mode
    scconf -c -q globaldev=<device>,reset
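Example of replacing a quorum device (d11 and d12 are placeholder DID devices): add the new device before removing the old one so the cluster is never left without a quorum device, then verify:

# scconf -a -q globaldev=d12
# scconf -r -q globaldev=d11
# scstat -q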
Device Configuration
List all configured devices, including paths, across all nodes    scdidadm -L
List all configured devices, including paths, on this node only    scdidadm -l
Reconfigure the device database, creating new instance numbers if required    scdidadm -r
Perform the repair procedure for a particular path (use this when a disk has been replaced)
    scdidadm -R <c0t0d0s0>    (by device)
    scdidadm -R 2             (by device id)
Configure the global device namespace    scgdevs
Status of all disk paths    scdpm -p all:all
    Note: format is <host>:<disk>
Monitor a device path    scdpm -m <node:disk path>
Unmonitor a device path    scdpm -u <node:disk path>
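A typical disk-replacement flow combining the commands above (c1t2d0 is a placeholder device; check scdidadm -L for the real name):

# scdidadm -L | grep c1t2d0    (find the DID instance of the replaced disk)
# scdidadm -R c1t2d0           (repair the DID path for the new disk)
# scgdevs                      (update the global device namespace)
# scdpm -p all:all             (confirm the disk path status)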
Disk device groups
Adding/registering    scconf -a -D type=vxvm,name=appdg,nodelist=<host>:<host>,preferenced=true
Removing    scconf -r -D name=<disk group>
Adding a single node    scconf -a -D type=vxvm,name=appdg,nodelist=<host>
Removing a single node    scconf -r -D name=<disk group>,nodelist=<host>
Switch    scswitch -z -D <disk group> -h <host>
Put into maintenance mode    scswitch -m -D <disk group>
Take out of maintenance mode    scswitch -z -D <disk group> -h <host>
Onlining a disk group    scswitch -z -D <disk group> -h <host>
Offlining a disk group    scswitch -F -D <disk group>
Resync a disk group    scconf -c -D name=appdg,sync
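Example of registering a VxVM disk group with the cluster and moving it between nodes (appdg, node1 and node2 are placeholders):

# scconf -a -D type=vxvm,name=appdg,nodelist=node1:node2,preferenced=true
# scstat -D                        (check the device group status)
# scswitch -z -D appdg -h node2    (switch the device group to node2)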
Transport cable
Enable    scconf -c -m endpoint=<host>:qfe1,state=enabled
Disable    scconf -c -m endpoint=<host>:qfe1,state=disabled
    Note: it gets deleted
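Before disabling a transport endpoint, confirm that another interconnect path is still online so the private network is not cut completely (node1 and qfe1 are placeholders):

# scstat -W                                          (check all transport paths first)
# scconf -c -m endpoint=node1:qfe1,state=disabled
# scstat -W                                          (verify the remaining path is still online)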
Resource Groups
Adding    scrgadm -a -g <res_group> -h <host>,<host>
Removing    scrgadm -r -g <res_group>
Changing properties    scrgadm -c -g <res_group> -y <property=value>
Listing    scstat -g
Detailed list    scrgadm -pv -g <res_group>
Display mode type (failover or scalable)    scrgadm -pv -g <res_group> | grep 'Res Group mode'
Offlining    scswitch -F -g <res_group>
Onlining    scswitch -Z -g <res_group>
Unmanaging    scswitch -u -g <res_group>
    Note: all resources in the group must be disabled first
Managing    scswitch -o -g <res_group>
Switching    scswitch -z -g <res_group> -h <host>
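For example, to create a failover resource group hosted on two nodes and bring it online (app-rg, node1 and node2 are placeholder names):

# scrgadm -a -g app-rg -h node1,node2    (create the group with its node list)
# scswitch -Z -g app-rg                  (bring the group online and managed)
# scstat -g                              (verify the group state)
# scswitch -z -g app-rg -h node2         (switch it to the other node if required)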
Resources
Adding a failover network resource    scrgadm -a -L -g <res_group> -l <logicalhost>
Adding a shared network resource    scrgadm -a -S -g <res_group> -l <logicalhost>
Adding a failover apache application and attaching the network resource
    scrgadm -a -j apache_res -g <res_group> \
      -t SUNW.apache -y Network_resources_used=<logicalhost> \
      -y Scalable=False -y Port_list=80/tcp \
      -x Bin_dir=/usr/apache/bin
Adding a shared apache application and attaching the network resource
    scrgadm -a -j apache_res -g <res_group> \
      -t SUNW.apache -y Network_resources_used=<logicalhost> \
      -y Scalable=True -y Port_list=80/tcp \
      -x Bin_dir=/usr/apache/bin
Create an HAStoragePlus failover resource
    scrgadm -a -g rg_oracle -j hasp_data01 -t SUNW.HAStoragePlus \
      -x FileSystemMountPoints=/oracle/data01 \
      -x AffinityOn=true
Removing    scrgadm -r -j res-ip
    Note: you must disable the resource first
Changing properties    scrgadm -c -j <resource> -y <property=value>
List    scstat -g
Detailed list    scrgadm -pv -j res-ip
                 scrgadm -pvv -j res-ip
Disable resource monitor    scswitch -n -M -j res-ip
Enable resource monitor    scswitch -e -M -j res-ip
Disabling    scswitch -n -j res-ip
Enabling    scswitch -e -j res-ip
Clearing a failed resource    scswitch -c -h <host>,<host> -j <resource> -f STOP_FAILED
Find the network of a resource    scrgadm -pvv -j <resource> | grep -i network
Removing a resource and resource group
    offline the group
        # scswitch -F -g rgroup-1
    remove the resource
        # scrgadm -r -j res-ip
    remove the resource group
        # scrgadm -r -g rgroup-1
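Sketch of recovering from a failed stop method (res-ip, rgroup-1, node1 and node2 are placeholders); make sure the application processes are really stopped before clearing the flag:

# scswitch -c -h node1,node2 -j res-ip -f STOP_FAILED    (clear the STOP_FAILED flag)
# scswitch -e -j res-ip                                  (re-enable the resource if it was disabled)
# scswitch -Z -g rgroup-1                                (bring the resource group online again)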
Resource Types
Adding (registering)    scrgadm -a -t <resource type>, e.g. SUNW.HAStoragePlus
Deleting    scrgadm -r -t <resource type>
Listing    scrgadm -pv | grep 'Res Type name'
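A resource type must be registered before any resource of that type can be created; for example, for HAStoragePlus (rg_oracle, hasp_data01 and the mount point reuse the placeholder names from the example above):

# scrgadm -a -t SUNW.HAStoragePlus        (register the type once per cluster)
# scrgadm -pv | grep 'Res Type name'      (confirm it is registered)
# scrgadm -a -g rg_oracle -j hasp_data01 -t SUNW.HAStoragePlus \
    -x FileSystemMountPoints=/oracle/data01 -x AffinityOn=true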