Module 4
Monitoring
Objectives
Upon completion, you will be able to:
• Fault Management – Alarms
• Basic CLI troubleshooting
• Logs, Software Files
• Interface verification – Sh (Diameter) / SCTP
• Database replication
• IP addressing and Bonds
• TRL Processing, Live traces and Capture
• Explain common troubleshooting procedures
Fault Management
The RMS EMS Fault Management screens depict fault types generated in the mOne system. The mOne
supports two basic types of system faults, as follows:
• Alarm: An alarm indicates a system fault condition that requires attention.
• Event: An event depicts component state changes or administrative events in the mOne system.
Alarms are raised at one of three severity levels:
Critical: An active service-affecting condition has occurred that requires immediate corrective action (the managed object is out of service).
Major: An active service-affecting condition has developed that requires urgent corrective action (degradation in the capability of a managed object).
Minor: A non-service-affecting fault condition.
Colour Code   Severity Level
Red           Critical
Yellow        Major
White         Minor
Color-coded alarm severity levels
RMS Alarms
The RMS alarm types and descriptions are listed below:
Alarm Type               Description
Equipment Alarms         An equipment alarm indicates an equipment fault.
Processing Error Alarms  A processing error alarm indicates a software or processing fault.
Communication Alarms     A communication alarm is associated with the procedures and/or processes that convey information from one point to another.
Environmental Alarms     An environmental alarm indicates a condition relating to an enclosure where the equipment resides.
Service IMS
• The mOne platform application is started with:
# service IMS start
# readShm
# ps -ef | grep IMS
• and stopped with:
# service IMS stop
# readShm
# ps -ef | grep IMS
[Diagram: Active AM gets stopped; AM states shown: Active, OOS; each AM hosts SQL]
• Verify the HA-IP (High-Availability IP) is on the active ADM card before restarting the other cards
• Check to see where the HA-IP is running
# ip a | grep 172.16.33.3
inet 172.16.33.3/24 brd 172.16.33.255 scope global secondary bond0:0
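The HA-IP check above can be wrapped in a small helper. This is a minimal sketch, assuming `ip a` lists the address as `inet <ip>/<prefix>` as in the sample line; the function name `has_ha_ip` is hypothetical and not part of the mOne tooling. It reads `ip a` output on stdin, so it can also be tried on saved output.

```shell
# Hypothetical helper (not part of the platform tooling): succeeds if the
# given address appears as an "inet <ip>/<prefix>" entry in `ip a` output
# supplied on stdin.
has_ha_ip() {
    # -F: match a fixed string, so dots are literal; -q: exit status only.
    # The trailing "/" stops 172.16.33.3 from matching 172.16.33.30.
    grep -qF "inet $1/"
}

# Usage on a live card (address taken from the slide):
#   ip a | has_ha_ip 172.16.33.3 && echo "HA-IP is on this card"
```

Running it on both ADM cards before a restart shows which one currently holds the HA-IP.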
Check DB Status
>dbm;
>show slave status;
[Diagram: the SQL master on the Active AM syncs to the SQL slave on the Standby AM]
Traffic Flow
• Protocol ports hosted, applications for encoding and decoding (diameter, ldap, soap, sigtran, dns)
• MX – Base is used, i.e. 172.16.33.x
• Fabric is used, i.e. 172.16.74.0/24
[Diagram: TP blades]
Troubleshooting - RM
• IP/Route checks
fp_ifconfig
fp_route
fp_arp
• Note: fp commands on ATCA will work only when you access DM/RM using console
• Port/NAT rule checks
/usr/IMS/current/bin/rmDbg ts
/usr/IMS/current/bin/rmDbg us
• Note: you can also add ig/eg options to commands mentioned above for specific ingress or egress data
• Application checks
fp_debuglevel 11 (for DBG logs)
fp_debuglevel 5 (for normal WRN logs)
• Login via ssh/telnet and use rmDbg tool
• To show configured Rules:
• Check configured NAT from AM - GUI
• Check statistics
# rmDbg -L -n --stats
IP Virtual Server version 1.2.1 (size=1048576)
Prot LocalAddress:Port        Conns InPkts OutPkts InBytes OutBytes InSync InFin OutSync OutFin
  -> RemoteAddress:Port
TCP  172.16.94.26:80              0      0       0       0        0      0     0       0      0
  -> 172.16.61.22:80              0      0       0       0        0      0     0       0      0
  -> 172.16.61.24:80              0      0       0       0        0      0     0       0      0
TCP  10.100.11.51:38880           0      0       0       0        0      0     0       0      0
  -> 172.16.93.25:11000           0      0       0       0        0      0     0       0      0
TCP  10.100.11.51:48880           0      0       0       0        0      0     0       0      0
  -> 172.16.93.25:65535           0      0       0       0        0      0     0       0      0
• Check debug level
# cat /proc/sys/net/ipv4/vs/debug_level
• Change code debug level (n is the new level):
# echo n >/proc/sys/net/ipv4/vs/debug_level
• Display code debug info:
# dmesg >dump.txt
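When reading the rmDbg statistics output shown earlier, it can help to list the real servers that have received no packets. The following is a minimal sketch, assuming the column order from the header in that output (Conns is the third field and InPkts the fourth on each line); the function name `idle_real_servers` is hypothetical.

```shell
# Hypothetical helper: reads the statistics table on stdin and prints
# "virtual -> real" for every real server whose InPkts counter is 0.
idle_real_servers() {
    awk '
        # Virtual service line: protocol, then local address:port.
        $1 == "TCP" || $1 == "UDP" { vs = $2; next }
        # Real server line: "->", remote address:port, then the counters.
        # The header line "-> RemoteAddress:Port" has no counters and is skipped.
        $1 == "->" && $4 ~ /^[0-9]+$/ && $4 == 0 { print vs " -> " $2 }
    '
}

# Usage: pipe the output of the statistics command into the helper, e.g.
#   <statistics command> | idle_real_servers
```

A real server that stays at zero while its siblings carry traffic is a reasonable place to start checking routes and NAT rules.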
rmtCmd
• Tool for running selected commands on selected blades
root@0-9:/root> rmtCmd -h
This tool is used to run commands on multiple blades.
USAGE: rmtCmd "command to be executed" <card_type or slot>
for card_type, use AM, CE, SI, RM, DM, DBSTORE, DBServer, TP, TP_ALL, SC or all.
for sby/act, use ceact, sisby, rmact, dmact,dbstoresby, dbserversby, tpact, tp_allsby, scact etc..
for slot IDs you may use IPs e.g. 172.16.33.41, Slot ID e.g. 1-1.
IP-SM-GW Basic Troubleshooting
Check the modules status
# readShm
Check the module/Blade uptime
# uptime
RMS IP-SM-GW Functionalities
Check the processes running in the system
# ps -eaf | grep IMS
Check user on the IPSMGW from CLI
# nodeanchor +12893255758
Subscriber Query from GUI
Logs and Important Software Files
• Boot log for a particular card: /var/log/boot.log
• AMsetup logs: cd /var/log
Check Interfaces ( SCTP / Diameter )
SCTP ( M3UA ) links towards STP
• Check SCTP links data --> getsctp.sh
Diameter links towards HSS ( Sh Interface )
• Check available diameter links data --> diamlinks.sh
• Or via GUI: Configurations > Interface > Signaling interface > Diameter > Local diameter > Local peer, then select the View Connection Status tab
• Check watchdog request via SI blades
ssh 0-1
tshark -i any -R "diameter"
Check Database Replication Status
• Check the DB replication status:
# rmtCmd '/root/bin/checkDbState.sh' adm
• Example output (taken when ADM 9 is ACTIVE and ADM 10 is STANDBY):
w0575darcsmgt01isg root @0-9:~/bin# rmtCmd '/root/bin/checkDbState.sh' adm
====>> On Card 0-9 Running: /root/bin/checkDbState.sh
MASTER
====>> On Card 0-10 Running: /root/bin/checkDbState.sh
SLAVE
---->> Done!!
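The example output above can be reduced to one "card role" pair per line and checked automatically. This is a minimal sketch, assuming the output keeps the `====>> On Card <slot> Running:` header followed by a line containing MASTER or SLAVE, as in the example; the function names `db_roles` and `db_roles_ok` are hypothetical, and the "exactly one MASTER and one SLAVE" rule is an assumption based on the two-ADM example.

```shell
# Hypothetical helper: turn the rmtCmd/checkDbState.sh output on stdin
# into lines of the form "<slot> <ROLE>".
db_roles() {
    awk '
        /^====>> On Card / { card = $4; next }          # remember the slot
        $1 == "MASTER" || $1 == "SLAVE" { print card, $1 }
    '
}

# Hypothetical check: succeeds only if exactly one card reports MASTER
# and exactly one reports SLAVE (assumed healthy state for two ADMs).
db_roles_ok() {
    db_roles | awk '
        $2 == "MASTER" { m++ }
        $2 == "SLAVE"  { s++ }
        END { exit !(m == 1 && s == 1) }
    '
}

# Usage on the active ADM:
#   rmtCmd '/root/bin/checkDbState.sh' adm | db_roles
```

For the example output, `db_roles` prints `0-9 MASTER` and `0-10 SLAVE`.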
Bond 0 on AM (Base – Internal)
Bond 1 on AM (Fabric – External)
Commands:
ip a show
ip a show bond0
ip a show bond1.1    # O&M
TRL Processing
• TRL – Transaction Logs
• Basic TRLs on IPSMGW – REGISTER , DE-REGISTER , SRI_SM , MO_FSM and MT_FSM
• TRL location: /data/redun/cdr/trl/
• Raw TRL example of a REGISTER operation:
• 0-4,1,7,004010000000000000000000,,,,,,,,,,,,,,,,,,,,,,,,,4,0|0|12893257270,,,,,,,20150910172012148-
0400,,,,0,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,,,,,,,,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,,0,0,,,0,sip:172.16.138.8:5090 ,REGISTER,20150910172012148-0400,200,,2,2653,223804743@2001:4958:5:f36c:0:a:bb00:f6010301,multipart/
mixed,,,,,,,,,,,sdclab001.ims.bell.ca,1,20150910172012153-0400,2001,0,1,20150910172012162-0400,2001,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,1|1|12893885122,1|1|12893255001,ATM_SM,20150910172012152-
0400,20150910172012278-0400,0,,0,0,0,1|1|12893885122,1|1|12893255002,RFS_SM,20150910172012278-0400,20150910172012365-0400,0,,,0,,,,0,0,,,,,,,,,,,,
• Process a TRL with the script:
# Trl_IPSM.sh IPSMGWCreekLab_TRL_201510041455-0400_06889_A.csv.gz
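The timestamps in a raw TRL record (for example 20150910172012148-0400) are easier to compare once converted to a readable form. The sketch below assumes the layout YYYYMMDDhhmmssSSS followed by a +/-hhmm UTC offset, which is inferred from the sample record rather than from a documented TRL format. Both function names are hypothetical, and the position of each field within the record is not assumed.

```shell
# Hypothetical helper: convert one TRL-style timestamp to ISO 8601,
# assuming YYYYMMDDhhmmssSSS followed by a +/-hhmm offset.
trl_ts_to_iso() {
    printf '%s\n' "$1" | sed -E \
        's/^([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{3})([+-][0-9]{2})([0-9]{2})$/\1-\2-\3T\4:\5:\6.\7\8:\9/'
}

# Hypothetical helper: list every timestamp-shaped value found in TRL
# text on stdin, one per line, in order of appearance.
trl_timestamps() {
    grep -oE '[0-9]{17}[+-][0-9]{4}'
}

# Usage:
#   trl_ts_to_iso 20150910172012148-0400
#   zcat <trl file>.csv.gz | trl_timestamps
```

For the first timestamp in the REGISTER example, `trl_ts_to_iso` prints 2015-09-10T17:20:12.148-04:00.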
Important Timers For RMS Functionality
Field                  Description
tr1m                   Guard timer for MO SMS. Within this value, rmsMgr shall return a response to the client.
tr1n                   Guard timer for MT (SMS, StatusReport, SRI-SM). Within this value, rmsMgr shall return a response to the remote.
ts2                    IMDN aggregative timer
tmap                   Timer waiting for all MAP responses
tsmpp                  Timer waiting for all SMPP responses
tsip                   Timer waiting for all SIP responses
tdiam                  Timer waiting for all DIAMETER responses
tdns                   Timer waiting for all DNS responses
tMM4                   Timer waiting for all MM4 responses
tMM1                   Timer waiting for all MM1 responses
tMM3                   Timer waiting for all MM3 responses
tMM7                   Timer waiting for all MM7 responses
mtCorrelationIdExpire  Guard timer for MT-CID
spsProfileAudit        Guard timer for userData from HSS/SPS
tShRace                Within this timer, the IMP at which the user registered is the owner of the user’s registration
Important Timers For RMS Functionality
settingForRms
1. correlationTimerStatusReportVP: min(ValidPeriod in original SMS, correlationTimerStatusReportVP) is used when waiting for the StatusReport.
ss7MapAcn
1. operTimer: guards the response from the network for an outgoing request.
ss7MapConfig
2. guardTimer: guards the response from the application for an incoming request.
ss7MapGeneral
3. ss7MapGeneral.operationTimer: an internal timer that prevents a hung operation.
4. ss7MapGeneral.dialogTimer: an internal timer that prevents a hung dialogue.
ss7MapGeneral.dialogTimer > ss7MapGeneral.operationTimer > ss7MapConfig.guardTimer AND ss7MapAcn.operTimer
• For outgoing dialogue, application timer > ss7MapAcn.operTimer
• For incoming dialogue, application timer < ss7MapConfig.guardTimer
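The ordering rule for the MAP timers can be checked before values are provisioned. This is a minimal sketch that reads the rule as "dialogTimer is greater than operationTimer, and operationTimer is greater than both guardTimer and operTimer"; that reading of the "AND" is an assumption. The function name `check_map_timers` is hypothetical, and all four values are assumed to be integers in the same unit.

```shell
# Hypothetical helper: validate the MAP timer ordering.
# Arguments: dialogTimer operationTimer guardTimer operTimer
check_map_timers() {
    dialog=$1; operation=$2; guard=$3; oper=$4
    if [ "$dialog" -le "$operation" ]; then
        echo "ss7MapGeneral.dialogTimer must be greater than ss7MapGeneral.operationTimer"
        return 1
    fi
    if [ "$operation" -le "$guard" ]; then
        echo "ss7MapGeneral.operationTimer must be greater than ss7MapConfig.guardTimer"
        return 1
    fi
    if [ "$operation" -le "$oper" ]; then
        echo "ss7MapGeneral.operationTimer must be greater than ss7MapAcn.operTimer"
        return 1
    fi
    echo "MAP timer ordering OK"
}

# Usage with illustrative values (not taken from any real configuration):
#   check_map_timers 60 45 30 30
```

The application timer rules for outgoing and incoming dialogues are separate comparisons and are not covered by this helper.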
kpireport
/home/mavstats/bin - kpireport executable
/home/mavstats/cfg - XML config files
/home/mavstats/incl - perl module include files
kpireport
kpireport --list
kpireport --minutes 20 -r 501_HSS_I_CSCF_DIAMETER
mavstats cfg: kpireport.HSS.xml
kpireport -r LDAP_REQUESTS_PER_PEER
SW Version
• To Check OS (Distro) version
root@0-9:/root> cat /etc/mav-release
wrlinux_4_3-radisys_atca4600-140819-2_1_3_0
wrlinux_4_3--150929-2_1_3_9
• To check exports version
root@0-9:/root> cat /usr/IMS/exports/exports-version
wrlinux_4-exports-150513-1_3_3_0
• To check build number
root@2-3:/root> /usr/IMS/current/bin/program_version.sh -l
pm: hss-c.4.2.9.6
dbMonMgr: hss-c.4.2.9.0
mcmserver: hss-c.4.2.9.6
SM: hss-c.4.2.9.1
evmserver: hss-c.4.2.9.6
emsserver: hss-c.4.2.9.0
tmmserver: hss-c.4.2.9.6
auditMgr: hss-c.4.2.9.0
backupMgr: hss-c.4.2.9.7
KPI Output
Databases
• MySQL Database server
• Resides ONLY on ADM/AM blades
• Types of databases include:
  • mnode_cm_data
    • All Configuration (System and Office Parameters)
  • EMS
    • EMS related configuration
    • Users, active alarms, temporary storage for stats
• Replication
  • One way, Active to Standby ADM
  • Role decided by dbMonMgr
  • Standby ADM initialization
[Diagram: Active AM (MASTER) and Standby AM (SLAVE), each with Health, System Config, Billing Manager, Log Server and EMS Server; replication runs from MASTER to SLAVE]
MySQL - Replication
• Replication is started by the application but is not stopped with it. To stop database syncing, you need to restart the service with "service mysql restart".
Software Logs on mOne
• Log Daemon – mlogd
• Gathers logs from applications
• Optionally writes to a binary log file
• Log level set from /usr/IMS/exports/blade.cfg
• Log Client/Utility – mlogc
• Ability to convert binary log (generated by mlogd) to text format
• Ability to gather real-time text logs from mlogd
• For example: Run mlogc on Standby AM to gather text logs from all cards.
[Diagram: mlogd runs on each card (AM - A, AM - S, RM - A, RM - S and TP - A cards at IPs A.A.A.A, B.B.B.B, D.D.D.D, E.E.E.E); mlogc is shown on an AM, which also hosts SQL]
Software Logs on mOne
• Log Daemon – mlogd
• Gathers logs from applications
• Optionally writes to a binary log file
• Log level set from /usr/mOne/exports/blade.cfg
• Log Client/Utility – mlogc
• Ability to convert binary log (generated by mlogd) to text format
• Ability to gather real-time text logs from mlogd
For example: Run mlogc on Standby AM to gather text logs from all cards.
[Diagram: Active and Standby AM, RM and CE cards, each running mlogd, with one mlogc instance collecting from them]
Troubleshooting Scenarios
• SIGTRAN message delivery failures:
  • On ADM, go to /root/mCon/bin
  • Run ./getsctp.sh to view the SS7 services and the SCTP/M3UA link status
  • If the links are DOWN or in Inactive state, the operator must contact Mavenir TAC/Operations
• If all SIP traffic is degraded, or during a SIP outage:
  • Check whether sipMgr on SI/TP is running normally
  • Check the memory and CPU utilization of sipMgr
  • Repeat the same for rmsMgr
  • Check that the RM card status is OK in the “readShm” result
  • Check that the NAT rules are OK for SIP traffic in the EGRESS and INGRESS directions
  • Check the kpireports for SIP and look for timeouts
  • Contact Mavenir TAC/Operations
Troubleshooting Scenarios
• If Diameter traffic is affected:
  • If the UDR/PUR success rate is degraded, check the kpireports to see whether timeouts have increased
  • Check that all the cards are in running condition
  • Check whether PNR is still OK and only UDR/PUR are affected
  • If it is a case of timeouts, contact the DRA/HSS vendors
  • Use netstat to make sure the Diameter links are established and stable on the IP-SM-GW
• If an SC card is out of service:
  • Check the readShm status from the active ADM
  • Confirm the card is really out of service
  • Check whether the card can be accessed from the ADM using ssh
  • If it is accessible, try restarting the service with “service IMS restart”
  • Recheck readShm from the active ADM
  • If it is still not up, contact Mavenir TAC/Operations
  • The startupLogs need to be checked in the /data/storage/log directory on the SC/CE