B&C Monitoring Improvement & Automation Opportunities
Aug 29, 2024
©LTIMindtree | Privileged and Confidential 2024
Agenda
Current Monitoring Setup
Monitoring Maturity Model
Obstacles and Opportunities
Proposed Improvements
Automation Opportunities
B&C Current Monitoring Setup
Azure Monitor (architecture diagram; components and alerts shown)
• Recovery Service Vault Alert: only Backup Job Status is monitored.
• Components (Metrics): Service Bus, Recovery Service Vault, Storage account, Key Vault, Private endpoints, Cosmos DB
• Application & API Alert: App Availability is monitored.
• Components: App Service plan, App Service, Application Insights (Activity Logs, Metrics, SMART Detection)
• VM Alert:
• VM CPU Usage [90%, 85%, 80%]
• VM Memory Usage
• Disk Space Usage
• VM Availability
• Components: Virtual Machine, SQL Server, Diagnostics (Metrics & Logs), VM Insights, VM Data Collection Rule, Log Analytics workspace, Azure Cloud Defender
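The three VM CPU thresholds listed above (90%, 85%, 80%) can be read as tiered alert levels. A minimal Python sketch of that reading follows. The severity names and the `classify_cpu` helper are assumptions for illustration; the slide gives only the percentages, not how they map to severities.

```python
from typing import Optional

# Thresholds come from the slide: 90%, 85%, 80%.
# The severity labels attached to them are assumed, not taken from the source.
CPU_TIERS = [(90.0, "critical"), (85.0, "warning"), (80.0, "informational")]


def classify_cpu(usage_percent: float) -> Optional[str]:
    """Return the highest tier whose threshold is met, or None if below all tiers."""
    if not 0.0 <= usage_percent <= 100.0:
        raise ValueError("usage_percent must be between 0 and 100")
    # Tiers are ordered from highest to lowest, so the first match wins.
    for threshold, severity in CPU_TIERS:
        if usage_percent >= threshold:
            return severity
    return None


if __name__ == "__main__":
    for sample in (92.0, 85.0, 80.5, 50.0):
        print(sample, classify_cpu(sample))
```

Keeping the tiers in one ordered list makes it straightforward to reuse the same structure for memory or disk thresholds if those are tiered in the same way.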
Monitoring Maturity Model
Initial: Basic Monitoring with Limited Visibility
• Basic Metrics Collection
• Manual Log Analysis
• Limited Alerting
• No ITSM Integration
• No Event Correlation
Reactive: Monitoring for Troubleshooting
• Enhanced Metrics Collection
• Basic Log Management
• Incident-Based Alerting
• ITSM Integration
• Manual Event Correlation
Proactive: Advanced Monitoring and Predictive Capabilities
• Comprehensive Metrics and Logs
• Automated Log Analysis
• Predictive Alerting
• Advanced ITSM Integration
• Automated Event Correlation
Optimized: Comprehensive Observability and Continuous Improvement
• Full Observability
• Machine Learning Insights
• Automated Responses
• Continuous ITSM Integration
• Advanced Event Correlation
Obstacles & Opportunities in Current Monitoring Setup
• Insufficient oversight of Key Vault, Storage Account (SA), Service Bus, and Cosmos DB
• Enhanced monitoring of App & key Infra services
• Integration between ticketing and monitoring tools
• Alert isolation
• Awareness of the impact of critical components
• Dependencies on Application teams
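One possible reading of "alert isolation" is that related alerts are handled one by one instead of being grouped into a single incident. Under that assumption, a minimal correlation sketch groups alerts that belong to the same application and fire close together in time. The `Alert` fields, the `correlate` function, and the 5-minute window are hypothetical and not part of the current setup.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Alert:
    resource: str       # e.g. a Key Vault or Service Bus instance (hypothetical names)
    application: str    # owning application, assumed to be known per resource
    fired_at: datetime


def correlate(alerts: List[Alert],
              window: timedelta = timedelta(minutes=5)) -> List[List[Alert]]:
    """Group alerts of one application that fire within `window` of the previous one."""
    groups: List[List[Alert]] = []
    open_group: Dict[str, List[Alert]] = {}
    for alert in sorted(alerts, key=lambda a: a.fired_at):
        current = open_group.get(alert.application)
        if current is not None and alert.fired_at - current[-1].fired_at <= window:
            current.append(alert)          # same incident, extend the group
        else:
            current = [alert]              # start a new group for this application
            groups.append(current)
            open_group[alert.application] = current
    return groups


if __name__ == "__main__":
    t0 = datetime(2024, 8, 29, 10, 0)
    sample = [
        Alert("kv-1", "app-a", t0),
        Alert("sb-1", "app-a", t0 + timedelta(minutes=3)),
        Alert("cosmos-1", "app-b", t0 + timedelta(minutes=4)),
    ]
    for group in correlate(sample):
        print([a.resource for a in group])
```

Each resulting group could then be raised as one ticket, which is where the integration between the ticketing and monitoring tools would come in.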
Proposed Improvements
Insufficient Oversight
• Enable additional monitoring for Key Vault, Service Bus, Cosmos DB & Storage account to alert on Availability, Latency, Saturation & Error metrics.
• Benefits: Monitor and respond to potential security incidents, log activity on SA, optimize performance & track usage of Service Bus.
Enhanced Monitoring
• Enable additional monitoring of Application Service.
• Build a comprehensive dashboard to provide real-time visibility.
• Trend analysis on usage, request & response time, HTTP server errors & exceptions.
• Benefits: Optimized usage of App Service and auto scaling. Proactively identify issues related to SKU capacity and take action.
Integration & Alert Isolation
• Benefits:
• Automated Incident Management
• Improved Visibility and Monitoring
• Faster Issue Resolution
• Enhanced Collaboration
• Proactive Problem Management
• Compliance and Reporting
• Improved response time
• Granular monitoring of system processes to reduce potential system outages
Enhanced Collaboration
• Lack of awareness of critical components, leading to improper categorization of alerts.
• Dependency on the Application team to address critical issues.
• Regular Reviews
• Training
AI Ops
• Use AIOps capability for:
• Dynamic Threshold
• Anomaly Detection
• KQL query-based trend analysis on capacity usage
• Use Application Insights to gauge user impact during outages.
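To illustrate the idea behind a dynamic threshold, the sketch below flags a sample as anomalous when it exceeds the mean plus `k` standard deviations of the preceding window. This is a simple statistical stand-in, not the algorithm Azure Monitor uses for its dynamic thresholds; the function name, window size, and `k` are assumptions.

```python
from statistics import mean, stdev
from typing import List


def dynamic_threshold_anomalies(values: List[float],
                                window: int = 12,
                                k: float = 3.0) -> List[int]:
    """Return indices whose value exceeds mean + k * stdev of the previous `window` samples."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    anomalies: List[int] = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        # The threshold moves with recent behaviour instead of being a fixed number.
        # If the history is constant, stdev is 0 and any rise above the mean is flagged.
        threshold = mean(history) + k * stdev(history)
        if values[i] > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    cpu = [50, 52, 51, 49, 50, 51, 50, 52, 49, 51, 50, 51, 95, 50]
    print(dynamic_threshold_anomalies(cpu))
```

Unlike a static 80/85/90% rule, this kind of threshold adapts to each resource's normal load, which is the main argument for the dynamic approach.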
Automation Opportunities
Cloud Monitoring
• Capacity Monitoring
• Service Availability Monitoring
• Batch Job Monitoring
• Reporting
• Performance Monitoring
Cloud Maintenance
• Patch Management
• Backup Management
• Cost Management
• Security Management
• Password Management
Request Management
• User Access Management
• Environment Provisioning & Updates
• Configuration Updates
• Website Management
Service Operations Management
• Release Management
• Service Continuity Management
• Change Management
• Configuration Management
Thank You
Automation Opportunities (continued)
| Automation Area | Process | Current state | Proposed Automation approach |
| --- | --- | --- | --- |
| Request Management | User Access Management | RBAC sheets are defined; an RBAC-based pipeline exists for access grants | 1. Implement RBAC-based pipelines across the project |
| Request Management | Environment Provisioning/Update | Environment provisioning is done using IaC (ARM Template, Bicep, Terraform) | 1. Convert ARM Template-based scripts to Bicep |
| Request Management | Configuration Update | Resource configurations are updated using Ansible scripts | For new requirements, build Ansible scripts to fulfill the need |
| Service Operations Management | Release Management | Releases are deployed via pipeline | 1. Completion of Data Pipelines |
| Service Operations Management | Change Management | Application changes are script driven; change requests are used for tracking | |
| Service Operations Management | Configuration Management | | |
| Service Operations Management | Incident Management | ServiceNow-based Incident Management in place | |
| Service Operations Management | Service Continuity Management | RTO & RPO are defined for each application; IaC scripts exist for provisioning and configuring the environment in case of DR | 1. Design and implement DR setup for all applications |
| Service Operations Management | Service Level Management | ServiceNow-based ticket management | |
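As an illustration of how an RBAC sheet could feed an access-grant pipeline, the sketch below parses sheet rows into a deduplicated list of role assignments and rejects roles outside an allow-list. The CSV column names (`principal`, `role`, `scope`), the allow-list, and the `plan_assignments` helper are all hypothetical; the actual sheet layout and pipeline are not described in the source.

```python
import csv
import io
from typing import List, Set, Tuple

# Azure built-in role names; restricting grants to this set is an assumption.
ALLOWED_ROLES = {"Reader", "Contributor", "Owner"}

Assignment = Tuple[str, str, str]  # (principal, role, scope)


def plan_assignments(sheet_csv: str) -> List[Assignment]:
    """Turn RBAC sheet rows into unique (principal, role, scope) assignments."""
    seen: Set[Assignment] = set()
    plan: List[Assignment] = []
    reader = csv.DictReader(io.StringIO(sheet_csv))
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        item = (row["principal"].strip(), row["role"].strip(), row["scope"].strip())
        if item[1] not in ALLOWED_ROLES:
            raise ValueError(f"line {line_no}: role {item[1]!r} is not allowed")
        if item not in seen:  # skip duplicate rows so the pipeline stays idempotent
            seen.add(item)
            plan.append(item)
    return plan


if __name__ == "__main__":
    sheet = (
        "principal,role,scope\n"
        "alice@example.com,Reader,rg-app1\n"
        "bob@example.com,Contributor,rg-app2\n"
    )
    for assignment in plan_assignments(sheet):
        print(assignment)
```

A pipeline step could then apply each planned assignment with whatever tooling the project already uses; validating the sheet first keeps bad rows from reaching that step.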
Automation Opportunities
| Automation Area | Process | Current state | Proposed Automation approach |
| --- | --- | --- | --- |
| Cloud Monitoring | Capacity Monitoring | Alert-based notification on monitoring thresholds | Use AIOps capability for: Dynamic Threshold; Anomaly Detection; KQL query-based trend analysis on capacity usage. Use Application Insights to gauge user impact during outages. |
| Cloud Monitoring | Service Availability | App Insights and Logic App based monitoring in place | |
| Cloud Monitoring | Reporting | No existing report mechanism | Custom dashboards for overall view |
| Cloud Monitoring | Batch Job Monitoring | RedGate-based alerting on SQL jobs in place | |
| Cloud Maintenance | Patch Management | Azure Update Management is used for applying patches; SOP-driven steps for post-verification checks | Automation of post-verification steps on application availability |
| Cloud Maintenance | Backup Management | Azure Recovery Service Vault based backup is used for resource-level backup and database backup. Recovery Service Vault has a built-in capability to notify on job failure | |
| Cloud Maintenance | Cost Management | Make use of Munichre's centralized team on cost optimization recommendation, all the recommended steps were | 1. Project-wise dashboards on cost trends. 2. Automation scripts to detect cost optimization opportunities in areas like: reduced logging; unused resources; reduction in size based on usage; using reserved instances for cost saving |
| Cloud Maintenance | Security Management | Security recommendations by the central team are implemented using standalone PowerShell scripts | 1. Current scripts on security recommendations are standalone; the IaC scripts (Bicep) can be updated to include the security recommendations during the build phase |
| Cloud Maintenance | Password Management | Automation in place to notify on password expiry | 1. Extend the current password management automation to all applications and tune it as per application challenges |
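The cost-management row above lists unused resources and size reduction based on usage as targets for automation scripts. A minimal sketch of such a check follows, assuming usage figures have already been exported per resource. The `ResourceUsage` fields, the finding labels, and the 10% CPU cut-off are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ResourceUsage:
    name: str
    avg_cpu_percent: float   # average CPU over the review period (assumed metric)
    requests_last_30d: int   # request count over the last 30 days (assumed metric)


def find_cost_candidates(resources: List[ResourceUsage],
                         low_cpu_percent: float = 10.0) -> List[Tuple[str, str]]:
    """Return (resource name, finding) pairs for unused or oversized resources."""
    findings: List[Tuple[str, str]] = []
    for res in resources:
        if res.requests_last_30d == 0:
            # No traffic at all: candidate for decommissioning.
            findings.append((res.name, "unused"))
        elif res.avg_cpu_percent < low_cpu_percent:
            # In use but lightly loaded: candidate for a smaller size.
            findings.append((res.name, "downsize"))
    return findings


if __name__ == "__main__":
    inventory = [
        ResourceUsage("app-plan-1", 4.5, 12000),
        ResourceUsage("vm-legacy", 1.0, 0),
        ResourceUsage("vm-batch", 62.0, 800),
    ]
    for name, finding in find_cost_candidates(inventory):
        print(name, finding)
```

The findings could feed the project-wise cost dashboards, with the cut-off tuned per application before any resize is actioned.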