Improving the DevOps Metrics that
Matter with Cloud Native Patterns
FEBRUARY 28, 2019 PATRICIA JOHNSON
Metric One: Throughput
Companies can measure the speed and efficiency of delivering value (whether that’s a
new feature or a bug fix) to their customers in a number of ways. In the State of DevOps
report, the authors focus on throughput metrics, which include:
● Lead time for changes: how long it takes to go from code commit to code running in production
● Deployment frequency: how often you deploy to production
You could also include metrics around “developer work”—producing smaller, modular
batches of deployable code more often. After all, if development is not agile, then your
delivery certainly won’t be.
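To make the two throughput metrics concrete, here is a minimal sketch of how they could be computed from deployment records. The record format, the sample timestamps, and the function names are assumptions made for illustration; they do not come from the State of DevOps report.

```python
from datetime import datetime
from statistics import median

# Hypothetical deployment records: (commit time, time the change ran in production).
deployments = [
    (datetime(2019, 2, 4, 9, 0), datetime(2019, 2, 4, 15, 0)),   # 6 hours
    (datetime(2019, 2, 5, 10, 0), datetime(2019, 2, 5, 12, 0)),  # 2 hours
    (datetime(2019, 2, 6, 11, 0), datetime(2019, 2, 7, 11, 0)),  # 24 hours
    (datetime(2019, 2, 8, 8, 0), datetime(2019, 2, 8, 12, 0)),   # 4 hours
]


def lead_times_hours(records):
    """Lead time per change, in hours: production time minus commit time."""
    return [(deployed - committed).total_seconds() / 3600
            for committed, deployed in records]


def median_lead_time_hours(records):
    """Median is used here so one slow change does not dominate the result."""
    return median(lead_times_hours(records))


def deployment_frequency(records, period_days):
    """Average number of production deployments per day over the period."""
    return len(records) / period_days


median_hours = median_lead_time_hours(deployments)
per_day = deployment_frequency(deployments, period_days=5)
print(f"median lead time: {median_hours} h, deployments per day: {per_day}")
```

In practice the timestamps would come from your version control system and your deployment tooling rather than a hard-coded list.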
How do organizations achieve these types of results? The actual work of improving
throughput can include:
● Self-serve developer experience—It starts with the code. Do developers have
easy access to the libraries and services they need to create cloud-native apps?
Your team will also need to automate their build and test process through
continuous integration (CI) and have instant access to development and test
environments that are at parity with production.
● End-to-end automated CI/CD pipeline—Automating every facet of delivery,
from development hand-off to production, is critical. Are you automating
deployments and testing? Do you automate time-intensive tasks like feedback
notifications and change tickets? Through a visible end-to-end pipeline, Dev and
Ops teams can work together to get code to staging and production several times
a day.
● Shift-left testing, database changes, security scans—Does your pipeline stall
waiting for testing, the DB team, or the security team? Do you attend change
review meetings? You should integrate and automate these activities in your
delivery pipeline. This way, they can be done earlier and in parallel to other
activities.
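The idea of running testing, database, and security checks in parallel rather than one after another can be sketched as follows. The three check functions are hypothetical placeholders that always pass; a real pipeline would call your test runner, migration tooling, and scanner instead.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical checks; each returns True when the check passes.
def run_unit_tests():
    return True


def check_db_migrations():
    return True


def run_security_scan():
    return True


def run_checks_in_parallel(checks):
    """Run every check concurrently and return {check name: passed}."""
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = {name: pool.submit(fn) for name, fn in checks.items()}
        # result() blocks until that check finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}


checks = {
    "unit_tests": run_unit_tests,
    "db_migrations": check_db_migrations,
    "security_scan": run_security_scan,
}
results = run_checks_in_parallel(checks)
ready = all(results.values())  # promote the build only if every check passed
print(results, "ready for next stage:", ready)
```

Because the checks run side by side, the stage takes roughly as long as the slowest check instead of the sum of all three.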
Metric Two: Stability
As your team speeds up software delivery, they can also reduce errors and remediate faster.
Does this conclusion surprise you? It shouldn’t. The State of DevOps report shows that as high
performers increase their speed, they simultaneously improve their stability. The report
measures stability through two metrics: mean time to repair (MTTR) and change failure rate
(changes that degrade service to the point of remediation or that cause failure).
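As a rough illustration, both stability metrics can be derived from a log of changes and incidents. The data shapes and function names below are assumptions made for this sketch, not a format prescribed by the report.

```python
from datetime import datetime

# Hypothetical change log: each change notes whether it needed remediation.
changes = [
    {"id": 1, "failed": False},
    {"id": 2, "failed": True},
    {"id": 3, "failed": False},
    {"id": 4, "failed": False},
    {"id": 5, "failed": False},
]

# Hypothetical incidents: (service degraded at, service restored at).
incidents = [
    (datetime(2019, 2, 5, 14, 0), datetime(2019, 2, 5, 15, 0)),  # 1 hour
    (datetime(2019, 2, 12, 9, 0), datetime(2019, 2, 12, 12, 0)), # 3 hours
]


def change_failure_rate(change_log):
    """Share of changes that degraded service or required remediation."""
    if not change_log:
        return 0.0
    failed = sum(1 for change in change_log if change["failed"])
    return failed / len(change_log)


def mean_time_to_repair_hours(incident_log):
    """Average time from degradation to restoration, in hours."""
    if not incident_log:
        return 0.0
    hours = [(restored - started).total_seconds() / 3600
             for started, restored in incident_log]
    return sum(hours) / len(hours)


cfr = change_failure_rate(changes)
mttr = mean_time_to_repair_hours(incidents)
print(f"change failure rate: {cfr:.0%}, MTTR: {mttr} h")
```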
What drives software stability?
● Cloud-native architecture—Are you sitting on a complex, monolithic application
portfolio with a mandate to move to the cloud? You can start by replatforming
suitable applications to run on the cloud. But to take full advantage of cloud
infrastructure, you must modernize those monoliths into loosely coupled,
lightweight microservices that follow 12-factor principles.
● Bullet-proofed CI/CD pipeline—Make your pipeline the standard, reliable path
to production. It starts with automating the manual tasks in your application
delivery process to remove human error. Does your pipeline include integrations
with process checks, approval gates, and final artifact testing? All of these steps
validate production readiness. Are your pipelines declarative (“as code”) for
easier traceability, version control, and remediation? They should be!
● Immutable, always secure infrastructure—Are you worried that your servers
are behind on security patching? If you’re not applying patches as soon as they
are issued, your systems and customer data are at risk. With an automated
patching process, you can sleep that much easier at night. You can use
deployment automation to regularly “repave” your infrastructure during business
hours to fight against advanced persistent threats. When you deploy new
infrastructure every time you deploy a new application, deployments become more
reliable and predictable. You can be more confident in your releases.
Metric Three: Availability
Availability is a new area of measurement in the State of DevOps report for 2018. It’s a
recognition that the ultimate marker of software delivery performance is that users can
reliably access an application or service.
Of course, metrics like availability open the door to thinking about software delivery
through an operational lens. After all, continuous delivery is never really done.
Production provides valuable insights for improving the customer experience.
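One common way to put a number on availability is the share of a time window in which the service was reachable. That uptime-ratio definition is an assumption for this sketch rather than the report's own measure, and the 30-day window and 99.9% target are illustrative values.

```python
MINUTES_PER_DAY = 24 * 60


def availability(total_minutes, downtime_minutes):
    """Fraction of the window during which the service was available."""
    return (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime_minutes(total_minutes, target):
    """Downtime budget implied by an availability target such as 0.999."""
    return total_minutes * (1 - target)


window = 30 * MINUTES_PER_DAY  # a 30-day window, in minutes
measured = availability(window, downtime_minutes=43.2)
budget = allowed_downtime_minutes(window, target=0.999)
print(f"availability: {measured:.3%}, downtime budget: {budget:.1f} min")
```

Framing the target as a downtime budget makes it easier to judge how much risk a given deployment or maintenance window can afford.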
How do you deliver availability and accessibility for your customers? These practices
can help:
● Low-risk deployment strategies—Start your applications off right by de-risking
the cut-over to live production with deployment strategies that gently move
applications into full production run. Are you leveraging automated blue/green
and canary deployments to minimize risk and downtime? Are you able to
roll back automatically when a deployment does not succeed? Use safe
deployment strategies to increase confidence and reduce risk. This approach
boosts your throughput while offering a path to continuous deployment when
you’re ready.
● High-availability (HA) by design—Does your infrastructure include
redundancies? How does your stack handle failure at the availability zone, VM,
application and process level? How do you monitor and manage the health of
your services? Redundancy and distribution minimize downtime during ongoing
operation, security updates, and platform upgrades.
● Dynamically scalable applications—Scaling up or down without disruption
keeps everyone happy—especially during peak traffic times. Can you
automatically scale up to handle a big traffic event? Can you automatically scale
back down to save costs? Having an elastic infrastructure, based on thresholds
you set, helps keep your apps online in the face of unpredictable traffic.
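Two of the practices above reduce to simple threshold decisions, sketched here: whether to roll back a canary, and how many instances to run for the current load. The function names, default thresholds, and the proportional scaling rule are assumptions chosen for illustration; platforms that offer these features have their own rules and settings.

```python
import math


def should_roll_back(canary_error_rate, baseline_error_rate, tolerance=0.01):
    """Roll back when the canary errors noticeably more than the baseline."""
    return canary_error_rate > baseline_error_rate + tolerance


def desired_instances(current, cpu_percent, target_percent=60,
                      min_instances=2, max_instances=10):
    """Size the fleet so average CPU moves toward the target, within limits."""
    wanted = math.ceil(current * cpu_percent / target_percent)
    return max(min_instances, min(max_instances, wanted))


# Canary serving 5% errors against a 1% baseline: roll back.
print(should_roll_back(0.05, 0.01))
# Four instances at 90% CPU with a 60% target: scale up.
print(desired_instances(current=4, cpu_percent=90))
# Six instances at 20% CPU: scale back down to save costs.
print(desired_instances(current=6, cpu_percent=20))
```

The minimum and maximum bounds keep the scaling rule from removing all redundancy during quiet periods or growing without limit during a spike.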
Don’t Forget Culture: The Leading Indicator of
DevOps Performance
Tools and tech are essential to becoming a high performer in software delivery, but
they are not the whole story. While it may be harder to define a clear measure for
culture, there are many indicators of what drives a happier, more productive culture.
For example, the State of DevOps report describes outsourcing as an anti-pattern for
high performance because of the added overhead and greater functional divide. Instead,
cross-functional teams and agile practices are correlated with better performance.
It’s also clear that implementing continuous delivery (defined in the report as “Technical
practices in delivery and deployment that reduce the risk and cost of performing
releases”) boosts team morale and performance. Better visibility across
teams and faster feedback loops can help break down silos.