1. Introduction: CPS Security

Cyber Physical Systems (CPS) integrate computational and physical elements to enhance system capabilities, safety, and efficiency. Security in CPS is crucial due to the potential risks to human lives and infrastructure, necessitating continuous monitoring and robust protective measures against various cyber threats. The document outlines the importance of confidentiality, integrity, and availability in CPS security, along with the need for tailored security strategies that address the unique challenges posed by the convergence of cyber and physical systems.


Introduction to

Cyber Physical Systems Security


Cyber Physical Systems (CPS)
• A CPS is a system featuring tight coordination between the system’s
computational and physical elements.

• A CPS uses computation and communication embedded in, and interacting
with, physical processes to add new capabilities to the physical system:
– Control
– Management
– Protection
– Etc.

[Diagram: a cyber system and a physical system exchanging data, commands, …]

• A CPS is the convergence of computation, communication, and control.

• A CPS should be resilient, safe, secure, and efficient.

2
Why CPS?

• By merging computing and communication with physical processes, CPS


brings many benefits:
– Safer and more efficient systems
– Reduced cost of building and operating systems
– Complex systems that provide new capabilities
– E.g., computer-controlled automotive engines are fuel-efficient and low-emission.

• Technological and economic drivers


– The decreasing cost of computation, networking, and sensing provides the economic
motivation.
– Ubiquitous computers and communication enable national- or global-scale CPS
(e.g., the national power grid, the national transportation network).
– Social and economic forces require more efficient use of national infrastructure.
– Environmental pressures drive new technologies that improve energy
efficiency and reduce pollution.

3
Why we need CPS?
• Example: Operating States of a System
– Normal
– Emergency
– Restorative

• Normal State:
– Secure
– Insecure
• Hugely depends on the definition of security!
• Emergency State:
– Violation of some of the operating constraints
– Must be back to normal state using corrective action
• Restorative State:
– Restore the operation of the system

4
Why we need CPS?
• What should we do?
– Continuous monitoring of system conditions
– Identification of the operating state
– Determination of the necessary preventive actions
  • For the insecure state
  • For the restorative state

5
CPS Created
Opportunities

6
CPS Security

[Diagram: cyber objects (computing HW/SW) interact with physical objects (sensors, actuators, implantable devices, smart grid) through cyber-physical interaction security; applications include App 1: IMD healthcare, App 2: renewable energy, App 3: industrial control]

Physical → Cyber (Monitoring Security):
• Sensor data attacks
• RFID tag attacks
• Memory reading attacks
• Log attacks (forensics)

Cyber → Physical (Control Security):
• Wireless charge attacks
• Closed-loop control attacks
• Device coordination attacks
• Command misleading, etc.

IMD: Implantable Medical Device

7
Example: Automotive Telematics

• In 2005: 30–90 processors per car for:
– Engine control, brake system, airbag deployment system, windshield wipers, door
locks, entertainment system

• Cars are sensors and actuators in V2V networks


– Active networked safety alerts
– Autonomous navigation (self-driving cars)

• Future Transportation Systems


– Incorporate both single-person and mass-transportation vehicles, air and ground
transportation.
– Achieve efficiency, safety, and stability using real-time control and optimization.

8
Vehicle Attack Surface

9
Interaction between Cyber and Physical Behaviors

• It is not possible to identify whether behavioral attributes are the result
of computations, physical laws, or both.
• Separation of information science and physical science creates a
divergence in scientific foundations.
• Simple combination of physical process and computational process
will be inefficient and unsafe.
• The cost of certification of complex systems is high.
• Security issues: One side can be attacked through the other side.

10
A Holistic Viewpoint of CPS

11
CPS Security

12
Confidentiality
• Ensuring information is not revealed to unauthorized
individuals, programs, or processes.
• Data should be secure
– At input/output ports, as well as at each node of data transfer
– Some information is more sensitive than other information, e.g., in the military:
Secret, Top Secret, and so forth
• Individuals’ personal information, such as SIN numbers
• It is not only about unauthorized personnel
– Privacy
• Confidentiality has different levels:
– Personal: financial, personal revenge, …
– Corporate: financial gains
– National security: damage to infrastructure, …

13
Integrity
• Assuring that accuracy and reliability of information and systems are
preserved

• Any unauthorized modification is prevented.

• Assuring data, resources, are not altered in an unauthorized fashion.


– Modification of stored or communicated data would violate integrity
– Modification of any programs, computer systems, and network connections

• Data may be changed, disturbed, or even destroyed.

• On the battlefield, reconnaissance information and command directives are vital.

• Cryptography is one way of ensuring integrity (it does not always work! Why?)

14
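As a concrete illustration of the cryptographic-integrity bullet above, the sketch below uses a keyed hash (HMAC-SHA256) to detect tampering with a control command. The key, command format, and values are invented for the example. It also hints at why cryptography alone "not always works": a MAC only helps while the shared key stays secret, so it is powerless against key theft or insiders.

```python
import hmac
import hashlib

def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Constant-time comparison guards against timing attacks."""
    return hmac.compare_digest(sign(key, message), tag)

key = b"shared-secret"           # hypothetical pre-shared key
command = b"valve_open=30%"      # hypothetical control command
tag = sign(key, command)

assert verify(key, command, tag)                # genuine command accepted
assert not verify(key, b"valve_open=99%", tag)  # tampered command rejected
```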
Example of Integrity!!

Operator

Power system

15
Availability
• Ensures reliable and timely access to data and resources to
authorized individuals.

• For example, a system that loses connectivity to its database
would become useless to most users.
– Example: lost connectivity to Netflix
– Example: Hydro-Québec smart meters

• In availability, time is always a constraint


• Denial of Service (DoS) attacks have a wide variety of forms

16
Threat model
• We need to define some vocabulary in context of security
– Risk, exposure, vulnerability,…

• Vulnerability: a software, hardware, procedural, or human
weakness that gives an adversary an open door into a
system to gain unauthorized access to resources.
• Examples: weak passwords, lack of physical security, unpatched
applications, an open port on a firewall, or bugs in software

• Example: Modems, USBs

17
Threat
• A threat is a potential danger to information or systems

• If a specific vulnerability is discovered and someone uses it
against a company or individual, we have a threat.

• A hacker, adversary, or attacker is a threat agent.

• An employee making a mistake that exposes confidential
information is an (inadvertent) threat agent.

• An insider who wants to cause harm or steal information by
taking advantage of insider access privileges is a threat
agent.

18
Risk
• Risk is the likelihood that a threat agent will be able to take
advantage of a vulnerability and cause an operational impact.
• Risk ties together the vulnerability, the threat, the likelihood of
exploitation, and the resulting operational loss.
• Nothing is 100% secure!!
– Define what, why and against who!

19
Risk
• Risk increases if
– Users are not educated on proper procedures
– An intrusion detection system is not installed or configured
correctly
– The system is exposed to the public
• If a known vulnerability is not addressed, risk goes up.
• Threat agents pay attention to published vulnerabilities (even
ones they did not discover themselves).

20
Asset
• Assets can be
– Physical: computers, network equipment, the embedded device itself
– Information: customer data, proprietary information, and secret information
– Monetary: expenses (fines or direct costs) from a breach, or loss of stock value in the stock market
– Reputation: lost due to poor perceptions of a company that has suffered a high-visibility breach
  • Siemens after Stuxnet

21
Exposure
• Exposure:
– Exposure is an instance of being subjected to losses or asset damage
from a threat agent.
– Poor password management exposes an organization to password
capture by threat agents, who would then gain unauthorized access to
systems and information.
– Exposures are sometimes designed in by engineers to facilitate maintenance.

22
23
Countermeasure
• A safeguard (or countermeasure) mitigates potential risk.

• It could be software, hardware, a configuration, or a procedure that eliminates
a vulnerability or reduces the likelihood that a threat agent will be able to exploit a
vulnerability.

24
M. H. Oboudi, M. Mohammadi, and M. Rastegar, “Resilience-oriented intentional islanding of reconfigurable distribution power systems”
Safeguard (Countermeasure)

• Relationships between vulnerability, asset, risk, … form a threat model

• “A threat agent creates a threat, which exploits a vulnerability that leads to a


risk that can damage an asset and causes an exposure, which can be
remedied through a safeguard or countermeasure.”

• Questions:
– What are the assets that you want to protect?
  • Or, in the case of design, that you want to build?
– Who might be interested in attacking them?
– How might they attack the system?
– How might the attacker benefit from the attack?
– What is your plan to prevent, detect, and respond?
– Are your methods sufficient?

25
Access Control
• Controlling access to data is one of the major parts of security!
• Identification: establishing who wants to access the data or resources.
– The procedure of asserting the identity of a subject (user name or account number)
• Authentication: providing a second piece of credential beyond identification
– Identification alone is easy to fake or spoof
– Stronger defense against social engineering (password, passphrase, …)
• Accountability: keeping track, by identity, of what each subject did in the
system (for forensics, detection, and to support auditing)

26
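The identification/authentication/accountability triad above might be sketched as follows. The user store, names, and resource are hypothetical, and a real system would use a salted key-derivation function (and authorization checks) rather than a bare hash:

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical user store: username -> hash of the second credential.
# (Simplified: production systems should salt and use a KDF such as scrypt.)
USERS = {"alice": hashlib.sha256(b"correct horse").hexdigest()}

AUDIT_LOG = []  # accountability: who tried what, when, and whether it succeeded

def authenticate(username: str, password: str) -> bool:
    """Identification (the claimed username) plus a second credential."""
    stored = USERS.get(username)
    return stored is not None and stored == hashlib.sha256(password.encode()).hexdigest()

def access_resource(username: str, password: str, resource: str) -> bool:
    ok = authenticate(username, password)
    AUDIT_LOG.append((datetime.now(timezone.utc).isoformat(), username, resource, ok))
    return ok

assert access_resource("alice", "correct horse", "sensor_readings")
assert not access_resource("alice", "wrong", "sensor_readings")
assert len(AUDIT_LOG) == 2  # every attempt, successful or not, is accounted for
```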
Security Policy

• A security policy is a strategic plan that clarifies the security state
of a system:
– Identify the assets of the system that are valuable
  • Depends on the functionality of the system
– Determine which functionality is affected by security
– Outline responsibilities (in the case of a large system)
– Prepare and define the scope and function of a security team
– Plan the required response, e.g., customer relations, etc.
– Improve the security state using technical, legal, and regulatory means and
standards of due care

27
CPS Security

• CPS are a critical part of the national cyber infrastructure. Security threats to
CPS pose significant risk to the health and safety of human lives, threaten
severe damage to the environment, and could impose an adverse impact on
the U.S. economy. (Homeland Security, Dr. Nabil Adam, 2010)
• …as the United States deploys new Smart Grid technology, the Federal
government must ensure that security standards are developed and adopted to
avoid creating unexpected opportunities for adversaries to penetrate these
systems or conduct large-scale attacks. (President’s Cyberspace Policy
Review)
Canada’s critical infrastructure (Public Safety Canada):
1. Health
2. Food
3. Finance
4. Water
5. Information and Communication Technology
6. Safety
7. Energy and utilities
8. Manufacturing
9. Government
10. Transportation
28
Internet Accessible Control Systems at Risk

• Is your control system accessible directly from the Internet?


• Do you use remote access features to log into your control system network?
• Are you unsure of the security measures that protect your remote access
services?
• If you answered yes to any or all of these questions, you are at increased risk
of cyber attacks, including scanning, probes, brute-force attempts, and
unauthorized access to your control environment.

ICS-CERT = Industrial Control Systems Cyber Emergency Response Team (Part of US Department of Homeland Security)
(https://ics-cert.us-cert.gov )

29
Security Headlines

[Timeline of security headlines, 2008–2012]

30
Security Trends

• 58% of threats come from outsiders (e.g., hackers)
• 21% of threats come from insiders (employees or contractors)
• 55% say SCADA / operational control systems are targeted most often

2011 CyberSecurity Watch Survey by CERT (Aug 2009 – Jul 2010)

• Insider attacks render cryptographic protection inadequate

• Control systems are prime targets

31
Importance of CPS Security

• We cannot simply use conventional, general cyber security
schemes to achieve all CPS protections.
• This is because most CPS security solutions need to be
closely integrated with the underlying physical process
control features.

– Example: antivirus software!

32
CPS Components

33
CPS: Systems View

[Diagram: a physical system with state x, driven by an actuator with input u and observed by a sensor producing output y; a control system (estimation and control) closes the loop over a network]
34
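The systems view above can be sketched as a tiny discrete-time closed loop. The plant model x[k+1] = a·x[k] + b·u[k], output y[k] = x[k], and the proportional controller are illustrative choices, not taken from the source:

```python
def simulate(steps: int, x0: float = 0.0, setpoint: float = 1.0):
    """Closed loop: sensor reads y = x, controller computes u, actuator updates x."""
    a, b, gain = 0.9, 0.5, 1.0        # plant and controller parameters (assumed)
    x, ys = x0, []
    for _ in range(steps):
        y = x                          # sensor: measure the state
        u = gain * (setpoint - y)      # controller: input from the tracking error
        x = a * x + b * u              # physical process: x[k+1] = a*x[k] + b*u[k]
        ys.append(y)
    return ys

ys = simulate(50)
# The loop settles at b*gain*setpoint / (1 - a + b*gain) = 5/6: a pure
# proportional controller leaves a steady-state error below the setpoint.
assert abs(ys[-1] - 5 / 6) < 1e-6
```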
CPS Network-based Attacks

[Diagram: the same loop under attack. A compromised sensor reports y′ instead of y, a compromised controller issues u′ instead of u, and the network can be jammed]
35
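One common defense against a compromised sensor (y′ instead of y) is physics-based detection: compare what the sensor reports with what a model of the physical process predicts, and flag large residuals. The threshold and values below are invented for illustration:

```python
def detect_attack(measured, predicted, threshold=0.5):
    """Flag samples whose residual |y' - y_model| exceeds the threshold."""
    return [abs(m - p) > threshold for m, p in zip(measured, predicted)]

model = [1.0, 1.0, 1.0, 1.0]        # what the physics says the sensor should read
honest = [1.02, 0.97, 1.01, 0.99]   # noisy but genuine readings
spoofed = [1.02, 0.97, 3.50, 0.99]  # one injected value (y' != y)

assert not any(detect_attack(honest, model))
assert detect_attack(spoofed, model) == [False, False, True, False]
```

In practice the prediction comes from a state estimator and the threshold from the noise statistics; a stealthy attacker who knows the model can stay under the threshold, which is why this is a detector, not a guarantee.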
Supervisory Control and Data Acquisition System
(SCADA)
• SCADA is an industrial control system that consists of RTUs, PLCs, and
HMIs to control an industrial process.

• Uses: Manufacturing, power generation, fabrication, oil and gas pipelines,


etc.

• RTU: microprocessor-controlled Remote Terminal/Telemetry Unit

– The interface between physical objects and a SCADA system
– Transmits telemetry data to SCADA (e.g., water quality)

• PLC: Programmable Logic Controller


• A computer to control the operation of electro-mechanical devices such as pumps,
motors, switches
• Hard real-time system

• HMI: Human-Machine Interface

36
37
CPS: Medical Device Attack

[Diagram: an implanted device and its parameter/algorithm update computer]

• Pumps can be hijacked by hacking the implanted device’s radio signals.

• The device could then be switched off.

• A dangerous dose of medicine could be delivered.

• Security alerts could be disabled.

38
CPS: Medical Device Attack

• More than 2.5 million people rely upon medical devices implanted in their bodies
to help treat conditions ranging from cardiac arrhythmias to diabetes.

• These embedded electronic devices are connected to some network

• They are becoming part of IoT systems with vulnerabilities inside them

• Pacemakers from several manufacturers can be commanded to deliver a deadly,
830-V shock by someone with a laptop up to 50 ft away.

39
https://www.youtube.com/watch?v=YJ8PZeRwweA
Industrial Control System (ICS)

PLC: Programmable Logic Controller
RTU: Remote Terminal Unit

https://www.youtube.com/watch?v=xk-d4Bc0xII

40
Attacks on Industrial Control System (ICS)

Infection and data recording Covert sabotage


41
CPS Security Issues
• Increasing complexity can introduce vulnerabilities and
increase exposure to potential attackers
• Interconnected networks can introduce common
vulnerabilities
• Increasing vulnerability to communication and software
disruptions could result in
– Denial of service or
– Compromise of the integrity of software and systems
• Increased number of entry points and paths for
adversaries to exploit
• Potential for compromise of data confidentiality, including
the breach of customer privacy

42
Aspects to Consider

• Adversary models: restrict the scope, but overly restrictive
assumptions will likely limit their applicability, e.g., in DoS attacks.

• Trust models: Trust in human users and devices, e.g., sensors and
actuators

• “Under attack” behavior: Detection and graceful degradation.

• Independence in component design: Redundant authentication


mechanisms that are independent of each other

43
CPS Security Techniques
• Existing Techniques
• Authentication
• Digital signatures
• Access control
• Intrusion detection

• Need for Enhancement


• How do deception and DoS attacks affect application-layer
performance (e.g., estimation and control)?
• Intrusion detection for deception attacks in control systems?

44
Traditional versus CPS security
• Traditional
• Confidentiality: Ability to maintain secrecy from unauthorized users.
• Integrity: Trustworthiness of data received; lack of this leads to
deception.
• Availability: Ability to access and use the system when needed.
• CPS
• Timeliness: responsiveness, freshness of data
• Availability: unexpected outages
• Integrity: genuine data displayed and received by the controller
• Confidentiality: Information regarding SCADA not available to any
unauthorized individual
• Graceful degradation
(A Taxonomy of Cyber Attacks on SCADA Systems, Zhu et al., UC Berkeley.)
45
CPS Security Requirements
• Robust to withstand
• Deception attacks
• Denial of service attacks
• Resilient to physical attacks
• High availability (service continuity)
• Defending against device capture attack: Physical devices in CPS
systems may be captured, compromised and released back by
adversaries.
• Real-Time Security: CPS often requires real-time responses to physical
processes
• Collaboration and Isolation: CPS needs to effectively isolate attackers
while maintaining collaborations among distributed system components. In
addition, cascading failures should be avoided while minimizing system
performance degradation.
• Concurrency: CPS is concurrent in nature, running both cyber and
physical processes.
• N-1 criteria for reliability
46
Fault Tolerant Control (FTC)

• Goal: Maintain stability and acceptable behavior in the


presence of component faults by applying physical and/or
analytical redundancies.

• Passive FTC: Consider a fixed set of fault configurations,


and design the system to detect and compensate for these.
• Example: Control in the presence of sensor malfunction.

• Active FTC: Estimate state and fault parameters using
measurements and control data, and reconfigure the system
using a different control law.

47
Fault Tolerant Control (FTC)

[Diagram: classic feedback loop in which the desired value minus the feedback signal drives the controller, which drives the system]
48
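Passive FTC in the presence of sensor malfunction can be sketched with simple median voting over redundant sensors, which masks one faulty reading out of three. The sensor values are illustrative:

```python
import statistics

def fused_reading(sensors):
    """Median voting: tolerates one faulty or stuck sensor out of three,
    because a single outlier can never become the median."""
    return statistics.median(sensors)

assert fused_reading([10.1, 9.9, 10.0]) == 10.0  # all sensors healthy
assert fused_reading([10.1, 9.9, 0.0]) == 9.9    # one sensor failed low, masked
```

This is physical redundancy handled by a fixed design, i.e. the passive flavor; an active FTC scheme would instead estimate which sensor has failed and reconfigure the control law.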
Example: Electric Power Grid

• Current picture
– Equipment protection devices trip locally and reactively, which can cascade into larger failures.

• Future
– Real-time cooperative control of protection devices
– Self-healing, aggregate islands of stable bulk power
– Coordinate distributed and dynamically interacting participants
– Issue: standard operational control concerns exhibit wide-area characteristics
(bulk power stability and quality, flow control, fault isolation)

49
Smart Grid
Smart Grid: The integration of power, communications, and information
technologies for an improved electric power infrastructure serving loads
while providing for an ongoing evolution of end-use applications.

50
Why Smart Grid?
• Two main drivers: (1) sustainable generation, (2) sustainable return on
investment in infrastructure
• Resilient to failures, disasters, attacks
• Motivates demand response
• Resource-efficient
• Accommodates distributed generation
• Quality-focused

Steve Jetson, “Smart meters – helping industry save money by using energy efficiently,” Sustainability and Technology Forum, 2011.
51
Energy Management System
Energy management system
• The “central nervous system” of a
transmission grid
• A suite of software tools for monitoring,
controlling as well as optimizing generation
and transmission operations

[Diagram: the control center (EMS servers, SCADA master, DB server, corporate LAN, VPN, firewall, modem) connects over WANs and the Internet to transmission substations (substation bus, process bus, relays, merging units), generators, the distribution network, and the advanced metering infrastructure serving residential and industrial consumers; site engineers, other control centers, and potential cyber intruders also reach in over these links]
52
Smart Metering Privacy Issue

53
Privacy Possible Solutions

• Robust smart metering design with privacy constraints


• Utility-privacy trade-off design
• Privacy invasion detection/prevention
• Data anonymisation

54
North Pole Toys

• On-line retailer of specialized toys.


• Process: Toy Assembly, Toy Packaging and Toy Shipping
• 2011: Replaced the old manufacturing system with new
automated industrial control system.
• Files are carried on USB sticks from main server to the
workshop; air gap established.
• Attack on the day before Thanksgiving 2011: Instead of one
toy per box, multiple toys were being placed.
• Some empty boxes were being wrapped.
• Initial suspicion: Incorrect PLC code; but the code found to
be correct.
• Discovery: kAndyKAn3 worm had infected the PLC and
the main office computers.

55
References and Videos
• Secure Control: Towards Survivable Cyber-Physical Systems. Alvaro A. Cárdenas, Saurabh
Amin, and Shankar Sastry, The 28th International Conference on Distributed Computing
Systems Workshops, IEEE 2008.
• Common Cybersecurity Vulnerabilities in Industrial Control Systems. US Department of
Homeland Security. May 2011.
• Cyber-Physical Systems Security for Smart Grid. White Paper. Manimaran Govindarasu,
Adam Hann, and Peter Sauer. February 2012.
• Improving the Security and Privacy of Implantable Medical Devices, William H. Maisel and
Tadayoshi Kohno, New England Journal of Medicine 362(13):1164-1166, April 2010.
• A. Cárdenas, S. Amin, and S. Sastry: “Research challenges for the security of control
systems”. Proceedings of the 3rd Conference on Hot Topics in Security, 2008, p. 6.
• Special Issue on CPS Security, IEEE Control Systems Magazine, February 2015
• D. Urbina et al.: ”Survey and New Directions for Physics-Based Attack Detection in
Control Systems”, NIST Report 16-010, November, 2016.
• Henrik Sandberg, Security of Cyber-Physical Systems

• Ralph Langner: Cracking Stuxnet, a 21st-century cyber weapon | TED Talk

https://www.youtube.com/watch?v=CS01Hmjv1pQ
Single Point of Failure
• Single points of failure:
– Specific points of a design that can cause the whole system to fail or become
dangerous, even though they are a single point

• In designing life- or data-critical systems composed of embedded systems
– The system should withstand the outage of one device due to a DoS attack
  • N-1 criterion
– Consider correlated multi-point failures
– You should not make assumptions about how things fail
  • E.g., assuming the computer will crash following a fault
– You should not assume the software will not fail
  • (dependently or independently of the hardware!)

• Fault containment region (FCR)
– Faults outside cannot enter the region
– Faults inside are kept within the region
– Important for the design of secure systems
– Faults inside an FCR can have arbitrarily bad impact
  • Software and hardware

April 8, 2019 57
Example
• In the 2000s, Toyota had a problem with unintended acceleration (UA)!
– Caused 89 deaths, hundreds of injuries, and more than one billion dollars in
damages
– Software and hardware!
– A shared A/D converter: a single point of failure

April 8, 2019 58
Eliminating the Problem
• Have redundancy for critical systems
– The redundant systems should not have a common point of failure!
– Multiple computation paths
– Crosschecking to detect inconsistency
– Diverse software/hardware/vendors

• Solutions:
– Multi-channel
– Doer/checker
  • The checker is simpler software/hardware compared to the doer
  • The check process may take some time, so some false data may be transmitted as
    well
– Safety gate
  • A doer does the computation
  • It sends the data to a checker
  • The checker opens the safety gate if the data is correct

[Diagram: multi-channel, doer/checker, and safety-gate architectures]

April 8, 2019 59
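The doer/checker pattern with a safety gate might be sketched as follows. The torque curve, plausibility bound, and fail-safe value are hypothetical; the point is that the checker is far simpler than the doer and fails toward a safe state:

```python
def doer(pedal_position: float) -> float:
    """Complex computation: map pedal position to commanded torque.
    (Hypothetical linear torque curve, 0.0..1.0 pedal -> 0..250 Nm.)"""
    return 250.0 * pedal_position

def checker(pedal_position: float, torque: float) -> bool:
    """Much simpler, independent plausibility check: torque must be
    non-negative and bounded by what the pedal position could justify."""
    return 0.0 <= torque <= 250.0 * pedal_position + 1.0

def safety_gate(pedal_position: float) -> float:
    torque = doer(pedal_position)
    if not checker(pedal_position, torque):
        return 0.0   # fail safe: cut torque rather than pass bad data through
    return torque

assert safety_gate(0.5) == 125.0        # healthy doer passes the gate
assert not checker(0.5, 500.0)          # an implausible doer output is rejected
```

To avoid a common point of failure, the doer and checker should run on diverse software (and ideally hardware), as the bullets above require.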


Eliminating the Problem
• Correlated faults
– Multiple FCRs are likely to fail together if they share:
  • Common software/hardware design
  • Common manufacturing
  • Common infrastructure (power, clock, …)
  • Common physical location

• Accumulated faults
– Faults not detected and repaired before the next mission

April 8, 2019 60
Product release into the field

April 8, 2019 61
System level testing

• System test is the last line of defense against shipping products with bugs
– System-level acceptance testing emphasizes customer-type usage
• Are the safety mechanism and watchdog timer turned on? (shipping with them off should never happen)
• “A watchdog timer is used to detect and recover from computer malfunctions. During normal
operation, the computer regularly resets the watchdog timer to prevent it from elapsing. If, due
to a hardware fault or program error, the computer fails to reset the watchdog, the timer will
elapse and generate a timeout signal. The timeout signal is used to initiate corrective action or
actions.”

• Anti-patterns:
– Excessive defects escape into field testing
– The majority of the testing effort is ad hoc exploratory testing
  • “Ad hoc is a Latin phrase meaning literally ‘to this’. In English, it generally signifies a solution
    designed for a specific problem or task, non-generalizable, and not intended to be
    adapted to other purposes.”
– The acceptance test is the only test

April 8, 2019 62
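The quoted watchdog-timer behavior can be sketched in a few lines. The class name, timeout values, and callback are invented for the example; an embedded watchdog would be a hardware timer whose timeout forces a reset, but the kick/elapse logic is the same:

```python
import threading
import time

class Watchdog:
    """Fires `on_timeout` unless kick() is called again within `timeout` seconds."""
    def __init__(self, timeout, on_timeout):
        self.timeout, self.on_timeout = timeout, on_timeout
        self._timer = None

    def kick(self):
        """Normal operation: the software resets the watchdog before it elapses."""
        if self._timer:
            self._timer.cancel()
        self._timer = threading.Timer(self.timeout, self.on_timeout)
        self._timer.daemon = True
        self._timer.start()

    def stop(self):
        if self._timer:
            self._timer.cancel()

events = []
dog = Watchdog(0.05, lambda: events.append("reset!"))
dog.kick()                 # healthy software keeps kicking the dog...
dog.stop()                 # ...so it never elapses
assert events == []

dog.kick()                 # now simulate a hang: nobody kicks again
time.sleep(0.15)           # the timer elapses and corrective action fires
assert events == ["reset!"]
```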
Effective System Testing
• The system test should cover all requirements
– Every product requirement is tested
• Ad hoc testing helps but should not be the primary method
– Non-customer-visible requirements should be tested as well
  • Specifically, non-functional requirements
  • Make sure the system does not crash too often!
• Physical testing
– Each bug found in system test is a huge deal
  • You should find only a few
– Bugs found in system test indicate a process failure
  • Make sure they are not the tip of an iceberg (once in a while is acceptable)

April 8, 2019 63
Product testing won’t find all bugs
• Testing a system for bugs does not make it good
– It makes it less bad!
• If testing is the only line of defense, your customers will regularly
complain
• Example: F-22
– $360 million each!!
– When they crossed the date line over the Pacific, the internal computer
systems crashed!!
  • No navigation or communications!
  • Visually guided back to Hawaii
  • “It was a computer glitch in the millions of lines of code; somebody made
    an error”
– The F-16 had a similar problem: when passing the equator, it flipped upside down!!
April 8, 2019 64
Bug reports
• Rule of thumb: 90/10
– 90% of bugs are in 10% of the modules
– Mostly in complex modules
• Bug farms are more than bad code; they may be bad design
– Poorly defined, confusing interfaces
• When you find bug farms, you should not only fix them, you
need to redesign the modules!
• Indicators of poor embedded software quality:
– Your module fails unit test
– A bug is found in peer review
– The system fails integration testing or software testing
– The system fails acceptance testing
– Field problem reports, …

April 8, 2019 65
System test best practices

• Test all system requirements:


– Everything the system is supposed to do
– Fault management responses
– Performance and extra-functional requirements

• Acceptance test vs. software test:
– The acceptance test is from the customer’s point of view (domain testing)
– The software test uses internal test interfaces
  • E.g., the watchdog timer

• Pitfalls
– It is impractical to get high coverage (find all bugs) at the system level
– Don’t test only at the system level!

April 8, 2019 66
Safety plan
• Anti-patterns
– The safety plan does not deal with software integrity
– It is not linked with the security plan

• Safety plan:
– Safety standard: pick a suitable standard
– Hazards and risks: hazard logs, criticality analysis
– Goals: safety strategy, safety requirements
– Mitigation and analysis: HAZOP, FTA, FMEA, …
– Safety case: safety argument

April 8, 2019 67
[Diagram: safety-plan workflow: hazards and risks → safety goals → mitigation approaches → safety case]

68
Safety standards

• Usually we mean functional safety:
– Example: a switch that operates in case of unsafe
operation
– There are several safety functions in an embedded
system
– A generic starting point is IEC 61508
  • The primary functional safety standard, e.g., for the chemical industry
– ISO 26262 for automotive applications, EN 50126/8/9 for
rail applications, MIL-STD-882 for combat systems, IEC
60730 for consumer products, DO-178 for aircraft, etc.
• Key elements of a standard:
– A method for risk determination:
  • Usually a safety integrity level (SIL) is assigned
  • In IEC 61508, a mishap with a huge chance of fatalities is
    rated SIL 4, a minor one SIL 1 (SIL 4 requires lots of
    redundant paths and attention)
– A life-cycle approach to safety!
April 8, 2019 69
Safety goal
• A safety goal is the definition of “safe”
– Example: speed control
  • Hazard: unintended vehicle acceleration
  • Goal: engine power proportional to pedal position
  • Safety strategy: correct computation of the pedal position
    – Engine shutdown in case of a problem

• Safety requirements:
– Goals are at the system level; requirements provide supporting details
– Supporting requirements are generally allocated to subsystems
– Example:
  • Make sure the engine torque is completely based on the torque curve!

April 8, 2019 70
FMEA: Failure Modes and Effects Analysis
• Idea: start with a component failure, analyze the results, and identify hazards
• Significant shortcomings in generating hazards for:
– Embedded systems!
– Software (not just working or not working)
  • Lots of bugs
– Integrated circuits
– Accumulated/concurrent failures

April 8, 2019 71
Hazard and Operability Analysis (HAZOP)
• Structured brainstorming about hazards
– Does the result suggest a hazard?
– An effective starting point

• Example: in a PLC system,
– When pressure exceeds 6000 psig, the relief valve shall not actuate
– The system shall come to a complete stop after activation of the emergency stop!

April 8, 2019 72
Hazard and Risks
• Hazard: a potential source of injury or damage
– A potential cause of a mishap or loss event (people, property, financial, …)
– Hazard log:
  • Captures hazards for a system
  • Lessons learned from previous projects
  • Legacy, analysis, and field experience
• HAZOP: a structured analysis method
– Risk evaluation
  • Risk = probability × consequence
    – In terms of cost, or a Pareto graph
  • Hard to obtain values for probabilities and consequences
    • High-impact, low-probability events!!!
  • Risk table!
  • Correlation to SIL

April 8, 2019 73
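The risk formula above (risk = probability × consequence) is what a risk table computes per hazard. The hazard names and numeric scales below are assumed purely for illustration:

```python
def risk_score(probability: float, consequence: float) -> float:
    """Risk = probability x consequence (both on illustrative scales)."""
    return probability * consequence

# Hypothetical hazard log: name -> (probability per year, consequence 0..10).
hazards = {
    "sensor spoofing":   (0.3, 8.0),
    "operator typo":     (0.6, 2.0),
    "substation breach": (0.05, 10.0),
}

# Rank hazards by risk to prioritize mitigation effort (a textual Pareto view).
ranked = sorted(hazards, key=lambda h: risk_score(*hazards[h]), reverse=True)
assert ranked == ["sensor spoofing", "operator typo", "substation breach"]
```

Note how the high-impact, low-probability event ranks last here even though its consequence is maximal, which is exactly why such events need separate attention beyond the product formula.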
Safety analysis and mitigation

• FMEA (Failure Modes and Effects Analysis) deals
with system reliability (it looks forward!)
– Limited, since it neglects correlated,
concurrent failures, …
– As a result, we can move to Fault Tree
Analysis (FTA), which looks backward
• OR gates are bad, AND gates are good
– You should not have a single point of failure
  • There should not be a path from a leaf to the root only
    through OR gates
– Computational elements are crucial
– High-SIL software techniques should be
deployed

April 8, 2019 74
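The FTA rule above (no path from leaf to root only through OR gates) can be checked mechanically on a small fault tree. The tuple encoding and component names are invented for the example:

```python
def evaluate(node, failed):
    """True if the top event occurs given the set of failed leaf components."""
    if isinstance(node, str):            # leaf: a basic component failure
        return node in failed
    op, *children = node                 # gate: ("OR", ...) or ("AND", ...)
    results = [evaluate(c, failed) for c in children]
    return any(results) if op == "OR" else all(results)

def leaves(node):
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(c) for c in node[1:]))

def single_points_of_failure(tree):
    """Leaves whose lone failure triggers the top event: they reach the
    root only through OR gates."""
    return {leaf for leaf in leaves(tree) if evaluate(tree, {leaf})}

# Redundant sensors sit behind an AND gate (good), but the shared power
# supply feeds the top OR gate directly (bad: a single point of failure).
tree = ("OR", "power_supply", ("AND", "sensor_a", "sensor_b"))
assert single_points_of_failure(tree) == {"power_supply"}
```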
Safety Case
• Methodical identification of hazards
• Risk evaluation
• Mitigation of risk by analyzing impact and probability
• Design safety requirements against faults and malicious faults
– Example: GSN (Goal Structuring Notation)

• Written safety plan
– Risks, hazards
– Following a safety standard
– Independent audit of safety
– Software, hardware, communication!

April 8, 2019 75
Concurrency bugs & race conditions
• Race condition: multiple threads compete
– The computation outcome depends on timing
– Concurrent access to the same variable
– Not accounting for multitasking

• Anti-patterns:
– Unprotected access to shared variables
– Shared variables that are time-consuming to access
– Not accounting for interrupts and task switching in timing analysis
– Ignoring non-reproducible faults
• Incident sidebar: when an operator typed too fast on the keyboard within an
8-second window, the wrong amount was injected.

April 8, 2019 76
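The race condition described above, a non-atomic read-modify-write on a shared variable, and its lock-based fix can be sketched as follows (the counter and thread counts are illustrative):

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    """Anti-pattern: counter += 1 is a read-modify-write, so concurrent
    threads can interleave and lose updates (timing-dependent outcome)."""
    global counter
    for _ in range(n):
        counter += 1

def safe_increment(n):
    """Fix: the lock makes the read-modify-write atomic."""
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 400_000  # with the lock, no updates are lost
```

The unsafe variant may still pass sometimes, which is exactly the "non-reproducible fault" trap: concurrency bugs cannot be ruled out by testing alone.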
