1. Introduction: CPS Security
Why CPS?
Why Do We Need CPS?
• Example: Operating States of a System
– Normal
– Emergency
– Restorative
• Normal State:
– Secure
– Insecure
• The classification depends heavily on how security is defined!
• Emergency State:
– Violation of some of the operating constraints
– The system must be returned to the normal state through corrective action
• Restorative State:
– Restore the operation of the system
Why Do We Need CPS? (continued)
• What should we do?
– Continuous monitoring of system conditions
– Identification of the operating state
– Determination of the necessary preventative actions
• Insecure
• Restorative
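The monitoring steps above (monitor, identify the state, determine the action) can be sketched as a small classifier. This is only an illustration: the three boolean inputs, the rule that separates the restorative state, and the action strings are assumptions made for the sketch, not definitions from the slides.

```python
from enum import Enum


class OperatingState(Enum):
    NORMAL_SECURE = "normal (secure)"
    NORMAL_INSECURE = "normal (insecure)"
    EMERGENCY = "emergency"
    RESTORATIVE = "restorative"


def identify_state(constraints_ok, survives_contingency, operation_intact):
    """Map three monitored conditions to an operating state.

    All three flags are hypothetical inputs that a monitoring system
    would have to compute from measurements.
    """
    if not constraints_ok:
        return OperatingState.EMERGENCY        # operating constraints violated
    if not operation_intact:
        return OperatingState.RESTORATIVE      # operation still being restored
    if survives_contingency:
        return OperatingState.NORMAL_SECURE
    return OperatingState.NORMAL_INSECURE


# Illustrative action table; the actions are placeholders, not a procedure.
ACTIONS = {
    OperatingState.NORMAL_SECURE: "keep monitoring",
    OperatingState.NORMAL_INSECURE: "take preventative action",
    OperatingState.EMERGENCY: "take corrective action",
    OperatingState.RESTORATIVE: "restore operation",
}


def next_action(state):
    """Look up the action associated with an identified state."""
    return ACTIONS[state]
```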
CPS-Created Opportunities
CPS Security
Example: Automotive Telematics
Vehicle Attack Surface
Interaction between Cyber and Physical Behaviors
A Holistic Viewpoint of CPS
CPS Security
Confidentiality
• Ensuring information is not revealed to unauthorized individuals,
programs, or processes.
• Data should be secure
– At input/output ports, as well as at each node along the data path
– Some information is more sensitive than other information, e.g., military
classifications such as Secret and Top Secret
• Individuals' personal information, such as Social Insurance Numbers (SINs)
• It is not always only about unauthorized personnel
– Privacy
• Confidentiality has different levels:
– Personal: financial, personal revenge,…
– Corporations: financial gains
– National Security: damage to infrastructures, …
Integrity
• Assuring that the accuracy and reliability of information and systems are
preserved
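One common way to detect that data was modified in transit is a keyed message authentication code. The sketch below uses HMAC-SHA256 from the Python standard library; the key and the message format are made up for illustration, and a real system would also need secure key provisioning.

```python
import hashlib
import hmac

# Hypothetical shared key; a real system would provision and store it securely.
KEY = b"demo-key-not-for-production"


def tag(message: bytes, key: bytes = KEY) -> bytes:
    """Return an HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_intact(message: bytes, received_tag: bytes, key: bytes = KEY) -> bool:
    """True if the tag matches, i.e. the message was not modified in transit."""
    return hmac.compare_digest(tag(message, key), received_tag)


reading = b"breaker_7=CLOSED"
t = tag(reading)
print(is_intact(reading, t))             # True
print(is_intact(b"breaker_7=OPEN", t))   # False
```

The comparison uses `hmac.compare_digest` rather than `==` so that the check takes the same time whether or not the tags match.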
Example of Integrity
[Figure: an operator and a power system]
Availability
• Ensures reliable and timely access to data and resources for
authorized individuals.
Threat model
• We need to define some vocabulary in the context of security
– Risk, exposure, vulnerability,…
Threat
• A threat is a potential danger to information or systems
Risk
• Risk is the likelihood that a threat agent takes advantage of a
vulnerability and causes an operational impact
• Risk ties together the vulnerability, the threat, the likelihood of
exploitation, and the resulting operational loss.
• Nothing is 100% secure!
– Define what to protect, why, and against whom!
• Risk increases if
– Users are not educated on proper procedures
– An intrusion detection system is not installed or is configured
incorrectly
– The system is exposed to the public
• If a known vulnerability is not addressed, risk goes up
• Threat agents pay attention to disclosed vulnerabilities, including ones
they did not discover themselves
Asset
• Assets can be
Exposure
• Exposure:
– An instance of being subjected to losses or asset damage
from a threat agent.
– Example: poor password management exposes an organization to password
capture by threat agents, who could then gain unauthorized access to
systems and information
Countermeasure
• A safeguard (or countermeasure) mitigates potential risk.
M. H. Oboudi, M. Mohammadi, and M. Rastegar, “Resilience-oriented intentional islanding of reconfigurable distribution power systems”
Safeguard (Countermeasure)
• Questions:
– What are the assets that you want to protect?
• Or, in the case of design, that you want to build?
– Who might be interested in attacking them?
– How might they attack the system?
– How can the attacker benefit from the attack?
– What is your plan to prevent, detect, and respond?
– Are your methods sufficient?
Access Control
• Controlling access to data is one of the major parts of security!
• Identification: establishing who wants to access the data or resources.
– The procedure of establishing the identity of a subject (user name or account number)
• Authentication: requiring a second piece of credential beyond identification
– Identification alone is easy to fake or spoof
– Stronger defense against social engineering (password, passphrase, …)
• Accountability: keeping track, by identity, of what each subject did in the
system (for forensics, detection, and to support auditing).
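The three ideas above can be sketched together in a few lines: a user name identifies the subject, a password is the second credential that backs the claim, and an audit log records each attempt by identity. The in-memory stores and function names are hypothetical; only the standard-library calls (`hashlib.pbkdf2_hmac`, `hmac.compare_digest`, `os.urandom`) are real.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory stores, for illustration only.
_users = {}      # user name -> (salt, password hash)
audit_log = []   # (user name, event) records kept for accountability


def _hash(password: str, salt: bytes) -> bytes:
    """Derive a salted password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def register(user: str, password: str) -> None:
    """Store a salted hash instead of the password itself."""
    salt = os.urandom(16)
    _users[user] = (salt, _hash(password, salt))


def login(user: str, password: str) -> bool:
    """Check the claimed identity (user name) against its credential."""
    record = _users.get(user)
    ok = record is not None and hmac.compare_digest(
        record[1], _hash(password, record[0])
    )
    # Every attempt is recorded against the claimed identity.
    audit_log.append((user, "login ok" if ok else "login failed"))
    return ok
```

Failed attempts are logged as well as successful ones, since forensics and detection usually care most about the failures.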
Security Policy
CPS Security
• CPS are a critical part of the national cyber infrastructure. Security threats to
CPS pose significant risk to the health and safety of human lives, threaten
severe damage to the environment, and could impose an adverse impact on
the U.S. economy. (Homeland Security, Dr. Nabil Adam, 2010)
• …as the United States deploys new Smart Grid technology, the Federal
government must ensure that security standards are developed and adopted to
avoid creating unexpected opportunities for adversaries to penetrate these
systems or conduct large-scale attacks. (President’s Cyberspace Policy
Review)
Canada’s critical infrastructure sectors (Public Safety Canada):
1. Health
2. Food
3. Finance
4. Water
5. Information and Communication Technology
6. Safety
7. Energy and utilities
8. Manufacturing
9. Government
10. Transportation
Internet Accessible Control Systems at Risk
ICS-CERT = Industrial Control Systems Cyber Emergency Response Team (Part of US Department of Homeland Security)
(https://ics-cert.us-cert.gov)
Security Headlines
Importance of CPS Security
– Example: antivirus software
CPS Components
CPS: Systems View
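[Figure: feedback loop. The control system sends input u through the network to the actuator, which acts on the physical system (state x); the sensor returns output y through the network to the control system.]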
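The loop in the systems view can be made concrete with a toy simulation. The sketch assumes a scalar linear plant and a proportional controller; the coefficients `a`, `b`, the gain `k`, and the setpoint are illustrative values, and the network is treated as ideal (no delay or loss).

```python
def simulate(steps, setpoint=1.0, a=0.9, b=0.1, k=2.0):
    """Simulate a scalar plant x[t+1] = a*x[t] + b*u[t] with y[t] = x[t].

    The plant coefficients a, b and the gain k are illustrative values.
    """
    x = 0.0
    history = []
    for _ in range(steps):
        y = x                      # sensor: measure the state
        u = k * (setpoint - y)     # control system: proportional law
        x = a * x + b * u          # actuator + physical system
        history.append(x)
    return history


trajectory = simulate(100)
print(round(trajectory[-1], 3))
```

With these values the closed loop is x[t+1] = 0.7 x[t] + 0.2, so the state settles near 2/3 rather than at the setpoint, which is the usual steady-state offset of proportional-only control.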
CPS Network-based Attacks
[Figure: the same feedback loop under attack. A compromised sensor reports y' instead of y, a compromised controller issues u' instead of u, and the network may be jammed.]
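As a rough illustration of the "y' not y" case, the sketch below adds a constant bias to the sensor reading from a chosen step onward and flags it with a simple model-based residual check. Everything here is an assumption made for the example: the scalar plant, the bias attack, the threshold, and the idealized setting with no noise or model error, in which the controller's prediction tracks the true state exactly.

```python
def run_loop(steps, attack_start=None, bias=0.5, threshold=0.1):
    """Return the steps at which a model-based check flags the measurement.

    Plant: x[t+1] = 0.9*x[t] + 0.1*u[t]; the controller knows this model.
    From attack_start on, the sensor reports y' = y + bias instead of y.
    """
    a, b, k, setpoint = 0.9, 0.1, 2.0, 1.0
    x = 0.0          # true physical state
    x_pred = 0.0     # controller's prediction of the state
    alarms = []
    for t in range(steps):
        y = x
        if attack_start is not None and t >= attack_start:
            y = y + bias                      # deception: y' is not y
        if abs(y - x_pred) > threshold:       # residual check
            alarms.append(t)
        u = k * (setpoint - y)
        x = a * x + b * u                     # plant reacts to the applied u
        x_pred = a * x_pred + b * u           # model predicts from the same u
    return alarms


print(run_loop(20))                    # no attack: no alarms
print(run_loop(20, attack_start=10))   # alarms from step 10 onward
```

A real detector has to tolerate noise and model mismatch, so the threshold trades false alarms against missed attacks; a small enough bias would stay below it.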
Supervisory Control and Data Acquisition (SCADA) System
• SCADA is an industrial control system that consists of remote terminal units (RTUs),
programmable logic controllers (PLCs), and human-machine interfaces (HMIs) used to
control an industrial process.
CPS: Medical Device Attack
• More than 2.5 million people rely on implantable medical devices
to help treat conditions ranging from cardiac arrhythmias to diabetes.
• They are becoming part of IoT systems, with vulnerabilities of their own
https://www.youtube.com/watch?v=YJ8PZeRwweA
Industrial Control System (ICS)
Aspects to Consider
• Trust models: Trust in human users and devices, e.g., sensors and
actuators
CPS Security Techniques
• Existing techniques
– Authentication
– Digital signatures
– Access control
– Intrusion detection
Traditional versus CPS security
• Traditional
– Confidentiality: Ability to maintain secrecy from unauthorized users.
– Integrity: Trustworthiness of data received; lack of this leads to
deception.
– Availability: Ability to access and use the system being
• CPS
– Timeliness: responsiveness, freshness of data
– Availability: unexpected outages
– Integrity: genuine data displayed and received by the controller
– Confidentiality: Information regarding SCADA not available to any
unauthorized individual
– Graceful degradation
(A Taxonomy of Cyber Attacks on SCADA Systems, Zhu et al., UC Berkeley.)
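The timeliness requirement listed for CPS (freshness of data) can be illustrated with a timestamp check on each received measurement. This is a minimal sketch: the function name and the 0.5 s deadline are arbitrary choices for the example, and it assumes sender and receiver clocks are synchronized.

```python
def is_fresh(sent_at: float, now: float, max_age: float = 0.5) -> bool:
    """Accept a measurement only if it is at most max_age seconds old.

    Timestamps are passed in explicitly so the check is easy to test;
    max_age = 0.5 s is an arbitrary illustrative deadline.
    """
    age = now - sent_at
    return 0.0 <= age <= max_age   # also rejects timestamps from the future


print(is_fresh(sent_at=10.0, now=10.2))   # True: 0.2 s old
print(is_fresh(sent_at=10.0, now=11.0))   # False: 1.0 s old
```

A stale but otherwise valid reading is rejected here, which is one way a replayed or delayed message differs from a fresh one even when its integrity check passes.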
CPS Security Requirements
• Robust enough to withstand
– Deception attacks
– Denial of service attacks
• Resilient to physical attacks
• High availability (service continuity)
• Defending against device capture attack: Physical devices in CPS
systems may be captured, compromised and released back by
adversaries.
• Real-Time Security: CPS often requires real-time responses to physical
processes
• Collaboration and Isolation: CPS needs to effectively isolate attackers
while maintaining collaborations among distributed system components. In
addition, cascading failures should be avoided while minimizing system
performance degradation.
• Concurrency: CPS is concurrent in nature, running both cyber and
physical processes.
• N-1 criterion for reliability
Fault Tolerant Control (FTC)
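[Figure: feedback control loop. The feedback signal is subtracted from the desired value, and the result drives the controller, which acts on the system.]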
Example: Electric Power Grid
• Current picture
– Equipment protection devices trip locally and reactively, which can lead to cascading failure.
• Future
– Real-time cooperative control of protection devices
– Self-healing, aggregate islands of stable bulk power
– Coordinate distributed and dynamically interacting participants
– Issue: standard operational control concerns exhibit wide-area characteristics
(bulk power stability and quality, flow control, fault isolation)
Smart Grid
Smart Grid: The integration of power, communications, and information
technologies for an improved electric power infrastructure serving loads
while providing for an ongoing evolution of end-use applications.
Why Smart Grid?
• Two main drivers: (1) sustainable generation, (2) sustainable return on
investment in infrastructure
• Resilient to failures, disasters, attacks
• Motivates demand response
• Resource-efficient
• Accommodates distributed generation
• Quality-focused
Steve Jetson, “Smart meters – helping industry save money by using energy efficiently,” Sustainability and Technology Forum, 2011.
Energy Management System
• The “central nervous system” of a transmission grid
• A suite of software tools for monitoring, controlling, and optimizing generation
and transmission operations
[Figure: smart grid architecture. Labels: Generators, Control center, Advanced Metering Infrastructure, Substation, WAN, EMS servers, SCADA master, Modem, Substation bus, Relays, Meters, Residential consumers, Industrial consumers, VPN, Firewall, DB server, Internet, Process bus, Merging units, Corporate LAN, Site engineers.]
Smart Metering Privacy Issue
Possible Privacy Solutions
North Pole Toys
References and Videos
• Alvaro A. Cárdenas, Saurabh Amin, and Shankar Sastry: “Secure Control: Towards Survivable
Cyber-Physical Systems”. The 28th International Conference on Distributed Computing
Systems Workshop, IEEE, 2008.
• US Department of Homeland Security: “Common Cybersecurity Vulnerabilities in Industrial
Control Systems”. May 2011.
• Manimaran Govindarasu, Adam Hann, and Peter Sauer: “Cyber-Physical Systems Security for
Smart Grid”. White paper, February 2012.
• William H. Maisel and Tadayoshi Kohno: “Improving the Security and Privacy of Implantable
Medical Devices”. New England Journal of Medicine 362(13):1164-1166, April 2010.
• A. Cárdenas, S. Amin, and S. Sastry: “Research Challenges for the Security of Control
Systems”. Proceedings of the 3rd Conference on Hot Topics in Security, 2008, p. 6.
• Special Issue on CPS Security, IEEE Control Systems Magazine, February 2015.
• D. Urbina et al.: “Survey and New Directions for Physics-Based Attack Detection in
Control Systems”. NIST Report 16-010, November 2016.
• Henrik Sandberg: Security of Cyber-Physical Systems.
https://www.youtube.com/watch?v=CS01Hmjv1pQ
Single Point of Failure
• Single points of failure:
– Specific points in the design whose failure alone can cause the system to fail or become
dangerous
• When designing any life-critical or data-critical system composed of embedded systems
– The system should withstand the outage of one device due to a DoS attack
• N-1 criterion
– Consider correlated multi-point failures
– Do not make assumptions about how a failure will behave
• e.g., do not assume the computer will simply crash following a fault
– Do not assume the software will not fail
• (whether or not the failure depends on the hardware!)
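One simple way to look for single points of failure is to remove each component in turn and check whether the system still has a working path, which is the N-1 idea applied to a topology. The sketch below does this on a small directed graph; the topology and node names are hypothetical, and the check covers only independent single outages, not the correlated multi-point failures mentioned above.

```python
def reachable(graph, start, removed):
    """Nodes reachable from start when `removed` is out of service."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen or node == removed:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return seen


def single_points_of_failure(graph, source, sink):
    """Nodes (other than source and sink) whose loss disconnects sink."""
    return sorted(
        n for n in graph
        if n not in (source, sink) and sink not in reachable(graph, source, n)
    )


# Hypothetical topology: two sensors feed one gateway, then a controller.
graph = {
    "plant": ["sensor_a", "sensor_b"],
    "sensor_a": ["gateway"],
    "sensor_b": ["gateway"],
    "gateway": ["controller"],
    "controller": [],
}
print(single_points_of_failure(graph, "plant", "controller"))  # ['gateway']
```

The two sensors are redundant, but both depend on the same gateway, so the redundancy still shares a common point of failure.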
April 8, 2019 57
Example
• In the 2000s, Toyota had a problem with unintended acceleration (UA)!
– Caused 89 deaths, hundreds of injuries, and more than one billion dollars in
damages
– Software and Hardware!
– Shared A/D converter
Eliminating the Problem
• Provide redundancy for critical systems
– The redundant systems should not have a common point of failure!
– Multiple computation paths
– Crosschecking to detect inconsistency
– Diversity in software, hardware, and vendor
• Solutions
– Multi-channel
– Doer/checker
• The checker uses simpler software/hardware than the doer
• The check may take some time, so some false data may be transmitted
before it completes
– Safety gate
• A doer does the computation
• Sends the data to checker
• Checker opens the safety gate if the data is correct
• Accumulated faults
– Not detected and repaired before next mission
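The doer/checker pattern with a safety gate can be sketched as three small functions. The pump-speed computation, the range check, and the fail-safe value of 0.0 are assumptions chosen for the example; the point is only that the checker is much simpler than the doer and that the gate releases the doer's output only after the checker accepts it.

```python
def doer(level_error_m: float) -> float:
    """Full control computation (hypothetical): pump speed in percent."""
    return 40.0 * level_error_m


def checker(speed_percent: float) -> bool:
    """Simpler plausibility check: the command must be within 0..100 percent."""
    return 0.0 <= speed_percent <= 100.0


def safety_gate(level_error_m: float, compute=doer) -> float:
    """Release the doer's output only if the checker accepts it."""
    command = compute(level_error_m)
    if checker(command):
        return command
    return 0.0   # fail-safe value; an assumption for this sketch


print(safety_gate(1.5))   # 60.0: accepted by the checker
print(safety_gate(5.0))   # 0.0: the doer asked for 200 percent, gate stays closed
```

Because the gate evaluates the check before releasing the value, this variant avoids passing false data downstream, at the cost of the checking delay noted above.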
Product release into the field
System level testing
• System test is the last line of defense against shipping products with bugs
– System-level acceptance testing emphasizes customer-type usage
• Are the safety mechanism and the watchdog timer turned on? (Shipping with them off should never happen.)
• “A watchdog timer is used to detect and recover from computer malfunctions. During normal
operation, the computer regularly resets the watchdog timer to prevent it from elapsing. If, due
to a hardware fault or program error, the computer fails to reset the watchdog, the timer will
elapse and generate a timeout signal. The timeout signal is used to initiate corrective action or
actions.”
• Anti-patterns:
– Excessive defects escaped from field testing
– The majority of the testing effort is ad hoc exploratory testing
• “Ad hoc is a Latin phrase meaning literally "to this". In English, it generally signifies a solution
designed for a specific problem or task, non-generalizable, and not intended to be able to be
adapted to other purposes.”
– Acceptance test is the only test
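The watchdog timer quoted above can be modeled in software as follows. This is a sketch rather than a driver for a hardware watchdog: the class and method names are made up, and the clock is injected as a callable so the behavior can be shown deterministically (in a running program, `time.monotonic` could be passed instead).

```python
class Watchdog:
    """Software model of a watchdog timer with an injectable clock."""

    def __init__(self, timeout: float, clock):
        self.timeout = timeout
        self.clock = clock              # callable returning the current time
        self.last_kick = clock()

    def kick(self) -> None:
        """Called regularly by a healthy program to reset the timer."""
        self.last_kick = self.clock()

    def expired(self) -> bool:
        """True once the program has failed to kick within the timeout."""
        return self.clock() - self.last_kick > self.timeout


now = [0.0]                              # fake clock for the demonstration
wd = Watchdog(timeout=1.0, clock=lambda: now[0])
now[0] = 0.5
wd.kick()
now[0] = 1.2
print(wd.expired())   # False: 0.7 s since the last kick
now[0] = 2.0
print(wd.expired())   # True: 1.5 s since the last kick
```

In a real system, expiry would trigger the corrective action (for example, a reset); here it is only reported.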
Effective System Testing
• System test should cover all requirements
– Every product requirement is tested
• Ad hoc testing helps but should not be the primary method
– Non-customer-visible requirements should be tested as well
• Specifically non-functional requirements
• Make sure the system does not crash too often!
• Physical testing
– Each bug found in system test is a huge deal
• You should find only a few
– A bug found in system test is a process failure
• Make sure it is not the tip of an iceberg (it should happen only once in a while)
Product testing won't find all bugs
• Testing a system for bugs does not make it good
– It makes it less bad!
• If testing is the only line of defense, your customers will regularly
complain
• Example: F22
– $360 million each!
– When they crossed the International Date Line over the Pacific, the onboard
computer systems crashed!
• No navigation or communications!
• The pilots followed other aircraft visually back to Hawaii
• “It was a computer glitch in the millions of lines of code, somebody made
an error”
• The F-16 had a similar problem: when crossing the equator, it flipped upside down!
Buggy reports
• Rule of thumb: 90/10
– 90% of bugs are in 10 % of modules
– Mostly in complex modules
• Bug farms are more than bad code; they may indicate bad design
– Poorly defined, confusing interface
• When you find a bug farm, do not only fix the bugs; redesign
the module!
• Risk of poor embedded software quality!
– your module fails unit test
– a bug found in peer review
– system fails integration testing or software testing
– system fails acceptance testing
– a field problem report, …
System test best practices
• Pitfall
– Impractical to get high coverage for system (finding all bugs)
– Test at the system level!
Safety plan
• Anti-patterns
– Normally the safety plan does not deal with software integrity
– It does not link with security plan
• Safety plan:
– Safety standard: pick a suitable standard
– Hazards and risks: Hazard logs, criticality analysis
– Goals: Safety strategy, safety requirements
– Mitigation and analysis: HAZOP, FTA, FMEA,…
– Safety case: Safety argument
[Figure: safety plan elements. Labels: Safety, Hazards and risks, Goals, Mitigation approaches, Safety Case.]
Safety standards
Safety goal
• Safety goal: the definition of “safe”
– Example: speed control
• Hazard: unintended vehicle acceleration
• Goal: Engine power proportional to pedal position
• Safety strategy: Correct computation of the pedal position
– Engine shut down in case of problem
• Safety requirements:
– Goals are at system level, requirements provide supporting details
– Supporting requirements generally allocated to subsystems
– Example:
• Make sure the engine torque is based entirely on the torque curve!
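The speed-control example can be turned into a small runtime monitor: compare the measured torque with what the torque curve predicts for the pedal position, and shut the engine down if they disagree. The linear torque curve, the 300 N·m maximum, and the 15 N·m tolerance are made-up values for illustration; a real torque curve would come from the vehicle's calibration data.

```python
def expected_torque(pedal_percent: float, max_torque: float = 300.0) -> float:
    """Hypothetical linear torque curve: torque proportional to pedal position."""
    return max_torque * pedal_percent / 100.0


def check_torque(pedal_percent: float, measured_torque: float,
                 tolerance: float = 15.0) -> str:
    """Return 'ok' or 'shutdown' according to the safety strategy."""
    if abs(measured_torque - expected_torque(pedal_percent)) <= tolerance:
        return "ok"
    return "shutdown"


print(check_torque(50.0, 150.0))   # ok: matches the curve
print(check_torque(10.0, 250.0))   # shutdown: far more torque than requested
```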
FMEA: Failure Mode and Effects Analysis
• Idea: start with a component failure, analyze the results, and identify hazards
• Significant shortcomings in generating hazards for:
– Embedded systems!
– Software (not just working or not working)
• Lots of bugs
– Integrated circuits
– Accumulated failure/concurrent ones
Hazard and Operability Analysis (HAZOP)
• Structured brainstorming of hazards
– Does the result suggest a hazard?
– Effective starting point
Hazard and Risks
• Hazard: a potential source of injury or damage
– A potential cause of mishap or loss event (people, property, financial,…)
– Hazard log:
• Captures hazards for a system
• Lessons learned from previous projects
• Legacy, analysis and field experience
• HAZOP: structured analysis method
– Risk evaluation
• Risk = probability × consequence
– In terms of cost, or as a Pareto graph
• Hard to obtain values for probabilities and consequences
• High-impact, low-probability events!
• Risk table!
• Correlation to SIL
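The risk evaluation above (risk = probability × consequence) can be worked through on a small hazard list and ranked, which is roughly what a Pareto view of a hazard log shows. The hazards, probabilities, and costs below are invented for the example.

```python
hazards = [
    # (name, probability per year, consequence in dollars) - made-up values
    ("sensor drift", 0.20, 10_000),
    ("controller crash", 0.05, 200_000),
    ("wide-area blackout", 0.001, 50_000_000),
]


def risk(probability: float, consequence: float) -> float:
    """Risk = probability * consequence (expected loss)."""
    return probability * consequence


# Rank hazards from highest to lowest expected loss.
ranked = sorted(hazards, key=lambda h: risk(h[1], h[2]), reverse=True)
for name, p, c in ranked:
    print(f"{name}: {risk(p, c):,.0f}")
```

With these numbers the rarest event ranks first, which shows how a high-impact, low-probability hazard can dominate the table even though its probability is the hardest value to estimate.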
Safety analysis and mitigation
Safety Case
• Methodical identification of hazards
• Risk evaluation
• Mitigation of risk by analyzing the impact and probability
• Design safety requirements against faults and malicious faults
– Example: GSN (Goal Structuring Notation)
Concurrency bugs & race conditions
• Race condition: multiple threads compete
– Computation outcome depends on timing
– Concurrent access to the same variable
– Not accounting for multitasking
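A shared counter is the usual minimal example of concurrent access to the same variable. The sketch below shows the protected version: each increment is a read-modify-write, and the lock makes that sequence atomic with respect to the other threads. The thread count and iteration count are arbitrary.

```python
import threading

counter = 0
lock = threading.Lock()


def add(n: int) -> None:
    """Increment the shared counter n times, one locked update at a time."""
    global counter
    for _ in range(n):
        with lock:           # without the lock, read-modify-write steps can interleave
            counter += 1


def run(num_threads: int = 4, n: int = 10_000) -> int:
    """Start num_threads workers and return the final counter value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add, args=(n,)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(run())   # 40000 with the lock in place
```

If the `with lock:` line is removed, the result may still come out right on many runs, which is exactly what makes race conditions hard to find by testing alone.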