Informatics
Lecture 3
Trust and validity of information
What can we trust in the IT world?
The trustworthiness of computer
systems
secure, dependable (reliable and
available), correct, safe, private, and
survivable
The trustworthiness of e-Commerce
Electronic payments.
What is there to trust in the IT world?
The trustworthiness of people?
Cyber criminals, human error
The trustworthiness of the source
or evidence?
The quality of the information you are
accessing
We shall focus on these last two
The Corporate Data Perspective
Decision makers need data
Data: facts
Information: organised data
Knowledge: experiential
Internal: corporate data, sales
External: commercial, satellite,
internet data etc.
Once organised, the data need to be
interrogated (SQL or data mining)
and then presented.
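As a minimal sketch of interrogating organised data with SQL, the following uses Python's built-in sqlite3 module on an in-memory table. The table name `sales`, its columns and the figures are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical internal corporate data: (region, amount) sales records.
rows = [("North", 120.0), ("South", 80.0), ("North", 50.0), ("South", 30.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Interrogate the data: total sales per region, ready for presentation.
query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
totals = conn.execute(query).fetchall()
conn.close()

for region, total in totals:
    print(f"{region}: {total:.2f}")
```

The same query could be pointed at a real corporate database; only the connection line would change.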
Data Collection, Problems, Quality
Collection: the previous lecture
Problems: Incorrect, untimely,
inappropriate for analysis, required
data may not exist
Quality obviously affects the decision
process, e.g. fidelity, timescale
Sources of Quality Problems
Data entry
Changes to source systems
Data migration or conversion
Mixed expectations by users
External data
System errors
Customer data entry
Data Quality factors revisited
can it be directly applied?
is it stored in predictable way?
does it represent reality?
is the info close enough to true?
what is the maximum accuracy?
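Some of the problems listed earlier (incorrect, untimely or missing data) can be checked for automatically. The sketch below is one possible way to do so; the field names, the sample records and the one-year staleness threshold are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "amount": 120.0, "updated": date(2024, 1, 10)},
    {"id": 2, "amount": None, "updated": date(2024, 1, 12)},  # required data missing
    {"id": 3, "amount": -5.0, "updated": date(2024, 1, 11)},  # implausible value
    {"id": 4, "amount": 60.0, "updated": date(2022, 6, 1)},   # stale record
]

def quality_issues(record, today, max_age_days=365):
    """Return a list of quality problems found in one record."""
    issues = []
    if record["amount"] is None:
        issues.append("missing")
    elif record["amount"] < 0:
        issues.append("incorrect")
    if (today - record["updated"]).days > max_age_days:
        issues.append("untimely")
    return issues

today = date(2024, 2, 1)  # fixed date so the example is repeatable
report = {r["id"]: quality_issues(r, today) for r in records}
print(report)
```

Checks like these only catch problems that can be stated as rules; whether the data "represents reality" still needs human judgement.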
What about these sources?
What would be your level of trust in:
Academic databases
Wikileaks
Wikipedia
Twitter
TripAdvisor.com reviews
BBC News
Sales pitch
Your own company's data
Opponent's information in times of conflict
Well-designed web site
Poorly designed web site, etc.
Crowdsourcing
The Wisdom of Crowds is
increasingly being used to obtain and
analyse data:
Get people to analyse huge data
sets
Get a crowd opinion on a problem
Monitor crowd behaviour e.g.
predict a flu outbreak from Google
searches
Evaluation of Evidence
Weighing evidence is basic to its
credibility
3 steps to evaluation
The source
The method of communication
The evidence itself
The basic principle is to keep it
simple: Occam's Razor
Source Evaluation
Need to answer 3 questions
Is the source competent?
Expertise required. Are you able to describe
the source?
Did the source have the access
needed?
Is the source's claim to access credible?
Is there a vested interest or bias?
Is HUMINT being sold to the highest
bidder?
Communications Evaluation
How did the evidence arrive?
The accuracy decreases with length
of chain
Analyse the channel itself
Is the information being intentionally
provided?
Is it true or deception?
Is it for the opponent?
X can become a fact through validity
creep
may ... possibly ... probably ... IS!
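The point that accuracy decreases with the length of the chain can be illustrated with a simple model. This sketch assumes, purely for illustration, that each link in the chain independently passes the message on accurately with the same probability; real channels need not behave this way.

```python
def chain_accuracy(per_link_accuracy: float, links: int) -> float:
    """Probability a message survives intact, assuming independent links."""
    return per_link_accuracy ** links

# Assumed per-link accuracy of 0.9 (a hypothetical figure).
for n in (1, 3, 5, 10):
    print(n, round(chain_accuracy(0.9, n), 3))
```

Under this assumption, even a fairly reliable link (0.9) leaves roughly a one-in-three chance of an intact message after ten links.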
Credentials of Evidence
Credibility: how believable is it?
Reliability: consistent, replicable,
corroborated?
Inferential value: what weight does it
carry? Think about the motives of the
source. Is the data relevant to the
problem?
Pitfalls in Evaluation
There are at least 7 pitfalls to avoid in
weighing evidence
1. Vividness weighting
Statistics are the least persuasive, then text, with
video the most persuasive to decision-makers
2. Weighting based on the Source
Downplaying the value of open source
data
Pitfalls
3. Recent Evidence
Should the most recently acquired
evidence have the highest weight?
4. The Unknown
How to judge a question when evidence is
absent or little is known
5. Trusting Hearsay
Do you trust the words of a person? What
if the target knows they are being
monitored?
Pitfalls
6. Expert Opinion reliance
Can we rely on their opinion to be
objective?
7. Premature Closure
Tendency to affirm existing beliefs
instead of discrediting them
Calibration of data
It is possible to essentially test the data
being supplied by the use of
calibration:
Release a dataset, wait for it to
return, then check the contents
Insert a dataset with a known
outcome into the analysis chain and
monitor the effect
Set a honeytrap with a false dataset
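The second technique, inserting a dataset with a known outcome into the analysis chain, can be sketched as follows. The functions `analysis_chain` and `tampered_chain` are hypothetical stand-ins (a simple mean, and a faulty version that drops a value); a real chain would be far more involved.

```python
def analysis_chain(values):
    """Hypothetical stand-in for the analysis chain: returns the mean."""
    return sum(values) / len(values)

def tampered_chain(values):
    """Hypothetical faulty chain that silently drops the last value."""
    return analysis_chain(values[:-1])

def calibrate(chain, known_input, expected, tolerance=1e-9):
    """Feed a dataset with a known outcome through the chain and compare."""
    return abs(chain(known_input) - expected) <= tolerance

known_input = [2.0, 4.0, 6.0]  # known outcome: the mean is 4.0
healthy = calibrate(analysis_chain, known_input, 4.0)
faulty = calibrate(tampered_chain, known_input, 4.0)
print(healthy, faulty)
```

If the known dataset comes back with an unexpected result, something in the chain has altered or lost data.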
Denial, Deception and Signalling
Is the analyst seeing what the opponents
want him/her to see?
Denial and deception (D&D) are core to
counterintelligence
In some cases they are a major weapon
against those with access to sophisticated
technology
Whereas signalling is the opposite in that
the opponent is sending a deliberate
message
Denial
Comms and radar can be denied to
SIGINT
Intermittent operation, using land lines,
encrypting or jamming the SIGINT with
interfering signals.
IMINT denial
Using camouflage or masking techniques, going
underground, operating during darkness or
cloudy weather. Also cleaning up chemical
emissions to deny spectral imagery collection.
Deception
Passive deception deploys decoys
Dummy ships, missiles, tanks, etc.
Active deception includes misinformation,
misleading activities, double agents
E.g. criminal groups have developed
deception strategies to evade international
restrictions on narcotics and arms
trafficking
i.e. hiding financial transactions using
intermediaries
It happens in business too but the trick
here is to deceive the opposition, NOT the
public
The Defence against D & D
Your best strategy is to deny access
to your intelligence capabilities
Protection of intelligence applies to
the product and the sources and
methods
Protection is higher for COMINT (e.g.
phone tapping), less so for IMINT
(e.g. drone) etc. OSINT has none
because it is not classified at all
Higher Level D & D
Deception has to be subtle (not
recognised by the opponent) but not too
subtle (missed by the opponent): sufficient
to provoke action
Many countries are using Multi-INT in
their deception strategies: perception
management
If opposing intelligence makes an initial
wrong estimate, then longer-term
deception is easier: the opponent then
faces an unlearning process
to take out these effects
Social Engineering
The psychological tricking of
legitimate computer system users to
gain information
Phishing attacks are now routine for
most users
Spear phishing attacks are getting
more sophisticated with the use of
social network data
Misinformation
Propaganda has long been
used by governments or other
groups to influence a population
Classic cases during conflict and
war
In cyberspace we need to guard
against unauthorised access, web
defacement and even open source
postings
Countermeasures
Preventing the human hack is in its infancy
We need systems that can raise alerts with
information sources
Single sources
Spotting spoofs of servers using metrics
Some historical model of the source
The style of writing; linguistic cues
Multiple sources
Calibration against other sources
Stats of reliability and overall trustworthiness
E.g. Amazon vendors
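One simple way to keep statistics of reliability for multiple sources is to record how often each source's past reports were later corroborated. The sketch below assumes such a history exists; the source names and outcomes are hypothetical, and the plain fraction used here is only one of many possible scores.

```python
from collections import defaultdict

# Hypothetical history: (source, was the report later corroborated?)
history = [
    ("vendor_a", True), ("vendor_a", True), ("vendor_a", False),
    ("vendor_b", True), ("vendor_b", False),
    ("vendor_b", False), ("vendor_b", False),
]

def reliability(history):
    """Return the fraction of corroborated reports for each source."""
    counts = defaultdict(lambda: [0, 0])  # source -> [corroborated, total]
    for source, corroborated in history:
        counts[source][0] += int(corroborated)
        counts[source][1] += 1
    return {s: good / total for s, (good, total) in counts.items()}

scores = reliability(history)
for source, score in sorted(scores.items()):
    print(f"{source}: {score:.2f}")
```

A score like this says nothing about a source with little or no history, which links back to the pitfall of judging the unknown.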