Informatics
Lecture 2
Information Sources
Introduction
This lecture is concerned with the
sources available to us for the huge
variety of information that could be
useful.
Later lectures will consider how to
process the information and prioritise
it.
To begin with, consider why we are
collecting information in the first place.
Why
We are presumably collecting data in
order to inform a decision that has to
be made by an organisation,
individual or government:
What products to manufacture?
What items to market, and to whom?
What financial projections to make?
Should we monitor an individual?
Should we arrest an individual?
And so on.
The issues associated with this are
varied:
There is often no shortage of data
There is often too much
It's rare that the decision is obvious
Some data will contradict other data
Some data is simply incorrect
Some data is being hidden from us
The range of sources is huge
Publications, books and reports
Retail data
Personal data: NHS, education
Social media
Government: passports, registers
Gadgets: mobile phones etc.
Media: news and viewer data
CCTV
Emails, texts, photos, Twitter etc.
Off the grid
This range shows how difficult it is
to be truly off the grid.
Think of a typical day in your life and
the data you create every second
that can be linked to you.
For this reason, countries where
government has broken down, and where
little such data is recorded, tend to
attract criminal elements.
Type of content
The range of data sources is very
varied, and as a result so is the range
of data types. A few examples include:
Numeric: quantities to calculate
totals and averages
Text: names of people and
objects, significant words
Locations and times: derived from
GPS data
Video: records of events; places
people in time and place
Photos: also place people, reveal
relationships and identify locations
Narrative: opinions and views
Dialogue: evidence of deception, grooming
Fidelity of data
It is often thought that digital (binary)
data can be stored and transmitted
without any loss of fidelity.
We must remember, however, that a
lot of data either originates from
sensors (e.g. cameras) or is
compressed to reduce file size.
Both can lead to a loss of fidelity and
hence to uncertainty.
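A minimal Python sketch of how compression can lose fidelity, assuming plain uniform quantisation (rounding each value to one of a fixed number of levels) as a simplified stand-in for real lossy codecs; the readings and level counts are hypothetical. Fewer levels means fewer bits per sample, but a larger worst-case error, and distinct readings can collapse to the same stored value so the original cannot be recovered.

```python
def quantise(samples, levels):
    """Round each sample in [0.0, 1.0] to the nearest of `levels` evenly spaced values."""
    step = 1.0 / (levels - 1)
    return [round(s / step) * step for s in samples]

def max_error(original, reconstructed):
    """Largest absolute difference between the original and the stored values."""
    return max(abs(a - b) for a, b in zip(original, reconstructed))

readings = [0.12, 0.47, 0.503, 0.88, 0.999]  # hypothetical sensor readings
coarse = quantise(readings, levels=4)    # 2 bits per sample
fine = quantise(readings, levels=256)    # 8 bits per sample

print(max_error(readings, coarse))  # worst-case error with 2 bits
print(max_error(readings, fine))    # worst-case error with 8 bits (much smaller)
print(coarse[3] == coarse[4])       # True: 0.88 and 0.999 are now indistinguishable
```

The same trade-off, with far more sophisticated modelling, is why heavily compressed CCTV footage or phone audio can be hard to analyse later.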
Example - CCTV
The UK is awash with CCTV cameras,
but many of them are of such poor
quality that identifying an individual is
very difficult.
This is improving: high-quality HD
cameras are becoming more
affordable, and less footage is being
stored on recycled VHS tapes.
[Figures: CCTV image; CCTV and photo]
Audio
People can be recognised from
their voice, and words can be
identified from the dialogue.
Most speech (phone etc.) is heavily
compressed to save space, and this
can compromise the processing.
Of course, a voice can also be
deliberately disguised with
gadgets made for the purpose.
Audio - dialogue
It is possible to capture a
conversation and analyse it for a
number of items of interest:
Age of the speaker
Nationality: accents etc.
Emotion: angry, stressed, frightened etc.
Expressing a view or opinion
Being ironic etc.
Ethics
Of course, it is possible to gather data
that is in the public domain, and this is
increasingly useful.
Most governments can covertly
monitor their citizens, sometimes
after a legal application.
There are, of course, ethical issues in
gathering data without the knowledge
of the individual (see later lectures).
Is it ethical to gather data from an
individual, provided that you don't look
at it without due legal process?
Official hacking
One important source of data is, of
course, that obtained by state-sponsored
hacking.
In this way, many nations are turning
to an offensive mode of dealing with
cybercrime.
The Flame virus gathers data and has been
reported to be about 20 times larger (in code
size) than Stuxnet.
Intelligence analysis
In both the security and
commercial worlds, the analysis of
masses of data to extract meaning
and inform decisions is becoming more
sophisticated.
The remainder of this lecture
considers data sources from the
intelligence analysis viewpoint.
Literal and Non-Literal Sources
All data sources can be categorised
as either literal or non-literal.
Within these two categories, further
classifications occur.
A taxonomy of sources has been
developed, and this can help in
giving appropriate weight to each
piece of data.
Literal Sources
In a form suitable for human
communication.
Open Source
Human
Communication
Cyber
Open Source - OSINT
Publicly available information
Internet
The largest repository (not surprisingly)
But what about the quality?
Intentionally misleading?
Perhaps a good starting point for searching
Is this material easily overlooked BECAUSE it is
not classified?
Online databases
Overload: how to extract useful information?
Commercial
Imagery satellites
Commercial databases
All for a price!
Archives
Human - HUMINT
HUMINT focuses on humans and their access
to information (takes time to acquire)
Often the best method for dealing with illicit
networks, or for finding:
Opponents' plans, trade secrets, certain indicators
Tip-offs
Often gathered by working with others
Liaison relationships with other intelligence networks
Elicitation (drawing out information from
conversations)
Émigrés (legal) / defectors (illegal)
Clandestine sources (spies, moles etc.)
Sampling (e.g. a poll to get an indication of opinion)
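The sampling bullet can be made concrete with the standard estimate for a simple random sample: the sample proportion p, with a margin of error of z times the square root of p(1 - p)/n. A minimal Python sketch, assuming a genuinely random sample and the normal approximation with z = 1.96 (roughly 95% confidence); the poll figures are hypothetical.

```python
import math

def poll_estimate(yes, n, z=1.96):
    """Estimate a proportion from a simple random sample.

    Returns (p, margin) using the normal approximation; z = 1.96 gives
    roughly 95% confidence. Only random sampling error is covered, not
    bias from a skewed or deceptive sample.
    """
    p = yes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, margin

p, m = poll_estimate(yes=520, n=1000)  # hypothetical poll: 520 of 1000 agree
print(f"{p:.1%} +/- {m:.1%}")          # about 52% +/- 3%
```

Quadrupling the sample size only halves the margin, which is one reason a poll gives an indication of opinion rather than certainty.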
Communication - COMINT
Generally a governmental activity rather than a
private one (private interception is illegal)
The interception, processing and reporting of an
opponent's communications
E.g. voice, fax, data comms, internet, any other
deliberate transmission
Collected by aircraft, satellites, ground bases, sea etc.
Insights into plans/intentions (people, organisations,
financial, facilities, budgets, procedures etc.)
Relationships? Classified projects?
Labour-intensive: the communications must be
translated
Radio comms, code cracking, encryption/decryption
Microphone (wire)/audio (radio), telephone
tapping, bugging, satellite storage (illegal to
use?), liaison relationships
Cyber
Collection from an information system or
network (a mix of HUMINT/COMINT/OSINT)
Becoming a rich source of intelligence
E.g. target personnel databases for personal
information and possible recruitment as
HUMINT sources
Lower risk to obtain than traditional spying
The hacker is on the offensive (and usually
wins); defence is much harder
Does the defender think like the attacker?
Large systems have more vulnerabilities
Gain access, exploit with tools, remove
evidence
Survey possible networks, probe (e.g. ping) a network
for vulnerabilities, hack it (install software
backdoors), then use the backdoors to sustain
collection
Sustained collection uses:
Trojan horses
Worms (entirely concealed)
Rootkits (software to avoid detection)
Keystroke loggers
Design social engineering attacks
Non-literal Sources
Require specialist processing and interpretation
Remote and In-situ
Imaging
Radiofrequency
Radar
Geophysical / Nuclear
Material and Materiel
Biometrics
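The two-level taxonomy (literal and non-literal, each with its own source types as listed in this lecture) can be held as a simple lookup table, so that each incoming item is tagged with its branch before an analyst decides how much weight to give it. A minimal Python sketch; the short labels such as `PASSIVE_RF` and the `categorise` helper are hypothetical names chosen for illustration, and no weighting scheme is implied.

```python
from enum import Enum

class Category(Enum):
    LITERAL = "literal"          # already in a form suitable for human communication
    NON_LITERAL = "non-literal"  # needs processing and interpretation before use

# Source types from this lecture, grouped under the two top-level categories.
TAXONOMY = {
    Category.LITERAL: {"OSINT", "HUMINT", "COMINT", "CYBER"},
    Category.NON_LITERAL: {
        "REMOTE_AND_IN_SITU", "IMINT", "PASSIVE_RF",  # PASSIVE_RF covers ELINT and FISINT
        "RADAR", "GEOPHYSICAL_NUCLEAR", "MATERIALS_MATERIEL", "BIOMETRICS",
    },
}

def categorise(source_type):
    """Return the top-level category for a (hypothetical) source-type label."""
    for category, source_types in TAXONOMY.items():
        if source_type in source_types:
            return category
    raise KeyError(f"unknown source type: {source_type}")

# Tag an incoming item so an analyst can see which branch of the taxonomy it came from.
report = {"text": "Tip-off about a shipment", "source_type": "HUMINT"}
report["category"] = categorise(report["source_type"])
print(report["category"].value)  # literal
```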
Remote and In-situ
Remote sensing: e.g. sensing the Earth from satellites,
or vice versa
Can cover large areas quickly
In-situ sensors detect changes in water,
air or earth immediately in the vicinity of the
sensor
E.g. an aircraft carrying such a sensor to
measure effects of a nuclear test on the
atmosphere
They don't have the broad-area search capability of
remote sensors, i.e. they have smaller ranges
E.g. tracking the trace element signature in
the wake of a submarine
Imaging - IMINT
Visible Photography
Camera, aircraft, spacecraft
Open source
Ever zoomed into your own house (or someone
else's) using Google Earth?
Photography/video (handheld)
Imaging radar (mounted on craft)
Electro-optical imaging (not good when cloudy)
Radiometry/spectrometry (heat-related
emission)
Spectral imaging (combines the above two)
Passive Radio Frequency
Radio-frequency signals are emitted during
normal human activity
ELINT (electronic intelligence, e.g. a motorist detecting
police radar) and FISINT (foreign instrumentation
signals intelligence)
ELINT is useful for tactical intelligence
E.g. tracking a vehicle by pinpointing the
location of the radars it carries
FISINT exploits telemetry, a means of
deliberately sending signal data back to
ground sites during/after the failure of, say,
an aircraft
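The slides do not say how an emitter is pinpointed. One common approach, assumed here rather than taken from the lecture, is direction finding: two stations at known positions each measure a compass bearing to the emitter, and the fix is where the two lines of bearing cross. A minimal Python sketch on a flat x/y grid (x = east, y = north); the station positions and bearings are hypothetical.

```python
import math

def bearing_vector(bearing_deg):
    """Unit vector for a compass bearing (degrees clockwise from north; x = east, y = north)."""
    rad = math.radians(bearing_deg)
    return math.sin(rad), math.cos(rad)

def cross(a, b):
    """2-D cross product (z-component)."""
    return a[0] * b[1] - a[1] * b[0]

def triangulate(station1, bearing1, station2, bearing2):
    """Intersect two lines of bearing from known station positions.

    Flat-earth approximation; bearings are treated as infinite lines. A real
    system would also check that the fix lies in front of both stations and
    would handle measurement error (e.g. with more than two stations).
    """
    d1 = bearing_vector(bearing1)
    d2 = bearing_vector(bearing2)
    denom = cross(d1, d2)
    if abs(denom) < 1e-9:
        raise ValueError("bearings are parallel; no unique fix")
    offset = (station2[0] - station1[0], station2[1] - station1[1])
    t = cross(offset, d2) / denom  # distance along the first bearing line
    return station1[0] + t * d1[0], station1[1] + t * d1[1]

# Hypothetical stations 10 km apart, each hearing the same radar.
x, y = triangulate((0.0, 0.0), 45.0, (10.0, 0.0), 315.0)
print(round(x, 3), round(y, 3))  # 5.0 5.0
```

Repeating the fix as new bearings arrive gives a track of the vehicle carrying the radar.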
Radar
Tracking of targets - satellites,
missiles, ships, aircraft, other
vehicles in combat
E.g. a missile trajectory can be
detected by radar
We are all familiar with (and thankful for)
radar navigation during inclement
weather
Although the quality of radar imagery
does not match that of optical imagery, radar
performs very well in poor visual conditions
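The slides do not say how radar measures distance. For a simple pulse radar, the usual principle is that the range is the speed of light times the round-trip echo delay, divided by two because the pulse travels out and back. A minimal Python sketch, assuming propagation at the vacuum speed of light (it is marginally slower in air); the 1 ms delay is a hypothetical example.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second (vacuum)

def echo_range(round_trip_seconds):
    """Distance in metres to a target from a radar pulse's round-trip time.

    The pulse travels to the target and back, so the one-way range is
    half the total path length.
    """
    return SPEED_OF_LIGHT * round_trip_seconds / 2

print(echo_range(1e-3) / 1000)  # roughly 150 km for a 1 ms echo delay
```

Tracking a missile trajectory amounts to repeating this range measurement, together with the antenna's pointing angle, many times per second.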
Geophysical / Nuclear
Collection, processing and exploitation of
environmental disturbances transmitted
through earth, water or atmosphere
E.g. magnetic sensing of vehicles,
submarines
Acoustic (ACINT or ACOUSTINT)
emissions
Seismic intelligence from underground/
underwater explosions (akin to earthquake
data)
Nuclear radiation detectors
Materiel / Materials
Usually collected clandestinely, often via HUMINT
Materials
Particulate, trace elements, effluence,
debris
Nuclear, chemical, biological issues
DNA
Materiel
Equipment, apparatus, supplies
Stealing competitors' sample products
Biometrics
Capture of a person's physical or behavioural
characteristics that identify them
Morse code did this in World War II: an operator's
"fist" (keying rhythm) could identify them
Face, voice, iris, fingerprint, ear, vein, DNA,
odour!, thermal, gait, hand, palm, typing, soft
biometrics like height, weight, clothes, hair
Research is moving towards multi-modal systems
to integrate the uni-modal systems
Intelligence for border controls in particular
How good are the detection methods?
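One widely used way to integrate uni-modal matchers into a multi-modal system is score-level fusion: each matcher's raw score is normalised to a common range and the results are combined, here with a weighted sum (the "sum rule"). A minimal Python sketch; the three matchers, their score ranges, the weights and the decision threshold are all hypothetical.

```python
def min_max_normalise(score, lo, hi):
    """Rescale a matcher's raw score to [0, 1] given that matcher's score range."""
    return (score - lo) / (hi - lo)

def fuse(scores, weights):
    """Weighted-sum fusion of normalised match scores (one per modality)."""
    total_weight = sum(weights[m] for m in scores)
    return sum(weights[m] * scores[m] for m in scores) / total_weight

# Hypothetical raw scores from three uni-modal matchers: (score, range_min, range_max).
raw = {
    "face": (72.0, 0.0, 100.0),
    "fingerprint": (410.0, 0.0, 500.0),
    "voice": (0.35, -1.0, 1.0),
}
normalised = {m: min_max_normalise(s, lo, hi) for m, (s, lo, hi) in raw.items()}
weights = {"face": 1.0, "fingerprint": 2.0, "voice": 0.5}  # hypothetical reliability weights
THRESHOLD = 0.7  # hypothetical accept/reject threshold

fused = fuse(normalised, weights)
print(round(fused, 3), fused >= THRESHOLD)  # 0.771 True
```

Weighting the more reliable modality more heavily is one design choice; how good the fused system is still depends on measuring its false-match and false-non-match rates.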
Summary
Categorisation of data sources
Primary, secondary, tertiary
Storage
Databases
Data warehouses
Social media
Security and business intelligence
Security Sources
Literal: OSINT, HUMINT, COMINT, cyber
Non-literal: measurement
IMINT, ELINT, FISINT
Radar, photographic
Geophysical
Biometrics
Readings
R. Clark, Intelligence Analysis
Chapter 6 for the security part of the lecture
Wikipedia is good for definitions of
all the acronyms (HUMINT etc.) and
further reading