Brief Introduction to Spatial Data Mining
Spatial data mining is the process of discovering
interesting, useful, non-trivial patterns in large spatial
datasets.
Reading Material: http://en.wikipedia.org/wiki/Spatial_analysis
Examples of Spatial Patterns
Historic Examples
1854 Asiatic cholera outbreak in London: a water pump identified as the source
Fluoride and healthy gums near Colorado river
Theory of Gondwanaland - continents fit like pieces of a jigsaw puzzle
Modern Examples
Cancer clusters to investigate environment health hazards
Crime hotspots for planning police patrol routes
Bald eagles nest on tall trees near open water
West Nile virus spreading from the northeastern USA to the south and west
Unusual warming of the Pacific Ocean (El Niño) affects weather in the USA
Why Learn about Spatial Data Mining?
Two basic reasons for new work
Consideration of use in certain application domains
Provide fundamental new understanding
Application domains
Scale up secondary spatial (statistical) analysis to very large datasets
• Describe/explain locations of human settlements in last 5000 years
• Find cancer clusters to locate hazardous environments
• Prepare land-use maps from satellite imagery
• Predict habitat suitable for endangered species
Find new spatial patterns
• Find groups of co-located geographic features
Exercise. Name 2 application domains not listed above.
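The co-location idea listed above (groups of feature types that tend to occur near each other) can be illustrated with a small sketch. This is only a toy pairwise count, not a full co-location mining algorithm; the function name `colocated_type_pairs`, the `radius` parameter, and the coordinates are all invented for illustration.

```python
from collections import Counter
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def colocated_type_pairs(points, radius):
    """Count, per pair of distinct feature types, the point pairs within `radius`.

    `points` is a list of (feature_type, (x, y)) tuples in planar coordinates.
    """
    counts = Counter()
    for (type_a, xy_a), (type_b, xy_b) in combinations(points, 2):
        if type_a != type_b and dist(xy_a, xy_b) <= radius:
            # Sort the type names so (nest, water) and (water, nest) share a key.
            counts[tuple(sorted((type_a, type_b)))] += 1
    return counts


# Made-up toy coordinates: two nests, each close to open water, and one road.
points = [
    ("nest", (0.0, 0.0)), ("water", (0.5, 0.0)),
    ("nest", (5.0, 5.0)), ("water", (5.0, 5.8)),
    ("road", (9.0, 0.0)),
]
pairs = colocated_type_pairs(points, radius=1.0)
print(pairs)
```

Feature-type pairs with high counts are candidates for a co-location pattern; a real analysis would also normalise by how common each feature type is.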
Why Learn about Spatial Data Mining? - 2
New understanding of geographic processes for critical questions
Ex. How is the health of planet Earth?
Ex. Characterize effects of human activity on environment and ecology
Ex. Predict effect of El Nino on weather, and economy
Traditional approach: manually generate and test hypotheses
But, spatial data is growing too fast to analyze manually
• Satellite imagery, GPS tracks, sensors on highways, …
Number of possible geographic hypotheses is too large to explore manually
• Large number of geographic features and locations
• Number of interacting subsets of features grows exponentially
• Ex. Find teleconnections between weather events across ocean and land
areas
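To make the exponential growth of interacting feature subsets concrete, here is a minimal sketch. It assumes a "candidate hypothesis" is any subset of at least two features, which is a simplification introduced for illustration; the feature names are hypothetical.

```python
from itertools import combinations


def candidate_subsets(n_features):
    """Number of feature subsets with at least two members: 2^n - n - 1."""
    return 2 ** n_features - n_features - 1


# Cross-check the formula by brute-force enumeration on a tiny example.
features = ["rainfall", "temperature", "pressure", "sea_surface_temp"]
enumerated = [
    subset
    for size in range(2, len(features) + 1)
    for subset in combinations(features, size)
]
assert len(enumerated) == candidate_subsets(len(features))

# The count roughly doubles with every added feature.
for n in (10, 20, 30):
    print(n, candidate_subsets(n))
```

With only 30 features there are already more than a billion candidate subsets, before locations are even taken into account.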
SDM may reduce the set of plausible hypotheses
Identify hypotheses supported by the data
For further exploration using traditional statistical methods
Autocorrelation
Items in a traditional dataset are assumed to be independent of each other,
whereas properties of locations in a map are often “auto-correlated”.
First law of geography [Tobler]:
Everything is related to everything else, but near things are more related
than distant things.
People with similar backgrounds tend to live in the same area
Economies of nearby regions tend to be similar
Changes in temperature occur gradually over space (and time)
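One common way to quantify spatial autocorrelation is Moran's I, which the slides do not name; it is used here as an illustrative choice. The sketch below assumes values on a regular grid and binary "rook" weights (cells sharing an edge are neighbours). The function names are hypothetical, and the grid must not be constant, otherwise the variance term is zero.

```python
def rook_neighbors(rows, cols):
    """Map each cell (r, c) to the cells that share an edge with it."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[(r, c)] = [
                (i, j) for i, j in candidates if 0 <= i < rows and 0 <= j < cols
            ]
    return nbrs


def morans_i(grid):
    """Moran's I for a 2-D list of numbers, using binary rook weights."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    dev = {(r, c): grid[r][c] - mean for r in range(rows) for c in range(cols)}
    nbrs = rook_neighbors(rows, cols)
    total_weight = sum(len(cells) for cells in nbrs.values())
    # Sum of products of deviations over all neighbouring (i, j) pairs.
    cross = sum(dev[a] * dev[b] for a, cells in nbrs.items() for b in cells)
    variance_sum = sum(d * d for d in dev.values())
    return (n / total_weight) * cross / variance_sum


# A smooth gradient: neighbouring cells have similar values.
gradient = [[r + c for c in range(4)] for r in range(4)]
# A checkerboard: every neighbour has the opposite value.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

print(morans_i(gradient))      # positive: nearby values are alike
print(morans_i(checkerboard))  # -1.0: nearby values are as unlike as possible
```

Values near +1 indicate that nearby locations are similar, as in the examples above; values near 0 indicate no spatial structure; negative values indicate that neighbours tend to differ.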
Waldo Tobler in 2000
Papers on “Laws in Geography”: http://www.geog.ucsb.edu/~good/papers/393.pdf
http://homepage.univie.ac.at/Wolfgang.Kainz/Lehrveranstaltungen/Theory_and_Methods_of_GI_Science/Sui_2004.pdf
Geographical information systems
Textbook: “Concepts and Techniques of Geographic Information Systems”
by Lo and Yeung, PHI publication
We Live in Two Worlds
Natural World: Self-Regulating
Constructed World: Managed
. . . These Are Increasingly in Conflict
Contents
• First look: GIS as a digital map storage system
• Second look: GIS as a geospatial analysis tool
for analyzing data to obtain knowledge
• Third look: Integration of domain knowledge with
geospatial data for resource planning
Components of GIS
Topology of information systems
Geographical information systems
1) Geographic space and 2) geographic scale
Why GIS?
Can handle geographically referenced data or
spatial data as well as non-spatial data
Can handle relational numerical expressions
between these data sets
Ideal for natural resource management
Representation of Spatial Data
The Evolution of GIS
The formative years
Maturing technology
GIS infrastructure
GIS as infrastructure
Mobile GIS: PC, PDA, Phone
Desktop GIS: ArcInfo, ArcEditor, ArcView, ArcReader
Virtual Globes: ArcGIS Explorer, Google Earth, Virtual Earth
Server GIS: ArcGIS Server, Portal Toolkit
Geodatabases: Files, DBMS, XML
Network
GIS in Education
Over 7,000 universities worldwide teach GIS
GIS used in multiple disciplines:
Agriculture
Archaeology
Architecture/Landscape Arch.
Business
Computer Science
Environmental Science
Engineering
Journalism
Military Science
Natural Resource Management
Urban/Regional Planning
Geography
Geology
Meteorology
Oceanography
Law Enforcement
Public Health
History
Sociology
Agriculture
Farm management
Pest/Disease tracking
Crop monitoring
Yield prediction
Soil analysis
Natural Resource Management
Forestry
Ecology
Mining
Petroleum
Water Resources
Planning and Economic Development
Land Use/Zoning
Emergency Preparedness
Population Forecast
Market Analysis
Property Tax Assessment
Transportation
GIS: A Framework for Understanding and Managing Our Earth
Geographic Knowledge
Creating: Measuring, Organizing, Analyzing, Modeling
Applying: Planning, Managing, Acting
Holistic, Comprehensive, Systematic, Analytic, Visual
Geography matters
Today’s challenges require a geographic approach
Climate Change
Urban Growth
Sustainable Agriculture
Water Quality and Availability
International and National Security
Energy
Epidemiology/Disease Tracking
Natural Hazards: Seismicity, Weather Events
GIS users and their relationships
The mapping processes
PLANNING
Needs study
Specifications
Project Management
DATA ACQUISITION
Surveying
Photogrammetry
Georeferencing
Remote sensing
Scan-digitizing
CARTOGRAPHIC PRODUCTION
Drafting
Cartographic design
Printing
Proof Reading
Thank You.