GIS and Spatial Data Mining
Week 2
What is Spatial data?
● Data that describes a specific geographical area or location is
called “spatial” data (or “geospatial” data).
● Spatial data includes Earth Observation, GPS, and mapping
information, and makes up a significant part of the data we use daily.
Spatial Data Analysis tools
● GeoPandas is an open-source Python package for geospatial data
science. It builds on pandas and other popular Python data science
tools such as matplotlib, and extends pandas data manipulation
to include spatial operations on geometric types.
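As a rough illustration of "spatial operations on geometric types", the sketch below builds a small GeoSeries and calls two of its element-wise methods. The polygons and the test point are made-up values in arbitrary planar units, not data from the course.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical geometries in arbitrary planar units (illustration only)
shapes = gpd.GeoSeries([
    Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),  # 2 x 2 square
    Polygon([(0, 0), (1, 0), (1, 3), (0, 3)]),  # 1 x 3 rectangle
])

# Spatial operations work element-wise, like pandas column operations
areas = shapes.area                               # area of each polygon
contains_pt = shapes.contains(Point(1.5, 1.5))    # does each polygon contain the point?

print(areas.tolist())        # [4.0, 3.0]
print(contains_pt.tolist())  # [True, False]
```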
Reading and writing spatial data
● Just as pandas reads input data into DataFrames, GeoPandas reads
spatial data into GeoDataFrames.
● Spatial data comes in two main types:
○ Vector
■ Vector data represents geographic features through
discrete geometries — Points, Lines, and Polygons.
■ Point: a single (x, y) point. Like the location of your
house.
■ Line: two or more connected (x, y) points. Like a road.
■ Polygon: three or more (x, y) points connected and
closed. Like a lake, or the border of a country.
○ Raster
■ Raster data encodes the world as a continuous surface
represented by a grid.
○ Both types are often accompanied by non-spatial attributes, such as
the name or address of a location.
○ GeoPandas works with vector data. For raster data, it can be
combined with other Python packages such as rasterio.
Spatial Data Exploration
● GeoDataFrames offer a familiar structure: GeoDataFrame is a
subclass of pandas.DataFrame and inherits its methods and
attributes.
● A distinctive trait is the ability to store a geometry column
(GeoSeries) for spatial operations. While a GeoDataFrame can
have multiple GeoSeries, one column serves as the active geometry,
the basis for spatial operations.
● Each GeoSeries in a GeoDataFrame carries crucial Coordinate
Reference System (CRS) information.
● CRS informs GeoPandas about the location of coordinates on Earth,
crucial for accurate spatial analysis.
● There are two primary CRS categories: geographic CRSs (e.g.,
EPSG:4326), which express coordinates in degrees and are widely
used in GPS, and projected CRSs, which represent the Earth on a
two-dimensional map and use convenient linear units (e.g., meters).
● To facilitate analysis in meters, the GeoDataFrame can be
transformed to a projected CRS using the `to_crs()` method.
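A minimal sketch of these points, assuming two made-up (longitude, latitude) points. The target CRS, EPSG:32633 (UTM zone 33N), is an assumption chosen because it suits these sample coordinates; a different area would need a different projected CRS.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Two hypothetical points given as (longitude, latitude) in degrees
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[Point(13.40, 52.52), Point(13.41, 52.52)],
    crs="EPSG:4326",  # geographic CRS, units are degrees
)

is_dataframe = isinstance(gdf, pd.DataFrame)  # GeoDataFrame subclasses DataFrame
active_name = gdf.geometry.name               # name of the active geometry column
print(is_dataframe, active_name, gdf.crs)

# Reproject to a projected CRS whose units are meters.
# EPSG:32633 is an assumption that fits the sample points above.
gdf_m = gdf.to_crs("EPSG:32633")

# Distances are now in meters rather than degrees
distance_m = gdf_m.geometry.iloc[0].distance(gdf_m.geometry.iloc[1])
print(round(distance_m), "m")
```

Computing the distance before reprojection would return a value in degrees, which is rarely meaningful for analysis.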