Machine Learning Classification in QGIS

This document is a tutorial on using the Dzetsaka plugin in QGIS for machine learning-based multi-class classification of land cover types using raster data. It outlines the installation process, preparation of training data, classification methods, and validation techniques, including confusion matrices and confidence maps. The tutorial emphasizes the iterative nature of training and classification, highlighting the importance of refining training data for improved accuracy.

MACHINE LEARNING CLASSIFICATION IN QGIS

March 1, 2024
GIS, ai, data science, machine-learning, python, qgis

David Parr, March 1, 2024

Background
In this brief tutorial, we’ll examine machine learning through multi-class classification with the Dzetsaka classification
plugin in QGIS. Dzetsaka was written by Nicolas Karasiak. The plugin runs inside QGIS: it takes raster data (often
satellite imagery) and uses a set of training data to build land cover classifications. The tool was originally
developed to distinguish different types of vegetation in the landscape, although it works well (with training and validation)
on other types of land cover.
Land cover refers to visible categories of land use in a given area. These could include:
● Tree cover (high coverage >80%, medium coverage, low coverage)
● Water
● Built-up areas
● Grassland
● Settlements / homes
● Shrub
● Bare soil

This map shows land cover in the conterminous U.S. in 2016. Image credit: USGS
The types of land cover you choose will be up to you, depending on the area you’re classifying. In this example, we’ll be
using a high-resolution orthoimage from the USGS EROS Archive at USGS’s EarthExplorer.
This tutorial covers the following steps:
1. Background
2. Installation
   1. Install QGIS
   2. Install scikit-learn
   3. Installing the dzetsaka plugin
3. Preparation
4. Building Region Classes
   1. Modifying the Training Data Classes
   2. Adding Regions of Interest
5. Classifying using Training
6. Smoothing and Vectorizing the Results
   1. Smoothing the Results
   2. Vectorizing the Results
7. Validating the Model
   1. Confusion Matrix
   2. Confidence Map
   3. Cross-Validation
8. Next Steps

Installation

Install QGIS
If you haven’t installed QGIS, please download and install QGIS on your Windows, Mac, or Linux computer. The
Dzetsaka plugin should work with recent versions of QGIS.

Install scikit-learn
Before running QGIS, be sure the scikit-learn Python module is installed where the Python used by QGIS can find it. To do this, follow one
of these methods:
☐ On Windows, open the OSGeo4W Shell and run
python3 -m pip install scikit-learn -U --user

☐ On Linux and Mac, open a shell and run

python3 -m pip install scikit-learn -U --user
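To confirm that the install worked, a quick check like the one below can be pasted into the QGIS Python Console (Plugins > Python Console). It is only a sketch: the helper name module_available is made up for this example, and it simply asks Python whether the sklearn package (the import name of scikit-learn) can be found.

```python
import importlib.util


def module_available(name):
    """Return True if this Python interpreter can find the named top-level module."""
    return importlib.util.find_spec(name) is not None


# scikit-learn is imported under the name "sklearn".
if module_available("sklearn"):
    print("scikit-learn is available to this Python interpreter")
else:
    print("scikit-learn was not found; re-run the pip command above")
```

If the second message appears inside QGIS even though pip reported success, the module was probably installed for a different Python than the one QGIS uses.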
Installing the dzetsaka plugin
☐ Open QGIS, open the Plugins menu and select Manage and Install Plugins.
☐ Search for dzetsaka and choose Install Plugin.

Preparation
For this tutorial, I’ve prepared a data set ready to use. You can, of course, substitute your own files and the process will
largely be the same.
☐ Download this zip folder (machine-learning), and extract the data into a folder you can access.
☐ To open the project, double-click on the machine-learning.qgz file.
The data consists of the following files:
● oc6iO_37_000_10978801_20130220_0304r0.tif – a 3-band corrected orthoimage of Chapel Hill, NC from
February 20, 2013.
● A prepared roi.shp (Region of Interest) shapefile with three land cover types: roads, buildings, and fields. This file
will hold our training data.
● An example output raster (out.tif) from an earlier run.
● An example smoothed raster (sieve.tif) from an earlier run.
● And an example vectorized, classified shapefile (vectorized.shp) from an earlier run.
All of the data in the tutorial is in the EPSG:2264, NAD83 / North Carolina (ftUS) projection. The cell size of the raster is
0.5 feet. It’s important that you keep your projection consistent between datasets and appropriate for the location that you’re
analyzing.
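The cell size matters later when pixel counts have to be read as ground areas, for example when choosing a sieve threshold. The arithmetic can be sketched as follows; the function names and the 1,000 square foot roof are illustrative assumptions, and only the 0.5 foot cell size comes from the tutorial data.

```python
CELL_SIZE_FT = 0.5  # raster cell size stated for the tutorial data, in feet


def pixels_to_square_feet(pixel_count, cell_size_ft=CELL_SIZE_FT):
    """Ground area covered by a number of square pixels."""
    return pixel_count * cell_size_ft * cell_size_ft


def square_feet_to_pixels(area_sq_ft, cell_size_ft=CELL_SIZE_FT):
    """Number of pixels needed to cover a ground area."""
    return area_sq_ft / (cell_size_ft * cell_size_ft)


print(pixels_to_square_feet(1))     # area of a single pixel
print(pixels_to_square_feet(100))   # area of a 100-pixel patch
print(square_feet_to_pixels(1000))  # pixels in a hypothetical 1,000 sq ft roof
```

At this resolution a single pixel covers a quarter of a square foot, so even a group of 100 pixels is a small patch on the ground.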
NOTE: If you haven’t extracted the data from the zip folder, it may not appear in QGIS. Be sure to extract the data.

Building Region Classes

The sample data includes a Region of Interest file (roi.shp) with predefined classes. These are meant to help
train the classifier. You can modify, add, or remove both the training regions and the classes.
Training data: road, field, building classes.

Modifying the Training Data Classes

The classes in the roi file are defined in an attribute called class. It is an integer attribute shown through a Value Map widget, which
pairs each integer with a label.
☐ To modify the class types, right click on the roi layer and go to Properties.
☐ Under Properties, go to Attribute Form, and click on the class Field to access the Widget Type.
☐ Add / modify / remove values in the Value/Description box. Be sure that each class has a unique numerical value.
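The uniqueness rule can be expressed as a small check. This is a sketch outside QGIS: the value/label pairs below are hypothetical (the actual numbers are whatever is stored in the roi layer), and the function names are invented for the example.

```python
from collections import Counter

# Hypothetical value/description pairs; the real ones live in the roi layer.
CLASS_VALUE_MAP = [
    (1, "road"),
    (2, "building"),
    (3, "field"),
]


def duplicate_class_values(value_map):
    """Return the sorted class values that appear more than once."""
    counts = Counter(value for value, _label in value_map)
    return sorted(value for value, n in counts.items() if n > 1)


def check_value_map(value_map):
    """Return a value -> label dict, or raise ValueError if the map is invalid."""
    for value, label in value_map:
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"class value for {label!r} must be an integer")
    duplicates = duplicate_class_values(value_map)
    if duplicates:
        raise ValueError(f"duplicate class values: {duplicates}")
    return dict(value_map)


labels = check_value_map(CLASS_VALUE_MAP)
print(labels)
```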
Changing the classes of your training data may make sense depending on the area you’re analyzing, the type of data you
want to discover, and the time of year. For example, this image is from February, so tree cover analysis would be better
suited for an image from a summer month.

Adding Regions of Interest

Once your classes have been defined, you want to take special care to identify regions in the landscape that match the
classes you’re looking for. You can do this by digitizing regions in your roi shapefile.
☐ Highlight the roi layer in the Layers panel. If you don’t have the Layers panel visible, go to View > Panels > Layers,
and make sure it’s enabled.

☐ Enable editing by clicking the Toggle Editing button in the Digitizing Toolbar. If the Digitizing Toolbar isn’t visible, go
to View > Toolbars > Digitizing Toolbar to check it.

☐ Click the Add Polygon Feature button to create a new region.

☐ Zoom in to the area you want to digitize, and left-click around its outline on the map to trace a polygon. To
finish the polygon, right-click.
☐ A dialog will pop up to add attributes. Add a unique id, the class type, and the name of the class in classname. (The
classname attribute isn’t used in the analysis, so you can put any notes here.)

☐ To remove or change polygons, open the attribute table and delete or modify rows. If you’ve made a mistake,
you can delete that feature from the attribute table, or toggle editing off without saving to discard your changes and then toggle it back on.
☐ Once you’re ready to save the layer, click the save button, and then click the Toggle Editing button.
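Before classifying, it can help to confirm that every region has a unique id and a known class, and to see how many regions each class has. The sketch below works on plain dictionaries rather than the shapefile itself; the records and the check_roi helper are hypothetical and only mirror the id, class, and classname attributes described above.

```python
from collections import Counter

# Hypothetical ROI records with the three attributes used in this tutorial.
roi_features = [
    {"id": 1, "class": 1, "classname": "road"},
    {"id": 2, "class": 1, "classname": "road"},
    {"id": 3, "class": 2, "classname": "building"},
    {"id": 4, "class": 3, "classname": "field"},
]


def check_roi(features, valid_classes):
    """Validate ids and classes, then return the number of regions per class."""
    ids = [feature["id"] for feature in features]
    if len(ids) != len(set(ids)):
        raise ValueError("ROI ids must be unique")
    unknown = sorted({feature["class"] for feature in features} - set(valid_classes))
    if unknown:
        raise ValueError(f"unknown class values: {unknown}")
    counts = Counter(feature["class"] for feature in features)
    # Report every valid class, including those with no regions yet.
    return {value: counts.get(value, 0) for value in valid_classes}


regions_per_class = check_roi(roi_features, valid_classes=[1, 2, 3])
print(regions_per_class)
```

A class with zero or very few regions is a hint that more training polygons are needed there.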

Classifying using Training

The dzetsaka plugin supports several algorithms for classifying the data.
● Gaussian Mixture Model
● Random forest
● Support Vector Machines
● K-Nearest Neighbors
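To make the idea concrete, here is a minimal pure-Python sketch of one of these approaches, K-Nearest Neighbors, applied to pixel colors. It is not how dzetsaka implements it (the plugin relies on scikit-learn); the training pixels, class numbers, and function names are all invented for illustration.

```python
from collections import Counter

# Hypothetical training pixels: (red, green, blue) -> class value.
TRAINING = [
    ((90, 90, 95), 1),     # road: grey
    ((100, 100, 104), 1),
    ((200, 195, 190), 2),  # building: bright roof
    ((210, 205, 200), 2),
    ((80, 130, 60), 3),    # field: green
    ((70, 120, 55), 3),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two band-value tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(pixel, training, k=3):
    """Return the most common class among the k training pixels nearest to pixel."""
    nearest = sorted(training, key=lambda sample: squared_distance(pixel, sample[0]))[:k]
    votes = Counter(label for _values, label in nearest)
    return votes.most_common(1)[0][0]


def classify_image(image, training, k=3):
    """Classify every pixel of a small image given as rows of band-value tuples."""
    return [[knn_classify(pixel, training, k) for pixel in row] for row in image]


image = [
    [(95, 95, 100), (205, 200, 195)],
    [(75, 125, 58), (98, 99, 101)],
]
print(classify_image(image, TRAINING))
```

Each output cell holds a class value, which mirrors the structure of the classified raster the plugin produces.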
Normally, a supervised classification analysis would be an iterative, cyclical process where training results would be
evaluated and then used to refine the training data so that each pass gets closer to having fewer errors in the dataset.
NOTE: This step can take a LONG time to run, depending on the size of your data, the algorithm chosen, and the speed
of your computer.
☐ To run a classification, go to Plugins > dzetsaka and open the Classification dock.
☐ The first box, image to classify, is your raster image (oc6i0…)
☐ The second box, Your ROI, is the Regions of Interest shapefile (roi.shp). Be sure that you’ve saved and stopped editing
the file.
☐ The third box, “Column name where class number is stored”, is Class.
☐ The fourth box is the name of the output raster. Click on the three dots button to choose a folder and label an output file
name.
☐ Click on the Settings button to choose a Classifier. The Gaussian Mixture Model may be the fastest classifier.
☐ When ready, click “Perform the classification”
When the classification finishes, you should see a raster whose distinct values correspond to the classes that you chose. Each
raster value is equivalent to the class attribute value in the roi shapefile.
☐ To view the values separately, right-click and go to Properties, and under Symbology, change the Render type to
Paletted / Unique Values, and then click Classify.
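Once the classes are separated, a quick tally of pixels per class shows whether any class dominates the output. The sketch below assumes the classified raster has already been read into a list of rows; the tiny grid and the function name are made up for the example.

```python
from collections import Counter

# Hypothetical classified raster: each cell holds a class value.
classified = [
    [1, 1, 3, 3],
    [1, 2, 3, 3],
    [1, 2, 2, 3],
]


def class_counts(raster):
    """Count how many pixels carry each class value."""
    counts = Counter(value for row in raster for value in row)
    return dict(sorted(counts.items()))


counts = class_counts(classified)
total = sum(counts.values())
for value, n in counts.items():
    print(f"class {value}: {n} pixels ({100 * n / total:.1f}%)")
```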
The sample run of the classifier using the tutorial data. Blue = roads, Green = buildings, Red = fields.
Evaluating the output from the tutorial data, it hasn’t done a bad job, although there are some obvious problems:
1. It has overestimated roads and, to some extent, fields. This may be due to roads having tree coverage or roads having
pixel colors similar to buildings. Using a dataset with more bands (infrared, UV) might provide better
classification, but would also likely reduce the resolution of the results.
2. It looks like it has underestimated buildings, likely for the same reasons as above.
3. Overall, there’s a lot of noise – pixels of one type sprinkled in among pixels of other types. To produce a workable
output, we’ll smooth the data, and then convert to vector.
This is a first pass at classification. It would make sense, then, at this point to revisit our training data and run the process
again.

Smoothing and Vectorizing the Results

Smoothing the Results

It’s normal to have some noise in the output from your classification. To help smooth the data, we can use the Raster
Analysis > Sieve tool in the Processing Toolbox. If you can’t see the Processing Toolbox, go to View > Panels >
Processing Toolbox.
Raster smoothing uses the values of neighboring pixels to replace isolated outliers. It’s loosely similar to the way
video streaming “smooths” video when network lags occur. Outlier pixels are reassigned to the
majority group among their neighbors.
☐ Open the Raster analysis > Sieve tool.

☐ For the input layer, choose the output raster from your classification step.
☐ The Threshold is the minimum size, in pixels, of a connected group of same-valued pixels; groups smaller than this are removed and merged into a neighboring group. For a noisy dataset, you will need to set this very high, perhaps
100 or higher. Fixing the training data and rerunning the classification will also reduce the noise.
☐ Set the output file to a new raster.
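The effect of the threshold can be sketched in plain Python. This is a simplified illustration, not the algorithm QGIS runs: it finds 4-connected groups of equal values and, for any group smaller than the threshold, fills it with the most common value along its border. The function names and the sample grid are assumptions made for the example.

```python
from collections import Counter, deque


def neighbors(r, c, rows, cols):
    """Yield the 4-connected neighbors of (r, c) that fall inside the grid."""
    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc


def find_region(raster, start, seen):
    """Collect the connected group of same-valued pixels that contains start."""
    rows, cols = len(raster), len(raster[0])
    value = raster[start[0]][start[1]]
    queue = deque([start])
    seen.add(start)
    region = []
    while queue:
        r, c = queue.popleft()
        region.append((r, c))
        for cell in neighbors(r, c, rows, cols):
            if cell not in seen and raster[cell[0]][cell[1]] == value:
                seen.add(cell)
                queue.append(cell)
    return region


def sieve(raster, threshold):
    """Return a copy in which groups smaller than threshold take a bordering value."""
    rows, cols = len(raster), len(raster[0])
    result = [row[:] for row in raster]
    seen = set()
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen:
                continue
            region = find_region(raster, (r, c), seen)
            if len(region) >= threshold:
                continue
            members = set(region)
            border = Counter()
            for pr, pc in region:
                for nr, nc in neighbors(pr, pc, rows, cols):
                    if (nr, nc) not in members:
                        border[raster[nr][nc]] += 1
            if border:  # a group covering the whole grid has no border
                fill = border.most_common(1)[0][0]
                for pr, pc in region:
                    result[pr][pc] = fill
    return result


noisy = [
    [1, 1, 1, 1, 1],
    [1, 2, 1, 1, 1],
    [1, 1, 1, 3, 3],
    [1, 1, 1, 3, 3],
]
for row in sieve(noisy, threshold=2):
    print(row)
```

With a threshold of 2 only the single stray pixel is absorbed; raising it to 5 would also remove the four-pixel block, which shows why a very high threshold can erase real features along with the noise.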
If you compare the output from sieve to the output from the classifier, you should see a clear difference in noise.
Output from classifier

Output from sieve

Vectorizing the Results
Finally, we can convert the data to a vector layer, which allows us to edit the data, remove errors, and then use it to retrain the
classifier.
☐ To vectorize, go to Raster > Conversion > Polygonize (Raster to Vector)
☐ The input layer will be the output from the Sieve function.
☐ The name of the field to create will be “Class”
☐ The vectorized file will be a new shapefile. Run the Polygonize. NOTE: May take a long time!
The results suggest a partial success – let’s evaluate the model and think about next steps.

Validating the Model

Note that there are plenty of limitations with our model and classification.
First, this is a site-specific, time-specific, non-generalized classification method. Different images, locations, or
time-periods will require retraining.

Confusion Matrix
To validate the model, we have several options. One is a confusion matrix, which takes regions of a known
class and compares them with the class the classifier assigned to them.
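As a rough illustration of how such a matrix is built, the sketch below tallies known classes against predicted classes and derives a per-class accuracy from each row. The label lists are invented, and the layout (rows for the known class, columns for the predicted class) is one common convention rather than necessarily the one the plugin reports.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are the known class, columns are the class the classifier assigned."""
    index = {value: i for i, value in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for known, assigned in zip(actual, predicted):
        matrix[index[known]][index[assigned]] += 1
    return matrix


def per_class_accuracy(matrix, classes):
    """Share of each known class that was assigned the correct class."""
    result = {}
    for i, value in enumerate(classes):
        row_total = sum(matrix[i])
        result[value] = matrix[i][i] / row_total if row_total else 0.0
    return result


# Hypothetical pixels of known class and what the classifier said about them.
actual = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
predicted = [1, 1, 1, 3, 1, 2, 2, 3, 3, 1]
classes = [1, 2, 3]

matrix = confusion_matrix(actual, predicted, classes)
for value, row in zip(classes, matrix):
    print(value, row)
print(per_class_accuracy(matrix, classes))
```

Off-diagonal cells show which classes are being confused with each other, which points to the training regions that most need refining.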
Confidence Map
Dzetsaka will generate a confidence map, which shows how confident the classifier appears to be for each pixel, on a range of 0 to 100%,
based on the training data. This doesn’t mean that the results are valid, only that the output matches the
training data. This helps to locate potential spatial error in the data.

Confidence Map. Darker colors mean higher confidence.
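One way to use the confidence map is to list the cells that fall below some cutoff and look there for new training regions. The sketch below assumes the map has been read into rows of percentages; the values, the 50% cutoff, and the function name are arbitrary choices for illustration.

```python
def low_confidence_cells(confidence, threshold=50.0):
    """Return (row, column) positions whose confidence is below the threshold."""
    return [
        (r, c)
        for r, row in enumerate(confidence)
        for c, value in enumerate(row)
        if value < threshold
    ]


# Hypothetical confidence values in percent.
confidence = [
    [95, 80, 40],
    [90, 30, 85],
]
print(low_confidence_cells(confidence))
```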

Cross-Validation
Another technique to test the classifier is cross-validation. This involves comparing the outputs to a known classification –
either another dataset or by leaving some of the training data out of the initial learning process.
Overall accuracy can be calculated from the omitted training data by counting the number of pixels in the region that are
correctly classified vs. the total number of pixels in the region.
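The hold-out idea and the accuracy calculation can be sketched as follows. The split fraction, the seed, and the function names are assumptions for this example; the samples could be ROI polygon ids, with the held-out polygons used only for checking the result.

```python
import random


def split_holdout(samples, holdout_fraction=0.3, seed=0):
    """Shuffle the samples and set aside a fraction of them for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def overall_accuracy(actual, predicted):
    """Correctly classified pixels divided by the total number of pixels."""
    if not actual:
        raise ValueError("no pixels to evaluate")
    correct = sum(1 for known, assigned in zip(actual, predicted) if known == assigned)
    return correct / len(actual)


# Hypothetical ROI polygon ids, split into training and held-out sets.
training_ids, holdout_ids = split_holdout(range(1, 11))
print(len(training_ids), "polygons for training,", len(holdout_ids), "held out")

# Hypothetical known vs. predicted classes for pixels in the held-out polygons.
print(overall_accuracy([1, 1, 2, 3], [1, 2, 2, 3]))
```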
Next Steps
Training and classification of land cover data requires patience and practice. There are plenty of other techniques for
image classification available in open source software, including the Semi-Automatic Classification Plugin, which can
automatically find, download, and process satellite imagery.
Other software platforms, including ArcGIS, provide pre-made classifiers that can be used for specific purposes.
Once a classification has been verified, multiple images over time can be used to determine land cover change.
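A simple form of change detection compares two classified rasters of the same area cell by cell and counts the transitions. This sketch assumes both rasters share the same grid and class values; the grids and the function name are invented for the example.

```python
from collections import Counter


def class_transitions(before, after):
    """Count (old class, new class) pairs for cells whose class changed."""
    if len(before) != len(after) or any(
        len(row_b) != len(row_a) for row_b, row_a in zip(before, after)
    ):
        raise ValueError("rasters must have the same shape")
    return Counter(
        (old, new)
        for row_b, row_a in zip(before, after)
        for old, new in zip(row_b, row_a)
        if old != new
    )


# Hypothetical classified rasters from two dates (1 = road, 2 = building, 3 = field).
before = [
    [3, 3, 1],
    [3, 3, 1],
]
after = [
    [3, 2, 1],
    [2, 2, 1],
]
for (old, new), n in class_transitions(before, after).items():
    print(f"class {old} -> class {new}: {n} cells")
```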
This technique has its critiques as well, since it requires skill in choosing training data, and, as with
all AI, Garbage In = Garbage Out. But even basic classification techniques can provide a stepping stone to understanding
the landscape and examining changes in land cover.
