MicrobiomeAnalyst 2.0
Comprehensive statistical, functional and
integrative analysis of microbiome data

xialab@mcgill 2023-Mar-02
Tutorial for Raw Data Processing
Background
• Amplicon sequencing has enabled comprehensive profiling of microbial communities, bypassing
traditional wet lab culturing methods.

• Traditional Operational Taxonomic Unit (OTU) picking methods cluster sequences
based on a similarity threshold (usually around 97%). However, because this threshold is
arbitrary, sequencing errors tend to be incorporated into the resulting clusters.

• The Divisive Amplicon Denoising Algorithm (DADA) was introduced to improve the accuracy of
amplicon sequence variant (ASV) inference from high-throughput sequencing data.

• DADA2 uses a statistical model-based approach that corrects these incorporated errors and
infers higher quality and more accurate ASVs, which help improve our understanding of complex
and previously understudied microbial ecosystems.
Overview
• Goal: To provide a user-friendly web-based platform for the raw data processing of marker gene
sequencing data of microbial communities.

• Workflow (in order):
o Filtration and trimming of reads based on quality profiles.
o Estimation of error rates.
o Dereplication to filter unique sequences and denoising for inference of sequence variants.
o Merging of forward and reverse reads by overlapping.
o Chimera removal to filter spuriously formed reads.
o Taxonomy assignment from a chosen database.

• Data requirements:
o Demultiplexed individual fastq files with no primers or any other non-biological nucleotides.
o For paired-end data, the forward and reverse fastq files should have matching ordered names with “_R1”
for forward reads and “_R2” for reverse reads, as shown in the example data.
o Additionally, a metadata file indicating the groups is required to facilitate a streamlined input into the
other MicrobiomeAnalyst modules.

• Other considerations for paired-end data:

o What is the length of the forward and reverse reads? E.g., 2x200 bp
o Which region of the 16S rRNA gene was sequenced, and what were your primer
lengths? E.g., V4, V3-V4, etc.
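The “_R1”/“_R2” naming requirement above can be checked locally before uploading. The following is a minimal Python sketch, not part of MicrobiomeAnalyst itself; the function name pair_fastq_files and the exact file-name pattern are assumptions made for illustration.

```python
import re
from collections import defaultdict


def pair_fastq_files(filenames):
    """Group FASTQ file names by sample using the _R1/_R2 tags.

    Returns (paired, unpaired, unrecognised):
      paired       - {sample: (forward_name, reverse_name)}
      unpaired     - sorted sample names missing one of the two reads
      unrecognised - names that do not follow the assumed pattern
    """
    # Assumed pattern: <sample>_R1<anything>.fastq or .fastq.gz (likewise _R2).
    pattern = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])(?P<rest>.*\.fastq(\.gz)?)$")
    samples = defaultdict(dict)
    unrecognised = []
    for name in filenames:
        match = pattern.match(name)
        if match is None:
            unrecognised.append(name)
            continue
        samples[match["sample"]][match["read"]] = name
    paired = {s: (r["1"], r["2"]) for s, r in samples.items() if len(r) == 2}
    unpaired = sorted(s for s, r in samples.items() if len(r) != 2)
    return paired, unpaired, unrecognised


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = ["A_R1.fastq.gz", "A_R2.fastq.gz", "B_R1.fastq.gz", "notes.txt"]
    paired, unpaired, unrecognised = pair_fastq_files(files)
    print("paired:", paired)
    print("missing a mate:", unpaired)
    print("not recognised:", unrecognised)
```

Any sample listed as missing a mate would also be expected to show up as a problem in the Data Integrity Check.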
Data Upload:

Click “Select” to start uploading your .zip/.fastq.gz files.

Notes:
• You can choose to upload multiple sequence files at once, but please upload all files at a
time to avoid any potential exceptions caused by internet connection issues.
• A metadata file is necessary for the downstream analysis.

Proceed to the Data Integrity Check.

Submit to try our example here.
Data Integrity Check:
Each column gives information about the fastq files submitted.

• For paired-end data, cross-check that each forward read has a corresponding reverse read.
• Check if the groups are named correctly.
• The corresponding R script can be downloaded from this page.

Click proceed.
Parameter Settings:
This is the most critical step of the entire pipeline: the read quality profiles need to be
examined to determine the filtering and trimming parameters.

• Select the type of marker gene used: 16S for bacteria, 18S for eukaryotes, and ITS for fungi.
• Choose the cut-off length for the forward and reverse reads based on the quality profile
(see below). This truncates the reads to a maximum length, keeping reads of uniform length,
which is important during taxonomy assignment.
• TrimLeft and TrimRight trim low-quality bases from the 5’ end and the 3’ end, respectively.
• MaxN determines the number of ambiguous bases allowed. The default is 0, which means no
ambiguous bases are allowed to pass through.
• Max EE is the expected-errors cut-off for a read (default = 2).
• MinQ filters out bases below a minimum quality score, and TruncQ truncates reads at the
first position where the quality drops below the specified score.
• RemPhix removes reads that match PhiX, an Illumina control genome. This ensures that only
reads originating from the sample pass through.
• Select the database of choice for taxonomy assignment.
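To make the expected-errors cut-off more concrete: the expected number of errors in a read is commonly computed as the sum of 10^(-Q/10) over its Phred quality scores Q. The Python sketch below illustrates that calculation together with a simplified truncate-then-filter check. It is an illustration under stated assumptions (Phred+33 encoding, hypothetical function names and defaults), not the code the pipeline runs.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def expected_errors(scores):
    """Sum of per-base error probabilities, 10^(-Q/10) for each score Q."""
    return sum(10 ** (-q / 10) for q in scores)


def truncate_at_quality(scores, trunc_q):
    """Cut the read at the first score that is <= trunc_q."""
    for i, q in enumerate(scores):
        if q <= trunc_q:
            return scores[:i]
    return scores


def passes_filter(quality_string, trunc_len, max_ee=2.0, trunc_q=2):
    """Simplified filter: quality truncation, length truncation, then Max EE."""
    scores = truncate_at_quality(phred_scores(quality_string), trunc_q)
    if len(scores) < trunc_len:
        return False  # too short after truncation, so the read is dropped
    return expected_errors(scores[:trunc_len]) <= max_ee


if __name__ == "__main__":
    good = "I" * 30  # 'I' encodes Q40, error probability 0.0001 per base
    poor = "+" * 30  # '+' encodes Q10, error probability 0.1 per base
    print(expected_errors(phred_scores(good)), passes_filter(good, 30))
    print(expected_errors(phred_scores(poor)), passes_filter(poor, 30))
```

A read of thirty Q10 bases carries about three expected errors, so it fails a cut-off of 2, which is why raising Max EE lets more low-quality reads through.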
Quality control:
The quality score of the raw sequences can be viewed on the Parameter
Settings page to help adjust the parameters.

Typically, positions where the quality score drops below 30 are considered low quality and
are trimmed.

Forward reads tend to have better quality profiles than reverse reads.

For the forward reads (left panel), the quality drops off slightly at the end, so we will set
the forward trunc length to 240.
For the reverse reads (right panel), the quality drops off around 170 cycles, so the reverse
trunc length should be set to 170.
Note: To ensure overlap of the forward and reverse reads, the trunc length parameters also
depend on the type of primer used. Refer to the “Other considerations” section on slide 4.
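As a rough aid to reading the quality plots, the rule of thumb above (trim where quality falls below 30) can be sketched in Python. The function name suggest_trunc_len and the example profiles are hypothetical; real profiles are noisier, so the plots should still be inspected by eye and the result checked against the overlap requirement.

```python
def suggest_trunc_len(mean_quality_per_cycle, min_quality=30):
    """Return the number of leading cycles whose mean quality stays >= min_quality."""
    for cycle, quality in enumerate(mean_quality_per_cycle, start=1):
        if quality < min_quality:
            return cycle - 1  # last cycle before the first drop
    return len(mean_quality_per_cycle)


if __name__ == "__main__":
    # Made-up mean quality values per sequencing cycle, for illustration only.
    forward_profile = [36] * 240 + [28] * 10
    reverse_profile = [35] * 170 + [24] * 80
    print("forward trunc length:", suggest_trunc_len(forward_profile))
    print("reverse trunc length:", suggest_trunc_len(reverse_profile))
```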
Parameter optimisation:
• Do your results have very few reads passing through? Consider changing the following parameters:
o For multi-V-regions such as V3-V4, the overlap of merged reads is determined as follows:
o For 2x250bp, 16S-341F and 16S-805R primers of the V3-V4 region,
(forward read) + (reverse read) - (length of amplicon) = overlap
250 + 250 - (805-341) = 36
o If the forward read is truncated at 240 and reverse read is truncated at 150,
240 + 150 - 464 = -74 (No overlap!!!!)
o Thus the parameters should be adjusted accordingly to ensure an overlap of >20nt.
o For the V4 region, there is usually less variability and the parameters can be directly based off the quality
profiles.
o For more information, visit https://forum.qiime2.org/t/merging-quality-control-and-overlapping/12618/2
• Do you still find very few reads passing through? Consider increasing the Max EE parameter which would allow
less stringent filtering, especially for reverse reads. E.g.: Max EE of reverse= 5

• Is the percentage of chimera removal >25%? Check if all non-biological nucleotides such as adapters and primers were
removed properly. Consider trimming your sequences more using the Trim parameters. If the chimera removal is
still high but the number of reads passing through is sufficient, you could consider moving ahead with the
results. More information: https://forum.qiime2.org/t/loss-of-reads-after-dada2-as-chimeras/9503/2
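The overlap arithmetic from this slide can be reproduced with a few lines of Python, which makes it easy to test candidate truncation lengths before submitting a job. This is a minimal sketch; the function names are hypothetical, and the 20 nt minimum is taken from the guidance above.

```python
def expected_overlap(forward_len, reverse_len, amplicon_len):
    """(forward read) + (reverse read) - (length of amplicon) = overlap."""
    return forward_len + reverse_len - amplicon_len


def has_enough_overlap(forward_len, reverse_len, amplicon_len, min_overlap=20):
    """True when the overlap is greater than min_overlap nucleotides."""
    return expected_overlap(forward_len, reverse_len, amplicon_len) > min_overlap


if __name__ == "__main__":
    amplicon_len = 805 - 341  # 16S-341F / 16S-805R primers, V3-V4 region
    for fwd, rev in [(250, 250), (240, 150), (240, 250)]:
        overlap = expected_overlap(fwd, rev, amplicon_len)
        ok = has_enough_overlap(fwd, rev, amplicon_len)
        print(f"{fwd} + {rev} - {amplicon_len} = {overlap} (sufficient: {ok})")
```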
Job Status Tracking:

Track the processing status here. The job status will update here in real-time.

The job may take some time to complete, so click “Create Bookmark URL” to save the job
link to check the job status at a later time.

Note: Keep only one active web page open. Multiple tabs/windows will interfere with each
other, leading to unpredictable results.

Once the job is completed, click proceed.
Result:
Track reads through the pipeline
Summary of denoising and chimera removal results.
Take a look at the % of chimera removal. Refer to the “Parameter optimisation” slide if
this is >25%.
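If you export the read-tracking summary, the percentages can be recomputed offline. The Python sketch below assumes the chimera percentage is calculated on read counts before and after chimera removal; the step names and counts are made up for illustration and do not reflect the exact table produced by the tool.

```python
def percent_chimeric(reads_before, reads_after):
    """Percentage of reads lost at the chimera removal step."""
    if reads_before <= 0:
        raise ValueError("reads_before must be positive")
    return 100.0 * (reads_before - reads_after) / reads_before


def retention_by_step(counts):
    """counts: ordered list of (step_name, reads). Returns % of input kept per step."""
    initial = counts[0][1]
    return [(step, 100.0 * reads / initial) for step, reads in counts]


if __name__ == "__main__":
    # Hypothetical read counts for a single sample.
    track = [("input", 20000), ("filtered", 16000), ("denoised", 15000),
             ("merged", 12000), ("non-chimeric", 9000)]
    for step, pct in retention_by_step(track):
        print(f"{step}: {pct:.1f}% of input reads")
    chimeric = percent_chimeric(12000, 9000)
    print(f"chimera removal: {chimeric:.1f}%", "(check parameters)" if chimeric > 25 else "")
```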

Check taxonomy annotation here. It is common to have fewer assignments at the Species level
with 16S sequencing.

Input files for the MDP module of MicrobiomeAnalyst.

Click here to go directly to the marker data profiling module for downstream analysis.
The End
For more information, visit Tutorials, Resources
and Contact pages on www.microbiomeanalyst.ca
Also visit our forum for FAQs on www.omicsforum.ca
