Chapter 2
2.1 Parasite and snail maintenance
The species of schistosome used throughout this thesis was S. mansoni. For
Chapters 3 & 4 a parasite strain originally recovered from Puerto Rico was used.
Livers from S. mansoni infected (6-7 weeks p.i.) mice were soaked in sterile 1.2%
temperature (RT) for 10 min. Livers were then transferred to a plastic beaker,
and about 20 ml of sterile 1.2% NaCl solution added. A blender (Bosch MSM
6B150) was used to homogenise the livers for 3 min at the lowest speed and the
resulting liquid then filtered through two sieves with a pore size of 180 µm, to
remove larger liver particles, and then 45 µm to collect eggs. Eggs were collected
in the lower sieve and further washed with 1.2% NaCl solution. The eggs were
then transferred to a volumetric flask with a narrow neck and the flask filled
with Lepple water (see below). After incubating for 1 hour at 28°C, all but the top
2 cm of the flask was covered in aluminium foil and the flask incubated for another
hour. Next, miracidia were collected from the top layer of the solution. Five to
six miracidia were used for mixed sex snail infections. For infections small snails
together with 1 ml of Lepple water. The miracidia were then carefully added to
each well and left to infect the snail for 2 hours. Next the snails were placed back
into an aquarium for 5 weeks, after which they were ready to be used to obtain
cercariae.
Lepple water was usually prepared as a 10x concentrated solution and diluted
To obtain mice infected with male or female schistosomes only, 50 snails were
infected with single miracidia. In the four weeks following infection, the snails
were exposed to light for two hours to shed clonal, single sex, cercariae (Tucker
et al., 2013). Approximately 500 cercariae from each snail, as well as a single male
and a single female adult worm (positive controls), were processed for DNA
extraction with a DNA Mini kit (Qiagen, 51304). PCR was used to identify sex-
rhodopsin gene as positive control gene (Lepesant et al., 2012). The following
W2 primers:
PCR reactions were assembled using 10 µl of HiFi Ready Mix (Kapa, KK2602), 2.5
µl of genomic DNA solution, 1.0 µl of the primer mix (10 µM) and 6.5 µl of ddH2O
for a total of 20 µl. After denaturing the DNA for 5 min at 95°C the samples then
The PCR products were run on a 1% agarose gel in TBE buffer at 80 V and 100
well as W2-specific PCR reaction, whereas male samples only had an amplicon in
[Figure 2.1: agarose gel images of the sexing PCR. Lanes: Adult ♂, Adult ♀ and Cerc 1-6, each with a Rho (control) and a W (W-specific) reaction, alongside a 1 kb ladder (markers at 1000, 500 and 250 bp).]
Figure 2.1: The sex of cercariae from single-miracidium infections was determined by PCR.
Male S. mansoni samples do not have an amplicon in the W chromosome specific region, whereas
female S. mansoni samples do have an amplicon for the W specific reaction. The Rho reactions
serve as a positive control and should have an amplicon regardless of sex. The cercariae from
four snails (Cerc 1, 3, 4 and 5) were found to be female, the other two (Cerc 2 and 6) male.
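The decision rule in the caption can be written down as a minimal Python sketch. The function name, the "failed" category for samples lacking the Rho control band, and the example band calls are assumptions made for illustration; they are not taken from the thesis.

```python
def call_sex(rho_band: bool, w_band: bool) -> str:
    """Classify a sample from the presence or absence of the two amplicons."""
    if not rho_band:
        # Assumption: without the rhodopsin control band the reaction
        # is treated as uninterpretable.
        return "failed"
    # A W-specific amplicon indicates a female sample, its absence a male one.
    return "female" if w_band else "male"


# Hypothetical band calls (rho_band, w_band) read off a gel by eye.
samples = {
    "Adult male": (True, False),
    "Adult female": (True, True),
    "Cerc A": (True, True),
    "Cerc B": (True, False),
    "Cerc C": (False, False),
}

calls = {name: call_sex(rho, w) for name, (rho, w) in samples.items()}
for name, sex in calls.items():
    print(f"{name}: {sex}")
```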
To determine whether single sex infections with only male or only female worms
had been achieved, the livers of sacrificed mice were blended as described above.
(Bosch MSM 6B150) for 3 min at the lowest speed. The resulting liquid was then
filtered through two sieves with a pore size of 180 µm, to remove larger liver
particles, and then 45 µm to collect eggs. If any eggs were found, the
corresponding mouse was classed as having had a mixed sex infection, otherwise
(if any worms had been recovered from the mouse by perfusion) the mouse was
2.1.4 Collection of cercariae
Infected snails were kept in a dark cabinet until cercariae were required. Then
they were moved into small glass beakers with enough Lepple water to cover all
snails and left under a bright light for one hour, allowing the cercariae to emerge
from the snails. After one hour, the water was carefully poured into 50 ml Falcon
tubes and the snails transferred back into their tanks in the dark cabinet.
Cercariae numbers were estimated by taking the average count from five 10 µl
aliquots of the cercariae. The cercariae were killed with Lugol’s iodine solution
All animal experiments were conducted under Home Office Project Licence No.
80/2596. All protocols were presented to and approved by the Animal Welfare and
Ethical Review Body (AWERB) of the Wellcome Trust Sanger Institute. The
BALB/c mice were infected with about 250 mixed sex or single sex cercariae via
worms residing in the blood system. The worms used in Chapter 5 were grown
Chapter 5 was performed by the group of Prof. Dr. Christoph Grevelding (Justus-
experiments were set up, one with a total duration of seven days, and one with a
DMEM media (see below) in a dark incubator at 37°C with 5% CO2. In the case of
the seven day incubation, the media was replaced every two days (see RNAi
DMEM media and Basch media on worm fertility was tested. The following
Supplemented DMEM
Basch media
In Chapter 4 worms were kept in vitro for seven days in Basch media (Basch,
1981). This media was used to maintain female maturity as much as
were again maintained in an incubator at 37°C with 5% CO2 and half the media
was replaced every two days. Modified Basch media was prepared as described
Directly prior to use, 1% (by volume) horse red blood cells (RBCs) were added to
the media. For this whole blood (ThermoFisher, SR0050) was first centrifuged at
4°C and 500 g for 10 min, then serum and the upper layer of blood were
The isolation of gonads was performed by Dr. Zhigang Lu from the group of
2.2 Molecular biology techniques
2.2.1 Cloning
length was chosen and primers designed manually to clone that fragment. The
PCR reactions. Total RNA was isolated from adult worms using Trizol
reagent (Invitrogen, 15596026) (see 2.2.3). cDNA was synthesised using the
incubated for 2 min at 25°C before 1 µl of SuperScript® III was added. Secondly,
samples were incubated at 25°C, 10 min; 50°C, 60 min; 70°C, 15 min. cDNAs were
Boule (Smp_144860) primers:
Target genes were amplified from the cDNA using 35 cycles of PCR (95°C, 5 sec;
variable temperature, 5 sec; 72°C, 30 sec) and finally 1 min at 72°C. The
annealing temperature was generally set to be 5°C lower than the higher melting
PCR products were then run on a 1% agarose gel in TBE buffer for 40 min at 100
V and 120 mA to confirm product size. The PCR reaction was then cleaned up
manufacturer’s instructions.
The volumes of insert and vector (3000 bp; 50 ng/µl) required for
ligation at 1:1 and 3:1 ratios were determined using Promega’s BioMath calculator:
http://www.promega.com/a/apps/biomath/index.html?calc=ratio
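As a rough illustration of the calculation behind such a ratio calculator, the sketch below uses the standard relationship: insert mass = vector mass x (insert length / vector length) x (insert:vector molar ratio). The vector length and concentration are taken from the text; the use of 1 µl (50 ng) of vector per reaction, the 500 bp insert, the 20 ng/µl insert concentration and the reading of the ratios as insert:vector are hypothetical values chosen for the example.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio):
    """Mass of insert (ng) giving the requested insert:vector molar ratio."""
    return vector_ng * (insert_bp / vector_bp) * molar_ratio


VECTOR_BP = 3000            # vector length, from the text
VECTOR_NG_PER_UL = 50.0     # vector concentration, from the text
VECTOR_UL = 1.0             # assumption: 1 µl of vector per ligation
INSERT_BP = 500             # hypothetical insert length
INSERT_NG_PER_UL = 20.0     # hypothetical insert concentration

vector_ng = VECTOR_NG_PER_UL * VECTOR_UL
for ratio in (1, 3):
    mass = insert_mass_ng(vector_ng, VECTOR_BP, INSERT_BP, ratio)
    volume = mass / INSERT_NG_PER_UL
    print(f"{ratio}:1 ratio -> {mass:.2f} ng insert ({volume:.2f} µl)")
```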
The pGEM-T easy vector system (Promega, A1360) was used following
manufacturer’s instructions.
The reactions were then incubated for 1.5 h at RT (rather than 1 h according to
the protocol). Luria-Bertani (LB) Ampicillin (0.1 mg/mL) agar plates were
Next, 950 µl Super Optimal broth with Catabolite repression (SOC) medium
samples were then placed in large 50 ml Falcon tubes and incubated at 37°C
(shaking at 200 RPM) for 75 min. Transformed bacteria were removed from the
incubator and centrifuged for 2 min at 1000 x g. 900 µl of clear supernatant was
suspended cells were plated onto the prepared LB agar plates and incubated at
37°C overnight.
This method relies on RNA molecules that are complementary to the target
“knocked down”.
The genes of interest, CD63a (Smp_173150) and CD63R (Smp_155310), as well
as a positive control, TSP-2 (Smp_181530), were cloned (see 2.2.1) into the
pGEM-T easy vector (Promega, A1360). The zebrafish (Danio rerio) gene
primers were designed that had both a gene specific sequence (same as the
FWD: TAATACGACTCACTATAGGGATGTGTACTGTCGTATTGAGATTAAC
REV: TAATACGACTCACTATAGGGGAACAACAAATTTGGCATC
FWD: TAATACGACTCACTATAGGGATGGCCTCTTTAAGCTGTGG
REV: TAATACGACTCACTATAGGGCAGGGATTGTTTGTCCACTC
FWD: TAATACGACTCACTATAGGGTGATTGTGGTTGGTGCACTT
REV: TAATACGACTCACTATAGGGGACCAATGCGAACAGAAACA
FWD: TAATACGACTCACTATAGGGTCTTTCATGCAGGCCACAGT
REV: TAATACGACTCACTATAGGGCCGGAACCAACCCGATTACA
All primers contain a T7 binding site (the leading TAATACGACTCACTATAGGG sequence). The PCR was performed
AM1626), first using the annealing temperatures (see above) of the gene-specific
region of the primers for 5 cycles and then increasing the annealing temperature
by 5°C for a further 30 cycles. The template was purified using a PCR clean up kit
(Qiagen, 28104) following the manufacturer’s instructions and was then used as
input for the “MEGAscript T7 dsRNA” kit. Incubation of the in vitro transcription
reaction was performed for 4 hours, and then the dsRNA was isolated from the
reactions using the manufacturer’s instructions. The kit used a column based
method where the dsRNA is first precipitated with salts and ethanol, then bound
to a column, washed several times and finally eluted. Length and concentration
of the dsRNA were confirmed on the NanoDrop 1000 (ThermoFisher) and the
Agilent TapeStation.
dsRNA soaking
modifications. Briefly, worms were perfused from mice at 49 d.p.i. and paired
couples cultured in Basch media while being soaked for seven days in 10 µg/ml
dsRNA. The following four dsRNA treatment groups were used: CD63R, CD63a,
TSP-2 (positive control) and a negative control RNA (see above). Media was
replaced every two days and new dsRNA added. After seven days worms were
transferred into Trizol (Invitrogen, 15596026) for RNA extractions and the laid
dsRNA electroporation
DMEM. Then, 20 µl of dsRNA solution (1 µg/ml) was added to the tube and
incubated for 15 min at RT. The worms and the dsRNA solution were then
20 msec square wave pulse was delivered to the worms using the Bio-Rad Gene
Pulser Xcell system at 125 V; worms were then transferred into pre-warmed
Basch medium (37°C) and cultured in vitro for 48 h. After 48 h the worms were
transferred into Trizol (Invitrogen, 15596026) for RNA extractions and their
80°C until further processing. Once ready, the samples were left to thaw on ice
and then transferred to MagNA Lyser Green Beads tubes (Roche, 03358941001),
Biomedicals) and homogenised three times for 20 sec at intensity 5, with 5 min
rest on ice between runs. Following homogenisation, samples were incubated for
volumes of chloroform:isoamyl alcohol (24:1) (Sigma, C0549) was added and the
Following centrifugation, the aqueous phase was transferred into a fresh
RNase-free microcentrifuge tube. Total RNA was isolated from the aqueous phase using
the RNA Clean & Concentrator kit (Cambridge Bioscience, R1019) following
manufacturer’s instructions. Samples for qPCR were treated with DNase in the
isolation columns as described in the kit protocol. Total RNA was eluted in 22 µl
of elution buffer and checked for degradation using a Nano Chip (Agilent, 5067-
quality if one sharp ribosomal RNA peak could be seen. Samples were then
Figure 2.1: Example of good quality extracted total RNA. The sharp peak of
RNA at 2000 nucleotides (nt) is the ribosomal RNA peak.
cDNA was used for cloning as well as qPCR. 11 µl of DNase treated, full length
total RNA was used as input. 1 µl dNTPs (10 mM for each nucleotide) (NEB,
SO131). Samples were denatured for 5 min at 65°C and then cooled on ice. Then,
25°C, 10 min; 50°C, 60 min; 70°C, 15 min and then stored at 4°C until further
processing.
2.2.5 qPCR
Table 2.1) for the target genes cd63r, cd63a and tsp-2. Settings were chosen to
return primers providing approximately 100 bp long amplicons that did not
overlap with the dsRNA probes used for RNAi (2.2.2). psmd4 (Smp_090340) was
(Smp_069770) and gapdh (Smp_056970). Liu et al. (2012) found the house
Table 2.1: List of qPCR targets and their forward and reverse primers. tsp-2
primer sequences were first described by Tran et al. (2010). Primer efficiencies
were calculated using a standard curve.
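A standard-curve efficiency estimate can be sketched as follows, assuming the common approach in which Ct is regressed on log10 of the relative input amount and efficiency is taken as 10^(-1/slope) - 1. The fitting details used in the thesis are not shown in the text, and the dilution series and Ct values below are idealised, made-up numbers.

```python
import math


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def primer_efficiency(relative_amounts, ct_values):
    """Efficiency from the slope of Ct versus log10(relative input amount)."""
    log_amounts = [math.log10(a) for a in relative_amounts]
    slope = fit_slope(log_amounts, ct_values)
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical 10-fold dilution series with perfect doubling per cycle:
# each 10-fold dilution delays the Ct by log2(10) cycles.
relative_amounts = [1.0, 0.1, 0.01, 0.001]
ct_values = [18.0 + math.log2(10) * i for i in range(4)]

efficiency = primer_efficiency(relative_amounts, ct_values)
print(f"Estimated efficiency: {efficiency * 100:.1f}%")
```

An efficiency of 1.0 (100%) corresponds to the product doubling in every cycle.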
Primer efficiencies were determined using a dilution series of primer
The qPCR was performed using KAPA SYBR FAST universal qPCR kit (KAPA
and a control with total RNA instead of cDNA to detect genomic DNA
contamination. If genomic DNA has also been extracted during the RNA
extraction, it could cause a false positive signal; the total RNA used in this control
was not reverse transcribed and therefore should not allow for PCR
amplification.
(KAPA Biosystems, 2016) (denaturation: 95°C, 3 min; then 40 cycles of: 95°C, 1
sec; 60°C, 20 sec). The amplification data was analysed and plotted manually in
Microsoft Excel (v14.2.3) using the ΔΔCt method (Livak & Schmittgen, 2001).
Initially the relative difference in expression between the internal reference and
target gene was measured for both treated and control samples:
For multiple reference genes the average Ct value was calculated:
Fold-change = 2 ^ ΔΔCt
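The calculation can be sketched in Python as below. The sign convention is an assumption: ΔCt is taken as the (averaged) reference Ct minus the target Ct, so that the fold-change formula as written above gives values below 1 for a knockdown. Livak & Schmittgen (2001) define ΔCt with the opposite sign and write 2^-ΔΔCt, which is numerically equivalent. All Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(reference_cts, target_ct):
    """ΔCt: mean Ct of the reference gene(s) minus the target Ct (assumed sign)."""
    return mean(reference_cts) - target_ct


def fold_change(treated_refs, treated_target, control_refs, control_target):
    """Fold-change = 2 ^ ΔΔCt, with ΔΔCt = ΔCt(treated) - ΔCt(control)."""
    ddct = (delta_ct(treated_refs, treated_target)
            - delta_ct(control_refs, control_target))
    return 2 ** ddct


# Hypothetical Ct values: two reference genes and one target gene.
fc = fold_change(
    treated_refs=[20.0, 22.0], treated_target=26.0,
    control_refs=[20.0, 22.0], control_target=24.0,
)
print(f"Relative expression in treated samples: {fc:.2f}")
```

In this example the target Ct in treated samples is two cycles later than in controls while the reference genes are unchanged, which corresponds to a quarter of the control expression.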
To measure the efficiency of DIG labelling in the RNA probe, I used a dot blot
linked using UV light. The membrane was washed with maleic acid buffer,
blocking buffer. Samples were developed until the desired signal was reached.
Dallas, Texas). For the in vitro synthesis of DIG-labelled RNA probes, the insert of
the pGEM-T easy plasmid (Promega, A1360) was amplified using M13 primers
(Sigma, P3098; Sigma, P2973). The amplicon included the insert flanked by a T7
control) and an anti-sense probe (95°C, 5 sec; 50°C, 5 sec; 72°C, 30 sec for 35
The amplicon was then purified using a PCR purification Kit (Qiagen, 28104) and
the DNA template concentration was measured using the Qubit 2.0 system
(ThermoFisher).
A DIG RNA labelling kit (Roche, 11175025910) was then used to synthesise
protocol.
The solution was then cleaned up using the Cambridge Bioscience RNA clean and
concentration were determined using the Agilent TapeStation and the NanoDrop
1000 (ThermoFisher). DIG-labelling efficiency was tested using a dot blot test
(see 2.2.9).
1) Preserving specimens
Parasites were collected by perfusion from mice seven weeks after infection with
250 mixed sex cercariae. The worms were then washed with DMEM to remove
host material. Once clean, worms were killed using 0.6 M magnesium chloride
Next, the worms were rinsed with PBSTx, placed in 50% methanol in PBSTx for 5
min on a gentle shaker and then transferred to a 50 ml tube with 100% methanol
The protocol for whole mount in situ hybridisation was run over 3 days; on day 1
the DIG-RNA probe is hybridised to the target RNA, on day 2 the DIG label is
bound by the antibody and on day 3, the alkaline phosphatase on the antibody is
Samples were rehydrated in 50% methanol in PBSTx for 5 min on a gentle shaker
and then for 5 min in PBSTx at RT. Pigments were removed, especially from the
guts and the vitellarian tissue of female worms, by placing the worms in
bleaching solution for 1 hour at RT under bright light. Next, samples were placed
in small baskets, transferred to 24-well plates and rinsed in PBSTx. The baskets
were used throughout the protocol to allow quicker and less damaging transfer
shaking. Following proteinase K treatment, the worms were rinsed in PBSTx and
then fixed again in 4% formaldehyde in PBSTx for 10 min at RT. Then, the worms
buffer solution for 5 min at RT while being gently shaken. The washing solution
was then replaced with pre-warmed 100% PreHyb buffer and samples incubated
at 52°C for 2 h while gently shaking. One hour prior to adding the probe, 1000 ng of
DIG-RNA probe was mixed with 500 µl of hybridisation buffer and heated to
72°C for 5 min to denature the RNA probe using a heat block (Stuart scientific,
SBH130D). The probe was then allowed to cool slowly to 52°C in the heat block
and was held at that temperature until needed. The PreHyb buffer was then
replaced with the hybridisation buffer with the probe. Next, the samples and
probes were allowed to hybridise overnight at 52°C while being gently shaken.
The hybridisation buffer was removed and samples washed at 52°C with the
citrate (SSC) buffer and 0.1% triton-X (2 x 30 min), and 0.2 x SSC buffer and 0.1%
solution for 2 h at RT, still shaking. Next, samples were transferred to blocking
Next, the samples were washed six times for 10 min at RT in TNT buffer.
Then, samples were developed in alkaline phosphatase (AP) buffer with nitro
11681451001) until the desired signal intensity had been reached. To stop the
reaction, samples were transferred to PBSTx and placed in 100% ethanol for 10-
20 min at RT. Samples were then removed from the ethanol, placed back in
PBSTx for about 5 min and then stored in 80% glycerol in PBS.
2.2.8 Imaging of WISH specimen – light microscopy
specimen after staining, images were captured with a Leica DFC 340 FX digital
camera.
cDNA libraries were produced from pools of male or female worms, with the
were used, each representing a biological replicate. In the case of pooled worms,
all worms originated from the same mouse, forming a biological replicate.
100 ng of total RNA was used for each RNA-Seq library made from pooled
worms (Chapter 3 & 5). For the cDNA libraries produced from single worms
(Chapter 4) only 50 ng of total RNA was available per sample. Oligo dT beads
(ThermoFisher, 61002) were used to increase the mRNA content of the samples,
polyA tails. The rRNA, which is not bound, is then washed off.
2.3.2 mRNA fragmentation
mRNA was made up to 200 µl and sheared to around 200 bases using an AFA
sec; NB: Duty cycle refers to the proportion of treatment time during which the
Similarly to the cDNA synthesis step in the cloning section (2.2.1), RNA was used
to create cDNA in this step. However, in this step the mRNA had been isolated
(2.3.1) as well as fragmented (2.3.2) and rather than using oligo-dT primers,
random hexamer primers (NEB, S1330S) were used to reverse transcribe all
dNTP) (NEB, N0446S) were added to 11 µl of fragmented RNA. The mixture was
denatured at 65°C for 5 min and chilled on ice. Next, 4 µl of 5x First Strand buffer
inhibitor (40 U/µl; ThermoFisher, 10777019) were added. The sample was
sample was then incubated in a thermocycler (25°C, 10 min; 50°C, 60 min; 70°C,
15 min). This denatures the SuperScript® III enzyme but leaves the RNA/DNA
duplex intact. After this step, the RNA/DNA duplex was cleaned using the
2.3.4 Second strand DNA synthesis
Second strand synthesis was performed with dUTP rather than dTTP allowing
the two DNA strands to be differentiated into the sense strand (mRNA sequence
labelled). 22.6 µl of RNA/DNA duplex from the last step were mixed with 3 µl of
buffer 2 (NEB, B7002S), 2 µl dNTP mix (dUTP, dATP, dGTP and dCTP at 10 mM each)
(NEB, E6114), 0.4 µl RNase H (5000 U/ml) (NEB, M0297S) (to nick, i.e. create a
break in, the RNA strand, allowing the remaining RNA fragments to act as
primers) and 2 µl of DNA pol I (5000 U/ml) (NEB, M0210S). The sample was
incubated at 16°C for 2.5 h, then held at 4°C. After second strand synthesis, the
sample was cleaned using AMPure XP beads (Agencourt, AQ 60050) (at a ratio of
The Sanger Sequencing Kit I (NEB, E6000B-SS) was used to perform end repair,
dA tailing, adapter ligation and size selection of the cDNA. All steps were
In the next step, the Uracil-specific excision reagent (USER) enzyme (NEB,
M5505S) was used to digest the second strand of all DNA molecules in the
sample. For this, 1 µl of USER enzyme was added to 10 µl of library. The sample
was then incubated in a thermocycler for 15 min at 37°C, then for 10 min at
2.3.6 PCR amplification
After removal of primer dimers and digestion with the USER enzyme, the sample
was amplified to reach a sufficiently high DNA concentration for sequencing. For
1 µl each of PE 1.0 Illumina primer (10 mM), Illumina index primer (10 mM), and
water. A unique barcode sequence was added to the cDNAs of each sample using
the PCR primer. This allowed DNA sequences to be assigned to one sample in
silico once sequencing was complete. This tagging allowed for mixing of libraries
introduced by differences across sequencing lanes and runs. The sample was then
sec; 60°C, 15 sec; 72°C, 60 sec; and next 72°C, 5 min). Following the PCR, the
samples were size selected, using 0.8x sample volume of Agencourt AMPure XP
2.3.7 Sequencing
sequencing runs (six lanes in total). The biological replicates for each condition
Chapter 5 on the other hand were all sequenced together in one run (two lanes
2.4 Bioinformatics
Tophat2 (v2.0.8b) was used to map RNA-Seq data to version 5.2 of the S. mansoni
genome (Protasio et al., 2012). RNA-Seq reads span splice junctions where
introns have been removed from the transcript. Many mapping tools do not take
splicing into account when mapping reads to the genome as they are specialised
for mapping genomic DNA reads. Tophat2 was specifically designed for RNA-Seq
data to allow for optimal mapping of RNA-Seq data to the genome several
loci that are not expressed, therefore the parameter was set to allow only for
uniquely matching reads to be mapped (-g 1). Next, the appropriate library type
using dUTP. Also based on the method of library preparation, the expected
(mean) inner distance between mate pairs was set to 200 (-r 200) and the mate
spliced reads the minimum length of sequence on either side of the splice
junction was set to be at least six bases long (-a 6). Finally, the minimum and
RNA-seq data to the S. mansoni genome was provided in Binary sequence
SAMtools (v0.1.19) was used for the processing of mapped RNA-seq data in BAM
format (Li et al., 2009). All BAM files were sorted, and the two BAM files (one for
HTSeq (v0.5.4) was used to summarise the mapped data and produce a list of
read counts per gene for each sample (Anders et al., 2015). Using a file in Gene
Transfer Format (GTF) as reference for gene boundaries, HTSeq takes into
account the strandedness of the data, only counting reads in the orientation of
the mRNA and not of antisense RNA. Reads were also only counted for the
longest splice variant of each gene to avoid complications in regions where the
RNA-Seq data was analysed in all three results chapters (3, 4 & 5) to identify
Team, 2015) using DESeq2 (Love et al., 2014). DESeq2 was created specifically to
address the challenges, i.e. small numbers of replicates, large dynamic range and
the presence of outliers within the replicates (Love et al., 2014), that arise when
Using the output of HTSeq, the number of unambiguously mapped reads per
gene, a count matrix is created containing the read count for each gene in each
sample. DESeq2 fits a Generalised Linear Model (GLM) for each gene. It models
the read counts as following a negative binomial distribution and calculates a size
factor using the median-of-ratios method described in DESeq (Love et al., 2014).
This corrects for the depth of sequencing across the different samples, allowing
curve through the distribution of dispersion. DESeq2 then “shrinks the gene-
wise dispersion estimates toward the values predicted” to obtain final dispersion
values.
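The median-of-ratios size factor calculation mentioned above can be illustrated with a short Python sketch. This is not DESeq2's R implementation: the function name and the toy count matrix are made up for the example, and genes with a zero count in any sample are simply skipped when forming the per-gene geometric means.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw read counts (one per sample).
    """
    n_samples = len(counts[0])
    # Keep only genes with non-zero counts in every sample, so that the
    # geometric mean across samples is defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the per-gene geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count in sample j to that gene's geometric mean.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy count matrix (rows: genes, columns: samples). Sample 2 was
# "sequenced" twice as deeply as sample 1; the last gene contains a zero.
counts = [
    [100, 200],
    [50, 100],
    [30, 60],
    [0, 5],
]

factors = size_factors(counts)
normalised = [[c / f for c, f in zip(row, factors)] for row in counts]
print("Size factors:", [round(f, 3) for f in factors])
```

Dividing each sample's counts by its size factor puts the samples on a common scale, which is the depth correction described in the text.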
One of the biggest challenges when calculating fold changes using HTS data is the
strong variance especially for genes with low read counts where the signal to
noise ratio is less favourable. DESeq2 shrinks log fold-change (LFC) estimates
towards zero so that shrinkage is stronger for genes with low read counts, high
another round of GLM fitting, and the corrected estimates are kept as final LFC
estimates.
reduce false positive results using the Benjamini and Hochberg method
used by DESeq2 was designed to remove genes from the analysis that have little
excluded from further analysis using criteria independent of the statistics used
strength for filtering, removing lowly expressed genes first. By default DESeq2
removes as many genes from the analysis as necessary to maximise the number
(FDR) value (default 10%). In this thesis the automatic independent filtering was
Detection of count outliers
To reduce the impact of outliers on the average distribution of dispersion and log
fold change, DESeq2 detects and removes individual outliers that do not fit the
assumptions of the model such as a replicate with a read count several orders of
magnitude higher than all other samples. To achieve this, DESeq2 uses a
standard outlier diagnostic called Cook’s distance. It measures how much the
GLM for a given gene would be affected if a particular sample were removed.
However, this could only be done where three or more replicates were available
whole male and female worms as well as isolated S. mansoni gonads - sequencing
mansoni genome (version 5.2) using bowtie (v1.1.0), a fast and memory-efficient
mapping tool for short reads (Langmead et al., 2009), using default settings. A
GTF file that contained the location of each mapped probe in the genome was
created using the genome coordinates for each sequence from the BAM file and a
custom python script. The number of RNA-Seq reads mapped to the microarray
count)“ (2.4.3), using the newly created GTF file as reference. The number of
RNA-Seq reads mapped to the probes was then compared to the normalised
signal intensity measured for the probe as reported by Nawaratna et al. (2011).
The correlation coefficient was calculated for the RNA-Seq and microarray
PCA plots are used here to visualise data, especially the differences between
variables underlying the data set that explain the maximum amount of data
variance with as few principal components as possible, with the aim to visualise
plot, the matrix of normalised counts created in DESeq2 was used. A regularised
log transformation was performed on the data matrix. This has a variance
for rows with small counts. DESeq2 then provided a function to create the PCA
2.4.7 Heatmaps
The R package “pheatmap” (v1.0.8) was used to draw heatmaps (Kolde, 2015).
To produce a heatmap, the program then calculates a Z-score, i.e. the number of
standard deviations a data point is from the mean, which allows for better
scaling across all samples than for example plotting the log-fold change. K-means
2.4.8 Gene Ontology (GO) term enrichment
genes, topGO (Alexa et al., 2006) was used. GO terms provide a hierarchical
the biological process in which they are involved (GO Consortium, 2004). The
adjusted p-value of < 0.01) in a particular condition were used as input for topGO
and GO terms were considered significantly enriched if their p-value was < 0.05.
The program InterProScan 5.0.7 (Quevillon et al., 2005; Zdobnov & Apweiler,
2001) was used to identify conserved protein domains. The sequences of all
annotated Schistosoma mansoni proteins were used as input. This produces a list
of all matches between the provided sequences and annotated protein domains,
including Pfam matches. From the output, all Pfam (Finn et al., 2014) domains
were used that were identified with high confidence (p-value < 0.01). Duplicate
domains were also removed, i.e. those that occurred more than once in a protein,
all domains, only found in a single S. mansoni protein were removed because no
Using a custom python script this information was combined with the results of
domains. The script created a table as output that was opened using Microsoft
was found more frequently than the average domain. This resulted in excluding
domains which were actually depleted in a sample, rather than enriched. For
example, it might exclude the egg shell synthesis domains from being shown as
The Kyoto Encyclopedia of Genes and Genomes (KEGG) database online resource
(Kanehisa & Goto, 2000) was used to provide pathway information, as they have
html files of all KEGG entries of S. mansoni genes were downloaded and the
corresponding GeneDB IDs as well as KEGG pathways associated with the gene in
question were extracted from the files. Using this information, a table containing
all S. mansoni genes and all pathways that the given gene belongs to was created.
In total 2046 genes with KEGG pathway annotation were identified. In total
After DESeq2 analysis of RNA-seq data, a python script was used to count the
number of DEGs found to belong to each pathway. The script created a
table that could be opened in Microsoft Excel to calculate if the number of genes
significantly enriched if p < 0.05 and if it was found more frequently than the
actually depleted, rather than enriched were excluded from the analysis, as they
DESeq2 analysis.
KEGG pathway. Across all pathways, about 37% of annotated genes were up-
On the other hand, in SS females, 1939 DEGs were identified, but none of the 110
about 14% of the genes associated with KEGG pathways were up-regulated in SS
significantly different from the expected (14%) DEGs in SS females (p-value =
pathway is lower than expected by chance (it is in fact 0). As a result the
Two R packages were used to cluster RNA-Seq data by the pattern of gene
expression across different samples: MBCluster (Si et al., 2013) and Kohonen
normalised for library size as input; the size factors used to correct for library
size were provided by DESeq2 (see 2.4.4). MBCluster models the data to follow a
to estimate model parameters and divides genes into groups, or clusters, with
was designed to cluster genes according to their expression profile (i.e. the
data sets using unsupervised learning. It uses data that has undergone
was then processed to have a mean expression of zero for each gene. This allows
point across the map, but then trains the map until it reaches an optimal
The annotation of DEGs was checked using a combination of BLAST, against gene
apoptosis-related genes was compiled. These genes were identified from the
literature (Lee et al., 2011; Lee et al., 2014; Peng et al., 2010), as well as by using
elegans gene sequences were used to BLAST against all S. mansoni genes on the
Tool, RNA-Seq evidence was used to improve predicted gene models of the
exons that had not been previously annotated, as well as the exon-intron
2.5 Scanning Electron Microscopy
Freshly perfused worms were washed in PBS and then fixed in 2.5%
steps were performed by Dave Goulding, WTSI, Hinxton. The samples were
dehydrated in an ethanol series (30%, 50%, 70%, 90% and 100%). Then
critical point drying was performed in a Bal-Tec CPD030 and specimens were
mounted on aluminium stubs with silver dag. Finally samples were coated with a
electron microscope.