Moving Profiling Spatial Proteomics Beyond Discrete

The document discusses advancements in spatial proteomics, emphasizing the importance of understanding protein localization and dynamics for elucidating protein function. It critiques current analytical approaches and suggests Bayesian modeling as a superior method for addressing challenges in multi-localization and dynamic relocalization of proteins. The authors advocate for the development of robust statistical frameworks to enhance the utility of spatial proteomics in studying protein behavior in various cellular states.


REVIEW

www.proteomics-journal.com

Moving Profiling Spatial Proteomics Beyond Discrete Classification
Oliver M. Crook, Tom Smith, Mohamed Elzek, and Kathryn S. Lilley*

O. M. Crook, Dr. T. Smith, Dr. M. Elzek, Prof. K. S. Lilley
Cambridge Centre for Proteomics, Department of Biochemistry
University of Cambridge
Cambridge, UK
E-mail: [email protected]

The ORCID identification number(s) for the author(s) of this article can be found under https://doi.org/10.1002/pmic.201900392

© 2020 The Authors. Proteomics published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

DOI: 10.1002/pmic.201900392

The spatial subcellular proteome is a dynamic environment; one that can be perturbed by molecular cues and regulated by post-translational modifications. Compartmentalization of this environment and management of these biomolecular dynamics allows for an array of ancillary protein functions. Profiling spatial proteomics has proved to be a powerful technique in identifying the primary subcellular localization of proteins. The approach has also been refashioned to study multi-localization and localization dynamics. Here, the analytical approaches that have been applied to spatial proteomics thus far are critiqued, and challenges particularly associated with multi-localization and dynamic relocalization are identified. To meet some of the current limitations in analytical processing, it is suggested that Bayesian modeling has clear benefits over the methods applied to date and should be favored whenever possible. Careful consideration of the limitations and challenges, and development of robust statistical frameworks, will ensure that profiling spatial proteomics remains a valuable technique as its utility is expanded.

1. Protein Subcellular Localization is a Key Component of Function

Correct subcellular localization affects protein function. This includes the availability of binding partners, cofactors, and substrates, as well as the presence of regulatory factors such as kinases.[1] The interrogation of localization is therefore a necessary step to elucidate function. The localization of the proteome is highly dynamic, for example, trafficking of secretory pathway components through membrane-bound compartments[2] and shuttling of proteins, such as transcription factors, between compartments based on their phosphorylation status,[3–6] in response to activation of signaling pathways.[7] Furthermore, there is mounting evidence that many proteins “moonlight” and perform multiple discrete functions depending on the cellular state and protein localization.[8] Altogether, this suggests that to understand protein function, we need to interrogate the proportional subcellular distribution of proteins and the role of post-translational modifications (PTMs) in regulating protein localization dynamics.

There is a multitude of experimental techniques to study proteome localization, including interactome mapping using proximity tagging,[9,10] high throughput microscopy,[11] and quantitative mass spectrometry.[12] Here, we focus on profiling spatial proteomics, a high-throughput mass spectrometry-based technique to establish protein subcellular localization in cells by quantifying protein abundance within subcellular fractions created by biochemical fractionation such as centrifugation or detergent solubility. The principle of this approach is that proteins from the same subcellular niche will share a distinct abundance profile across the fractions.[13] Protein localization can then be determined using semi-supervised analyses and prior information regarding sets of marker proteins from a limited selection of subcellular niches. This has proved to be a very powerful and flexible technique but, to date, it has largely been limited to analyzing the primary protein localization under a single condition.[14–19] Profiling spatial proteomics is increasingly being used to map multiple localizations of proteins, dynamic localization upon perturbation, and the role of post-translational modifications.[20–26] Now is an opportune moment to reflect on the current paradigm of profiling spatial proteomics and the inherent technical challenges and limitations of these powerful techniques. Here, we review the challenges associated with profiling spatial proteomics and recommend formal testing frameworks and modeling with explicit consideration of limitations of the method as a means to maximize its utility.

2. Identifying the Main Localization

A single protein copy can only be present in one localization at any given time, although it may transit or traffic within the cell during its lifecycle from its point of synthesis to the location(s) where it functions, and on to where it is finally degraded. Since profiling spatial proteomics assays multiple cells, each containing multiple protein copies, it captures protein copies

Proteomics 2020, 1900392 1900392 (1 of 10) © 2020 The Authors. Proteomics published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.advancedsciencenews.com www.proteomics-journal.com

at different stages of their lifecycle, as well as cells in different cell states. With profiling spatial proteomics, a pool of cells is fractionated into subcellular fractions which contain cellular material from multiple localizations in differing proportions. Thus, quantified abundance profiles represent an aggregation of protein localization across different cell cycle stages and cellular states.

Most proteins have been thought to adopt a single primary localization in which they are resident for most of their life cycle. Profiling spatial proteomics methods have largely focused on identifying this primary localization. By establishing localization-exclusive “marker” protein profiling, profiles of proteins with unknown localization can be assigned to their respective primary localizations. This approach was initially used to separate centrosomal proteins from non-specific proteins[27] and rapidly adopted for cell-wide localization assignment.[14,15] In these pioneer studies, the proteome coverage remained relatively low and assignment was carried out either by correlating profiles[15] or using partial least squares-discriminant analysis.[14]

Improvements in mass spectrometry have enabled deeper proteome coverage and this has facilitated the use of machine learning classifiers which learn the marker profiles for each localization and then assign proteins with unknown localization. The primary or main localization is then defined as the marker class which best reflects the profile of a given protein, with some filtering process to remove uncertain assignments.[28,29] This is a task to which many algorithms are suited, though support vector machines (SVM) or neural networks are typically used.[18,19,30] An exploratory process of marker selection and detection of unannotated niches is usually performed prior to classification to ensure definition of an optimal set of marker classes. Existing generic algorithms can be used for this process, including K-means,[31] Mclust, hierarchical clustering,[31] DBSCAN,[32] and visual inspection through hexbin plots or t-SNE.[33] Alternatively, a profiling spatial proteomics-specific method called phenoDisco has been developed, based on building an outlier statistic from iterative mixture modeling.[34] To facilitate all these analyses in a reproducible environment, an extensive R package, pRoloc, has been developed to visualize spatial proteomics profiles and reliably infer protein localization using machine learning,[35] and similar functionality has been added to Perseus.[36]

Machine learning-based classification with profiling spatial proteomics has been hugely successful at expanding knowledge of protein subcellular localization. Recent example publications have used SVM to assign 2855 protein groups to 14 subcellular niches in mouse E14TG2a embryonic stem cells;[18] 2423 proteins into nine membrane organelles in HeLa cells;[20] and 9286 proteins to four major localizations across five human cancer cell lines.[30]

Oliver M. Crook is a Wellcome funded Ph.D. student in the Cambridge Centre for Proteomics with Kathryn Lilley and is co-supervised by Paul Kirk in the MRC Biostatistics Unit. He is keen to work with all kinds of tricky data generated in systems biology applications. He is developing Bayesian approaches to analyzing spatial proteomics data, so we can answer new questions with the data. Previously, he obtained a Master’s degree in mathematics at the University of Warwick focusing on dynamical systems and infectious disease modeling. When he is not working, he enjoys cooking and playing the flute.

Tom Smith is fascinated by the complexities of intracellular macromolecular processes, with a particular interest in the function of alternative splicing and the determinants of RNA and protein localisation. His doctoral studies focused on transgenerational stress memories in A. thaliana. He made the move into computational biology via a fellowship with CGAT (Oxford, UK), where he developed and applied bioinformatic tools to analyse diverse high-throughput data. In the group of Prof. Kathryn Lilley (Cambridge, UK), he brings together novel laboratory and data analysis approaches to interrogate the role of protein and RNA interactions in driving their respective subcellular localisation.

Mohamed Elzek is a physician scientist with a passion for everything omics-related, especially proteomics. He is now working on his Ph.D. at Kathryn Lilley’s group at the University of Cambridge, where his focus is on giving each protein and RNA molecule a postcode inside the cell through the process of DNA damage repair. He received his medical training at Alexandria University in his home country, Egypt. He then went on to gain experience in cancer proteomics research with Dr. Karin Rodland at the Pacific Northwest National Laboratory in the United States and at Prof. Reudi Aebersold’s lab in Switzerland.

3. Technical Considerations for the Main Localization Question

Although here we focus on considerations for studies determining multiple and dynamic localization of the proteome, workflows designed to return the main location of a protein are extremely worthwhile. The analysis of resulting data requires careful attention with perennial challenges associated with the classification approach taken.
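To make the marker-based assignment described in Section 2 concrete, the following is a minimal sketch of correlation-based assignment, the simplest of the strategies mentioned there. It is an illustration only, not the procedure of any cited study: the four-fraction toy profiles, the function names, and the `min_corr` cut-off of 0.9 are all assumptions introduced here, and real analyses typically use classifiers such as SVMs on many more fractions and markers.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_profile(profiles):
    """Fraction-wise mean of a list of marker profiles."""
    return [sum(col) / len(col) for col in zip(*profiles)]

def assign(profile, markers, min_corr=0.9):
    """Assign a profile to the best-correlated marker class.

    Returns (niche, correlation). If the best correlation is below
    min_corr (a hypothetical cut-off), the niche is "unknown".
    """
    scores = {niche: pearson(profile, mean_profile(ps))
              for niche, ps in markers.items()}
    best = max(scores, key=scores.get)
    if scores[best] < min_corr:
        return "unknown", scores[best]
    return best, scores[best]

# Hypothetical marker profiles across four fractions (toy values).
markers = {
    "cytosol": [[0.70, 0.20, 0.07, 0.03], [0.65, 0.25, 0.06, 0.04]],
    "mitochondria": [[0.05, 0.15, 0.60, 0.20], [0.04, 0.16, 0.55, 0.25]],
}

niche, r = assign([0.06, 0.14, 0.58, 0.22], markers)
mixed_niche, mixed_r = assign([0.37, 0.18, 0.33, 0.12], markers)
```

In this toy setting the first query follows the mitochondrial markers closely, whereas the second, which resembles a blend of the two classes, correlates poorly with both and is left unassigned. This mirrors the filtering step that removes uncertain assignments, and it also shows why an unassigned protein is not by itself evidence of multi-localization.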


In the first instance, the selection of appropriate marker proteins is a vital step in the classification workflow and the curation of markers is typically dataset dependent to ensure they properly represent a single localization.[28,29] However, problems can arise when annotating compartments with only a few well annotated marker proteins including, amongst others, endosomes, poorly characterized localizations such as cytosolic granules, and cells with highly specialized organelles and/or with poor annotation.[37]

Independently of the precise manner in which markers are selected, they inevitably represent a biased sample,[38] with a skew toward well documented proteins and sub-cellular locations. Therefore, the accuracy of marker classification does not indicate classification accuracy for non-marker proteins, as is sometimes suggested,[21] as non-markers can be expected to be classified with lower accuracy.[38]

Application of SVM for classification purposes has been a very popular approach to date; however, the interpretation of SVM scores requires particular care. The SVM is a discriminative model, rather than a generative one, and so SVM scores do not represent probabilities. Indeed, the top SVM score for a multi-class classification may not be the most probable localization;[39] that is, the SVM probabilities need not be consistent. If probabilities are desired, these can be approximated using, for example, quadratic optimization,[39] as others have done in profiling spatial proteomics studies,[30] but such approximations may be arbitrarily inaccurate. Alternatively, they can be estimated using additional hold-out data not used to train the classifier, but this is rarely available given the number of marker proteins per class.

3.1. The Extended Questions

While classification algorithms are well established and valuable to identify the main localization, they are inadequate to address the extended questions of mixed protein localization, differential localization upon cell state perturbation, and the interplay between post-translational modifications and protein localization. Below, we set out what we consider to be clear definitions for these terms as the intuitive interpretations from a biological standpoint may not match up to what we can assay with profiling spatial proteomics.

3.1.1. Multi-Localized Proteins

These include secretory pathway components which cycle through membrane organelles[2] and RNA binding proteins such as HuR that shuttle between the cytosol and nucleus.[40] Multi-localization is a poorly defined term in profiling spatial proteomics, with the definition arising from the schema used to assign the primary localization.[19,30] Here, we define a protein to have multi-localization if it resides in more than one cellular compartment across the assayed cells. By this definition, the identification of primary localization for a protein is not, necessarily, mutually exclusive with an additional assertion that it has multi-localization. This definition is agnostic to whether the multi-localization is within or between cells, since profiling spatial proteomics cannot distinguish between these two possibilities, contrary to single cell methods, including (amongst others) microscopy-based approaches.[11] Furthermore, we cannot distinguish the root of multi-localization; the process of pools of proteins dynamically interchanging is fundamentally different from discrete pools of translated proteins where the components never interchange, yet both will simply be observed as a multi-localization.

3.1.2. Protein Relocalization

This may occur in response to internal and external cues. We define these changes in either comparative or time series profiling spatial proteomics experiments as instances of differential localization. Alterations in protein synthesis or protein degradation may also be observed as changes in localization and hence we refer to this eventuality as differential localization to avoid any inference of active protein movement. Where a consistent change in localization occurs for the majority of protein copies assayed, one may observe a discrete differential localization from one localization to another. However, in many cases, the observation will be one of proportional changes in localization.

3.1.3. Post-Translational Modifications

These may regulate protein localization. A commonplace example is phosphorylation.[4–6] In spatial proteomics studies, researchers typically aim to identify differences in protein localization governed by PTM status. A crucial consideration is that we cannot infer whether the modification is specifically regulating localization as it is also possible that the localization is regulated by other factors and the protein differentially modified according to its localization. Therefore, it is appropriate to avoid asserting these events as PTM-dependent localization; rather, these are best described as concurrent changes in PTM and protein localization.

All of the above questions, in theory, are answerable with current profiling spatial proteomics techniques. However, addressing these questions requires the development of new computational approaches. With these extended questions in hand, we turn to the important technical considerations for profiling spatial proteomics as we proceed beyond primary localization classification.

4. Technical Considerations for the Extended Questions

Before we address the potential solutions to the extended questions, we consider the additional technical considerations that need to be addressed when collecting and analyzing spatial proteomics data with these questions in mind (see Table 1).

First, as a matter of course, the proportion of features (peptide-spectrum matches, peptides or proteins) with missing values will necessarily increase as the overall number of samples increases for the more extensive experimental designs required. Here it is


Box 1.
Proportional and Relative Quantification

It is not possible to estimate the proportion of protein resident in each of its subcellular niches using relative protein quantification. To see this, consider the following argument.

During relative MS quantification there is a sampling rate $c$ which represents the proportion of protein analyzed, relative to the absolute amount of protein in that fraction. This sampling rate $c$ is corrupted by the loss of material as a result of protein extraction and the proportion of the resultant sample analyzed by MS. The former is unknown, whilst the latter is measurable; consequently, $c$ is unknowable.

First, let $f_a$ be the proportions of a protein resident in organelle $a$ across the fractions, likewise for $f_b$. Thus, $\sum_i f_{ai} = 1$ and $\sum_i f_{bi} = 1$, where $i$ denotes the $i$th fraction. Under MS quantification, the proportions are transformed according to $c$, such that we observe $c f_a$ and $c f_b$. These are normalized to obtain $\hat{c}_a f_a$ and $\hat{c}_b f_b$, to place them on the same scale (for example $\hat{c}_j = c / \sum_i c_i f_{ji}$ for $j = a, b$).

We have defined the measured profile of a protein with respect to its true proportion and now consider a protein $f_d$ which has mixed localization between niches $a$ and $b$, with proportions $\pi$ and $1 - \pi$, respectively. Thus, $f_d = \pi f_a + (1 - \pi) f_b$ and so when measured by MS quantification we obtain

$$\hat{c}_d \left( \pi f_a + (1 - \pi) f_b \right) \tag{1}$$

However, mixing the relative proportions of observed profiles $\hat{c}_a f_a$ and $\hat{c}_b f_b$ results in

$$\pi \hat{c}_a f_a + (1 - \pi) \hat{c}_b f_b, \tag{2}$$

which are not equal.

important to consider that missing values may be “missing not at random,” for example, dependent on treatment, or “missing at random,” for example, due to stochastic processes inherent to data-dependent acquisition MS.[41] The latter is especially problematic for label free quantitation (LFQ) approaches and despite efforts to identify the optimal imputation approach for LFQ,[42] the effect of imputation on profiling spatial proteomics experiments has not been considered. Isobaric labeling, used in many spatial proteomics studies,[14,18–21] significantly reduces the proportion of missing values.[43] However, to address most of the extended problems, a greater number of isobaric multiplexes are required, which reduces the total number of peptides or proteins quantified in all samples. This is made worse when separate isobaric multiplexes are used for different experimental conditions as it becomes more difficult to determine whether the missing values between multiplexes occur at random.

Second, non-targeted proteomics does not measure the absolute copy number of proteins but rather the abundance of a given protein, relative to the total amount of protein in the sample. To compare profiles between proteins with very different cellular abundances, the fraction abundances are typically scaled to generate an abundance profile across the fractions. Isobaric tagging allows the quantification values for each fraction to be derived from the exact same peptide-spectrum match (PSM), which reduces the variance of quantification significantly compared to LFQ.[43] However, the resultant abundances are still relative to the amount of protein which was labeled. In either case, there are different total quantities of protein present in each experimental fraction and thus, we cannot estimate the mixture proportions from the observed profiles (see Box 1). To demonstrate this, we simulated multi-localization using previously published data.[19] Relative quantification values were adjusted to approximate proportions of protein in each fraction. Multi-localization between the cytosol and mitochondria was then simulated by combining proportional quantification profiles for respective marker proteins and converted to relative abundances to observe how they would appear in a profiling spatial proteomics experiment. The comparison with directly combining relative profiles in the same ratios indicates that relative profiles do not capture multi-localizations accurately (Figure 1). Resolution of this problem requires conversion of relative protein abundances within fractions to proportional abundance across fractions but this is difficult to achieve by simply quantifying the total proportion of protein in each fraction, since extraction of the protein results in a loss of material.[44] An alternative and promising approach for SILAC-compatible systems is to spike-in a consistent heavy labeled reference sample to achieve proportional quantification.[20]

Third, when studying differential localization, it is important to consider that cell lysis, organelle morphology and/or cellular sub-structure may be considerably altered across conditions. For example, in order to determine the content of specific vesicles captured by specific golgins, Shin et al. relocated the golgins to the mitochondria by replacing their Golgi targeting domains with a mitochondrial transmembrane domain.[24] This relocalization leads to increased mitochondrial “zippering”[45] and thus the mitochondria and interacting peroxisomes sediment at a lower centrifugation speed (Figure 2). Furthermore, we have previously observed that the truncated G1 and S phases in mouse embryonic stem cells have a significant impact on the resolution of Golgi profiles.[18] Given that the morphology of many organelles is altered during mitosis,[46] one would expect that conditions which alter the cell cycle stage distribution may significantly affect organelle morphology.

Finally, the lack of a suitable number of ground truths or strong prior expectations for all the extended questions severely hampers the development, implementation and comparison of tools to address them. Consider PTMs as an example; the role of phosphorylation in signaling pathways is well appreciated,[5,7] but the number of phosphorylation sites with known impact on localization which have been experimentally validated is limited when compared to our knowledge about the main subcellular localization of proteins. As such, computational methods to examine the


Figure 1. Relative quantification cannot be used to estimate the representation of proteins in multiple localizations. A) Profiles for 10–90% cytosol from
top left to bottom right. Simulated multi-localization profiles presented in color gradient. Equivalent profiles from mixing relative profiles shown in black.
B) Projection of simulated profiles onto principal components (color gradient). Cytosolic and mitochondrial marker proteins shown in yellow and blue,
respectively. C) Projection of relative profile mixtures.

role of phosphorylation in protein localization cannot take advantage of ground truths, or even strong prior expectations, and the assessment of the validity of the results from any method is not straightforward. With these considerations in mind, we now examine the attempts that have been made so far to address the extended questions with profiling spatial proteomics.

Figure 2. Mitochondrial and peroxisomal proteins show a shift toward earlier fractions in 2 out of 3 replicates when golgin-97 is ectopically expressed. Data used with permission from [24].

5. First Attempts to Address the Extended Questions

The classification approach has been highly effective at identifying the main localization of proteins; however, this framework has led the field to adopt sub-optimal methods to answer the extended problems (see Table 2).

As a first example, differing estimates have been reported for the proportion of proteins which are multi-localized using profiling spatial proteomics. We have previously suggested that approximately half the proteome is multi-localized, since it cannot be classified to a discrete localization, although we noted there are many explanations for this, not all of which relate to multi-localization.[19] Taking a similar approach, Orre et al. observed consistent classification between biological replicates, with further analysis indicating that inconsistent classifications were likely due to inaccurate quantification.[30] From this, they suggest that less than 10% of the proteome is multi-localized. In both these studies, a classification schema designed to determine the main localization was repurposed into an estimation of the proportion of multi-localized proteins and the disagreement is likely the result of poorly considered definitions for multi-localization. Indeed, as previously noted, successful identification of the main localization for a given protein is not mutually exclusive with that protein being multi-localized, and absence of primary classification has many explanations.

Similarly, dynamic localization has also been studied within the classification framework by treating the problem as two separate classification tasks and comparing classifications differentially between control and treatment.[47] Whilst this is a valid approach, it can only identify clear cases of discrete differential localization and misses smaller changes in proportional localization.

To address the extended question of multi-localization, differential localization and PTMs in a classification-independent manner, informal methods have been introduced. These approaches have not clearly stated their assumptions or adequately justified their methodology and the community has not adopted a standardized rigorous approach. To elaborate, let us consider the approaches of Krahmer et al., where they attempt to address all three of these questions.[23] First, they extend the protein correlation profiling approach with the goal of determining dual localizations (see Box 2). Second, to identify differences in the profiles between conditions, a test based on correlating intra- and inter-condition profiles was proposed (see Box 3). Finally, to analyze the spatial phosphoproteome, they apply several filtering steps along with their proposed method for correlating intra- and inter-condition profiles (see Box 4). In all three cases, assumptions are not explicitly stated or evaluated, the testing framework involves


Table 1. Technical challenges in profiling spatial proteomics with particular relevance to addressing the extended problems. Missing values are problematic for the simple task of determining primary localization but even more so when addressing the extended problems. Similarly, ground truths are required for the simple task but these are usually readily available. Relative quantification and inconsistent organelle morphology are only challenging for the extended problems.

| Technical challenge | Explanation | Questions affected |
| --- | --- | --- |
| Missing values | More samples result in more missing values, especially when comparing diverse conditions. | Differential localization; PTMs |
| Relative quantification | Protein quantification relative to the fraction cannot be readily converted to protein proportion in each fraction. | Multi-localization |
| Inconsistent organelle morphology and/or cell composition | Treatments can alter organelle morphology/cell composition, invalidating implicit assumptions of the testing framework. | Differential localization |
| Insufficient ground truths or strong prior expectations | We have strong expectations for protein main localization but far fewer expectations for the extended questions. Thus, proposed solutions cannot use prior knowledge and assessment of the quality of results obtained is difficult. | Multi-localization; Differential localization; PTMs |
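As a rough illustration of the first row of Table 1, the sketch below computes how the share of features quantified in every sample shrinks as samples are added. It rests on a strong simplifying assumption that is not taken from the text: each value is missing independently with the same probability `p_missing` (a hypothetical 5% here). Real data can be "missing not at random," so this should be read as an indication of the trend only, not as a model of any experiment.

```python
def complete_fraction(p_missing, n_samples):
    """Expected share of features with no missing value across n_samples,
    assuming each value is missing independently with probability p_missing."""
    return (1.0 - p_missing) ** n_samples

# Hypothetical per-value missingness of 5% across growing designs.
trend = {n: complete_fraction(0.05, n) for n in (10, 20, 40)}
```

Under this assumption, doubling the number of samples squares the complete fraction, so modest per-value missingness still removes a large share of features from a complete-case analysis in the larger designs needed for the extended questions.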

Table 2. Studies addressing the extended problems of multi-localization, differential localization, and the role of post-translational modifications. The computational approach used and study findings are briefly summarized.

| Study | Computational approach | Findings |
| --- | --- | --- |
| HeLa cells (Itzhak et al.) | Informal Mahalanobis distance-based test (MR method) | 2824/5265 proteins multi-localized; four proteins differentially localized in response to EGF stimulation |
| HCMV infection in human lung fibroblasts (Beltran et al.) | Comparative SVM classification | 270/2775 proteins multi-localized; 83 host proteins differentially localized during infection |
| Mouse neurons with EGF stimulation (Itzhak et al.) | SVM classification | 66 proteins differentially localized with an estimated FDR <10% |
| AP-4 knockout in HeLa cells (Davies et al.) | MR method | Three proteins differentially localized (1% FDR) |
| AP-5 knockout in HeLa cells (Hirst et al.) | MR method and correlation-based filtering | 26 proteins differentially localized (23% FDR) |
| Mouse hepatic steatosis (Krahmer et al.) | Correlation-based informal testing | 910 proteins differentially localized (20% FDR); 2542 phosphopeptides differentially localized (10% FDR) |
| U-2 OS cell line (Geladaki et al.) | SVM classification | 58–65% of proteins are multi-localized (no primary localization assigned at estimated 5% FDR) |
| Five human cell lines and response to EGFR inhibition in HCC327 (Orre et al.) | Comparative SVM classification with heuristic filtering | <10% of proteins multi-localized (inconsistent SVM classification between replicates); 295 candidate differentially localized proteins (different SVM classification); 13 differentially localized candidates after heuristic filtering |
| Relocalization of golgin tethered transport vesicles in HEK 293T cells (Shin et al.) | Bayesian non-parametric two sample test and filtering by Mahalanobis distance to mitochondria | 139 proteins with profile shifted toward mitochondria |

heuristic filtering step(s) and the test itself is frequently not defined clearly.
As an alternative method to identify differential localization, Itzhak et al. proposed an informal testing approach denoted as the movement-reproducibility (MR) method.[20] This method uses the Mahalanobis distance between inter-condition profiles, as well as the correlation between these distances. Though this approach could be formalized, a null hypothesis is never stated. Given that the distances between proteins of the same organelle and between proteins of different organelles can vary considerably within an experiment, it is also unclear, in general, how appropriate a distance approach is. To be more precise, what is meant by a big or interesting distance, as a formal test statistic, needs the context of which organelles the relocalization is between. Thus, it is hard to completely justify an approach agnostic to its spatial context.
To estimate the FDR of the MR method, the suggestion is to perform a “mock” experiment, comparing control versus control (requiring three additional replicates) so that the number of false positives can be estimated for a given cut-off value. This is based on the implicit but unstated assumption that organelle profiles remain similar across conditions. Assuming that most proteins


Box 2.
Krahmer et al. determining dual localizations

To determine dual localization Krahmer et al. take the following approach. For each protein:
• The most likely primary organelle is determined using the SVM.
• The median profile of the primary organelle is combined with the median organelle profiles of the markers.
• The correlation values between the protein/peptide profile and these in silico profiles are determined.
• Proteins/peptides with correlation > 0.4 are considered unreliable assignments.
• The alpha value is reported as a quantitative measure of second organelle localization.

Critique
The mixing of observed marker profiles can provide an expectation for multi-localization. However, as we have already demonstrated in Box 1, one has to be careful when drawing conclusions from mixed profiles. The first unstated part of the method is how the marker profiles are combined; presumably they were mixed in different possible proportions, but they might simply have been averaged. Furthermore, it is unclear which method of correlation was applied. It appears that the authors incorrectly posit that a correlation > 0.4, rather than < 0.4, was unreliable. Finally, the authors report an alpha value, but never state how this was computed nor what it actually represents. The distribution of this measure is never reported, so we cannot determine how the alpha value differs between proteins with known single or dual localizations.
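One possible reading of the mixing procedure in Box 2 can be sketched as follows. Because the original method does not state how profiles were combined or which correlation was used, everything specific here is an assumption made for illustration: linear mixing of median marker profiles, Pearson correlation, the grid of mixing proportions, and the function name `scan_mixtures`. It is not a reconstruction of the published analysis.

```python
import numpy as np

def scan_mixtures(profile, primary, medians, alphas=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """Correlate one protein profile with in silico mixtures of organelle medians.

    profile : abundance of one protein across fractions.
    primary : name of the primary organelle (e.g. taken from an SVM).
    medians : dict mapping organelle name -> median marker profile.
    alphas  : hypothetical grid of second-organelle proportions.
    Returns (second_organelle, alpha, correlation) for the best-matching mixture.
    """
    profile = np.asarray(profile, dtype=float)
    base = np.asarray(medians[primary], dtype=float)
    best = (None, 0.0, -np.inf)
    for name, median in medians.items():
        if name == primary:
            continue
        second = np.asarray(median, dtype=float)
        for alpha in alphas:
            # Assumed mixing rule: convex combination of the two median profiles.
            mix = (1.0 - alpha) * base + alpha * second
            r = float(np.corrcoef(profile, mix)[0, 1])
            if r > best[2]:
                best = (name, alpha, r)
    return best

# Toy marker medians over four fractions (made-up numbers).
medians = {
    "ER": [0.7, 0.2, 0.1, 0.0],
    "Mito": [0.0, 0.1, 0.2, 0.7],
    "Golgi": [0.1, 0.6, 0.2, 0.1],
}
protein = 0.7 * np.array(medians["ER"]) + 0.3 * np.array(medians["Mito"])
second, alpha, r = scan_mixtures(protein, "ER", medians)
```

Making the mixing rule and the correlation measure explicit in this way is exactly what allows the behaviour of an alpha-like quantity to be checked against proteins with known single or dual localizations.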

Box 3.
Krahmer et al. determining changes in localization

To determine changes in localization Krahmer et al. take the following approach.
• Within each condition compute the correlation between the quantitative profiles and retain them only if their maximum correlation is greater than 0.5. Precisely, the maximum correlation for the ith protein is

$$\rho_{\max} = \max\{\rho(f_{i1}, f_{i2}),\ \rho(f_{i2}, f_{i3}),\ \rho(f_{i1}, f_{i3})\}, \tag{3}$$

where $f_{ij}$ denotes the quantitative profile for protein $i$ in replicate $j$. If $\rho_{\max} > 0.5$ for protein $i$, then the profiles are retained for further analysis.
• Repeat this process for all the conditions.
• Take the top two most correlated profiles from each condition, and compute the average of the within-condition correlations and the average of the between-condition correlations for the same replicates. Making this explicit, we write $f_{ij}^{(c)}$ for protein $i$ in replicate $j$ and in condition $c$. Then compute the average of the within-condition correlations

$$\bar{\rho}_W = \frac{\rho\big(f_{ij}^{(1)}, f_{ij'}^{(1)}\big) + \rho\big(f_{ij}^{(2)}, f_{ij'}^{(2)}\big)}{2} \tag{4}$$

and the average of the between-condition correlations

$$\bar{\rho}_B = \frac{\rho\big(f_{ij}^{(1)}, f_{ij}^{(2)}\big) + \rho\big(f_{ij'}^{(1)}, f_{ij'}^{(2)}\big)}{2}. \tag{5}$$

• Then, for each protein, compute the difference in these quantities:

$$\delta\rho = \bar{\rho}_B - \bar{\rho}_W. \tag{6}$$

• The list is then ranked from largest to smallest.
• The approach is repeated for Spearman and Pearson correlations and the results combined.
• An FDR threshold of 0.2 is set.

Critique
The initial filtering steps are arbitrary, and neither the reason for performing them nor the motivation for the threshold is given. Both Spearman and Pearson correlations are used, which have different assumptions, so it is unclear what the combined correlation results mean. If Pearson correlations are used then there is an implicit assumption of bivariate normality that is never checked. Furthermore, correlations cannot be averaged because they are not additive; rather, they obey the law of cosines and should be treated as such. It is also unclear how the different correlations were combined and what an appropriate null hypothesis is in this scenario. This makes it hard to understand how a p-value is computed and whether there was correction for multiple testing. Without clearly stating the assumptions it is challenging to assess how appropriate these methods are, how they apply to other datasets, or even whether the methodology presented is valid. Unfortunately, because of the lack of clarity and the use of proprietary software it is impossible to reproduce the analysis.
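The steps in Box 3 (Equations 3 to 6) can be written down as a minimal sketch for a single protein. Several details are assumptions, since the original description leaves them open: only Pearson correlation is used, the replicate pair (j, j') is chosen as the pair with the largest summed within-condition correlation, and the function names `pearson` and `delta_rho` are hypothetical. The sketch reproduces the described arithmetic, including the averaging of correlations that the critique questions; it does not endorse it.

```python
from itertools import combinations

import numpy as np

def pearson(a, b):
    """Pearson correlation between two profiles."""
    return float(np.corrcoef(a, b)[0, 1])

def delta_rho(cond1, cond2, threshold=0.5):
    """Return delta-rho (Equation 6) for one protein, or None if filtered out.

    cond1, cond2 : arrays of shape (n_replicates, n_fractions), one per condition.
    """
    cond1 = np.asarray(cond1, dtype=float)
    cond2 = np.asarray(cond2, dtype=float)
    pairs = list(combinations(range(cond1.shape[0]), 2))
    # Equation 3: keep the protein only if rho_max > threshold in every condition.
    for cond in (cond1, cond2):
        if max(pearson(cond[j], cond[k]) for j, k in pairs) <= threshold:
            return None
    # Assumption: use the same replicate pair in both conditions, chosen to
    # maximise the summed within-condition correlation.
    j, k = max(
        pairs,
        key=lambda p: pearson(cond1[p[0]], cond1[p[1]]) + pearson(cond2[p[0]], cond2[p[1]]),
    )
    rho_w = (pearson(cond1[j], cond1[k]) + pearson(cond2[j], cond2[k])) / 2.0  # Eq. 4
    rho_b = (pearson(cond1[j], cond2[j]) + pearson(cond1[k], cond2[k])) / 2.0  # Eq. 5
    return rho_b - rho_w  # Eq. 6

# Toy data: three replicates, four fractions (made-up numbers).
stable = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 4, 4]]
shifted = [[4, 3, 2, 1], [5, 3, 2, 1], [4, 4, 2, 1]]   # profiles reversed across fractions
noisy = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 4, 4, 1]]     # replicates disagree

d_same = delta_rho(stable, stable)
d_moved = delta_rho(stable, shifted)
d_filtered = delta_rho(noisy, stable)
```

Writing the procedure out makes the unstated choices visible: the pair-selection rule and the correlation type each had to be assumed before the code could run at all.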


Box 4.
Krahmer et al. determining changes in the phosphoproteome

To determine phosphoproteome changes Krahmer et al. take the following approach.
• The subcellular localization of phosphopeptides is identified using PCP.
• A filter to retain those sites where phosphorylation changes across conditions is applied.
• The remaining phosphosites are filtered to those within proteins whose localization changes according to the procedure for determining changes in localization.

Critique
All of the assumptions of determining changes in localization are applied without evaluating whether these assumptions hold true. Furthermore, there is no apparent requirement for the intersecting change in phosphorylation, change in protein localization and phosphopeptide localization to be concordant.
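The intersection filter in Box 4, and the concordance requirement that the critique finds missing, can be illustrated with a small sketch. All identifiers, data structures and the definition of a "shift" as an (origin, destination) pair are hypothetical and chosen only for illustration.

```python
def candidate_sites(changed_sites, site_to_protein, relocalized_proteins):
    """Sites with changed phosphorylation whose parent protein relocalizes."""
    return sorted(
        s for s in changed_sites if site_to_protein.get(s) in relocalized_proteins
    )

def concordant_sites(sites, site_to_protein, site_shift, protein_shift):
    """Keep sites whose phosphopeptide shift matches the parent protein's shift.

    A shift is an assumed (origin, destination) pair of organelle names.
    """
    keep = []
    for s in sites:
        shift = site_shift.get(s)
        if shift is not None and shift == protein_shift.get(site_to_protein.get(s)):
            keep.append(s)
    return keep

# Made-up example: three changed sites on three proteins.
site_to_protein = {"P1_S10": "P1", "P2_T55": "P2", "P3_Y7": "P3"}
changed = {"P1_S10", "P2_T55", "P3_Y7"}
relocalized = {"P1", "P2"}
protein_shift = {"P1": ("ER", "Mito"), "P2": ("Cytosol", "Nucleus")}
site_shift = {"P1_S10": ("ER", "Mito"), "P2_T55": ("Cytosol", "PM")}

candidates = candidate_sites(changed, site_to_protein, relocalized)
concordant = concordant_sites(candidates, site_to_protein, site_shift, protein_shift)
```

In this toy case the intersection alone retains two sites, while requiring concordance retains only one, which shows how much the reported list could depend on a step that is never specified.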

(at least 90%) do not relocalize, these additional replicates are unnecessary as proper calibration of FDR can be achieved by computing an empirical null using large-scale data analysis methods such as permuting sample labels.[48] More importantly, however, the assumption of consistent organelle profiles was not tested in any studies where the MR method was applied and, as shown here (Figure 2), organelle profiles can change significantly across conditions; thus, this approach is not generalizable.

6. Modeling to Address the Extended Questions

The success of the MR method should not be undersold, having been used successfully to find differential localizations in several studies.[20,21,25,26] However, the question is whether some protein relocalization has been overlooked and if the same results could be obtained with fewer experiments.
Formal testing procedures do have their place when performed appropriately. Consider the study of Shin et al. where a Bayesian non-parametric two sample test was used to determine whether protein profiles were perturbed between two conditions.[24] Amongst those proteins with perturbed profiles, those that demonstrated relocalization toward the mitochondria were of particular interest within this study. However, the mitochondrial profile was condition-dependent; hence, a marker-agnostic approach was not appropriate and instead the squared Mahalanobis distance to the profile of mitochondrial markers in each condition was used to identify the movements toward the mitochondria.
Multi-localization, differential localization and the role of PTMs in localization are challenging questions for profiling spatial proteomics. To develop a methodology to answer these questions they first need precise definitions. From here unified methods can be developed with clear assumptions so that the extent of their applicability can be assessed. We believe that to address the extended questions of profiling spatial proteomics we must model the data. This brings with it new challenges. To date, classification approaches have usually made few assumptions, and considerations for data processing, for example peptide spectral match (PSM) aggregation and profile normalization, have received little attention. In particular, a variety of methods are used to normalize profiles (max signal, sum normalization, relative to heavy spike-in). For support vector machines and neural networks, so long as the proteins’ profiles still cluster, the precise normalization approach is unlikely to have a significant impact. However, when modeling the data, we often have to make assumptions about the underlying distributions, and the normalization method may well invalidate assumptions about the distributions of the profiles upon which our models are based. As such, it will be necessary to devote more attention to the proper processing of spatial proteomics data. Furthermore, modeling replicates allows a direct assessment of classification confidence and reliability, in contrast to the common approach of concatenating replicates before SVM classification,[18,19,23] which precludes such inferences.

7. Quantifying Uncertainty with Bayesian Modeling

Further benefit can be drawn by moving to a Bayesian inference framework, which enables the quantification of uncertainty.[49] For the classification task, we have already presented solutions to this problem through the T-Augmented Gaussian Mixture model (TAGM; and its non-parametric counterpart), which attempts to directly model the data.[50,51] The modeling framework allows treatment of markers as strong priors and learning of the truly representative distribution of the organelle from the data. This allows us to distinguish proteins that have confident localizations from those that are uncertain between two or more localizations.
Modeling approaches are not without their limitations and often result in increased computational burden. For example, TAGM is currently limited to modeling the data at the protein level and discards potentially valuable information at the peptide and PSM level.
More elaborate models can enable simultaneous assignment of proteins to organelles and novelty detection by allowing proteins to either be assigned to an annotated organelle or one that has not been manually annotated. For example, Crook et al. propose a semi-supervised Bayesian approach and uncover a novel group of Saccharomyces cerevisiae proteins trafficking from the ER to the early Golgi apparatus.[52] Additionally, differential localization could also be elucidated using joint models across conditions, with uncertainty quantification to assist in ranking candidates for future experimental investigation. Similarly, the relationship between post-translational modification and differential localization could be examined by comparing profiles for


the modified and unmodified protein forms. Through careful consideration of the question in hand, well-designed models have the capacity to uncover many valuable insights which may be missed by ad hoc approaches.

8. Appropriate Models Depend on Experimental Design

The biological system of interest, experimental design and statistical or mathematical model in question are not independent or modular entities. They are better thought of as parts of a whole and can vitally inform each other. If quantification of protein proportions in different compartments is desired, then the model, design and system should reflect this, perhaps at the cost of other important quantities such as the resolution of the subcellular niches interrogated. Furthermore, subcellular fractionation may not need to be exactly replicable if the desire is to achieve the maximum possible separation of organelles, but the applied model should be aware of this choice. Finally, if the translocation of proteins of interest is between two organelles with similar biochemical properties, then the experiment can be designed to ensure maximal separation of these subcellular niches and prior information about these properties can be embedded into a statistical model. Design, modeling, and experiment are an iterative process that allows each to build upon the former to gain the most information possible.

9. Discussion and Outlook

The modern inception of profiling protein abundance across organelle-enriched fractions has been hugely successful in furthering our understanding of protein function.
As we re-purpose this method to examine multi-localization, differential localization, and the interplay of post-translational modification and localization, it is an opportune moment to consider how best to marry novel experimental and computational strategies. One framing of all these questions could be achieved by quantifying the proportion of the protein in each localization assayed. From these proportions, one could then assign the primary localization of a protein, whether it is multi-localized within a single condition, and whether the proportions are affected by condition or post-translational modification. However, our experimental constraints preclude straightforward determination of localization proportions. Thus, in the near term, we expect that approaches that tackle specific questions, through either a formalized testing approach or a question-specific model, are likely to predominate.
Precisely stated questions and careful interpretation of results are crucially important, regardless of the approach taken. In this respect, combining profiling spatial proteomics with lower throughput experiments will remain a valuable approach going forward. For example, by combining profiling spatial proteomics with targeted microscopy-based techniques, differential localization can be further investigated to determine the cell-to-cell variability.
The concept of cellular fractionation as originally described by Albert Claude and Christian de Duve has proved to be an incredibly fruitful approach to study macromolecular localization and activity.[13,53] Future studies can help to uncover the degree to which proteins reside in multiple subcellular niches and expand our understanding of post-translational regulation of protein localization. Through consideration and refinement of the experimental techniques and computational analyses, the enduring approach of cellular fractionation can be fully leveraged to capture the intricacies of protein localization underlying physiological processes and disease aetiology.

Acknowledgements

O.M.C. is a Wellcome Trust Mathematical Genomics and Medicine student and acknowledges generous funding from the School of Clinical Medicine, Cambridge. T.S. and M.E. are supported by the Medical Research Council, Grant/Award number: 5TR00; Wellcome Trust, Grant/Award numbers: 110170/Z/15/Z, 110071/Z/15/Z.

Conflict of Interest

The authors declare no conflict of interest.

Author Contributions

O.M.C. and T.S. contributed equally to this work. Author contributions are described according to CRediT standards. O.M.C. and T.S. contributed to the conceptualization, writing the original draft, writing the review and editing, visualization, and formal analysis. M.E. contributed to the conceptualization, writing the original draft, writing the review and editing. K.S.L. contributed to the conceptualization, writing the original draft, supervision, project administration, and funding acquisition.

Keywords

organelle, protein localization

Received: March 20, 2020
Revised: May 18, 2020
Published online:

[1] T. J. Gibson, Trends Biochem. Sci. 2009, 34, 471.
[2] C. K. Barlowe, E. A. Miller, Genetics 2013, 193, 383.
[3] R. Puertollano, S. M. Ferguson, J. Brugarolas, A. Ballabio, EMBO J. 2018, 37.
[4] A. Y. Lee, W. Chen, S. Stippec, J. Self, F. Yang, X. Ding, S. Chen, Y. C. Juang, M. H. Cobb, Proc. Natl. Acad. Sci. USA 2012, 109, 16841.
[5] E. A. Balta, M. T. Wittmann, M. Jung, E. Sock, B. M. Haeberle, B. Heim, F. von Zweydorf, J. Heppt, J. von Wittgenstein, C. J. Gloeckner, D. C. Lie, Front. Mol. Neurosci. 2018, 11, 211.
[6] F. Christian, E. L. Smith, R. J. Carmody, Cells 2016, 5, 12.
[7] H. Kiu, S. E. Nicholson, Growth Factors 2012, 30, 88.
[8] M. Mani, C. Chen, V. Amblee, H. Liu, T. Mathur, G. Zwicke, S. Zabad, B. Patel, J. Thakkar, C. J. Jeffery, Nucleic Acids Res. 2015, 43, D277.
[9] V. Hung, S. S. Lam, N. D. Udeshi, T. Svinkina, G. Guzman, V. K. Mootha, S. A. Carr, A. Y. Ting, eLife 2017, 6, e24463.
[10] J. Y. Youn, W. H. Dunham, S. J. Hong, J. D. R. Knight, M. Bashkurov, G. I. Chen, H. Bagci, B. Rathod, G. MacLeod, S. W. M. Eng, S. Angers, Q. Morris, M. Fabian, J. F. Côté, A. C. Gingras, Mol. Cell 2018, 69, 517.
[11] P. J. Thul, L. Åkesson, M. Wiking, D. Mahdessian, A. Geladaki, H. Ait Blal, T. Alm, A. Asplund, L. Björk, L. M. Breckels, A. Bäckström, F.


Danielsson, L. Fagerberg, J. Fall, L. Gatto, C. Gnann, S. Hober, M. Hjelmare, F. Johansson, S. Lee, C. Lindskog, J. Mulder, C. M. Mulvey, P. Nilsson, P. Oksvold, J. Rockberg, R. Schutten, J. M. Schwenk, Å. Sivertsson, E. Sjöstedt, et al., Science 2017, 356, eaal3321.
[12] B. Cox, A. Emili, Nat. Protoc. 2006, 1, 1872.
[13] C. de Duve, J. Cell Biol. 1971, 50, 20d.
[14] T. P. J. Dunkley, R. Watson, J. L. Griffin, P. Dupree, K. S. Lilley, Mol. Cell. Proteomics 2004, 3, 1128.
[15] L. J. Foster, C. L. de Hoog, Y. Zhang, Y. Zhang, X. Xie, V. K. Mootha, M. Mann, Cell 2006, 125, 187.
[16] D. J. L. Tan, H. Dvinge, A. Christoforou, P. Bertone, M. A. Arias, K. S. Lilley, J. Proteome Res. 2009, 8, 2667.
[17] S. L. Hall, S. Hester, J. L. Griffin, K. S. Lilley, A. P. Jackson, Mol. Cell. Proteomics 2009, 8, 1295.
[18] A. Christoforou, C. M. Mulvey, L. M. Breckels, A. Geladaki, T. Hurrell, P. C. Hayward, T. Naake, L. Gatto, R. Viner, A. Martinez Arias, K. S. Lilley, Nat. Commun. 2016, 7, 8992.
[19] A. Geladaki, N. K. Britovšek, L. M. Breckels, T. S. Smith, O. L. Vennard, C. M. Mulvey, O. M. Crook, L. Gatto, K. S. Lilley, Nat. Commun. 2019, 10, 331.
[20] D. N. Itzhak, S. Tyanova, J. Cox, G. H. Borner, eLife 2016, 5, e16950.
[21] D. N. Itzhak, C. Davies, S. Tyanova, A. Mishra, J. Williamson, R. Antrobus, J. Cox, M. P. Weekes, G. H. H. Borner, Cell Rep. 2017, 20, 2706.
[22] M. Jadot, M. Boonen, J. Thirion, N. Wang, J. Xing, C. Zhao, A. Tannous, M. Qian, H. Zheng, J. K. Everett, D. F. Moore, D. E. Sleat, P. Lobel, Mol. Cell. Proteomics 2017, 16, 194.
[23] N. Krahmer, B. Najafi, F. Schueder, F. Quagliarini, M. Steger, S. Seitz, R. Kasper, F. Salinas, J. Cox, N. H. Uhlenhaut, T. C. Walther, R. Jungmann, A. Zeigerer, G. H. H. Borner, M. Mann, Dev. Cell 2018, 47, 205.e7.
[24] J. J. H. Shin, O. M. Crook, A. Borgeaud, J. Cattin-Ortolá, S. Y. Peak-Chew, J. Chadwick, K. S. Lilley, S. Munro, bioRxiv: 841965 2019.
[25] A. K. Davies, D. N. Itzhak, J. R. Edgar, T. L. Archuleta, J. Hirst, L. P. Jackson, M. S. Robinson, G. H. H. Borner, Nat. Commun. 2018, 9.
[26] J. Hirst, D. N. Itzhak, R. Antrobus, G. H. H. Borner, M. S. Robinson, PLoS Biol. 2018, 16, e2004411.
[27] J. S. Andersen, C. J. Wilkinson, T. Mayor, P. Mortensen, E. A. Nigg, M. Mann, Nature 2003, 426, 570.
[28] L. M. Breckels, C. M. Mulvey, K. S. Lilley, L. Gatto, F1000Research 2016, 5, 2926.
[29] L. Gatto, L. M. Breckels, T. Burger, D. J. H. Nightingale, A. J. Groen, C. Campbell, N. Nikolovski, C. M. Mulvey, A. Christoforou, M. Ferro, K. S. Lilley, Mol. Cell. Proteomics 2014, 13, 1937.
[30] L. M. Orre, M. Vesterlund, Y. Pan, T. Arslan, Y. Zhu, A. Fernandez Woodbridge, O. Frings, E. Fredlund, J. Lehtiö, Mol. Cell 2019, 73, 166.
[31] G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, Vol. 112, Springer, New York, NY 2013.
[32] M. Ester, H. P. Kriegel, J. Sander, X. Xu, in KDD’96: Proc. of the Second Int. Conf. on Knowledge Discovery and Data Mining, AAAI Press, Portland, OR 1996.
[33] L. J. P. van der Maaten, G. E. Hinton, J. Mach. Learn. Res. 2008, 9, 2579.
[34] L. M. Breckels, L. Gatto, A. Christoforou, A. J. Groen, K. S. Lilley, M. W. B. Trotter, J. Proteomics 2013, 88, 129.
[35] L. Gatto, L. M. Breckels, S. Wieczorek, T. Burger, K. S. Lilley, Bioinformatics 2014, 30, 1322.
[36] S. Tyanova, T. Temu, P. Sinitcyn, A. Carlson, M. Y. Hein, T. Geiger, M. Mann, J. Cox, Nat. Methods 2016, 13, 731.
[37] K. Barylyuk, L. Koreny, H. Ke, S. Butterworth, O. M. Crook, I. Lassadi, V. Gupta, E. Tromer, T. Mourier, T. J. Stevens, L. M. Breckels, A. Pain, K. S. Lilley, R. F. Waller, bioRxiv: 2020.04.23.057125 2020.
[38] O. M. Crook, K. S. Lilley, L. Gatto, P. D. W. Kirk, arXiv:1903.02909 [stat] 2019.
[39] T. F. Wu, C. J. Lin, R. C. Weng, J. Mach. Learn. Res. 2004, 5, 975.
[40] X. C. Fan, J. A. Steitz, Proc. Natl. Acad. Sci. USA 1998, 95, 15293.
[41] B. Zhang, L. Käll, R. A. Zubarev, Mol. Cell. Proteomics 2016, 15, 1467.
[42] C. Lazar, L. Gatto, M. Ferro, C. Bruley, T. Burger, J. Proteome Res. 2016, 15, 1116.
[43] J. D. O’Connell, J. A. Paulo, J. J. O’Brien, S. P. Gygi, J. Proteome Res. 2018, 17, 1934.
[44] P. Feist, A. B. Hummon, Int. J. Mol. Sci. 2015, 16, 3537.
[45] M. Wong, S. Munro, Science 2014, 346, 1256898.
[46] M. L. M. Jongsma, I. Berlin, J. Neefjes, Trends Cell Biol. 2015, 25, 112.
[47] P. M. J. Beltran, R. A. Mathias, I. M. Cristea, Cell Syst. 2016, 3, 361.e6.
[48] B. Efron, Large-scale inference: empirical Bayes methods for estimation, testing, and prediction, Vol. 1, Cambridge University Press, Cambridge 2012.
[49] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, D. B. Rubin, Bayesian Data Analysis, Chapman and Hall/CRC, Boca Raton, FL 2013.
[50] O. M. Crook, C. M. Mulvey, P. D. W. Kirk, K. S. Lilley, L. Gatto, PLoS Comput. Biol. 2018, 14, e1006516.
[51] O. M. Crook, L. M. Breckels, K. S. Lilley, P. D. W. Kirk, L. Gatto, F1000Research 2019, 8, 446.
[52] O. Crook, A. Geladaki, D. J. H. Nightingale, O. Vennard, K. S. Lilley, L. Gatto, P. D. Kirk, bioRxiv: 2020.05.05.078345 2020.
[53] A. Claude, J. Exp. Med. 1946, 84, 61.
