Moving Profiling Spatial Proteomics Beyond Discrete
www.proteomics-journal.com
The ORCID identification number(s) for the author(s) of this article can be found under https://doi.org/10.1002/pmic.201900392

© 2020 The Authors. Proteomics published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

DOI: 10.1002/pmic.201900392

2. Identifying the Main Localization

A single protein copy can only be present in one localization at any given time, although it may transit or traffic within the cell during its lifecycle, from its point of synthesis to the location(s) where it functions, and on to where it is finally degraded. Since profiling spatial proteomics assays multiple cells, each containing multiple protein copies, it captures protein copies
Proteomics 2020, 1900392 1900392 (1 of 10) © 2020 The Authors. Proteomics published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.advancedsciencenews.com www.proteomics-journal.com
In the first instance, the selection of appropriate marker proteins is a vital step in the classification workflow, and the curation of markers is typically dataset dependent to ensure they properly represent a single localization.[28,29] However, problems can arise when annotating compartments with only a few well-annotated marker proteins including, amongst others, endosomes, poorly characterized localizations such as cytosolic granules, and cells with highly specialized organelles and/or with poor annotation.[37]

Independently of the precise manner in which markers are selected, they inevitably represent a biased sample,[38] with a skew toward well-documented proteins and subcellular locations. Therefore, the accuracy of marker classification does not indicate the classification accuracy for non-marker proteins, contrary to what is sometimes suggested,[21] because non-markers can be expected to be classified with lower accuracy.[38]

Application of SVMs for classification has been a very popular approach to date; however, the interpretation of SVM scores requires particular care. The SVM is a discriminative model, rather than a generative one, and so SVM scores do not represent probabilities. Indeed, the top SVM score for a multi-class classification may not correspond to the most probable localization;[39] that is, SVM scores need not be consistent with class probabilities. If probabilities are desired, these can be approximated using, for example, quadratic optimization,[39] as others have done in profiling spatial proteomics studies,[30] but such approximations may be arbitrarily inaccurate. Alternatively, they can be estimated using additional hold-out data not used to train the classifier, but such data are rarely available given the number of marker proteins per class.

3.1. The Extended Questions

While classification algorithms are well established and valuable for identifying the main localization, they are inadequate to address the extended questions of mixed protein localization, differential localization upon cell state perturbation, and the interplay between post-translational modifications (PTMs) and protein localization. Below, we set out what we consider to be clear definitions for these terms, as the intuitive interpretations from a biological standpoint may not match up to what we can assay with profiling spatial proteomics.

3.1.1. Multi-Localized Proteins

ing spatial proteomics cannot distinguish between these two possibilities, contrary to single-cell methods including, amongst others, microscopy-based approaches.[11] Furthermore, we cannot distinguish the root of multi-localization: the process of pools of proteins dynamically interchanging is fundamentally different from discrete pools of translated proteins whose components never interchange, yet both will simply be observed as a multi-localization.

3.1.2. Protein Relocalization

This may occur in response to internal and external cues. We define these changes in either comparative or time series profiling spatial proteomics experiments as instances of differential localization. Alterations in protein synthesis or protein degradation may also be observed as changes in localization, and hence we refer to this eventuality as differential localization to avoid any inference of active protein movement. Where a consistent change in localization occurs for the majority of protein copies assayed, one may observe a discrete differential localization from one localization to another. However, in many cases, the observation will be one of proportional changes in localization.

3.1.3. Post-Translational Modifications

These may regulate protein localization. A commonplace example is phosphorylation.[4–6] In spatial proteomics studies, researchers typically aim to identify differences in protein localization governed by PTM status. A crucial consideration is that we cannot infer whether the modification is specifically regulating localization, as it is also possible that the localization is regulated by other factors and the protein is differentially modified according to its localization. Therefore, it is appropriate to avoid asserting these events as PTM-dependent localization; rather, these are best described as concurrent changes in PTM status and protein localization.

All of the above questions are, in theory, answerable with current profiling spatial proteomics techniques. However, addressing these questions requires the development of new computational approaches. With these extended questions in hand, we turn to the important technical considerations for profiling spatial proteomics as we proceed beyond primary localization classification.
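The earlier caution that the top SVM score need not mark the most probable localization can be made concrete with a toy calculation. This is a minimal sketch under stated assumptions: it uses a one-vs-rest setup in which each class score is passed through its own Platt sigmoid, 1/(1 + exp(A·f + B)), and the results are renormalized to sum to one. The compartment names, scores and sigmoid parameters are hypothetical, and this simplification is not the quadratic-optimization estimate cited above.

```python
import math

def platt(score, a, b):
    """Platt sigmoid: maps an SVM decision score to P(class | score)."""
    return 1.0 / (1.0 + math.exp(a * score + b))

# Hypothetical one-vs-rest decision scores for a single protein.
scores = {"ER": 0.9, "Golgi": 0.7}
# Hypothetical per-class sigmoid parameters (A, B), as if fitted on held-out data.
params = {"ER": (-1.0, 1.5), "Golgi": (-4.0, 0.0)}

raw = {k: platt(scores[k], *params[k]) for k in scores}
total = sum(raw.values())
probs = {k: v / total for k, v in raw.items()}  # renormalize so probabilities sum to 1

top_by_score = max(scores, key=scores.get)
top_by_prob = max(probs, key=probs.get)
print(top_by_score, top_by_prob)  # ER Golgi
```

Because each class has its own fitted sigmoid, ranking by raw score and ranking by calibrated probability can disagree. In practice the sigmoid parameters would have to be fitted on held-out data, which, as noted above, is rarely available for marker proteins.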
Box 1.
Proportional and Relative Quantification

It is not possible to estimate the proportion of protein resident in each of its subcellular niches using relative protein quantification. To see this, consider the following argument.

During relative MS quantification there is a sampling rate c (with entry $c_i$ for fraction i), which represents the proportion of protein analyzed relative to the absolute amount of protein in that fraction. This sampling rate is corrupted by the loss of material as a result of protein extraction and by the proportion of the resultant sample analyzed by MS. The former is unknown, whilst the latter is measurable; consequently, c is unknowable.

First, let $f_a$ be the proportions of a protein resident in organelle a across the fractions, and likewise for $f_b$. Thus, $\sum_i f_{ai} = 1$ and $\sum_i f_{bi} = 1$, where i denotes the ith fraction. Under MS quantification, the proportions are transformed according to c (products taken fraction by fraction), such that we observe $c f_a$ and $c f_b$. These are normalized to obtain $\hat{c}_a f_a$ and $\hat{c}_b f_b$, to place them on the same scale, for example

$$\hat{c}_j = \frac{c}{\sum_i c_i f_{ji}}, \qquad j = a, b.$$

We have defined the measured profile of a protein with respect to its true proportions, and now consider a protein $f_d$ which has mixed localization between niches a and b, with proportions $\pi$ and $1 - \pi$, respectively. Thus, $f_d = \pi f_a + (1 - \pi) f_b$, and so when measured by MS quantification we obtain

$$\hat{c}_d \left( \pi f_a + (1 - \pi) f_b \right). \qquad (1)$$

However, mixing the relative proportions of observed profiles $\hat{c}_a f_a$ and $\hat{c}_b f_b$ results in

$$\pi \hat{c}_a f_a + (1 - \pi) \hat{c}_b f_b, \qquad (2)$$

and (1) and (2) are not equal in general, since the normalizing factors $\hat{c}_a$, $\hat{c}_b$ and $\hat{c}_d$ generally differ when the sampling rate varies across fractions.
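A small numerical check of Equations (1) and (2) can make the Box 1 argument concrete. This is a sketch with hypothetical values: four fractions, a per-fraction sampling-rate vector c, and two organelle profiles f_a and f_b. It reads products such as ĉ_j f_j as fraction-by-fraction (element-wise) products; the helper name `normalize_observed` is illustrative only.

```python
def normalize_observed(c, f):
    """Observed relative profile c_hat * f, with c_hat = c / sum_i(c_i * f_i)."""
    scale = sum(ci * fi for ci, fi in zip(c, f))
    return [ci * fi / scale for ci, fi in zip(c, f)]

# Hypothetical per-fraction sampling rates (unknowable in a real experiment).
c = [0.2, 0.5, 1.0, 0.4]
# Hypothetical true proportions across fractions for organelles a and b (each sums to 1).
f_a = [0.6, 0.3, 0.1, 0.0]
f_b = [0.0, 0.1, 0.3, 0.6]
pi = 0.5  # true share of the protein resident in organelle a

f_d = [pi * a + (1 - pi) * b for a, b in zip(f_a, f_b)]
eq1 = normalize_observed(c, f_d)  # Equation (1): what MS quantification reports
obs_a = normalize_observed(c, f_a)
obs_b = normalize_observed(c, f_b)
eq2 = [pi * a + (1 - pi) * b for a, b in zip(obs_a, obs_b)]  # Equation (2): mixing relative profiles

for x, y in zip(eq1, eq2):
    print(f"{x:.3f}  {y:.3f}")
```

Setting every entry of c to the same value makes the two columns identical, since each ĉ_j then reduces to a vector of ones; any variation in sampling rate across fractions breaks that equality.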
important to consider that missing values may be "missing not at random," for example, dependent on treatment, or "missing at random," for example, due to stochastic processes inherent to data-dependent acquisition MS.[41] The latter is especially problematic for label-free quantitation (LFQ) approaches, and despite efforts to identify the optimal imputation approach for LFQ,[42] the effect of imputation on profiling spatial proteomics experiments has not been considered. Isobaric labeling, used in many spatial proteomics studies,[14,18–21] significantly reduces the proportion of missing values.[43] However, to address most of the extended problems, a greater number of isobaric multiplexes is required, which reduces the total number of peptides or proteins quantified in all samples. This is made worse when separate isobaric multiplexes are used for different experimental conditions, as it becomes more difficult to determine whether the missing values between multiplexes occur at random.

Second, non-targeted proteomics does not measure the absolute copy number of proteins but rather the abundance of a given protein relative to the total amount of protein in the sample. To compare profiles between proteins with very different cellular abundances, the fraction abundances are typically scaled to generate an abundance profile across the fractions. Isobaric tagging allows the quantification values for each fraction to be derived from the exact same peptide-spectrum match (PSM), which reduces the variance of quantification significantly compared to LFQ.[43] However, the resultant abundances are still relative to the amount of protein that was labeled. In either case, there are different total quantities of protein present in each experimental fraction, and thus we cannot estimate the mixture proportions from the observed profiles (see Box 1). To demonstrate this, we simulated multi-localization using previously published data.[19] Relative quantification values were adjusted to approximate proportions of protein in each fraction. Multi-localization between the cytosol and mitochondria was then simulated by combining proportional quantification profiles for the respective marker proteins, which were then converted to relative abundances to observe how they would appear in a profiling spatial proteomics experiment. The comparison with directly combining relative profiles in the same ratios indicates that relative profiles do not capture multi-localizations accurately (Figure 1). Resolving this problem requires converting relative protein abundances within fractions to proportional abundances across fractions, but this is difficult to achieve by simply quantifying the total proportion of protein in each fraction, since extraction of the protein results in a loss of material.[44] An alternative and promising approach for SILAC-compatible systems is to spike in a consistent heavy-labeled reference sample to achieve proportional quantification.[20]

Third, when studying differential localization, it is important to consider that cell lysis, organelle morphology and/or cellular sub-structure may be considerably altered across conditions. For example, in order to determine the content of specific vesicles captured by specific golgins, Shin et al. relocated the golgins to the mitochondria by replacing their Golgi-targeting domains with a mitochondrial transmembrane domain.[24] This relocalization leads to increased mitochondrial "zippering"[45] and thus the mitochondria and interacting peroxisomes sediment at a lower centrifugation speed (Figure 2). Furthermore, we have previously observed that the truncated G1 and S phases in mouse embryonic stem cells have a significant impact on the resolution of Golgi profiles.[18] Given that the morphology of many organelles is altered during mitosis,[46] one would expect that conditions which alter the cell cycle stage distribution may significantly affect organelle morphology.

Finally, the lack of a suitable number of ground truths or strong prior expectations for all the extended questions severely hampers the development, implementation and comparison of tools to address them. Consider PTMs as an example: the role of phosphorylation in signaling pathways is well appreciated,[5,7] but the number of experimentally validated phosphorylation sites with known impact on localization is limited compared to our knowledge about the main subcellular localization of proteins. As such, computational methods to examine the
Figure 1. Relative quantification cannot be used to estimate the representation of proteins in multiple localizations. A) Profiles for 10–90% cytosol from
top left to bottom right. Simulated multi-localization profiles presented in color gradient. Equivalent profiles from mixing relative profiles shown in black.
B) Projection of simulated profiles onto principal components (color gradient). Cytosolic and mitochondrial marker proteins shown in yellow and blue,
respectively. C) Projection of relative profile mixtures.
role of phosphorylation in protein localization cannot take advantage of ground truths, or even strong prior expectations, and the assessment of the validity of the results from any method is not straightforward. With these considerations in mind, we now examine the attempts that have been made so far to address the extended questions with profiling spatial proteomics.

5. First Attempts to Address the Extended Questions

The classification approach has been highly effective at identifying the main localization of proteins; however, this framework has led the field to adopt sub-optimal methods to answer the extended problems (see Table 2).

As a first example, differing estimates have been reported for the proportion of proteins that are multi-localized using profiling spatial proteomics. We have previously suggested that approximately half the proteome is multi-localized, since this half cannot be classified to a discrete localization, although we noted there are many explanations for this, not all of which relate to multi-localization.[19] Taking a similar approach, Orre et al. observed consistent classification between biological replicates, with further analysis indicating that inconsistent classifications were likely due to inaccurate quantification.[30] From this, they suggest that less than 10% of the proteome is multi-localized.

In both these studies, a classification schema designed to determine the main localization was repurposed into an estimation of the proportion of multi-localized proteins, and the disagreement is likely the result of poorly considered definitions of multi-localization. Indeed, as previously noted, successful identification of the main localization for a given protein is not mutually exclusive with that protein being multi-localized, and absence of primary classification has many explanations.

Similarly, dynamic localization has also been studied within the classification framework by treating the problem as two
separate classification tasks and comparing the classifications obtained for control and treatment.[47] Whilst this is a valid approach, it can only identify clear cases of discrete differential localization and misses smaller changes in proportional localization.
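To see why comparing two discrete classifications only catches clear-cut moves, consider a protein whose share of two compartments shifts between conditions. This is a minimal sketch under loud assumptions: the compartment proportions are hypothetical and treated as known (which, per Box 1, relative quantification does not provide), each "classification" simply reports the compartment with the largest share, and the total variation distance is a swapped-in measure of proportional change, not one used in the cited studies.

```python
def main_localization(proportions):
    """Hypothetical discrete classifier: report the compartment with the largest share."""
    return max(proportions, key=proportions.get)

def flagged_by_classification(control, treatment):
    """Two-task approach: flag a protein only if its discrete label changes."""
    return main_localization(control) != main_localization(treatment)

def proportional_shift(control, treatment):
    """Total variation distance between the two compartment distributions."""
    keys = set(control) | set(treatment)
    return 0.5 * sum(abs(control.get(k, 0.0) - treatment.get(k, 0.0)) for k in keys)

# A 25-point shift from cytosol to mitochondria that leaves the main label unchanged.
control = {"cytosol": 0.80, "mitochondria": 0.20}
treatment = {"cytosol": 0.55, "mitochondria": 0.45}
print(flagged_by_classification(control, treatment))     # False
print(round(proportional_shift(control, treatment), 2))  # 0.25

# A discrete move is caught by the two-task approach.
moved = {"cytosol": 0.10, "mitochondria": 0.90}
print(flagged_by_classification(control, moved))         # True
```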
Figure 2. Mitochondrial and peroxisomal proteins show a shift toward earlier fractions in 2 out of 3 replicates when golgin-97 is ectopically expressed. Data used with permission from [24].

Table 1. Technical challenges in profiling spatial proteomics with particular relevance to addressing the extended problems. Missing values are problematic for the simple task of determining primary localization, but even more so when addressing the extended problems. Similarly, ground truths are required for the simple task, but these are usually readily available. Relative quantification and inconsistent organelle morphology are only challenging for the extended problems.
• Missing values: more samples result in more missing values, especially when comparing diverse conditions. Relevant to: differential localization; PTMs.
• Relative quantification: protein quantification relative to the fraction cannot be readily converted to protein proportion in each fraction. Relevant to: multi-localization.
• Inconsistent organelle morphology and/or cell composition: treatments can alter organelle morphology/cell composition, invalidating implicit assumptions of the testing framework. Relevant to: differential localization.
• Insufficient ground truths or strong prior expectations: we have strong expectations for protein main localization but far fewer expectations for the extended questions. Thus, proposed solutions cannot use prior knowledge, and assessment of the quality of results obtained is difficult. Relevant to: multi-localization; differential localization; PTMs.

Table 2. Studies addressing the extended problems of multi-localization, differential localization, and the role of post-translational modifications. The computational approach used and study findings are briefly summarized.

To address the extended questions of multi-localization, differential localization and PTMs in a classification-independent manner, informal methods have been introduced. These approaches have not clearly stated their assumptions or adequately justified their methodology, and the community has not adopted a standardized, rigorous approach. To elaborate, let us consider the approaches of Krahmer et al., who attempt to address all three of these questions.[23] First, they extend the protein correlation profiling approach with the goal of determining dual localizations (see Box 2). Second, to identify differences in the profiles between conditions, they propose a test based on correlating intra- and inter-condition profiles (see Box 3). Finally, to analyze the spatial phosphoproteome, they apply several filtering steps along with their proposed method for correlating intra- and inter-condition profiles (see Box 4). In all three cases, assumptions are not explicitly stated or evaluated, the testing framework involves
heuristic filtering step(s), and the test itself is frequently not clearly defined.

As an alternative method to identify differential localization, Itzhak et al. proposed an informal testing approach denoted the movement-reproducibility (MR) method.[20] This method uses the Mahalanobis distance between inter-condition profiles, as well as the correlation between these distances. Though this approach could be formalized, a null hypothesis is never stated. Given that the distances between proteins of the same organelle and between proteins of different organelles can vary considerably within an experiment, it is also unclear, in general, how appropriate a distance approach is. To be more precise, what counts as a big or interesting distance, as a formal test statistic, depends on which organelles the relocalization is between. Thus, it is hard to fully justify an approach that is agnostic to its spatial context.

To estimate the FDR of the MR method, the suggestion is to perform a "mock" experiment, comparing control versus control (requiring three additional replicates), so that the number of false positives can be estimated for a given cut-off value. This is based on the implicit but unstated assumption that organelle profiles remain similar across conditions. Assuming that most proteins
Box 2.
Krahmer et al. determining dual localizations

To determine dual localization, Krahmer et al. take the following approach. For each protein:
• The most likely primary organelle is determined using the SVM.
• The median profile of the primary organelle is combined with the median organelle profiles of the markers.
• The correlation values between the protein/peptide profile and these in silico profiles are determined.
• Proteins/peptides with correlation > 0.4 are considered unreliable assignments.
• The alpha value is reported as a quantitative measure of second organelle localization.

Critique
The mixing of observed marker profiles can provide an expectation for multi-localization. However, as we have already demonstrated in Box 1, one has to be careful when drawing conclusions from mixed profiles. The first unstated part of the method is how the marker profiles are combined; presumably they were mixed in different possible proportions, although they might simply have been averaged. Furthermore, it is unclear which method of correlation was applied. It appears that the authors incorrectly posit that correlation > 0.4 was unreliable, rather than < 0.4. Finally, the authors report an alpha value, but never state how it was computed nor what it actually represents. The distribution of this measure is never reported, so we cannot determine how the alpha value differs between proteins with known single or dual localizations.
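Because Box 2 leaves the mixing step unstated, the sketch below shows one possible reading: two organelle median profiles are mixed over a grid of proportions, and the proportion giving the highest Pearson correlation with the protein's profile is reported. The function names, the 11-point grid, the use of Pearson correlation, the interpretation of alpha as the weight on the second organelle, and all profiles are assumptions for illustration, not Krahmer et al.'s implementation; the mixed profiles are also subject to the Box 1 caveat about relative quantification.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def best_mixture(profile, primary, secondary, steps=10):
    """Mix two organelle profiles at weights 0, 1/steps, ..., 1 on the secondary
    organelle (an assumed meaning of alpha) and return (alpha, correlation)
    for the in silico mixture that best correlates with `profile`."""
    best = None
    for k in range(steps + 1):
        alpha = k / steps
        mix = [(1 - alpha) * p + alpha * s for p, s in zip(primary, secondary)]
        r = pearson(profile, mix)
        if best is None or r > best[1]:
            best = (alpha, r)
    return best

# Hypothetical median marker profiles and a protein profile.
primary = [0.6, 0.3, 0.1, 0.0]
secondary = [0.0, 0.1, 0.3, 0.6]
protein = [0.42, 0.24, 0.16, 0.18]  # equals 0.7 * primary + 0.3 * secondary
alpha, r = best_mixture(protein, primary, secondary)
print(alpha, round(r, 3))
```

Even in this explicit form, the choices of grid, correlation type and threshold all change the reported alpha, which is why the critique asks for them to be stated.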
Box 3.
Krahmer et al. determining changes in localization

To determine changes in localization, Krahmer et al. take the following approach.
• Within each condition, compute the correlation between the quantitative profiles and retain them only if their maximum correlation is greater than 0.5. Precisely, the maximum correlation for the ith protein is

$$\rho_{\max} = \max\left\{ \rho(f_{i1}, f_{i2}),\ \rho(f_{i2}, f_{i3}),\ \rho(f_{i1}, f_{i3}) \right\}, \qquad (3)$$

where $f_{ij}$ denotes the quantitative profile for protein i in replicate j. Then, if $\rho_{\max} > 0.5$ for protein i, the profiles are retained for further analysis.
• Repeat this process for all the conditions.
• Take the top two most correlated profiles from each condition, and compute the average of the within-condition correlations and the average of the between-condition correlations for the same replicates. Making this explicit, we write $f_{ij}^{(c)}$ for protein i in replicate j and in condition c. Then compute the average of the within-condition correlations

$$\bar{\rho}_W = \frac{\rho\left(f_{ij}^{(1)}, f_{ij'}^{(1)}\right) + \rho\left(f_{ij}^{(2)}, f_{ij'}^{(2)}\right)}{2} \qquad (4)$$

and the average of the between-condition correlations

$$\bar{\rho}_B = \frac{\rho\left(f_{ij}^{(1)}, f_{ij}^{(2)}\right) + \rho\left(f_{ij'}^{(1)}, f_{ij'}^{(2)}\right)}{2}. \qquad (5)$$

• Then, for each protein, compute the difference in these quantities:

$$\delta\rho = \bar{\rho}_B - \bar{\rho}_W \qquad (6)$$

• The list is then ranked from largest to smallest.
• The approach is repeated for Spearman and Pearson correlations and the results combined.
• An FDR threshold of 0.2 is set.

Critique
The initial filtering steps are arbitrary, and there is no justification of why they are performed or of the motivation for the threshold. Both Spearman and Pearson correlations are used, which have different assumptions, so it is unclear what the combined correlation results mean. If Pearson correlations are used, then there is an implicit assumption of bivariate normality that is never checked. Furthermore, correlations cannot simply be averaged because they are not additive; rather, they obey the law of cosines and should be treated as such. It is also unclear how the different correlations were combined and what an appropriate null hypothesis is in this scenario. This makes it hard to understand how a p-value is computed and whether there was correction for multiple testing. Without clearly stating the assumptions, it is challenging to assess how appropriate these methods are, how they apply to other datasets, or even whether the methodology presented is valid. Unfortunately, because of the lack of clarity and the use of proprietary software, it is impossible to reproduce the analysis.
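Equations (3) to (6) can be written out directly, which makes the averaging step the critique objects to easy to see. This is our reading of the steps, sketched for a single protein with three replicates in each of two conditions and Pearson correlation only. The choice of (j, j′) as the most correlated replicate pair in condition 1 is one assumed interpretation of "the top two most correlated profiles", the function names are hypothetical, and the profiles are made up. Correlations are averaged arithmetically, exactly as Equations (4) and (5) do.

```python
import math
from itertools import combinations

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def rho_max(replicates):
    """Equation (3): maximum pairwise correlation between replicate profiles."""
    return max(pearson(a, b) for a, b in combinations(replicates, 2))

def delta_rho(cond1, cond2, threshold=0.5):
    """Equations (4)-(6) for one protein; returns None if it fails the rho_max filter."""
    if rho_max(cond1) <= threshold or rho_max(cond2) <= threshold:
        return None
    # Assumed choice of (j, j'): the most correlated replicate pair in condition 1.
    j, jp = max(combinations(range(len(cond1)), 2),
                key=lambda p: pearson(cond1[p[0]], cond1[p[1]]))
    rho_w = (pearson(cond1[j], cond1[jp]) + pearson(cond2[j], cond2[jp])) / 2  # Eq. (4)
    rho_b = (pearson(cond1[j], cond2[j]) + pearson(cond1[jp], cond2[jp])) / 2  # Eq. (5)
    return rho_b - rho_w                                                       # Eq. (6)

# Hypothetical profiles: condition 2 shifts the protein to the later fractions.
cond1 = [[0.50, 0.30, 0.10, 0.10], [0.45, 0.35, 0.10, 0.10], [0.50, 0.25, 0.15, 0.10]]
cond2 = [[0.10, 0.10, 0.30, 0.50], [0.10, 0.10, 0.35, 0.45], [0.10, 0.15, 0.25, 0.50]]
print(delta_rho(cond1, cond2))  # negative here: profiles agree within, not between, conditions
print(delta_rho(cond1, cond1))  # non-negative when nothing changes
```

Note that an unchanged protein does not give δρ = 0 but 1 − ρ̄_W, so its value depends on replicate noise; this is one reason a stated null hypothesis is needed before δρ can be turned into a p-value.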
Box 4.
Krahmer et al. determining changes in the phosphoproteome
(at least 90%) do not relocalize, these additional replicates are unnecessary, as proper calibration of the FDR can be achieved by computing an empirical null using large-scale data analysis methods such as permuting sample labels.[48] More importantly, however, the assumption of consistent organelle profiles was not tested in any study where the MR method was applied and, as shown here (Figure 2), organelle profiles can change significantly across conditions; thus, this approach is not generalizable.

6. Modeling to Address the Extended Questions

The success of the MR method should not be undersold, having been used successfully to find differential localizations in several studies.[20,21,25,26] However, the question is whether some protein relocalization has been overlooked and whether the same results could be obtained with fewer experiments.

Formal testing procedures do have their place when performed appropriately. Consider the study of Shin et al., where a Bayesian non-parametric two-sample test was used to determine whether protein profiles were perturbed between two conditions.[24] Amongst the proteins with perturbed profiles, those that demonstrated relocalization toward the mitochondria were of particular interest within this study. However, the mitochondrial profile was condition-dependent; hence, a marker-agnostic approach was not appropriate, and instead the squared Mahalanobis distance to the profile of mitochondrial markers in each condition was used to identify movements toward the mitochondria.

Multi-localization, differential localization and the role of PTMs in localization are challenging questions for profiling spatial proteomics. To develop a methodology to answer these questions, they first need precise definitions. From here, unified methods can be developed with clear assumptions so that the extent of their applicability can be assessed. We believe that to address the extended questions of profiling spatial proteomics we must model the data. This brings with it new challenges. To date, classification approaches have usually made few assumptions, and considerations for data processing, for example, peptide-spectrum match (PSM) aggregation and profile normalization, have received little attention. In particular, a variety of methods are used to normalize profiles (max signal, sum normalization, relative to heavy spike-in). For support vector machines and neural networks, so long as the proteins' profiles still cluster, the precise normalization approach is unlikely to have a significant impact. However, when modeling the data, we often have to make assumptions about the underlying distributions, and the normalization method may well invalidate assumptions about the distributions of the profiles upon which our models are based. As such, it will be necessary to devote more attention to the proper processing of spatial proteomics data. Furthermore, modeling replicates allows a direct assessment of classification confidence and reliability, in contrast to the common approach of concatenating replicates before SVM classification,[18,19,23] which precludes such inferences.

7. Quantifying Uncertainty with Bayesian Modeling

Further benefit can be drawn by moving to a Bayesian inference framework, which enables the quantification of uncertainty.[49] For the classification task, we have already presented solutions to this problem through the T-Augmented Gaussian Mixture model (TAGM; and its non-parametric counterpart), which attempts to model the data directly.[50,51] The modeling framework allows markers to be treated as strong priors and the truly representative distribution of each organelle to be learned from the data. This allows us to distinguish proteins that have confident localizations from those that are uncertain between two or more localizations. Modeling approaches are not without their limitations and often result in an increased computational burden. For example, TAGM is currently limited to modeling the data at the protein level and discards potentially valuable information at the peptide and PSM level.

More elaborate models can enable simultaneous assignment of proteins to organelles and novelty detection, by allowing proteins to be assigned either to an annotated organelle or to one that has not been manually annotated. For example, Crook et al. propose a semi-supervised Bayesian approach and uncover a novel group of Saccharomyces cerevisiae proteins trafficking from the ER to the early Golgi apparatus.[52] Additionally, differential localization could also be elucidated using joint models across conditions, with uncertainty quantification to assist in ranking candidates for future experimental investigation. Similarly, the relationship between post-translational modification and differential localization could be examined by comparing profiles for
the modified and unmodified protein forms. Through careful consideration of the question in hand, well-designed models have the capacity to uncover many valuable insights which may be missed by ad hoc approaches.

8. Appropriate Models Depend on Experimental Design

The biological system of interest, the experimental design and the statistical or mathematical model in question are not independent or modular entities. They are better thought of as parts of a whole and can vitally inform each other. If quantification of protein proportions in different compartments is desired, then the model, design and system should reflect this, perhaps at the cost of other important quantities such as the resolution of the subcellular niches interrogated. Furthermore, subcellular fractionation may not need to be exactly replicable if the desire is to achieve the maximum possible separation of organelles, but the applied model should be aware of this choice. Finally, if the translocation of proteins of interest is between two organelles with similar biochemical properties, then the experiment can be designed to ensure maximal separation of these subcellular niches, and prior information about these properties can be embedded into a statistical model. Design, modeling, and experiment form an iterative process that allows each stage to build upon the previous one to gain the most information possible.

credibly fruitful approach to study macromolecular localization and activity.[13,53] Future studies can help to uncover the degree to which proteins reside in multiple subcellular niches and expand our understanding of post-translational regulation of protein localization. Through consideration and refinement of the experimental techniques and computational analyses, the enduring approach of cellular fractionation can be fully leveraged to capture the intricacies of protein localization underlying physiological processes and disease aetiology.

Acknowledgements

O.M.C. is a Wellcome Trust Mathematical Genomics and Medicine student and acknowledges generous funding from the School of Clinical Medicine, Cambridge. T.S. and M.E. are supported by the Medical Research Council, Grant/Award number: 5TR00; Wellcome Trust, Grant/Award numbers: 110170/Z/15/Z, 110071/Z/15/Z.

Conflict of Interest

The authors declare no conflict of interest.

Author Contributions

O.M.C. and T.S. contributed equally to this work. Author contributions are described according to CRediT standards. O.M.C. and T.S. contributed to the conceptualization, writing the original draft, writing the review and editing, visualization, and formal analysis. M.E. contributed to the conceptualization, writing the original draft, writing the review and editing. K.S.L. contributed to the conceptualization, writing the original draft, supervision, project administration, and funding acquisition.
Danielsson, L. Fagerberg, J. Fall, L. Gatto, C. Gnann, S. Hober, M. Hjelmare, F. Johansson, S. Lee, C. Lindskog, J. Mulder, C. M. Mulvey, P. Nilsson, P. Oksvold, J. Rockberg, R. Schutten, J. M. Schwenk, Å. Sivertsson, E. Sjöstedt, et al., Science 2017, 356, eaal3321.
[12] B. Cox, A. Emili, Nat. Protoc. 2006, 1, 1872.
[13] C. de Duve, J. Cell Biol. 1971, 50, 20d.
[14] T. P. J. Dunkley, R. Watson, J. L. Griffin, P. Dupree, K. S. Lilley, Mol. Cell. Proteomics 2004, 3, 1128.
[15] L. J. Foster, C. L. de Hoog, Y. Zhang, Y. Zhang, X. Xie, V. K. Mootha, M. Mann, Cell 2006, 125, 187.
[16] D. J. L. Tan, H. Dvinge, A. Christoforou, P. Bertone, M. A. Arias, K. S. Lilley, J. Proteome Res. 2009, 8, 2667.
[17] S. L. Hall, S. Hester, J. L. Griffin, K. S. Lilley, A. P. Jackson, Mol. Cell. Proteomics 2009, 8, 1295.
[18] A. Christoforou, C. M. Mulvey, L. M. Breckels, A. Geladaki, T. Hurrell, P. C. Hayward, T. Naake, L. Gatto, R. Viner, A. Martinez Arias, K. S. Lilley, Nat. Commun. 2016, 7, 8992.
[19] A. Geladaki, N. K. Britovšek, L. M. Breckels, T. S. Smith, O. L. Vennard, C. M. Mulvey, O. M. Crook, L. Gatto, K. S. Lilley, Nat. Commun. 2019, 10, 331.
[20] D. N. Itzhak, S. Tyanova, J. Cox, G. H. Borner, eLife 2016, 5, e16950.
[21] D. N. Itzhak, C. Davies, S. Tyanova, A. Mishra, J. Williamson, R. Antrobus, J. Cox, M. P. Weekes, G. H. H. Borner, Cell Rep. 2017, 20, 2706.
[22] M. Jadot, M. Boonen, J. Thirion, N. Wang, J. Xing, C. Zhao, A. Tannous, M. Qian, H. Zheng, J. K. Everett, D. F. Moore, D. E. Sleat, P. Lobel, Mol. Cell. Proteomics 2017, 16, 194.
[23] N. Krahmer, B. Najafi, F. Schueder, F. Quagliarini, M. Steger, S. Seitz, R. Kasper, F. Salinas, J. Cox, N. H. Uhlenhaut, T. C. Walther, R. Jungmann, A. Zeigerer, G. H. H. Borner, M. Mann, Dev. Cell 2018, 47, 205.e7.
[24] J. J. H. Shin, O. M. Crook, A. Borgeaud, J. Cattin-Ortolá, S. Y. Peak-Chew, J. Chadwick, K. S. Lilley, S. Munro, bioRxiv: 841965 2019.
[25] A. K. Davies, D. N. Itzhak, J. R. Edgar, T. L. Archuleta, J. Hirst, L. P. Jackson, M. S. Robinson, G. H. H. Borner, Nat. Commun. 2018, 9.
[26] J. Hirst, D. N. Itzhak, R. Antrobus, G. H. H. Borner, M. S. Robinson, PLoS Biol. 2018, 16, e2004411.
[27] J. S. Andersen, C. J. Wilkinson, T. Mayor, P. Mortensen, E. A. Nigg, M. Mann, Nature 2003, 426, 570.
[28] L. M. Breckels, C. M. Mulvey, K. S. Lilley, L. Gatto, F1000Research 2016, 5, 2926.
[29] L. Gatto, L. M. Breckels, T. Burger, D. J. H. Nightingale, A. J. Groen, C. Campbell, N. Nikolovski, C. M. Mulvey, A. Christoforou, M. Ferro, K. S. Lilley, Mol. Cell. Proteomics 2014, 13, 1937.
[30] L. M. Orre, M. Vesterlund, Y. Pan, T. Arslan, Y. Zhu, A. Fernandez Woodbridge, O. Frings, E. Fredlund, J. Lehtiö, Mol. Cell 2019, 73, 166.
[31] G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, Vol. 112, Springer, New York, NY 2013.
[32] M. Ester, H. P. Kriegel, J. Sander, X. Xu, in KDD’96: Proc. of the Second Int. Conf. on Knowledge Discovery and Data Mining, AAAI Press, Portland, OR 1996.
[33] L. J. P. van der Maaten, G. E. Hinton, J. Mach. Learn. Res. 2008, 9, 2579.
[34] L. M. Breckels, L. Gatto, A. Christoforou, A. J. Groen, K. S. Lilley, M. W. B. Trotter, J. Proteomics 2013, 88, 129.
[35] L. Gatto, L. M. Breckels, S. Wieczorek, T. Burger, K. S. Lilley, Bioinformatics 2014, 30, 1322.
[36] S. Tyanova, T. Temu, P. Sinitcyn, A. Carlson, M. Y. Hein, T. Geiger, M. Mann, J. Cox, Nat. Methods 2016, 13, 731.
[37] K. Barylyuk, L. Koreny, H. Ke, S. Butterworth, O. M. Crook, I. Lassadi, V. Gupta, E. Tromer, T. Mourier, T. J. Stevens, L. M. Breckels, A. Pain, K. S. Lilley, R. F. Waller, bioRxiv: 2020.04.23.057125 2020.
[38] O. M. Crook, K. S. Lilley, L. Gatto, P. D. W. Kirk, arXiv:1903.02909 [stat] 2019.
[39] T. F. Wu, C. J. Lin, R. C. Weng, J. Mach. Learn. Res. 2004, 5, 975.
[40] X. C. Fan, J. A. Steitz, Proc. Natl. Acad. Sci. USA 1998, 95, 15293.
[41] B. Zhang, L. Käll, R. A. Zubarev, Mol. Cell. Proteomics 2016, 15, 1467.
[42] C. Lazar, L. Gatto, M. Ferro, C. Bruley, T. Burger, J. Proteome Res. 2016, 15, 1116.
[43] J. D. O’Connell, J. A. Paulo, J. J. O’Brien, S. P. Gygi, J. Proteome Res. 2018, 17, 1934.
[44] P. Feist, A. B. Hummon, Int. J. Mol. Sci. 2015, 16, 3537.
[45] M. Wong, S. Munro, Science 2014, 346, 1256898.
[46] M. L. M. Jongsma, I. Berlin, J. Neefjes, Trends Cell Biol. 2015, 25, 112.
[47] P. M. J. Beltran, R. A. Mathias, I. M. Cristea, Cell Syst. 2016, 3, 361.e6.
[48] B. Efron, Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction, Vol. 1, Cambridge University Press, Cambridge 2012.
[49] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, D. B. Rubin, Bayesian Data Analysis, Chapman and Hall/CRC, Boca Raton, FL 2013.
[50] O. M. Crook, C. M. Mulvey, P. D. W. Kirk, K. S. Lilley, L. Gatto, PLoS Comput. Biol. 2018, 14, e1006516.
[51] O. M. Crook, L. M. Breckels, K. S. Lilley, P. D. W. Kirk, L. Gatto, F1000Research 2019, 8, 446.
[52] O. Crook, A. Geladaki, D. J. H. Nightingale, O. Vennard, K. S. Lilley, L. Gatto, P. D. Kirk, bioRxiv: 2020.05.05.078345 2020.
[53] A. Claude, J. Exp. Med. 1946, 84, 61.