MOSA Software

for improving cancer analysis



Article https://doi.org/10.1038/s41467-024-54771-4

Synthetic augmentation of cancer cell line multi-omic datasets using unsupervised deep learning
Zhaoxiang Cai 1,5, Sofia Apolinário 2,3,5, Ana R. Baião 2,3, Clare Pacini 4,
Miguel D. Sousa 2,3, Susana Vinga 2,3, Roger R. Reddel 1, Phillip J. Robinson 1,
Mathew J. Garnett 4, Qing Zhong 1 & Emanuel Gonçalves 2,3

Received: 21 December 2023
Accepted: 18 November 2024

Integrating diverse types of biological data is essential for a holistic
understanding of cancer biology, yet it remains challenging due to data
heterogeneity, complexity, and sparsity. Addressing this, our study introduces an
unsupervised deep learning model, MOSA (Multi-Omic Synthetic Augmentation),
specifically designed to integrate and augment the Cancer Dependency
Map (DepMap). Harnessing orthogonal multi-omic information, this model
successfully generates molecular and phenotypic profiles, resulting in an
increase of 32.7% in the number of multi-omic profiles and thereby generating
a complete DepMap for 1523 cancer cell lines. The synthetically enhanced data
increases statistical power, uncovering less studied mechanisms associated
with drug resistance, and refines the identification of genetic associations and
clustering of cancer cell lines. By applying SHapley Additive exPlanations
(SHAP) for model interpretation, MOSA reveals multi-omic features essential
for cell clustering and biomarker identification related to drug and gene
dependencies. This understanding is crucial for developing much-needed
effective strategies to prioritize cancer targets.

The growing molecular and phenotypic characterization of cancer cell
lines makes them one of the most studied human cell models1. This
ever-growing and rich multi-omic data continues to drive the identification
of cancer genes and the discovery of therapeutic targets2–4.
Although genomics has been a primary focus in the search for predictive
biomarkers in cancer, recent functional genetic screens conducted
by the Cancer Dependency Map (DepMap) consortium
revealed that less than 20% of RNAi cancer dependencies could be
explained by mutations and copy number alterations5. This highlights
the importance of developing holistic machine learning models capable
of vertically integrating orthogonal datasets. In this case, vertical
integration involves not only genomics but also other types of
omics data6.

Despite recent successes of deep learning7, multi-omics integration
faces several limitations, most importantly high heterogeneity of
different data types (e.g., discrete vs. continuous distributions),
intrinsic technological limitations (e.g., missing values), and limited
data availability (e.g., in this study, only 25.8% of the cancer cell lines
have a complete set of all seven omic datasets under consideration)8.
Unsupervised machine learning has been successful in multi-omics
integration capturing patterns of data variation shared across different
omics9,10. This approach highlighted cancer cellular states associated
with epithelial-to-mesenchymal transition (EMT), a key process in drug
resistance and metastasis11. Unsupervised deep learning based models
can generate improved versions of input datasets by reconstructing
missing measurements and correcting experimental error, and

1 ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia.
2 INESC-ID, 1000-029 Lisboa, Portugal.
3 Instituto Superior Técnico (IST), Universidade de Lisboa, 1049-001 Lisboa, Portugal.
4 Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK.
5 These authors contributed equally: Zhaoxiang Cai, Sofia Apolinário.
e-mail: [email protected]; [email protected]