Using AI to better understand cancer

Publicly released:
Australia; NSW
CMRI

Scientists are using a new artificial intelligence-based method to improve our understanding of how cancer cells behave.

Media release

From: Children's Medical Research Institute (CMRI)

Researchers at INESC-ID, Instituto Superior Técnico - University of Lisbon, Portugal, and ProCan at Children’s Medical Research Institute in Sydney have collaborated to develop a new artificial intelligence-based method to significantly improve our understanding of how cancer cells behave.

Described in a paper published in the prestigious scientific journal Nature Communications, the deep learning method, known as Multi-Omic Synthetic Augmentation (MOSA), is able to extract valuable additional information from the large sets of data that laboratory scientists have painstakingly collected about cancer cells.

Omics refers to any type of big data about biological systems. For example, genomics refers to data about DNA and proteomics to data about proteins. Multi-omics refers to collections of two or more sets of omic data, from which many new insights can be obtained using advanced computational methods.

MOSA is designed to deal with a common problem in multi-omic databases called sparsity (gaps in the available data). In the example studied in the Nature Communications paper, because of the complexities of collecting omic data, the cancer cells in the set had results for anywhere between two and seven types of omic data, which meant that the data set was incomplete.
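As a rough illustration of what sparsity means in practice, the short Python sketch below builds a toy table recording which types of omic data exist for which cell lines and measures how much of it is empty. The cell line names, the four omic types and the pattern of gaps are all invented for this example; they are not taken from the study, and the sketch does not reproduce the MOSA method itself.

```python
# Toy illustration only: the cell line names, omic types and gaps are invented.
OMIC_TYPES = ["genomics", "transcriptomics", "proteomics", "methylomics"]

# True means that omic type was measured for the cell line; False marks a gap.
availability = {
    "cell_line_A": [True, True, True, True],
    "cell_line_B": [True, True, False, False],
    "cell_line_C": [True, False, True, False],
}


def types_measured(row):
    """Count how many omic types have data for one cell line."""
    return sum(row)


def sparsity(table):
    """Return the fraction of (cell line, omic type) entries with no data."""
    total = sum(len(row) for row in table.values())
    missing = sum(row.count(False) for row in table.values())
    return missing / total


counts = {name: types_measured(row) for name, row in availability.items()}
gap_fraction = sparsity(availability)

for name, count in counts.items():
    print(f"{name}: {count} of {len(OMIC_TYPES)} omic types measured")
print(f"Share of the table that is empty: {gap_fraction:.1%}")
```

In this toy table, four of the twelve entries are gaps. Entries like these are the ones a data augmentation method would aim to fill with synthetic values.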

The ProCan researchers involved in the collaboration included lead author Dr Zhaoxiang (Simon) Cai and joint senior author Associate Professor Qing Zhong. Senior author, Dr Emanuel Gonçalves, is an Assistant Professor at Instituto Superior Técnico, University of Lisbon. Scientists from the Wellcome Sanger Institute, Cambridge, UK, also contributed to the study.

Dr Cai explained that their MOSA method was able to artificially synthesise data to fill gaps in the multi-omic data for more than 1500 cancer cell lines, representing a wide range of cancer types, expanding the total dataset by 32.7% at a much lower cost than would be required to perform the laboratory tests. The advantage of using the resulting combination of real and synthetic data is that it is often superior to the real data alone for training machine learning models.

“We showed that the augmented data resulted in increased accuracy in predicting how cancer cells would respond to anticancer treatments and provided more opportunities to discover new potential drug targets”, said Dr Cai.

Professor Roger Reddel, who is a study co-author and a co-director of ProCan, said: “This is a significant step towards ProCan’s goal of being able to predict what treatment any individual patient’s cancer will respond to, so we can assist cancer clinicians choose the best available treatment for each of their patients.”

Story is now online: Synthetic augmentation of cancer cell line multi-omic datasets using unsupervised deep learning | Nature Communications

Journal/conference: Nature Communications
Research: Paper
Organisation/s: Children's Medical Research Institute (CMRI), The University of Sydney, University of Lisbon
Funder: This research was funded in part by the Wellcome Trust Grant 206194. ProCan® is supported by the Australian Cancer Research Foundation, Cancer Institute New South Wales (NSW) (2017/TPG001, REG171150), NSW Ministry of Health (CMP-01), The University of Sydney, Cancer Council NSW (IG 18-01), Ian Potter Foundation, the Medical Research Futures Fund (MRFF-PD), National Health and Medical Research Council (NHMRC) of Australia European Union grant (GNT1170739, a companion grant to support the European Commission’s Horizon 2020 Program, H2020-SC1-DTH-2018-1, ‘iPC - individualized Paediatric Cure’ [ref. 826121]), and National Breast Cancer Foundation (IIRS-18-164). Work at ProCan® is done under the auspices of a Memorandum of Understanding between Children’s Medical Research Institute and the U.S. National Cancer Institute’s International Cancer Proteogenome Consortium (ICPC), which encourages cooperation among institutions and nations in proteogenomic cancer research in which datasets are made available to the public. Z.C. is the recipient of a PhD Scholarship from Sydney Cancer Partners with funding from Cancer Institute NSW (2021/CBG0002). A.R.B. is funded by the Portuguese national agency Fundação para a Ciência e a Tecnologia (FCT) through the research grant UI/BD/154599/2022. This work has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement no. 951970 (OLISSIPO project). For open access, the authors have applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission. This work was supported by national funds through FCT, under project UIDB/50021/2020 (https://doi.org/10.54499/UIDB/50021/2020). The authors acknowledge the OSCARS project, funded by the European Commission’s Horizon Europe Research and Innovation Programme under grant agreement No. 101129751.