Researchers at the Computational Bioscience Research Center of King Abdullah University of Science & Technology (KAUST) have developed a machine learning-based method to identify genes that stimulate tumor growth. The algorithms sift through reams of molecular data collected from studies of cancer cell lines, mouse models and human patients. The prediction method could help clinicians and care teams tailor medicines to the molecular subtypes of patients as part of the ongoing path toward precision medicine. It could also support biopharma companies in their ongoing hunt for new therapeutic agents. The Saudi Arabia-based research team has made the work product freely available on GitHub.
The rate of discovery of new cancer-driving genes has been declining rapidly in recent years. This slowdown led the KAUST team to seek a new computational strategy. Rather than rely on sequence data, the Saudi Arabia-based team developed a machine learning model that factors in biological features of genes and pathways involved in tumor formation.
The Research Imperative
Traditionally, scientists have approached the search for genes with a causal role in cancer by starting with DNA sequence data. By extensively cataloging tumor mutations shared among patients with a common type of cancer, the research community has documented hundreds of genes with a causal impact on tumor development. Experimental follow-up is then used to functionally associate these genes with the hallmarks of cancer.
The Research Project
The prediction method, developed by a team led by Robert Hoehndorf and described in Scientific Reports, could help clinicians tailor medicines to the molecular subtypes of patients. According to Sara Althubaiti, a PhD student in the Hoehndorf Lab and first author of the study, “Our method can be used as a framework to predict and validate cancer-driver genes in any database or real population sample.”
The method turns the traditional approach upside down, as Ms. Althubaiti elaborated: “Essentially, our approach is knowledge-driven and we use tumor sequencing data as validation. This is unlike most approaches, which are data-driven combined with interpretation of the findings with respect to established knowledge.”
The KAUST team designed the algorithm to recognize functional and phenotypic patterns that predispose a gene toward playing a role in driving tumor development. They validated the model using a publicly available database containing some 27,000 different tumor variants as well as functional and sequence data. The results confirmed that the model could successfully categorize cancer-driving genes. It also detected more than 100 other likely culprits, many with specific roles in particular tumor types.
The KAUST investigators then further tested the algorithm’s performance on molecular data gathered from two cohorts of cancer patients. The first was from King Abdulaziz University Hospital in Saudi Arabia, comprising 26 tumor samples from individuals with a rare type of head and neck cancer called nasopharyngeal carcinoma. The other cohort comprised 114 colorectal cancer samples from patients treated at the University of Birmingham Hospital in the United Kingdom. In both patient groups, the model singled out candidate driver genes that were frequently mutated and shared pathogenic features of other cancer-causing genes.
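The knowledge-first, validate-with-sequencing idea described above can be illustrated with a minimal sketch. This is not the KAUST team's algorithm: it swaps in a plain logistic regression, and every gene name, feature vector and mutation count below is hypothetical toy data chosen only to show the two-stage flow (predict from knowledge-based features, then check against cohort mutation frequencies).

```python
import math

# Hypothetical toy data, not from the study. Each known gene has binary
# knowledge-based features (e.g. a functional or phenotype annotation)
# and a label: 1 = known driver, 0 = not a driver.
KNOWN_GENES = {
    "GENE_A": ([1, 1, 0], 1),
    "GENE_B": ([1, 0, 1], 1),
    "GENE_C": ([0, 0, 1], 0),
    "GENE_D": ([0, 1, 0], 0),
}

# Hypothetical unlabeled candidates to score.
CANDIDATES = {"GENE_X": [1, 0, 0], "GENE_Y": [0, 1, 1]}

# Hypothetical cohort: number of tumor samples in which each candidate is mutated.
COHORT_SIZE = 26
MUTATED_IN = {"GENE_X": 9, "GENE_Y": 1}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Return the modeled probability that a gene is a driver."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(genes, epochs=500, lr=0.5):
    """Fit a logistic regression by stochastic gradient descent."""
    n_features = len(next(iter(genes.values()))[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for features, label in genes.values():
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


# Stage 1: knowledge-driven prediction from gene features alone.
weights, bias = train(KNOWN_GENES)
predicted = {g: predict(weights, bias, f) for g, f in CANDIDATES.items()}
predicted_drivers = [g for g, p in predicted.items() if p > 0.5]

# Stage 2: sequencing data is used only afterwards, as a check on the predictions.
for gene in predicted_drivers:
    frequency = MUTATED_IN[gene] / COHORT_SIZE
    print(f"{gene}: score={predicted[gene]:.2f}, mutated in {frequency:.0%} of samples")
```

The point of the design is the ordering: mutation counts never enter training, so agreement between high-scoring genes and frequently mutated genes serves as independent evidence rather than a circular result.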
The Work Product
The Saudi Arabia-based team has made the algorithm freely available on GitHub.
About King Abdullah University of Science & Technology Computational Bioscience Research Center
KAUST’s Computational Bioscience Research Center aims to solve the methodological and practical challenges linked to extracting useful information from big data in biomedical research. The center focuses on machine learning and high-performance computing for efficient knowledge, data and text mining. Its work follows a philosophy of integrating computational and experimental approaches for data generation and validation.