Glycoproteomics: a new era in biomarkers
6 Aug 2021
Klaus Lindpaintner takes us on a journey of discovery, automation and scale-up that has applied AI to mass spectrometry data analysis to deliver a portfolio of glycoproteomic classifiers in more than a dozen indications, bringing glycoprotein biomarkers to market within unprecedented timescales.
It is the summer of 2016. Aldo Carrascoso, software whiz and serial entrepreneur, is troubled by numerous cases of cancer in his family. Seeking an explanation, he learns that no known genetic and genomic cancer markers, such as BRCA, were present in his afflicted family members.
His search leads him to Carolyn Bertozzi of Stanford University, and Carlito Lebrilla of University of California, Davis, both scientists acclaimed for their work in the area of protein glycosylation. This challenging field of research looks at the complex ways in which proteins are modified by the addition of a variety of different sugar (glycan) molecules at specific sites.
Unravelling protein glycosylation mechanisms
Carolyn is recognised as a pioneer in unravelling the mechanistic role of protein glycosylation in a variety of medical conditions, particularly cancer, and Carlito has been one of the foremost drivers of the analytical chemistry used to study glycoproteins in various disease states. Aldo is intrigued by the potential of glycoproteomics to aid his mission, because it is biologically plausible that the position of sugar (glycan) molecules on proteins has powerful effects on how those proteins actually function in the body.
The glycosylation process is situated well downstream in the cascade from genetic blueprint to ultimate biologic phenotype (e.g., presence or absence of disease) and as such offers an opportunity to integrate a panoply of factors known to affect health, among them genetic blueprint, nutritional and metabolic states, lifestyle choices such as exercise or tobacco use, and other environmental influences such as an individual’s microbiome.
Sugars as biological information carriers
While the scientific community has been aware of the biological importance of sugar molecules since the late 19th century, the role of this class of molecules as carriers of biological information remains largely under-appreciated, probably because of the challenges of analysing this most complex of biologic “alphabets”. Glycans are considerably more complex than the two other major carriers of biologic information – nucleic acids and proteins – and the reason lies in their basic biochemistry.
Glycan structures are branched, whereas nucleic acids and proteins are linear. Glycans contain multiple different types of molecular bond and are synthesised by multi-step, multi-enzyme catalytic processes rather than from a simple linear template. Information is carried in a large number of oligosaccharide building blocks with different chiralities. Add to this the positional specificity of the amino-acid residue to which any glycan can be attached in a given protein, and glycoproteins expand into a vast galaxy of potential biological information content.
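To give a rough feel for how quickly this “galaxy” grows, the following minimal Python sketch counts the possible glycoforms of a single protein under a deliberately simplified model. It assumes that each glycosylation site is either unoccupied or carries exactly one glycan from a fixed repertoire, and that the sites are independent of one another. The function name and the example numbers are hypothetical illustrations, not measurements from the article.

```python
def count_glycoforms(num_sites: int, glycans_per_site: int) -> int:
    """Count distinct glycoforms of one protein under a toy model.

    Assumptions (illustrative only): every site is independent and is
    either empty or occupied by one of `glycans_per_site` glycans.
    """
    if num_sites < 0 or glycans_per_site < 0:
        raise ValueError("arguments must be non-negative")
    # Each site has (glycans_per_site + 1) states: empty, or one glycan.
    return (glycans_per_site + 1) ** num_sites


if __name__ == "__main__":
    # Hypothetical example: 3 sites, 10 candidate glycans per site.
    print(count_glycoforms(3, 10))  # 1331
    # Hypothetical example: 5 sites, 20 candidate glycans per site.
    print(count_glycoforms(5, 20))  # 4084101
```

Even in this toy model, a handful of sites and a modest glycan repertoire yield millions of distinct variants of a single protein, which is one way to picture why the resulting mass spectrometry data are so laborious to interpret.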
Analysing glycoprotein data with AI
It makes perfect sense to Aldo that protein glycosylation – which affects up to 80% of all proteins – could be a powerful tool to find the answers he is seeking. But there is one major stumbling block: the analysis of the complex data files generated by mass spectrometry is – even with the best available software – extraordinarily time-consuming. Despite recent advances, turning a set of glycoprotein data on 100 patients into interpretable biomedical information requires eight months of full-time, dedicated effort by a PhD. Aldo sees an opportunity to massively accelerate and automate this process by applying his expertise in artificial intelligence and neural networks. He, Carolyn, and Carlito establish InterVenn Biosciences in early 2017. The mission is to marry the sub-molecular resolution of mass spectrometry with advanced artificial intelligence data processing, creating a platform that will catapult glycoproteomics to industrial scale and practical application for medical and clinical problems.
Fast forward to spring 2021. As the company’s founders predicted, the application of AI to mass spectrometry data has resulted in a series of rapid scientific accomplishments that demonstrate how ubiquitously impactful protein glycosylation is for a broad array of biological functions and medical conditions. Processing and analysing the glycoproteomic data of 100 patients now takes six minutes and requires a server, not a PhD. And yet, the results correlate perfectly with the more laborious traditional approach.
Using this AI-powered platform and machine-learning approach, a portfolio of glycoproteomic classifiers in more than a dozen indications – with impressive levels of accuracy – has been generated successfully. All are based on peripheral blood (“liquid biopsy”) rather than tissue biopsy. These classifiers predict malignant vs. benign conditions, aggressive vs. indolent disease course, and response to certain medical therapies.
Predicting cancer treatment side-effects
The company’s present focus is on a promising test for predicting which cancer patients will respond to novel immuno-oncology drugs. Only a fraction of patients benefit from these medications – and those who do not may suffer serious side effects. Preliminary data indicate that biomarker classifiers derived from the MS-AI platform are highly effective in recognising which patients will respond. The company is now working to bring this test to market.
Author: Klaus Lindpaintner is the Chief Scientific and Medical Officer of InterVenn Biosciences (intervenn.com).