Seeing the whole picture
4 Sep 2018 by Evoluted New Media
What makes one patient different from another? Answering this question is a fundamental prerequisite for personalised medicine – and a multi-omic approach is the only way to get there, say EMBL's Britta Velten and Wolfgang Huber
We often do not understand why a disease is more aggressive in one patient than another, or why one person responds well to a drug but another doesn’t. This is one of the main challenges for clinicians: deciding on the most appropriate treatment for a certain patient, at a certain time. In order to arrive at better patient stratifications and treatment decisions, it is essential to find out what underlies the variation across patients in treatment outcome or disease progression.
Nowadays, many different omics technologies enable researchers to explore the molecular basis of patient heterogeneity. Genome and transcriptome sequencing, as well as epigenetic, proteomic or metabolomic profiling, are only some examples. Increasingly, these assays are complemented by perturbation experiments, such as drug response screens. By combining some of these techniques for the same set of patients, commonly referred to as a multi-omics approach, we can study the abundances and activities of important biomolecules in samples taken from individual patients.
However, while each single omic can provide important insights into the molecular sources of patient variability, it is often the case that no single one gives us the complete picture. Imagine taking pictures of a complex object, for example a house. To get a full view of what it looks like, you would take pictures from multiple angles and viewpoints. Each picture will give you some idea of the object - but only when combining the views from many different angles can you fully reconstruct what the object actually looks like. Similar principles hold true for complex diseases like cancer; they typically arise from the interaction of different biological layers and cannot be completely understood by probing only a single layer.
The challenge of integration
By combining different omics technologies, researchers can obtain readouts from the different types of molecules that make up our cells and tissues. Indeed, multi-omic studies on large cohorts are becoming more and more common in both basic and clinical research. These efforts yield growing volumes of rich and valuable data. However, a major challenge is making sense of these data and extracting meaningful information. For example, identifying the important drivers of disease heterogeneity from these large and complex data collections can be laborious, as most computational methods can only operate on a single assay or a small number of different assays. As a result, the benefits of multi-omic profiling are diminished if each omics data type is analysed on its own instead of combining the information. An integrative analysis, on the other hand, would help us to make more robust and powerful inferences from the data.
While this sounds like a reasonable mission, many challenges stand in the way of achieving it. Each omic technology and its resulting data type comes with its own properties, dimensions and characteristics, and few methods are available that enable joint analyses in an unbiased manner. For example, with a single data set it is common practice to use principal component analysis to gain insight into the structure within the data and its main axes of variation. This yields a simple visualisation of the most important structures and guides further analyses. However, with multiple data sets, a joint exploratory analysis of all data is difficult. In addition, multi-omic studies are in practice rarely complete: for many patient samples we may have only some, but not all, of the possible views. We need methods that can cope with these irregular data structures.
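For a single data set, the principal component analysis mentioned above can be sketched in a few lines. This is only an illustration on simulated data: the patient and feature counts, the variable names and the hidden "gradient" are all assumptions made for the example, not data from any study.

```python
import numpy as np


def pca(data, n_components=2):
    """PCA of a samples x features matrix via singular value decomposition.

    Returns the sample scores on the leading components and the fraction
    of total variance that each of those components explains.
    """
    centered = data - data.mean(axis=0)           # remove per-feature means
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Hypothetical data: 50 patients x 20 features driven by one hidden gradient.
rng = np.random.default_rng(0)
gradient = rng.normal(size=(50, 1))               # e.g. a disease gradient
loadings = rng.normal(size=(1, 20))               # effect on each feature
expression = gradient @ loadings + 0.1 * rng.normal(size=(50, 20))

scores, explained = pca(expression)
# The first component should track the hidden gradient (up to sign).
agreement = abs(np.corrcoef(scores[:, 0], gradient[:, 0])[0, 1])
```

Plotting the first two columns of `scores` gives the kind of overview described above. Note that this sketch requires a complete matrix, which is exactly the limitation that incomplete multi-omic studies run into.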
To tackle these challenges, researchers at the European Molecular Biology Laboratory developed Multi-Omics Factor Analysis (MOFA). MOFA is a computational tool to integrate the information obtained from different omics technologies [1]. Based on different types of omics data, MOFA finds the main sources of disease variability across different molecular data types. These sources are represented by factors that can be understood in a similar way to the components from a principal component analysis. They can highlight disease subtypes or gradients and visualise the most important structures in the data. However, unlike principal components, they are informed by multiple data sets and hence are able to characterise from which molecular data type the variation originates and which individual markers or processes are driving it.
In particular, this makes it easier to connect molecular markers across different biological layers that jointly underlie a specific source of disease variability. In addition to enabling a first exploration of the data, these factors can eventually be used to stratify patients into groups that may benefit from different types of treatment, and to reveal the molecular markers that define each group's disease.
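The idea of factors that are shared across data types, together with a per-data-type measure of how much variation each factor explains, can be illustrated with a deliberately simplified stand-in. The sketch below is not MOFA: it replaces the statistical model with a plain singular value decomposition of the concatenated data, does not handle missing values, and uses simulated data. The view names (`transcriptome`, `drug_response`), sizes and helper functions are hypothetical.

```python
import numpy as np


def joint_factors(views, n_factors=2):
    """Extract factors shared across views (each a samples x features matrix).

    Each view is centred and scaled to unit total variance so that no view
    dominates because of its size, then all views are concatenated and
    decomposed by SVD. A simplified stand-in, not the MOFA model.
    """
    centered = [v - v.mean(axis=0) for v in views]
    scaled = [c / np.linalg.norm(c) for c in centered]   # Frobenius norm
    u, s, _ = np.linalg.svd(np.hstack(scaled), full_matrices=False)
    return u[:, :n_factors] * s[:n_factors], scaled


def variance_explained(factors, view):
    """Fraction of a view's variance explained by each factor on its own."""
    total = np.sum(view ** 2)
    fractions = []
    for k in range(factors.shape[1]):
        z = factors[:, [k]]
        w, *_ = np.linalg.lstsq(z, view, rcond=None)     # least-squares loadings
        fractions.append(1 - np.sum((view - z @ w) ** 2) / total)
    return np.array(fractions)


# Simulated cohort of 100 patients with two uncorrelated hidden factors.
rng = np.random.default_rng(1)
hidden = rng.normal(size=(100, 2))
hidden -= hidden.mean(axis=0)
hidden = np.linalg.qr(hidden)[0] * np.sqrt(100)          # orthogonal columns

# Factor 1 affects both views; factor 2 affects only the drug-response view.
transcriptome = hidden[:, [0]] @ rng.normal(size=(1, 30))
transcriptome += 0.1 * rng.normal(size=(100, 30))
drug_response = hidden @ rng.normal(size=(2, 20))
drug_response += 0.1 * rng.normal(size=(100, 20))

factors, (rna_scaled, drug_scaled) = joint_factors([transcriptome, drug_response])
r2_rna = variance_explained(factors, rna_scaled)
r2_drug = variance_explained(factors, drug_scaled)
```

Comparing `r2_rna` with `r2_drug` shows, for each factor, which data type the variation comes from: in this simulation the first factor is active in both views, while the second is largely confined to the drug-response view.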
MOFA and leukaemia
In a recent paper published in Molecular Systems Biology, MOFA was used to analyse disease heterogeneity in chronic lymphocytic leukaemia (CLL), the most common type of leukaemia in the Western world. The researchers applied MOFA to data from a study that combined information from genome and transcriptome sequencing, methylation arrays and ex vivo drug response screens for 200 patient samples [2]. By combining information from the different omics types, MOFA recovered major disease subgroups and gradients in CLL.
Some of these were related to important genetic markers that are already used to guide clinical care, while others could be linked to less well studied axes of disease heterogeneity or technical sources of variation. MOFA automatically connected those markers to variation originating from other molecular layers, such as differences in drug responses or expression. By combining all this information, MOFA also pinpointed samples for which the single clinical marker was inaccurate and could impute markers or drug responses for patient samples that were missing this information.
While MOFA aims to find the major sources of variation between patients, not all variation between patients is directly related to their clinical outcome. Technical sources of variation, which arise in many omic technologies, can also be of interest, as identifying them helps us to adjust for them in downstream analyses. However, in this study several of the main sources of variation that MOFA found were strongly associated with the severity of the disease and, consequently, with the time span until a patient needed treatment. In particular, the axes of variation identified by MOFA could jointly give better predictions for this clinical outcome than models that are based on a single omic technology or that do not properly integrate the information from different technologies.
In addition to the promise of multi-omic integration for personalised medicine, similar approaches are important in a range of biological domains. A recent development is multi-omic studies at single-cell resolution. Unlike bulk studies, these can also help us to understand the variation of important markers between cells, instead of only observing an average over an often very heterogeneous cell population. With ambitious initiatives like the Human Cell Atlas [3], methods to dissect cell-to-cell heterogeneity are becoming increasingly important. While first applications of MOFA to single-cell data have been promising, the researchers are still working on improving the method so that it can cope with the increasing sample numbers in single-cell data and with more complex experimental designs.
We hope that MOFA will make it easier to integrate complex multi-omics data sets and gain an overview of the major sources of sample variation and their molecular basis. To make the exploration of multi-omics data easy, various downstream analysis tools are provided as part of the method; these can visualise the main sources of variation and their molecular signatures.
The software, together with extensive tutorials, is available at https://github.com/bioFAM/MOFA.
References:
1. Argelaguet*, Velten*, et al. "Multi-Omics Factor Analysis—a framework for unsupervised integration of multi-omics data sets." Molecular Systems Biology 14.6 (2018): e8124.
2. Dietrich*, Oles*, Lu*, et al. "Drug-perturbation-based stratification of blood cancer." The Journal of Clinical Investigation 128.1 (2018): 427-445.
3. Rozenblatt-Rosen, et al. "The Human Cell Atlas: from vision to reality." Nature News 550.7677 (2017): 451.
Authors:
Britta Velten is a predoctoral fellow in Professor Huber’s lab at the European Molecular Biology Laboratory (EMBL).
Wolfgang Huber is based at the European Molecular Biology Laboratory (EMBL), where he aims to understand inter-individual differences through large-scale statistical modelling and the integration of multiple levels of genomic and molecular information.