Make it rain
12 Oct 2017 by Evoluted New Media
Jacky Pallas knew that putting biomedical research in the cloud on a global scale was never going to be an easy task. Now, two years after the launch of the eMedLab programme, she says that the project is delivering real and tangible benefits
Four years ago, a group of academics from some of Europe’s leading biomedical research institutions came together to write a grant application. This is nothing new, but in this case the proposal was for a novel type of computational cluster, one designed and built to handle the large compute and data demands of medical bioinformatics.
The vision was, and still is, to maximise the gains for patients and for medical research that will come from the explosion in human health data – genomics, imaging and electronic health records. Following the award from the MRC in 2014 the team started to design, procure and build the MRC eMedLab private cloud and also to recruit four outstanding young scientists to form the eMedLab Research Academy.
We now have over 100 active users supporting a range of science challenges in cancer, infection, auto-immune conditions, heart failure and rare diseases in children. The MRC eMedLab team is made up of senior academics, researchers and technology specialists from seven organisations: University College London, Queen Mary University of London, London School of Hygiene and Tropical Medicine, the Francis Crick Institute, the Wellcome Trust Sanger Institute, the EMBL European Bioinformatics Institute, and King’s College London.
We work alongside research consortia from the UK and across the world on large data science projects, and we work with OCF and other technology companies to make the private cloud infrastructure function as efficiently and effectively as possible. This is truly a team science programme¹.
Under a biomedical cloud
Advances in biomedical genomics, imaging and electronic capture of clinical data mean that bioinformatics is now a very data-intensive discipline. So, the challenge the consortium had was one of accumulating medical and biological data of unprecedented scale and complexity, coordinating it, storing it safely and securely, and making it readily available to researchers.
We set out to build a private cloud infrastructure that could deliver significant computing capacity to analyse anonymised patient data, together with a fast, scalable data storage system that allows us to share access to large datasets across projects. As part of the funding rules from MRC, we had to design, procure, build and test this new kind of private cloud system within a very tight 12-month timeframe. We formed a team of technology specialists from all seven partners to help us do this, working with researchers from across disciplines and institutions who provided the requirements for the system.
From the outset, MRC eMedLab wanted to adopt a cloud-like ‘virtual machine’ environment, as it gave the flexibility to accommodate lots of different project types. OCF, a high performance compute, storage and data analytics integrator, was chosen as the successful bidder, having successful partnerships with world-leading technology vendors Red Hat, Lenovo and Mellanox Technologies.
The Red Hat Enterprise Linux OpenStack Platform was selected as it is a highly scalable solution that has enabled scientists to create and use virtual clusters bespoke to their specific needs. It has allowed us to select compute memory, processors, networking, storage and archiving policies, all orchestrated by a simple web-based user interface. Researchers are able to access up to 6,000 cores of processing power.
The cloud infrastructure went live in April 2015 and was tested throughout the rest of the year with a select number of science pilots. Research was then scaled up in 2016 and the cloud infrastructure is now fully occupied by projects. A joint team of researchers and technologists evaluate proposals for scientific merit and technological feasibility and then allocate compute and data resources to the new projects. We have recently worked closely with OCF to upgrade the OpenStack Platform so we can manage the system more efficiently.
Was it a success?
Other than being a mere £1.06 over its £6.8M capital budget, the MRC eMedLab project has been, and continues to be, a huge success. It has become a key infrastructure resource for the MRC as well as an exemplar for other advanced research computing projects.
The success, in my view, has to be attributed to the concept, adopted by the entire consortium, of ‘partnership working’, where everybody contributed to building and using one shared resource. OCF, our integration partner, was there to provide services, support and consultancy, in addition to the hardware and software solutions, and we have a new model of federated operations support with staff from all the partner institutions.
We don’t just share the compute and data resource efficiently, but also share the learning, the technology and the science too. We have published our first Nature paper, on variation in human stem cells², as well as several other high impact papers on, for example, the genetics of infectious diseases and dementia. Members of the team have given talks at a range of high profile science meetings, from the American Society of Human Genetics to the international OpenStack conference. We have also received further awards of funding from MRC and others. For example, the London School of Hygiene and Tropical Medicine is working on a project, in collaboration with researchers in Africa and Vietnam, looking at population levels and the prevalence of HIV and TB. They are interested in how the pathogens evolve and the genetics of human resistance.
MRC eMedLab now supports six separate projects in MRC’s stratified medicine portfolio. These are consortia of universities, the NHS and industry working together to identify better therapies for auto-immune conditions such as rheumatoid arthritis and psoriasis, based on the genetics of the patient. Other shining examples include research by the Francis Crick Institute investigating cancer evolution, the development of new personalised immunotherapies against tumours, and collaboration on research into rare diseases with Great Ormond Street Hospital for Sick Children. Other new research involves linking genomics and brain imaging to better understand dementia, studying rare mitochondrial diseases and understanding how stem cells function.
Professor Charles Swanton, Professor of Translational Cancer Therapeutics at The Francis Crick Institute and UCL Cancer Institute, is clear about the impact MRC eMedLab has had on his work. “Understanding cancer evolution over space and time is a complex task reliant on advanced computational technologies. MRC eMedLab has been critical to the developments in our understanding of how cancers evolve and adapt, the processes that foster cancer cell variation that allow natural selection to function and ultimately how we might go about slowing down tumour evolution to improve patient survival outcomes,” he said.
MRC eMedLab was the result of a group of diverse individuals coming together with a shared goal of improving the resources available to researchers. It was a technically innovative project that has paved the way for further projects supported by the MRC and other funders.
It is that mix of skills, knowledge and experience which makes for a successful project, what we now call ‘team science’.
Author: Jacky Pallas is Director of eResearch at King’s College London
References: 1. https://www.teamsciencetoolkit.cancer.gov/public/WhatisTS.aspx 2. http://www.nature.com/nature/journal/v546/n7658/full/nature22403.html