Finding the personal touch
18 Aug 2015 by Evoluted New Media
The drive for personalised medicine requires an increasingly intimate understanding of the human genome – something impossible without high-performance computing. But, says Dr Robert Esnouf, the pace of research must be maintained and the pressure to develop higher performance computing is on…
It’s been 15 years since scientists reported the first draft of the human genome; it cost nearly £2 billion to produce. Amazingly, there is now a set of machines on the market that can sequence 18,000 reasonable-quality human genomes per year at less than £1,000 each. What a phenomenal rate of improvement – and I should add that there are new technologies on the horizon that are even more revolutionary.
At my own facility, we currently sequence over 500 genomes per year. And we store around 20,000 genomes – just over half a petabyte – on high-speed disk for analysis. The Wellcome Trust Centre for Human Genetics (WTCHG) is a research institute of the Nuffield Department of Medicine at the University of Oxford. It receives core support from the Wellcome Trust as well as financial support from the university and a wide range of research councils and charities. It is based in purpose-built laboratories on the University of Oxford’s Old Road Campus in Headington, one of the largest concentrations of biomedical research in the world.
The Centre is an international leader in genomics, statistical genetics and structural biology; we collaborate with research teams from across the world on a number of large-scale studies. Our research budget from competitively won grants is close to £20 million annually, and we publish around 300 primary papers each year.
In the early days of genetics research, scientists focused on rare diseases with single, identifiable causes. Why? They only needed a small group, a family perhaps, to work out what was happening. Successes came thick and fast.
Common human diseases, however, are very complex indeed. They might present as a single set of symptoms, but underneath there are many causes, each with its own complex dependencies. To study those diseases we have to compare lots and lots of individual people – or rather their genetic information – all at the same time.
And there are so many genetic features to discover, some of which can lead to diabetes, obesity, heart disease and more. For example, when studying type-2 diabetes, one of our teams found 80 genetic links and predicted it would be possible to identify about 500 more if only they could source the patient data to support the work. The diseases we’re looking at now are fantastically complex!
Other genetics projects at WTCHG include national and international studies of various cancers and malaria, and analyses of bacterial genomes to trace the spread of infection. I’m pleased to say a lot of this genetics research is now trickling down to a clinical setting. In cancer treatment, particular mutations in cancerous cells can mean certain therapies will not be effective, and clinicians are now looking for those genetic markers. Screening is almost routine for breast cancer, for example, and it is possible to tailor current therapies to suit – that is, eliminating therapies that the genetic markers suggest will be ineffective.
There is now a drive to personalise medicine – that is, health care with a genetic-analysis component. Scientists hope to study thousands – or even millions – of genetic samples over time to understand how particular variants affect populations and how this makes a difference to patients. Everyone is becoming more aware of genetics in medicine. Health services are, fairly understandably, conservative, but a big culture change is happening. Many consultants were trained before the first genome was sequenced, but even they acknowledge a move towards treatment informed by genetics.
We need to maintain the pace of research right now, and to do that we need more high-performance computing (HPC) – big file systems, big memory and big clusters. At our Centre, we now meet this challenge with two main clusters.
Our latest cluster uses Intel Ivy Bridge CPUs and provides a 2.6x performance increase over its predecessor built in 2011. It boasts 1,728 cores of processing power, up from 912, with 16GB of 1866MHz memory per core (27.6TB in total), compared to a maximum of 8GB per core on our previous cluster. Our Research Computing facility now manages more than 4,000 cores and 5PB of storage, already making it one of the largest departmental computing facilities in a UK university.
The new cluster works alongside the previous production cluster. The two share a common management infrastructure and a common Mellanox FDR InfiniBand fabric, which links the compute nodes to a DDN GRIDScaler SFA12K storage system whose controllers can read block data at 20GB/s.
Each research group at WTCHG can use its own server to submit jobs to, and receive results from, the clusters. If a job will run on a group’s server, it can easily be redirected to the clusters. Users don’t need to log on to the clusters directly or be aware of the other research groups using them. We try to isolate groups so that they don’t slow each other down and each has as simple an experience as possible. Users have Linux skills, but they do not need to be HPC experts to use the system safely and effectively.
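To give a flavour of how a group might hand work from its own server to the shared cluster, here is a minimal sketch assuming a Slurm-style scheduler with sbatch; the article does not name our actual scheduler, so the partition-free set-up, paths, commands and job names below are illustrative placeholders rather than our real configuration.

# Illustrative only: redirect a command that would have run on a group server
# to a shared cluster, assuming a Slurm-style scheduler ("sbatch").
import subprocess

def submit_to_cluster(command, job_name="genome-align", cores=16):
    """Hand a command to the cluster scheduler and return its job-ID output."""
    result = subprocess.run(
        [
            "sbatch",
            "--job-name", job_name,
            "--cpus-per-task", str(cores),
            "--output", f"{job_name}-%j.log",  # %j = job ID in the log name
            "--wrap", command,                 # run the command string as-is
        ],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()  # e.g. "Submitted batch job 12345"

if __name__ == "__main__":
    # A hypothetical alignment command a research group might otherwise run locally
    print(submit_to_cluster("bwa mem ref.fa sample_R1.fq sample_R2.fq > sample.sam"))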
The high-performance cluster and big-data storage systems were designed by us in partnership with OCF, a leading HPC, data management, big-data storage and analytics provider. As the integrator, OCF also provided the WTCHG team with training on the new system.
We have learned from past experience that we need to tailor our HPC technology to give us an edge in ‘all-against-all’ analyses of hundreds of genomes: lining up multiple genomes against each other and using sophisticated statistics to compare them and spot differences that might explain the genetic origin of, or susceptibility to, disease.
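As a very simplified illustration of the ‘all-against-all’ idea, the sketch below compares invented genotype calls for a handful of samples pairwise; real analyses work on whole genomes and use far richer statistics, and the sample names and sites here are made up purely for the example.

# Toy "all-against-all" comparison: count sites where two samples' genotypes differ.
from itertools import combinations

# Genotype at a handful of sites, keyed by (chromosome, position) - invented data
samples = {
    "sample_A": {("chr1", 1000): "A/G", ("chr1", 2000): "T/T", ("chr2", 500): "C/C"},
    "sample_B": {("chr1", 1000): "A/A", ("chr1", 2000): "T/T", ("chr2", 500): "C/G"},
    "sample_C": {("chr1", 1000): "A/G", ("chr1", 2000): "T/C", ("chr2", 500): "C/C"},
}

def pairwise_differences(calls_a, calls_b):
    """Count sites present in both samples where the genotypes differ."""
    shared = calls_a.keys() & calls_b.keys()
    return sum(1 for site in shared if calls_a[site] != calls_b[site])

# Every pair of samples, once - the "all-against-all" part
for (name_a, calls_a), (name_b, calls_b) in combinations(samples.items(), 2):
    print(name_a, "vs", name_b, "->", pairwise_differences(calls_a, calls_b), "differing sites")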
By understanding the characteristics of key genetics applications and optimising how they map onto the new cluster’s architecture, the Centre has also been able to dramatically improve the efficiency of some analyses. For example, an analysis of 1,500 genomes using the Broad Institute’s Genome Analysis Tool Kit (GATK) used to take months; similar analyses can now be completed in a week using fewer cores, simply by tweaking a few filesystem parameters.
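To make the shape of such an analysis concrete, here is a simplified sketch of a GATK 3-era joint-calling workflow (per-sample variant calling in GVCF mode followed by cohort-wide genotyping); the reference, file names, sample list and thread settings are placeholders rather than our actual pipeline, and the filesystem tuning itself is not shown.

# Hedged sketch of a GATK 3-era joint analysis: call each genome in GVCF mode,
# then genotype the whole cohort together. Paths and names are placeholders.
import subprocess

REFERENCE = "human_g1k_v37.fasta"          # placeholder reference genome
SAMPLES = ["sample_0001", "sample_0002"]   # in practice, hundreds of genomes

def haplotype_caller(sample):
    """Per-sample variant calling in GVCF mode (embarrassingly parallel,
    so each sample can be farmed out as its own cluster job)."""
    gvcf = f"{sample}.g.vcf"
    subprocess.run([
        "java", "-jar", "GenomeAnalysisTK.jar",
        "-T", "HaplotypeCaller",
        "-R", REFERENCE,
        "-I", f"{sample}.bam",
        "--emitRefConfidence", "GVCF",
        "-o", gvcf,
    ], check=True)
    return gvcf

def joint_genotype(gvcfs, output="cohort.vcf"):
    """Combine the per-sample GVCFs and genotype the whole cohort at once."""
    cmd = ["java", "-jar", "GenomeAnalysisTK.jar",
           "-T", "GenotypeGVCFs", "-R", REFERENCE, "-o", output]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    joint_genotype([haplotype_caller(s) for s in SAMPLES])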
The new cluster has also proved perfectly suited to supporting research by the Centre’s Division of Structural Biology (STRUBI), and it has already produced some of the world’s highest-resolution electron microscopy reconstructions – revealing structural details vital to understanding processes such as infection and immunity. The improvement in the performance of electron microscopy codes, particularly Relion, is also very impressive: “movie-mode” processing that required two weeks on eight 16-core nodes of a typical cluster is now completed in 24 hours on just six of the new FDR-enabled, high-memory nodes.
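For a sense of scale, here is a quick back-of-the-envelope calculation from those figures; the wall-clock times and node counts are as quoted above, while the node-day totals and ratios are simple derived arithmetic rather than measured benchmarks.

# Back-of-the-envelope comparison using the Relion "movie-mode" figures above.
old_days, old_nodes = 14, 8     # two weeks on eight 16-core nodes
new_days, new_nodes = 1, 6      # 24 hours on six high-memory FDR nodes

old_node_days = old_days * old_nodes    # 112 node-days
new_node_days = new_days * new_nodes    # 6 node-days

print(f"Wall-clock speed-up: {old_days / new_days:.0f}x")           # ~14x
print(f"Node-days: {old_node_days} -> {new_node_days} "
      f"({old_node_days / new_node_days:.0f}x less compute used)")  # ~19x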
Research is driving our adoption of HPC. Compared to this time last year, our researchers – and we have about 100 active users – can put through around 5x more work and are doing so on a machine with the same energy footprint. I’m pleased to say that with the support of OCF and its hardware partners, like DDN, Mellanox and Fujitsu, we’re now fully armed to meet today’s challenges. But research never stands still; our current system may only be relevant for 3-5 years, and we’re already planning for the next phase!
The author:
Dr Robert Esnouf is Head of Research Computing Core at the Wellcome Trust Centre for Human Genetics