Getting from data to detail
18 Nov 2016 by Evoluted New Media
Cryo-electron microscopy may well be a landmark technology for high-resolution 3D imaging, potentially challenging even X-ray crystallography, but it is very data-intensive. Jean-Christophe Ducom, from The Scripps Research Institute, explains how to effectively surf the data wave.
The Scripps Research Institute (TSRI) is one of the world’s largest independent, not-for-profit organisations focusing on biomedical research. As well as helping to lay the foundation for new and innovative ways to treat cancer, rheumatoid arthritis, hemophilia and other diseases, we have also been at the forefront of combatting infectious and deadly viruses, such as HIV, Ebola and Zika. We have approximately 2,700 employees on campuses in La Jolla, California, and Jupiter, Florida, with a roster of renowned scientists (including two Nobel Laureates) who collaborate on groundbreaking discoveries. Technology-driven research is a hallmark of the institute, as evidenced by our pioneering use of cryo-electron microscopy (Cryo-EM).
The significance of Cryo-EM is that it enables scientists to look more closely at the inner workings of organelles (tiny structures that perform specific functions within a cell) and to study the structure of medically important proteins. The ability to observe molecular complexes in conditions similar to those found within cells is critically important, as it provides a more complete description of molecular movements than was previously possible. Because Cryo-EM delivers atomic-level, high-resolution 3D molecular models with unprecedented speed and accuracy, structures that once took years or even decades to fully understand can now often be elucidated in weeks. This has huge implications for understanding how molecules involved in disease might be targeted with drugs and vaccines, and it is why labs around the world are racing to participate in the Cryo-EM "resolution revolution." Our researchers continue to push the envelope in this area. For example, TSRI's Lander lab is currently using Cryo-EM to shed light on treatments for Alzheimer's, Parkinson's, Lou Gehrig's and Huntington's diseases.
For more than a decade, we have refined our high-performance computing (HPC) and storage infrastructure at TSRI to handle the growing amounts of data coming from instruments such as microscopes and sequencers. Even without Cryo-EM, the volume of data we handle keeps increasing as a result of improved instrumentation and sample preparation, automated sample handling, round-the-clock (24/7) instrument operation and automated data acquisition.
Cryo-EM, however, has pushed data handling requirements to new levels for several reasons. Firstly, to support our use of Cryo-EM, we developed new software, Leginon, to automate the capture and analysis of the massive amounts of data generated by the latest microscopes. This automated data acquisition enables scientists to greatly increase the number of samples studied and the number of images acquired for each sample, while reducing set-up time. Secondly, we developed an extensive and sophisticated processing pipeline called Appion to streamline image analysis, so that scientists can move quickly from raw data to 3D structures. Finally, poised to break new scientific ground, we deployed state-of-the-art transmission electron microscopes (TEMs) from FEI Company. In addition to the powerful Titan Krios, we were the first organisation worldwide to deploy FEI's Talos Arctica microscope. Having the entire infrastructure in place enabled us to conduct very high-end research with the new instrumentation immediately, while many other institutions were just getting started. We simply plugged in the new instrumentation and ran with it.
As a result of these developments, Cryo-EM has quickly become the biggest producer of data at TSRI, yielding four times the output of our genomics workloads. We're collecting about 30TB of data each week, and this surge in data acquisition created an urgent need to extend our existing storage infrastructure: firstly, simply to manage the additional data and, secondly, to harness that data as effectively as possible so that research can be carried out at the highest level.
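To put the 30TB-a-week figure in perspective, here is a back-of-envelope calculation in Python. It assumes decimal units (1TB = 10^12 bytes) and a perfectly even rate across the week; neither assumption comes from the article, and real acquisition is burstier, so treat the results as rough averages.

```python
# Back-of-envelope figures for a steady Cryo-EM ingest rate.
# Assumptions (not from the article): decimal units, 1 TB = 10**12 bytes,
# and a perfectly even rate across the week.

SECONDS_PER_WEEK = 7 * 24 * 60 * 60
WEEKS_PER_YEAR = 52


def ingest_profile(tb_per_week: float) -> dict:
    """Derive daily, yearly and sustained-bandwidth figures from a weekly volume."""
    bytes_per_week = tb_per_week * 10**12
    return {
        "tb_per_day": tb_per_week / 7,
        "tb_per_year": tb_per_week * WEEKS_PER_YEAR,
        "mb_per_second": bytes_per_week / SECONDS_PER_WEEK / 10**6,
    }


if __name__ == "__main__":
    profile = ingest_profile(30)  # ~30 TB/week, the figure quoted above
    for name, value in profile.items():
        print(f"{name}: {value:.1f}")
    # prints: tb_per_day: 4.3, tb_per_year: 1560.0, mb_per_second: 49.6
```

Under these assumptions the sustained bandwidth is modest, roughly 50MB/s, but the accumulated volume is about 1.56PB a year, which suggests why capacity and archiving, rather than raw throughput, drive the storage decisions that follow.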
Keeping pace with the rapid influx of data proved troublesome: on several occasions TSRI scientists were forced to archive existing data to free up enough space to keep the microscopes running. Using the archive in this fashion was only a short-term fix; researchers wanted a more robust solution for longer-term retention that would also streamline access to archived data to facilitate re-analysis. After all, it is much easier to retrieve and re-examine older data than to collect entirely new images. A scalable archive was also critical for keeping pace with advances in image processing algorithms, which offer the possibility of revealing new insights by re-analysing data from earlier research. We had originally installed high-performance storage from DataDirect Networks (DDN) to support our HPC environment five years ago, but it became clear that Cryo-EM needed its own storage to keep up with ever-increasing data demands.
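The requirement described here, freeing primary capacity while keeping older data quickly retrievable for re-analysis, is commonly met with an age-based tiering policy. The Python sketch that follows is a toy model of that idea only: the `TieredStore` class, the 90-day idle threshold and the file names are hypothetical, and nothing in it represents DDN's software or TSRI's actual configuration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class TieredStore:
    """Toy two-tier store: a fast 'primary' tier and a cheaper 'archive' tier.

    Hypothetical model for illustration only, not any vendor's API. Callers use
    write()/read() and never need to know which tier currently holds a file.
    """
    max_idle: timedelta                               # assumed policy threshold
    primary: dict = field(default_factory=dict)       # path -> bytes (fast tier)
    archive: dict = field(default_factory=dict)       # path -> bytes (cheap tier)
    last_access: dict = field(default_factory=dict)   # path -> datetime

    def write(self, path: str, data: bytes, now: datetime) -> None:
        # New or updated data always lands on the fast tier.
        self.primary[path] = data
        self.archive.pop(path, None)
        self.last_access[path] = now

    def read(self, path: str, now: datetime) -> bytes:
        # Recall archived data on demand so it stays available for re-analysis.
        if path in self.archive:
            self.primary[path] = self.archive.pop(path)
        self.last_access[path] = now
        return self.primary[path]

    def apply_policy(self, now: datetime) -> list:
        """Demote files idle for longer than max_idle from primary to archive."""
        demoted = [p for p in self.primary if now - self.last_access[p] > self.max_idle]
        for p in demoted:
            self.archive[p] = self.primary.pop(p)
        return demoted


if __name__ == "__main__":
    t0 = datetime(2016, 1, 1)
    store = TieredStore(max_idle=timedelta(days=90))
    store.write("projectA/movie_0001.mrc", b"...", t0)
    store.write("projectB/movie_0001.mrc", b"...", t0 + timedelta(days=80))
    print(store.apply_policy(t0 + timedelta(days=100)))  # ['projectA/movie_0001.mrc']
    print(store.read("projectA/movie_0001.mrc", t0 + timedelta(days=101)))  # b'...'
```

Demoting on idle time rather than on creation date keeps data that is still being re-analysed on the fast tier; a production system would typically also move data in bulk and present a single namespace, so that users never see the move.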
After first expanding our parallel file system storage by 700TB, we began looking for a departmental solution that could be dedicated to Cryo-EM research. Team members wanted a system that could expand easily and cost-effectively. It was also critical to give researchers an active archive: a simple, expedient way to archive older data and free up space, while keeping that data readily available for later re-analysis. We chose a combination of a DDN parallel file system and DDN's object storage appliance, with about 2PB of capacity. The object storage platform met our active archive requirements: moving older project data from primary storage to the less expensive object storage saved money while, crucially, keeping the data accessible for collaboration. Because tiering between the two was automated and transparent, users didn't need to know where their files were stored, which was key. As a result, approximately 50 scientists across six research groups now have prompt access to robust storage for Cryo-EM research and collaboration.
Being able to fully harness Cryo-EM data puts TSRI scientists in a strong position to lead major scientific breakthroughs. They routinely solve protein structures to the point where they can see how the individual atoms are positioned and how they interact with each other and with other machinery within the cell. This allows them to design drugs or vaccines against a wide range of diseases. One recent TSRI study, for example, used Cryo-EM, with the Titan Krios combined with a new generation of digital camera, the Gatan K2, to capture the structure of the HIV protein responsible for recognising and infecting host cells. The resulting images gave a more complete depiction of the protein's structure than ever seen before.
The findings also included a detailed map of a vulnerable site at the base of this protein, along with the binding site of an antibody that can neutralise HIV. This gives researchers a better idea of the most important factors to consider in developing an HIV vaccine.
Cryo-EM technology is also driving significant research to discover new Ebola-fighting antibodies and to find treatments for other emerging diseases, such as Zika. In the past, with X-ray crystallography, it could take a year or more before researchers could examine the protein structure and develop antibodies. With Cryo-EM, and the ability to collect and analyse data quickly, TSRI scientists can turn that discovery process around in a matter of weeks, which was previously unheard of.
In TSRI's Lander lab, scientists are using multi-scale, high-resolution 3D imaging to determine the precise neurological mechanisms involved in maintaining neuronal integrity. Insights into the molecular relationships that give rise to normal neuron function are proving essential to understanding how these diseases progress.
Cryo-EM is a game-changer in scientific research. By harnessing the data that holds the key to life-saving discoveries, we can shorten the time to discovery while ensuring that scientists have prompt access to decades of vital research. And through our DDN storage, TSRI scientists in California can share research with their counterparts in Florida and with thousands of scientists around the world.
Author
Jean-Christophe Ducom, High-Performance Computing Manager, Scripps Research