The journey to the exascale
31 Mar 2015 by Evoluted New Media
So much of modern science has become reliant on the correct processing and analysis of big data – but how have we arrived at the point where computing power is so important? Dave Fellinger, Chief Scientist at DDN, takes us on a whistle-stop tour of high performance scientific computing
In the 1960s, British music culture dominated the globe, but for the aficionados of the rapidly emerging technology scene, The Stones and The Beatles were secondary to the progressive work taking place at the University of Manchester.
In the early 1960s, the now-defunct UK electronic engineering firm Ferranti collaborated with the University of Manchester to develop Atlas, a computer capable of processing speeds approaching one microsecond per instruction. It was the most powerful computer in the world and was recognised, by some, as the world’s first supercomputer. According to author and computing historian Professor Simon Lavington, “…when Atlas went offline about half the UK’s entire computing capacity was lost.”
The University of Manchester continued its path of computing innovation and two further Atlas machines were built – one for British Petroleum and one for the Atlas Computer Laboratory; the latter site is now home to the UK’s Science and Technology Facilities Council and several European Space Agency research centres, and remains a hotbed of supercomputing.
A decade or so earlier, on the other side of the Atlantic, the man now proclaimed the ‘father of supercomputing’, Seymour Cray, started his professional career with the US Navy-funded company Engineering Research Associates (ERA). Following the acquisition and dissolution of ERA at the end of the 1950s, Cray joined Control Data Corporation (CDC).
Cray was responsible for the design, build and configuration of the CDC 6600 in 1964, the computer that claimed the crown from Atlas as the world’s fastest – and by a country mile. Throughout the 1960s and into the early 1970s, CDC management directed Seymour Cray to build machines targeted at business and commercial data processing for ‘average’ customers. This edict ran counter to the ambitions of a man who wanted to build the fastest computers in the world, and Cray left CDC to form Cray Research in 1972.
From the 1960s through to the early eighties, the supercomputing space was dominated by proprietary systems that were only available to organisations with the deepest of pockets. The user landscape had changed little in the twenty or so years since the work at the University of Manchester.
But in the late eighties and early nineties, the evolution of the supercomputer took a major step forward, driven – in part – by enterprise and commercial computing.
The technology that dominated the commercial computing space was the mainframe, which had begun its life around the same time as the supercomputer but was designed for high-volume transaction processing. Organisations started to ‘revolt’ against the mainframe – prices were high and end users felt that a handful of manufacturers of proprietary technology were holding them hostage.
Across the board there was a push for more ‘open’ or ‘semi-proprietary’ technologies – and the microprocessor was, in part, a major impetus for that change. The introduction of the microprocessor in the 1980s collapsed the price of compute capability. And, following Moore’s Law, as the transistor count of microprocessors doubled, the capability of the entire ecosystem around them doubled with it. Organisations started to want more, but for less.
The introduction of the microprocessor meant supercomputers could now be built with widely available technology at their core. The world of supercomputers moved into a semi-proprietary space built around microprocessors, but the operating system and system buses remained relatively closed technologies.
There are two ways you can make computers go faster. The traditional way has been the symmetric multiprocessing (SMP) approach, in which a single copy of the operating system directs multiple processors, all of which share the same memory and data bus. This allows the system to shift tasks between processors to better manage workloads – but the path to memory is shared, which can create a bottleneck.
The other is the massively parallel processing (MPP) approach, which gives each processor its own operating system and memory, and interconnects the processors to allow very rapid exchange of data.
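To make the distinction concrete, here is a minimal Python sketch (an illustration only, with made-up worker functions rather than real HPC code) that computes the same sum in an SMP style, where threads share one memory space, and in an MPP style, where separate processes each hold their own chunk of the data and send results back over a link standing in for the interconnect.

```python
import threading
import multiprocessing as mp

DATA = list(range(1_000_000))

# SMP style: one operating system image, several workers sharing the same memory.
def smp_sum(num_threads=4):
    partials = [0] * num_threads               # shared memory visible to every thread
    def worker(idx):
        partials[idx] = sum(DATA[idx::num_threads])
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)

# MPP style: each worker is its own process with its own memory; results travel
# back over an explicit link (a pipe standing in for the interconnect).
def mpp_worker(chunk, conn):
    conn.send(sum(chunk))
    conn.close()

def mpp_sum(num_procs=4):
    links, procs = [], []
    for i in range(num_procs):
        parent, child = mp.Pipe()
        p = mp.Process(target=mpp_worker, args=(DATA[i::num_procs], child))
        p.start()
        links.append(parent)
        procs.append(p)
    total = sum(link.recv() for link in links)
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    print(smp_sum(), mpp_sum())
```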
One of the early implementations of MPP was the Goodyear Massively Parallel Processor, a supercomputer built by Goodyear Aerospace in 1983 for NASA’s first space flight research laboratory, the Goddard Space Flight Center. It delivered vast computational performance at a fraction of the cost of other supercomputers of the time.
MPP allows organisations to connect multiple servers together and leverage commercially available, commodity technologies – the result is that system costs come down significantly.
The introduction of microprocessor technology, the ‘breaking down’ of the mainframe and the MPP approach made supercomputing technologies available to a much larger pool of organisations – albeit still dominated by government labs, academic institutes and oil and gas organisations.
As the cost of compute continued to fall, more and more organisations wanted to ask more questions of their data and run analysis and simulations on it.
Ken Batcher, an emeritus professor of computer science at Kent State University – and once an architect at the aforementioned Goodyear Aerospace – was at the heart of parallel computing from the 1980s to the present. He defined a supercomputer as “…a device for turning compute-bound problems into I/O-bound problems.”
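A back-of-the-envelope sketch makes Batcher’s point. The figures below are assumptions chosen purely for illustration, not measurements of any real machine: as arithmetic capability races ahead of storage bandwidth, the very same job flips from compute-bound to I/O-bound.

```python
# Illustrative only: assumed compute rates and storage bandwidths, not real systems.
def runtime(flops_needed, bytes_moved, flops_per_sec, bytes_per_sec):
    compute_s = flops_needed / flops_per_sec
    io_s = bytes_moved / bytes_per_sec
    return compute_s, io_s

# A job needing 1e15 floating-point operations over 1e13 bytes of data.
for label, flops_rate, io_rate in [
    ("older system",   1e9,  1e8),    # assumed figures for illustration only
    ("modern cluster", 1e15, 1e11),
]:
    c, i = runtime(1e15, 1e13, flops_rate, io_rate)
    bound = "I/O-bound" if i > c else "compute-bound"
    print(f"{label}: compute {c:,.0f}s, I/O {i:,.0f}s -> {bound}")
```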
As processing environments continued to grow unabated and processor performance followed a similar path, organisations were starting to suffer from a bottleneck in their storage infrastructure. The lack of relative performance in storage technology compared to the compute performance was slowing organisations’ ability to fully benefit from their HPC infrastructures.
Shortly after the Goodyear MPP project, DataDirect Networks’ (DDN) CEO, Alex Bouzari, and President, Paul Bloch, founded Mega Drive Systems. We developed a highly parallel architecture – lots of pipes going to the server, lots of pipes going to the disks – aggregating the performance of the disks and presenting it back to the server. Besides improving the storage architecture, we started to look at the overall system architecture and the buses that usually connect a storage device with its related server.
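The general idea of aggregating many “pipes” to many disks can be sketched as data striping. The Python below is a conceptual illustration only – ordinary local files stand in for disks, and the stripe size and file names are arbitrary; it is not a description of DDN’s actual implementation.

```python
import concurrent.futures
import os

STRIPE_SIZE = 1 << 20   # 1 MiB stripes; an arbitrary choice for this sketch

def write_stripe(path, offset, chunk):
    # Each stripe write gets its own handle, so the "disks" work in parallel.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(chunk)

def striped_write(data, device_paths):
    """Split data into stripes and spread them round-robin across the devices."""
    for path in device_paths:                     # pre-create the backing files
        open(path, "wb").close()
    stripes = [data[i:i + STRIPE_SIZE] for i in range(0, len(data), STRIPE_SIZE)]
    with concurrent.futures.ThreadPoolExecutor(len(device_paths)) as pool:
        futures = []
        for n, stripe in enumerate(stripes):
            device = device_paths[n % len(device_paths)]
            offset = (n // len(device_paths)) * STRIPE_SIZE
            futures.append(pool.submit(write_stripe, device, offset, stripe))
        for fut in futures:
            fut.result()                          # surface any write errors

if __name__ == "__main__":
    striped_write(os.urandom(8 * STRIPE_SIZE),
                  ["disk0.bin", "disk1.bin", "disk2.bin", "disk3.bin"])
```

Because every stripe write has its own handle and its own “device”, the bandwidth of the individual disks adds up instead of queuing behind a single pipe.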
Understanding that the highest-performance bus is really no bus at all, we designed a virtual environment in which the server, and even a data reduction process, can be embedded and operated in the same effective memory space as the storage device. Why download raw data and process it externally when the process itself can be uploaded and the data served in reduced form directly from the storage device? We branded this architecture “Storage Fusion Architecture” and it is being used to serve and process data in many diverse environments.
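In outline, the idea is to move the computation to the data rather than the data to the computation. The toy Python below is only a conceptual stand-in – the StorageNode class and its methods are invented for illustration and are not the Storage Fusion Architecture API – showing a client shipping a reduction function to the node that holds the data and getting back a single reduced result instead of every raw record.

```python
class StorageNode:
    """Toy stand-in for a storage device that can also run embedded processing."""

    def __init__(self, records):
        self.records = records                  # the raw data lives here

    def read_all(self):
        # Traditional path: ship every raw record across the bus to the client.
        return list(self.records)

    def run(self, reduction):
        # "Upload the process": run the reduction next to the data and return
        # only the (much smaller) result.
        return reduction(self.records)

node = StorageNode(records=range(1_000_000))

# Old model: move a million values, then reduce them on the client side.
client_side = sum(node.read_all())

# Embedded model: move a single value back.
storage_side = node.run(lambda recs: sum(recs))

assert client_side == storage_side
```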
You will see from the infographic just how quickly data is growing. The promise of Exascale is immense. Sequencing an entire genome once took a significant percentage of the scientific resources of multiple countries, yet now it is common practice across hundreds of research and commercial entities around the world. And we can easily see the day coming very soon when personalised medicine will be a reality and patients’ sequence data will be the basis of custom therapies used to treat them. That has to be one of the main aims of supercomputing, making more effective treatments more widely available.
More machines, with more power and speed, started to come within the reach of more individuals and organisations. Manufacturing companies, financial organisations and genomics organisations such as the Wellcome Trust Sanger Institute were starting to employ supercomputing technologies with increasing regularity. Between 2000 and 2010, significant progress was made in the parallelisation of both compute and storage. But it was also during this period that the industry started to witness the commoditisation of compute. Supercomputing was beginning to move beyond the confines of the scientists and ‘propeller heads’ in the direction of mainstream computing.
The democratisation of supercomputing was well underway. Entry prices had fallen dramatically from around $100m in the 1960s, to approximately $5m in the 1980s, to sub-$1m today.
The price collapse of technology, commoditisation and new technology developments have created a new high performance computing league, one that is no longer the preserve of the few. The ‘new’ kids on the block are data-hungry and we are now in the era of big data – organisations have lots of it, of different types, structured and unstructured, located in a plethora of places. And it’s getting bigger.
Now, whether you are running commercial HPC applications like genomic sequencing, financial backtesting, seismic processing or even big data analytics, the growing need to increase I/O performance and remove latency should be, and probably is, already at the top of the priority list as you determine how to architect your own environment.
To that end, right now there’s a ton of intensive planning underway as the top 10 or 20 supercomputing labs gear up for Exascale over the next decade. It is an awesome undertaking to support all of the I/O performance that needs to be generated as they move from a dozen threads per node to 1,000, and from 13,000 cores to 57 million cores – roughly a 4,000x increase. Doing the math, that will be about a billion concurrent threads to support.
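For what it’s worth, the arithmetic can be checked in a couple of lines; note that the one-million-node figure below is an assumption used only to recover the “about a billion threads” estimate, not a published system specification.

```python
cores_today, cores_exascale = 13_000, 57_000_000
print(cores_exascale / cores_today)          # ~4,385, i.e. roughly the 4,000x quoted

threads_per_node, nodes = 1_000, 1_000_000   # node count assumed for illustration
print(threads_per_node * nodes)              # ~1,000,000,000 concurrent threads
```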
Your environment is only as fast as your slowest component. Each component’s barriers are exactly where scientists and developers are targeting their prodigious smarts to break through every limitation in the ecosystem between here and Exascale.
So, whether you’re looking to deploy a greenfield HPC/big data application, architecting for Exascale or anywhere in between, there’s a new paradigm of caching emerging that provides a much more efficient alternative to provisioning I/O performance from spinning disk alone.
Parallel file systems, storage and provisioning I/O performance via the traditional constructs of spinning disk have been among the most formidable challenges to overcome. The challenge is creating a new construct that allows the breakneck pace of compute development to proliferate without that pesky Moore’s Law constricting storage and governing our next big leap forward.
What if software-defined storage could perform all of the heavy I/O lifting while leveraging parallel file systems and spinning disk behind it as a persistent storage layer? Unlike alternative flash approaches, it wouldn’t require you to throw more hardware at the problem. At DDN, we address this with our Infinite Memory Engine (IME) technology – IME moves data much closer to the compute to avoid traversing multiple layers of protocols and networking. Other vendors might refer to this as burst buffer technology.
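As a rough sketch of the burst-buffer pattern in general – a generic illustration, not the actual IME implementation – the Python below absorbs a bursty stream of writes into a fast near-compute tier and drains them asynchronously to a slower persistent store, with a plain local file standing in for the parallel file system.

```python
import queue
import threading

class BurstBuffer:
    """Generic burst-buffer sketch: fast absorbing tier plus a background drain."""

    def __init__(self, backing_store_path):
        self.pending = queue.Queue()            # the fast tier that absorbs bursts
        self.backing_store_path = backing_store_path
        self._drainer = threading.Thread(target=self._drain, daemon=True)
        self._drainer.start()

    def write(self, payload: bytes):
        # The application sees a quick, non-blocking hand-off to the buffer tier.
        self.pending.put(payload)

    def _drain(self):
        # A background thread flushes to the slow persistent tier at its own pace.
        with open(self.backing_store_path, "ab") as store:
            while True:
                payload = self.pending.get()
                if payload is None:             # sentinel: stop draining
                    return
                store.write(payload)

    def close(self):
        self.pending.put(None)
        self._drainer.join()

bb = BurstBuffer("parallel_fs_stand_in.dat")
for _ in range(1_000):                          # a checkpoint-style write burst
    bb.write(b"x" * 4096)
bb.close()                                      # ensure the drain completes before exit
```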
The days of buying lower capacity drives (or “short stroking” them) are gone, as performance is handled in the memory tier. This long-awaited decoupling of performance and capacity enables you to now eliminate the overprovisioning of compute and storage resources just for peak bandwidth needs. If your domain is compute, how would you like to reclaim 30% of your processing that was previously occupied with storage-centric tasks and waiting for slower spinning disk to perform reads and writes? That’s huge ROI in and of itself.
When you’re able to virtualise disparate flash resources (whether resident in compute nodes or I/O nodes) into a single pool of very fast in-memory storage, the buying criteria for spinning-disk storage arrays suddenly become all about the most efficient capacity density. Intelligent algorithms can optimise cache flushes into aligned writes to spinning disk, significantly lowering the performance requirements of your spinning array. One way such an algorithm might look is sketched below.
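The Python below is purely illustrative – the alignment size and the zero-filling of gaps are simplifications, not how any particular product behaves – showing many small buffered writes being merged into a handful of large, aligned extents before they are flushed to the spinning-disk tier.

```python
ALIGN = 1 << 20   # flush in 1 MiB aligned extents; an illustrative value

def coalesce(buffered_writes):
    """buffered_writes: {byte_offset: bytes}. Returns a sorted list of
    (extent_start, extent_bytes) pairs, one large write per aligned extent.
    For simplicity this sketch zero-fills any gaps inside an extent."""
    extents = {}
    for offset, data in buffered_writes.items():
        start = (offset // ALIGN) * ALIGN                 # align the extent start
        extent = extents.setdefault(start, bytearray(ALIGN))
        extent[offset - start:offset - start + len(data)] = data
    return sorted(extents.items())

# 512 scattered 4 KiB writes collapse into just two 1 MiB aligned flushes.
writes = {i * 4096: b"\xab" * 4096 for i in range(512)}
flushes = coalesce(writes)
print(len(writes), "small writes ->", len(flushes), "aligned flushes")
```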
If you’re in charge of the storage budget, the returns are even greater – imagine using 70-75% less rack space and power, as you need far fewer components such as controllers, drives and routers. You’ll also be maximising every rack unit in your storage array, as each can now be populated with the largest-capacity drives available.
By eliminating the need to build disk-full, server-full storage architectures to support bursty I/O, you can dramatically reduce storage requirements and liberate your compute to do the processing it was actually purchased for.
The author: Dave Fellinger is Chief Scientist, Strategy & Technology at DataDirect Networks. He serves on the board of the iRODS Consortium and the External Advisory Board of the DataNet Federation Consortium.