The technical challenges to building an exascale system are many. They include solving software problems to enable parallelism across what may be hundreds of thousands of compute cores; meeting reliability and resiliency needs in an environment that will see ongoing core failures; and improving energy efficiency.
That last issue, energy efficiency, gets a lot of attention. For every megawatt of power, the annual cost is roughly $1 million. The 150-petaflop systems DOE has planned for 2017 will operate at about 10 MW.
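As a rough illustration of that arithmetic, the sketch below applies the rule of thumb of about $1 million per megawatt per year to the 10 MW figure. The function and constant names are hypothetical, and the exascale line is only an extrapolation that assumes efficiency stays at the 150-petaflop system's level, not a figure from DOE.

```python
# Rule of thumb quoted above: roughly $1 million per megawatt per year.
COST_PER_MW_YEAR_USD = 1_000_000


def annual_power_cost_usd(power_mw, cost_per_mw_year=COST_PER_MW_YEAR_USD):
    """Estimated yearly electricity cost for a system drawing power_mw megawatts."""
    return power_mw * cost_per_mw_year


def petaflops_per_megawatt(petaflops, power_mw):
    """Energy efficiency expressed as petaflops delivered per megawatt."""
    return petaflops / power_mw


if __name__ == "__main__":
    # Figures from the article: a 150-petaflop system at about 10 MW.
    print(f"Annual power cost at 10 MW: ${annual_power_cost_usd(10):,.0f}")
    efficiency = petaflops_per_megawatt(150, 10)
    print(f"Efficiency: {efficiency:.0f} petaflops per MW")

    # Illustrative extrapolation only: an exaflop (1,000 petaflops)
    # at the same efficiency, with no improvement assumed.
    exascale_mw = 1000 / efficiency
    print(f"Exascale at the same efficiency: about {exascale_mw:.0f} MW, "
          f"or ${annual_power_cost_usd(exascale_mw):,.0f} per year")
```

On those assumptions, the power bill scales linearly with the machine's draw, which is why efficiency per watt matters as much as raw speed.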
The top researchers internationally acknowledge that there is competition to reach exascale, but there's also an understanding that software stack development is so complex that international cooperation is needed.
Although the Europeans are operating on a time frame that may be similar to that of the U.S., Japan had earlier announced a goal to reach exascale by 2020. But Akinori Yonezawa, deputy director at the Riken Advanced Institute for Computational Science, said in an interview Tuesday that the goal now is to build a 200- to 600-petaflop system by 2020, not an exascale system.
Last month, Riken selected Fujitsu to develop the basic design for this system.
In 2008, the first U.S. petascale system came from IBM. If Moore's Law still applied to high-performance computing, the U.S. should reach exascale by 2018. But it became clear early on that the technical issues were too great to meet that date.
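The 2018 projection can be checked with a little arithmetic. Going from petascale to exascale is a 1,000-fold increase, or just under 10 doublings, so a 10-year span implies performance doubling about once a year. The sketch below works that out; the function names are hypothetical, and the yearly doubling is simply what the article's dates imply, not a formal statement of Moore's Law.

```python
import math

PETAFLOP = 1e15  # floating-point operations per second
EXAFLOP = 1e18


def doublings_needed(start_flops, target_flops):
    """Number of performance doublings to get from start_flops to target_flops."""
    return math.log2(target_flops / start_flops)


def implied_doubling_time_years(start_year, target_year, start_flops, target_flops):
    """Doubling period implied by reaching target_flops in target_year."""
    years = target_year - start_year
    return years / doublings_needed(start_flops, target_flops)


if __name__ == "__main__":
    n = doublings_needed(PETAFLOP, EXAFLOP)
    period = implied_doubling_time_years(2008, 2018, PETAFLOP, EXAFLOP)
    print(f"Doublings from petascale to exascale: {n:.2f}")
    print(f"Implied doubling time, 2008 to 2018: {period:.2f} years")
```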
Even what counts as exascale won't necessarily be easy to agree on.
An exascale system can be built today just by connecting "a gazillion" GPUs, said IBM's Turek. "The question is, what will work on it? What will it support?" he said.
Today, the Linpack benchmark, which measures a system's floating-point rate of execution, is widely used to determine capability and ranking on the Top500 supercomputer list. But for an exascale system, Turek said, a more useful metric may be application performance: how much improvement the system delivers for a real-world use.
Turek said the DOE systems IBM is building are a stepping stone to exascale. "It's a vehicle to mitigate risk, because we know there is a tremendous amount of learning and innovation that needs to take place," he said.