U.S. Scientists Introduce Tougher Benchmarks To Gain Back The Lost Title
China’s supercomputer Tianhe-1A, revealed last month at HPC 2010 China, has set a new performance record of 2.507 petaflops as measured by the LINPACK benchmark, making it the fastest system in China and in the world today. Tianhe-1A surpassed the U.S. ‘Jaguar’, built by Cray, which held first place until June this year. China’s National University of Defense Technology in Changsha developed Tianhe-1A using ‘heterogeneous computing’. This architecture couples massively parallel graphics processing units (GPUs) with multi-core central processing units (CPUs) to improve performance while reducing size and power consumption. The system uses 7,168 Nvidia Tesla M2050 GPUs and 14,336 CPUs. According to GPU manufacturer Nvidia, it would require more than 50,000 CPUs and twice as much floor space to deliver the same performance using CPUs alone. “The performance and efficiency of Tianhe-1A was simply not possible without GPUs,” said Guangming Liu, chief of the National Supercomputer Center in Tianjin. “The scientific research that is now possible with a system of this scale is almost without limits.”
Top Rank – Top500 HPC List
#1 Tianhe-1A – National Supercomputing Center in Tianjin, China
#2 Jaguar – Cray/U.S.A. – DOE/SC/Oak Ridge National Laboratory
#3 Nebula 2.0 – National Supercomputing Centre in Shenzhen (NSCS), China
#4 TSUBAME 2.0 – GSIC Center, Tokyo Institute of Technology, Japan
#5 Hopper – Cray/U.S.A. – DOE/SC/LBNL/NERSC
#6 Tera-100 – Commissariat a l’Energie Atomique (CEA), France
#7 Roadrunner – DOE/NNSA/LANL, U.S.A.
#8 Kraken XT5 – Cray/U.S.A. – National Institute for Computational Sciences/University of Tennessee
#9 JUGENE – Forschungszentrum Juelich (FZJ), Germany
Once primarily a tool for the development of nuclear weapons, supercomputers have become essential for commercial applications, including pharmaceutical research, industry, logistics and finance. However, the most powerful machines are still operated by government establishments for advanced research, energy and defense applications.
The Chinese achievement is impressive, but U.S. experts argue that speed alone is not sufficient to rank a system as the best in High Performance Computing. Data-intensive applications are increasingly important HPC workloads, but they are ill suited to platforms designed for 3D physics simulations. Current benchmarks and performance metrics do not provide useful information on the suitability of supercomputing systems for data-intensive applications.
To gain back their leading position, U.S. scientists are changing the rules, introducing more challenging tasks as benchmarks. “Some, whose supercomputers placed very highly on simpler tests like the Linpack, also tested them on the Graph500, but decided not to submit results because their machines would shine much less brightly,” said Sandia computer scientist Richard Murphy, a lead researcher in creating and maintaining the test. The new Graph500 benchmark was developed by a team led by researchers from Sandia Labs, the Georgia Institute of Technology, the University of Illinois at Urbana-Champaign, and Indiana University. So far, nine supercomputers have been tested, validated and ranked by the new “Graph500” challenge, first introduced last week by an international team led by Sandia National Laboratories. The machines were tested for their ability to solve complex problems involving random-appearing graphs, rather than for their speed in solving a basic numerical problem, which is today’s popular method for ranking top systems.
Complex problems involving huge numbers of related data points are found in the medical world where large numbers of medical entries must be correlated, in the analysis of social networks with their huge numbers of electronically related participants, or in international security where numerous containers on ships roaming the world and their ports of call must be tracked.
Such problems are solved by creating large, complex graphs with vertices that represent the data points (say, people on Facebook) and edges that represent relations between the data points (say, friendships on Facebook). These problems stress the ability of computing systems to store and communicate large amounts of data in irregular, fast-changing communication patterns, rather than the ability to perform many arithmetic operations. The Graph500 benchmarks indicate how well supercomputers can handle such complex problems. The team has developed specific benchmarks to address three application kernels: concurrent search, optimization (single source shortest path), and edge-oriented (maximal independent set). Five graph-related business areas are also being addressed: cybersecurity, medical informatics, data enrichment, social networks, and symbolic networks.
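To make the vertex-and-edge model concrete, here is a minimal Python sketch of a breadth-first search over a tiny friendship graph, reporting a rate in edges examined per second. It is an illustration only, not the Graph500 reference code: the names, the five-edge graph, and the way edges are counted are all assumptions made for this example.

```python
from collections import deque
import time


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def bfs(adj, root):
    """Breadth-first search from root.

    Returns the parent of every reached vertex and the number of
    adjacency entries examined (each undirected edge is seen twice).
    """
    parent = {root: root}
    queue = deque([root])
    edges_examined = 0
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            edges_examined += 1
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent, edges_examined


# Hypothetical data: vertices are people, edges are friendships.
friendships = [
    ("ann", "bob"), ("ann", "cat"), ("bob", "dan"),
    ("cat", "dan"), ("dan", "eve"),
]
adj = build_adjacency(friendships)

start = time.perf_counter()
parent, edges_examined = bfs(adj, "ann")
elapsed = time.perf_counter() - start

# Guard against a zero reading on very coarse timers.
rate = edges_examined / elapsed if elapsed > 0 else float("inf")
print(f"reached {len(parent)} people, examined {edges_examined} edges")
print(f"rate: {rate:.0f} edges per second")
```

Almost all of the work here is following pointers between vertices rather than doing arithmetic, which is the kind of irregular memory access the benchmark is meant to stress.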
The Graph500 benchmarks present problems in different input sizes. These are described as huge, large, medium, small, mini and toy. No machine proved capable of handling problems in the huge or large categories. “I consider that a success,” Murphy said. “We posed a really hard challenge and I think people are going to have to work to do ‘large’ or ‘huge’ problems in the available time.” More memory, he said, might help.
The abbreviations “GE/s” and “ME/s” used in the list below denote each machine’s performance in giga-edges per second and mega-edges per second, that is, a billion and a million edges traversed in a second, respectively. Competitors were ranked first by the size of the problem attempted and then by edges per second.
Top ranks – Graph500 HPC Challenge:
#1 Intrepid, Argonne National Laboratory – 6.6 GE/s on Scale 36 (Medium)
#2 Franklin, National Energy Research Scientific Computing Center – 5.22 GE/s on Scale 32 (Small)
#3 cougarxmt, Pacific Northwest National Laboratory – 1.22 GE/s on Scale 29 (Mini)
#4 graphstorm, Sandia National Laboratories – 1.17 GE/s on Scale 29 (Mini)
#5 Endeavor, Intel Corporation – 533 ME/s on Scale 29 (Mini)
#6 Erdos, Oak Ridge National Laboratory – 50.5 ME/s on Scale 29 (Mini)
#7 Red Sky, Sandia National Laboratories – 477.5 ME/s on Scale 28 (Toy++)
#8 Jaguar, Oak Ridge National Laboratory – 800 ME/s on Scale 27 (Toy+)
#9 Endeavor, Intel Corporation – 615.8 ME/s on Scale 26 (Toy)
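The ranking rule (problem size first, then edges per second) explains why a machine with a higher rate can sit below a slower one. The short Python sketch below applies that rule to the figures in the list above. The tuple layout and the helper name are assumptions chosen for illustration, not part of any official Graph500 tooling.

```python
GIGA = 1e9  # one giga-edge per second
MEGA = 1e6  # one mega-edge per second

# (machine, scale, edges per second), deliberately listed out of order.
results = [
    ("Jaguar", 27, 800 * MEGA),
    ("Endeavor", 26, 615.8 * MEGA),
    ("Erdos", 29, 50.5 * MEGA),
    ("Intrepid", 36, 6.6 * GIGA),
    ("Red Sky", 28, 477.5 * MEGA),
    ("graphstorm", 29, 1.17 * GIGA),
    ("Endeavor", 29, 533 * MEGA),
    ("Franklin", 32, 5.22 * GIGA),
    ("cougarxmt", 29, 1.22 * GIGA),
]


def format_rate(edges_per_second):
    """Render a rate as GE/s when it reaches a billion, otherwise ME/s."""
    if edges_per_second >= GIGA:
        return f"{edges_per_second / GIGA:.2f} GE/s"
    return f"{edges_per_second / MEGA:.1f} ME/s"


# Sort by scale first, then by rate, both descending.
ranked = sorted(results, key=lambda r: (r[1], r[2]), reverse=True)

for rank, (name, scale, rate) in enumerate(ranked, start=1):
    print(f"#{rank} {name}: {format_rate(rate)} on Scale {scale}")
```

Sorting on the pair (scale, rate) reproduces the published order: Jaguar’s 800 ME/s ranks below Red Sky’s 477.5 ME/s because it was measured on a smaller problem.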
Intrepid is the most powerful HPC machine at the Argonne Leadership Computing Facility (ALCF). This IBM Blue Gene/P machine is capable of more than 500 trillion calculations a second. In 2012 the facility will get an even more powerful computer, a 10-petaflops IBM Blue Gene/Q supercomputer called Mira. It will be 20 times faster than Intrepid, running programs at 10 quadrillion calculations a second.