Efficient optical interconnect architecture for HPC and data center systems

Summary form only given. Modern large-scale computer systems require interconnection networks that are both high-performance and energy-efficient. Most large-scale systems are built from a large number of “nodes”, where each node has one or more computing elements and a block of main memory and is connected to a network that interconnects all of the nodes. Several factors have led to this configuration: (1) because the node components are all based on consumer products, their costs have become very low compared with customized components; (2) many problems require very large amounts of main memory for efficient execution, and the best way to provide sufficient memory capacity is to use many smaller discrete blocks of memory rather than one or a few large blocks; and (3) this arrangement of distributed nodes offers opportunities for maximum levels of parallel computation. The Top500 list [Top500] ranks high-performance computer (HPC) systems by achieved performance on a standard benchmark program. On this list, the number of nodes per system ranges from 44 to 98,000. Provisioning an efficient, high-performance network interconnecting these nodes is critical for the effective operation of such a system. In addition, the “Exascale Report” [Kogg08] highlights the impact of network power requirements in a “strawman” exascale system design, in which the interconnect portion of the proposed system consumes 27% of the total projected system power. In multi-node systems, there are two classes of interconnection networks: an intra-node network connecting the sockets and cores on a single node, and an inter-node network connecting the tens to thousands of nodes that comprise the system. The Oracle macrochip [Krish09] is a design for a node that uses a silicon photonics intra-node network to connect the 64 sites on a node. 
The next step in exploiting this macrochip is a system based on interconnecting multiple macrochips. Both this intra-node network and the proposed inter-node network use optical communication to achieve the energy-efficient, high-performance operation that will be required in future HPC systems. This talk summarizes some of the lessons learned in designing and analyzing the macrochip [Koka12]; it also discusses some of the issues that are emerging in the design of the multi-macrochip systems of the future.