In this paper we describe the performance and scalability of some common collective communication patterns on the ASCI Q machine. Experiments conducted on a 1024-node/4096-processor segment of this machine show that the network is fast and scalable. The network is able to barrier-synchronize in a few tens of microseconds, perform a broadcast with an aggregate bandwidth of more than 100 GB/s, and sustain heavy hot-spot traffic with limited performance degradation.
Keywords:
Cluster computing
Gb/sec and Tb/sec switching and routing technologies