Specifications
Performance
In this section we compare the performance of the code our compiler generates from CAF with hand-coded MPI implementations of the MG, CG, BT and SP NAS parallel benchmark codes. For our study, we used MPI versions from the NPB 2.3 release. Baseline sequential measurements were taken with the NPB 2.3-serial release. The NPB codes are widely regarded as useful for evaluating the performance of compilers on parallel systems.
For each benchmark, we compare the parallel efficiency of its MPI and CAF implementations.
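As a rough illustration of the metric, the sketch below assumes the common definition of parallel efficiency: speedup over the sequential baseline divided by the number of processors, i.e. t_serial / (P * t_parallel). The function name `parallel_efficiency` and all timings are hypothetical and chosen only for illustration; they are not measurements from this study.

```python
def parallel_efficiency(t_serial: float, t_parallel: float, num_procs: int) -> float:
    """Return t_serial / (num_procs * t_parallel).

    Assumed definition: speedup relative to the sequential baseline,
    divided by the processor count. A value of 1.0 is ideal scaling.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / (num_procs * t_parallel)


if __name__ == "__main__":
    # Hypothetical timings in seconds (NOT results from the paper).
    t_serial = 100.0
    timings = {
        "MPI": {4: 27.0, 16: 7.5},
        "CAF": {4: 28.0, 16: 7.2},
    }
    for version, runs in timings.items():
        for procs, t_par in sorted(runs.items()):
            eff = parallel_efficiency(t_serial, t_par, procs)
            print(f"{version:>3} P={procs:<3} efficiency={eff:.2f}")
```

Using one sequential baseline for both versions puts the MPI and CAF curves on the same scale, so a gap between them reflects a difference in parallel execution time rather than in the baseline.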
All experiments were performed on a cluster of 92 HP zx6000 workstations interconnected with Myrinet 2000. Each workstation node contains two 900MHz Intel Itanium 2 processors with 32KB/256KB/1.5MB of L1/L2/L3 cache, 4-8GB of RAM, and the HP zx1 chipset. Our operating environment is the GNU/Linux operating system (kernel version 2.4.20 plus patches). Although this Linux kernel is SMP-capable, we used only one of the processors on each SMP node for our experiments (1) to avoid contention for the Myrinet and local memory, and (2) to avoid process ping-ponging, since our kernel was not configured to support affinity scheduling. We used the Intel Fortran compiler v7.0 for Itanium (efc) as the back-end compiler for all F90 code generated by the CAF translator as well as for the MPI versions of the benchmarks. We compiled at optimization level 3 with the override-limits option, which prevents the compiler from automatically disabling certain expensive optimizations. CAF executables were linked against ARMCI 1.1-beta for Myrinet GM. All executables were linked against Myricom’s MPI implementation MPICH-GM 1.2.5..10 (compiled with Intel’s efc) running on Myricom’s GM 1.6.4 driver substrate.