Yuan Zhao and Ken Kennedy (2006)
Dependence-based Code Generation for a CELL Processor
In: Proceedings of the 19th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2006), Springer-Verlag, Lecture Notes in Computer Science.
The CELL processor has attracted considerable interest from the research community because of its high performance and its special architectural features: heterogeneous multi-core design (PPE and SPEs), parallelism at both coarse (thread) and fine (vector) granularity, high data transfer bandwidth among cores and to memory, and explicit control of local scratch-pad memory through DMA. However, these same features mean that obtaining high performance on CELL requires significant programming effort. To remove this burden from ordinary users, previous research has developed an OpenMP compiler for CELL. In this paper, we present and evaluate a dependence-based compiler approach for automatically generating code for CELL from a single source program with no parallelism directives. Our code generation model is similar to that of OpenMP: a loop nest is parallelized across the PPE and SPEs and vectorized on each core accordingly, data buffers for multi-buffering are created, and DMA data transfers are generated automatically. Unlike the OpenMP model, however, our approach can also handle loop nests that carry dependences. To preserve the correct semantics in the presence of dependences, we developed a barrier and a uni-directional synchronization using the on-chip communication mechanisms. We also developed strategies to improve DMA data movement performance and vector alignment performance by offloading computation to the PPE and exploiting memory reuse at the innermost loop. Our experimental evaluation demonstrated the applicability of our approach, the reduction in thread fork-join overhead obtained by parallelizing a whole loop nest that carries dependences, and the effectiveness of loop peeling on the PPE for DMA data movement.
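The multi-buffering scheme mentioned in the abstract can be illustrated with a minimal sketch in portable C. It is not the code the authors' compiler generates: the names `dma_get`, `dma_put`, `scale_double_buffered`, and `CHUNK` are hypothetical, and `memcpy` stands in for the asynchronous DMA transfers an SPE would issue, so the sketch shows only the buffer rotation, not real overlap of transfer and computation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk size standing in for one local-store buffer. */
#define CHUNK 4

/* Assumed stand-ins for DMA transfers. On real hardware these would be
 * asynchronous and would need an explicit wait before the data is used. */
static void dma_get(float *local, const float *remote, size_t n)
{
    memcpy(local, remote, n * sizeof *local);
}

static void dma_put(float *remote, const float *local, size_t n)
{
    memcpy(remote, local, n * sizeof *local);
}

/* Number of elements in the chunk that starts at index base. */
static size_t chunk_len(size_t n, size_t base)
{
    return (n - base < CHUNK) ? n - base : CHUNK;
}

/* Computes dst[i] = k * src[i] chunk by chunk, alternating between two
 * local buffers so that the fetch of the next chunk is issued before the
 * current chunk is processed. */
void scale_double_buffered(const float *src, float *dst, size_t n, float k)
{
    float buf[2][CHUNK];
    size_t nchunks = (n + CHUNK - 1) / CHUNK;

    if (nchunks == 0)
        return;
    assert(src != NULL && dst != NULL);

    /* Prime the pipeline: fetch the first chunk into buffer 0. */
    dma_get(buf[0], src, chunk_len(n, 0));

    for (size_t c = 0; c < nchunks; ++c) {
        size_t cur = c & 1;          /* buffer holding the current chunk */
        size_t base = c * CHUNK;
        size_t len = chunk_len(n, base);

        /* Issue the fetch of the next chunk into the other buffer. */
        if (c + 1 < nchunks) {
            size_t next_base = base + CHUNK;
            dma_get(buf[cur ^ 1], src + next_base, chunk_len(n, next_base));
        }

        /* Compute on the current buffer, then write the result back. */
        for (size_t i = 0; i < len; ++i)
            buf[cur][i] *= k;
        dma_put(dst + base, buf[cur], len);
    }
}
```

Calling `scale_double_buffered(src, dst, 10, 2.0f)` processes ten elements as chunks of 4, 4, and 2. With genuinely asynchronous transfers, a wait on the outstanding write-back would also be needed before a buffer is refilled.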