Tutorial #2
Tutorial type: half-day
Title: Introduction to Programming High Performance Applications on the CELL Broadband Engine.
Authors: Jakub Kurzak, Alfredo Buttari, University of Tennessee at Knoxville
Description: Programming the STI CELL processor is about successfully exploiting
its potential for delivering very high performance. The purpose of this tutorial is to give the programmer practical guidelines for
achieving this goal. We begin with a brief overview of the CELL's main
architectural features and its software development environment. Then
we discuss three basic aspects of CELL programming: SPE SIMD kernel
development (vectorization), SPE parallelization and intra-chip
communication. We show how high performance SPE kernels are created
by replacing scalar operations with vector ones, heavily unrolling
loops, and exploiting the dual-issue nature of the SPE architecture. We
explain coding with SIMD C language extensions (intrinsics) as well as
in assembly language, and discuss aspects specific to code
development in assembly. We present static performance analysis using
the spu-timing tool. The presentation of intra-chip communication
follows, with emphasis on DMA communication for both bulk data
transfers and synchronization. We discuss message size and
alignment restrictions, enforcing message ordering with barrier
and fence mechanisms, and creating complex data transfers using DMA
lists. We conclude the topic with guidelines on implementing
pipelined processing with direct local-store-to-local-store
communication. We discuss basic profiling techniques using the SPE
decrementer. We then present a set of practical tips and tricks and
a list of "gotchas", or common rookie mistakes. A brief overview of
academic and commercial CELL programming packages follows, along with a
discussion of a real-life example: scanning network traffic using
DFA-based string matching. The tutorial ends with a presentation of
techniques for programming multi-CELL systems using message passing
with MPI.
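As a rough illustration of the loop unrolling mentioned in the description, the following portable C sketch compares a scalar dot product with a version unrolled by four that uses independent accumulators. It is not SPE code and is not taken from the tutorial materials: the function names dot_scalar and dot_unrolled4 are hypothetical, and on the SPE each group of four operations would more likely be written as one SIMD intrinsic on a four-float vector.

```c
#include <stddef.h>

/* Scalar reference: one multiply-add per iteration, with every
 * iteration depending on the previous value of sum. */
float dot_scalar(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical unrolled variant: four independent accumulators break
 * the dependency chain, so the four multiply-adds in the loop body
 * can overlap. The four lanes stand in for one 4-wide SIMD register. */
float dot_unrolled4(const float *x, const float *y, size_t n)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;

    /* Main loop: process four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        acc0 += x[i]     * y[i];
        acc1 += x[i + 1] * y[i + 1];
        acc2 += x[i + 2] * y[i + 2];
        acc3 += x[i + 3] * y[i + 3];
    }

    /* Combine the partial sums. */
    float sum = (acc0 + acc1) + (acc2 + acc3);

    /* Remainder loop: handle the last n % 4 elements. */
    for (; i < n; i++)
        sum += x[i] * y[i];

    return sum;
}
```

Because the unrolled version adds the products in a different order, its result can differ from the scalar one in the last bits for general floating-point input; the two agree exactly only when every intermediate value is exactly representable.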
Bios:
Jakub Kurzak received the MSc degree in electrical and computer
engineering from Wroclaw University of Technology, Poland, and the PhD
degree in computer science from the University of Houston. He is a
research associate in the Innovative Computing Laboratory in the
Computer Science Department at the University of Tennessee,
Knoxville. His research interests include parallel algorithms,
specifically in the area of numerical linear algebra, and also
parallel programming models and performance optimization for parallel
architectures spanning distributed and shared memory systems, as well
as next-generation multi-core and many-core processors.
Alfredo Buttari received the MSc degree in computer science and the
PhD degree in computer science and control engineering from the
University of Rome, Italy. He is a research associate in the
Innovative Computing Laboratory in the Computer Science Department at
the University of Tennessee, Knoxville. His research interests include
numerical linear algebra, dense and sparse methods, direct and
iterative solvers, parallel algorithms and performance optimization
for parallel architectures, including next-generation multi-core and
many-core processors.