Edgar Gabriel, Graham E Fagg, Antonin Bukovsky, Thara Angskun, and Jack Dongarra (2003)

A Fault-Tolerant Communication Library for Grid Environments

17th Annual ACM International Conference on Supercomputing (ICS 2003) International Workshop on Grid Computing and e-Science.

With growing numbers of processors and applications running in virtual Grid environments, application-level fault tolerance is becoming an increasingly important issue. This paper presents the semantics of a fault-tolerant version of the Message Passing Interface (MPI), the de facto standard for communication in scientific applications, which allows applications to recover from a node or link error and continue execution in a well-defined way. It then describes the architecture of FT-MPI, an implementation of MPI based on these semantics, along with some tools that support end users during application development with FT-MPI. Finally, it compares the performance of FT-MPI with that of the most relevant MPI libraries for point-to-point benchmarks and the High Performance Linpack benchmark.

