Thara Angskun, Graham E Fagg, George Bosilca, Jelena Pjesivac-Grbovic, and Jack J Dongarra (2006)

Self-Healing Network for Scalable Fault Tolerant Runtime Environments

In: DAPSYS 2006 Conference Proceedings, 6th Austrian-Hungarian Workshop on Distributed and Parallel Systems, Innsbruck, Austria.

Scalable and fault-tolerant runtime environments are needed to support and adapt to the underlying libraries and hardware, which require a high degree of scalability in dynamic large-scale environments. This paper presents a self-healing network (SHN) for supporting scalable and fault-tolerant runtime environments. The SHN is designed to support transmission of messages across multiple nodes while also protecting against recursive node and process failures. It recovers automatically after a failure occurs. SHN is implemented on top of a scalable fault-tolerant protocol (SFTP). The experimental results show that the latest multicast and broadcast routing algorithms used in SHN are both faster than the original SFTP routing algorithms.
