Open MPI
OpenMPI is a community-developed implementation of MPI. Each of the
core contributors is the developer of an existing production-quality
implementation of the Message Passing Interface (MPI) standard: FT-MPI
(UTK), LA-MPI (LANL), and LAM/MPI (IU). These implementations offer
various approaches to data and process fault tolerance in addition to
high-performance communication.
The OpenMPI project is developing a highly configurable and extensible
runtime environment (or middleware) to support robust parallel
computation on systems ranging from small mission-critical and embedded
systems to future petascale supercomputers. OpenMPI has a
lightweight component architecture that allows for on-the-fly loading
of component modules and run-time selection of features (including
network device, OS, and resource management support), enabling the
middleware to be highly adaptable, both statically to accommodate a
wide variety of system types, and dynamically in response to rapidly
changing heterogeneous environments. The architecture also provides an
ideal framework for adding support for experimental or innovative
devices. The initial goal of the OpenMPI project is to provide a framework
for a new high-quality implementation of MPI Version 2 with high levels
of communication performance, scalability to hundreds of thousands of
processes, and data and process fault tolerance. The first release of
OpenMPI is scheduled for November 2004. OpenMPI was designed, however,
to be the foundation of a more complete runtime environment than a
simple message-passing library. A central goal of OpenMPI is to enable
effective fault management (an essential requirement for scalable
computers). Middleware such as OpenMPI is uniquely positioned to
coordinate and broker the tasks of fault prediction, detection,
recovery and reconfiguration. We do not propose to provide a fully
automatic or “canned” solution to fault management, but rather to
provide consistent and common APIs so that applications can discover,
characterize, and respond appropriately to faults.
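To make the idea of run-time feature selection in a component
architecture more concrete, the following is a minimal sketch in C. It
is not the OpenMPI interface: the component_t type, the
select_component function, the probe functions, and the component names
and priorities are all hypothetical, chosen only for illustration. The
sketch assumes that each component can report at run time whether it is
usable on the current system, and that the usable component with the
highest priority is chosen. Dynamic loading of modules is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component descriptor; NOT the actual OpenMPI API. */
typedef struct {
    const char *name;        /* illustrative component name */
    int priority;            /* higher value wins among usable components */
    int (*is_usable)(void);  /* run-time probe: nonzero if usable here */
} component_t;

/* Stand-in probes. A real probe would query the hardware or the OS. */
static int always_usable(void) { return 1; }
static int never_usable(void)  { return 0; }

/* Return the usable component with the highest priority,
 * or NULL if no component in the table is usable. */
static const component_t *select_component(const component_t *table, size_t n)
{
    const component_t *best = NULL;
    for (size_t i = 0; i < n; i++) {
        assert(table[i].is_usable != NULL);
        if (!table[i].is_usable())
            continue;  /* skip components that cannot run on this system */
        if (best == NULL || table[i].priority > best->priority)
            best = &table[i];
    }
    return best;
}

/* Hypothetical table of network components for one system. */
static const component_t network_components[] = {
    { "fast_interconnect", 50, never_usable  },  /* device not present */
    { "shared_memory",     40, always_usable },
    { "tcp",               10, always_usable },
};
```

With this table, select_component(network_components, 3) returns the
"shared_memory" entry, because the higher-priority "fast_interconnect"
component reports that it is not usable. Adding support for a new
device under this scheme would amount to adding one more table entry
with its own probe.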
The low-level communication layer of OpenMPI is designed with high performance in mind, providing low latency and scalable high bandwidth through the striping of message fragments across multiple network devices, with optional end-to-end data integrity through a lightweight checksum/retransmission protocol. The design is structured so that all or part of the communication protocol may be offloaded to network-device processors on architectures where this is beneficial. Finally, OpenMPI is highly portable, conforming to ISO C and POSIX standards throughout. This enables us to target a variety of operating systems, including novel choices such as Plan 9 and real-time operating systems (RTOSs).
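The striping and integrity checking described above can be illustrated
with a small sketch in C. This is an assumption-laden illustration, not
the OpenMPI protocol: the fragment_t layout, the 8-byte fragment size,
the round-robin device assignment, and the choice of a Fletcher-16
checksum are all hypothetical. The sender splits a message into
fragments and assigns them to devices in turn. The receiver accepts a
fragment only if its checksum matches, and a rejected fragment is the
signal that the sender should retransmit it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAG_PAYLOAD 8  /* bytes per fragment; tiny, for illustration only */

/* Hypothetical fragment layout; NOT the OpenMPI wire format. */
typedef struct {
    size_t   offset;                 /* position of the payload in the message */
    size_t   length;                 /* number of payload bytes in use */
    int      device;                 /* index of the device carrying it */
    uint16_t checksum;               /* Fletcher-16 over the payload */
    uint8_t  payload[FRAG_PAYLOAD];
} fragment_t;

/* Fletcher-16 checksum, used here as a stand-in lightweight checksum. */
static uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (uint16_t)((sum1 + data[i]) % 255);
        sum2 = (uint16_t)((sum2 + sum1) % 255);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}

/* Split msg into fragments and assign them round-robin to ndev devices.
 * Returns the number of fragments produced; returns 0 if the message is
 * empty or does not fit in max_frags fragments. */
static size_t stripe_message(const uint8_t *msg, size_t len, int ndev,
                             fragment_t *frags, size_t max_frags)
{
    assert(ndev > 0);
    size_t count = 0;
    for (size_t off = 0; off < len; off += FRAG_PAYLOAD) {
        if (count == max_frags)
            return 0;
        fragment_t *f = &frags[count];
        f->offset = off;
        f->length = (len - off < FRAG_PAYLOAD) ? len - off : FRAG_PAYLOAD;
        f->device = (int)(count % (size_t)ndev);
        memcpy(f->payload, msg + off, f->length);
        f->checksum = fletcher16(f->payload, f->length);
        count++;
    }
    return count;
}

/* Receiver side: copy the fragment into place if its checksum matches.
 * Returns 1 on success, 0 if the fragment is malformed or corrupted,
 * in which case the sender would be asked to retransmit it. */
static int accept_fragment(const fragment_t *f, uint8_t *out, size_t out_len)
{
    if (f->length > FRAG_PAYLOAD || f->offset > out_len ||
        f->length > out_len - f->offset)
        return 0;
    if (fletcher16(f->payload, f->length) != f->checksum)
        return 0;
    memcpy(out + f->offset, f->payload, f->length);
    return 1;
}
```

Because each fragment carries its own offset and checksum, fragments
may arrive in any order and from any device, and only a damaged
fragment needs to be sent again rather than the whole message.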