Operating System Issues
Operating System Issues Related to Scalability
Work at UNM focuses on low-level performance issues in communication and host operating systems. These activities support work at LANL on Clustermatic and OpenMPI (LA-MPI), the base-level infrastructure for LANL's ASC codes. Recent work at CCS-3 has shown that system-software performance can have a large impact on ASC application performance. Our work, carried out in collaboration with researchers in CCS-1, attempts to quantify these impacts and explore the design space of possible solutions to these performance problems.
Scalability of TCP
Our primary focus in this work is to identify and address limits to scalability in these protocols. The need to manage connection state is perhaps the single biggest factor limiting the scalability of TCP, especially when protocol processing is offloaded to a network interface with limited resources. We have developed and refined methods to determine the actual amount of system memory used per open TCP connection. Using these methods, we have demonstrated that memory usage will become a bottleneck to TCP performance in the future.
We are in the process of defining and implementing a connection-less TCP that allows the user (and eventually the system itself) to deactivate a socket when it is not being used. Deactivation removes the heavyweight socket and replaces it with a timewait structure that is nearly eight times smaller. This should enable us to support tens to hundreds of thousands of TCP connections, most of which are inactive, while a smaller working set of active connections is fully instantiated. We will measure the costs associated with dynamic activation and deactivation of sockets. This involves defining the appropriate metrics related to reactivation (e.g., round-trip time, congestion window size, slow-start threshold), metrics cached during deactivation, static metrics, and metrics shared across connections (bundling). We are also exploring methods for automatically deactivating sockets when they are not part of the "working set" for a process and automatically reactivating them as they are needed.
Application Impact of Fault-handling Placement
Message-passing systems such as MPI must handle possible hardware faults in message passing, primarily network packet losses, but also packet data corruption introduced by either the network itself or the host hardware. Such network faults can be handled either at a low level, such as in kernel network protocols, or in user libraries and applications. Both approaches have advantages: low-level fault handling potentially offers lower overhead, while high-level fault handling provides complete end-to-end reliability.
Preliminary studies at UNM have shown that the actual cost/benefit tradeoffs in these decisions are complex; the costs of allowing for user-level end-to-end reliability in LA-MPI/OpenMPI, for example, may be higher than initially expected even when this functionality is not needed. In addition, the performance benefits and reliability costs provided by fault handling in low-level networking protocols still need to be accurately quantified.
In FY2005, UNM will continue to work with Rich Graham in CCS-1 to study the impact of kernel-level and library-level fault handling in OpenMPI and LA-MPI. By quantifying the performance impact and potential reliability risks of fault-handling placement, we hope to aid LANL in improving both the reliability and performance of OpenMPI and LA-MPI in supporting ASC applications, as well as to help direct future networking and operating systems research at UNM.