Operating System Issues
Operating System Issues Related to Scalability
Work at UNM focuses on low-level performance issues in communication and host operating systems. These activities support work at LANL on Clustermatic and OpenMPI (LA-MPI), the base-level infrastructure for LANL's ASC codes. Recent work at CCS-3 has shown that system-software performance can have a large impact on ASC application performance. Our work, carried out in collaboration with researchers in CCS-1, attempts to quantify these impacts and explore the design space of possible solutions to these performance problems.
Scalability of TCP
Our primary focus in this work is to identify and address limits to scalability in these protocols. The need to manage connection state is perhaps the single biggest factor limiting the scalability of TCP, especially when protocol processing is offloaded to a network interface with limited resources. We have developed and refined methods to determine the actual amount of system memory used per open TCP connection. Using these methods, we have demonstrated that memory usage will become a bottleneck to TCP performance in the future.
We are in the process of defining and implementing a connection-less TCP that allows the user (and eventually the system itself) to deactivate a socket when it is not being used. Deactivation removes the heavyweight socket and replaces it with a timewait structure that is nearly eight times smaller. This should enable us to support tens to hundreds of thousands of TCP connections, most of which are inactive, while a smaller working set of active connections is fully instantiated. We will measure the costs associated with dynamic activation and deactivation of sockets. This involves defining the appropriate metrics related to reactivation (e.g., round-trip time, congestion window size, slow-start threshold), metrics cached during deactivation, static metrics, and metrics shared across connections (bundling). We are also exploring methods for automatically deactivating sockets when they are not part of the "working set" for a process and automatically reactivating them as they are needed.
Application Impact of Fault-handling Placement
Message-passing systems such as MPI must handle possible hardware faults in message passing, primarily network packet losses, but also packet data corruption introduced by either the network itself or the host hardware. Such network faults can be handled either at a low level, such as in kernel network protocols, or in user libraries and applications. Both approaches have advantages: low-level fault handling potentially offers lower overhead, while high-level fault handling provides complete end-to-end reliability.
Preliminary studies at UNM have shown that the actual cost/benefit tradeoffs in these decisions are complex; the costs of allowing for user-level end-to-end reliability in LA-MPI/OpenMPI, for example, may be higher than initially expected even when this functionality is not needed. In addition, the performance benefits and reliability costs provided by fault handling in low-level networking protocols still need to be accurately quantified.
In FY2005, UNM will continue to work with Rich Graham in CCS-1 to study the impact of kernel-level and library-level fault handling in OpenMPI and LA-MPI. By quantifying the performance impact and potential reliability risks of fault-handling placement, we hope to aid LANL in improving both the reliability and performance of OpenMPI and LA-MPI in supporting ASC applications, as well as to help direct future networking and operating systems research at UNM.