Clustermatic Performance Instrumentation
Application performance engineering requires a performance instrumentation and analysis infrastructure that is robust and scalable while providing the analysis capabilities needed by developers of both system and application code. Developers need instrumentation of processor performance both within an application process and across all code running on a node. While processor-level measurement may be sufficient for compute-bound applications, operating system operations play a vital role in the performance of applications that communicate with the outside world, including message passing in parallel applications and any interaction with I/O subsystems. Measuring these costs requires adding instrumentation to the operating system kernel and to the specific device drivers that contribute to them.
It will be crucial that the performance instrumentation infrastructure be scalable and impose low enough overheads that it can be used to measure production runs on large systems.
Performance Counter Profiling
HPCToolkit from Rice runs on current Clustermatic systems by layering itself on top of PAPI from Tennessee. One limitation of this approach is that it exposes the internal performance of an application but does not provide a system-wide view that captures all phenomena relevant to performance. We will investigate adding such pervasive performance monitoring and analysis infrastructure to Clustermatic systems. Our approach will be to begin with the OProfile software and extend and modify it to work on Clustermatic systems. This work is coupled to the activities described in the section “Application and System Performance.”
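To illustrate the kind of per-process counter access that PAPI provides beneath HPCToolkit, the following is a minimal sketch (our own example, not code from either toolkit); the chosen events and the measured loop are assumptions for illustration:

#include <stdio.h>
#include <papi.h>

int main(void)
{
    int eventset = PAPI_NULL;
    long long counts[2];
    int i;
    volatile double x = 0.0;

    /* Initialize PAPI and build an event set of two common counters. */
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI initialization failed\n");
        return 1;
    }
    PAPI_create_eventset(&eventset);
    PAPI_add_event(eventset, PAPI_TOT_CYC);   /* total cycles */
    PAPI_add_event(eventset, PAPI_TOT_INS);   /* instructions completed */

    PAPI_start(eventset);
    for (i = 0; i < 1000000; i++)             /* region of interest */
        x += i * 0.5;
    PAPI_stop(eventset, counts);

    printf("cycles = %lld, instructions = %lld\n", counts[0], counts[1]);
    return 0;
}

Counters read this way are confined to the calling process, which is precisely the limitation noted above: activity in the kernel, in daemons, or in other processes on the node is invisible, motivating the system-wide OProfile-based approach.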
Fine-grained Monitoring of System Software Costs
General operating system costs are becoming an increasingly significant impediment to ASC application performance on large-scale machines. Recent studies at CCS-3, for example, have shown that operating system effects in the SAGE ASC code can cause up to a 50% performance penalty on large-scale systems such as ASCI-Q. The current solution to this problem is to dedicate approximately 12.5% of ASCI-Q to operating system services in order to mitigate OS interference. While this approach does lessen the problem, it comes at the cost of making a non-trivial portion of the ASCI-Q system unavailable to applications.
To address these issues, UNM is working on novel approaches to measuring operating system and message-passing costs in large-scale systems that are central to ASC's mission. The first part of this research consists of modifying the Linux kernel to monitor and report the operating system costs associated with each network transmission and reception on a per-request basis. By augmenting the Linux kernel with per-request monitoring facilities, we aim to quantify the exact operating system costs that cause operating system interference effects like those measured at CCS-3. These per-request monitoring facilities can then be used to guide later modifications of Linux to increase message-passing performance, and can be integrated with more comprehensive system monitoring facilities.
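As a rough sketch of what per-request kernel instrumentation could look like (our illustration, not the UNM implementation; the probed symbol dev_queue_xmit, the use of the kretprobe facility, and printk-based reporting are all assumptions), a small module can timestamp the entry and exit of a transmit-path function to capture the operating system cost of each send:

#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/ktime.h>

/* Per-request scratch space: the timestamp taken at function entry. */
struct xmit_data {
    ktime_t entry;
};

static int xmit_entry(struct kretprobe_instance *ri, struct pt_regs *regs)
{
    struct xmit_data *d = (struct xmit_data *)ri->data;
    d->entry = ktime_get();
    return 0;
}

static int xmit_return(struct kretprobe_instance *ri, struct pt_regs *regs)
{
    struct xmit_data *d = (struct xmit_data *)ri->data;
    s64 ns = ktime_to_ns(ktime_sub(ktime_get(), d->entry));

    /* One sample per transmission: the OS cost of this request. */
    printk(KERN_INFO "dev_queue_xmit: %lld ns\n", ns);
    return 0;
}

static struct kretprobe xmit_probe = {
    .kp.symbol_name = "dev_queue_xmit",
    .entry_handler  = xmit_entry,
    .handler        = xmit_return,
    .data_size      = sizeof(struct xmit_data),
    .maxactive      = 64,
};

static int __init xmit_mon_init(void)
{
    return register_kretprobe(&xmit_probe);
}

static void __exit xmit_mon_exit(void)
{
    unregister_kretprobe(&xmit_probe);
}

module_init(xmit_mon_init);
module_exit(xmit_mon_exit);
MODULE_LICENSE("GPL");

In a real monitoring facility the per-request samples would be aggregated and exported rather than logged, but the entry/exit timing pattern is the core of per-request cost accounting.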
One such comprehensive system monitoring approach is the focus of UNM’s other LACSI research on monitoring. This approach, which we term message-centric monitoring, extends per-request monitoring to the entire system, measuring the complete hardware, operating system, and communication costs associated with ASC message-passing codes. Instead of examining the performance of individual requests on a host-by-host basis, our message-centric profiling approach associates performance data with an MPI message as it propagates across the system: as it is received by one host, processed by an application, and sent to another host. The overall goal is to have the performance data associated with a message encompass the entire diameter of the computation required to generate it, with the emphasis on profiling message-passing and operating system costs.
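At the application boundary, the standard MPI profiling interface offers a natural attachment point for per-message data. The wrappers below are a minimal sketch under that assumption (they record only host-side call costs; the message-centric approach described above would additionally carry accumulated performance data along with the message itself):

#include <stdio.h>
#include <mpi.h>

/* Link this wrapper library ahead of the MPI library so that these
 * definitions intercept application calls; PMPI_* are the real entry
 * points defined by the MPI profiling interface. */

int MPI_Send(void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    double t0 = MPI_Wtime();
    int rc = PMPI_Send(buf, count, type, dest, tag, comm);

    /* Record the host-side cost of this send, keyed by (dest, tag). */
    fprintf(stderr, "send to %d, tag %d: %.6f s\n",
            dest, tag, MPI_Wtime() - t0);
    return rc;
}

int MPI_Recv(void *buf, int count, MPI_Datatype type,
             int src, int tag, MPI_Comm comm, MPI_Status *status)
{
    double t0 = MPI_Wtime();
    int rc = PMPI_Recv(buf, count, type, src, tag, comm, status);
    fprintf(stderr, "recv from %d, tag %d: %.6f s\n",
            src, tag, MPI_Wtime() - t0);
    return rc;
}

Extending such wrappers so that the record travels with the message, accumulating kernel and application costs at every hop until it spans the full diameter of the computation, is the substance of the message-centric monitoring research described above.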