
Celso L Mendes and Daniel A Reed (2004)

Monitoring Large Systems via Statistical Sampling

The International Journal of High Performance Computing Applications, 18(2), pp. 267-277.

As parallel systems scale toward petaflop performance, enabled by advances in circuit density and by an increasingly available computational Grid, the development of efficient mechanisms for monitoring large systems becomes imperative. When computational components are coupled via dynamically shifting connections with various remote resources, the number of potential factors affecting system behavior is enormous. Yet the overhead of monitoring can be prohibitive. In this paper we present a new technique for monitoring large systems based on statistical sampling. Rather than monitoring each component, we select a statistically valid sample and measure the behavior of the sample members. We describe the formal requirements of sample selection and verify the feasibility of our approach with experiments on large parallel systems and wide-area networks. Our results show that this technique can be a powerful tool for effective monitoring without incurring the large costs typically associated with exhaustive checking.
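
The abstract does not give the paper's sample-selection formulas, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the monitored quantity is a proportion (for example, the fraction of nodes that are available) and uses the standard textbook sample-size formula for a proportion with a finite population correction. The function names, the `is_available` probe, and the default confidence level (z = 1.96, roughly 95%) are all hypothetical choices made for illustration.

```python
import math
import random


def required_sample_size(population, margin, z=1.96, p=0.5):
    """Sample size needed to estimate a proportion within +/- margin.

    Assumption: standard formula n0 = z^2 * p * (1 - p) / margin^2 with a
    finite population correction; p = 0.5 is the most conservative choice.
    """
    n0 = (z * z * p * (1.0 - p)) / (margin * margin)
    n = n0 / (1.0 + (n0 - 1.0) / population)
    return min(population, math.ceil(n))


def estimate_fraction(nodes, is_available, margin, seed=None):
    """Probe a simple random sample of nodes instead of every node.

    `is_available` is a hypothetical probe returning True or False for a node.
    Returns the estimated fraction and the number of nodes actually probed.
    """
    rng = random.Random(seed)
    n = required_sample_size(len(nodes), margin)
    sample = rng.sample(nodes, n)  # sampling without replacement
    hits = sum(1 for node in sample if is_available(node))
    return hits / n, n


if __name__ == "__main__":
    # Hypothetical system of 10,000 nodes in which every tenth node is down.
    nodes = list(range(10_000))
    fraction, probed = estimate_fraction(
        nodes, lambda node: node % 10 != 0, margin=0.05, seed=1
    )
    print(f"probed {probed} of {len(nodes)} nodes, estimated availability {fraction:.3f}")
```

Under these assumptions, a 5% margin at roughly 95% confidence requires probing 370 of 10,000 nodes, which illustrates why sampling can cost far less than exhaustive checking.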

