arxiv: 1504.06836 · v1 · pith:IQDORFWNnew · submitted 2015-04-26 · 💻 cs.DC · cs.OS

Monitoring Extreme-scale Lustre Toolkit

keywords lustre, monitoring, performance, analysis, clients, discuss, extreme-scale, filesystems

We discuss the design and ongoing development of the Monitoring Extreme-scale Lustre Toolkit (MELT), a unified Lustre performance monitoring and analysis infrastructure that provides continuous, low-overhead summary information on the health and performance of Lustre, as well as on-demand, in-depth problem diagnosis and root-cause analysis. The MELT infrastructure leverages a distributed overlay network to enable monitoring of center-wide Lustre filesystems where clients are located across many network domains. We preview interactive command-line utilities that help administrators and users observe Lustre performance at various levels of resolution, from individual servers or clients to whole filesystems, including job-level reporting. Finally, we discuss our future plans for automating the root-cause analysis of common Lustre performance problems.
