arxiv: 1508.01847 · v3 · pith:J3XFUB25new · submitted 2015-08-08 · 💻 cs.DC

Big Data Analytics on Traditional HPC Infrastructure Using Two-Level Storage

classification 💻 cs.DC
keywords storage, two-level, computing, system, orangefs, clusters, data-intensive, file

Data-intensive computing has become one of the major workloads on traditional high-performance computing (HPC) clusters. Currently, deploying data-intensive computing software frameworks on HPC clusters still faces performance and scalability issues. In this paper, we develop a new two-level storage system by integrating Tachyon, an in-memory file system, with OrangeFS, a parallel file system. We model the I/O throughput of four storage structures: HDFS, OrangeFS, Tachyon, and two-level storage. We conduct computational experiments to characterize the I/O throughput behavior of two-level storage and compare its performance to that of HDFS and OrangeFS using the TeraSort benchmark. Both the theoretical models and the experimental tests show that the two-level storage system can increase aggregate I/O throughput. This work lays a solid foundation for future work on designing and building HPC systems that better support I/O-intensive workloads while preserving existing computing resources.
