Abstract: Many high-performance computing applications have become highly
data-intensive due to the substantial increase in both simulation data
generated from scientific computing models and instrument data collected
from increasingly large-scale sensors and instruments. These applications
transfer large amounts of data between compute nodes and storage nodes,
which is a costly and bandwidth-consuming process. This data movement often
dominates an application's run time. This study investigates a new Fusion
Active Storage System (FASS) to address the data movement bottleneck
specifically for write-intensive big data applications. FASS enables a
paradigm that moves write-intensive computations to the storage nodes,
where data is generated and written in place to the storage devices. By
moving computation to the data, it avoids the data movement bottleneck on
the data path. FASS therefore minimizes data movement, which can benefit
write-intensive big data applications.