The official website of the ROCCI cyberinfrastructure project.
Project Title: CSSI: Elements: ROCCI: Integrated Cyberinfrastructure for In Situ Lossy Compression Optimization Based on Post Hoc Analysis Requirements
PIs: Sheng Di, Franck Cappello (UChicago), Dingwen Tao (IU)
Awards: 2104023 (UChicago, $320K), 2247080 (IU, $280K)
Project Duration: 9/15/2021 - 8/31/2024
Today’s simulations and advanced instruments are producing vast volumes of data, presenting a major storage and I/O burden for scientists. Error-bounded lossy compressors, which can significantly reduce the data volume while keeping data distortion within a user-specified error bound, have been developed for years. However, a significant gap still remains in practice. On the one hand, the impact of compression errors on scientific research is not yet well understood, so setting an appropriate error bound for lossy compression is very challenging for scientists. On the other hand, selecting the best-fit compression technology and running it automatically in scientific application codes is non-trivial because of the pros and cons of different compression techniques and the diverse characteristics of applications and datasets. This project aims to develop a Requirement-Oriented Compression Cyber-Infrastructure (ROCCI) for data-intensive domains such as astrophysics and materials science, which can select and run the best-fit lossy compressor automatically at runtime according to users’ requirements for their post hoc analysis.
The overarching goal of this project is to offer a complete series of automatic functions and services that allow users to transparently run the best-fit compressor at runtime during scientific simulations or data acquisition. This project advances knowledge and understanding with three key thrusts. (1) It builds an efficient layer to interoperate with different lossy compressors and diverse post hoc analysis requirements on data fidelity by leveraging the existing compression adaptor library LibPressio and the compression assessment library Z-checker. (2) It develops an efficient engine to determine the best-fit compressor with optimized settings based on users’ post hoc analysis requirements. (3) It develops a user-friendly infrastructure that integrates compression optimization and execution via the HDF5 dynamic filter mechanism. This project particularly targets cosmology and materials science applications and their specific requirements for using lossy compressors in practice.
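To make the second thrust concrete, the following is a minimal sketch of requirement-driven selection, not the ROCCI implementation. It assumes the post hoc requirement is a minimum PSNR, uses plain uniform quantization as a stand-in error-bounded compressor, and uses Python standard-library codecs in place of real compressors. It does not call LibPressio, Z-checker, or HDF5, and every function and pipeline name in it is hypothetical.

```python
import lzma
import math
import struct
import zlib


def quantize(data, eb):
    """Map each value to an integer bin of width 2*eb, so |x - x'| <= eb."""
    return [round(x / (2.0 * eb)) for x in data]


def dequantize(codes, eb):
    """Reconstruct each value as the centre of its quantization bin."""
    return [c * 2.0 * eb for c in codes]


def pack(codes):
    """Serialize integer codes as little-endian 64-bit integers."""
    return struct.pack(f"<{len(codes)}q", *codes)


def delta(codes):
    """Simple 1D prediction: keep the first code, then neighbour differences."""
    return [codes[0]] + [b - a for a, b in zip(codes, codes[1:])]


# Hypothetical candidate pipelines standing in for real lossy compressors.
CANDIDATES = {
    "quant+zlib": lambda codes: zlib.compress(pack(codes), 9),
    "quant+delta+zlib": lambda codes: zlib.compress(pack(delta(codes)), 9),
    "quant+delta+lzma": lambda codes: lzma.compress(pack(delta(codes))),
}


def psnr(original, recon):
    """Peak signal-to-noise ratio in dB; assumes non-constant input data."""
    value_range = max(original) - min(original)
    mse = sum((a - b) ** 2 for a, b in zip(original, recon)) / len(original)
    if mse == 0.0:
        return math.inf
    return 20.0 * math.log10(value_range) - 10.0 * math.log10(mse)


def select_best_fit(data, min_psnr, error_bounds):
    """Return (pipeline name, error bound, compression ratio) with the highest
    ratio among settings whose reconstruction meets the PSNR requirement,
    or None if no candidate setting qualifies."""
    raw_size = 8 * len(data)  # size of the data stored as 64-bit floats
    best = None
    for eb in error_bounds:
        codes = quantize(data, eb)
        if psnr(data, dequantize(codes, eb)) < min_psnr:
            continue  # this error bound violates the analysis requirement
        for name, compress in CANDIDATES.items():
            ratio = raw_size / len(compress(codes))
            if best is None or ratio > best[2]:
                best = (name, eb, ratio)
    return best


if __name__ == "__main__":
    field = [math.sin(i / 50.0) for i in range(2000)]
    print(select_best_fit(field, min_psnr=60.0, error_bounds=[1e-1, 1e-2, 1e-3, 1e-4]))
```

The exhaustive loop over error bounds and pipelines is only practical for a toy case like this; it is shown to illustrate the idea of filtering settings by a fidelity requirement and then ranking the survivors by compression ratio.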
Sheng Di (University of Chicago and Argonne National Lab): UChicago PI
Sheng Di is a computer scientist at Argonne National Laboratory, USA. He is an IEEE senior member, a scientist at large through the Consortium for Advanced Science and Engineering (CASE) at the University of Chicago, and an institute fellow of the Northwestern-Argonne Institute of Science and Engineering (NAISE). He has published 100+ refereed journal and conference papers in venues including TPDS, TC, TCC, TKDE, JPDC, IJHPCA, PPoPP, SC, IPDPS, HPDC, PACT, MSST, DSN, ICPP, CLUSTER, BigData, IWQoS, CCGrid, HiPC, Grid, CLOUD, UCC, EuroPar, and ICPADS. He is the recipient of the DOE 2021 Early Career Research Program Award, the 2018 IEEE-Chicago Distinguished Mentoring Award, and the 2019 IEEE-Chicago Distinguished R&D Award.
Franck Cappello (University of Chicago and Argonne National Lab): UChicago Co-PI
Franck Cappello is the director of the Joint-Laboratory on Extreme Scale Computing, which gathers six of the leading HPC institutions in the world: ANL, NCSA, Inria, BSC, JSC, and RIKEN. He is a senior computer scientist at Argonne National Laboratory and an adjunct associate professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. He is an expert in resilience and fault tolerance for scientific computing and data analytics. Recently he started investigating lossy compression for scientific data sets to respond to the pressing needs of scientists performing large-scale simulations and experiments. His contribution to this domain is one of the best lossy compressors for scientific data sets that respects user-set error bounds. He is a member of the editorial board of IEEE TPDS and of the ACM HPDC and IEEE CCGRID steering committees. He is a fellow of the IEEE.
Dingwen Tao (Indiana University): IU PI
Dingwen Tao is an associate professor in the Luddy School of Informatics, Computing, and Engineering at Indiana University Bloomington. He has published in top-tier HPC and big data conferences and journals, including SC, ICS, HPDC, PPoPP, PACT, IPDPS, CLUSTER, DAC, BigData, ICPP, MSST, TPDS, TC, JPDC, and IJHPCA. He is the recipient of an R&D 100 Award (2021), the IEEE Computer Society TCHPC Early Career Researchers Award for Excellence in High Performance Computing (2020), an NSF CISE Research Initiation Initiative (CRII) Award (2020), the IEEE CLUSTER Best Paper Award (2018), and the UCR Dissertation Year Program (DYP) Award (2017).
Robert Underwood (Argonne National Lab): Development of Compressor Adapter
Xiaodong Yu (Argonne National Lab): Development of Compressor Assessment Engine
Junjing Deng (Argonne National Lab): Evaluation on Synchrotron Coherent X-ray Experiments
Zarija Lukic (Lawrence Berkeley National Lab): Evaluation on Cosmological Simulations
Suren Byna (Lawrence Berkeley National Lab): Integration with HDF5
Arham Khan (University of Chicago)
Pei-Yau Weng (Washington State University)
The 8th International Workshop on Data Analysis and Reduction for Big Scientific Data, held with the 2022 ACM/IEEE SC conference.
2022 NSF CSSI PI Meeting for “Towards a Sustainable Data and Software Cyberinfrastructure” in Alexandria, VA.
The 2nd International Workshop on Big Data Reduction, held with the 2021 IEEE Big Data conference.
This material is based upon work supported by the National Science Foundation under Grants No. 2104023 and 2247080. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.