Selected Publications

Refereed Journal Articles

  • Sarunya Pumma, Min Si, Wu-chun Feng, and Pavan Balaji. Scalable Deep Learning via I/O Analysis and Optimization. In ACM Transactions on Parallel Computing (TOPC), vol. 6, no. 2, pp. 1–34, June 2019.
  • Min Si, Antonio J. Peña, Jeff Hammond, Pavan Balaji, Masamichi Takagi, and Yutaka Ishikawa. Dynamic Adaptable Asynchronous Progress Model for MPI RMA Multiphase Applications. In IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 29, no. 9, pp. 1975–1989, September 2018. [DOI] [PDF]

Refereed Conference Papers (peer-reviewed)

  • Kaiming Ouyang, Min Si, Atsushi Hori, Zizhong Chen, and Pavan Balaji. Daps: A Dynamic Asynchronous Progress Stealing Model for MPI Communication. In Proceedings of the 2021 IEEE International Conference on Cluster Computing (CLUSTER), pages 516–527. [PDF]
  • Kaiming Ouyang, Min Si, Atsushi Hori, Zizhong Chen, and Pavan Balaji. CAB-MPI: Exploring Interprocess Work-Stealing toward Balanced MPI Communication. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '20). IEEE Press, Article 36, 1–15. (Acceptance Rate: 25.4%) [PDF]
  • Abdelhalim Amer, Charles Archer, Michael Blocksome, Chongxiao Cao, Michael Chuvelev, Hajime Fujita, Maria Garzaran, Yanfei Guo, Jeff R. Hammond, Shintaro Iwasaki, Kenneth J. Raffenetti, Mikhail Shiryaev, Min Si, Kenjiro Taura, Sagar Thapaliya, and Pavan Balaji. Software Combining to Mitigate Multithreaded MPI Contention. In Proceedings of the ACM International Conference on Supercomputing (ICS '19). Association for Computing Machinery, New York, NY, USA, 367–379. (Acceptance Rate: 23.3%)
  • Atsushi Hori, Min Si (Joint First Co-Author), Balazs Gerofi, Masamichi Takagi, Jai Dayal, Pavan Balaji, and Yutaka Ishikawa. Process-in-Process: Techniques for Practical Address-Space Sharing. ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC). Jun. 11–15, 2018, Tempe, Arizona, USA. (Acceptance Rate: 19.6%) Best Paper Award [PDF]
  • Sarunya Pumma, Min Si, Wu-Chun Feng, and Pavan Balaji. Parallel I/O Optimizations for Scalable Deep Learning. IEEE International Conference on Parallel and Distributed Systems (ICPADS). Dec. 15-17, 2017, Shenzhen, China. [PDF]
  • Min Si and Pavan Balaji. Process-based Asynchronous Progress Model for MPI Point-To-Point Communication. IEEE International Conference on High Performance Computing and Communications (HPCC). Dec. 18-20, 2017, Bangkok, Thailand. (Acceptance Rate: 38%) [PDF]
  • Sarunya Pumma, Min Si, Wu-Chun Feng, and Pavan Balaji. Towards Scalable Deep Learning via I/O Analysis and Optimization. IEEE International Conference on High Performance Computing and Communications (HPCC). Dec. 18-20, 2017, Bangkok, Thailand. (Acceptance Rate: 38%) [PDF]
  • Kenneth J. Raffenetti, Abdelhalim Amer, Lena Oden, Charles Archer, Wesley Bland, Hajime Fujita, Yanfei Guo, Tomislav Janjusic, Dmitry Durnov, Michael Blocksome, Min Si, Sangmin Seo, Akhil Langer, Gengbin Zheng, Masamichi Takagi, Paul Coffman, Jithin Jose, Sayantan Sur, Alexander Sannikov, Sergey Oblomov, Michael Chuvelev, Masayuki Hatanaka, Xin Zhao, Paul Fischer, Thilina Rathnayake, Matt Otten, Misun Min, and Pavan Balaji. Why is MPI so Slow? Analyzing the Fundamental Limits in Implementing MPI-3.1. ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC). Nov. 12-17, 2017, Denver, Colorado. (Acceptance Rate: 18%) [PDF]
  • Min Si, Antonio J. Peña, Jeff Hammond, Pavan Balaji, and Yutaka Ishikawa. Scaling NWChem with Efficient and Portable Asynchronous Communication in MPI RMA. In Proceedings of the 8th IEEE International Scalable Computing Challenge, colocated with IEEE/ACM CCGrid 2015, pages 811–816, May 2015. (Acceptance Rate: 33%) Scale Challenge Finalist [PDF]
  • Min Si, Antonio J. Peña, Jeff Hammond, Pavan Balaji, Masamichi Takagi, and Yutaka Ishikawa. Casper: An Asynchronous Progress Model for MPI RMA on Many-Core Architectures. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS 2015), pages 665–676, May 2015. (Acceptance Rate: 21.8%) [PDF] [SLIDES]
  • Min Si, Antonio J. Peña, Pavan Balaji, Masamichi Takagi, and Yutaka Ishikawa. MT-MPI: Multithreaded MPI for Many-core Environments. In Proceedings of the 28th ACM International Conference on Supercomputing, ICS '14, pages 125–134. ACM, 2014. (Acceptance Rate: 21%) [PDF] [SLIDES]

Refereed Workshop Papers (peer-reviewed)

  • Michael Wilkins, Yanfei Guo, Rajeev Thakur, Nikos Hardavellas, Peter Dinda, and Min Si. A FACT-based Approach: Making Machine Learning Collective Autotuning Feasible on Exascale Systems. In 2021 Workshop on Exascale MPI (ExaMPI), pages 36–45, November 2021. [PDF]
  • Min Si, Huansong Fu, Jeff Hammond, and Pavan Balaji. OpenSHMEM over MPI as a Performance Contender: Thorough Analysis and Optimizations. In OpenSHMEM and Related Technologies Workshop 2021, September 2021. [PDF]
  • Min Si, Yutaka Ishikawa, and Masamichi Takagi. Direct MPI Library for Intel Xeon Phi Co-Processors. In 2013 IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum (IPDPSW), pages 816–824, May 2013. [PDF]
  • Min Si and Yutaka Ishikawa. Design of Direct Communication Facility for Many-Core Based Accelerators. In 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops and PhD Forum (IPDPSW), pages 924–929, May 2012. [PDF]