Xiaodong Yu (xyu38)

Assistant Professor

Charles V. Schaefer, Jr. School of Engineering and Science

Department of Computer Science

Education

  • PhD (2019) Virginia Tech (Computer Science)

Research

High-Performance Computing, Large-scale Deep Learning, System Security

Institutional Service

  • CS Tenure-Track Faculty Search Committee Member

Professional Service

  • The 10th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD 2024) in conjunction with ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC) Technical Program Committee Member
  • IEEE Transactions on Parallel and Distributed Systems (TPDS) Reviewer
  • IEEE Transactions on Parallel and Distributed Systems (TPDS) Review Board Member
  • The Fourth International Workshop on Big Data Reduction (IWBDR-4) in conjunction with 2023 IEEE International Conference on Big Data (IEEE BigData) Technical Program Committee Member
  • Future Generation Computer Systems (FGCS) - Elsevier Reviewer
  • The First Workshop on Software and Hardware Co-Design of Deep Learning Systems in Accelerators (SHDA 2023) in conjunction with ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC) Technical Program Committee Member
  • IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Finance Chair

Appointments

Assistant Professor, Department of Computer Science, Stevens Institute of Technology, 2023 - present
Assistant Computer Scientist, Math and Computer Science Division, Argonne National Laboratory, 2019 - 2023

Professional Societies

  • ASEE – American Society for Engineering Education Member
  • ACM – Association for Computing Machinery Member
  • IEEE – Institute of Electrical and Electronics Engineers Member

Selected Publications

Conference Proceeding

  1. Shah, M.; Yu, X.; Di, S.; Becchi, M.; Cappello, F. (2024). A Portable, Fast, DCT-based Compressor for AI Accelerators. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2024, Pisa, Italy, June 3-7, 2024 (pp. 109--121). ACM.
    https://doi.org/10.1145/3625549.3658662.
  2. Huang, J.; Di, S.; Yu, X.; Zhai, Y.; Zhang, Z.; Liu, J.; Lu, X.; Raffenetti, K.; Zhou, H.; Zhao, K.; Chen, Z.; Cappello, F.; Guo, Y.; Thakur, R. (2024). An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression. IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024, San Francisco, CA, USA, May 27-31, 2024 (pp. 752--764). IEEE.
    https://doi.org/10.1109/IPDPS57955.2024.00072.
  3. Xie, Z.; Emani, M.; Yu, X.; Tao, D.; He, X.; Su, P.; Zhou, K.; Vishwanath, V. (2024). Centimani: Enabling Fast AI Accelerator Selection for DNN Training with a Novel Performance Predictor. Proceedings of the 2024 USENIX Annual Technical Conference, USENIX ATC 2024, Santa Clara, CA, USA, July 10-12, 2024 (pp. 1203--1221). USENIX Association.
    https://www.usenix.org/conference/atc24/presentation/xie.
  4. Song, S.; Huang, Y.; Jiang, P.; Yu, X.; Zheng, W.; Di, S.; Cao, Q.; Feng, Y.; Xie, Z.; Cappello, F. (2024). CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2024, Pisa, Italy, June 3-7, 2024 (pp. 309--321). ACM.
    https://doi.org/10.1145/3625549.3658691.
  5. Huang, J.; Di, S.; Yu, X.; Zhai, Y.; Liu, J.; Huang, Y.; Raffenetti, K.; Zhou, H.; Zhao, K.; Lu, X.; Chen, Z.; Cappello, F.; Guo, Y.; Thakur, R. (2024). gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters. Proceedings of the 38th ACM International Conference on Supercomputing, ICS 2024, Kyoto, Japan, June 4-7, 2024 (pp. 437--448). ACM.
    https://doi.org/10.1145/3650200.3656636.
  6. Huang, J.; Di, S.; Yu, X.; Zhai, Y.; Liu, J.; Huang, Y.; Raffenetti, K.; Zhou, H.; Zhao, K.; Chen, Z.; Cappello, F.; Guo, Y.; Thakur, R. (2024). POSTER: Optimizing Collective Communications with Error-bounded Lossy Compression for GPU Clusters. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2024, Edinburgh, United Kingdom, March 2-6, 2024 (pp. 454--456). ACM.
    https://doi.org/10.1145/3627535.3638467.
  7. Zhang, C.; Sun, B.; Yu, X.; Xie, Z.; Zheng, W.; Iskra, K. A.; Beckman, P.; Tao, D. (2023). Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, SC-W 2023, Denver, CO, USA, November 12-17, 2023 (pp. 1757--1766). ACM.
    https://doi.org/10.1145/3624062.3624257.
  8. Huang, Y.; Di, S.; Yu, X.; Li, G.; Cappello, F. (2023). cuSZp: An Ultra-fast GPU Error-bounded Lossy Compression Framework with Optimized End-to-End Performance. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023, Denver, CO, USA, November 12-17, 2023 (pp. 43:1--43:13). ACM.
    https://doi.org/10.1145/3581784.3607048.
  9. Zhang, B.; Tian, J.; Di, S.; Yu, X.; Feng, Y.; Liang, X.; Tao, D.; Cappello, F. (2023). FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2023, Orlando, FL, USA, June 16-23, 2023 (pp. 129--142). ACM.
    https://doi.org/10.1145/3588195.3592994.
  10. Shah, M.; Yu, X.; Di, S.; Lykov, D.; Alexeev, Y.; Becchi, M.; Cappello, F. (2023). GPU-Accelerated Error-Bounded Compression Framework for Quantum Circuit Simulations. IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023, St. Petersburg, FL, USA, May 15-19, 2023 (pp. 757--767). IEEE.
    https://doi.org/10.1109/IPDPS54959.2023.00081.
  11. Zhang, B.; Tian, J.; Di, S.; Yu, X.; Swany, M.; Tao, D.; Cappello, F. (2023). GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs. Proceedings of the 37th International Conference on Supercomputing, ICS 2023, Orlando, FL, USA, June 21-23, 2023 (pp. 348--359). ACM.
    https://doi.org/10.1145/3577193.3593706.
  12. Zhang, C.; Smith, S.; Sun, B.; Tian, J.; Soifer, J.; Yu, X.; Song, S. L.; He, Y.; Tao, D. (2023). HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs. Proceedings of the 37th International Conference on Supercomputing, ICS 2023, Orlando, FL, USA, June 21-23, 2023 (pp. 324--335). ACM.
    https://doi.org/10.1145/3577193.3593717.
  13. Shah, M.; Yu, X.; Di, S.; Becchi, M.; Cappello, F. (2023). Lightweight Huffman Coding for Efficient GPU Compression. Proceedings of the 37th International Conference on Supercomputing, ICS 2023, Orlando, FL, USA, June 21-23, 2023 (pp. 99--110). ACM.
    https://doi.org/10.1145/3577193.3593736.
  14. Rivera, C.; Di, S.; Tian, J.; Yu, X.; Tao, D.; Cappello, F. (2022). Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs. 2022 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2022, Lyon, France, May 30 - June 3, 2022 (pp. 717--727). IEEE.
    https://doi.org/10.1109/IPDPS53621.2022.00075.
  15. Yu, X.; Di, S.; Zhao, K.; Tian, J.; Tao, D.; Liang, X.; Cappello, F. (2022). Ultrafast Error-bounded Lossy Compression for Scientific Datasets. HPDC '22: The 31st International Symposium on High-Performance Parallel and Distributed Computing, Minneapolis, MN, USA, 27 June 2022 - 1 July 2022 (pp. 159--171). ACM.
    https://doi.org/10.1145/3502181.3531473.
  16. Yu, X.; Di, S.; Gok, A. M.; Tao, D.; Cappello, F. (2021). cuZ-Checker: A GPU-Based Ultra-Fast Assessment System for Lossy Compressions. IEEE International Conference on Cluster Computing, CLUSTER 2021, Portland, OR, USA, September 7-10, 2021 (pp. 307--319). IEEE.
    https://doi.org/10.1109/Cluster48925.2021.00065.
  17. Bicer, T.; Yu, X.; Ching, D. J.; Chard, R.; Cherukara, M. J.; Nicolae, B.; Kettimuthu, R.; Foster, I. T. (2021). High-Performance Ptychographic Reconstruction with Federated Facilities. Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation - 21st Smoky Mountains Computational Sciences and Engineering, SMC 2021, Virtual Event, October 18-20, 2021, Revised Selected Papers (vol. 1512, pp. 173--189). Springer.
    https://doi.org/10.1007/978-3-030-96498-6_10.
  18. Tian, J.; Di, S.; Yu, X.; Rivera, C.; Zhao, K.; Jin, S.; Feng, Y.; Liang, X.; Tao, D.; Cappello, F. (2021). Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs. IEEE International Conference on Cluster Computing, CLUSTER 2021, Portland, OR, USA, September 7-10, 2021 (pp. 283--293). IEEE.
    https://doi.org/10.1109/Cluster48925.2021.00047.
  19. Yu, X.; Bicer, T.; Kettimuthu, R.; Foster, I. T. (2021). Topology-aware optimizations for multi-GPU ptychographic image reconstruction. ICS '21: 2021 International Conference on Supercomputing, Virtual Event, USA, June 14-17, 2021 (pp. 354--366). ACM.
    https://doi.org/10.1145/3447818.3460380.
  20. Yu, X.; Wei, F.; Ou, X.; Becchi, M.; Bicer, T.; Yao, D. D. (2020). GPU-Based Static Data-Flow Analysis for Fast and Scalable Android App Vetting. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, LA, USA, May 18-22, 2020 (pp. 274--284). IEEE.
    https://doi.org/10.1109/IPDPS47924.2020.00037.
  21. Yu, X.; Xiao, Y.; Cameron, K. W.; Yao, D. D. (2019). Comparative Measurement of Cache Configurations' Impacts on Cache Timing Side-Channel Attacks. 12th USENIX Workshop on Cyber Security Experimentation and Test, CSET 2019, Santa Clara, CA, USA, August 12, 2019. USENIX Association.
    https://www.usenix.org/conference/cset19/presentation/yu.
  22. Lux, T. C.; Watson, L. T.; Chang, T. H.; Bernard, J.; Li, B.; Yu, X.; Xu, L.; Back, G.; Butt, A. R.; Cameron, K. W.; Yao, D.; Hong, Y.; Wong, K.; Shen, C.; Brown, D. (2018). Novel meshes for multivariate interpolation and approximation. Proceedings of the ACMSE 2018 Conference, Richmond, KY, USA, March 29-31, 2018 (pp. 13:1--13:7). ACM.
    https://doi.org/10.1145/3190645.3190687.
  23. Yu, X.; Hou, K.; Wang, H.; Feng, W. (2017). A framework for fast and fair evaluation of automata processing hardware. 2017 IEEE International Symposium on Workload Characterization, IISWC 2017, Seattle, WA, USA, October 1-3, 2017 (pp. 120--121). IEEE Computer Society.
    https://doi.org/10.1109/IISWC.2017.8167767.
  24. Yu, X.; Wang, H.; Feng, W.; Gong, H.; Cao, G. (2017). An Enhanced Image Reconstruction Tool for Computed Tomography on CPUs. Proceedings of the Computing Frontiers Conference, CF'17, Siena, Italy, May 15-17, 2017 (pp. 97--106). ACM.
    https://doi.org/10.1145/3075564.3078889.
  25. Nourian, M.; Wang, X.; Yu, X.; Feng, W.; Becchi, M. (2017). Demystifying automata processing: GPUs, FPGAs or Micron's AP?. Proceedings of the International Conference on Supercomputing, ICS 2017, Chicago, IL, USA, June 14-16, 2017 (pp. 1:1--1:11). ACM.
    https://doi.org/10.1145/3079079.3079100.
  26. Yu, X.; Hou, K.; Wang, H.; Feng, W. (2017). Robotomata: A framework for approximate pattern matching of big data on an automata processor. 2017 IEEE International Conference on Big Data (IEEE BigData 2017), Boston, MA, USA, December 11-14, 2017 (pp. 283--292). IEEE Computer Society.
    https://doi.org/10.1109/BigData.2017.8257936.
  27. Yu, X.; Wang, H.; Feng, W.; Gong, H.; Cao, G. (2016). cuART: Fine-Grained Algebraic Reconstruction Technique for Computed Tomography Images on GPUs. IEEE/ACM 16th International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2016, Cartagena, Colombia, May 16-19, 2016 (pp. 165--168). IEEE Computer Society.
    https://doi.org/10.1109/CCGrid.2016.96.
  28. Yu, X.; Feng, W.; Yao, D. D.; Becchi, M. (2016). O3FA: A Scalable Finite Automata-based Pattern-Matching Engine for Out-of-Order Deep Packet Inspection. Proceedings of the 2016 Symposium on Architectures for Networking and Communications Systems, ANCS 2016, Santa Clara, CA, USA, March 17-18, 2016 (pp. 1--11). ACM.
    https://doi.org/10.1145/2881025.2881034.
  29. Yu, X.; Becchi, M. (2013). Exploring different automata representations for efficient regular expression matching on GPUs. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, Shenzhen, China, February 23-27, 2013 (pp. 287--288). ACM.
    https://doi.org/10.1145/2442516.2442548.
  30. Yu, X.; Becchi, M. (2013). GPU acceleration of regular expression matching for large datasets: exploring the implementation space. Computing Frontiers Conference, CF'13, Ischia, Italy, May 14-16, 2013 (pp. 18:1--18:10). ACM.
    https://doi.org/10.1145/2482767.2482791.

Journal Article

  1. Di, S.; Liu, J.; Zhao, K.; Liang, X.; Underwood, R.; Zhang, Z.; Shah, M.; Huang, Y.; Huang, J.; Yu, X.; Ren, C.; Guo, H.; Wilkins, G.; Tao, D.; Tian, J.; Jin, S.; Jian, Z.; Wang, D.; Rahman, M. H.; Zhang, B.; Calhoun, J. C.; Li, G.; Yoshii, K.; Alharthi, K. A.; Cappello, F. (2024). A Survey on Error-Bounded Lossy Compression for Scientific Datasets. CoRR (vol. abs/2404.02840).
    https://doi.org/10.48550/arXiv.2404.02840.
  2. Huang, J.; Di, S.; Yu, X.; Zhai, Y.; Liu, J.; Raffenetti, K.; Zhou, H.; Zhao, K.; Chen, Z.; Cappello, F.; Guo, Y.; Thakur, R. (2023). C-Coll: Introducing Error-bounded Lossy Compression into MPI Collectives. CoRR (vol. abs/2304.03890).
    https://doi.org/10.48550/arXiv.2304.03890.
  3. Sun, B.; Yu, X.; Zhang, C.; Tian, J.; Jin, S.; Iskra, K.; Zhou, T.; Bicer, T.; Beckman, P.; Tao, D. (2022). SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates. CoRR (vol. abs/2211.00224).
    https://doi.org/10.48550/arXiv.2211.00224.
  4. Bicer, T.; Yu, X.; Ching, D. J.; Chard, R.; Cherukara, M. J.; Nicolae, B.; Kettimuthu, R.; Foster, I. T. (2021). High-Performance Ptychographic Reconstruction with Federated Facilities. CoRR (vol. abs/2111.11330).
    https://arxiv.org/abs/2111.11330.
  5. Yu, X.; Nikitin, V. V.; Ching, D. J.; Aslan, S. S.; Gürsoy, D.; Bicer, T. (2021). Scalable and accurate multi-GPU based image reconstruction of large-scale ptychography data. CoRR (vol. abs/2106.07575).
    https://arxiv.org/abs/2106.07575.
  6. Yu, X.; Wang, H.; Feng, W.; Gong, H.; Cao, G. (2019). GPU-Based Iterative Medical CT Image Reconstructions. J. Signal Process. Syst. (vol. 91, no. 3-4, pp. 321--338).
    https://doi.org/10.1007/s11265-018-1352-0.
  7. Yu, X.; Lin, B.; Becchi, M. (2014). Revisiting State Blow-Up: Automatically Building Augmented-FA While Preserving Functional Equivalence. IEEE J. Sel. Areas Commun. (vol. 32, no. 10, pp. 1822--1833).
    https://doi.org/10.1109/JSAC.2014.2358840.