
The full publication list is available on my Google Scholar page.

Full Publication

Selected Publications


Qtenon: Towards Low-Latency Architecture Integration for Accelerating Hybrid Quantum-Classical Computing
Chenning Tao, Liqiang Lu, Size Zheng, Li-Wen Chang, Minghua Shen, Hanyu Zhang, Fangxin Liu, Kaiwen Zhou, and Jianwei Yin
International Symposium on Computer Architecture, 2025 (ISCA 2025)

TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives (arXiv)
Size Zheng, Jin Fang, Xuegui Zheng, Qi Hou, Wenlei Bao, Ningxin Zheng, Ziheng Jiang, Dongyang Wang, Jianxi Ye, Haibin Lin, Li-Wen Chang, and Xin Liu
Conference on Machine Learning and Systems, 2025 (MLSys 2025)

Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts (arXiv)
Shulai Zhang, Ningxin Zheng, Haibin Lin, Ziheng Jiang, Wenlei Bao, Chengquan Jiang, Qi Hou, Weihao Cui, Size Zheng, Li-Wen Chang, Quan Chen, and Xin Liu
Conference on Machine Learning and Systems, 2025 (MLSys 2025)

Collaborative Computing on Heterogeneous CPU-FPGA Architectures Using OpenCL
Sitao Huang, Li-Wen Chang, Izzat El Hajj, Simon Garcia de Gonzalo, Juan Gómez Luna, Sai Rahul Chalamalasetti, Mohamed El-Hadedy, Dejan Milojicic, Onur Mutlu, Deming Chen and Wen-Mei Hwu
ACM/SPEC International Conference on Performance Engineering, 2019 (ICPE 2019)

Accelerating Recurrent Neural Networks through Compiler Techniques and Quantization
Li-Wen Chang, Yang Chen, Wenlei Bao, Amit Agarwal, Eldar Akchurin, Ke Deng and Emad Barsoum
Workshop on Systems for ML at NeurIPS, 2018

Collaborative Computing for Heterogeneous Integrated Systems (doi)
Li-Wen Chang, Juan Gómez-Luna, Izzat El Hajj, Sitao Huang, Deming Chen and Wen-mei W. Hwu
ACM/SPEC International Conference on Performance Engineering, 2017 (ICPE 2017) (conference h5-index = 21)

Chai: Collaborative Heterogeneous Applications for Integrated-architectures
Juan Gómez-Luna, Izzat El Hajj, Li-Wen Chang, Victor Garcia-Flores, Simon Garcia de Gonzalo, Thomas B. Jablin, Antonio J. Peña and Wen-Mei Hwu
IEEE International Symposium on Performance Analysis of Systems and Software, 2017 (ISPASS 2017), to appear (conference h5-index = 24, acceptance rate: 24/81 = 29.6%)

Efficient Kernel Synthesis for Performance Portable Programming (doi)
Li-Wen Chang, Izzat El Hajj, Christopher Rodrigues, Juan Gómez-Luna and Wen-mei W. Hwu
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016 (MICRO-49) (conference h5-index = 39, acceptance rate: 61/283 = 21.6%)

KLAP: Kernel Launch Aggregation and Promotion for Optimizing Dynamic Parallelism (doi)
Izzat El Hajj, Juan Gómez-Luna, Cheng Li, Li-Wen Chang, Dejan Milojicic and Wen-mei W. Hwu
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016 (MICRO-49) (conference h5-index = 39, acceptance rate: 61/283 = 21.6%)

DySel: Lightweight Dynamic Selection for Kernel-based Data-parallel Programming Model (doi)
Li-Wen Chang, Hee-Seok Kim, and Wen-mei W. Hwu
Proceedings of the 21st ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2016) (conference h5-index = 51, acceptance rate: 53/232 = 22.8%)

A Programming System for Future Proofing Performance Critical Libraries (doi)
Li-Wen Chang, Izzat El Hajj, Hee-Seok Kim, Juan Gómez-Luna, Abdul Dakkak and Wen-mei W. Hwu
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2016) (conference h5-index = 34)

In-Place Data Sliding Algorithms for Many-Core Architectures (doi)
Juan Gómez Luna, Li-Wen Chang, I-Jui Sung, Nicolás Guil Mata and Wen-Mei Hwu
International Conference on Parallel Processing (ICPP), 2015 (conference h5-index = 22, acceptance rate: 99/305 = 32.5%)

Adaptive Cache Management for Energy-efficient GPU Computing (doi)
X. Chen, L.-W. Chang, C. I. Rodrigues, J. Lv, Z. Wang, and W.-m. W. Hwu
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014 (MICRO-47) (conference h5-index = 40, acceptance rate: 53/273 = 19.4%)

A Scalable, Numerically Stable, High-performance Tridiagonal Solver using GPUs (doi, code)
Li-Wen Chang, John A. Stratton, Hee-Seok Kim, and Wen-mei W. Hwu
The International Conference for High Performance Computing, Networking, Storage and Analysis, 2012 (SC 2012) (conference h5-index = 46, acceptance rate: 100/472 = 21.2%)

Optimization and Architecture Effects on GPU Computing Workload Performance (doi)
J. A. Stratton, N. Anssari, C. Rodrigues, I.-J. Sung, N. Obeid, L.-W. Chang, G. Liu, W.-m. Hwu
Innovative Parallel Computing, 2012 (Voted #2 Best Paper Finalist)

A Tiling-Scheme Viterbi Decoder in Software-Defined Radio for GPUs (doi)
C.-S. Lin, W.-L. Liu, W.-T. Yeh, L.-W. Chang, W.-M. Hwu, S.-J. Chen, and P.-A. Hsiung
The 7th International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1-4, 2011 (conference h5-index = 17)

A Scalable Tridiagonal Solver for GPUs (doi)
Hee-Seok Kim, Shengzhao Wu, Li-Wen Chang, Wen-mei W. Hwu
International Conference on Parallel Processing (ICPP), pp. 444-453, 2011 (conference h5-index = 22, acceptance rate: 81/363 = 22.3%)

Parallel Implementation of Multi-Dimensional Ensemble Empirical Mode Decomposition (doi, code)
L.-W. Chang, M.-T. Lo, N. Anssari, K.-H. Hsu, N. Huang, W.-m. W. Hwu
International Conference on Acoustics, Speech, and Signal Processing, 2011 (ICASSP 2011) (conference h5-index = 47)

GPU-Based Color Doppler Ultrasound Processing (doi)
L.-W. Chang, K.-H. Hsu, P.-C. Li
International Ultrasonics Symposium (IUS), 2009 (conference h5-index = 14)

Depth Detection of Light Field (doi)
Yi-Hao Kao, Chia-Kai Liang, Li-Wen Chang, Homer H. Chen
International Conference on Acoustics, Speech, and Signal Processing, 2007 (ICASSP 2007) (conference h5-index = 47)

Technical Report

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference (arXiv)
Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, and Beidi Chen, 2024

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion (arXiv)
Li-Wen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Ningxin Zheng, Yinmin Zhong, Xuanrun Zhang, Zuquan Song, Chengji Yao, Ziheng Jiang, Haibin Lin, Xin Jin, and Xin Liu, 2024

NGEMM: Optimizing GEMM for Deep Learning via Compiler-based Techniques (arXiv)
Wenlei Bao, Li-Wen Chang, Yang Chen, Ke Deng, Amit Agarwal, Emad Barsoum, and Abe Taha, 2019

Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing (pdf, code)
John A. Stratton, Christopher Rodrigues, I-Jui Sung, Nady Obeid, Li-Wen Chang, Nasser Anssari, Geng Daniel Liu, Wen-mei W. Hwu
IMPACT Technical Report, University of Illinois at Urbana-Champaign, 2012

Invited Talk

Open Source Adoption Lessons and Improvements for ML Compiler in Production
the 4th C4ML workshop, at CGO 2023

High-performance Linear Recurrence, and Its Applications
the 1st International Workshop on Computational Science and Engineering, 2013

A Scalable, Numerically Stable, High-performance Tridiagonal Solver for GPUs
GPU Technology Conference (GTC), 2013

A Scalable Tridiagonal Solver for GPUs
Private talk, INRIA, 2011

Parallel Empirical Mode Decomposition for GPUs
The HHT'3 workshop tutorial, 2011

Journal Article


In-Place Matrix Transposition on GPUs (doi)
J. Gómez-Luna, I.-J. Sung, L.-W. Chang, J. M. González-Linares, N. Guil and W.-m. Hwu
IEEE TPDS, 27(3), Mar. 2015 (journal h5-index = 76, Impact Factor = 2.173)

Algorithm and Data Optimization Techniques for Scaling to Massively Threaded Systems (doi)
J. A. Stratton, C. Rodrigues, I.-J. Sung, L.-W. Chang, N. Anssari, G. D. Liu, W.-m. W. Hwu, N. Obeid
IEEE Computer, 45(8), Aug. 2012 (Impact Factor = 1.438)

Graphics Processing Unit-Based High-Frame-Rate Color Doppler Ultrasound Processing (doi)
Li-Wen Chang, Ke-Hsin Hsu, Pai-Chi Li
IEEE TUFFC, 56(9), Sept. 2009 (journal h5-index = 35, Impact Factor = 1.503)

Analysis and Compensation of Rolling Shutter Effect (doi)
Chia-Kai Liang, Li-Wen Chang, Homer H. Chen
IEEE TIP, 17(8), Aug. 2008 (journal h5-index = 75, Impact Factor = 3.111)

Book Chapter

Parallel Patterns: Prefix Sum
Li-Wen Chang, Juan Gómez-Luna, David B. Kirk, and Wen-mei W. Hwu
Programming Massively Parallel Processors: A Hands-on Approach, Ch. 8, 2016

Parallel Patterns: Merge Sort
Li-Wen Chang, Jie Lv, David B. Kirk, and Wen-mei W. Hwu
Programming Massively Parallel Processors: A Hands-on Approach, Ch. 11, 2016

A Guide for Implementing Tridiagonal Solvers on GPUs (doi)
Li-Wen Chang and Wen-mei W. Hwu
Numerical Computations with GPUs, Ch. 2, 2014

Thesis


Toward Performance Portability for CPUs and GPUs through Algorithmic Compositions
Ph.D. Dissertation, ECE UIUC, 2017

Scalable Parallel Tridiagonal Algorithms with Diagonal Pivoting and Their Optimization for Many-core Architectures (pdf)
MS Thesis, ECE UIUC, 2014
