Proceedings
Collaborative Computing on Heterogeneous CPU-FPGA Architectures Using OpenCL
Sitao Huang, Li-Wen Chang, Izzat El Hajj, Simon Garcia de Gonzalo, Juan Gómez Luna, Sai Rahul Chalamalasetti, Mohamed El-Hadedy, Dejan Milojicic, Onur Mutlu, Deming Chen and Wen-Mei Hwu
ACM/SPEC International Conference on Performance Engineering, 2019 (ICPE 2019)
Accelerating Recurrent Neural Networks through Compiler Techniques and Quantization
Li-Wen Chang, Yang Chen, Wenlei Bao, Amit Agarwal, Eldar Akchurin, Ke Deng and Emad Barsoum
Workshop on Systems for ML at NeurIPS, 2018
Collaborative Computing for Heterogeneous Integrated Systems (doi)
Li-Wen Chang, Juan Gómez-Luna, Izzat El Hajj, Sitao Huang, Deming Chen and Wen-mei W. Hwu
ACM/SPEC International Conference on Performance Engineering, 2017 (ICPE 2017) (conference h5-index = 21)
Chai: Collaborative Heterogeneous Applications for Integrated-architectures
Juan Gómez-Luna, Izzat El Hajj, Li-Wen Chang, Victor Garcia-Flores, Simon Garcia de Gonzalo, Thomas B. Jablin, Antonio J. Peña and Wen-Mei Hwu
IEEE International Symposium on Performance Analysis of Systems and Software, 2017 (ISPASS 2017), to appear (conference h5-index = 24, acceptance rate: 24/81 = 29.6%)
Efficient Kernel Synthesis for Performance Portable Programming (doi)
Li-Wen Chang, Izzat El Hajj, Christopher Rodrigues, Juan Gómez-Luna and Wen-mei W. Hwu
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016 (MICRO-49) (conference h5-index = 39, acceptance rate: 61/283 = 21.6%)
KLAP: Kernel Launch Aggregation and Promotion for Optimizing Dynamic Parallelism (doi)
Izzat El Hajj, Juan Gómez-Luna, Cheng Li, Li-Wen Chang, Dejan Milojicic and Wen-mei W. Hwu
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016 (MICRO-49) (conference h5-index = 39, acceptance rate: 61/283 = 21.6%)
DySel: Lightweight Dynamic Selection for Kernel-based Data-parallel Programming Model (doi)
Li-Wen Chang, Hee-Seok Kim, and Wen-mei W. Hwu
Proceedings of the 21st ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2016) (conference h5-index = 51, acceptance rate: 53/232 = 22.8%)
A Programming System for Future Proofing Performance Critical Libraries (doi)
Li-Wen Chang, Izzat El Hajj, Hee-Seok Kim, Juan Gómez-Luna, Abdul Dakkak and Wen-mei W. Hwu
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2016) (conference h5-index = 34)
In-Place Data Sliding Algorithms for Many-Core Architectures (doi)
Juan Gómez Luna, Li-Wen Chang, I-Jui Sung, Nicolás Guil Mata and Wen-Mei Hwu
International Conference on Parallel Processing (ICPP), 2015 (conference h5-index = 22, acceptance rate: 99/305 = 32.5%)
Adaptive Cache Management for Energy-efficient GPU Computing (doi)
X. Chen, L.-W. Chang, C. I. Rodrigues, J. Lv, Z. Wang, and W.-m. W. Hwu
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014 (MICRO-47) (conference h5-index = 40, acceptance rate: 53/273 = 19.4%)
A Scalable, Numerically Stable, High-performance Tridiagonal Solver using GPUs (doi, code)
Li-Wen Chang, John A. Stratton, Hee-Seok Kim, and Wen-mei W. Hwu
The International Conference for High Performance Computing, Networking, Storage and Analysis, 2012 (SC 2012) (conference h5-index = 46, acceptance rate: 100/472 = 21.2%)
Optimization and Architecture Effects on GPU Computing Workload Performance (doi)
J. A. Stratton, N. Anssari, C. Rodrigues, I.-J. Sung, N. Obeid, L.-W. Chang, G. Liu, W.-m. Hwu
Innovative Parallel Computing, 2012 (Voted #2 Best Paper Finalist)
A Tiling-Scheme Viterbi Decoder in Software-Defined Radio for GPUs (doi)
C.-S. Lin, W.-L. Liu, W.-T. Yeh, L.-W. Chang, W.-M. Hwu, S.-J. Chen, and P.-A. Hsiung
7th International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1-4, 2011 (conference h5-index = 17)
A Scalable Tridiagonal Solver for GPUs (doi)
Hee-Seok Kim, Shengzhao Wu, Li-Wen Chang, Wen-mei W. Hwu
International Conference on Parallel Processing (ICPP), pp. 444-453, 2011 (conference h5-index = 22, acceptance rate: 81/363 = 22.3%)
Parallel Implementation of Multi-Dimensional Ensemble Empirical Mode Decomposition (doi, code)
L.-W. Chang, M.-T. Lo, N. Anssari, K.-H. Hsu, N. Huang, W.-m. W. Hwu
International Conference on Acoustics, Speech, and Signal Processing, 2011 (ICASSP 2011) (conference h5-index = 47)
GPU-Based Color Doppler Ultrasound Processing (doi)
L.-W. Chang, K.-H. Hsu, P.-C. Li
International Ultrasonics Symposium (IUS), 2009 (conference h5-index = 14)
Depth Detection of Light Field (doi)
Yi-Hao Kao, Chia-Kai Liang, Li-Wen Chang, Homer H. Chen
ICASSP 2007 (conference h5-index = 47)
Technical Report
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference (arXiv)
Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, and Beidi Chen
arXiv.org, 2024
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion (arXiv)
Li-Wen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Ningxin Zheng, Yinmin Zhong, Xuanrun Zhang, Zuquan Song, Chengji Yao, Ziheng Jiang, Haibin Lin, Xin Jin, and Xin Liu
arXiv.org, 2024
NGEMM: Optimizing GEMM for Deep Learning via Compiler-based Techniques (arXiv)
Wenlei Bao, Li-Wen Chang, Yang Chen, Ke Deng, Amit Agarwal, Emad Barsoum, and Abe Taha
arXiv.org, 2019
Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing (pdf, code)
John A. Stratton, Christopher Rodrigues, I-Jui Sung, Nady Obeid, Li-Wen Chang, Nasser Anssari, Geng Daniel Liu, Wen-mei W. Hwu
IMPACT Technical Report, University of Illinois at Urbana-Champaign, 2012
Invited Talk
Open Source Adoption Lessons and Improvements for ML Compiler in Production
The 4th C4ML Workshop at CGO 2023
High-performance Linear Recurrence, and Its Applications
The 1st International Workshop on Computational Science and Engineering, 2013
A Scalable, Numerically Stable, High-performance Tridiagonal Solver for GPUs
GPU Technology Conference (GTC), 2013
A Scalable Tridiagonal Solver for GPUs
Private talk, INRIA, 2011
Parallel Empirical Mode Decomposition for GPUs
The HHT'3 workshop tutorial, 2011
Journal
In-Place Matrix Transposition on GPUs (doi)
J. Gómez-Luna, I.-J. Sung, L.-W. Chang, J. M. González-Linares, N. Guil and W.-m. Hwu
IEEE TPDS, 27(3), Mar. 2015 (journal h5-index = 76, Impact Factor = 2.173)
Algorithm and Data Optimization Techniques for Scaling to Massively Threaded Systems (doi)
J. A. Stratton, C. Rodrigues, I.-J. Sung, L.-W. Chang, N. Anssari, G. D. Liu, W.-m. W. Hwu, N. Obeid
IEEE Computer, 45(8), Aug. 2012 (Impact Factor = 1.438)
Graphics Processing Unit-Based High-Frame-Rate Color Doppler Ultrasound Processing (doi)
Li-Wen Chang, Ke-Hsin Hsu, Pai-Chi Li
IEEE TUFFC, 56(9), Sept. 2009 (journal h5-index = 35, Impact Factor = 1.503)
Analysis and Compensation of Rolling Shutter Effect (doi)
Chia-Kai Liang, Li-Wen Chang, Homer H. Chen
IEEE TIP, 17(8), Aug. 2008 (journal h5-index = 75, Impact Factor = 3.111)
Book Chapter
Parallel Patterns: Prefix Sum
Li-Wen Chang, Juan Gómez-Luna, David B. Kirk, and Wen-mei W. Hwu
Programming Massively Parallel Processors: A Hands-on Approach, Ch. 8, 2016
Parallel Patterns: Merge Sort
Li-Wen Chang, Jie Lv, David B. Kirk, and Wen-mei W. Hwu
Programming Massively Parallel Processors: A Hands-on Approach, Ch. 11, 2016
A Guide for Implementing Tridiagonal Solvers on GPUs (doi)
Li-Wen Chang and Wen-mei W. Hwu
Numerical Computations with GPUs, Ch. 2, 2014
Thesis
Toward Performance Portability for CPUs and GPUs through Algorithmic Compositions
Ph.D. Dissertation, ECE UIUC, 2017
Scalable Parallel Tridiagonal Algorithms with Diagonal Pivoting and Their Optimization for Many-core Architectures (pdf)
MS Thesis, ECE UIUC, 2014