Li-Wen Chang


A tiny Research Scientist and Engineering Director
at ByteDance

Portrait of Li-Wen Chang

About Me

I am a Principal Research Scientist and Engineering Director at ByteDance Seed, working on GenAI systems, distributed training, AI infrastructure, model compilation, and performance-critical libraries across GPUs and AI ASICs. I have led teams and built Triton-distributed (distributed Triton for parallel systems), ShadowKV (high-throughput long-context LLM inference), veScale (a PyTorch-native LLM training framework), Flux (communication-overlapping tensor-parallel primitives), ByteIR (model compilation across heterogeneous hardware), and ByteMLPerf (AI accelerator benchmarking). My CV is available upon request via email (dddscy AT gmail DOT com).

Previously, I was a Senior Software Engineer at Microsoft Cloud & AI, focusing on AI infrastructure, model compilation, and ONNX Runtime deployment across CPUs, GPUs, and AI ASICs.

I completed my Ph.D. under Professor Wen-mei Hwu in the IMPACT group, where I developed TANGRAM (Transformation-, Architecture-, Network-, Granularity-, Runtime-aware Adaptive Machine), a performance-portable high-level language designed to deliver strong performance across CPUs, GPUs, FPGAs, and distributed systems from a single source codebase.

News: Our performance estimation paper has been accepted to MLSys 2026.
News: Our programming language and compiler paper (Forge) has been accepted to ICLR 2026.
News: Our inference paper (ShadowKV) has been accepted to ICML 2025 as a spotlight.

Research/Work Interests


GenAI

MLSys

Heterogeneous Computing

Compiler Optimization

Biography


Li-Wen Chang is a Research Scientist and Engineering Director at ByteDance Seed, working on AI infrastructure, model compilation, and large-scale training and inference systems. His work spans TensorFlow, PyTorch, and AI ASICs, and he has been at ByteDance since June 2021. Previously, he was a Senior Software Engineer at Microsoft Cloud & AI, focusing on AI infrastructure, model compilation, and ONNX Runtime.

Li-Wen received his B.S. degree in Electrical Engineering from National Taiwan University (NTU) in 2007. While in high school, he won a Gold Medal at the 13th Asian Pacific Mathematics Olympiad (APMO) in 2001, and he received multiple honors as an undergraduate. During his undergraduate studies, he investigated the rolling shutter effect of CMOS image sensors, gave its first numerical analysis, and proposed an efficient algorithm to compensate for it. The result was published in IEEE Transactions on Image Processing, 2008 (doi).

After graduating, Li-Wen joined a startup team in an ultrasonic imaging lab at NTU to build pioneering prototypes of high-frequency ultrasonic imaging machines, which provide real-time, non-invasive imaging at microscopic resolution for biomedical research. One of the prototypes was used for preclinical tumor research at National Taiwan University Hospital (NTUH) and became a commercial product. The startup was sold to and merged into Coretronic Corp in 2009 and then renamed S-Sharp.

After the startup, Li-Wen joined UIUC under the supervision of Professor Wen-mei W. Hwu. During his master's studies, he proposed the first parallel tridiagonal solver with pivoting for GPUs. The result was published in SC'12 (doi) and is included as gtsv in NVIDIA cuSPARSE 5.5 and later. He is also a contributor to Parboil, a well-known GPU benchmark suite, and to Chai, a somewhat useful collaborative computing benchmark suite. During his Ph.D., he designed a tool chain for achieving performance portability across CPUs, GPUs, FPGAs, and distributed systems. He earned his master's degree in Aug. 2014 and his Ph.D. in Aug. 2017.

Let's Get In Touch!