About Me
I am currently a Principal-level Research Scientist and Engineering Director at ByteDance Seed, working on GenAI, distributed training, AI infrastructure, model compilation, libraries, PyTorch, OpenAI Triton, TensorFlow, ONNX, and NPU ASICs. I have managed and led several teams, and built Triton-distributed (distributed Triton for parallel systems), ShadowKV (high-throughput long-context LLM inference), veScale (a PyTorch-native LLM training framework), Flux (a fast communication-overlapping library for tensor parallelism on GPUs), ByteIR (a model compilation solution for various hardware), and ByteMLPerf (an AI accelerator benchmarking tool). My CV is available upon request via email (dddscy AT gmail DOT com).
Previously, I was a Senior Software Engineer at Microsoft Cloud & AI, working on AI infrastructure, model compilation, libraries, PyTorch, ONNX, and NPUs.
I pursued my Ph.D. in Professor Wen-mei Hwu's IMPACT group, where I worked on a performance-portable high-level language called TANGRAM (Transformation-, Architecture-, Network-, Granularity-, Runtime-aware Adaptive Machine). TANGRAM is designed to achieve high performance across CPUs, GPUs, FPGAs, and distributed systems from a single source code.
News: We open-sourced Triton-distributed.
News: We have one inference paper (MegaScale-Infer) accepted to OSDI 2025.
News: We have one quantum computing paper accepted to ISCA 2025.
News: We have two communication-overlap papers (Comet and TileLink) accepted to MLSys 2025.
News: Our high-throughput long-context LLM inference paper, ShadowKV, is now available on arXiv.