Li-Wen Chang


A tiny Research Scientist and Engineering Manager
at ByteDance

About Me

I am currently a Senior Staff-level Research Scientist and Engineering Manager at ByteDance Seed/AML, working on GenAI, distributed training, AI infrastructure, model compilation, libraries, PyTorch, OpenAI Triton, TensorFlow, ONNX, and NPU ASICs. I have managed and led several teams and built ShadowKV (high-throughput long-context LLM inference), veScale (a PyTorch-native LLM training framework), Flux (a fast communication-overlapping library for tensor parallelism on GPUs), ByteIR (a model compilation solution for various hardware), and ByteMLPerf (an AI accelerator benchmarking tool). My CV is available upon request by email (dddscy AT gmail DOT com).

Previously, I was a Senior Software Engineer at Microsoft Cloud & AI, working on AI infrastructure, model compilation, libraries, PyTorch, ONNX, and NPUs.

I pursued my Ph.D. in Professor Wen-mei Hwu's IMPACT group, where I worked on a performance-portable high-level language called TANGRAM (Transformation-, Architecture-, Network-, Granularity-, Runtime-aware Adaptive Machine). It is designed to achieve high performance across CPUs, GPUs, FPGAs, and distributed systems from a single source code.

News: Our high-throughput long-context LLM inference paper, ShadowKV, has been released on arXiv.
News: Our high-performance communication-overlapping paper, Flux, has been released on arXiv.

Research/Work Interest


GenAI

MLSys

Heterogeneous Computing

Compiler Optimization

Biography


Li-Wen Chang has been a Research Scientist and Engineering Manager at ByteDance AML since June 2021, working on AI infrastructure, model compilation, libraries, TensorFlow, PyTorch, and NPU ASICs. Previously, he was a Senior Software Engineer at Microsoft Cloud & AI, working on AI infrastructure, model compilation, libraries, and ONNX.

Li-Wen received his B.S. degree in Electrical Engineering from National Taiwan University (NTU) in 2007. While in high school, he received a Gold Medal at the 13th Asian Pacific Mathematics Olympiad (APMO) in 2001. He also received multiple honors as an undergraduate. During his undergraduate studies, he investigated the rolling shutter effect of CMOS sensors, gave its first numerical analysis, and proposed an efficient algorithm to compensate for it. The result was published in IEEE Transactions on Image Processing, 2008 (doi).

After his undergraduate studies, Li-Wen joined a startup team in an ultrasonic imaging lab at NTU to build pioneering prototypes of high-frequency ultrasonic imaging machines, which can provide real-time non-invasive imaging with microscopic resolution for biomedical research. One of the prototypes was used for preclinical tumor research at National Taiwan University Hospital (NTUH) and became a commercial product. The startup was sold and merged into Coretronic Corp in 2009 and then renamed S-Sharp.

After the startup, Li-Wen joined UIUC under the supervision of Professor Wen-mei W. Hwu. During his Master's studies, he proposed the first parallel tridiagonal solver with pivoting for GPUs. The result was published in SC'12 (doi) and is included as gtsv in NVIDIA cuSPARSE 5.5 and later. He is also a contributor to a well-known GPU benchmark suite, Parboil, and a somewhat useful collaborative computing benchmark suite, Chai. During his Ph.D., he designed a toolchain for achieving performance portability across CPUs, GPUs, FPGAs, and distributed systems. He earned his Master's degree in Aug. 2014 and his Ph.D. in Aug. 2017.

Let's Get In Touch!