Li-Wen Chang


A tiny Research Scientist and Engineering Manager
at ByteDance

About Me

I am currently a Senior Staff-level Research Scientist and Engineering Manager at ByteDance AML, working on GenAI, distributed training, AI infrastructure, model compilation, libraries, PyTorch, TensorFlow, ONNX, and NPU ASICs. I have managed and led several teams, and built veScale (a PyTorch-native LLM training framework), ByteIR (a model compilation solution for various hardware), and ByteMLPerf (an AI accelerator benchmarking tool). My CV is available upon request by email (dddscy AT gmail DOT com).

Previously, I was a Senior Software Engineer at Microsoft Cloud & AI, working on AI infrastructure, model compilation, libraries, PyTorch, ONNX, and NPUs.

I pursued my Ph.D. in Professor Wen-mei Hwu's IMPACT group, where I worked on a performance-portable high-level language called TANGRAM (Transformation-, Architecture-, Network-, Granularity-, Runtime-aware Adaptive Machine). It is designed to achieve high performance across CPUs, GPUs, FPGAs, and distributed systems from a single source code.

News: I gave a talk on our work at ByteDance at the C4ML workshop at CGO 2023.
News: Our paper on a high-performance BLAS library for deep learning has been released on arXiv. It can deliver 1.4x speedups on average over MKL.
News: Our CPU-FPGA OpenCL high-level synthesis paper has been accepted at ICPE 2019.

Research/Work Interests


Parallel computing

Heterogeneous computing

Compiler optimization

*High-performance computing

*Especially for applications in computer vision, sparse linear algebra, deep learning, or graph processing

Biography


Li-Wen Chang has been a Research Scientist and Engineering Manager at ByteDance AML since June 2021, working on AI infrastructure, model compilation, libraries, TensorFlow, PyTorch, and NPU ASICs. Previously, he was a Senior Software Engineer at Microsoft Cloud & AI, working on AI infrastructure, model compilation, libraries, and ONNX.

Li-Wen received his B.S. degree in Electrical Engineering from National Taiwan University (NTU) in 2007. While in high school, he received a Gold Medal at the 13th Asian Pacific Mathematics Olympiad (APMO) in 2001. He also received multiple honors as an undergraduate. During his undergraduate studies, he investigated the rolling shutter effect of CMOS sensors, provided its first numerical analysis, and proposed an efficient algorithm to compensate for it. The result was published in IEEE Transactions on Image Processing, 2008 (doi).

After his undergraduate studies, Li-Wen joined a startup team in an ultrasonic imaging lab at NTU to build pioneering prototypes of high-frequency ultrasonic imaging machines, which can provide real-time, non-invasive imaging with microscopic resolution for biomedical research. One of the prototypes was used for preclinical tumor research at National Taiwan University Hospital (NTUH) and became a commercial product. The startup was sold and merged into Coretronic Corp. in 2009 and then renamed S-Sharp.

After the startup, Li-Wen joined UIUC under the supervision of Professor Wen-mei W. Hwu. During his Master's studies, he proposed the first parallel tridiagonal solver with pivoting for GPUs. The result was published in SC'12 (doi) and has been included as gtsv in NVIDIA cuSPARSE 5.5 and later. He is also a contributor to a well-known GPU benchmark suite, Parboil, and a somewhat useful collaborative computing benchmark suite, Chai. During his Ph.D., he designed a toolchain for achieving performance portability across CPUs, GPUs, FPGAs, and distributed systems. He earned his Master's degree in Aug. 2014 and his Ph.D. in Aug. 2017.

Let's Get In Touch!