Li-Wen's Webpage

Work Experience

Feb. 2025–Present

ByteDance Seed

Principal Research Scientist & Engineering Director

Lead research and engineering strategy for AI compilers, model deployment, and distributed training and inference acceleration.
Direct global model deployment teams across U.S. and international infrastructure, enabling production ML rollouts at a scale of tens of thousands of accelerators.
Drive multiple Triton compiler initiatives (distributed Triton and a confidential AI ASIC extension), spanning GPU/ASIC acceleration, AI code synthesis, and large-scale training and inference systems.
Mentor cross-disciplinary initiatives across programming languages, computer architecture, and system acceleration.

May 2023–Feb. 2025

ByteDance Seed/AML

Senior Staff Research Scientist & Engineering Manager

Led an MLIR-based compiler team targeting CPU, GPU, edge, AI ASICs, and distributed systems; drove SW/HW co-design for multiple generations of in-house AI ASICs and DPUs. Directed ByteIR in collaboration with OpenXLA, Torch-MLIR, and ONNX-MLIR.
Managed production deployment of thousands of models across tens of thousands of in-house AI ASICs.
Led a heterogeneous computing team optimizing training, inference, and deployment across commercial DLAs; oversaw benchmarking via ByteMLPerf.
Directed GPU acceleration initiatives, including FLUX and an AMD GPU acceleration effort.
Led Triton projects for in-house AI ASICs and distributed systems (Triton-distributed).
Supervised distributed LLM training and inference acceleration, including veScale and ShadowKV.
Initiated a large-scale model performance estimation program (Charon).
Mentored projects in programming languages, computer architecture, and acceleration.

Jun. 2021–May 2023

ByteDance AML

Staff Research Scientist

Founded and led a model compilation team for an MLIR-based AI compiler across CPUs, GPUs, and AI ASICs, including SW/HW co-design for AI ASICs; launched ByteIR.
Led a production team deploying numerous models on in-house AI ASICs.

Sept. 2018–Jun. 2021

Microsoft Cloud & AI

Senior Software Engineer

Led an AI compilation team building an LLVM/MLIR-based compiler for Maia 100, GPUs, and CPUs within ONNX Runtime. Led the Argo and Nuphar projects, delivering up to 3x cost reduction and 10x faster model release cycles for Azure Cognitive Services.
Developed performance optimizations and a programming model for Maia 100.
Co-designed pipeline parallelism for ONNX Runtime training.

July 2017–Aug. 2018

Microsoft AI & Research

Software Engineer

Co-led the AI compiler and runtime for CNTK models, achieving 10x+ speedups over native CNTK.

Fall 2009–Summer 2017

IMPACT Lab., UIUC

Supervised by Dr. Wen-mei W. Hwu

Research Assistant

Researched high-performance GPU computing, compiler optimization, and computer architecture.
Built a performance-portable programming system for CPUs, GPUs, FPGAs, and clusters, spanning language, compiler, and runtime design.
Contributed to benchmark suites for hardware characterization and optimization, including Parboil, SPEC ACCEL, and Chai.
Developed GPU optimization techniques for dynamic parallelism, data transformations, reductions, and dense/sparse BLAS; contributed the first GPU pivoting tridiagonal solver for NVIDIA cuSPARSE and multi-dimensional EMD for signal processing.
Analyzed cache sensitivity and implemented cache protection, bypassing, and thread throttling to improve throughput.

Summer 2012

NVIDIA

Intern

Implemented real-time image inpainting optimizations for Tegra 3 to improve graphics performance.

Feb. 2008–July 2009

Ultrasonic Imaging Lab., NTU

Supervised by Dr. Pai-Chi Li

Full-time Research Assistant & Engineer at a Stealth-Mode Startup

Designed and developed an embedded heterogeneous system for high-frequency real-time ultrasonic imaging using FPGA and GPU; the system was commercialized by a startup.

Sept. 2004–Jun. 2006

MPAC Lab., NTU

Supervised by Dr. Homer H. Chen

Undergrad. Research Assistant

Conducted early research on the rolling shutter effect in CMOS cameras, developing analysis and compensation techniques.
Explored light-field camera design and its visual effects.

Honors & Awards

2012–2013

ECE, UIUC

Dan Vivoli Endowed Fellowship

2009–2011

NSF, USA

Integrative Graduate Education and Research Traineeship (IGERT): Neuroengineering

2006–2007

Taiwan

Taiwan Merit Scholarship

2005

National Science Council, Taiwan

Undergraduate Student Research Fellowship

2005

Pan Wen-Yuan Foundation, Taiwan

Pan Wen-Yuan Scholarship

2002–2007, 3 times

NTU, Taiwan

Presidential Awards

2001

APMO

Experience