A tiny Research Scientist and Engineering Manager
at ByteDance
Feb. 2025-Present
Principal-level Research Scientist and Engineering Manager (Director)
- Led strategic research and development efforts, focusing on AI compiler optimizations, model deployment, and distributed training and inference acceleration.
- Directed high-impact projects in MLIR-based compilers, GPU and NPU acceleration, and large-scale model training and inference.
- Mentored innovative projects in programming languages, computer architecture, and acceleration domains.
ByteDance Seed
May 2023-Feb. 2025
Senior Staff-level Research Scientist and Engineering Manager
- Led a model compilation team for an MLIR-based AI compiler targeting CPU, GPU, edge, NPUs, and distributed systems. Spearheaded SW-HW co-design for multiple generations of in-house NPUs and DPUs. Directed the ByteIR project, collaborating with OpenXLA, Torch-MLIR, and ONNX-MLIR.
- Managed a production team deploying thousands of models across tens of thousands of in-house NPUs.
- Managed a heterogeneous computing team optimizing training, inference, and deployment across commercial DLAs. Oversaw performance benchmarking through the ByteMLPerf project.
- Directed GPU acceleration initiatives, including the FLUX project and the AMD GPU acceleration effort.
- Led two Triton projects for in-house NPUs and distributed systems.
- Supervised distributed LLM training and inference acceleration projects, including the training acceleration framework, veScale, and ShadowKV.
- Led a large-scale model performance estimation initiative.
- Mentored innovative projects in programming languages, computer architecture, and acceleration domains.
ByteDance Seed/AML
Jun 2021-May 2023
Staff-level Research Scientist
- Initiated and led a model compilation team for an MLIR-based AI compiler for CPU, GPU, and NPUs, along with SW-HW co-design for NPUs. Founded and directed the ByteIR project.
- Led a production team deploying numerous production models on in-house NPUs.
ByteDance AML
Sept 2018-Jun 2021
Senior Software Engineer
- Led an AI compilation team designing an LLVM/MLIR-based compiler for Maia 100, GPUs, and CPUs with ONNX Runtime. Led the Argo and Nuphar projects. Achieved up to 3x cost reduction and 10x acceleration in model release cycles for Azure Cognitive Services.
- Worked on performance optimizations and a programming model for Maia 100.
- Co-designed pipeline parallelism on ONNX Runtime training.
Microsoft Cloud & AI
July 2017-Aug 2018
Software Engineer
- Co-led and initiated an AI compiler and runtime for CNTK (Microsoft Cognitive Toolkit) models, achieving over 10x speedups compared to native CNTK.
Microsoft AI & Research
Fall 2009-Summer 2017
Research Assistant
- Conducted research in high-performance GPU computing, compiler optimization, and computer architecture.
- Developed a high-level, performance-portable programming system for CPUs, GPUs, FPGAs, and clusters, incorporating a novel language, compiler, and runtime.
- Contributed to widely used benchmark suites for hardware characterization and optimization studies, including Parboil, SPEC ACCEL, and Chai for heterogeneous computing systems.
- Developed GPU optimization strategies for diverse computation patterns, including dynamic parallelism, data transformations (padding, transposition, casting, packing), reductions, and dense/sparse BLAS. Contributed to multiple GPU libraries, including the first GPU pivoting tridiagonal solver for NVIDIA cuSPARSE and multi-dimensional Empirical Mode Decomposition for signal processing.
- Analyzed cache sensitivity of GPU applications and implemented cache protection, cache bypassing, and thread throttling to enhance throughput.
IMPACT Lab., UIUC,
supervised by Dr. Wen-mei W. Hwu
Summer 2012
Intern
- Implemented real-time image inpainting optimizations for Tegra 3, enhancing graphical performance.
NVIDIA
Feb. 2008-July 2009
Full-time Research Assistant and Engineer of a Stealth Mode Startup
- Designed and developed an embedded heterogeneous system for high-frequency real-time ultrasonic imaging using FPGA and GPU. The system was successfully commercialized by a startup.
Ultrasonic Imaging Lab., NTU,
supervised by Dr. Pai-Chi Li
Sept. 2004-Jun. 2006
Undergrad. Research Assistant
- Conducted pioneering research on the rolling shutter effect in CMOS cameras, developing analysis and compensation techniques.
- Explored light-field camera design and investigated its visual effects.
MPAC Lab at NTU,
supervised by Dr. Homer H. Chen
Jul. 2017
Ph.D. in Electrical and Computer Engineering (ECE)
Under Dr. Wen-mei Hwu's supervision.
University of Illinois at Urbana-Champaign (UIUC), IL
Aug. 2014
Master in ECE
Under Dr. Wen-mei Hwu's supervision.
University of Illinois at Urbana-Champaign (UIUC), IL
Jun. 2007
BS in Electrical Engineering (EE), with a Minor degree in Mathematics
National Taiwan University (NTU), Taipei, Taiwan
Aug. 2006-May 2007
Visiting student in ECE
UIUC
2012-2013
Dan Vivoli Endowed Fellowship
ECE, UIUC
2009-2011
Integrative Graduate Education and Research Traineeship (IGERT): Neuroengineering
NSF, USA
2006-2007
Taiwan Merit Scholarship
Taiwan
2005
Undergraduate Student Research Fellowship
National Science Council, Taiwan
2005
Pan Wen-Yuan Scholarship
Pan Wen-Yuan Foundation, Taiwan
3 times
Presidential Awards
NTU, Taiwan
2001
Gold medal in Asian Pacific Mathematics Olympiad (APMO)