Experience


A tiny Research Scientist and Engineering Manager
at ByteDance

Work Experience


Feb. 2025-Present

Principal-level Research Scientist and Engineering Manager (Director)
- Led strategic research and development efforts, focusing on AI compiler optimizations, model deployment, and distributed training and inference acceleration.
- Directed high-impact projects in MLIR-based compilers, GPU and NPU acceleration, and large-scale model training and inference.
- Mentored innovative projects in programming languages, computer architecture, and acceleration domains.

ByteDance Seed


May 2023-Feb. 2025

Senior Staff-level Research Scientist and Engineering Manager
- Led a model compilation team for an MLIR-based AI compiler targeting CPUs, GPUs, edge devices, NPUs, and distributed systems. Spearheaded SW-HW co-design for multiple generations of in-house NPUs and DPUs. Directed the ByteIR project, collaborating with OpenXLA, Torch-MLIR, and ONNX-MLIR.
- Managed a production team deploying thousands of models across tens of thousands of in-house NPUs.
- Managed a heterogeneous computing team optimizing training, inference, and deployment across commercial DLAs. Oversaw performance benchmarking through the ByteMLPerf project.
- Directed GPU acceleration initiatives, including the FLUX project and the AMD GPU acceleration effort.
- Led two Triton projects for in-house NPUs and distributed systems.
- Supervised distributed LLM training and inference acceleration projects, including the training acceleration framework, veScale, and ShadowKV.
- Led a large-scale model performance estimation initiative.
- Mentored innovative projects in programming languages, computer architecture, and acceleration domains.

ByteDance Seed/AML


Jun. 2021-May 2023

Staff-level Research Scientist
- Initiated and led a model compilation team for an MLIR-based AI compiler for CPUs, GPUs, and NPUs, along with SW-HW co-design for NPUs. Founded and directed the ByteIR project.
- Led a production team deploying numerous production models on in-house NPUs.

ByteDance AML


Sept. 2018-Jun. 2021

Senior Software Engineer
- Led an AI compilation team designing an LLVM/MLIR-based compiler for Maia 100, GPUs, and CPUs with ONNX Runtime. Led the Argo and Nuphar projects. Achieved up to 3x cost reduction and 10x acceleration in model release cycles for Azure Cognitive Services.
- Worked on performance optimizations and a programming model for Maia 100.
- Co-designed pipeline parallelism on ONNX Runtime training.

Microsoft Cloud & AI


July 2017-Aug. 2018

Software Engineer
- Initiated and co-led an AI compiler and runtime for CNTK (Microsoft Cognitive Toolkit) models, achieving over 10x speedups compared to native CNTK.

Microsoft AI & Research


Fall 2009-Summer 2017

Research Assistant
- Conducted research in high-performance GPU computing, compiler optimization, and computer architecture.
- Developed a high-level, performance-portable programming system for CPUs, GPUs, FPGAs, and clusters, incorporating a novel language, compiler, and runtime.
- Contributed to widely used benchmark suites for hardware characterization and optimization studies, including Parboil, SPEC ACCEL, and Chai for heterogeneous computing systems.
- Developed GPU optimization strategies for diverse computation patterns, including dynamic parallelism, data transformations (padding, transposition, casting, packing), reductions, and dense/sparse BLAS. Contributed to multiple GPU libraries, including the first GPU pivoting tridiagonal solver for NVIDIA cuSPARSE and multi-dimensional Empirical Mode Decomposition for signal processing.
- Analyzed cache sensitivity of GPU applications and implemented cache protection, cache bypassing, and thread throttling to enhance throughput.

IMPACT Lab., UIUC,
supervised by Dr. Wen-mei W. Hwu


Summer 2012

Intern
- Implemented real-time image inpainting optimizations for Tegra 3, enhancing graphical performance.

NVIDIA


Feb. 2008-July 2009

Full-time Research Assistant and Engineer at a Stealth-Mode Startup
- Designed and developed an embedded heterogeneous system for high-frequency real-time ultrasonic imaging using FPGA and GPU. The system was successfully commercialized by a startup.

Ultrasonic Imaging Lab., NTU,
supervised by Dr. Pai-Chi Li


Sept. 2004-Jun. 2006

Undergrad. Research Assistant
- Conducted pioneering research on the rolling shutter effect in CMOS cameras, developing analysis and compensation techniques.
- Explored light-field camera design and investigated its visual effects.

MPAC Lab at NTU,
supervised by Dr. Homer H. Chen


Education


Jul. 2017

Ph.D. in Electrical and Computer Engineering (ECE)
Under Dr. Wen-mei Hwu's supervision.

University of Illinois at Urbana-Champaign (UIUC), IL


Aug. 2014

Master's in ECE
Under Dr. Wen-mei Hwu's supervision.

University of Illinois at Urbana-Champaign (UIUC), IL


Jun. 2007

BS in Electrical Engineering (EE), with a minor in Mathematics

National Taiwan University (NTU), Taipei, Taiwan


Aug. 2006-May 2007

Visiting student in ECE

UIUC


Honors & Awards


2012-2013

Dan Vivoli Endowed Fellowship

ECE, UIUC


2009-2011

Integrative Graduate Education and Research Traineeship (IGERT): Neuroengineering

NSF, USA


2006-2007

Taiwan Merit Scholarship

Taiwan


2005

Undergraduate Student Research Fellowship

National Science Council, Taiwan


2005

Pan Wen-Yuan Scholarship

Pan Wen-Yuan Foundation, Taiwan


3 times

Presidential Awards

NTU, Taiwan


2001

Gold medal in Asian Pacific Mathematics Olympiad (APMO)

