A tiny Research Scientist and Engineering Director at ByteDance
Work Experience
Feb. 2025–Present
ByteDance Seed
Principal-level Research Scientist & Engineering Director
Lead research and engineering strategy for AI compilers, model deployment, and distributed training and inference acceleration.
Guide MLIR-based compiler initiatives (Triton-distributed), GPU/ASIC acceleration, and large-scale training and inference systems.
Mentor cross-disciplinary projects in programming languages, computer architecture, and system acceleration.
May 2023–Feb. 2025
ByteDance Seed/AML
Senior Staff-level Research Scientist & Engineering Manager
Led an MLIR-based compiler team targeting CPU, GPU, edge, AI ASICs, and distributed systems; drove SW/HW co-design for multiple generations of in-house AI ASICs and DPUs. Directed ByteIR in collaboration with OpenXLA, Torch-MLIR, and ONNX-MLIR.
Managed production deployment of thousands of models across tens of thousands of in-house AI ASICs.
Led a heterogeneous computing team optimizing training, inference, and deployment across commercial DLAs; oversaw benchmarking via ByteMLPerf.
Directed GPU acceleration initiatives, including FLUX and an AMD GPU acceleration effort.
Led Triton projects for in-house AI ASICs and distributed systems (Triton-distributed).
Supervised distributed LLM training and inference acceleration, including veScale and ShadowKV.
Initiated a large-scale model performance estimation program.
Mentored projects in programming languages, computer architecture, and acceleration.
Jun. 2021–May 2023
ByteDance AML
Staff-level Research Scientist
Founded and led a model compilation team for an MLIR-based AI compiler across CPUs, GPUs, and AI ASICs, including SW/HW co-design for AI ASICs; launched ByteIR.
Led a production team deploying numerous models on in-house AI ASICs.
Sept. 2018–Jun. 2021
Microsoft Cloud & AI
Senior Software Engineer
Led an AI compilation team building an LLVM/MLIR-based compiler for Maia 100, GPUs, and CPUs within ONNX Runtime. Led the Argo and Nuphar projects, delivering up to 3x cost reduction and 10x faster model release cycles for Azure Cognitive Services.
Developed performance optimizations and a programming model for Maia 100.
Co-designed pipeline parallelism for ONNX Runtime training.
July 2017–Aug. 2018
Microsoft AI & Research
Software Engineer
Co-led the AI compiler and runtime for CNTK models, achieving 10x+ speedups over native CNTK.
Fall 2009–Summer 2017
IMPACT Lab., UIUC
Supervised by Dr. Wen-mei W. Hwu
Research Assistant
Researched high-performance GPU computing, compiler optimization, and computer architecture.
Built a performance-portable programming system for CPUs, GPUs, FPGAs, and clusters, spanning language, compiler, and runtime design.
Contributed to benchmark suites for hardware characterization and optimization, including Parboil, SPEC ACCEL, and Chai.
Developed GPU optimization techniques for dynamic parallelism, data transformations, reductions, and dense/sparse BLAS; contributed the first GPU pivoting tridiagonal solver for NVIDIA cuSPARSE and multi-dimensional EMD for signal processing.
Analyzed cache sensitivity and implemented cache protection, bypassing, and thread throttling to improve throughput.
Summer 2012
NVIDIA
Intern
Implemented real-time image inpainting optimizations for Tegra 3 to improve graphics performance.
Feb. 2008–July 2009
Ultrasonic Imaging Lab., NTU
Supervised by Dr. Pai-Chi Li
Full-time Research Assistant & Engineer at a Stealth-Mode Startup
Designed and developed an embedded heterogeneous system for high-frequency real-time ultrasonic imaging using FPGA and GPU; the system was commercialized by a startup.
Sept. 2004–Jun. 2006
MPAC Lab, NTU
Supervised by Dr. Homer H. Chen
Undergraduate Research Assistant
Conducted early research on the rolling shutter effect in CMOS cameras, developing analysis and compensation techniques.
Explored light-field camera design and its visual effects.
Education
Jul. 2017
University of Illinois at Urbana-Champaign (UIUC), IL
Ph.D. in Electrical and Computer Engineering (ECE)
Advisor: Dr. Wen-mei W. Hwu.
Aug. 2014
University of Illinois at Urbana-Champaign (UIUC), IL
M.S. in Electrical and Computer Engineering (ECE)
Advisor: Dr. Wen-mei W. Hwu.
Jun. 2007
National Taiwan University (NTU), Taipei, Taiwan
B.S. in Electrical Engineering (EE)
Minor in Mathematics.
Aug. 2006–May 2007
UIUC
Visiting Student in ECE
Honors & Awards
2012–2013
ECE, UIUC
Dan Vivoli Endowed Fellowship
2009–2011
NSF, USA
Integrative Graduate Education and Research Traineeship (IGERT): Neuroengineering