About Me
I am a Principal Research Scientist and Engineering Director at ByteDance Seed, working on GenAI systems, distributed training, AI infrastructure, model compilation, and performance-critical libraries across GPUs and AI ASICs. I have led the teams building Triton-distributed (distributed Triton for parallel systems), ShadowKV (high-throughput long-context LLM inference), veScale (a PyTorch-native LLM training framework), Flux (communication-overlapping tensor-parallel primitives), ByteIR (model compilation across heterogeneous hardware), and ByteMLPerf (AI accelerator benchmarking). My CV is available upon request via email (dddscy AT gmail DOT com).
Previously, I was a Senior Software Engineer at Microsoft Cloud & AI, focusing on AI infrastructure, model compilation, and ONNX Runtime deployment across CPUs, GPUs, and AI ASICs.
I completed my Ph.D. under Professor Wen-mei Hwu in the IMPACT group at the University of Illinois Urbana-Champaign, where I developed TANGRAM (Transformation-, Architecture-, Network-, Granularity-, Runtime-aware Adaptive Machine), a high-level language designed to deliver portable performance across CPUs, GPUs, FPGAs, and distributed systems from a single source codebase.
News: We have one performance-estimation paper accepted to MLSys 2026.
News: We have one programming-language-and-compiler paper (Forge) accepted to ICLR 2026.
News: We have one inference paper (ShadowKV) accepted to ICML 2025 as a Spotlight.