About Me
I am currently a Senior Staff-level Research Scientist and Engineering Manager at ByteDance Seed/AML, working on GenAI, distributed training, AI infrastructure, model compilation, libraries, PyTorch, OpenAI Triton, TensorFlow, ONNX, and NPU ASICs. I have managed and led several teams, and built ShadowKV (high-throughput long-context LLM inference), veScale (a PyTorch-native LLM training framework), Flux (a fast communication-overlapping library for tensor parallelism on GPUs), ByteIR (a model compilation solution for various hardware), and ByteMLPerf (an AI accelerator benchmarking tool). My CV is available upon request through email (dddscy AT gmail DOT com).
Previously, I was a Senior Software Engineer at Microsoft Cloud & AI, working on AI infrastructure, model compilation, libraries, PyTorch, ONNX, and NPUs.
I earned my Ph.D. in Professor Wen-mei Hwu's IMPACT group, where I worked on a performance-portable high-level language called TANGRAM (Transformation-, Architecture-, Network-, Granularity-, Runtime-aware Adaptive Machine). It is designed to achieve high performance across CPUs, GPUs, FPGAs, and distributed systems from a single source code.
News: Our high-throughput long-context LLM inference paper, ShadowKV, has been released on arXiv.
News: Our high-performance communication-overlapping paper, Flux, has been released on arXiv.