Xiaoxia Wu

Xiaoxia (Shirley) Wu (吴晓霞)

Principal Scientist, Together AI

Email: shirley AT together dot ai

Google Scholar  |  LinkedIn  |  CV (PDF)

About Me

I am a Principal Scientist at Together AI (promoted May 2026; joined July 2024), where I lead research on LLM inference efficiency. My work spans speculative decoding, quantization, and RL-driven post-training. Concretely, I lead the Aurora and ATLAS projects — unified training–serving systems that reduce the training–serving mismatch in speculative decoding through continuous online adaptation from live traffic. I also build and scale speculator training and distillation pipelines (SFT, distillation, RL post-training) and drive full-stack inference optimization across quantization formats including FP8, FP6, NVFP4, INT4, INT2, ternary, and binary, with deployment in vLLM, SGLang, and TensorRT-LLM. Ping me if you're interested in building fast and efficient LLM inference!

Previously, I was a Senior Researcher on the DeepSpeed team at Microsoft, led by Zhewei Yao and Yuxiong He. I focused on algorithm- and system-level optimizations for large-scale LLM training and inference, with emphasis on compression, quantization, and multi-modal research. Key projects include DeepSpeed-FP6, DeepSpeed-Chat, ZeroQuant, and Extreme Compression for Pre-trained Transformers (NeurIPS 2022 Oral).

Before that, I was a postdoctoral research fellow at the University of Chicago and the Toyota Technological Institute at Chicago, mentored by Rebecca Willett, where I worked on differentially private empirical risk minimization.

I completed my Ph.D. in Machine Learning at The University of Texas at Austin, advised by Rachel Ward and co-advised by Léon Bottou. My dissertation, awarded the Frank Gerth III Dissertation Award (top dissertation in Mathematics at UT Austin), focused on gradient-based optimization and implicit regularization over non-convex landscapes. I interned at Meta AI Research (Fall 2017, with Léon Bottou) and at Google (Summer 2020, with Behnam Neyshabur and Ethan Dyer), where my work on curriculum learning was published as an ICLR 2021 Oral.

I hold an M.Sc. with Distinction in Financial Mathematics from the University of Edinburgh. Before that, I studied Mathematics and Applied Mathematics at Shantou University, where I was awarded the Li Ka-shing Scholarship to participate in Semester at Sea. I am from Guangdong, China, and speak Cantonese and Hakka.

Selected Publications

6,700+ citations, h-index 24 (as of May 2026). For a full list, see my Google Scholar.

LLM Systems & Speculative Decoding

Model Compression & Quantization

Optimization & Theory

*: equal contribution.

Mentees

Teaching (UT Austin)