Xiaoxia Wu


Xiaoxia (Shirley) Wu (吴晓霞)

Email: shirley AT Together dot ai

Google Scholar

About me

I am currently a Senior Staff Scientist at TogetherAI (since July 2024), where I build tools and work on quantization. Ping me if you're interested in building tools and making inference fast!

Previously, I was a Senior Researcher at Microsoft GenAI, where we developed the Phi-3 family of models. I was fortunate to be a member of the DeepSpeed team, led by Zhewei Yao and Yuxiong He, and worked closely with Weizhu's team. At DeepSpeed, I focused on system- and algorithm-level optimizations for large-scale training and inference of LLMs, with a particular emphasis on compression, long sequences, and multi-modal research. Some of my projects include DeepSpeed-FP6 and DeepSpeed-Chat. For more information, please check deepspeed.ai.

I was a postdoctoral research fellow mentored by Rebecca Willett at the University of Chicago and the Toyota Technological Institute at Chicago. I completed my Ph.D. at The University of Texas at Austin, where I was fortunate to be advised by Rachel Ward and informally co-advised by Léon Bottou. My Ph.D. research was in optimization methods, focusing on methods that are efficient and robust to hyperparameter tuning, such as adaptive gradient descent and batch normalization. I was a research intern at Facebook AI Research (New York office) during Fall 2017, and a research intern at Google working with Ethan Dyer and Behnam Neyshabur during Summer 2020.

I hold an M.Sc. with Distinction in Financial Mathematics from the University of Edinburgh. Before that, I spent four wonderful years in the Department of Mathematics and Applied Mathematics at Shantou University, where I was awarded the Li Ka-shing Scholarship to participate in Semester at Sea. I am from Guangdong, China, and speak Cantonese and Hakka.

Papers and Preprints (updated Nov 2021)


*: indicates equal contribution.

Teaching Assistant at UT Austin