Xiaoxia (Shirley) Wu (吴晓霞)
Email: my first name and my last name AT microsoft dot com
I am currently a researcher at Microsoft, where I work on exciting, cutting-edge methods for reducing the time and cost of large-scale neural network training. For more information, please see deepspeed.ai. My research interests are in the areas of large-scale optimization and, more broadly, machine learning. My Ph.D. research focused on methods that are efficient and robust to hyperparameter tuning, such as adaptive gradient descent and batch normalization. I am always interested in chatting about research opportunities and collaboration.
I was a postdoctoral research fellow mentored by Rebecca Willett at the University of Chicago and the Toyota Technological Institute at Chicago. I completed my Ph.D. at The University of Texas at Austin, where I was fortunate to be advised by Rachel Ward and informally co-advised by Léon Bottou. I was a research intern at Facebook AI Research (New York office) during Fall 2017, and a research intern at Google working with Ethan Dyer and Behnam Neyshabur during Summer 2020.
I hold an M.Sc. with Distinction in Financial Mathematics from the University of Edinburgh. Before that, I spent four wonderful years in the Department of Mathematics and Applied Mathematics at Shantou University, where I was awarded the Li Ka-shing Scholarship to participate in Semester at Sea. I am from Guangdong, China, and speak Cantonese and Hakka.
Adaptive Differentially Private Empirical Risk Minimization
Xiaoxia Wu, Lingxiao Wang, Irina Cristali, Quanquan Gu, Rebecca Willett
arXiv:2110.07435
AdaLoss: A computationally-efficient and provably convergent adaptive gradient method
Xiaoxia Wu, Yuege Xie, Simon Du, and Rachel Ward
arXiv:2109.08282
Hierarchical Learning for Generation with Long Source Sequences
Tobias Rohde, Xiaoxia Wu, and Yinhan Liu
arXiv:2104.07545
When Do Curricula Work?
Xiaoxia Wu, Ethan Dyer, and Behnam Neyshabur
ICLR (Oral, 53 papers accepted as oral out of 2997 submissions), 2021
[code, slides]
Implicit Regularization and Convergence for Weight Normalization
Xiaoxia Wu*, Edgar Dobriban*, Tongzheng Ren*, Shanshan Wu*, Yuanzhi Li, Suriya Gunasekar, Rachel Ward, and Qiang Liu
NeurIPS, 2020
[slides]
Choosing the Sample with Lowest Loss makes SGD Robust
Vatsal Shah, Xiaoxia Wu, and Sujay Sanghavi
AISTATS, 2020
Linear Convergence of Adaptive Stochastic Gradient Descent
Yuege Xie, Xiaoxia Wu, and Rachel Ward
AISTATS, 2020
Global Convergence of Adaptive Gradient Methods for An Over-parameterized Neural Network
Xiaoxia Wu, Simon S. Du, and Rachel Ward
preprint, 2019
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes
Rachel Ward*, Xiaoxia Wu*, Léon Bottou
ICML (Oral), 2019
(The longer version is published in Journal of Machine Learning Research)
[code, 20-minute video and slides, coverage by 机器之心 (Synced)]
WNGrad: Learn the Learning Rate in Gradient Descent
Xiaoxia Wu*, Rachel Ward*, Léon Bottou
preprint, 2018
An Optimal Mortgage Refinancing Strategy with Stochastic Interest Rate
Xiaoxia Wu, Dejun Xie, David A Edwards
Computational Economics, 1-23, 2018
Value-at-Risk estimation with stochastic interest rate models for option-bond portfolios
Xiaoyu Wang, Dejun Xie, Jingjing Jiang, Xiaoxia Wu, Jia He
Finance Research Letters 21 (2017): 10-20
Probability I, Spring 19
Scientific Computation in Numerical Analysis, Spring 18
Linear Algebra and Matrix Theory, Spring 17
Sequences, Series, and Multivariate Calculus, Spring 16, Fall 16
Differential and Integral Calculus, Fall 14, Spring 15, Fall 16