Guangxuan Xiao (肖光烜)
I am a Member of Technical Staff at Thinking Machines Lab, working on pre-training.
I received my Ph.D. from MIT EECS in 2025, advised by Prof. Song Han.
My research focuses on efficient algorithms and systems for deep learning, particularly large
foundation models.
I graduated with honors from Tsinghua University in 2022, with a B.Eng. in Computer Science and a B.Econ. in Finance, and was a visiting student researcher at Stanford University from 2020 to 2021.
Email / Google Scholar / GitHub / X / LinkedIn
Blog Posts
The Memory Capacity of Attention
September 1, 2025
How much information can attention mechanisms store? Using relative error analysis, we show that linear attention's capacity scales linearly with the head dimension, while softmax attention's capacity scales exponentially.
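
As a quick illustration (a toy sketch of my own, not code from the post; the head dimension, pair counts, and softmax sharpness are illustrative), one can store n random key-value pairs and read one back, comparing a linear-attention state against a sharp softmax lookup. The linear read-out degrades once n grows past the head dimension, while the softmax one stays near-exact far longer:

    import numpy as np

    # Toy retrieval experiment (setup and constants are illustrative).
    rng = np.random.default_rng(0)
    d = 64                                             # head dimension
    for n in (16, 64, 256, 1024):                      # number of stored pairs
        K = rng.standard_normal((n, d)) / np.sqrt(d)   # keys, roughly unit norm
        V = rng.standard_normal((n, d))                # values
        S = K.T @ V                  # linear attention: sum of outer products k_i v_i^T
        lin = K[0] @ S               # linear read-out with key 0
        logits = 16.0 * (K @ K[0])   # softmax read-out with sharpness 16
        w = np.exp(logits - logits.max())
        w /= w.sum()
        soft = w @ V
        rel = lambda v: np.linalg.norm(v - V[0]) / np.linalg.norm(V[0])
        print(f"n={n:5d}  linear err={rel(lin):.2f}  softmax err={rel(soft):.2f}")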
D_eff = W · |ln ε| / |ln(1 − α)|
Why Stacking Sliding Windows Can't See Very Far
August 25, 2025
A mathematical explanation of why sliding window attention's effective receptive field is O(W)
rather than the theoretical O(LW), regardless of depth, due to information dilution and exponential
decay from residual connections.
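
One way to read the formula shown with this post (my gloss: α as the per-layer fraction of fresh information mixed in through the residual stream, ε as the threshold below which a contribution is lost in the noise): a token's influence decays like (1 − α)^k after k window-hops, so it stays detectable only while (1 − α)^k ≥ ε, i.e. for k ≤ |ln ε| / |ln(1 − α)| hops of W tokens each. Plugging in ε = 0.01 and α = 0.5:

    D_eff = W · |ln 0.01| / |ln 0.5| ≈ W · 4.61 / 0.69 ≈ 6.6 W,

a constant multiple of the window size, no matter how many layers you stack.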
Statistics behind Block Sparse Attention
August 22, 2025
A statistical model revealing how block sparse attention achieves efficiency and accuracy through
learned similarity gaps.
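
To make the mechanism concrete, here is a minimal sketch (my own construction; the block size, the top-k rule, and all names are illustrative, not the post's model): score whole blocks cheaply by their mean vectors, keep only the best-matching key blocks per query block, and run dense attention inside that subset. The shortcut is accurate precisely when there is a wide similarity gap between the kept and the skipped blocks:

    import numpy as np

    def block_sparse_attention(Q, K, V, block=16, keep=4):
        # Toy sketch: each query block attends only to the `keep` key blocks
        # whose mean key is most similar to its mean query.
        T, d = Q.shape
        nb = T // block                          # assumes T % block == 0
        Qb, Kb, Vb = (X.reshape(nb, block, d) for X in (Q, K, V))
        S = Qb.mean(1) @ Kb.mean(1).T            # cheap (nb x nb) block-level scores
        out = np.empty((nb, block, d))
        for i in range(nb):
            top = np.argsort(S[i])[-keep:]       # kept key blocks for query block i
            Ks, Vs = Kb[top].reshape(-1, d), Vb[top].reshape(-1, d)
            logits = Qb[i] @ Ks.T / np.sqrt(d)   # dense attention inside the subset
            w = np.exp(logits - logits.max(-1, keepdims=True))
            out[i] = (w / w.sum(-1, keepdims=True)) @ Vs
        return out.reshape(T, d)

    Q, K, V = np.random.default_rng(1).standard_normal((3, 128, 64))
    O = block_sparse_attention(Q, K, V)          # each query block sees 4 of 8 key blocks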
softmax([sink, a₁, ..., aₜ])
How Attention Sinks Keep Language Models Stable
August 7, 2025
We discovered that attention sinks, where models park unused attention on initial tokens, are crucial for language model stability. Without them, models fail catastrophically when processing long conversations; with them, performance stays stable across millions of tokens.
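
The fix the post describes reduces to a very small cache policy. A sketch (the function name and the 4 + 1020 split are illustrative defaults, in the spirit of the StreamingLLM paper below): never evict the first few tokens, so the softmax always has its sink to park attention on, and keep a rolling window for everything else.

    def streaming_kv_indices(t, n_sink=4, window=1020):
        # Positions kept in the KV cache after seeing t tokens: the first
        # n_sink tokens (the attention sinks) are never evicted; the rest
        # of the budget is a rolling window of the most recent tokens.
        sinks = range(min(n_sink, t))
        recent = range(max(n_sink, t - window), t)
        return list(sinks) + list(recent)

    # After a million tokens the cache still holds only 4 + 1020 positions:
    assert len(streaming_kv_indices(1_000_000)) == 1024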
Publications
(* denotes equal contribution)
Optimizing Mixture of Block Attention
Guangxuan Xiao*,
Junxian Guo*,
Kasra Mazaheri,
Song Han
arXiv 2025
[paper]
[code]
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Ruyi Xu*,
Guangxuan Xiao*,
Yukang Chen,
Liuning He,
Kelly Peng,
Yao Lu,
Song Han
arXiv 2025
[paper]
[code]
XAttention: Block Sparse Attention with Antidiagonal Scoring
Ruyi Xu*,
Guangxuan Xiao*,
Haofeng Huang,
Junxian Guo,
Song Han
ICML 2025
[paper]
[code]
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Guangxuan Xiao,
Jiaming Tang,
Jingwei Zuo,
Junxian Guo,
Shang Yang,
Haotian Tang,
Yao Fu,
Song Han
ICLR 2025
[paper]
[code]
[demo]
Efficient Streaming Language Models with Attention Sinks
Guangxuan Xiao,
Yuandong Tian,
Beidi Chen,
Song Han,
Mike Lewis
ICLR 2024
[paper]
[code]
[MIT News]
[NVIDIA TensorRT-LLM]
[on iPhone]
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Guangxuan Xiao*,
Ji Lin*,
Mickael Seznec,
Hao Wu,
Julien Demouth,
Song Han
ICML 2023
[paper]
[code]
[NVIDIA TensorRT-LLM]
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Guangxuan Xiao*,
Tianwei Yin*,
William T. Freeman,
Frédo Durand,
Song Han
IJCV 2024
[website]
[paper]
[code]
Education
Massachusetts Institute of Technology
2022.08 - 2025.12
Ph.D. in Computer Science
S.M. in Computer Science
Thesis: Efficient Algorithms and Systems for Large Language Models
Advisor: Prof. Song Han
Tsinghua University
2018.08 - 2022.07
B.Eng. in Computer Science
B.Econ. in Finance (Second Major)
Advisor: Prof. Zhiyuan Liu
Stanford University
2020.07 - 2021.06
Visiting Research Student
Advisor: Prof. Jure Leskovec
Mentor: Jiaxuan You
Stanford University
2021.06 - 2021.11
Visiting Research Student through the UGVR program
Advisor: Prof. Jiajun Wu, Prof. Leslie Pack Kaelbling
Mentor: Jiayuan Mao
Experience
NVIDIA
2024 - 2025
Research Intern
Santa Clara, CA
with Song Han
Researched efficient large language models.
Meta
2023
Research Scientist Intern
Menlo Park, CA
with Mike Lewis
Developed efficient streaming language models.
Honors & Awards
- Hewlett Packard Fellowship, 2022
- Boeing Scholarship, 2021
- Tsinghua "Future Scholar" Scientific Research Grant ($30,000), 2021
- National Scholarship, 2020
- Contemporary Undergraduate Mathematical Contest in Modeling, 1st Prize, 2020
- Beijing "Challenge Cup" Academic Science and Technology Competition, 1st Prize, 2020
- Tsinghua Comprehensive Excellence Scholarship, 2019
Miscellaneous
I love to play soccer. I was the captain and striker of my department's soccer team.
I also love to play table tennis, Go (Weiqi), and piano. Beethoven's works are my favorite.