Efficient Learning Orals @NeurIPS 2023
We will walk through 3 oral presentations on Efficient Learning at NeurIPS 2023 in 2 minutes.
Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
Presenter: Dan Fu @ Stanford. Paper.
Attention scales quadratically in sequence length, and MLP layers scale quadratically in model dimension. This work proposes Monarch Mixer, an alternative architecture with sub-quadratic