Learned optimizers (LOs) have the potential to significantly reduce the wall-clock training time of neural networks. However, they can struggle to optimize unseen tasks (meta-generalize), especially when training networks wider than those seen during meta-training. To address this, we derive the Maximal Update Parametrization (μP) for two state-of-the-art learned optimizer architectures and propose a simple meta-training recipe for μ-parameterized LOs (μLOs). Our empirical evaluation demonstrates that LOs meta-trained with our recipe substantially improve meta-generalization to wider unseen tasks when compared to LOs trained under standard parametrization (SP) using the same compute budget. We also empirically observe that μLOs exhibit unexpectedly improved meta-generalization to deeper networks (5× meta-training) and surprising generalization to much longer training horizons (25× meta-training) when compared to SP LOs.
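For context, a minimal sketch of the width-dependent scaling rules that μP prescribes for a hidden layer under Adam-style updates (following Yang & Hu, 2021; this is an illustrative assumption about the general μP recipe, not the paper's LO-specific derivation, and `base_fan_in` is a hypothetical reference width):

```python
import math

def mup_hidden_layer_scales(fan_in: int, base_fan_in: int = 128):
    """Width-dependent multipliers for a hidden layer, relative to a base width.

    Sketch of the commonly cited muP rules for Adam-style updates:
      - initialization variance Var(W) = 1 / fan_in,
      - per-layer learning rate shrinks as 1 / width multiplier,
      - output logits are scaled down by 1 / width multiplier.
    """
    width_mult = fan_in / base_fan_in        # how much wider than the base model
    init_std = 1.0 / math.sqrt(fan_in)       # init std so that Var(W) = 1/fan_in
    lr_scale = 1.0 / width_mult              # Adam LR for hidden weights ~ 1/width
    output_mult = 1.0 / width_mult           # multiplier applied to output logits
    return init_std, lr_scale, output_mult

# Example: scaling the base model up 8x in width.
print(mup_hidden_layer_scales(fan_in=1024))  # (~0.031, 0.125, 0.125)
```

Under these rules, hyperparameters tuned at the base width transfer to wider models; the paper's contribution is deriving analogous scalings for the learned-optimizer update itself.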


