Learned optimizers (LOs) have the potential to significantly reduce the wall-clock training time of neural networks. However, they can struggle to optimize unseen tasks (meta-generalize), especially when training networks wider than those seen during meta-training. To address this, we derive the Maximal Update Parametrization (μP) for two state-of-the-art learned optimizer architectures and propose a simple meta-training recipe for μ-parameterized LOs (μLOs). Our empirical evaluation demonstrates that LOs meta-trained with our recipe substantially improve meta-generalization to wider unseen tasks when compared to LOs trained under standard parametrization (SP) using the same compute budget. We also empirically observe that μLOs exhibit unexpectedly improved meta-generalization to deeper networks (5× meta-training) and surprising generalization to much longer training horizons (25× meta-training) when compared to SP LOs.
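For context, a minimal sketch of the width-dependent scaling rules that μP prescribes for a hidden layer under Adam-style updates (following Yang & Hu, 2021; this is an illustrative assumption about the general μP recipe, not the paper's LO-specific derivation, and `base_fan_in` is a hypothetical reference width):

```python
import math

def mup_hidden_layer_scales(fan_in: int, base_fan_in: int = 128):
    """Width-dependent multipliers for a hidden layer, relative to a base width.

    Sketch of the commonly cited muP rules for Adam-style updates:
      - initialization variance Var(W) = 1 / fan_in,
      - per-layer learning rate shrinks as 1 / width multiplier,
      - output logits are scaled down by 1 / width multiplier.
    """
    width_mult = fan_in / base_fan_in        # how much wider than the base model
    init_std = 1.0 / math.sqrt(fan_in)       # init std so that Var(W) = 1/fan_in
    lr_scale = 1.0 / width_mult              # Adam LR for hidden weights ~ 1/width
    output_mult = 1.0 / width_mult           # multiplier applied to output logits
    return init_std, lr_scale, output_mult

# Example: scaling the base model up 8x in width.
print(mup_hidden_layer_scales(fan_in=1024))  # (~0.031, 0.125, 0.125)
```

Under these rules, hyperparameters tuned at the base width transfer to wider models; the paper's contribution is deriving analogous scalings for the learned-optimizer update itself.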


