We propose shared gradient discovery and superposition as a mechanism underlying generalization in LLMs, in which shared gradients lead to shared solutions that inherently generalize. To test this hypothesis, we study circuit emergence as one form of learning such generalizing solutions. We find that the hypothesis explains, and sheds new light on, circuit emergence and generalization.
ICLR 2026 Workshop on Scientific Methods for Understanding Deep Learning (Sci4DL 2026)


