Understanding Optimization in Deep Learning with Central Flows
by Jeremy M. Cohen, Alex Damian, Ameet Talwalkar, J. Zico Kolter, and Jason D. Lee
Abstract: Traditional theories of optimization cannot describe the dynamics of optimization in deep learning, even in the simple setting of deterministic training. The challenge is that optimizers typically operate in a complex, oscillatory regime called the “edge of stability.” In this paper, we develop theory that can describe the dynamics of optimization in this regime. Our key insight is that while the *exact* trajectory of an oscillatory optimizer may be challenging to analyze, the *time-averaged* (i.e. smoothed) trajectory is often much more tractable. To analyze an optimizer, we derive a differential equation called a “central flow” that characterizes this time-averaged trajectory. We empirically show that these central flows can predict long-term optimization trajectories for generic neural networks with a high degree of numerical accuracy. By interpreting these central flows, we are able to understand how gradient descent makes progress even as the loss sometimes goes up; how adaptive optimizers “adapt” to the local loss landscape; and how adaptive optimizers implicitly navigate towards regions where they can take larger steps. Our results suggest that central flows can be a valuable theoretical tool for reasoning about optimization in deep learning.
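As a rough illustration of the core idea that a time-averaged trajectory is easier to reason about than the exact oscillatory one, the sketch below runs gradient descent on a toy quadratic with a step size just under the stability threshold and compares the raw iterates against a sliding-window average. The loss, step size, and window length are hypothetical choices for illustration, not the paper's central flow construction.

```python
import numpy as np

def grad(w):
    # Gradient of a toy quadratic loss with one sharp direction (curvature 10)
    # and one flat direction (curvature 0.1).
    return np.array([10.0 * w[0], 0.1 * w[1]])

eta = 0.199  # eta * 10 = 1.99, just under the stability threshold of 2,
             # so the sharp coordinate oscillates in sign for many steps
w = np.array([1.0, 1.0])
iterates = [w.copy()]
for _ in range(200):
    w = w - eta * grad(w)
    iterates.append(w.copy())
iterates = np.array(iterates)

# Time-average the iterates over a short sliding window: the period-2
# oscillation in the sharp coordinate largely cancels, leaving a far
# smoother trajectory than the raw one.
window = 10
smoothed = np.array([iterates[max(0, t - window + 1):t + 1].mean(axis=0)
                     for t in range(len(iterates))])

print("raw iterate at step 100:     ", iterates[100])
print("smoothed iterate at step 100:", smoothed[100])
```

In this toy setting the smoothed sharp coordinate is roughly two orders of magnitude smaller than the raw one, which is the kind of simplification the central flow formalizes for real networks.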
Submission history
From: Jeremy Cohen
[v1] Thu, 31 Oct 2024 17:58:13 UTC (24,837 KB)
[v2] Thu, 25 Sep 2025 14:29:29 UTC (37,746 KB)