TY - GEN
T1 - Neural Network Training as an Optimal Control Problem: An Augmented Lagrangian Approach
AU - Evens, Brecht
AU - Latafat, Puya
AU - Themelis, Andreas
AU - Suykens, Johan
AU - Patrinos, Panagiotis
N1 - Funding Information:
This work was supported by the Research Foundation Flanders (FWO) research projects G0A0920N, G086518N, G086318N, and PhD grant 1196820N; Research Council KU Leuven C1 project No. C14/18/068; Fonds de la Recherche Scientifique – FNRS and the Fonds Wetenschappelijk Onderzoek – Vlaanderen under EOS project no 30468160 (SeLMA); and the Japan Society for the Promotion of Science (JSPS) KAKENHI grant JP21K17710. Johan Suykens and Panagiotis Patrinos are affiliated to Leuven.AI - KU Leuven institute for AI, B-3000, Leuven, Belgium.
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
AB - Training of neural networks amounts to nonconvex optimization problems that are typically solved using backpropagation and (variants of) stochastic gradient descent. In this work, we propose an alternative approach by viewing the training task as a nonlinear optimal control problem. Under this lens, backpropagation amounts to the sequential approach (single shooting) to optimal control, where the state variables have been eliminated. It is well known that single shooting may lead to ill-conditioning, and for this reason the simultaneous approach (multiple shooting) is typically preferred. Motivated by this observation, an augmented Lagrangian algorithm is developed that only requires an approximate solution of the Lagrangian subproblems up to a user-defined accuracy. By applying this framework to the training of neural networks, it is shown that the inner Lagrangian subproblems are amenable to solution via Gauss-Newton iterations. To fully exploit the structure of neural networks, the resulting linear least-squares problems are addressed using an approach based on forward dynamic programming. Finally, the effectiveness of our method is showcased on regression datasets.
UR - http://www.scopus.com/inward/record.url?scp=85126032097&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126032097&partnerID=8YFLogxK
DO - 10.1109/CDC45484.2021.9682842
M3 - Conference contribution
AN - SCOPUS:85126032097
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 5136
EP - 5143
BT - 60th IEEE Conference on Decision and Control, CDC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 60th IEEE Conference on Decision and Control, CDC 2021
Y2 - 13 December 2021 through 17 December 2021
ER -