A Bregman Learning Framework for Sparse Neural Networks

Bungert L, Roith T, Tenbrinck D, Burger M (2022)


Publication Language: English

Publication Type: Journal article, Original article

Publication year: 2022

Journal: Journal of Machine Learning Research

Open Access Link: https://www.jmlr.org/papers/v23/21-0545.html

Abstract

We propose a learning framework based on stochastic Bregman iterations, also known as mirror descent, to train sparse neural networks with an inverse scale space approach. We derive a baseline algorithm called LinBreg, an accelerated version using momentum, and AdaBreg, which is a Bregmanized generalization of the Adam algorithm. In contrast to established methods for sparse training, the proposed family of algorithms constitutes a regrowth strategy for neural networks that is solely optimization-based without additional heuristics. Our Bregman learning framework starts the training with very few initial parameters, successively adding only significant ones to obtain a sparse and expressive network. The proposed approach is extremely easy and efficient, yet supported by the rich mathematical theory of inverse scale space methods. We derive a statistically profound sparse parameter initialization strategy and provide a rigorous stochastic convergence analysis of the loss decay and additional convergence proofs in the convex regime. Using only 3.4% of the parameters of ResNet-18 we achieve 90.2% test accuracy on CIFAR-10, compared to 93.6% using the dense network. Our algorithm also unveils an autoencoder architecture for a denoising task. The proposed framework also has a huge potential for integrating sparse backpropagation and resource-friendly training. Code is available at https://github.com/TimRoith/BregmanLearning.
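For orientation, the LinBreg baseline named in the abstract is a stochastic linearized Bregman iteration. Below is a minimal NumPy sketch of one such step, assuming the elastic-net regularizer J(theta) = lam * ||theta||_1 + ||theta||^2 / (2 * delta) commonly used in inverse scale space methods; all names here (linbreg_step, soft_shrink, tau, lam, delta) are illustrative assumptions, not the authors' API. The actual PyTorch implementation is in the linked repository.

import numpy as np

def soft_shrink(v, lam):
    # Soft-shrinkage: the proximal operator of lam * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def linbreg_step(v, grad, tau=0.1, lam=1.0, delta=1.0):
    # One stochastic linearized Bregman step (illustrative sketch):
    # the subgradient variable v accumulates gradient information,
    # while the parameters theta = grad J*(v) remain exactly zero
    # until |v| exceeds the threshold lam (inverse scale space effect).
    v = v - tau * grad
    theta = delta * soft_shrink(v, lam)
    return v, theta

# Toy usage: minimize the quadratic loss 0.5 * ||theta - target||^2.
target = np.array([2.0, 0.05, -1.5, 0.0])
v = np.zeros_like(target)      # sparse initialization: all parameters start at zero
theta = np.zeros_like(target)
for _ in range(200):
    grad = theta - target      # gradient of the toy loss at the current theta
    v, theta = linbreg_step(v, grad, tau=0.1, lam=1.0)
# Large coordinates of target enter the support of theta first,
# mirroring the "start sparse, add only significant parameters" behavior.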

How to cite

APA:

Bungert, L., Roith, T., Tenbrinck, D., & Burger, M. (2022). A Bregman Learning Framework for Sparse Neural Networks. Journal of Machine Learning Research.

MLA:

Bungert, Leon, et al. "A Bregman Learning Framework for Sparse Neural Networks." Journal of Machine Learning Research (2022).

BibTeX:
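Since the download link is not reproduced here, the following entry is reconstructed from the metadata on this page; the volume is inferred from the JMLR URL, and issue and page numbers are omitted because they are not listed here.

@article{bungert2022bregman,
  author  = {Bungert, Leon and Roith, Tim and Tenbrinck, Daniel and Burger, Martin},
  title   = {A Bregman Learning Framework for Sparse Neural Networks},
  journal = {Journal of Machine Learning Research},
  volume  = {23},
  year    = {2022},
  url     = {https://www.jmlr.org/papers/v23/21-0545.html}
}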