Zuazua Iriondo E (2024)
Publication Language: English
Publication Type: Conference Contribution
Publication year: 2024
Pages Range: 116-123
Conference Proceedings Title: Proceedings FGS 2024
Open Access Link: https://digibuo.uniovi.es/dspace/handle/10651/74691
This paper presents our recent advancements at the intersection of machine learning and control theory.
We focus specifically on utilizing control theoretical tools to elucidate the underlying mechanisms driving
the success of machine learning algorithms. By enhancing the explainability of these algorithms, we aim
to contribute to their ongoing improvement and more effective application. Our research explores several
critical areas:
Firstly, we investigate the memorization, representation, classification, and approximation properties of
residual neural networks (ResNets). By framing these tasks as simultaneous or ensemble control problems,
we have developed nonlinear and constructive algorithms for training. Our work provides insights into the
parameter complexity and computational requirements of ResNets.
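The control-theoretic framing above can be sketched concretely: a ResNet's forward pass is a discrete dynamical system whose layer parameters act as controls steering the state. The step size, activation, and widths below are illustrative assumptions, not the constructive training algorithm of the paper.

```python
import numpy as np

def resnet_forward(x, weights, biases, h=0.1):
    """Residual network viewed as a controlled discrete dynamical system:
        x_{k+1} = x_k + h * tanh(W_k x_k + b_k).
    The parameters (W_k, b_k) play the role of controls; driving chosen
    inputs to chosen targets is a simultaneous-control problem."""
    for W, b in zip(weights, biases):
        x = x + h * np.tanh(W @ x + b)
    return x

rng = np.random.default_rng(1)
L, d = 5, 3                                   # depth and state dimension (toy values)
weights = [rng.standard_normal((d, d)) for _ in range(L)]
biases = [rng.standard_normal(d) for _ in range(L)]
y = resnet_forward(rng.standard_normal(d), weights, biases)
```

This also makes the link to neural ODEs explicit: the update is an explicit Euler step of x'(t) = tanh(W(t)x + b(t)).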
Similarly, we delve into the properties of neural ODEs (NODEs). We demonstrate that autonomous
NODEs of sufficient width can ensure approximate memorization properties. Furthermore, we prove that
by allowing biases to be time-dependent, NODEs can track dynamic data. This showcases their potential for
synthetic model generation and helps elucidate the success of methodologies such as Reservoir Computing.
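The tracking role of a time-dependent bias can be illustrated with a scalar toy model: the bias enters the vector field as an open-loop control that drives the state over time. The sinusoidal bias and Euler integration below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def node_trajectory(x0, bias_fn, w=1.0, T=2.0, steps=200):
    """Euler-integrate the scalar neural ODE  x'(t) = tanh(w*x + b(t)),
    where the time-dependent bias b(t) acts as a control input."""
    dt = T / steps
    xs = [x0]
    for k in range(steps):
        t = k * dt
        xs.append(xs[-1] + dt * np.tanh(w * xs[-1] + bias_fn(t)))
    return np.array(xs)

# A hypothetical oscillating bias makes the state follow a time-varying signal.
traj = node_trajectory(0.0, bias_fn=lambda t: np.sin(2 * np.pi * t))
```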
Next, we analyze the optimal architectures of multilayer perceptrons (MLPs). Our findings offer guidelines for designing MLPs with minimal complexity, ensuring efficiency and effectiveness for supervised
learning tasks.
The generalization and prediction capacity of trained networks plays a crucial role. To address these
properties, we present two nonconvex optimization problems related to shallow neural networks, capturing the "sparsity" of parameters and robustness of representation. We introduce a "mean-field" model,
proving, via representer theorems, the absence of a relaxation gap. This aids in designing an optimal tolerance strategy for robustness and, through convexification, efficient algorithms for training.
In the context of large language models (LLMs), we explore the integration of residual networks with
self-attention layers for context capture. We treat "attention" as a dynamical system acting on a collection of
points and characterize their asymptotic dynamics, identifying convergence towards special points called
leaders. These theoretical insights have led to the development of an interpretable model for sentiment
analysis of movie reviews, among other possible applications.
Lastly, we address federated learning, which enables multiple clients to collaboratively train models
without sharing private data, thus addressing data collection and privacy challenges. We examine training efficiency, incentive mechanisms, and privacy concerns within this framework, proposing solutions to
enhance the effectiveness and security of federated learning methods.
Our work underscores the potential of applying control theory principles to improve machine learning
models, resulting in more interpretable and efficient algorithms. This interdisciplinary approach opens
up a fertile ground for future research, raising profound mathematical questions and application-oriented
challenges and opportunities.
APA:
Zuazua Iriondo, E. (2024). Progress and future directions in machine learning through control theory. In Proceedings FGS 2024 (pp. 116-123). Gijón, Spain.
MLA:
Zuazua Iriondo, Enrique. "Progress and future directions in machine learning through control theory." Proceedings of FGS 2024: French-German-Spanish Conference on Optimization, Gijón, Spain, 2024, pp. 116-123.