Friday, June 14 |
07:00 - 09:00 |
Breakfast (Restaurant - Hotel Tent Granada) |
09:30 - 10:00 |
Daniel Sanz Alonso: Structured Covariance Operator Estimation ↓ Covariance operator estimation is a fundamental task underpinning many algorithms for data assimilation, inverse problems, and machine learning. This talk introduces a notion of sparsity for infinite-dimensional covariance operators and a family of thresholded estimators which exploits it. In a small lengthscale regime, we show that thresholded estimators achieve an exponential improvement in sample complexity over the standard sample covariance estimator. Our analysis explains the importance of using covariance localization techniques in ensemble Kalman methods for data assimilation and inverse problems. (Main Meeting Room - Calle Rector López Argüeta) |
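To make the thresholding idea concrete, here is a minimal finite-dimensional sketch in NumPy: hard-thresholding the entries of the sample covariance of a process whose true covariance decays quickly off the diagonal. The function names, the threshold value, and the toy covariance are illustrative assumptions, not the operator-valued estimator or the small-lengthscale regime analysed in the talk.

```python
import numpy as np

def sample_covariance(X):
    """Standard sample covariance of the rows of X (n samples, d variables)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    return Xc.T @ Xc / X.shape[0]

def thresholded_covariance(X, tau):
    """Hard-threshold the sample covariance: zero out entries with |entry| < tau.
    In practice tau would be tied to the sample size and the assumed decay
    (sparsity) of the target covariance."""
    S = sample_covariance(X)
    return np.where(np.abs(S) >= tau, S, 0.0)

# Toy usage: a target covariance with rapidly decaying off-diagonal entries.
rng = np.random.default_rng(0)
d, n = 50, 40
C = np.exp(-np.abs(np.subtract.outer(np.arange(d), np.arange(d))) / 2.0)
X = rng.multivariate_normal(np.zeros(d), C, size=n)
err_sample = np.linalg.norm(sample_covariance(X) - C, ord=2)
err_thresholded = np.linalg.norm(thresholded_covariance(X, tau=0.1) - C, ord=2)
print(f"sample: {err_sample:.3f}, thresholded: {err_thresholded:.3f}")
```

Thresholding plays the same role here as covariance localization in ensemble Kalman methods: it suppresses spurious long-range correlations estimated from few samples.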
10:10 - 10:40 |
Cristina Cipriani: An optimal control perspective on Neural ODEs and adversarial training ↓ Neural ODEs are a special type of neural network that allows deep neural networks to be interpreted as discretizations of control systems. This perspective provides the advantage of employing powerful tools from control theory to advance and understand machine learning. Specifically, the training of Neural ODEs can be viewed as an optimal control problem, allowing for numerical approaches inspired by this control-oriented viewpoint.
In this talk, we consider the mean-field formulation of the problem and derive first-order optimality conditions in the form of a mean-field Pontryagin Maximum Principle, which we apply to different numerical examples. We then extend this perspective to adversarial training of neural ODEs, a way to enforce reliable and stable outcomes in neural networks. We formalize adversarial training with perturbed data as a minimax optimal control problem and derive first-order optimality conditions in the form of Pontryagin’s Maximum Principle. Finally, we provide a novel interpretation of robust training that leads to an alternative weighted technique, which we test on a low-dimensional classification task. (Main Meeting Room - Calle Rector López Argüeta) |
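The control viewpoint can be illustrated with a forward-Euler, ResNet-style discretization of a neural ODE trained against bounded input perturbations. The sketch below is a generic minimax training step (gradient-sign ascent on the perturbation, gradient descent on the parameters); the class and function names, architecture, and hyperparameters are illustrative assumptions and not the speaker's method.

```python
import torch

class DiscreteNeuralODE(torch.nn.Module):
    """Forward-Euler discretization x_{k+1} = x_k + h * tanh(W_k x_k + b_k):
    the discrete counterpart of the control system underlying a neural ODE,
    with one control (layer) per time step."""
    def __init__(self, dim, n_layers=10, h=0.1):
        super().__init__()
        self.h = h
        self.layers = torch.nn.ModuleList(
            [torch.nn.Linear(dim, dim) for _ in range(n_layers)]
        )

    def forward(self, x):
        for layer in self.layers:
            x = x + self.h * torch.tanh(layer(x))
        return x

def adversarial_step(model, opt, loss_fn, x, y, eps=0.1, inner_steps=5):
    """One minimax training step: approximate the inner maximization over a
    bounded perturbation delta by gradient-sign ascent, then descend on the
    model parameters against the perturbed data."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(inner_steps):
        loss = loss_fn(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + eps * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    opt.zero_grad()
    loss_fn(model(x + delta.detach()), y).backward()
    opt.step()

# Toy usage: binary classification of points in R^2.
net = torch.nn.Sequential(DiscreteNeuralODE(dim=2), torch.nn.Linear(2, 2))
opt = torch.optim.SGD(net.parameters(), lr=1e-2)
x = torch.randn(64, 2)
y = (x[:, 0] > 0).long()
adversarial_step(net, opt, torch.nn.CrossEntropyLoss(), x, y)
```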
10:40 - 11:10 |
Coffee Break (Main Meeting Room - Calle Rector López Argüeta) |
10:40 - 11:10 |
Checkout by 11AM (Front Desk - Hotel Tent Granada) |
11:10 - 11:40 |
Lisa Kreusser: Fokker-Planck equations for score-based diffusion models ↓ Generative models have become very popular over the last few years in the machine learning community. These generally fall into likelihood-based models (e.g. variational autoencoders), implicit models (e.g. generative adversarial networks), and score-based models. As part of this talk, I will provide insights into our recent research in this field, focusing on score-based diffusion models. Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling, due to their state-of-the-art performance in many generation tasks while relying on mathematical foundations such as stochastic differential equations (SDEs) and ordinary differential equations (ODEs). We systematically analyse the difference between the ODE and SDE dynamics of score-based diffusion models, link it to an associated Fokker–Planck equation, and provide a theoretical upper bound on the Wasserstein 2-distance between the ODE- and SDE-induced distributions in terms of a Fokker–Planck residual. We also show numerically that reducing the Fokker–Planck residual by adding it as an additional regularisation term helps close the gap between the ODE- and SDE-induced distributions. Our experiments suggest that this regularisation can improve the distribution generated by the ODE, though this can come at the cost of degraded SDE sample quality. (Main Meeting Room - Calle Rector López Argüeta) |
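For reference, the objects compared in the abstract can be written in the standard score-SDE notation below. The symbols f, g, s_theta and the arrangement of the equations are the usual textbook setup and are stated here as assumptions; the exact form of the residual used in the talk's bound may differ.

```latex
% Standard score-based diffusion setup (illustrative notation).
\begin{align*}
  \mathrm{d}x &= f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t
      && \text{forward (noising) SDE} \\
  \mathrm{d}x &= \bigl[f(x,t) - g(t)^2 s_\theta(x,t)\bigr]\,\mathrm{d}t
      + g(t)\,\mathrm{d}\bar W_t
      && \text{reverse SDE, learned score } s_\theta \approx \nabla_x \log p_t \\
  \frac{\mathrm{d}x}{\mathrm{d}t} &= f(x,t) - \tfrac{1}{2} g(t)^2 s_\theta(x,t)
      && \text{probability-flow ODE} \\
  \partial_t p_t &= -\nabla\cdot\bigl(f(x,t)\,p_t\bigr)
      + \tfrac{1}{2} g(t)^2 \Delta p_t
      && \text{Fokker--Planck equation of the forward SDE}
\end{align*}
```

The ODE and SDE generative dynamics coincide exactly when s_theta equals the true score; a Fokker–Planck residual quantifies how far the learned score is from this consistency, and the Wasserstein-2 bound in the abstract is stated in terms of such a residual.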
11:45 - 12:15 |
Jaume de Dios Pont: Complexity lower bounds for log-concave sampling ↓ Given a density rho(x), how does one effectively generate samples from a random variable with this density rho? Variations of this question arise in most computational fields, and significant effort has been devoted to designing ever more efficient algorithms, ranging from relatively simple algorithms to increasingly sophisticated ones, such as Langevin-based or diffusion-based methods.
This talk will focus on the model case in which the log-density is a smooth, strongly concave function. We will discuss some of the most widely used algorithms and study fundamental limitations of the problem by finding universal complexity bounds that no algorithm can beat.
Based on joint work with Sinho Chewi, Jerry Li, Chen Lu and Shyam Narayanan. (Main Meeting Room - Calle Rector López Argüeta) |
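One of the simple Langevin-based samplers to which such query-complexity bounds apply is the unadjusted Langevin algorithm (ULA). Below is a minimal NumPy sketch; the function name, step size, and toy target are illustrative assumptions, not the algorithms or bounds discussed in the talk.

```python
import numpy as np

def ula_sample(grad_log_rho, x0, step, n_steps, rng=None):
    """Unadjusted Langevin algorithm:
        x_{k+1} = x_k + step * grad_log_rho(x_k) + sqrt(2 * step) * xi_k,
    with xi_k ~ N(0, I). Each iteration uses one gradient query of log rho,
    which is the resource counted by complexity bounds for sampling.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + step * grad_log_rho(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

# Toy usage: standard Gaussian target, log rho(x) = -||x||^2 / 2.
x = ula_sample(lambda x: -x, x0=np.zeros(5), step=0.05, n_steps=2000)
```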
12:20 - 12:50 |
Raphaël Barboni: Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport ↓ We study the convergence of gradient flow for the training of deep neural networks. While Residual Neural Networks (ResNets) are a popular example of very deep architectures, their training constitutes a challenging optimization problem, notably due to the non-convexity and non-coercivity of the objective. Yet, in applications, those tasks are successfully solved by simple optimization algorithms such as gradient descent. To better understand this phenomenon, we focus here on a "mean-field" model of infinitely deep and arbitrarily wide ResNets, parameterized by probability measures over the product set of layers and parameters and with constant marginal on the set of layers. Indeed, in the case of shallow neural networks, mean-field models have proven to benefit from simplified loss landscapes and good theoretical guarantees when trained with gradient flow for the Wasserstein metric on the set of probability measures. Motivated by this approach, we propose to train our model with gradient flow w.r.t. the conditional Optimal Transport distance: a restriction of the classical Wasserstein distance which enforces our marginal condition. Relying on the theory of gradient flows in metric spaces, we first show the well-posedness of the gradient flow equation and its consistency with the training of ResNets at finite width. Performing a local Polyak-Łojasiewicz analysis, we then show convergence of the gradient flow for well-chosen initializations: if the number of features is finite but sufficiently large and the risk is sufficiently small at initialization, the gradient flow converges towards a global minimizer. This is the first result of this type for infinitely deep and arbitrarily wide ResNets. (Main Meeting Room - Calle Rector López Argüeta) |
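The finite-depth, finite-width object that such mean-field models idealize is a ResNet whose residual update at each layer averages over a set of parameter "particles", i.e. integrates against an empirical measure per layer. The sketch below shows only this forward recursion; the parameterization, names, and step sizes are illustrative assumptions and not the model or training scheme analysed in the talk.

```python
import numpy as np

def mean_field_resnet_forward(x, params, sigma=np.tanh):
    """Finite ResNet forward pass whose large-depth (L) and large-width (M)
    limit is a measure-parameterized model:
        x_{k+1} = x_k + (1/L) * (1/M) * sum_m W_m @ sigma(A_m @ x_k + b_m),
    where params[k] holds the M particles (W_m, A_m, b_m) of layer k."""
    L = len(params)
    for layer in params:
        M = len(layer)
        update = sum(W @ sigma(A @ x + b) for (W, A, b) in layer) / M
        x = x + update / L
    return x

# Toy usage: depth L=8, width M=16, features in R^4, hidden size 6.
rng = np.random.default_rng(0)
d, h, L, M = 4, 6, 8, 16
params = [[(rng.normal(size=(d, h)), rng.normal(size=(h, d)), rng.normal(size=h))
           for _ in range(M)] for _ in range(L)]
y = mean_field_resnet_forward(np.ones(d), params)
```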
12:55 - 13:25 |
Nikola Kovachki: Function Space Diffusion for Video Modeling ↓ We present a generalization of score-based diffusion models to function space by perturbing functional data via a Gaussian process at multiple scales. We obtain an appropriate notion of score by defining densities with respect to Gaussian measures and generalize denoising score matching. We then define the generative process by integrating function-valued Langevin dynamics. We show that the corresponding discretized algorithm generates samples at a fixed cost that is independent of the data discretization. As an application for such a model, we formulate video generation as a sequence of joint inpainting and interpolation problems defined by frame deformations. We train an image diffusion model using Gaussian process inputs and use it to solve the video generation problem by enforcing equivariance with respect to frame deformations. Our results are state-of-the-art for video generation using models trained only on image data. (Main Meeting Room - Calle Rector López Argüeta) |
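The basic noising operation in a function-space diffusion is perturbation by a Gaussian process rather than by i.i.d. pixel noise. The sketch below shows this step on a 1D discretized function with a squared-exponential kernel at two scales; the kernel choice, names, and parameters are illustrative assumptions, not the model described in the talk.

```python
import numpy as np

def gp_perturb(u, grid, lengthscale, sigma, rng):
    """Perturb a discretized function u (its values on `grid`) with sigma times
    a draw from a mean-zero Gaussian process with squared-exponential
    covariance. Because the covariance is defined on the continuum, the same
    perturbation law is consistent across discretizations of the function."""
    diff = np.subtract.outer(grid, grid)
    C = np.exp(-0.5 * (diff / lengthscale) ** 2) + 1e-6 * np.eye(len(grid))
    xi = np.linalg.cholesky(C) @ rng.standard_normal(len(grid))
    return u + sigma * xi

# Toy usage: perturb sin(2*pi*x) at two resolutions and two noise scales.
rng = np.random.default_rng(0)
for n in (64, 256):
    x = np.linspace(0.0, 1.0, n)
    u = np.sin(2 * np.pi * x)
    coarse_noise = gp_perturb(u, x, lengthscale=0.2, sigma=1.0, rng=rng)
    fine_noise = gp_perturb(u, x, lengthscale=0.05, sigma=0.3, rng=rng)
```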
13:30 - 15:00 |
Lunch (Restaurant - Hotel Tent Granada) |