# Schedule for: 24w5301 - Structured Machine Learning and Time–Stepping for Dynamical Systems

Beginning on Sunday, February 18 and ending Friday, February 23, 2024

All times in Banff, Alberta time, MST (UTC-7).

Sunday, February 18 | |
---|---|

16:00 - 17:30 | Check-in begins at 16:00 on Sunday and is open 24 hours (Front Desk - Professional Development Centre) |

17:30 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building. (Vistas Dining Room) |

20:00 - 22:00 |
Informal gathering ↓ Informal Meet and Greet at the BIRS Lounge in PDC 2nd floor. (Other (See Description)) |

Monday, February 19 | |
---|---|

07:00 - 08:45 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

08:45 - 09:00 |
Introduction and Welcome by BIRS Staff ↓ A brief introduction to BIRS with important logistical information, technology instruction, and opportunity for participants to ask questions. (TCPL 201) |

09:00 - 09:45 |
Emil Constantinescu: Subgrid-Scale Operators with Neural Ordinary Differential Equations ↓ We discuss a new approach, based on neural ordinary differential equations (NODEs), to learning subgrid-scale model effects when simulating partial differential equations (PDEs) solved by the method of lines, and their representation in chaotic ordinary differential equations. Solving systems with fine temporal and spatial grid scales is an ongoing computational challenge, and closure models are generally difficult to tune. Machine learning approaches have increased the accuracy and efficiency of computational fluid dynamics solvers. In this approach, neural networks are used to learn the coarse- to fine-grid map, which can be viewed as a subgrid-scale parameterization. We propose a strategy that uses the NODE and partial knowledge to learn the source dynamics at a continuous level. Our method inherits the advantages of NODEs and can be used to parameterize subgrid scales, approximate coupling operators, and improve the efficiency of low-order solvers. Numerical results for the two-scale Lorenz 96 equation, the convection-diffusion equation, and the 2D/3D Navier–Stokes equations illustrate this approach. (TCPL 201) |
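The NODE building block described in this abstract can be sketched in a few lines (an illustrative stand-in, not the speaker's code: `f_theta` is a hypothetical one-parameter substitute for a neural network, and no training is shown):

```python
import math

# Illustrative sketch: a parameterized vector field f_theta integrated with
# RK4 gives the continuous-depth map that NODE-based closures learn. In a
# closure model, theta would be trained so this map corrects a coarse solver.

def f_theta(z, theta):
    # hypothetical one-parameter stand-in for a neural network
    return [math.tanh(theta * zi) for zi in z]

def rk4_step(f, z, h, theta):
    k1 = f(z, theta)
    k2 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k1)], theta)
    k3 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k2)], theta)
    k4 = f([zi + h * ki for zi, ki in zip(z, k3)], theta)
    return [zi + h / 6.0 * (a + 2 * b + 2 * c + d)
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]

def node_forward(z0, theta, h=0.1, steps=10):
    # integrate dz/dt = f_theta(z) from t = 0 to t = steps * h
    z = list(z0)
    for _ in range(steps):
        z = rk4_step(f_theta, z, h, theta)
    return z

z1 = node_forward([0.5, -0.2], theta=1.0)
```

In a subgrid-scale application, `f_theta` would be added to a known coarse right-hand side and trained against fine-grid trajectories.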

09:45 - 10:30 |
Takaharu Yaguchi: Numerical integrators for learning neural ordinary differential equation models ↓ In recent years, neural network models for learning differential equation models from observed data of physical phenomena have been attracting attention. To train these models, it is often necessary to discretize them using numerical integrators. In this talk, the effect of discretization in such cases is investigated. For example, I will explain that models become unidentifiable when general Runge-Kutta methods are used. (TCPL 201) |

10:30 - 11:00 | Coffee Break (TCPL Foyer) |

11:00 - 11:45 |
Lisa Kreusser: Dynamical systems in deep generative modelling ↓ Generative models have become very popular over the last few years in the machine learning community. These are generally based on likelihood-based models (e.g. variational autoencoders), implicit models (e.g. generative adversarial networks), as well as score-based models. In this talk, I will provide insights into our recent research in this field, focussing on score-based diffusion models, which have emerged as one of the most promising frameworks for deep generative modelling due to their state-of-the-art performance in many generation tasks while relying on mathematical foundations such as stochastic differential equations (SDEs) and ordinary differential equations (ODEs). We systematically analyse the difference between the ODE and SDE dynamics of score-based diffusion models, link it to an associated Fokker–Planck equation, and provide a theoretical upper bound on the Wasserstein 2-distance between the ODE- and SDE-induced distributions in terms of a Fokker–Planck residual. We also show numerically that reducing the Fokker–Planck residual, by adding it as an additional regularisation term, closes the gap between the ODE- and SDE-induced distributions. Our experiments suggest that this regularisation can improve the distribution generated by the ODE, though this can come at the cost of degraded SDE sample quality. (TCPL 201) |

11:55 - 13:20 |
Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

13:30 - 14:15 |
Hongkun Zhang: Machine learning of conservation laws for dynamical systems ↓ This talk presents recent joint work with my colleagues Panayotis Kevrekidis and Wei Zhu. I will discuss machine learning techniques that we have successfully designed to learn the number of conservation laws, with applications to models including nonlinear lattice systems. (TCPL 201) |

14:15 - 15:00 |
Christian Offen: Learning Lagrangian dynamics from data with UQ ↓ I will show how to use Gaussian process regression to learn variational dynamical systems from data. The method fits into the GP framework of Chen et al. for solving PDEs, so that uncertainty quantification and convergence of the method can be derived. (TCPL 201) |

15:00 - 15:30 | Coffee Break (TCPL Foyer) |

15:30 - 16:15 |
Brynjulf Owren: Stability of numerical methods set on Euclidean spaces and manifolds with applications to neural networks ↓ The stability of numerical integrators plays a crucial role in approximating the flow of differential equations. Issues related to convergence and step size limitations have been successfully resolved by studying the stability properties of numerical schemes. Stability also plays a role in the existence and uniqueness of the solution of the nonlinear algebraic equations that need to be solved in each time step of an implicit method. However, very little has up to now been known about the stability properties of numerical methods on manifolds, such as Lie group integrators. Interest in these questions has recently been sparked by efforts to construct ODE-based neural networks that are robust against adversarial attacks. In this talk we shall discuss a new framework for B-stability on Riemannian manifolds. A method is B-stable if it exhibits non-expansive behaviour in the Riemannian distance measure when applied to problems which have non-expansive solutions. We shall in particular see how the sectional curvature of the manifold plays a role, and show some surprising results regarding the non-uniqueness of geodesic implicit integrators for positively curved spaces. If time permits, we shall also discuss how to make use of these results in neural networks where the data belong to a Riemannian manifold. (TCPL 201) |

16:15 - 17:00 |
Simone Brugiapaglia: Practical existence theorems for deep learning approximation in high dimensions ↓ Deep learning is having a profound impact on industry and scientific research. Yet, while this paradigm continues to show impressive performance in a wide variety of applications, its mathematical foundations are far from being well understood. Motivated by deep learning methods for scientific computing, I will present new practical existence theorems that aim at bridging the gap between theory and practice in this area. Combining universal approximation results for deep neural networks with sparse high-dimensional polynomial approximation theory, these theorems identify sufficient conditions on the network architecture, the training strategy, and the size of the training set able to guarantee a target accuracy. I will illustrate practical existence theorems in the contexts of high-dimensional function approximation via feedforward networks, reduced order modeling based on convolutional autoencoders, and physics-informed neural networks for high-dimensional PDEs. (TCPL 201) |

17:00 - 17:45 | Free Discussion (TCPL Foyer) |

17:55 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building. (Vistas Dining Room) |

Tuesday, February 20 | |
---|---|

07:00 - 08:45 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

09:00 - 09:45 |
Davide Murari: Improving the robustness of Graph Neural Networks with coupled dynamical systems ↓ Graph Neural Networks (GNNs) have established themselves as a key component in addressing diverse graph-based tasks, like node classification. Despite their notable successes, GNNs remain susceptible to input perturbations in the form of adversarial attacks. In this talk, we present a new approach to fortify GNNs against adversarial perturbations through the lens of coupled contractive dynamical systems. (TCPL 201) |

09:45 - 10:30 |
Eldad Haber: Time dependent graph neural networks ↓ Graph Neural Networks (GNNs) have demonstrated remarkable success in modeling complex relationships in graph-structured data. A recent innovation in this field is the family of Differential Equation-Inspired Graph Neural Networks (DE-GNNs), which leverage principles from continuous dynamical systems to model information flow on graphs with built-in properties such as feature smoothing or preservation. However, existing DE-GNNs rely on first or second-order temporal dependencies. In this talk, we propose a neural extension to those pre-defined temporal dependencies. We show that our model, called TDE-GNN, can capture a wide range of temporal dynamics that go beyond typical first or second-order methods, and provide use cases where existing temporal models are challenged. We demonstrate the benefit of learning the temporal dependencies using our method rather than using pre-defined temporal dynamics on several graph benchmarks. (TCPL 201) |

10:30 - 11:00 | Group Photo / Coffee Break (TCPL Foyer) |

11:00 - 11:45 |
Melanie Weber: Representation Trade-Offs in Geometric Machine Learning ↓ The utility of encoding geometric structure, such as known symmetries, into machine learning architectures has been demonstrated empirically in domains ranging from biology to computer vision. However, rigorous analysis of its impact on the learnability of neural networks is largely missing. A recent line of learning-theoretic research has demonstrated that learning shallow, fully-connected neural networks, which are agnostic to data geometry, has exponential complexity in the correlational statistical query (CSQ) model, a framework encompassing gradient descent. In this talk, we ask whether knowledge of data geometry is sufficient to alleviate this fundamental hardness of learning neural networks. We discuss learnability in several geometric settings, including equivariant neural networks, a class of geometric machine learning architectures that explicitly encode symmetries. Based on joint work with Bobak Kiani, Jason Wang, Thien Le, Hannah Lawrence, and Stefanie Jegelka. (Online) |

11:55 - 13:20 |
Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

13:30 - 14:15 |
Geoffrey McGregor: Conservative Hamiltonian Monte Carlo ↓ Hamiltonian Monte Carlo (HMC) is a prominent Markov Chain Monte Carlo algorithm often used to generate samples from a target distribution by evolving an associated Hamiltonian system using symplectic integrators. HMC’s improved sampling efficacy over traditional Gaussian random walk algorithms is primarily due to its higher acceptance probability on distant proposals, thereby reducing the correlation between successive samples more effectively and thus requiring fewer samples overall. Yet, thin high density regions can occur in high dimensional target distributions, which can lead to a significant decrease in the acceptance probability of HMC proposals when symplectic integrators are used. Instead, we introduce a variant of HMC called Conservative Hamiltonian Monte Carlo (CHMC), which utilizes a symmetric R-reversible second-order energy-preserving integrator to generate distant proposals with high probability of acceptance. We show that CHMC satisfies approximate stationarity with an error proportional to the integrator’s accuracy order. We also highlight numerical examples, with improvements in convergence over HMC, persisting even for large step sizes and narrowing widths of high density regions. This work is in collaboration with Andy Wan. (TCPL 201) |
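For contrast with the conservative variant described above, here is a minimal sketch of a standard HMC step on a 1D standard Gaussian target (my own illustrative example, not the speakers' code): leapfrog conserves the Hamiltonian only approximately, so the Metropolis test accepts with probability min(1, exp(-ΔH)); CHMC instead uses an energy-preserving integrator so that ΔH stays near zero even for large step sizes.

```python
import math, random

# Standard HMC on the target N(0, 1): H(q, p) = q^2/2 + p^2/2.

def grad_U(q):
    return q  # gradient of the potential U(q) = q^2 / 2

def leapfrog(q, p, h, n_steps):
    # symplectic leapfrog: half kick, alternating drifts/kicks, half kick
    p -= 0.5 * h * grad_U(q)
    for _ in range(n_steps - 1):
        q += h * p
        p -= h * grad_U(q)
    q += h * p
    p -= 0.5 * h * grad_U(q)
    return q, p

def hmc_step(q, h=0.2, n_steps=10, rng=random):
    p = rng.gauss(0.0, 1.0)                      # resample momentum
    H0 = 0.5 * q * q + 0.5 * p * p
    q_new, p_new = leapfrog(q, p, h, n_steps)
    H1 = 0.5 * q_new * q_new + 0.5 * p_new * p_new
    if rng.random() < math.exp(min(0.0, H0 - H1)):  # Metropolis test
        return q_new
    return q

random.seed(0)
q, samples = 0.0, []
for _ in range(5000):
    q = hmc_step(q)
    samples.append(q)
mean = sum(samples) / len(samples)
var = sum(s * s for s in samples) / len(samples)
```

Replacing `leapfrog` with an energy-preserving integrator, as in CHMC, makes `H0 - H1` vanish up to the solver tolerance, so distant proposals are accepted with probability near one.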

14:15 - 15:00 |
Wu Lin: (Lie-group) Structured Inverse-free Second-order Optimization for Large Neural Nets ↓ Optimization is an essential ingredient of machine learning. Many optimization problems can be formulated from a probabilistic perspective to exploit the Fisher-Rao geometric structure of a probability family. By leveraging the structure, we can design new optimization methods. A classic approach to exploiting the Fisher-Rao structure is natural-gradient descent (NGD). In this talk, we show that performing NGD on a Gaussian manifold recovers Newton's method for unconstrained optimization, where the inverse covariance matrix is viewed as a preconditioning matrix. This connection allows us to develop (Lie-group) structured second-order methods by reparameterizing a preconditioning matrix and exploiting the parameterization invariance of natural gradients. We show applications where we propose structured matrix-inverse-free second-order optimizers and use them to train large-scale neural nets with millions of parameters in half precision settings. (TCPL 201) |

15:00 - 15:30 | Coffee Break (TCPL Foyer) |

15:30 - 16:15 |
Molei Tao: Optimization and Sampling in Non-Euclidean Spaces ↓ Machine learning in non-Euclidean spaces has been rapidly attracting attention in recent years, and this talk will give some examples of progress on its mathematical and algorithmic foundations. I will begin with variational optimization, which, together with delicate interplays between continuous- and discrete-time dynamics, enables the construction of momentum-accelerated algorithms that optimize functions defined on manifolds. Selected applications, namely a generic improvement of Transformers and a low-dimensional approximation of the high-dimensional optimal transport distance, will be described. Then I will turn the optimization dynamics into an algorithm that samples probability distributions on Lie groups. If time permits, the efficiency and accuracy of the sampler will also be quantified via a new, non-asymptotic error analysis. (TCPL 201) |

16:15 - 17:00 |
Melvin Leok: The Connections Between Discrete Geometric Mechanics, Information Geometry, Accelerated Optimization and Machine Learning ↓ Geometric mechanics describes Lagrangian and Hamiltonian mechanics geometrically, and information geometry formulates statistical estimation, inference, and machine learning in terms of geometry. A divergence function is an asymmetric distance between two probability densities that induces differential geometric structures and yields efficient machine learning algorithms that minimize the duality gap. The connection between information geometry and geometric mechanics will yield a unified treatment of machine learning and structure-preserving discretizations. In particular, the divergence function of information geometry can be viewed as a discrete Lagrangian, which is a generating function of the symplectic maps that arise in discrete variational mechanics. This identification allows the methods of backward error analysis to be applied, and the symplectic map generated by a divergence function can be associated with the exact time-h flow map of a Hamiltonian system on the space of probability distributions. We will also discuss how time-adaptive Hamiltonian variational integrators can be used to discretize the Bregman Hamiltonian, whose flow generalizes the differential equation that describes the dynamics of the Nesterov accelerated gradient descent method. (TCPL 201) |
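The last point can be anchored with a small numerical sketch (my own example; the quadratic test function and step sizes are invented): Nesterov's accelerated gradient method is the discretization of a second-order ODE, and the Bregman Hamiltonian mentioned in the abstract generalizes that ODE. Here it is compared with plain gradient descent on f(x) = x²/2, using a deliberately small step h, as if limited by a stiffer coordinate elsewhere in the problem.

```python
import math

def grad_f(x):
    return x  # gradient of f(x) = x^2 / 2

h = 0.01
# momentum coefficient for a strongly convex objective (mu = 1, L = 1/h)
beta = (1 - math.sqrt(h)) / (1 + math.sqrt(h))

# plain gradient descent: contraction factor (1 - h) per step
x = 1.0
for _ in range(200):
    x = x - h * grad_f(x)
gd = abs(x)

# Nesterov accelerated gradient: look-ahead point y, then gradient step
x, x_prev = 1.0, 1.0
for _ in range(200):
    y = x + beta * (x - x_prev)
    x_prev, x = x, y - h * grad_f(y)
nag = abs(x)
```

After 200 iterations the accelerated iterate is orders of magnitude closer to the minimizer, reflecting the faster dissipation of the underlying Hamiltonian flow.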

17:00 - 17:45 | Free Discussion (TCPL Foyer) |

17:55 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building. (Vistas Dining Room) |

Wednesday, February 21 | |
---|---|

07:00 - 08:45 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

09:00 - 09:45 | Elena Celledoni: Deep neural networks on diffeomorphism groups for optimal shape reparameterization (TCPL 201) |

09:45 - 10:30 |
Giacomo Dimarco: Control and neural network uncertainty quantification for plasma simulation ↓ We will consider the development of numerical methods for simulating plasmas in magnetic confinement nuclear fusion reactors. In particular, we focus on the Vlasov–Maxwell equations describing out-of-equilibrium plasmas influenced by an external magnetic field, and we approximate this model through the use of particle methods. We will additionally pose an optimal control problem aiming at minimizing the temperature at the boundaries of the fusion device or, alternatively, the number of particles hitting the boundary. Our goal is then to confine the plasma to the center of the physical domain. In this framework, we consider the construction of multifidelity methods based on neural network architectures for estimating the uncertainties due to the lack of knowledge of all the physical aspects arising in the modeling of plasma. (TCPL 201) |

10:30 - 11:00 | Coffee Break (TCPL Foyer) |

11:00 - 11:45 |
Bethany Lusch: Computationally Efficient Data-Driven Discovery and Linear Representation of Nonlinear Systems For Control ↓ Linear dynamics are desirable for control, and Koopman theory offers hope of a globally linear (albeit infinite-dimensional) representation of nonlinear dynamics. However, it is challenging to find a good finite-dimensional approximation of the theoretical representation. I will present a deep learning approach with recursive learning that reduces error accumulation. The resulting linear system is controlled using a linear quadratic regulator. An illustrative example using a pendulum system will be presented, with simulations on noisy data. We show that our proposed method is trained more efficiently and is more accurate than an autoencoder baseline. (TCPL 201) |
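The Koopman idea in this abstract can be illustrated in a few lines (a toy example of mine, not the speaker's method): the nonlinear map x_{k+1} = x_k² becomes exactly linear in the observable g(x) = log x, since log(x²) = 2 log x, and a least-squares fit on the lifted snapshots recovers the linear operator. Deep learning approaches search for such observables automatically.

```python
import math

def fit_linear_map(gs, gs_next):
    # scalar least squares: a = <g, g_next> / <g, g>
    return (sum(a * b for a, b in zip(gs, gs_next))
            / sum(a * a for a in gs))

# snapshots of the nonlinear system x_{k+1} = x_k^2 on (0, 1)
xs = [0.9]
for _ in range(10):
    xs.append(xs[-1] ** 2)

# lift to the observable in which the dynamics are linear
gs = [math.log(x) for x in xs]
a = fit_linear_map(gs[:-1], gs[1:])  # should recover the operator 2
```

The same least-squares fit applied directly to `xs` would give only a crude linear surrogate; the value of the learned observable is that the fit becomes exact.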

11:55 - 13:20 |
Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

13:30 - 17:30 | Free Afternoon (Banff National Park) |

17:30 - 19:30 |
Dinner ↓ |

Thursday, February 22 | |
---|---|

07:00 - 08:45 |
Breakfast ↓ |

09:00 - 09:45 |
Chris Budd: Adaptivity and expressivity in neural network approximations ↓ We consider the training of a Free Knot Spline (FKS) and a ReLU-based NN to approximate regular and singular functions of a scalar variable over a fixed interval. Neural networks have high theoretical expressivity, but training them with a natural choice of loss function leads to non-convex problems where the resulting architecture is far from optimal, and a ReLU NN with the usual loss function can give a poor approximation. Similar issues arise with a FKS, but here the training is more robust, with a crucial role played by the best interpolating FKS. The latter can be trained more easily and then acts as a good starting point for training the FKS. This also gives insight into a better choice of loss function, based on knot equidistribution, which allows a better calculation of the knots of the FKS.
We look at the interplay between the training and expressivity of a FKS and the training of formally equivalent shallow and deep ReLU NNs. The fact that we can train a FKS to achieve high expressivity, even for a singular function, by making use of the optimal interpolating FKS (with associated optimal knots) gives insights into how we can train a ReLU NN. (TCPL 201) |
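The FKS/ReLU equivalence underlying this talk can be made concrete (an illustrative construction of mine, assuming a 1D continuous piecewise-linear target; the function names are invented): a linear spline with knots t_1 < … < t_m is exactly a shallow ReLU network f(x) = c0 + c1·x + Σ w_i·relu(x − t_i), where each weight w_i is the jump in slope at knot t_i.

```python
def relu(z):
    return max(z, 0.0)

def spline_eval(knots, values, x):
    # linear interpolant through (knots[i], values[i]) for x in the knot range
    for (t0, v0), (t1, v1) in zip(zip(knots, values),
                                  zip(knots[1:], values[1:])):
        if t0 <= x <= t1:
            return v0 + (v1 - v0) * (x - t0) / (t1 - t0)
    raise ValueError("x outside knot range")

def relu_net_from_spline(knots, values):
    # slopes on each interval; the ReLU weights are the slope jumps
    slopes = [(v1 - v0) / (t1 - t0)
              for t0, t1, v0, v1 in zip(knots, knots[1:], values, values[1:])]
    c0, c1 = values[0] - slopes[0] * knots[0], slopes[0]
    weights = [(t, s1 - s0)
               for t, s0, s1 in zip(knots[1:], slopes, slopes[1:])]
    return lambda x: c0 + c1 * x + sum(w * relu(x - t) for t, w in weights)

knots = [0.0, 0.2, 0.5, 1.0]
values = [0.0, 1.0, 0.25, 2.0]
net = relu_net_from_spline(knots, values)
err = max(abs(net(x) - spline_eval(knots, values, x))
          for x in [0.0, 0.1, 0.2, 0.35, 0.5, 0.9, 1.0])
```

Training the NN then amounts to moving the knots `t_i` and weights, which is where the equidistribution-based loss discussed in the abstract comes in.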

09:45 - 10:30 |
Yen-Hsi Tsai: Efficient gradient descent algorithms for learning from multiscale data ↓ We will discuss a gradient descent based multiscale algorithm for minimizing loss functions arising from multiscale data distributions. (TCPL 201) |

10:30 - 11:00 | Coffee Break (TCPL Foyer) |

11:00 - 11:45 |
Michael Graham: Data-driven modeling of complex chaotic dynamics on invariant manifolds ↓ Fluid flows often exhibit chaotic or turbulent dynamics and require a large number of degrees of freedom for accurate simulation. Nevertheless, because of the fast damping of small scales by viscosity, these flows can in principle be characterized with a much smaller number of dimensions, as their long-time dynamics relax in state space to a finite-dimensional invariant manifold. We describe a data-driven reduced order modeling method, “Data-driven Manifold Dynamics” (DManD), that finds a nonlinear coordinate representation of the manifold using an autoencoder, then learns a system of ordinary differential equations for the dynamics on the manifold. Exploitation of symmetries substantially improves performance. We apply DManD to a range of systems including transitional turbulence, where we accurately represent the dynamics with 25 degrees of freedom, as compared to the 10^5 degrees of freedom of the direct simulation. We then use the model to efficiently train a reinforcement learning control policy that is highly effective in laminarizing the flow. We also introduce an autoencoder architecture that yields an explicit estimate of manifold dimension. DManD can be combined with a clustering algorithm to represent the invariant manifold as an atlas of overlapping local representations (charts). This approach, denoted CANDyMan (Charts and atlases for nonlinear data-driven dynamics on manifolds) enables minimal-dimensional representation of the dynamics and is particularly useful for systems with discrete symmetries or intermittent dynamics. (TCPL 201) |

11:55 - 13:20 |
Lunch ↓ |

13:30 - 14:15 |
Seth Taylor: A spatiotemporal discretization for diffeomorphism approximation ↓ We present a characteristic mapping method for the approximation of diffeomorphisms arising in fluid mechanics. The method utilizes a spatiotemporal discretization defined by a composition of sub-interval flows represented by spline interpolants. By leveraging the composite structure, exponentially fine scale fluid motions can be captured using only a linear increase in computational resources. We will explain how certain unique resolution properties of the method result from this discretization and the preservation of a relabelling symmetry for transported quantities. Numerical examples showcasing the ability to resolve the direct energy cascade at sub-grid scales and capture some associated inverse cascade phenomena for barotropic flows on the sphere will be given. This is joint work with Jean-Christophe Nave and Xi-Yuan Yin. (TCPL 201) |

14:15 - 15:00 |
James Jackaman: Limited area weather modelling: interpolating with neural networks ↓ In this talk we discuss the use of neural networks as operators to interpolate finite element functions between mesh resolutions. These operators aim to improve regional weather predictions by capturing underresolved features near the model boundary from coarse global information. (TCPL 201) |

15:00 - 15:30 | Coffee Break (TCPL Foyer) |

15:30 - 16:15 |
Daisuke Furihata: A particle method based on Voronoi decomposition for the Cahn–Hilliard equation ↓ We want to perform accurate and fast numerical calculations by machine learning the time-evolution operator of the Cahn–Hilliard equation. However, in the context of FDM and FEM, the amount of machine learning required becomes enormous. Therefore, we consider applying the Voronoi particle method and calculating particle behavior using machine learning. (TCPL 201) |

16:15 - 17:00 |
David Ketcheson: Explicit time discretizations that preserve dissipative or conservative energy dynamics ↓ Many systems modeled by differential equations possess a conserved energy or dissipated entropy functional, and preservation of this qualitative structure is essential to accurately capture their dynamics. Traditional methods that enforce such structure are usually implicit and expensive. I will describe a class of relatively cheap and explicit methods that can be used to accomplish this. I’ll also describe the extension to conserving multiple functionals and show several illustrative examples. (TCPL 201) |
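The kind of cheap explicit fix described in this abstract can be sketched with the relaxation idea (my own minimal example, not necessarily the talk's exact scheme): take an explicit RK4 step for the harmonic oscillator q' = p, p' = −q, then rescale the update by a factor γ chosen so the quadratic energy E = (q² + p²)/2 is conserved exactly; the step is then interpreted as advancing time by γh.

```python
def f(u):
    q, p = u
    return (p, -q)  # harmonic oscillator

def rk4_step(u, h):
    def axpy(a, v, w):
        return tuple(wi + a * vi for vi, wi in zip(v, w))
    k1 = f(u)
    k2 = f(axpy(0.5 * h, k1, u))
    k3 = f(axpy(0.5 * h, k2, u))
    k4 = f(axpy(h, k3, u))
    return tuple(ui + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for ui, a, b, c, d in zip(u, k1, k2, k3, k4))

def relaxed_step(u, h):
    d = tuple(v - ui for v, ui in zip(rk4_step(u, h), u))
    # choose the nontrivial root of |u + gamma*d|^2 = |u|^2 (quadratic energy)
    gamma = (-2.0 * sum(ui * di for ui, di in zip(u, d))
             / sum(di * di for di in d))
    return tuple(ui + gamma * di for ui, di in zip(u, d))

u = (1.0, 0.0)
E0 = 0.5 * (u[0] ** 2 + u[1] ** 2)
for _ in range(1000):
    u = relaxed_step(u, 0.1)
E = 0.5 * (u[0] ** 2 + u[1] ** 2)
```

Since RK4 already nearly conserves this energy, γ stays close to 1, so accuracy is retained while the invariant is enforced to roundoff.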

17:00 - 17:45 | Open Problems / Discussion (TCPL Foyer) |

17:30 - 19:30 |
Dinner ↓ |

Friday, February 23 | |
---|---|

07:00 - 08:45 |
Breakfast ↓ |

09:00 - 09:45 |
Kyriakos Flouris: Geometry aware neural operators for hemodynamics ↓ The standard approach for estimating hemodynamic parameters involves running CFD simulations on patient-specific models extracted from medical images. Personalization of these models can be performed by integrating further data from MRA and Flow MRI. While this technique can provide estimates of crucial parameters, such as wall shear stress (WSS) and its time- and space-averaged summaries, its implementation in clinical practice is hindered by the heavy computational load of CFD simulations and the manual interventions required to obtain accurate geometric models on which simulations can be computed. We aim to estimate hemodynamic parameters from flow and anatomical MRI, which can be routinely acquired in clinical practice. The flow information and the geometry can be combined in a computational mesh. Working directly on the wall, a geometry-aware graph convolutional neural network can be trained to predict the WSS given a computational domain and some flow information near the wall. However, the resolution of 4D MRI clinical data can be prohibitively low for capturing WSS accurately enough. An ideal model would faithfully upscale a Navier–Stokes solution from the anatomical and flow clinical MRI. We investigate how neural operators can be coupled to geometry-aware graph neural networks, and the potential of geometry-informed and novel covariant neural operators for predicting hemodynamic parameters from clinical data. (TCPL 201) |

09:45 - 10:30 |
Yolanne Lee: Learning PDEs from image data using invariant features ↓ Model discovery techniques which retrieve governing equations from data and a small amount of physical knowledge have emerged, including those based on genetic algorithms and symbolic regression, sparse regression, and, more recently, deep learning. However, many complex systems may be inherently hard to simulate or measure experimentally, which results in limited data. We consider examples where the governing model is frame-independent, i.e. invariant under translation, rotation, and reflection, since knowledge of such invariances can be used to learn more efficiently. We propose a set of invariant features inspired by a classical multi-scale image feature analysis approach, which can be implemented as a data pre-processing step, and investigate its impact on learning invariant models using PySR, SINDy, and neural ODEs. Using these invariant features is shown to improve the stability, learning time, and generalisability of the learned models. (TCPL 201) |
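A minimal numerical check of the premise (my own construction, not the paper's exact feature set): the centered-difference gradient magnitude |∇u| and Laplacian Δu of an image are unchanged by a 90-degree rotation of the frame, making them natural inputs for learning rotation-invariant models.

```python
def rot90(img):
    # rotate a square image: rot[i][j] = img[j][n-1-i]
    n = len(img)
    return [[img[j][n - 1 - i] for j in range(n)] for i in range(n)]

def features(img, i, j):
    # centered differences at an interior pixel (grid spacing 1)
    ux = 0.5 * (img[i][j + 1] - img[i][j - 1])
    uy = 0.5 * (img[i + 1][j] - img[i - 1][j])
    lap = (img[i][j + 1] + img[i][j - 1]
           + img[i + 1][j] + img[i - 1][j] - 4 * img[i][j])
    return ((ux * ux + uy * uy) ** 0.5, lap)

n = 7
img = [[(i - 3) ** 2 + 0.5 * (j - 3) ** 2 + 0.1 * i * j
        for j in range(n)] for i in range(n)]
rot = rot90(img)

# under rot90 above, pixel (a, b) of img lands at pixel (n-1-b, a) of rot,
# so the invariant features at corresponding pixels must agree
fa = features(img, 2, 4)
fb = features(rot, n - 1 - 4, 2)
```

The raw derivatives (ux, uy) are *not* invariant (they permute and change sign under the rotation); only their combinations above are, which is exactly the property the proposed pre-processing exploits.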

10:30 - 11:00 | Coffee Break (TCPL Foyer) |

10:30 - 11:00 |
Checkout by 11AM ↓ 5-day workshop participants are welcome to use BIRS facilities (TCPL) until 3 pm on Friday, although participants are still required to check out of their guest rooms by 11AM. (Front Desk - Professional Development Centre) |

11:00 - 11:45 |
Juntao Huang: Hyperbolic machine learning moment closures for kinetic equations ↓ In this talk, we present our work on hyperbolic machine learning (ML) moment closure models for kinetic equations. Most of the existing ML closure models are not able to guarantee the stability, which directly causes blow up in the long-time simulations. In our work, with carefully designed neural network architectures, the ML closure model can guarantee the stability (or hyperbolicity). Moreover, other mathematical properties, such as physical characteristic speeds, are also discussed. Extensive benchmark tests show the good accuracy, long-time stability, and good generalizability of our ML closure model. (Online) |

11:55 - 13:20 | Lunch (Vistas Dining Room) |