Thursday, October 28 |
09:00 - 09:45 |
Alex Cloninger: Learning with Optimal Transport ↓ Discriminating between distributions is an important problem in a number of scientific fields. This motivated the introduction of Linear Optimal Transportation (LOT), which has a number of benefits when it comes to speed of computation and to determining classification boundaries. We characterize a number of settings in which the LOT embeds families of distributions into a space in which they are linearly separable. This is true in arbitrary dimensions, and for families of distributions generated through a variety of actions on a fixed distribution. We also establish results on discrete spaces using Entropically Regularized Optimal Transport, and on active learning with a small number of labels in the space of LOT embeddings. This is joint work with Caroline Moosmueller (UCSD). (Zoom) |
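As an illustration of the kind of pipeline the abstract describes, here is a minimal sketch (not the speakers' code) of a linearized-OT embedding: each point-cloud distribution is mapped, via the barycentric projection of its optimal transport plan from a fixed reference measure, to a Euclidean vector, and a linear classifier is fit on those vectors. It assumes the POT (`ot`) and scikit-learn packages; the reference size and the toy families of shifted/scaled Gaussian clouds are illustrative choices.

```python
# Minimal sketch: linearized optimal transport (LOT) embedding of point-cloud
# distributions against a fixed reference, followed by a linear classifier.
import numpy as np
import ot                                     # POT: pip install pot
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_ref = 64
reference = rng.normal(size=(n_ref, 2))       # fixed reference measure (uniform weights)


def lot_embedding(samples, reference):
    """Barycentric projection of the OT plan from the reference to `samples`."""
    a = np.full(len(reference), 1.0 / len(reference))
    b = np.full(len(samples), 1.0 / len(samples))
    M = ot.dist(reference, samples)           # squared Euclidean ground cost
    plan = ot.emd(a, b, M)                    # exact optimal transport plan
    T = (plan @ samples) / a[:, None]         # estimated Monge map at reference points
    return T.ravel()                          # the (Euclidean) LOT embedding


# Toy families: shifted vs. scaled versions of a base Gaussian cloud.
X, y = [], []
for _ in range(50):
    base = rng.normal(size=(100, 2))
    X.append(lot_embedding(base + rng.uniform(1, 2), reference)); y.append(0)
    X.append(lot_embedding(base * rng.uniform(2, 3), reference)); y.append(1)

clf = LinearSVC().fit(np.array(X), y)         # linear separability in LOT space
print("training accuracy:", clf.score(np.array(X), y))
```

Because the LOT embedding turns each distribution into an ordinary vector, any off-the-shelf linear method can be used downstream.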
10:00 - 10:45 |
Ron Kimmel: On Geometry and Learning ↓ Geometry means understanding in the sense that it involves finding the most basic invariants or Ockham’s razor explanation for a given phenomenon. At the other end, modern Machine Learning has little to do with explanation or interpretation of solutions to a given problem.
I’ll try to give some examples of the relation between learning and geometry, focusing on learning geometry: starting with the most basic notion of planar shape invariants, moving on to efficient distance computation on surfaces, and finally treating surfaces as metric spaces within a deep learning framework. I will introduce some links between these two seemingly orthogonal philosophical directions. (Zoom) |
11:00 - 11:45 |
Pratik Chaudhari: Does the Data Induce Capacity Control in Deep Learning? ↓ Deep networks are mysterious. These over-parametrized machine learning models, trained with rudimentary optimization algorithms on non-convex landscapes in millions of dimensions, have defied attempts to put a sound theoretical footing beneath their impressive performance.
This talk will shed light upon some of these mysteries. The first part of this talk will employ ideas from thermodynamics and optimal transport to paint a picture of the training process of deep networks and unravel a number of peculiar properties of algorithms like stochastic gradient descent. The second part of the talk will argue that these peculiarities observed during training, as well as the anomalous generalization, may be coming from the data that we train upon. This part will discuss how typical datasets are “sloppy”, i.e., the data correlation matrix has a strong structure and consists of a large number of small eigenvalues that are distributed uniformly over an exponentially large range. This structure is completely mirrored in a trained deep network: a number of quantities, such as the Hessian, the Fisher Information Matrix, as well as activation correlations and Jacobians, are also sloppy. This talk will develop these concepts to demonstrate analytical, non-vacuous generalization bounds.
This talk will discuss work from the following two papers.
1. Rubing Yang, Jialin Mao, and Pratik Chaudhari. Does the data induce capacity control in deep learning? arXiv preprint, 2021. https://arxiv.org/abs/2110.14163
2. Pratik Chaudhari and Stefano Soatto. Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks. ICLR 2018. https://arxiv.org/abs/1710.11029 (Zoom) |
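A quick way to see the “sloppy” spectrum described above is to look at the eigenvalues of the data correlation matrix on a log scale. The snippet below is an illustration on synthetic data (it is not taken from the cited papers); the geometric decay of the per-coordinate scales is an assumption made purely to mimic the described structure.

```python
# Illustration (not from the cited papers): eigenvalue spectrum of a data
# correlation matrix for a synthetic "sloppy" dataset, whose eigenvalues span
# an exponentially large range roughly uniformly on a log scale.
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 100

# Per-coordinate scales decaying geometrically (assumption for illustration).
scales = np.logspace(0, -6, d)
X = rng.normal(size=(n, d)) * scales

C = (X.T @ X) / n                       # data correlation (second-moment) matrix
eigvals = np.linalg.eigvalsh(C)[::-1]   # eigenvalues, descending

print("largest / smallest eigenvalue:", eigvals[0] / eigvals[-1])
# Roughly equal spacing of the log-eigenvalues is the signature of "sloppiness".
print("first few log10 eigenvalue gaps:", np.diff(np.log10(eigvals))[:5])
```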
12:00 - 13:00 |
Lunch (Zoom/Gathertown) |
13:00 - 13:45 |
David Alvarez Melis: Principled Data Manipulation with Optimal Transport ↓ Success stories in machine learning seem to be ubiquitous, but they tend to be concentrated on ‘ideal’ scenarios where clean, homogeneous, labeled data are abundant. But machine learning in practice is rarely so ‘pristine’. In most real-life applications, clean data is typically scarce, collected from multiple heterogeneous sources, and often only partially labeled. Thus, successful application of ML in practice often requires substantial effort in terms of dataset preprocessing, such as augmenting, merging, mixing, or reducing datasets. In this talk I will present some recent work that seeks to formalize all these forms of dataset ‘manipulation’ under a unified approach based on the theory of Optimal Transport. Through applications in machine translation, transfer learning, and dataset shaping, I will show that besides enjoying sound theoretical footing, these approaches yield efficient, flexible, and high-performing algorithms. This talk is based on joint work with Tommi Jaakkola, Stefanie Jegelka, Nicolo Fusi, Youssef Mroueh, and Yair Schiff. (Zoom) |
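One concrete primitive behind such dataset manipulations is a notion of distance between datasets computed with optimal transport. The snippet below is a hedged sketch, not the speaker's method: it treats two feature sets as empirical distributions and compares them with entropy-regularized OT using the POT package; the synthetic datasets and the regularization value are illustrative.

```python
# Hedged sketch: comparing two (toy) datasets as empirical distributions
# with entropy-regularized optimal transport.
import numpy as np
import ot                                    # POT: pip install pot

rng = np.random.default_rng(0)
ds_a = rng.normal(loc=0.0, size=(200, 16))   # features of toy dataset A
ds_b = rng.normal(loc=0.5, size=(300, 16))   # features of toy dataset B

a = np.full(len(ds_a), 1.0 / len(ds_a))      # uniform weights on samples
b = np.full(len(ds_b), 1.0 / len(ds_b))
M = ot.dist(ds_a, ds_b)                      # squared Euclidean ground cost
M = M / M.max()                              # normalize cost for numerical stability

# Entropy-regularized OT cost: a scalable proxy for a dataset-to-dataset distance.
cost = ot.sinkhorn2(a, b, M, reg=5e-2)
print("entropic OT cost between the datasets:", float(cost))
```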
14:00 - 14:45 |
Elizabeth Gross: Learning phylogenetic networks using invariants ↓ Phylogenetic networks provide a means of describing the evolutionary history of sets of species believed to have undergone hybridization or gene flow during the course of their evolution. The mutation process for a set of such species can be modeled as a Markov process on a phylogenetic network. Previous work has shown that site-pattern probability distributions from a Jukes-Cantor phylogenetic network model must satisfy certain algebraic invariants. As a corollary, aspects of the phylogenetic network are theoretically identifiable from site-pattern frequencies. In practice, because of the probabilistic nature of sequence evolution, the phylogenetic network invariants will rarely be satisfied exactly, even for data generated under the model. Thus, using network invariants for inferring phylogenetic networks requires some means of interpreting the residuals, or deviations from zero, when observed site-pattern frequencies are substituted into the invariants. In this work, we propose a machine learning algorithm utilizing invariants to infer small, level-one phylogenetic networks. Given a data set, the algorithm is trained on model data to learn the patterns of residuals corresponding to different network structures and to classify the network that produced the data. This is joint work with Travis Barton, Colby Long, and Joseph Rusinko. (Zoom) |
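To make the pipeline in the abstract concrete, here is a hedged sketch of the residual-based classification idea: simulate site-pattern frequencies under each candidate network class, evaluate invariant polynomials at those frequencies, and train a classifier on the resulting residual vectors. The `evaluate_invariants` and `simulate_frequencies` functions below are hypothetical placeholders; the actual Jukes-Cantor network invariants and level-one network classes from the paper are not reproduced here.

```python
# Hedged sketch of the residual-based classification pipeline described above.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)


def evaluate_invariants(freqs):
    """Placeholder: evaluate a fixed list of polynomials at the observed
    site-pattern frequencies and return the vector of residuals."""
    return np.array([freqs[0] * freqs[3] - freqs[1] * freqs[2],
                     freqs.sum() - 1.0])


def simulate_frequencies(network_class, n_sites=1000):
    """Placeholder simulator: site-pattern counts whose expected frequencies
    depend on the (toy) network class."""
    p = np.array([0.4, 0.3, 0.2, 0.1]) if network_class == 0 else np.full(4, 0.25)
    return rng.multinomial(n_sites, p) / n_sites


# Train on residual vectors from simulated data, labeled by network class...
X, y = [], []
for cls in (0, 1):
    for _ in range(500):
        X.append(evaluate_invariants(simulate_frequencies(cls))); y.append(cls)
clf = RandomForestClassifier(n_estimators=100).fit(np.array(X), y)

# ...then classify the network that produced an observed frequency vector.
observed = simulate_frequencies(network_class=1)
print("predicted network class:", clf.predict([evaluate_invariants(observed)])[0])
```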
15:00 - 15:45 |
Break ↓ Freedom. (Zoom) |
15:45 - 16:30 |
Soledad Villar: Equivariant machine learning structured like classical physics ↓ There has been enormous progress in the last few years in designing neural networks that respect the fundamental symmetries and coordinate freedoms of physical law. Some of these frameworks make use of irreducible representations, some make use of high-order tensor objects, and some apply symmetry-enforcing constraints. Different physical laws obey different combinations of fundamental symmetries, but a large fraction (possibly all) of classical physics is equivariant to translation, rotation, reflection (parity), boost (relativity), and permutations. In this talk we show that it is simple to parameterize universally approximating polynomial functions that are equivariant under these symmetries, or under the Euclidean, Lorentz, and Poincare groups, at any dimensionality d. The key observation is that nonlinear O(d)-equivariant (and related-group-equivariant) functions can be expressed in terms of a lightweight collection of scalars: scalar products and scalar contractions of the scalar, vector, and tensor inputs. Our numerical results show that our approach can be used to design simple equivariant deep learning models for classical physics with good scaling. (Zoom) |
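The scalar-based construction mentioned in the abstract can be illustrated in a few lines: any function that combines the input vectors with coefficients computed only from their pairwise inner products is automatically O(d)-equivariant. The sketch below (illustrative, not the authors' code) checks this numerically with a stand-in nonlinearity in place of a learned network.

```python
# Illustrative sketch: an O(d)-equivariant vector function built only from
# pairwise dot products (invariant scalars) used as coefficients on the inputs.
import numpy as np

rng = np.random.default_rng(0)


def equivariant_fn(vectors, weights):
    """vectors: (n, d) inputs; weights: any map from the (n, n) Gram matrix
    to n coefficients. The output vector is O(d)-equivariant."""
    gram = vectors @ vectors.T               # invariant scalars <v_i, v_j>
    coeffs = weights(gram)                   # arbitrary nonlinearity of scalars
    return coeffs @ vectors                  # equivariant combination of inputs


# A toy nonlinearity acting on the Gram matrix (stand-in for a learned MLP).
weights = lambda g: np.tanh(g).sum(axis=1)

# Check equivariance under a random orthogonal matrix Q: f(vQ) == f(v)Q.
v = rng.normal(size=(5, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
lhs = equivariant_fn(v @ Q, weights)
rhs = equivariant_fn(v, weights) @ Q
print("max equivariance error:", np.abs(lhs - rhs).max())
```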
16:30 - 17:30 |
Ilke Demir: Panel; AI & Industry, with Dr. Juan Carlos Catana, Dr. Ilke Demir, Dr. David Alvarez Melis ↓ A conversation with several practitioners and researchers about their roles in AI & Industry, with Ilke Demir (Intel), Juan Carlos Catana (HP Labs Mx), and David Alvarez Melis (Microsoft Research). (Zoom) |