Thursday, May 4 |
07:30 - 09:00 |
Breakfast (Restaurant at your assigned hotel) |
09:00 - 10:00 |
Adam Oberman: PDE approach to regularization in deep learning ↓ Deep neural networks have achieved significant success in a number of challenging engineering problems.
There is consensus in the community that some form of smoothing of the loss function is needed, and there have been hundreds of papers and many conferences in the past three years on this topic. However, so far there has been little analysis by mathematicians.
The fundamental tool in training deep neural networks is Stochastic Gradient Descent (SGD) applied to the ``loss'' function, f(x), which is high dimensional and nonconvex.
dx_t = −∇f(x_t) dt + dW_t      (SGD)
There is a consensus in the field that some form of regularization of the loss function is needed, but so far there has been little progress. This may be in part because smoothing techniques, such as convolution, which are useful in low dimensions, are computationally intractable in the high dimensional setting.
Two recent algorithms have shown promise in this direction. The first, \cite{zhang2015deep}, uses a mean field approach to perform SGD in parallel. The second, \cite{chaudhari2016entropy}, replaced f in (SGD) with f_γ(x), the \emph{local entropy} of f, which is defined using notions from statistical physics \cite{baldassi2016unreasonable}.
We interpret both algorithms as replacing f with f_γ, where f_γ(x) = u(x,γ) and u is the solution of the viscous Hamilton-Jacobi PDE
u_t(x,t) = −½ |∇u(x,t)|² + β⁻¹ Δu(x,t)
along with u(x,0)=f(x). This interpretation leads to theoretical validation for empirical results.
However, what is needed for (SGD) is ∇f_γ(x). Remarkably, for short times, this vector can be computed efficiently by solving an auxiliary \emph{convex optimization} problem, which has much better convergence properties than the original non-convex problem. Tools from optimal transportation \cite{santambrogio2016euclidean} are used to justify the fast convergence of the solution of the auxiliary problem.
In practice, this algorithm has significantly improved the training time (speed of convergence) for Deep Networks in high dimensions. The algorithm can also be applied to nonconvex minimization problems where (SGD) is used. (Conference Room San Felipe) |
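For reference, the viscous Hamilton-Jacobi equation above linearizes under a Cole-Hopf transformation; the short calculation below is standard and is stated for the normalization used in this abstract (constants may differ from those in the cited papers):
\[
  v := e^{-\beta u/2} \;\Longrightarrow\; v_t = \beta^{-1}\Delta v, \qquad v(x,0) = e^{-\beta f(x)/2},
\]
so that
\[
  f_\gamma(x) = u(x,\gamma) = -\tfrac{2}{\beta}\,\log\big(G_{2\gamma/\beta} * e^{-\beta f/2}\big)(x),
  \qquad G_\sigma(z) = (2\pi\sigma)^{-d/2} e^{-|z|^2/(2\sigma)},
\]
and consequently
\[
  \nabla f_\gamma(x) = \frac{x - \langle y\rangle}{\gamma},
  \qquad
  \langle y\rangle = \frac{\int y\, G_{2\gamma/\beta}(x-y)\, e^{-\beta f(y)/2}\, dy}{\int G_{2\gamma/\beta}(x-y)\, e^{-\beta f(y)/2}\, dy},
\]
i.e. the gradient of the regularized loss is a local average around x, which is one way to see why it can be estimated without carrying out a high-dimensional convolution explicitly.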
10:00 - 10:30 |
Rémi Flamary: Joint distribution optimal transportation for domain adaptation ↓ This paper deals with the unsupervised domain adaptation problem, where one wants to estimate a prediction function f in a given target domain without any labeled sample by exploiting the knowledge available from a source domain where labels are known. Our work makes the following assumption: there exists a non-linear transformation between the joint feature/label space distributions of the two domains p_s and p_t. We propose a solution to this problem with optimal transport, which allows us to recover an estimated target distribution p_t^f = (X, f(X)) by optimizing simultaneously the optimal coupling and f. We show that our method corresponds to the minimization of a generalization bound, and provide an efficient algorithmic solution, for which convergence is proved. The versatility of our approach, both in terms of the class of hypotheses and of loss functions, is demonstrated with real world classification and regression problems, for which we reach or surpass state-of-the-art results.
Joint work with Nicolas Courty, Amaury Habrard, and Alain Rakotomamonjy (Conference Room San Felipe) |
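A minimal numerical sketch of the alternating scheme described above (not the authors' implementation): it assumes uniform weights, equal sample sizes, a linear hypothesis class and a squared loss, purely for illustration, so that the exact coupling reduces to a linear assignment problem.

import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n, d = 50, 3
Xs = rng.normal(size=(n, d))                       # labeled source samples
ys = Xs @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
Xt = Xs + 1.0                                      # shifted, unlabeled target samples

w = np.zeros(d)                                    # target predictor f(x) = x @ w (illustrative)
alpha = 1.0                                        # weight of the feature part of the joint cost
for _ in range(20):
    # (i) optimal coupling for the joint cost  alpha*||x_s - x_t||^2 + (y_s - f(x_t))^2
    C = (alpha * ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)
         + (ys[:, None] - (Xt @ w)[None, :]) ** 2)
    rows, cols = linear_sum_assignment(C)          # uniform marginals -> permutation coupling
    # (ii) refit f on the matched pairs: transport source labels onto target points
    w, *_ = np.linalg.lstsq(Xt[cols], ys[rows], rcond=None)

print("estimated target predictor:", w)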
10:30 - 11:00 |
Coffee Break (Conference Room San Felipe) |
11:00 - 11:30 |
Jianbo Ye: New numerical tools for optimal transport and their machine learning applications ↓ In this talk, I plan to introduce my recent work on two numerical
tools that solve optimal transport and its related problems. The first
tool is a Bregman ADMM approach to solve optimal transport and
Wasserstein barycenter. Based on my empirical experience, I will
discuss its pros and cons in practice, and compare it with the popular
entropic regularization approach. The second tool is a simulated
annealing approach to solving Wasserstein minimization problems, in
which I will illustrate that there exists a simple Markov chain
underpinning the dual OT problem. This approach gives a very different
approximate solution compared to other smoothing techniques. I will
also discuss how this approach relates to some more recent problems
in machine learning, such as Wasserstein NMF and out-of-sample mapping
estimation. Finally, I will present several applications in document
analysis, sequence modeling, and image analytics, using the tools I
have developed during my PhD research. (Conference Room San Felipe) |
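For reference, the entropic regularization approach mentioned above is the standard Sinkhorn iteration; the short sketch below (with illustrative parameter values) is that baseline, not the Bregman ADMM or simulated annealing solvers discussed in the talk.

import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropy-regularized OT plan between histograms a and b for cost matrix C."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                 # rescale to match the second marginal
        u = a / (K @ v)                   # rescale to match the first marginal
    return u[:, None] * K * v[None, :]    # plan P = diag(u) K diag(v)

# tiny example: two Gaussian-like histograms on a 1-D grid
x = np.linspace(0.0, 1.0, 40)
a = np.exp(-(x - 0.3) ** 2 / 0.01); a /= a.sum()
b = np.exp(-(x - 0.7) ** 2 / 0.01); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2
P = sinkhorn(a, b, C)
print("regularized transport cost:", float((P * C).sum()))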
11:30 - 12:30 |
Jun Kitagawa: On the multi-marginal optimal partial transport and partial barycenter problems ↓ Relatively recently, there has been much activity on two particular generalizations of the classical two-marginal optimal transport problem. The first is the partial transport problem, where the total mass of the two distributions to be coupled may not match, and one is forced to choose submeasures of the constraints for coupling. The other generalization is the multi-marginal transport problem, where there are 3 or more probability distributions to be coupled together in some optimal manner. By combining the above two generalizations we obtain a natural extension: the multi-marginal optimal partial transport problem. In joint work with Brendan Pass (University of Alberta), we have obtained uniqueness of solutions (under hypotheses analogous to the two-marginal partial transport problem given by Figalli) by relating the problem to what we deem the “partial barycenter problem” for finite measures. Interestingly enough, we also observe some significantly different behavior of solutions compared to the two marginal case. (Conference Room San Felipe) |
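For orientation, the two-marginal partial transport problem referred to above can be written (in one standard formulation, recorded here only to fix notation; the talk concerns its multi-marginal extension) as
\[
  \min_{\gamma \ge 0} \int c(x,y)\, d\gamma(x,y)
  \quad \text{subject to} \quad
  \pi^1_\# \gamma \le \mu, \;\; \pi^2_\# \gamma \le \nu, \;\; \gamma(\mathbb{R}^d \times \mathbb{R}^d) = m,
\]
where m \le \min(\mu(\mathbb{R}^d), \nu(\mathbb{R}^d)) prescribes how much mass is transported; when the total masses of \mu and \nu differ, one is forced to couple submeasures of the constraints, exactly as described in the abstract.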
12:30 - 13:00 |
Christoph Brune: Combined modelling of optimal transport and segmentation ↓ For studying vascular structures in 4D biomedical imaging, it is of great importance to automatically determine the velocity of flow in video sequences, for example blood flow in vessel networks. In this work, new optimal transport models focusing on direction and segmentation are investigated to find an accurate displacement between two density distributions. By incorporating fluid dynamics constraints, one can obtain a realistic description of the displacement. With an a-priori given segmentation of the network structure, transport models can be improved. However, a segmentation is not always known beforehand. Therefore, in this work a joint segmentation-optimal transport model has been described. Other contributions are the ability of the model to allow for inflow or outflow and the incorporation of anisotropy in the displacement cost. For the problem, a convex variational method has been used and primal-dual proximal splitting algorithms have been implemented. Existence of a solution of the model has been proved. The framework has been applied to synthetic vascular structures and real data, obtained from a collaboration with the hospital in Cambridge. This is joint work with Yoeri Boink and Carola Schönlieb. (Conference Room San Felipe) |
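One standard way to impose fluid dynamics constraints in optimal transport is the dynamic (Benamou-Brenier) formulation, recalled here only as background for the abstract above (the joint segmentation model of the talk adds further terms):
\[
  W_2^2(\rho_0, \rho_1) = \min_{\rho, v} \int_0^1\!\!\int \rho(x,t)\,|v(x,t)|^2\, dx\, dt
  \quad \text{subject to} \quad
  \partial_t \rho + \nabla\!\cdot(\rho v) = 0, \;\; \rho(\cdot,0) = \rho_0, \;\; \rho(\cdot,1) = \rho_1,
\]
and allowing inflow or outflow, as in the abstract, corresponds to relaxing the continuity equation with a source term.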
13:00 - 13:30 |
Peyman Mohajerin Esfahani: Data-driven Optimization Using the Wasserstein Metric: Performance Guarantees and Tractable Reformulations ↓ We consider stochastic programs where the distribution of the uncertain parameters is only observable through a finite training dataset. Using the Wasserstein metric, we construct a ball in the space of probability distributions centered at the uniform distribution on the training samples, and we seek decisions that perform best in view of the worst-case distribution within this Wasserstein ball. The state-of-the-art methods for solving the resulting distributionally robust optimization (DRO) problems rely on global optimization techniques, which quickly become computationally excruciating. In this talk we demonstrate that, under mild assumptions, the DRO problems over Wasserstein balls can in fact be reformulated as finite convex programs---in many interesting cases even as tractable linear programs. We further discuss performance guarantees as well as connection to the celebrated regularization techniques in the Machine Learning literature. This talk is based on a joint work with Daniel Kuhn (EPFL). (Conference Room San Felipe) |
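In standard notation (included only for orientation), the decision problem described above reads: with \widehat{P}_N the uniform empirical distribution of the N training samples, loss h and Wasserstein radius \varepsilon,
\[
  \min_{x \in X} \; \sup_{Q:\, W(Q, \widehat{P}_N) \le \varepsilon} \; \mathbb{E}_{\xi \sim Q}\big[ h(x, \xi) \big],
\]
and the point of the talk is that, under mild assumptions, the inner worst-case expectation can be reformulated so that the whole problem becomes a finite convex, and in many interesting cases linear, program.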
13:30 - 15:00 |
Lunch (Restaurant Hotel Hacienda Los Laureles) |
15:00 - 16:00 |
Robert McCann: On Concavity of the Monopolist's Problem Facing Consumers with Nonlinear Price Preferences ↓ The principal-agent problem is an important paradigm in economic theory
for studying the value of private information; the nonlinear pricing problem
faced by a monopolist is a particular example. In this lecture, we identify
structural conditions on the consumers' preferences and the monopolist's
profit functions which guarantee either concavity or convexity of the monopolist's
profit maximization. Uniqueness and stability of the solution are particular consequences
of this concavity. Our conditions are similar to (but simpler than) criteria given by Trudinger and others
for prescribed Jacobian equations to have smooth solutions.
By allowing for different dimensions of agents and contracts, nonlinear dependence of agent preferences on prices,
and of the monopolist's profits on agent identities, we improve on the literature in a number of ways.
The same mathematics can also be adapted to the maximization of societal welfare by a regulated monopoly.
This is joint work with PhD student Shuangjian Zhang. (Conference Room San Felipe) |
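In the classical quasilinear special case (stated here only for orientation; the talk allows nonlinear dependence of the agents' preferences on prices, as well as different dimensions of agents and contracts), the monopolist's problem reads
\[
  \max_{p(\cdot)} \; \int_X \big( p(y(x)) - c(y(x)) \big)\, d\mu(x),
  \qquad
  y(x) \in \arg\max_{y \in Y} \; b(x,y) - p(y),
\]
where \mu is the distribution of agent types x, b(x,y) is the benefit agent x derives from contract y, c is the production cost and p the price menu chosen by the monopolist; the concavity and convexity conditions of the talk concern this outer maximization in its nonlinear-preference generalization.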
16:00 - 16:30 |
Coffee Break (Conference Room San Felipe) |
19:00 - 21:00 |
Dinner (Restaurant Hotel Hacienda Los Laureles) |