Monday, March 10 |
07:00 - 08:45 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
08:45 - 09:00 |
Introduction and Welcome by BIRS Staff ↓ A brief introduction to BIRS with important logistical information, technology instruction, and opportunity for participants to ask questions. (TCPL 201) |
09:00 - 10:00 |
Panel Discussion: Challenges & Opportunities in VI (TCPL 201) |
10:00 - 10:30 |
Coffee Break (TCPL Foyer) |
10:30 - 11:00 |
Tamara Broderick: Black Box Variational Inference with a Deterministic Objective: Faster, More Accurate, and Even More Black Box ↓ Automatic differentiation variational inference (ADVI) offers fast and easy-to-use posterior approximation in multiple modern probabilistic programming languages. However, its stochastic optimizer lacks clear convergence criteria and requires tuning parameters. Moreover, ADVI inherits the poor posterior uncertainty estimates of mean-field variational Bayes (MFVB). We introduce "deterministic ADVI" (DADVI) to address these issues. DADVI replaces the intractable MFVB objective with a fixed Monte Carlo approximation, a technique known in the stochastic optimization literature as the "sample average approximation" (SAA). By optimizing an approximate but deterministic objective, DADVI can use off-the-shelf second-order optimization, and, unlike standard mean-field ADVI, is amenable to more accurate posterior covariances via linear response (LR). In contrast to existing worst-case theory, we show that, on certain classes of common statistical problems, DADVI and the SAA can perform well with relatively few samples even in very high dimensions, though we also show that such favorable results cannot extend to variational approximations that are too expressive relative to mean-field ADVI. We show on a variety of real-world problems that DADVI reliably finds good solutions with default settings (unlike ADVI) and, together with LR covariances, is typically faster and more accurate than standard ADVI. (Online) |
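A minimal sketch of the sample average approximation (SAA) idea behind DADVI, assuming a toy standard-normal target: the Monte Carlo draws are fixed once, so the objective is deterministic and an off-the-shelf second-order optimizer with a standard convergence test applies. Dimensions, names, and the optimizer choice are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
D, S = 5, 30                      # latent dimension, number of Monte Carlo draws
Z = rng.standard_normal((S, D))   # draws are fixed once: the SAA trick

def log_p(theta):                 # toy target: standard normal log-density
    return -0.5 * np.sum(theta**2, axis=-1)

def neg_elbo_saa(params):
    mu, log_sig = params[:D], params[D:]
    theta = mu + np.exp(log_sig) * Z          # reparameterized samples, same Z every call
    entropy = np.sum(log_sig)                 # mean-field Gaussian entropy, up to a constant
    return -(np.mean(log_p(theta)) + entropy)

# deterministic objective -> a generic off-the-shelf optimizer applies directly
res = minimize(neg_elbo_saa, np.zeros(2 * D), method="L-BFGS-B")
mu_hat, sig_hat = res.x[:D], np.exp(res.x[D:])
```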
11:00 - 11:15 |
Diana Cai: Batch and match: score-based approaches to black-box variational inference ↓ Probabilistic modeling is a cornerstone of modern data analysis, uncertainty quantification, and decision making. A key challenge of probabilistic inference is computing a target distribution of interest, which is often intractable. Variational inference (VI) is a popular method that has enabled scalable probabilistic inference across a range of applications in science and engineering. Black-box variational inference (BBVI) algorithms, in which the target needs only to be available as the log of an unnormalized distribution that is typically differentiable, are becoming a major focus of VI due to their ease of application. Moreover, thanks to advances in automatic differentiation, BBVI algorithms are now widely available in popular probabilistic programming languages, providing automated VI algorithms to data analysis practitioners. But gradient-based methods can be plagued by high-variance gradients and sensitivity to hyperparameters of the learning algorithms; these issues are further exacerbated when using richer variational families that model the correlations between different latent variables.
In this work, we present “batch and match” (BaM), an alternative approach to BBVI that uses a score-based divergence. Notably, this score-based divergence can be optimized by a closed-form proximal update for Gaussian variational families with full covariance matrices. We analyze the convergence of BaM when the target distribution is Gaussian, and we prove that in the limit of infinite batch size the variational parameter updates converge exponentially quickly to the target mean and covariance. We also evaluate the performance of BaM on Gaussian and non-Gaussian target distributions that arise from posterior inference in hierarchical and deep generative models. In these experiments, we find that BaM typically converges in fewer (and sometimes significantly fewer) gradient evaluations than leading implementations of BBVI based on ELBO maximization. Finally, we conclude with a discussion of extensions of score-based BBVI to high-dimensional settings, large data settings, and non-Gaussian variational families based on orthogonal function expansions.
This is joint work with David M. Blei (Columbia University), Robert M. Gower (Flatiron Institute), Charles C. Margossian (Flatiron Institute), Chirag Modi (New York University), Loucas Pillaud-Vivien (Ecole des Ponts ParisTech), and Lawrence K. Saul (Flatiron Institute). (Online) |
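A hedged sketch of the score-based divergence BaM optimizes, for a full-covariance Gaussian variational family on a toy Gaussian target: it minimizes a Monte Carlo estimate of E_q||∇log p − ∇log q||² with a generic optimizer, as a stand-in for (not a reproduction of) BaM's closed-form proximal update.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
D, S = 3, 200
A = rng.standard_normal((D, D))
P = A @ A.T + np.eye(D)                    # toy Gaussian target N(0, P^{-1})
eps = rng.standard_normal((S, D))          # fixed draws keep the objective deterministic

def score_p(z):                            # grad_z log p(z) = -P z for the toy target
    return -z @ P

def score_divergence(params):              # Monte Carlo E_q || score_p - score_q ||^2
    m, tril = params[:D], params[D:]
    L = np.zeros((D, D))
    L[np.tril_indices(D)] = tril
    L[np.diag_indices(D)] = np.exp(L[np.diag_indices(D)])   # positive diagonal
    z = m + eps @ L.T                      # samples from q = N(m, L L^T)
    score_q = -np.linalg.solve(L @ L.T, (z - m).T).T        # grad_z log q(z)
    return np.mean(np.sum((score_p(z) - score_q) ** 2, axis=1))

init = np.zeros(D + D * (D + 1) // 2)      # q starts at N(0, I)
res = minimize(score_divergence, init, method="L-BFGS-B")
```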
11:15 - 11:30 |
Juraj Marusic: Black-Box VI estimates the gradient by sampling. Is it differentially private by default? ↓ There has been an increasing body of work on privatizing variational inference methods. These methods typically rely on adding external noise to the optimization procedures to mask the influence of any single individual's data, similar to noisy stochastic gradient descent. At the same time, these large-scale optimization techniques already use random sampling when estimating the gradients of the ELBO. Here, we study the extent of privacy preserved by this inherent random sampling during the Black-Box Variational Inference procedure and investigate whether the injection of additional noise is necessary to ensure differential privacy. (TCPL 201) |
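For context, a toy illustration of the inherent randomness the talk studies: a reparameterized, minibatch estimate of the ELBO gradient, with both data subsampling and sampling from q. The model and names are assumptions for illustration; no privacy mechanism is implemented.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, B, S = 1000, 2, 50, 8                 # data size, dim, minibatch size, MC samples
X = rng.standard_normal((N, D))             # toy data for x_i ~ N(theta, I), theta ~ N(0, I)

def grad_log_joint(theta, xb):              # subsampled grad of log p(x, theta) w.r.t. theta
    return (N / len(xb)) * np.sum(xb - theta, axis=0) - theta

def elbo_grad_mu(mu, log_sig):
    idx = rng.choice(N, size=B, replace=False)    # randomness source 1: data subsampling
    eps = rng.standard_normal((S, D))             # randomness source 2: sampling from q
    thetas = mu + np.exp(log_sig) * eps
    # the entropy of q does not depend on mu, so this is the full ELBO gradient in mu
    return np.mean([grad_log_joint(t, X[idx]) for t in thetas], axis=0)

g = elbo_grad_mu(np.zeros(D), np.zeros(D))
```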
11:30 - 13:00 |
Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
13:00 - 13:30 |
Camila de Souza: Clustering Functional Data via Variational Inference ↓ Among different functional data analyses, clustering analysis aims to determine underlying groups of curves in the dataset when there is no information on the group membership of each curve. In this work, we develop a novel variational Bayes (VB) algorithm for clustering and smoothing functional data simultaneously via a B-spline regression mixture model with random intercepts. We employ the deviance information criterion to select the best number of clusters. The proposed VB algorithm is evaluated and compared with other methods (k-means, functional k-means and two other model-based methods) via a simulation study under various scenarios. We apply our proposed methodology to two publicly available datasets. We demonstrate that the proposed VB algorithm achieves satisfactory clustering performance in both simulation and real data analyses. (TCPL 201) |
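A stand-in sketch of the two ingredients, assuming scipy and scikit-learn are available: represent each curve by B-spline basis coefficients, then cluster the coefficient vectors. The authors' method is a joint VB mixture with random intercepts; the simple projection-plus-Gaussian-mixture pipeline below only illustrates the idea.

```python
import numpy as np
from scipy.interpolate import BSpline
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
n_curves, n_obs, k, n_basis = 60, 50, 3, 8
x = np.linspace(0, 1, n_obs)
t = np.concatenate([[0] * k, np.linspace(0, 1, n_basis - k + 1), [1] * k])  # clamped knots
Phi = np.column_stack([BSpline(t, np.eye(n_basis)[j], k)(x) for j in range(n_basis)])

truth = rng.integers(0, 2, n_curves)        # two toy groups of curves plus noise
Y = (np.sin(2 * np.pi * x)[None] * (truth == 0)[:, None]
     + np.cos(2 * np.pi * x)[None] * (truth == 1)[:, None]
     + 0.1 * rng.standard_normal((n_curves, n_obs)))

coefs = Y @ Phi @ np.linalg.inv(Phi.T @ Phi)          # least-squares basis coefficients
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(coefs)
```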
13:30 - 13:45 |
Chengqian Xian: Fast Variational Bayesian Inference for Survival Data Using Log-Logistic Accelerated Failure Time Models ↓ Survival analysis plays a crucial role in various scientific and medical research areas, with the log-logistic accelerated failure time (AFT) model being a widely used approach for modeling survival data. While frequentist and Markov Chain Monte Carlo (MCMC)-based Bayesian inference methods have been extensively developed for such models, they often come with computational challenges. In this work, we propose an efficient mean-field variational Bayes (VB) algorithm for parameter estimation in log-logistic AFT models. Our VB approach incorporates a piecewise approximation technique to achieve conjugacy, enabling computationally efficient inference.
We extend our methodology to correlated survival data, which frequently arises in clinical applications. We focus on clustered survival data, where patients within the same unit exhibit similar characteristics, leading to intra-cluster dependence. To accommodate this structure, we introduce a shared frailty log-logistic AFT model with a cluster-specific random intercept. We evaluate our proposed VB algorithm through extensive simulations under various settings, comparing its performance with frequentist, h-likelihood, and MCMC-based methods. Our results demonstrate that the VB algorithm consistently achieves competitive accuracy while significantly reducing computational cost, achieving an average speedup of up to 300 times compared to MCMC.
We illustrate the practical benefits of our method through an application to invasive mechanical ventilation duration data collected from intensive care units (ICUs) in Ontario, Canada. Our analysis reveals the impact of ICU site-level effects on ventilation duration while leveraging the computational efficiency of VB for rapid inference. The proposed framework offers a scalable and effective alternative for analyzing both independent and correlated survival data, making it particularly valuable in large-scale biomedical applications. (TCPL 201) |
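A hedged sketch of the log-logistic AFT likelihood with right censoring that the VB algorithm targets, fitted here by generic maximum likelihood rather than the authors' conjugate VB updates; the data and names are synthetic.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, p = 300, 2
X = np.column_stack([np.ones(n), rng.standard_normal(n)])     # intercept + one covariate
beta_true, sigma_true = np.array([1.0, 0.5]), 0.4
T = np.exp(X @ beta_true + sigma_true * rng.logistic(size=n)) # log-logistic AFT times
C = rng.exponential(scale=2 * np.median(T), size=n)           # right-censoring times
t_obs, delta = np.minimum(T, C), (T <= C).astype(float)

def neg_loglik(params):
    beta, sigma = params[:p], np.exp(params[p])
    z = (np.log(t_obs) - X @ beta) / sigma
    log_S = -np.logaddexp(0.0, z)                             # log survival = -log(1 + e^z)
    log_f = z + 2.0 * log_S - np.log(sigma) - np.log(t_obs)   # log density
    return -np.sum(delta * log_f + (1 - delta) * log_S)

res = minimize(neg_loglik, np.zeros(p + 1), method="L-BFGS-B")
```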
13:45 - 14:00 |
Ana Carolina da Cruz: Variational Bayes for Basis Function Selection for Functional Data Representation with Correlated Errors ↓ Functional data analysis (FDA) has found extensive application across various fields, driven by the increasing recording of data continuously over a time interval or at several discrete points. FDA provides the statistical tools specifically designed for handling such data. Over the past decade, Variational Bayes (VB) algorithms have gained popularity in FDA, primarily due to their speed advantages over MCMC methods. This work proposes a VB algorithm for basis function selection for functional data representation while allowing for a complex error covariance structure. We assess and compare the effectiveness of our proposed VB algorithm with MCMC via simulations. We also apply our approach to a publicly available dataset. Our results show accurate coefficient estimation and the efficacy of our VB algorithm in finding the true set of basis functions. Notably, our proposed VB algorithm demonstrates performance comparable to MCMC but with substantially reduced computational cost. (TCPL 201) |
14:00 - 14:20 |
Group Photo ↓ Meet in foyer of TCPL to participate in the BIRS group photo. The photograph will be taken outdoors, so dress appropriately for the weather. Please don't be late, or you might not be in the official group photo! (TCPL Foyer) |
14:30 - 14:45 |
Shrijita Bhattacharya: Variational Inference Aided Variable Selection For Spatially Structured High Dimensional Covariates ↓ We consider the problem of Bayesian high dimensional variable selection under linear regression when a spatial structure exists among the covariates. We use an Ising prior to model the structural connectivity of the covariates with an undirected graph and the connectivity strength with Ising distribution parameters. Ising models, which originated in statistical physics, are widely used in computer vision and spatial data modeling. Although a Gibbs solution to this problem exists, it involves the computation of determinants and inverses of high dimensional matrices, rendering it unscalable to higher dimensions. Further, the lack of theoretical support limits this important tool's use for the broader community. This paper proposes a variational inference-aided Gibbs approach that enjoys the same variable recovery power as the standard Gibbs solution while being computationally scalable to higher dimensions. We establish strong selection consistency of our proposed approach and demonstrate its competitive numerical performance under varying simulation scenarios. (TCPL 201) |
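A minimal sketch of how an Ising prior over inclusion indicators gamma_j in {0,1}, with log p(gamma) proportional to a·sum_j gamma_j + b·sum over edges of gamma_j·gamma_k, enters mean-field coordinate-ascent updates. The regression likelihood is omitted for brevity, and the graph, parameters a and b, and update schedule are illustrative assumptions, not the paper's VI-aided Gibbs scheme.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field_ising(adj, a, b, n_iter=100):
    # coordinate-ascent fixed point: q_j = sigmoid(a + b * sum_{k ~ j} q_k)
    q = np.full(adj.shape[0], 0.5)        # q_j approximates P(gamma_j = 1)
    for _ in range(n_iter):
        for j in range(len(q)):
            q[j] = sigmoid(a + b * adj[j] @ q)
    return q

# chain graph over 10 covariates: neighbors encourage joint inclusion when b > 0
adj = np.diag(np.ones(9), 1) + np.diag(np.ones(9), -1)
print(mean_field_ising(adj, a=-1.0, b=2.0))
```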
14:45 - 15:00 |
Gonzalo Mena: Approximate Bayesian methods for pseudo-time inference based on relaxed permutations ↓ The problem of pseudotime inference can be cast as an instance of the classical ordination or seriation problem, which in turn is an example of inference in a factor model. In modern applications, such as the study and modeling of Alzheimer’s disease (AD) progression from comprehensive neuropathological measurements (which we study here), such factor models are extremely complex, as they require the inference of factor-specific non-linearities, deviating significantly from the original formulation.
We show that it is possible to recast this problem as the simultaneous inference of a latent ordering (i.e., a permutation) and the non-parametric inference of factor-specific increasing functions. In the Bayesian setting, this formulation implies the immediate computational challenge of posterior inference over a permutation-based distribution, a notoriously hard problem. We surmount this challenge by implementing an approximate sampler that relaxes this distribution to the continuum. The relaxed distribution is described in terms of the so-called Sinkhorn algorithm, can be implemented seamlessly in the usual probabilistic programming pipelines, and can benefit from GPU acceleration.
We show that our method and associated computational pipeline lead to better modeling (and improved fit) of the relevant SEA-AD cohort dataset. (TCPL 201) |
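A minimal sketch of the Sinkhorn relaxation referenced above: alternately normalizing the rows and columns of exp(X/τ) produces a doubly stochastic matrix that softly approximates a permutation as τ → 0. Purely illustrative; the temperature and iteration count are arbitrary choices.

```python
import numpy as np

def sinkhorn(X, tau=0.1, n_iter=50):
    P = np.exp(X / tau)
    for _ in range(n_iter):
        P /= P.sum(axis=1, keepdims=True)   # normalize rows
        P /= P.sum(axis=0, keepdims=True)   # normalize columns
    return P

rng = np.random.default_rng(5)
P = sinkhorn(rng.standard_normal((4, 4)))   # rows and columns each sum to ~1
```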
15:00 - 15:30 |
Coffee Break (TCPL Foyer) |
15:30 - 15:45 |
Lydia Gabirc: Addressing Antidiscrimination with Variational Inference ↓ Within the insurance industry, evolving antidiscrimination regulations have led insurers to exclude protected information from pricing calculations. With the rise of complex algorithmic methods and big data, insurers face the dual challenge of complying with regulatory standards while maintaining reasonable estimates. Many researchers have proposed methods to achieve discrimination-free pricing and various fairness measures, but these techniques often require protected information at the individual level, which is often inaccessible. In this work, we propose to study indirect discrimination via a hierarchical finite mixture model with latent protected variables. The hierarchical structure of the model is represented through the prior specification, and the posterior distribution is used to impute the unobserved sensitive variable through the indirect discriminatory dependence structure. To tackle the indirect discrimination present in the true posterior, we propose the use of variational inference with a mean-field variational family that enforces independence between unknown variables, eliminating indirect discrimination by definition. A further importance sampling step is incorporated to achieve insurance unbiasedness with the optimized discrimination-free distribution. Our method is supported by a simulation study inspired by real insurance data. (TCPL 201) |
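A toy sketch of the final importance-sampling correction step: reweight draws from a fitted mean-field marginal q toward a target density known up to a constant, using self-normalized weights. All densities and names below are placeholders, not the proposed insurance model.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
q_mu, q_sd = 0.8, 0.5                          # fitted mean-field marginal (placeholder)
z = rng.normal(q_mu, q_sd, size=5000)          # draws from q

log_w = norm.logpdf(z, loc=1.0, scale=1.0) - norm.logpdf(z, loc=q_mu, scale=q_sd)
w = np.exp(log_w - np.max(log_w))
w /= w.sum()                                   # self-normalized importance weights

corrected_mean = np.sum(w * z)                 # estimate of E_p[z] under the target p
```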
16:00 - 17:30 |
Break-Out Sessions (Other (See Description)) |
17:30 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
19:30 - 20:00 |
Daniel Andrade: Stabilizing training of affine coupling layers for high-dimensional variational inference ↓ Variational inference with normalizing flows is an increasingly popular alternative to MCMC methods. In particular, normalizing flows based on affine coupling layers (Real NVPs) are frequently used due to their good empirical performance. In theory, increasing the depth of normalizing flows should lead to more accurate posterior approximations. However, in practice, training deep normalizing flows for approximating high-dimensional posterior distributions is often infeasible due to the high variance of the stochastic gradients.
In this talk, I will first recap variational inference and Real NVPs, and then present several well-known and lesser-known or new techniques to achieve stable training of Real NVPs. In particular, we propose a combination of two methods: (1) soft-thresholding of the scale in Real NVPs, and (2) a bijective soft log transformation of the samples.
Finally, I will present our evaluation of these and other previously proposed modifications on several challenging target distributions, including a high-dimensional horseshoe logistic regression model. Our experiments show that, with our modifications, stable training of Real NVPs for posteriors with several thousand dimensions and heavy tails is possible, allowing for more accurate marginal likelihood estimation via importance sampling. (Online) |
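Under one plausible reading of the two devices named in the talk, a numpy sketch: (1) a smooth clamp ("soft-threshold") on the coupling layer's log-scale, and (2) a bijective soft-log map that compresses heavy tails. Neither is claimed to match the authors' exact parameterization.

```python
import numpy as np

def soft_threshold_scale(s_raw, cap=2.0):
    # smooth clamp of the coupling layer's log-scale into (-cap, cap)
    return cap * np.tanh(s_raw / cap)

def coupling_forward(x1, x2, s_raw, t):
    # affine coupling layer with the stabilized scale; x1 passes through unchanged
    return x1, x2 * np.exp(soft_threshold_scale(s_raw)) + t

def soft_log(x):
    # bijection on R: identity-like near 0, logarithmic in the tails
    return np.sign(x) * np.log1p(np.abs(x))

def soft_log_inverse(y):
    return np.sign(y) * np.expm1(np.abs(y))

x = np.array([-50.0, -1.0, 0.0, 1.0, 50.0])
assert np.allclose(soft_log_inverse(soft_log(x)), x)
```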
20:00 - 20:15 |
Yongshan Jin: Low-Rank Parameterization for Efficient Bayesian Neural Networks ↓ This talk introduces a computationally efficient approach to Bayesian Neural Networks using low-rank weight parameterization. By decomposing weights as MAP estimates plus structured low-rank updates, we drastically reduce parameter count while preserving uncertainty quantification quality. Our experiments demonstrate that rank-1 and rank-2 models significantly lower computational costs while maintaining reasonable performance, with higher ranks offering improved uncertainty estimates at modest additional expense. (Online) |
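A minimal sketch of the low-rank parameterization described, W = W_MAP + U Vᵀ, with a mean-field Gaussian posterior over the factors U and V; shapes, scales, and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
d_out, d_in, rank = 64, 32, 2
W_map = 0.1 * rng.standard_normal((d_out, d_in))   # stand-in for pretrained MAP weights

# mean-field Gaussian variational parameters for the low-rank factors
U_mu, V_mu = np.zeros((d_out, rank)), np.zeros((d_in, rank))
U_sd, V_sd = 0.05 * np.ones((d_out, rank)), 0.05 * np.ones((d_in, rank))

def sample_weight():
    U = U_mu + U_sd * rng.standard_normal((d_out, rank))
    V = V_mu + V_sd * rng.standard_normal((d_in, rank))
    return W_map + U @ V.T      # 2*(d_out + d_in)*rank extra parameters vs d_out*d_in

preds = [sample_weight() @ np.ones(d_in) for _ in range(10)]   # predictive samples
```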