Monday, March 10 |
07:00 - 08:45 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
08:45 - 09:00 |
Introduction and Welcome by BIRS Staff ↓ A brief introduction to BIRS with important logistical information, technology instruction, and opportunity for participants to ask questions. (TCPL 201) |
09:00 - 10:00 |
Panel Discussion: Challenges & Opportunities in VI (TCPL 201) |
10:00 - 10:30 |
Coffee Break (TCPL Foyer) |
10:30 - 11:00 |
Tamara Broderick: Black Box Variational Inference with a Deterministic Objective: Faster, More Accurate, and Even More Black Box ↓ Automatic differentiation variational inference (ADVI) offers fast and easy-to-use posterior approximation in multiple modern probabilistic programming languages. However, its stochastic optimizer lacks clear convergence criteria and requires tuning parameters. Moreover, ADVI inherits the poor posterior uncertainty estimates of mean-field variational Bayes (MFVB). We introduce "deterministic ADVI" (DADVI) to address these issues. DADVI replaces the intractable MFVB objective with a fixed Monte Carlo approximation, a technique known in the stochastic optimization literature as the "sample average approximation" (SAA). By optimizing an approximate but deterministic objective, DADVI can use off-the-shelf second-order optimization, and, unlike standard mean-field ADVI, is amenable to more accurate posterior covariances via linear response (LR). In contrast to existing worst-case theory, we show that, on certain classes of common statistical problems, DADVI and the SAA can perform well with relatively few samples even in very high dimensions, though we also show that such favorable results cannot extend to variational approximations that are too expressive relative to mean-field ADVI. We show on a variety of real-world problems that DADVI reliably finds good solutions with default settings (unlike ADVI) and, together with LR covariances, is typically faster and more accurate than standard ADVI. (Online) |
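A minimal sketch of the sample average approximation (SAA) idea behind DADVI, assuming a toy standard-normal target: the Monte Carlo draws are fixed once, so the objective is deterministic and an off-the-shelf second-order optimizer with a standard convergence test applies. Dimensions, names, and the optimizer choice are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
D, S = 5, 30                      # latent dimension, number of Monte Carlo draws
Z = rng.standard_normal((S, D))   # draws are fixed once: the SAA trick

def log_p(theta):                 # toy target: standard normal log-density
    return -0.5 * np.sum(theta**2, axis=-1)

def neg_elbo_saa(params):
    mu, log_sig = params[:D], params[D:]
    theta = mu + np.exp(log_sig) * Z          # reparameterized samples, same Z every call
    entropy = np.sum(log_sig)                 # mean-field Gaussian entropy, up to a constant
    return -(np.mean(log_p(theta)) + entropy)

# deterministic objective -> a generic off-the-shelf optimizer applies directly
res = minimize(neg_elbo_saa, np.zeros(2 * D), method="L-BFGS-B")
mu_hat, sig_hat = res.x[:D], np.exp(res.x[D:])
```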
11:00 - 11:15 |
Diana Cai: Batch and match: score-based approaches to black-box variational inference ↓ Probabilistic modeling is a cornerstone of modern data analysis, uncertainty quantification, and decision making. A key challenge of probabilistic inference is computing a target distribution of interest, which is often intractable. Variational inference (VI) is a popular method that has enabled scalable probabilistic inference across a range of applications in science and engineering. Black-box variational inference (BBVI) algorithms, in which the target needs only to be available as the log of an unnormalized distribution that is typically differentiable, are becoming a major focus of VI due to their ease of application. Moreover, thanks to advances in automatic differentiation, BBVI algorithms are now widely available in popular probabilistic programming languages, providing automated VI algorithms to data analysis practitioners. But gradient-based methods can be plagued by high-variance gradients and sensitivity to hyperparameters of the learning algorithms; these issues are further exacerbated when using richer variational families that model the correlations between different latent variables.
In this work, we present “batch and match” (BaM), an alternative approach to BBVI that uses a score-based divergence. Notably, this score-based divergence can be optimized by a closed-form proximal update for Gaussian variational families with full covariance matrices. We analyze the convergence of BaM when the target distribution is Gaussian, and we prove that in the limit of infinite batch size the variational parameter updates converge exponentially quickly to the target mean and covariance. We also evaluate the performance of BaM on Gaussian and non-Gaussian target distributions that arise from posterior inference in hierarchical and deep generative models. In these experiments, we find that BaM typically converges in fewer (and sometimes significantly fewer) gradient evaluations than leading implementations of BBVI based on ELBO maximization. Finally, we conclude with a discussion of extensions of score-based BBVI to high-dimensional settings, large data settings, and non-Gaussian variational families based on orthogonal function expansions.
This is joint work with David M. Blei (Columbia University), Robert M. Gower (Flatiron Institute), Charles C. Margossian (Flatiron Institute), Chirag Modi (New York University), Loucas Pillaud-Vivien (Ecole des Ponts ParisTech), and Lawrence K. Saul (Flatiron Institute). (Online) |
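A hedged sketch of the score-based divergence BaM optimizes, for a full-covariance Gaussian variational family on a toy Gaussian target: it minimizes a Monte Carlo estimate of E_q||∇log p − ∇log q||² with a generic optimizer, as a stand-in for (not a reproduction of) BaM's closed-form proximal update.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
D, S = 3, 200
A = rng.standard_normal((D, D))
P = A @ A.T + np.eye(D)                    # toy Gaussian target N(0, P^{-1})
eps = rng.standard_normal((S, D))          # fixed draws keep the objective deterministic

def score_p(z):                            # grad_z log p(z) = -P z for the toy target
    return -z @ P

def score_divergence(params):              # Monte Carlo E_q || score_p - score_q ||^2
    m, tril = params[:D], params[D:]
    L = np.zeros((D, D))
    L[np.tril_indices(D)] = tril
    L[np.diag_indices(D)] = np.exp(L[np.diag_indices(D)])   # positive diagonal
    z = m + eps @ L.T                      # samples from q = N(m, L L^T)
    score_q = -np.linalg.solve(L @ L.T, (z - m).T).T        # grad_z log q(z)
    return np.mean(np.sum((score_p(z) - score_q) ** 2, axis=1))

init = np.zeros(D + D * (D + 1) // 2)      # q starts at N(0, I)
res = minimize(score_divergence, init, method="L-BFGS-B")
```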
11:15 - 11:30 |
Juraj Marusic: Black-Box VI estimates the gradient by sampling. Is it differentially private by default? ↓ There has been an increasing body of work on privatizing variational inference methods. These methods typically rely on adding external noise to the optimization procedures to mask the influence of any single individual's data, similar to noisy stochastic gradient descent. At the same time, these large-scale optimization techniques already use random sampling when estimating the gradients of the ELBO. Here, we study the extent of privacy preserved by this inherent random sampling during the Black-Box Variational Inference procedure and investigate whether the injection of additional noise is necessary to ensure differential privacy. (TCPL 201) |
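For context, a toy illustration of the inherent randomness the talk studies: a reparameterized, minibatch estimate of the ELBO gradient, with both data subsampling and sampling from q. The model and names are assumptions for illustration; no privacy mechanism is implemented.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, B, S = 1000, 2, 50, 8                 # data size, dim, minibatch size, MC samples
X = rng.standard_normal((N, D))             # toy data for x_i ~ N(theta, I), theta ~ N(0, I)

def grad_log_joint(theta, xb):              # subsampled grad of log p(x, theta) w.r.t. theta
    return (N / len(xb)) * np.sum(xb - theta, axis=0) - theta

def elbo_grad_mu(mu, log_sig):
    idx = rng.choice(N, size=B, replace=False)    # randomness source 1: data subsampling
    eps = rng.standard_normal((S, D))             # randomness source 2: sampling from q
    thetas = mu + np.exp(log_sig) * eps
    # the entropy of q does not depend on mu, so this is the full ELBO gradient in mu
    return np.mean([grad_log_joint(t, X[idx]) for t in thetas], axis=0)

g = elbo_grad_mu(np.zeros(D), np.zeros(D))
```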
11:30 - 13:00 |
Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
13:00 - 13:30 |
Camila de Souza: Clustering Functional Data via Variational Inference ↓ Among different functional data analyses, clustering analysis aims to determine underlying groups of curves in the dataset when there is no information on the group membership of each curve. In this work, we develop a novel variational Bayes (VB) algorithm for clustering and smoothing functional data simultaneously via a B-spline regression mixture model with random intercepts. We employ the deviance information criterion to select the best number of clusters. The proposed VB algorithm is evaluated and compared with other methods (k-means, functional k-means and two other model-based methods) via a simulation study under various scenarios. We apply our proposed methodology to two publicly available datasets. We demonstrate that the proposed VB algorithm achieves satisfactory clustering performance in both simulation and real data analyses. (TCPL 201) |
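A stand-in sketch of the two ingredients, assuming scipy and scikit-learn are available: represent each curve by B-spline basis coefficients, then cluster the coefficient vectors. The authors' method is a joint VB mixture with random intercepts; the simple projection-plus-Gaussian-mixture pipeline below only illustrates the idea.

```python
import numpy as np
from scipy.interpolate import BSpline
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
n_curves, n_obs, k, n_basis = 60, 50, 3, 8
x = np.linspace(0, 1, n_obs)
t = np.concatenate([[0] * k, np.linspace(0, 1, n_basis - k + 1), [1] * k])  # clamped knots
Phi = np.column_stack([BSpline(t, np.eye(n_basis)[j], k)(x) for j in range(n_basis)])

truth = rng.integers(0, 2, n_curves)        # two toy groups of curves plus noise
Y = (np.sin(2 * np.pi * x)[None] * (truth == 0)[:, None]
     + np.cos(2 * np.pi * x)[None] * (truth == 1)[:, None]
     + 0.1 * rng.standard_normal((n_curves, n_obs)))

coefs = Y @ Phi @ np.linalg.inv(Phi.T @ Phi)          # least-squares basis coefficients
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(coefs)
```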
13:30 - 13:45 |
Chengqian Xian: Fast Variational Bayesian Inference for Survival Data Using Log-Logistic Accelerated Failure Time Models ↓ Survival analysis plays a crucial role in various scientific and medical research areas, with the log-logistic accelerated failure time (AFT) model being a widely used approach for modeling survival data. While frequentist and Markov Chain Monte Carlo (MCMC)-based Bayesian inference methods have been extensively developed for such models, they often come with computational challenges. In this work, we propose an efficient mean-field variational Bayes (VB) algorithm for parameter estimation in log-logistic AFT models. Our VB approach incorporates a piecewise approximation technique to achieve conjugacy, enabling computationally efficient inference.
We extend our methodology to correlated survival data, which frequently arises in clinical applications. We focus on clustered survival data, where patients within the same unit exhibit similar characteristics, leading to intra-cluster dependence. To accommodate this structure, we introduce a shared frailty log-logistic AFT model with a cluster-specific random intercept. We evaluate our proposed VB algorithm through extensive simulations under various settings, comparing its performance with frequentist, h-likelihood, and MCMC-based methods. Our results demonstrate that the VB algorithm consistently achieves competitive accuracy while significantly reducing computational cost, achieving an average speedup of up to 300 times compared to MCMC.
We illustrate the practical benefits of our method through an application to invasive mechanical ventilation duration data collected from intensive care units (ICUs) in Ontario, Canada. Our analysis reveals the impact of ICU site-level effects on ventilation duration while leveraging the computational efficiency of VB for rapid inference. The proposed framework offers a scalable and effective alternative for analyzing both independent and correlated survival data, making it particularly valuable in large-scale biomedical applications. (TCPL 201) |
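A hedged sketch of the log-logistic AFT likelihood with right censoring that the VB algorithm targets, fitted here by generic maximum likelihood rather than the authors' conjugate VB updates; the data and names are synthetic.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, p = 300, 2
X = np.column_stack([np.ones(n), rng.standard_normal(n)])     # intercept + one covariate
beta_true, sigma_true = np.array([1.0, 0.5]), 0.4
T = np.exp(X @ beta_true + sigma_true * rng.logistic(size=n)) # log-logistic AFT times
C = rng.exponential(scale=2 * np.median(T), size=n)           # right-censoring times
t_obs, delta = np.minimum(T, C), (T <= C).astype(float)

def neg_loglik(params):
    beta, sigma = params[:p], np.exp(params[p])
    z = (np.log(t_obs) - X @ beta) / sigma
    log_S = -np.logaddexp(0.0, z)                             # log survival = -log(1 + e^z)
    log_f = z + 2.0 * log_S - np.log(sigma) - np.log(t_obs)   # log density
    return -np.sum(delta * log_f + (1 - delta) * log_S)

res = minimize(neg_loglik, np.zeros(p + 1), method="L-BFGS-B")
```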
13:45 - 14:00 |
Ana Carolina da Cruz: Variational Bayes for Basis Function Selection for Functional Data Representation with Correlated Errors ↓ Functional data analysis (FDA) has found extensive application across various fields, driven by the increasing recording of data continuously over a time interval or at several discrete points. FDA provides the statistical tools specifically designed for handling such data. Over the past decade, Variational Bayes (VB) algorithms have gained popularity in FDA, primarily due to their speed advantages over MCMC methods. This work proposes a VB algorithm for basis function selection for functional data representation while allowing for a complex error covariance structure. We assess and compare the effectiveness of our proposed VB algorithm with MCMC via simulations. We also apply our approach to a publicly available dataset. Our results show accurate coefficient estimation and the efficacy of our VB algorithm in finding the true set of basis functions. Notably, our proposed VB algorithm demonstrates performance comparable to MCMC but with substantially reduced computational cost. (TCPL 201) |
14:00 - 14:20 |
Group Photo ↓ Meet in foyer of TCPL to participate in the BIRS group photo. The photograph will be taken outdoors, so dress appropriately for the weather. Please don't be late, or you might not be in the official group photo! (TCPL Foyer) |
14:30 - 14:45 |
Shrijita Bhattacharya: Variational Inference Aided Variable Selection For Spatially Structured High Dimensional Covariates ↓ We consider the problem of Bayesian high dimensional variable selection under linear regression when a spatial structure exists among the covariates. We use an Ising prior to model the structural connectivity of the covariates with an undirected graph and the connectivity strength with Ising distribution parameters. Ising models, which originated in statistical physics, are widely used in computer vision and spatial data modeling. Although a Gibbs solution to this problem exists, it involves the computation of determinants and inverses of high dimensional matrices, rendering it unscalable to higher dimensions. Further, the lack of theoretical support limits this important tool's use for the broader community. This paper proposes a variational inference-aided Gibbs approach that enjoys the same variable recovery power as the standard Gibbs solution while being computationally scalable to higher dimensions. We establish strong selection consistency of our proposed approach and demonstrate its competitive numerical performance under varying simulation scenarios. (TCPL 201) |
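A minimal sketch of how an Ising prior over inclusion indicators gamma_j in {0,1}, with log p(gamma) proportional to a·sum_j gamma_j + b·sum over edges of gamma_j·gamma_k, enters mean-field coordinate-ascent updates. The regression likelihood is omitted for brevity, and the graph, parameters a and b, and update schedule are illustrative assumptions, not the paper's VI-aided Gibbs scheme.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field_ising(adj, a, b, n_iter=100):
    # coordinate-ascent fixed point: q_j = sigmoid(a + b * sum_{k ~ j} q_k)
    q = np.full(adj.shape[0], 0.5)        # q_j approximates P(gamma_j = 1)
    for _ in range(n_iter):
        for j in range(len(q)):
            q[j] = sigmoid(a + b * adj[j] @ q)
    return q

# chain graph over 10 covariates: neighbors encourage joint inclusion when b > 0
adj = np.diag(np.ones(9), 1) + np.diag(np.ones(9), -1)
print(mean_field_ising(adj, a=-1.0, b=2.0))
```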
14:45 - 15:00 |
Gonzalo Mena: Approximate Bayesian methods for pseudo-time inference based on relaxed permutations ↓ The problem of pseudotime inference can be cast as an instance of the classical ordination or seriation problem, which in turn is an example of inference in a factor model. In modern applications, such as the study and modeling of Alzheimer’s disease (AD) progression from comprehensive neuropathological measurements (which we study here), such factor models are extremely complex, as they require the inference of factor-specific non-linearities, deviating significantly from the original formulation.
We show that it is possible to recast this problem as the simultaneous inference of a latent ordering (i.e., a permutation) and the non-parametric inference of factor-specific increasing functions. In the Bayesian setting, this formulation implies the immediate computational challenge of posterior inference over a permutation-based distribution, a notoriously hard problem. We surmount this challenge by implementing an approximate sampler that relaxes this distribution to the continuum. The relaxed distribution is described in terms of the so-called Sinkhorn algorithm, can be implemented seamlessly in the usual probabilistic programming pipelines, and can benefit from GPU acceleration.
We show that our method and associated computational pipeline lead to better modeling (and improved fit) of the relevant SEA-AD cohort dataset. (TCPL 201) |
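A minimal sketch of the Sinkhorn relaxation referenced above: alternately normalizing the rows and columns of exp(X/τ) produces a doubly stochastic matrix that softly approximates a permutation as τ → 0. Purely illustrative; the temperature and iteration count are arbitrary choices.

```python
import numpy as np

def sinkhorn(X, tau=0.1, n_iter=50):
    P = np.exp(X / tau)
    for _ in range(n_iter):
        P /= P.sum(axis=1, keepdims=True)   # normalize rows
        P /= P.sum(axis=0, keepdims=True)   # normalize columns
    return P

rng = np.random.default_rng(5)
P = sinkhorn(rng.standard_normal((4, 4)))   # rows and columns each sum to ~1
```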
15:00 - 15:30 |
Coffee Break (TCPL Foyer) |
15:30 - 15:45 |
Lydia Gabirc: Addressing Antidiscrimination with Variational Inference ↓ Within the insurance industry, evolving antidiscrimination regulations have led insurers to exclude protected information from pricing calculations. With the rise of complex algorithmic methods and big data, insurers face the dual challenge of complying with regulatory standards while maintaining reasonable estimates. Many researchers have proposed methods to achieve discrimination-free pricing and various fairness measures, but these techniques often require protected information at the individual level, which is often inaccessible. In this work, we propose to study indirect discrimination via a hierarchical finite mixture model with latent protected variables. The hierarchical structure of the model is represented through the prior specification, and the posterior distribution is used to impute the unobserved sensitive variable through the indirect discriminatory dependence structure. To tackle the indirect discrimination present in the true posterior, we propose the use of variational inference with a mean-field variational family that enforces independence between unknown variables, eliminating indirect discrimination by definition. A further importance sampling step is incorporated to achieve insurance unbiasedness with the optimized discrimination-free distribution. Our method is supported by a simulation study inspired by real insurance data. (TCPL 201) |
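A toy sketch of the final importance-sampling correction step: reweight draws from a fitted mean-field marginal q toward a target density known up to a constant, using self-normalized weights. All densities and names below are placeholders, not the proposed insurance model.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
q_mu, q_sd = 0.8, 0.5                          # fitted mean-field marginal (placeholder)
z = rng.normal(q_mu, q_sd, size=5000)          # draws from q

log_w = norm.logpdf(z, loc=1.0, scale=1.0) - norm.logpdf(z, loc=q_mu, scale=q_sd)
w = np.exp(log_w - np.max(log_w))
w /= w.sum()                                   # self-normalized importance weights

corrected_mean = np.sum(w * z)                 # estimate of E_p[z] under the target p
```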
16:00 - 17:30 |
Break-Out Sessions (Other (See Description)) |
17:30 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
19:30 - 20:00 |
Daniel Andrade: Stabilizing training of affine coupling layers for high-dimensional variational inference ↓ Variational inference with normalizing flows is an increasingly popular alternative to MCMC methods. In particular, normalizing flows based on affine coupling layers (Real NVPs) are frequently used due to their good empirical performance. In theory, increasing the depth of normalizing flows should lead to more accurate posterior approximations. However, in practice, training deep normalizing flows for approximating high-dimensional posterior distributions is often infeasible due to the high variance of the stochastic gradients.
In this talk, I will first recap variational inference and Real NVPs, and then present several well-known and lesser-known or new techniques to achieve stable training of Real NVPs. In particular, we propose a combination of two methods: (1) soft-thresholding of the scale in Real NVPs, and (2) a bijective soft log transformation of the samples.
Finally, I will present our evaluation of these and other previously proposed modifications on several challenging target distributions, including a high-dimensional horseshoe logistic regression model. Our experiments show that, with our modifications, stable training of Real NVPs for posteriors with several thousand dimensions and heavy tails is possible, allowing for more accurate marginal likelihood estimation via importance sampling. (Online) |
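Under one plausible reading of the two devices named in the talk, a numpy sketch: (1) a smooth clamp ("soft-threshold") on the coupling layer's log-scale, and (2) a bijective soft-log map that compresses heavy tails. Neither is claimed to match the authors' exact parameterization.

```python
import numpy as np

def soft_threshold_scale(s_raw, cap=2.0):
    # smooth clamp of the coupling layer's log-scale into (-cap, cap)
    return cap * np.tanh(s_raw / cap)

def coupling_forward(x1, x2, s_raw, t):
    # affine coupling layer with the stabilized scale; x1 passes through unchanged
    return x1, x2 * np.exp(soft_threshold_scale(s_raw)) + t

def soft_log(x):
    # bijection on R: identity-like near 0, logarithmic in the tails
    return np.sign(x) * np.log1p(np.abs(x))

def soft_log_inverse(y):
    return np.sign(y) * np.expm1(np.abs(y))

x = np.array([-50.0, -1.0, 0.0, 1.0, 50.0])
assert np.allclose(soft_log_inverse(soft_log(x)), x)
```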
20:00 - 20:15 |
Yongshan Jin: Low-Rank Parameterization for Efficient Bayesian Neural Networks ↓ This talk introduces a computationally efficient approach to Bayesian Neural Networks using low-rank weight parameterization. By decomposing weights as MAP estimates plus structured low-rank updates, we drastically reduce parameter count while preserving uncertainty quantification quality. Our experiments demonstrate that rank-1 and rank-2 models significantly lower computational costs while maintaining reasonable performance, with higher ranks offering improved uncertainty estimates at modest additional expense. (Online) |
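A minimal sketch of the low-rank parameterization described, W = W_MAP + U Vᵀ, with a mean-field Gaussian posterior over the factors U and V; shapes, scales, and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
d_out, d_in, rank = 64, 32, 2
W_map = 0.1 * rng.standard_normal((d_out, d_in))   # stand-in for pretrained MAP weights

# mean-field Gaussian variational parameters for the low-rank factors
U_mu, V_mu = np.zeros((d_out, rank)), np.zeros((d_in, rank))
U_sd, V_sd = 0.05 * np.ones((d_out, rank)), 0.05 * np.ones((d_in, rank))

def sample_weight():
    U = U_mu + U_sd * rng.standard_normal((d_out, rank))
    V = V_mu + V_sd * rng.standard_normal((d_in, rank))
    return W_map + U @ V.T      # 2*(d_out + d_in)*rank extra parameters vs d_out*d_in

preds = [sample_weight() @ np.ones(d_in) for _ in range(10)]   # predictive samples
```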