Thursday, December 14 |
07:00 - 08:30 |
Breakfast ↓ Breakfast is served daily between 7:00 and 9:00 am in the Xianghu Lake National Tourist Resort (Dining Hall - Yuxianghu Hotel(御湘湖酒店餐厅)) |
09:00 - 09:30 |
Marten Wegkamp: Discriminant analysis in high-dimensional Gaussian mixtures ↓ We consider binary classification of high-dimensional features under a postulated model with a low-dimensional latent Gaussian mixture structure and non-vanishing noise. We propose a computationally efficient classifier that takes certain principal components (PCs) of the observed features as projections, with the number of retained PCs selected in a data-driven way. We derive explicit rates of convergence of the excess risk of the proposed PC-based classifier and prove that the obtained rates are optimal, up to a logarithmic factor, in the minimax sense. In the second part of the talk, we retain all PCs to estimate the direction of the optimal separating hyperplane. The estimated hyperplane is shown to interpolate on the training data. While the direction vector can be consistently estimated, as might be expected from recent results in linear regression, a naive plug-in estimate of the intercept fails to be consistent. A simple correction, which requires an independent hold-out sample, renders the procedure consistent and even minimax optimal in many scenarios. The interpolation property of the latter procedure can be retained, but, surprisingly, depends on how the labels are encoded. (Zoom (Online)) |
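A minimal sketch of the PC-projection idea in this abstract, with simulated data and a toy validation split standing in for the talk's data-driven rule for selecting the number of retained PCs (all names and constants below are illustrative):

```python
# Illustrative PC-based classifier for a latent Gaussian mixture:
# project onto the top-k principal components, then fit LDA on the scores.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p, r = 400, 200, 3                       # samples, ambient dim, latent dim
A = rng.normal(size=(p, r))                 # loading matrix
y = rng.integers(0, 2, size=n)              # binary labels
Z = np.where(y[:, None] == 1, 1.5, -1.5) + rng.normal(size=(n, r))
X = Z @ A.T + rng.normal(size=(n, p))       # observed features + noise

Xtr, Xval, ytr, yval = train_test_split(X, y, random_state=0)
best_k, best_acc = 1, 0.0
for k in range(1, 11):                      # crude data-driven choice of k
    pca = PCA(n_components=k).fit(Xtr)
    clf = LinearDiscriminantAnalysis().fit(pca.transform(Xtr), ytr)
    acc = clf.score(pca.transform(Xval), yval)
    if acc > best_acc:
        best_k, best_acc = k, acc
print(f"selected k = {best_k}, validation accuracy = {best_acc:.3f}")
```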
09:30 - 10:00 |
Yuxin Tao: Homogeneity pursuit in ranking inferences based on pairwise comparison data ↓ The Bradley-Terry-Luce (BTL) model is one of the most celebrated models for ranking inference based on pairwise comparison data; it associates individuals with latent preference scores and produces ranks. An important question that arises is uncertainty quantification for the ranks. It is natural to regard the ranks of two individuals as untrustworthy if there is only a subtle difference in their preference scores. In this paper, we explore the homogeneity of scores in the BTL model, which assumes that individuals cluster into groups with the same preference scores. We introduce the clustering algorithm in regression via data-driven segmentation (CARDS) penalty into the likelihood function, which can rigorously and effectively separate parameters and uncover the group structure. Statistical properties of two versions of CARDS are analyzed. As a result, we achieve a faster convergence rate and sharper confidence intervals for the maximum likelihood estimation (MLE) of the preference scores, providing insight into the power of exploiting low-dimensional structure in a high-dimensional setting. Real data, including NBA basketball rankings and Netflix movie rankings, are analyzed and demonstrate the improved prediction performance and interpretability of our method. (Lecture Hall - Academic island(定山院士岛报告厅)) |
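For reference, the unpenalized BTL likelihood can be maximized with a simple minorization-maximization iteration (Hunter, 2004); the sketch below fits this plain MLE on simulated comparisons, without the CARDS penalty the talk adds:

```python
# Bradley-Terry MLE via Hunter's MM iteration on simulated pairwise data.
import numpy as np

rng = np.random.default_rng(1)
m, n_games = 6, 200
p_true = np.exp(np.linspace(-1.0, 1.0, m))  # true preference scores

wins = np.zeros((m, m))                     # wins[i, j]: times i beats j
for i in range(m):
    for j in range(i + 1, m):
        wij = rng.binomial(n_games, p_true[i] / (p_true[i] + p_true[j]))
        wins[i, j], wins[j, i] = wij, n_games - wij
N = wins + wins.T                           # comparisons per pair (0 on diag)

p = np.ones(m)
for _ in range(200):                        # MM update: guaranteed ascent
    p = wins.sum(axis=1) / (N / (p[:, None] + p[None, :])).sum(axis=1)
    p /= p.sum()                            # fix the scale (identifiability)
print("estimated ranking (best first):", np.argsort(-p))
```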
10:00 - 10:30 |
Emre Demirkaya: Optimal nonparametric inference with two-scale distributional nearest neighbors ↓ The weighted nearest neighbors (WNN) estimator has been popularly used as a flexible and easy-to-implement nonparametric tool for mean regression estimation. The bagging technique is an elegant way to form WNN estimators with weights automatically assigned to the nearest neighbors (Steele, 2009; Biau et al., 2010); we call the resulting estimator the distributional nearest neighbors (DNN) estimator for easy reference. Yet, distributional results for this estimator have been lacking, limiting its application to statistical inference. Moreover, when the mean regression function has higher-order smoothness, the DNN estimator does not achieve the optimal nonparametric convergence rate, mainly because of its bias. In this work, we provide an in-depth technical analysis of the DNN estimator, based on which we suggest a bias-reduction approach that linearly combines two DNN estimators with different subsampling scales, resulting in the novel two-scale DNN (TDNN) estimator. The two-scale DNN estimator has an equivalent representation as a WNN estimator with weights admitting explicit forms, some of which are negative. We prove that, thanks to the use of negative weights, the two-scale DNN estimator enjoys the optimal nonparametric rate of convergence in estimating the regression function under a fourth-order smoothness condition. We further go beyond estimation and establish that the DNN and two-scale DNN estimators are both asymptotically normal as the subsampling scales and sample size diverge to infinity. For practical implementation, we also provide variance estimators and a distribution estimator based on the jackknife and bootstrap techniques for the two-scale DNN. These can be exploited to construct valid confidence intervals for nonparametric inference on the regression function. The theoretical results and appealing finite-sample performance of the suggested two-scale DNN method are illustrated with several simulation examples and a real data application. (Zoom (Online)) |
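A toy illustration of the two-scale combination, under stated assumptions: the DNN estimate at a point is approximated by Monte Carlo subsampled 1-NN averages, and two scales are mixed with weights chosen to cancel the leading s^(-2/d) bias term (one weight comes out negative, as in the abstract). This is a sketch of the idea, not the paper's exact weighted-neighbor form:

```python
import numpy as np

def dnn(Xtr, ytr, x0, s, n_sub=2000, rng=None):
    """Monte Carlo DNN estimate at x0 with subsampling scale s."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(Xtr)
    preds = np.empty(n_sub)
    for b in range(n_sub):
        idx = rng.choice(n, size=s, replace=False)
        dist = np.linalg.norm(Xtr[idx] - x0, axis=1)
        preds[b] = ytr[idx[np.argmin(dist)]]    # 1-NN within the subsample
    return preds.mean()

rng = np.random.default_rng(2)
n, d = 1000, 2
X = rng.uniform(-1, 1, size=(n, d))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + 0.3 * rng.normal(size=n)
x0 = np.zeros(d)                                # truth at x0 is 0

s1, s2 = 50, 200
# Solve w1 + w2 = 1 and w1*s1^(-2/d) + w2*s2^(-2/d) = 0 (bias cancellation).
w = np.linalg.solve([[1.0, 1.0], [s1 ** (-2 / d), s2 ** (-2 / d)]], [1.0, 0.0])
tdnn = w[0] * dnn(X, y, x0, s1, rng=rng) + w[1] * dnn(X, y, x0, s2, rng=rng)
print(f"weights = {w.round(3)}, TDNN estimate at 0 = {tdnn:.3f}")
```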
10:30 - 11:00 |
Coffee Break (Lecture Hall - Academic island(定山院士岛报告厅)) |
11:00 - 11:30 |
Wei Lin: Nonasymptotic theory for two-layer neural networks: Beyond the bias-variance trade-off ↓ Large neural networks have proved remarkably effective in modern deep learning practice, even in the overparametrized regime where the number of active parameters is large relative to the sample size. This contradicts the classical perspective that a machine learning model must trade off bias and variance for optimal generalization. To resolve this conflict, we present a nonasymptotic generalization theory for two-layer neural networks with the ReLU activation function by incorporating scaled variation regularization. Interestingly, the regularizer is equivalent to ridge regression from the angle of gradient-based optimization, but plays a similar role to the group lasso in controlling the model complexity. By exploiting this "ridge-lasso duality," we obtain new prediction bounds for all network widths, which reproduce the double descent phenomenon. Moreover, the overparametrized minimum risk is lower than its underparametrized counterpart when the signal is strong, and is nearly minimax optimal over a suitable class of functions. By contrast, we show that overparametrized random feature models suffer from the curse of dimensionality and thus are suboptimal. (Lecture Hall - Academic island(定山院士岛报告厅)) |
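The double descent phenomenon mentioned here is easy to reproduce in the random-feature setting the talk contrasts against; a small simulation (illustrative constants, min-norm least squares on frozen ReLU features) shows the test error peaking near the interpolation threshold and descending again:

```python
# Toy double descent with random ReLU features and ridgeless regression.
import numpy as np

rng = np.random.default_rng(3)
n, d, n_test = 100, 5, 1000
f = lambda X: np.sin(X @ np.ones(d))              # target function
Xtr, Xte = rng.normal(size=(n, d)), rng.normal(size=(n_test, d))
ytr = f(Xtr) + 0.1 * rng.normal(size=n)

for width in [10, 50, 90, 100, 110, 200, 1000, 5000]:
    W = rng.normal(size=(d, width))               # frozen first layer
    Phi_tr, Phi_te = np.maximum(Xtr @ W, 0), np.maximum(Xte @ W, 0)
    beta = np.linalg.pinv(Phi_tr) @ ytr           # min-norm interpolant
    mse = np.mean((Phi_te @ beta - f(Xte)) ** 2)
    print(f"width {width:5d}: test MSE = {mse:.3f}")
```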
11:30 - 12:00 |
Lan Gao: Robust knockoff inference with coupling ↓ We investigate the robustness of the model-X knockoffs framework with respect to a misspecified or estimated feature distribution. We do so by theoretically studying the feature selection performance of a practically implemented knockoffs algorithm, which we call the approximate knockoffs (ARK) procedure, under the false discovery rate (FDR) and familywise error rate (FWER) measures. The approximate knockoffs procedure differs from the model-X knockoffs procedure only in that the former uses the misspecified or estimated feature distribution. A key technique in our theoretical analysis is to couple the approximate knockoffs procedure with the model-X knockoffs procedure so that the random variables in the two procedures are close in realization. We prove that if such a coupled model-X knockoffs procedure exists, the approximate knockoffs procedure achieves asymptotic FDR or FWER control at the target level. We showcase three specific constructions of such coupled model-X knockoff variables, verifying their existence and justifying the robustness of the model-X knockoffs framework. (Lecture Hall - Academic island(定山院士岛报告厅)) |
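For intuition, here is a minimal model-X knockoff filter in the one case where exact knockoffs are trivial to construct (mutually independent N(0,1) features, so a fresh independent copy is a valid knockoff matrix); the talk's ARK analysis concerns what happens when the feature distribution used in this construction is only approximate:

```python
# Model-X knockoffs with independent Gaussian features and the
# lasso-coefficient-difference statistic, using the knockoff+ threshold.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n, p, k, q = 500, 100, 10, 0.2                 # q = target FDR level
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:k] = 1.0             # first k features are signals
y = X @ beta + rng.normal(size=n)

Xk = rng.normal(size=(n, p))                   # valid knockoffs in this case
coef = Lasso(alpha=0.05).fit(np.hstack([X, Xk]), y).coef_
W = np.abs(coef[:p]) - np.abs(coef[p:])        # importance statistics

T = np.inf                                     # knockoff+ threshold
for t in np.sort(np.abs(W[W != 0])):
    if (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1) <= q:
        T = t
        break
print("selected features:", np.where(W >= T)[0])
```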
12:00 - 13:30 |
Lunch (Dining Hall - Academic island(定山院士岛餐厅)) |
13:30 - 14:00 |
Qiong Zhang: Distributed learning of finite mixture models ↓ Advances in information technology have led to extremely large datasets that are often kept in different storage centers. Existing statistical methods must be adapted to overcome the resulting computational obstacles while retaining statistical validity and efficiency. In this setting, the split-and-conquer strategy is among the most effective solutions to many statistical problems, including quantile processes, regression analysis, principal eigenspaces, and exponential families. This paper applies the strategy to develop a distributed learning procedure for finite Gaussian mixtures. We propose a reduction strategy and develop an effective majorization-minimization algorithm. The new estimator is consistent and retains the root-n convergence rate under some general conditions. Experiments on simulated and real-world datasets show that the proposed estimator has statistical performance comparable to that of the global estimator based on the full dataset, when the latter is feasible. It can even outperform the global estimator for the purpose of clustering when the model assumption does not fully match the real-world data, and it has better statistical and computational performance than some existing split-and-conquer approaches. (Lecture Hall - Academic island(定山院士岛报告厅)) |
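A crude sketch of the split-and-conquer pipeline, assuming a simple weighted k-means on pooled component means as a stand-in for the talk's majorization-minimization reduction step:

```python
# Distributed GMM learning: local fits, pooled components, reduction.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
K, n_machines = 3, 8
data = np.vstack([c + rng.normal(size=(500, 1)) for c in (-4.0, 0.0, 4.0)])
rng.shuffle(data)
chunks = np.array_split(data, n_machines)      # data held on separate machines

means, weights = [], []
for chunk in chunks:                           # local estimation step
    gm = GaussianMixture(n_components=K, random_state=0).fit(chunk)
    means.append(gm.means_)
    weights.append(gm.weights_ * len(chunk))   # effective component counts
means, weights = np.vstack(means), np.concatenate(weights)

# Reduction step (simplified): cluster the pooled local component means.
km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(means, sample_weight=weights)
print("aggregated component means:", np.sort(km.cluster_centers_.ravel()))
```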
14:00 - 14:30 |
Jinchi Lv: SOFARI: High-dimensional manifold-based inference ↓ Multi-task learning is a widely used technique for harnessing information from various tasks. Recently, the sparse orthogonal factor regression (SOFAR) framework, based on the sparse singular value decomposition (SVD) of the coefficient matrix, was introduced for interpretable multi-task learning, enabling the discovery of meaningful latent feature-response association networks across different layers. However, conducting precise inference on the latent factor matrices has remained challenging due to the orthogonality constraints inherent in the sparse SVD. In this paper, we suggest a novel approach called high-dimensional manifold-based SOFAR inference (SOFARI), drawing on Neyman near-orthogonality inference while incorporating the Stiefel manifold structure imposed by the SVD constraints. By leveraging the underlying Stiefel manifold structure, SOFARI provides bias-corrected estimators for both the latent left factor vectors and the singular values, which we show to enjoy asymptotic mean-zero normal distributions with estimable variances. We introduce two SOFARI variants to handle strongly and weakly orthogonal latent factors, where the latter covers a broader range of applications. We illustrate the effectiveness of SOFARI and justify our theoretical results through simulation examples and a real data application in economic forecasting. This is a joint work with Yingying Fan, Zemin Zheng and Xin Zhou. (Zoom (Online)) |
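To fix ideas, the object SOFARI performs inference on is the SVD of a multi-response coefficient matrix; this sketch (a plain ridge estimate plus SVD, not the sparse SOFAR estimator or the bias-corrected SOFARI procedure) shows the latent factors being recovered:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, q, r = 300, 20, 10, 2
U, _ = np.linalg.qr(rng.normal(size=(p, r)))   # orthonormal left factors
V, _ = np.linalg.qr(rng.normal(size=(q, r)))   # orthonormal right factors
C = U @ np.diag([5.0, 3.0]) @ V.T              # rank-2 coefficient matrix
X = rng.normal(size=(n, p))
Y = X @ C + 0.5 * rng.normal(size=(n, q))      # multi-task responses

lam = 1.0                                      # ridge multi-task estimate
C_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)
print("true singular values:     ", np.linalg.svd(C, compute_uv=False)[:r].round(2))
print("estimated singular values:", np.linalg.svd(C_hat, compute_uv=False)[:r].round(2))
```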
14:30 - 15:00 |
Coffee break (Lecture Hall - Academic island(定山院士岛报告厅)) |
15:00 - 15:30 |
Puying Zhao: Augmented two-step estimating equations with nuisance functionals and complex survey data ↓ Statistical inference in the presence of nuisance functionals with complex survey data is an important topic in social and economic studies. The Gini index, Lorenz curves and quantile shares are among the commonly encountered examples. The nuisance functionals are usually handled by a plug-in nonparametric estimator, and the main inferential procedure can be carried out through a two-step generalized empirical likelihood method. Unfortunately, the resulting inference is not efficient, and the nonparametric version of Wilks' theorem breaks down even under simple random sampling. We propose an augmented estimating equations method with nuisance functionals and complex surveys. The second-step augmented estimating functions obey the Neyman orthogonality condition and automatically account for the impact of the first-step plug-in estimator, and the resulting estimator of the main parameters of interest is invariant to the first-step method. More importantly, the generalized empirical likelihood based Wilks' theorem holds for the main parameters of interest under the design-based framework for commonly used survey designs, and the maximum generalized empirical likelihood estimators achieve the semiparametric efficiency bound. The performance of the proposed methods is demonstrated through simulation studies and an application using the dataset from the New York City Social Indicators Survey. This is joint work with Changbao Wu. (Lecture Hall - Academic island(定山院士岛报告厅)) |
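As a concrete example of the nuisance functionals involved, the naive plug-in step for a design-weighted Gini index looks as follows (simulated data and weights; the augmented, Neyman-orthogonal procedure itself is not reproduced here):

```python
# Design-weighted Gini index via the mean-absolute-difference formula:
# G = sum_ij w_i w_j |y_i - y_j| / (2 * (sum_i w_i)^2 * weighted mean).
import numpy as np

def weighted_gini(y, w):
    y, w = np.asarray(y, float), np.asarray(w, float)
    mad = (w[:, None] * w[None, :] * np.abs(y[:, None] - y[None, :])).sum()
    mu = (w * y).sum() / w.sum()
    return mad / (2 * w.sum() ** 2 * mu)

rng = np.random.default_rng(7)
income = rng.lognormal(mean=10, sigma=0.8, size=1000)
weights = rng.uniform(0.5, 2.0, size=1000)     # survey design weights
print(f"weighted Gini: {weighted_gini(income, weights):.3f}")
```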
15:30 - 16:00 |
Jyoti U. Devkota: Identification of latent structures in qualitative variables – Examples from renewable energy users of Nepal ↓ This study is based on data collected from two sample surveys, namely a survey of 300 households of national grid energy users and a survey of 400 households of biogas users, conducted in three different rural settings of Nepal. The responses to questions were recorded as multiple-choice options; this generated categorical data and reduced ambiguity and confusion between interviewer and interviewee. These data were classified on an ordinal scale and modeled. Because the dependent variable had more than two categories, polytomous rather than dichotomous models are developed and fitted. Ten different hypotheses assessing and measuring the dynamics of energy consumption are tested. The parameter values of these models and the odds ratios are used to quantify the impact of change with respect to energy consumption. The variables considered were time spent collecting firewood, type of house, amount of firewood saved, time saved, and whether an employer and a school are located within 15 minutes' distance. Such data-based studies are crucial for a country like Nepal, which lacks a strong backbone of accurate and regularly updated official records, and they can be generalized to other countries of Asia and Africa. The results obtained can provide guidelines to policy makers and planners in formulating realistic energy policies for such countries. (Lecture Hall - Academic island(定山院士岛报告厅)) |
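A sketch of the kind of polytomous model described here: an ordinal (proportional-odds) logit fit with statsmodels, with odds ratios read off from the exponentiated coefficients. The variable names echo the survey items, but the data are simulated:

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(8)
n = 400
df = pd.DataFrame({
    "firewood_time": rng.uniform(0, 5, n),         # hours collecting firewood
    "school_within_15min": rng.integers(0, 2, n),  # 0/1 indicator
})
latent = (0.8 * df["firewood_time"] - 1.2 * df["school_within_15min"]
          + rng.logistic(size=n))
df["response"] = pd.cut(latent, bins=[-np.inf, 0, 2, np.inf],
                        labels=["low", "mid", "high"])   # ordered categories

res = OrderedModel(df["response"],
                   df[["firewood_time", "school_within_15min"]],
                   distr="logit").fit(method="bfgs", disp=False)
print(np.exp(res.params[:2]))                      # odds ratios per covariate
```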
16:00 - 16:30 |
Yukun Liu: Navigating challenges in classification and outlier detection: a remedy based on semi-parametric density ratio models ↓ The goal of classification is to assign categorical labels to unlabelled test data based on patterns and relationships learned from a labeled training dataset. Yet this task becomes challenging when the training data and the test data exhibit distributional mismatches. The unlabelled test data follow a finite mixture model, which is not identifiable without model assumptions. In this paper, we propose to model the test data by a finite semiparametric mixture model under a density ratio model, and construct a semiparametric empirical likelihood prediction set (SELPS) for the labels in the test data. Our approach aims to optimize out-of-sample performance, seeking to include the correct class and to detect outliers as often as possible, and it has the potential to enhance the robustness and effectiveness of classification models when the training and test distributions differ. Our method circumvents a stringent separation assumption between training data and outliers, which is required by Guan and Tibshirani (2022) but is often violated by commonly-used distributions. We prove the asymptotic consistency and normality of our parameter estimators and the asymptotic optimality of the proposed SELPS. We illustrate our methods by analyzing four real-world datasets. (Lecture Hall - Academic island(定山院士岛报告厅)) |
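The density ratio model underlying this talk has a handy logistic-regression connection: fitting sample membership by logistic regression recovers the log density ratio up to the log of the sample-size ratio. A simulated sketch of that link (not the SELPS construction itself):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)
n0, n1 = 1000, 1000
X0 = rng.normal(loc=0.0, size=(n0, 1))      # baseline sample ~ N(0, 1)
X1 = rng.normal(loc=1.0, size=(n1, 1))      # tilted sample  ~ N(1, 1)
X = np.vstack([X0, X1])
z = np.r_[np.zeros(n0), np.ones(n1)]        # sample-membership label

clf = LogisticRegression().fit(X, z)
# The density ratio p1/p0 at x equals (n0/n1) * exp(decision_function(x)).
r_hat = (n0 / n1) * np.exp(clf.decision_function([[0.5]]))[0]
print(f"estimated density ratio at x = 0.5: {r_hat:.2f} (truth: 1.00)")
```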
18:00 - 20:30 |
Dinner (Dining Hall - Yuxianghu Hotel(御湘湖酒店餐厅)) |