Statistics Theory
- [1] arXiv:2405.12260
Title: On an upper bound of the set of copulas with a given curvilinear section
Subjects: Statistics Theory (math.ST); Probability (math.PR)
The question of when two natural upper bounds of the set of copulas with a given diagonal section are themselves copulas has been well studied in the literature. For a given curvilinear section, however, only a partial result is known on when a natural upper bound of the set of copulas is a copula. In this paper, we completely characterize when this natural upper bound is a copula in the curvilinear case.
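Deciding whether a candidate bound is a copula comes down to three axioms: groundedness, uniform margins and 2-increasingness. As an illustration only, the sketch below is written in Python with NumPy, which is an assumption since the listing contains no code. It checks those axioms numerically on a grid for any candidate function. It does not implement the paper's characterization. The names `is_copula_on_grid` and `fgm` are hypothetical, and passing the grid check is necessary but not sufficient.

```python
import numpy as np

# Hypothetical helper names; an illustrative grid check, not the paper's characterization.
def is_copula_on_grid(C, n=50, tol=1e-12):
    """Check the copula axioms for C on an (n+1) x (n+1) uniform grid.

    Passing is necessary but not sufficient: only grid rectangles are tested.
    """
    t = np.linspace(0.0, 1.0, n + 1)
    U, V = np.meshgrid(t, t, indexing="ij")
    Z = C(U, V)  # Z[i, j] = C(t[i], t[j])
    # Groundedness: C(u, 0) = 0 and C(0, v) = 0.
    if np.abs(Z[:, 0]).max() > tol or np.abs(Z[0, :]).max() > tol:
        return False
    # Uniform margins: C(u, 1) = u and C(1, v) = v.
    if np.abs(Z[:, -1] - t).max() > tol or np.abs(Z[-1, :] - t).max() > tol:
        return False
    # 2-increasing: every grid rectangle must have non-negative C-volume.
    vol = Z[1:, 1:] - Z[:-1, 1:] - Z[1:, :-1] + Z[:-1, :-1]
    return bool(vol.min() >= -tol)

def fgm(theta):
    """Farlie-Gumbel-Morgenstern family; a copula exactly when |theta| <= 1."""
    return lambda u, v: u * v * (1 + theta * (1 - u) * (1 - v))
```

For example, the upper Fréchet bound `np.minimum(u, v)` passes. `fgm(2.0)` has the correct boundary values but fails, because its density becomes negative near the corners (0, 1) and (1, 0).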
- [2] arXiv:2405.12343
Title: Determine the Number of States in Hidden Markov Models via Marginal Likelihood
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
Hidden Markov models (HMMs) have been widely used by scientists to model stochastic systems: the underlying process is a discrete Markov chain, and the observations are noisy realizations of the underlying process. Determining the number of hidden states of an HMM is a model selection problem that has yet to be satisfactorily solved, especially for the popular Gaussian HMM with heterogeneous covariance. In this paper, we propose a consistent method for determining the number of hidden states of an HMM based on the marginal likelihood, which is obtained by integrating out both the parameters and the hidden states. Moreover, we show that the model selection problem for HMMs includes the order selection problem for finite mixture models as a special case. We give a rigorous proof of the consistency of the proposed marginal likelihood method and provide an efficient computational method for practical implementation. We numerically compare the proposed method with the Bayesian information criterion (BIC), demonstrating the effectiveness of the proposed marginal likelihood method.
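The marginal likelihood described here integrates out both the parameters and the hidden states. The sketch below covers only the second half of that, at fixed parameters. It uses the standard forward recursion, which sums over all hidden-state paths of a one-dimensional Gaussian HMM in O(T K^2) time. The integration over parameters and the paper's computational method are not reproduced. `gaussian_logpdf` and `hmm_loglik` are hypothetical names.

```python
import numpy as np

# Hypothetical helper names; fixed-parameter forward algorithm only.
def gaussian_logpdf(x, mean, sd):
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((x - mean) / sd) ** 2

def hmm_loglik(y, pi0, P, means, sds):
    """log p(y_1..y_T | parameters) for a 1-D Gaussian HMM with K states.

    pi0: initial distribution (K,); P: transition matrix (K, K).
    The forward recursion sums over all K**T hidden-state paths.
    """
    y = np.asarray(y, dtype=float)
    log_P = np.log(np.asarray(P, dtype=float))
    # log emission densities, shape (T, K)
    log_e = gaussian_logpdf(y[:, None], np.asarray(means, float), np.asarray(sds, float))
    log_alpha = np.log(np.asarray(pi0, dtype=float)) + log_e[0]
    for t in range(1, len(y)):
        m = log_alpha[:, None] + log_P  # m[i, j] = log alpha_{t-1}(i) + log P[i, j]
        mx = m.max(axis=0)              # log-sum-exp over the previous state i
        log_alpha = mx + np.log(np.exp(m - mx).sum(axis=0)) + log_e[t]
    mx = log_alpha.max()
    return float(mx + np.log(np.exp(log_alpha - mx).sum()))
```

As a sanity check, a two-state HMM whose states share the same emission distribution has the same likelihood as i.i.d. data, whatever the transition matrix is.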
- [3] arXiv:2405.12567
Title: Marginal and training-conditional guarantees in one-shot federated conformal prediction
Authors: Pierre Humbert (LMO, CELESTE), Batiste Le Bars (ARGO, DI-ENS), Aurélien Bellet (PREMEDICAL, UM), Sylvain Arlot (LMO, CELESTE, IUF)
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
We study conformal prediction in the one-shot federated learning setting. The main goal is to compute marginally and training-conditionally valid prediction sets, at the server-level, in only one round of communication between the agents and the server. Using the quantile-of-quantiles family of estimators and split conformal prediction, we introduce a collection of computationally-efficient and distribution-free algorithms that satisfy the aforementioned requirements. Our approaches come from theoretical results related to order statistics and the analysis of the Beta-Beta distribution. We also prove upper bounds on the coverage of all proposed algorithms when the nonconformity scores are almost surely distinct. For algorithms with training-conditional guarantees, these bounds are of the same order of magnitude as those of the centralized case. Remarkably, this implies that the one-shot federated learning setting entails no significant loss compared to the centralized case. Our experiments confirm that our algorithms return prediction sets with coverage and length similar to those obtained in a centralized setting.
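A quantile-of-quantiles estimator fits the one-round constraint. Each agent sends a single order statistic of its local nonconformity scores, and the server takes an order statistic of the values it receives. The sketch below assumes that structure and also includes the centralized split conformal threshold for comparison. The ranks `k` and `l` that yield the paper's coverage guarantees are derived in the paper; here they are left as free inputs. `split_conformal_threshold` and `quantile_of_quantiles` are hypothetical names. The prediction set at a new point x is then {y : score(x, y) <= threshold}.

```python
import math
import numpy as np

# Hypothetical helper names; rank choices for k and l are NOT the paper's.
def split_conformal_threshold(scores, alpha):
    """Centralized split conformal: the ceil((n + 1)(1 - alpha))-th smallest score."""
    s = np.sort(np.asarray(scores, dtype=float))
    n = len(s)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: predict the whole label space
    return float(s[k - 1])

def quantile_of_quantiles(agent_scores, k, l):
    """One-shot federated threshold: each agent sends only its k-th smallest
    calibration score; the server returns the l-th smallest value received."""
    local = sorted(float(np.sort(np.asarray(s, dtype=float))[k - 1]) for s in agent_scores)
    return local[l - 1]
```

With three agents holding scores {1, 2, 3}, {4, 5, 6} and {7, 8, 9}, setting `k=2, l=2` gives local values 2, 5 and 8, so the server threshold is 5.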
New submissions for Wednesday, 22 May 2024 (showing 3 of 3 entries)
- [4] arXiv:2405.12293 (cross-list from cs.DS)
Title: Exact Random Graph Matching with Multiple Graphs
Comments: 20 pages, 3 figures
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Statistics Theory (math.ST)
This work studies fundamental limits for recovering the underlying correspondence among multiple correlated random graphs. We identify a necessary condition for any algorithm to correctly match all nodes across all graphs, and propose two algorithms for which the same condition is also sufficient. The first algorithm uses global information to match all the graphs simultaneously, whereas the second algorithm first partially matches the graphs pairwise and then combines the partial matchings by transitivity. Both algorithms work down to the information-theoretic threshold. Our analysis reveals a scenario where exact matching between two graphs alone is impossible, but leveraging more than two graphs allows exact matching among all the graphs. Along the way, we derive independent results about the k-core of Erdős-Rényi graphs.
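The k-core is the largest subgraph in which every vertex has degree at least k. It can be computed by repeatedly peeling off vertices of degree below k. The sketch below shows only that standard peeling computation. The paper's results about the k-core of Erdős-Rényi graphs, and its matching algorithms, are not reproduced. `k_core` is a hypothetical name, and vertices are taken from the edge list, so isolated vertices never appear.

```python
from collections import defaultdict

# Hypothetical helper name; standard k-core peeling, O(|V| + |E|).
def k_core(edges, k):
    """Vertex set of the k-core of an undirected graph given as an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    stack = [u for u, d in deg.items() if d < k]
    removed = set(stack)
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in removed:
                deg[w] -= 1  # w loses the peeled neighbour u
                if deg[w] < k:
                    removed.add(w)
                    stack.append(w)
    return {u for u in adj if u not in removed}
```

For a triangle with one pendant vertex attached, the 2-core is the triangle and the 3-core is empty.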
- [5] arXiv:2405.12553 (cross-list from stat.ML)
Title: Uncertainty quantification by block bootstrap for differentially private stochastic gradient descent
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Statistics Theory (math.ST); Computation (stat.CO)
Stochastic gradient descent (SGD) is a widely used tool in machine learning. In the context of differential privacy (DP), SGD has been well studied in recent years, with the focus mainly on convergence rates and privacy guarantees. In the non-private case, uncertainty quantification (UQ) for SGD by bootstrap has been addressed by several authors. These procedures cannot be transferred to differential privacy, however, because they require multiple queries to the private data. In this paper, we propose a novel block bootstrap for SGD under local differential privacy that is computationally tractable and does not require an adjustment of the privacy budget. The method can be easily implemented and is applicable to a broad class of estimation problems. We prove the validity of our approach and illustrate its finite-sample properties by means of a simulation study. As a by-product, the new method also provides a simple alternative numerical tool for UQ in non-private SGD.
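The sketch below shows two generic ingredients. The first is SGD for a mean, where each observation in [0, 1] is privatized once by the Laplace mechanism (scale 1/eps, sensitivity 1) before its single use. The second is a non-overlapping block bootstrap of the mean of a dependent sequence, such as the SGD iterate path. Resampling already-privatized iterates is post-processing and queries no private data again. This is not the paper's procedure, and the sketch does not establish that such intervals are valid; that validity is the paper's contribution. The step-size schedule, the block length and the names `ldp_sgd_mean` and `block_bootstrap_means` are hypothetical.

```python
import numpy as np

# Hypothetical helper names; generic sketch, not the paper's block bootstrap.
def ldp_sgd_mean(x, eps, lr0=1.0, rng=None):
    """SGD for the mean of data in [0, 1] under local DP.

    Each observation is privatized once by the Laplace mechanism (sensitivity 1)
    and then used in a single SGD step on the loss 0.5 * (theta - z)**2.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    z = x + rng.laplace(scale=1.0 / eps, size=len(x))
    theta, iterates = 0.0, np.empty(len(z))
    for t, zt in enumerate(z, start=1):
        theta -= lr0 * t ** -0.6 * (theta - zt)  # assumed step size lr0 * t^(-0.6)
        iterates[t - 1] = theta
    return iterates

def block_bootstrap_means(seq, block_len, n_boot, rng=None):
    """Non-overlapping block bootstrap replicates of the mean of a dependent sequence."""
    rng = np.random.default_rng(rng)
    seq = np.asarray(seq, dtype=float)
    n_blocks = len(seq) // block_len
    blocks = seq[: n_blocks * block_len].reshape(n_blocks, block_len)
    idx = rng.integers(0, n_blocks, size=(n_boot, n_blocks))  # resample whole blocks
    return blocks[idx].mean(axis=(1, 2))
```

Resampling whole blocks rather than single iterates keeps the short-range dependence of the SGD path inside each block.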
- [6] arXiv:2405.12958 (cross-list from cs.LG)
Title: Online Learning of Halfspaces with Massart Noise
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST); Machine Learning (stat.ML)
We study the task of online learning in the presence of Massart noise. Instead of assuming that the online adversary chooses an arbitrary sequence of labels, we assume that the context $\mathbf{x}$ is selected adversarially but the label $y$ presented to the learner disagrees with the ground-truth label of $\mathbf{x}$ with unknown probability at most $\eta$. We study the fundamental class of $\gamma$-margin linear classifiers and present a computationally efficient algorithm that achieves mistake bound $\eta T + o(T)$. Our mistake bound is qualitatively tight for efficient algorithms: it is known that even in the offline setting, achieving classification error better than $\eta$ requires super-polynomial time in the statistical query (SQ) model.
We extend our online learning model to a $k$-arm contextual bandit setting where the rewards, instead of satisfying commonly used realizability assumptions, are consistent (in expectation) with some linear ranking function with weight vector $\mathbf{w}^\ast$. Given a list of contexts $\mathbf{x}_1,\ldots,\mathbf{x}_k$, if $\mathbf{w}^\ast\cdot \mathbf{x}_i > \mathbf{w}^\ast \cdot \mathbf{x}_j$, the expected reward of action $i$ must be larger than that of $j$ by at least $\Delta$. We use our Massart online learner to design an efficient bandit algorithm that obtains an expected reward at least $(1-1/k)\,\Delta T - o(T)$ larger than that of choosing a random action at every round.
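The $\eta T$ term in the mistake bound reflects the noise model itself. Under Massart noise, each label is flipped with an adversarially chosen probability $\eta(\mathbf{x}) \le \eta$, so even the ground-truth halfspace disagrees with the presented labels on roughly a fraction $\mathbb{E}[\eta(\mathbf{x})] \le \eta$ of rounds. The sketch below simulates only this label model. It draws contexts at random for convenience, whereas the paper's contexts are adversarial, and it does not implement the paper's learner. `massart_labels`, `eta_fn` and `eta_max` are hypothetical names.

```python
import numpy as np

# Hypothetical helper names; simulates the Massart label model only.
def massart_labels(X, w_star, eta_fn, eta_max, rng=None):
    """Ground-truth labels sign(w* . x) and their Massart-noisy versions.

    Each label is flipped independently with probability eta_fn(x), where
    eta_fn is chosen by the adversary and must satisfy 0 <= eta_fn(x) <= eta_max.
    """
    rng = np.random.default_rng(rng)
    clean = np.where(X @ w_star >= 0, 1, -1)
    flip_prob = np.array([eta_fn(x) for x in X], dtype=float)
    if np.any(flip_prob < 0) or np.any(flip_prob > eta_max):
        raise ValueError("Massart noise requires 0 <= eta(x) <= eta_max")
    flips = rng.random(len(X)) < flip_prob
    return clean, np.where(flips, -clean, clean)
```

With `eta_fn` returning 0.2 on half of the space and 0.05 on the other half, the ground truth disagrees with about 12.5% of the presented labels, below the Massart bound of 0.2.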
Cross submissions for Wednesday, 22 May 2024 (showing 3 of 3 entries)
- [7] arXiv:2210.14086 (replaced)
Title: A Global Wavelet Based Bootstrapped Test of Covariance Stationarity
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
We propose a covariance stationarity test for an otherwise dependent and possibly globally non-stationary time series. We work in a generalized version of the new setting in Jin, Wang and Wang (2015), who exploit Walsh (1923) functions in order to compare sub-sample covariances with the full-sample counterpart. They impose strict stationarity under the null, consider only linear processes under either hypothesis in order to achieve a parametric estimator for an inverted high-dimensional asymptotic covariance matrix, and do not consider any other orthonormal basis. Conversely, we work with a general orthonormal basis under mild conditions that include Haar wavelet and Walsh functions, and we allow for linear or nonlinear processes with possibly non-iid innovations. This is important in macroeconomics and finance, where nonlinear feedback and random volatility occur in many settings. We completely sidestep asymptotic covariance matrix estimation and inversion by bootstrapping a max-correlation difference statistic, where the maximum is taken over the correlation lag $h$ and the basis-generated sub-sample counter $k$ (the number of systematic samples). We achieve a higher feasible rate of increase for the maximum lag $\mathcal{H}_{T}$ and the maximum counter $\mathcal{K}_{T}$. Of particular note, our test is capable of detecting breaks in variance, and distant, or very mild, deviations from stationarity.
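The core comparison, sub-sample second moments against their full-sample counterparts maximized over lags and sub-samples, can be sketched with contiguous dyadic blocks, which correspond to a Haar-style partition. This is a simplified stand-in, not the paper's statistic. It uses autocovariances rather than correlations, so that a variance break shows up directly. It fixes a maximum lag and level and omits the bootstrap that supplies critical values in the paper. `autocov` and `max_cov_difference` are hypothetical names.

```python
import numpy as np

# Hypothetical helper names; simplified statistic without the paper's bootstrap.
def autocov(x, h):
    """Sample autocovariance at lag h >= 0 (divisor len(x))."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[: len(x) - h], x[h:]) / len(x))

def max_cov_difference(x, max_lag, max_level):
    """max over lags h = 0..max_lag and over dyadic sub-samples (2, 4, ...,
    2**max_level contiguous blocks) of |sub-sample autocov(h) - full autocov(h)|."""
    x = np.asarray(x, dtype=float)
    stat = 0.0
    for h in range(max_lag + 1):
        full = autocov(x, h)
        for level in range(1, max_level + 1):
            for block in np.array_split(x, 2 ** level):
                stat = max(stat, abs(autocov(block, h) - full))
    return stat
```

For white noise whose standard deviation triples halfway through, the half-sample variances are near 1 and 9 while the full-sample variance is near 5, so the statistic is close to 4. A stationary series of the same length gives a value well below 1.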
- [8] arXiv:2305.13152 (replaced)
Title: Covariate-informed reconstruction of partially observed functional data via factor models
Subjects: Statistics Theory (math.ST)
This paper studies linear reconstruction of partially observed functional data which are recorded on a discrete grid. We propose a novel estimation approach based on approximate factor models with increasing rank taking into account potential covariate information. Whereas alternative reconstruction procedures commonly involve some preliminary smoothing, our method separates the signal from noise and reconstructs missing fragments at once. We establish uniform convergence rates of our estimator and introduce a new method for constructing simultaneous prediction bands for the missing trajectories. A simulation study examines the performance of the proposed methods in finite samples. Finally, a real data application of temperature curves demonstrates that our theory provides a simple and effective method to recover missing fragments.
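The basic factor-model reconstruction step works as follows. Estimate loadings from fully observed curves on the grid, fit the factor scores of a new curve from its observed fragment by least squares, and predict the missing grid points from those scores. The sketch below assumes a plain PCA estimate of the loadings with a fixed rank. It uses no covariates and constructs no prediction bands, so it is not the paper's covariate-informed estimator. `reconstruct_fragment` is a hypothetical name.

```python
import numpy as np

# Hypothetical helper name; plain PCA factor reconstruction, no covariates.
def reconstruct_fragment(complete, partial, observed, rank):
    """Fill the unobserved grid points of one curve from a rank-`rank` factor model.

    complete: (n, p) array of fully observed curves on a common grid.
    partial:  (p,) curve; entries outside `observed` are ignored (may be NaN).
    observed: (p,) boolean mask of observed grid points.
    """
    mu = complete.mean(axis=0)
    _, _, Vt = np.linalg.svd(complete - mu, full_matrices=False)
    L = Vt[:rank].T  # (p, rank) estimated loadings
    # Factor scores estimated from the observed fragment only (least squares).
    f, *_ = np.linalg.lstsq(L[observed], partial[observed] - mu[observed], rcond=None)
    out = np.array(partial, dtype=float)
    out[~observed] = mu[~observed] + L[~observed] @ f  # predict the missing fragment
    return out
```

If every curve lies exactly in the span of two basis functions, such as a sine and a cosine, observing half of a new curve is enough to recover the other half up to rounding error.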
- [9] arXiv:2308.15728 (replaced)
Title: Computational Lower Bounds for Graphon Estimation via Low-degree Polynomials
Comments: Add low-degree upper bound in v2
Subjects: Statistics Theory (math.ST); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
Graphon estimation has been one of the most fundamental problems in network analysis and has received considerable attention in the past decade. From the statistical perspective, the minimax error rate of graphon estimation was established by Gao et al. (2015) for both the stochastic block model (SBM) and nonparametric graphon estimation. The statistically optimal estimators are based on constrained least squares and have computational complexity exponential in the dimension. From the computational perspective, the best-known polynomial-time estimator is based on universal singular value thresholding (USVT), but it can only achieve a much slower estimation error rate than the minimax one. Whether USVT is computationally optimal, or whether a computational barrier exists in graphon estimation, has been a long-standing open problem. In this work, we provide rigorous evidence for a computational barrier in graphon estimation via low-degree polynomials. Specifically, in SBM graphon estimation, we show that low-degree polynomial estimators cannot achieve estimation error rates significantly better than that of USVT under a wide range of parameter regimes; in nonparametric graphon estimation, we show that low-degree polynomial estimators achieve estimation error rates strictly slower than the minimax rate. Our results build on the recent development of low-degree polynomials by Schramm and Wein (2022), and we overcome a few key challenges in applying it to the general graphon estimation problem. By leveraging our main results, we also provide a computational lower bound on the clustering error for community detection in SBM with a growing number of communities, which yields a new piece of evidence for the conjectured Kesten-Stigum threshold for efficient community recovery. Finally, we extend our computational lower bounds to sparse graphon estimation and biclustering.
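USVT, the polynomial-time baseline discussed here, takes the SVD of the adjacency matrix, keeps only the singular values above a threshold of order $\sqrt{n}$, and clips the reconstruction to [0, 1]. The sketch below assumes a fully observed matrix and one common form of the threshold, $(2+\eta)\sqrt{n}$. The exact constant is an assumption, and `usvt` is a hypothetical name.

```python
import numpy as np

# Hypothetical helper name; threshold constant (2 + eta) * sqrt(n) is an assumption.
def usvt(A, eta=0.1):
    """Universal singular value thresholding for an n x n matrix with entries in [0, 1].

    Keeps singular values >= (2 + eta) * sqrt(n) and clips the result to [0, 1].
    """
    n = A.shape[0]
    U, s, Vt = np.linalg.svd(A)
    keep = s >= (2 + eta) * np.sqrt(n)
    P_hat = (U[:, keep] * s[keep]) @ Vt[keep]
    return np.clip(P_hat, 0.0, 1.0)
```

Consider a two-block SBM with 400 nodes and edge probabilities 0.8 within blocks and 0.2 across. The two signal singular values (about 200 and 120) lie well above the threshold (about 42), while the noise singular values stay near 16. USVT therefore returns a rank-2 estimate with a much smaller mean squared error than the raw adjacency matrix.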
- [10] arXiv:2311.15845 (replaced)
Title: On Learning the Optimal Regularization Parameter in Inverse Problems
Subjects: Statistics Theory (math.ST); Optimization and Control (math.OC)
Selecting the best regularization parameter in inverse problems is a classical yet challenging problem. Recently, data-driven approaches have become popular for tackling this challenge. These approaches are appealing because they require less a priori knowledge, but their theoretical analysis is limited. In this paper, we propose and study a statistical machine learning approach based on empirical risk minimization. Our main contribution is a theoretical analysis showing that, given enough data, this approach can reach sharp rates while being essentially adaptive to the noise and smoothness of the problem. Numerical simulations corroborate and illustrate the theoretical findings. Our results are a step towards theoretically grounding data-driven approaches to inverse problems.
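Empirical risk minimization over the regularization parameter can be illustrated as follows. Given training pairs of true signals and noisy data, compute the regularized reconstruction for each candidate parameter and keep the one with the smallest average reconstruction error. The sketch below assumes Tikhonov regularization, a finite grid of candidates and squared error. These are illustrative choices, not necessarily the paper's setting. `tikhonov` and `learn_lambda` are hypothetical names.

```python
import numpy as np

# Hypothetical helper names; Tikhonov + grid search is an assumed instance of ERM.
def tikhonov(A, y, lam):
    """Tikhonov solution argmin_x ||A x - y||^2 + lam * ||x||^2."""
    p = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y)

def learn_lambda(A, xs, ys, grid):
    """Return the grid value minimizing the empirical risk over training pairs
    (x_i, y_i), together with the risk for every candidate."""
    risks = [np.mean([np.sum((tikhonov(A, y, lam) - x) ** 2) for x, y in zip(xs, ys)])
             for lam in grid]
    return grid[int(np.argmin(risks))], risks
```

As a check, take $A = I$, signals $x \sim N(0, I)$ and noise variance $\sigma^2$. The estimate is $y/(1+\lambda)$, and the risk per coordinate is $(\lambda^2 + \sigma^2)/(1+\lambda)^2$, whose derivative has the sign of $\lambda - \sigma^2$. With $\sigma = 1$, the learned parameter should therefore be $\lambda = 1$.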
- [11] arXiv:2404.08278 (replaced)
Title: Minimax Optimal Goodness-of-Fit Testing with Kernel Stein Discrepancy
Comments: 54 pages
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
We explore the minimax optimality of goodness-of-fit tests on general domains using the kernelized Stein discrepancy (KSD). The KSD framework offers a flexible approach for goodness-of-fit testing, avoiding strong distributional assumptions, accommodating diverse data structures beyond Euclidean spaces, and relying only on partial knowledge of the reference distribution, while maintaining computational efficiency. We establish a general framework and an operator-theoretic representation of the KSD, encompassing many existing KSD tests in the literature, which vary depending on the domain. We reveal the characteristics and limitations of KSD and demonstrate its non-optimality under a certain alternative space, defined over general domains when considering $\chi^2$-divergence as the separation metric. To address this issue of non-optimality, we propose a modified, minimax optimal test by incorporating a spectral regularizer, thereby overcoming the shortcomings of standard KSD tests. Our results are established under a weak moment condition on the Stein kernel, which relaxes the bounded kernel assumption required by prior work in the analysis of kernel-based hypothesis testing. Additionally, we introduce an adaptive test capable of achieving minimax optimality up to a logarithmic factor by adapting to unknown parameters. Through numerical experiments, we illustrate the superior performance of our proposed tests across various domains compared to their unregularized counterparts.
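The standard, unregularized KSD needs only the score $s_p = (\log p)'$ of the reference distribution, so the normalizing constant is never required. In one dimension with a Gaussian RBF kernel of bandwidth $h$, the Stein kernel is
$u_p(x,y) = k(x,y)\,[\,s_p(x)s_p(y) + (s_p(x) - s_p(y))(x-y)/h^2 + 1/h^2 - (x-y)^2/h^4\,]$,
and KSD² is estimated by a U-statistic. The sketch below implements only that baseline. It does not include the spectral regularization that the paper proposes for minimax optimality. `ksd_u_stat` is a hypothetical name, and the bandwidth default is an assumption.

```python
import numpy as np

# Hypothetical helper name; standard (unregularized) 1-D KSD U-statistic.
def ksd_u_stat(x, score, h=1.0):
    """U-statistic estimate of KSD^2 between the sample x and a density with score `score`."""
    x = np.asarray(x, dtype=float)
    d = x[:, None] - x[None, :]          # pairwise differences x_i - x_j
    k = np.exp(-d**2 / (2 * h**2))       # Gaussian RBF kernel
    sx = score(x)[:, None]
    sy = score(x)[None, :]
    u = k * (sx * sy + (sx - sy) * d / h**2 + 1 / h**2 - d**2 / h**4)
    n = len(x)
    return float((u.sum() - np.trace(u)) / (n * (n - 1)))  # drop i == j terms
```

For a standard normal target the score is $-x$. Samples from $N(0,1)$ give a value near 0. For samples from $N(1,1)$ and $h = 1$, the population value is $\mu^2 (1 + 2/h^2)^{-1/2} = 1/\sqrt{3} \approx 0.58$.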
- [12] arXiv:2304.12414 (replaced)
Title: Bayesian Geostatistics Using Predictive Stacking
Comments: 51 pages, 22 figures
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Computation (stat.CO)
We develop Bayesian predictive stacking for geostatistical models, where the primary inferential objective is to provide inference on the latent spatial random field and to conduct spatial predictions at arbitrary locations. We exploit analytically tractable posterior distributions for the regression coefficients of predictors and the realizations of the spatial process conditional on process parameters. We then combine such inference by stacking these models across the range of values of the hyperparameters. We devise stacking of means and posterior densities in a manner that is computationally efficient, avoids iterative algorithms such as Markov chain Monte Carlo (MCMC), and can exploit the benefits of parallel computation. We offer novel theoretical insights into the resulting inference within an infill asymptotic paradigm, together with empirical results showing that stacked inference is comparable to full sampling-based Bayesian inference at a significantly lower computational cost.
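Stacking of densities chooses simplex weights that maximize the held-out log score of the weighted mixture, $\sum_i \log \sum_k w_k\, p_{ik}$. Here $p_{ik}$ is model $k$'s predictive density at held-out point $i$, for example a leave-one-out density. The sketch below assumes these densities are already computed and maximizes the objective with the EM fixed-point update for mixture weights. That update keeps the weights on the simplex and never decreases the objective, but it is not necessarily the optimizer the paper uses. `stacking_weights` is a hypothetical name.

```python
import numpy as np

# Hypothetical helper name; EM update for fixed-component mixture weights.
def stacking_weights(dens, n_iter=500):
    """Simplex weights maximizing sum_i log(sum_k w_k * dens[i, k]).

    dens[i, k]: model k's held-out predictive density at point i.
    """
    dens = np.asarray(dens, dtype=float)
    w = np.full(dens.shape[1], 1.0 / dens.shape[1])
    for _ in range(n_iter):
        mix = dens @ w                              # stacked density at each point
        w = w * (dens / mix[:, None]).mean(axis=0)  # weights still sum to 1
    return w
```

If one model's held-out density exceeds the other's at every point, the weights concentrate on that model. With perfectly symmetric densities the weights stay at one half each.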
- [13] arXiv:2312.00590 (replaced)
Title: Inference on common trends in functional time series
Subjects: Econometrics (econ.EM); Statistics Theory (math.ST)
We study statistical inference on unit roots and cointegration for time series in a Hilbert space. We develop statistical inference on the number of common stochastic trends embedded in the time series, i.e., the dimension of the nonstationary subspace. We also consider tests of hypotheses on the nonstationary and stationary subspaces themselves. The Hilbert space can be of arbitrarily large dimension, and our methods remain asymptotically valid even when the time series of interest takes values in a subspace of possibly unknown dimension. This gives the methods wide applicability in practice: for example, to cointegrated vector time series that are either high-dimensional or of finite dimension, to high-dimensional factor models that include a finite number of nonstationary factors, to cointegrated curve-valued (or function-valued) time series, and to nonstationary dynamic functional factor models. We include two empirical illustrations, to the term structure of interest rates and to labor market indices.
- [14] arXiv:2401.14277 (replaced)
Title: An Instance-Based Approach to the Trace Reconstruction Problem
Comments: 7 pages, part of this paper was presented at the 58th Annual Conference on Information Sciences and Systems (CISS 2024), funding information added in updated document
Subjects: Information Theory (cs.IT); Data Structures and Algorithms (cs.DS); Probability (math.PR); Statistics Theory (math.ST)
In the trace reconstruction problem, one observes the output of passing a binary string $s \in \{0,1\}^n$ through a deletion channel $T$ times and wishes to recover $s$ from the resulting $T$ "traces." Most of the literature has focused on characterizing the hardness of this problem in terms of the number of traces $T$ needed for perfect reconstruction either in the worst case or in the average case (over input sequences $s$). In this paper, we propose an alternative, instance-based approach to the problem. We define the "Levenshtein difficulty" of a problem instance $(s,T)$ as the probability that the resulting traces do not provide enough information for correct recovery with full certainty. One can then try to characterize, for a specific $s$, how $T$ needs to scale in order for the Levenshtein difficulty to go to zero, and seek reconstruction algorithms that match this scaling for each $s$. For a class of binary strings with alternating long runs, we precisely characterize the scaling of $T$ for which the Levenshtein difficulty goes to zero. For this class, we also prove that a simple "Las Vegas algorithm" has an error probability that decays to zero with the same rate as that with which the Levenshtein difficulty tends to zero.
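The observation model is simple to simulate. Each of the $T$ traces is produced by deleting every symbol of $s$ independently with some probability $q$, so every trace is a subsequence of $s$. The sketch below generates traces for a string with alternating long runs, the class the abstract highlights. It does not compute the Levenshtein difficulty or implement the paper's Las Vegas algorithm. `deletion_channel`, `traces` and `is_subsequence` are hypothetical names, and the i.i.d. deletion probability `q` is an assumption about the channel.

```python
import random

# Hypothetical helper names; simulates the deletion channel only.
def deletion_channel(s, q, rng):
    """Delete each symbol of s independently with probability q."""
    return "".join(c for c in s if rng.random() >= q)

def traces(s, q, T, seed=None):
    """T independent traces of s through the deletion channel."""
    rng = random.Random(seed)
    return [deletion_channel(s, q, rng) for _ in range(T)]

def is_subsequence(t, s):
    """True if t can be obtained from s by deleting symbols."""
    it = iter(s)
    return all(c in it for c in t)  # `in` advances the iterator past each match

s = "0000111100001111"  # alternating long runs
for tr in traces(s, q=0.3, T=3, seed=0):
    print(tr)
```

With `q=0` every trace equals `s`, and with `q=1` every trace is empty. The runs in `s` survive as shorter runs, which is the kind of structure a reconstruction algorithm can exploit.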