Statistics Theory
- [1] arXiv:2405.16088 [pdf, ps, html, other]
Title: Estimating the normal-inverse-Wishart distribution
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
The normal-inverse-Wishart (NIW) distribution is commonly used as a prior distribution for the mean and covariance parameters of a multivariate normal distribution. The family of NIW distributions is also a minimal exponential family. In this short note we describe a convergent procedure for converting from mean parameters to natural parameters in the NIW family, or -- equivalently -- for performing maximum likelihood estimation of the natural parameters given observed sufficient statistics. This is needed, for example, when using a NIW base family in expectation propagation.
- [2] arXiv:2405.16331 [pdf, ps, html, other]
Title: Confirming the Null: Remarks on Equivalence Testing and the Topology of Confirmation
Comments: These are old notes on a three-valued logic relating to equivalence/inequivalence. Comments and feedback welcome. Further revisions are expected over time
Subjects: Statistics Theory (math.ST); Logic (math.LO)
Null Hypothesis Statistical Testing is a dominant framework for conducting statistical analysis across the sciences. There remains considerable debate as to whether, and under what circumstances, evidence can be said to be confirmatory of a null hypothesis. This paper presents a modal logic of short-run frequentist confirmation developed by leveraging the duality between hypothesis testing and statistical estimation.
It is shown that a hypothesis is confirmable if and only if it satisfies the topological condition of having nonempty interior. Consequently, two-sided hypotheses are not statistically confirmable owing to defects in their topological structure. Equivalence hypotheses are, by contrast, confirmable.
- [3] arXiv:2405.16469 [pdf, ps, html, other]
Title: On Correlation Coefficients
Subjects: Statistics Theory (math.ST)
In the present paper, we discuss the Pearson, Spearman, and Kendall correlation coefficients and their statistical analogues. We propose a new correlation coefficient r and its statistical analogue. The coefficient r is based on Kendall's and Spearman's correlation coefficients. A new extension of the Pearson correlation coefficient is also discussed. We conduct simulation experiments and study the behavior of the above correlation coefficients. We observe that the behavior of Pearson's sample correlation coefficient can be very different from the behavior of the rank correlation coefficients, which, in turn, behave in a similar way. The question arises: which correlation coefficient better measures the dependence rate? We try to answer this question in the final conclusion.
- [4] arXiv:2405.16515 [pdf, ps, html, other]
Title: Adaptive estimation of $\mathbb{L}_2$-norm of a probability density and related topics I. Lower bounds
Subjects: Statistics Theory (math.ST)
We deal with the problem of the adaptive estimation of the $\mathbb{L}_2$-norm of a probability density on $\mathbb{R}^d$, $d\geq 1$, from independent observations. The unknown density is assumed to be uniformly bounded and to belong to the union of balls in the isotropic/anisotropic Nikolskii spaces. We show that optimally adaptive estimators over the collection of considered functional classes do not exist. Also, in the framework of an abstract density model, we present several generic lower bounds related to the adaptive estimation of an arbitrary functional of a probability density. These results, which are of independent interest, have no analogue in the existing literature. In the companion paper Cleanthous et al (2024) we prove that the established lower bounds are tight and provide an explicit construction of adaptive estimators of the $\mathbb{L}_2$-norm of the density.
- [5] arXiv:2405.16527 [pdf, ps, html, other]
Title: Adaptive estimation of the $\mathbb{L}_2$-norm of a probability density and related topics II. Upper bounds via the oracle approach
Subjects: Statistics Theory (math.ST)
This is the second part of the research project initiated in Cleanthous et al (2024). We deal with the problem of the adaptive estimation of the $\mathbb{L}_2$-norm of a probability density on $\mathbb{R}^d$, $d\geq 1$, from independent observations. The unknown density is assumed to be uniformly bounded by an unknown constant and to belong to the union of balls in the isotropic/anisotropic Nikolskii spaces. In Cleanthous et al (2024) we proved that optimally adaptive estimators do not exist in the considered problem and provided several lower bounds for the adaptive risk. In this part we show that these bounds are tight and present an adaptive estimator obtained by a data-driven selection from a family of kernel-based estimators. The proposed estimation procedure, as well as the computation of its risk, relies heavily on new concentration inequalities for decoupled $U$-statistics of order two established in Section 4. It is also worth noting that all our results are derived from a single oracle inequality, which may be of independent interest.
- [6] arXiv:2405.16696 [pdf, ps, html, other]
Title: How many samples are needed to train a deep neural network?
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
Neural networks have become standard tools in many areas, yet many important statistical questions remain open. This paper studies the question of how much data are needed to train a ReLU feed-forward neural network. Our theoretical and empirical results suggest that the generalization error of ReLU feed-forward neural networks scales at the rate $1/\sqrt{n}$ in the sample size $n$ rather than the usual "parametric rate" $1/n$. Thus, broadly speaking, our results underpin the common belief that neural networks need "many" training samples.
- [7] arXiv:2405.16736 [pdf, ps, html, other]
Title: A Separation in Heavy-Tailed Sampling: Gaussian vs. Stable Oracles for Proximal Samplers
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
We study the complexity of heavy-tailed sampling and present a separation result in terms of obtaining high-accuracy versus low-accuracy guarantees, i.e., samplers that require only $O(\log(1/\varepsilon))$ versus $\Omega(\text{poly}(1/\varepsilon))$ iterations to output a sample which is $\varepsilon$-close to the target in $\chi^2$-divergence. Our results are presented for proximal samplers that are based on Gaussian versus stable oracles. We show that proximal samplers based on the Gaussian oracle have a fundamental barrier in that they necessarily achieve only low-accuracy guarantees when sampling from a class of heavy-tailed targets. In contrast, proximal samplers based on the stable oracle exhibit high-accuracy guarantees, thereby overcoming the aforementioned limitation. We also prove lower bounds for samplers under the stable oracle and show that our upper bounds cannot be fundamentally improved.
- [8] arXiv:2405.17318 [pdf, ps, other]
Title: Extremal correlation coefficient for functional data
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
We propose a coefficient that measures dependence in paired samples of functions. It has properties similar to the Pearson correlation, but differs in significant ways: 1) it is designed to measure dependence between curves, 2) it focuses only on extreme curves. The new coefficient is derived within the framework of regular variation in Banach spaces. A consistent estimator is proposed and justified by an asymptotic analysis and a simulation study. The usefulness of the new coefficient is illustrated on financial and climate functional data.
New submissions for Tuesday, 28 May 2024 (showing 8 of 8 entries)
- [9] arXiv:2405.15952 (cross-list from stat.CO) [pdf, ps, html, other]
Title: Theoretical guarantees for lifted samplers
Subjects: Computation (stat.CO); Statistics Theory (math.ST)
Lifted samplers form a class of Markov chain Monte Carlo methods which has drawn a lot of attention in recent years due to superior performance in challenging Bayesian applications. A canonical example of such a sampler is the one that is derived from a random walk Metropolis algorithm for a totally-ordered state space such as the integers or the real numbers. The lifted sampler is derived by splitting the proposal distribution into two: one part in the increasing direction, and the other part in the decreasing direction. It keeps following a direction until a rejection, upon which it flips the direction. In terms of asymptotic variances, it outperforms the random walk Metropolis algorithm, regardless of the target distribution, at no additional computational cost. Other studies show, however, that beyond this simple case, lifted samplers do not always outperform their Metropolis counterparts. In this paper, we leverage the celebrated work of Tierney (1998) to provide an analysis in a general framework encompassing a broad class of lifted samplers. Our finding is that, essentially, the asymptotic variances cannot increase by a factor of more than 2, regardless of the target distribution, the way the directions are induced, and the type of algorithm from which the lifted sampler is derived (be it a Metropolis--Hastings algorithm, a reversible jump algorithm, etc.). This result indicates that, while there is potentially a lot to gain from lifting a sampler, there is not much to lose.
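The canonical example described in this abstract (follow one direction until a rejection, then flip) can be illustrated with a minimal sketch on the integers. This is not code from the paper: the function names, the unit-step proposal, and the discrete Gaussian-shaped target are all assumptions chosen for illustration.

```python
import math
import random


def lifted_sampler(log_target, x0, n_steps, seed=0):
    """Lifted random walk on the integers; the state is (position, direction)."""
    rng = random.Random(seed)
    x, direction = x0, 1  # assumed start: x0 has positive mass, direction +1
    samples = []
    for _ in range(n_steps):
        proposal = x + direction  # propose one unit step in the current direction
        log_ratio = log_target(proposal) - log_target(x)
        accept_prob = math.exp(min(0.0, log_ratio))  # Metropolis acceptance
        if rng.random() < accept_prob:
            x = proposal  # accepted: keep following the same direction
        else:
            direction = -direction  # rejected: stay put and flip the direction
        samples.append(x)
    return samples


def log_discrete_gaussian(x, sigma=3.0):
    """Hypothetical unnormalized log-target on the integers (illustration only)."""
    return -(x * x) / (2.0 * sigma * sigma)


samples = lifted_sampler(log_discrete_gaussian, x0=0, n_steps=20000, seed=1)
print(sum(samples) / len(samples))  # sample mean, expected to be near 0
```

The direction variable is the "lifting": the chain on the pair (position, direction) is non-reversible, yet its position marginal still targets the original distribution.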
- [10] arXiv:2405.16006 (cross-list from math.OC) [pdf, ps, html, other]
Title: Multifractal Analysis of the Sinkhorn Algorithm: Unveiling the Intricate Structure of Optimal Transport Maps
Comments: Submitted to the Journal of Fractal Geometry
Subjects: Optimization and Control (math.OC); Statistics Theory (math.ST)
The Sinkhorn algorithm has emerged as a powerful tool for solving optimal transport problems, finding applications in various domains such as machine learning, image processing, and computational biology. Despite its widespread use, the intricate structure and scaling properties of the coupling matrices generated by the Sinkhorn algorithm remain largely unexplored. In this paper, we delve into the multifractal properties of these coupling matrices, aiming to unravel their complex behavior and shed light on the underlying dynamics of the Sinkhorn algorithm. We prove the existence of the multifractal spectrum and the singularity spectrum for the Sinkhorn coupling matrices. Furthermore, we derive bounds on the generalized dimensions, providing a comprehensive characterization of their scaling properties. Our findings not only deepen our understanding of the Sinkhorn algorithm but also pave the way for novel applications and algorithmic improvements in the realm of optimal transport.
- [11] arXiv:2405.16458 (cross-list from econ.TH) [pdf, ps, html, other]
Title: Comparing experiments in discounted problems
Comments: 44 pages
Subjects: Theoretical Economics (econ.TH); Statistics Theory (math.ST)
This paper compares statistical experiments in discounted problems, ranging from the simplest ones where the state is fixed and the flow of information exogenous to more complex ones, where the decision-maker controls the flow of information or the state changes over time.
- [12] arXiv:2405.16644 (cross-list from stat.ML) [pdf, ps, html, other]
Title: Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Probability (math.PR); Statistics Theory (math.ST)
In this paper, we obtain the Berry-Esseen bound for multivariate normal approximation for the Polyak-Ruppert averaged iterates of the linear stochastic approximation (LSA) algorithm with decreasing step size. Our findings reveal that the fastest rate of normal approximation is achieved when setting the most aggressive step size $\alpha_{k} \asymp k^{-1/2}$. Moreover, we prove the non-asymptotic validity of the confidence intervals for parameter estimation with LSA based on multiplier bootstrap. This procedure updates the LSA estimate together with a set of randomly perturbed LSA estimates upon the arrival of subsequent observations. We illustrate our findings in the setting of temporal difference learning with linear function approximation.
- [13] arXiv:2405.16732 (cross-list from stat.ML) [pdf, ps, other]
Title: The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
In this work, we investigate stochastic approximation (SA) with Markovian data and nonlinear updates under constant stepsize $\alpha>0$. Existing work has primarily focused on either i.i.d. data or linear update rules. We take a new perspective and carefully examine the simultaneous presence of Markovian dependency of data and nonlinear update rules, delineating how the interplay between these two structures leads to complications that are not captured by prior techniques. By leveraging the smoothness and recurrence properties of the SA updates, we develop a fine-grained analysis of the correlation between the SA iterates $\theta_k$ and Markovian data $x_k$. This enables us to overcome the obstacles in existing analysis and establish for the first time the weak convergence of the joint process $(x_k, \theta_k)_{k\geq0}$. Furthermore, we present a precise characterization of the asymptotic bias of the SA iterates, given by $\mathbb{E}[\theta_\infty]-\theta^\ast=\alpha(b_\text{m}+b_\text{n}+b_\text{c})+O(\alpha^{3/2})$. Here, $b_\text{m}$ is associated with the Markovian noise, $b_\text{n}$ is tied to the nonlinearity, and notably, $b_\text{c}$ represents a multiplicative interaction between the Markovian noise and nonlinearity, which is absent in previous works. As a by-product of our analysis, we derive finite-time bounds on the higher moments $\mathbb{E}[\|\theta_k-\theta^\ast\|^{2p}]$ and present non-asymptotic geometric convergence rates for the iterates, along with a Central Limit Theorem.
- [14] arXiv:2405.16828 (cross-list from cs.LG) [pdf, ps, html, other]
Title: Kernel-based optimally weighted conformal prediction intervals
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Conformal prediction has been a popular distribution-free framework for uncertainty quantification. In this paper, we present a novel conformal prediction method for time-series, which we call Kernel-based Optimally Weighted Conformal Prediction Intervals (KOWCPI). Specifically, KOWCPI adapts the classic Reweighted Nadaraya-Watson (RNW) estimator for quantile regression on dependent data and learns optimal data-adaptive weights. Theoretically, we tackle the challenge of establishing a conditional coverage guarantee for non-exchangeable data under strong mixing conditions on the non-conformity scores. We demonstrate the superior performance of KOWCPI on real time-series against state-of-the-art methods, where KOWCPI achieves narrower confidence intervals without losing coverage.
- [15] arXiv:2405.17117 (cross-list from stat.ME) [pdf, ps, html, other]
Title: Robust Reproducible Network Exploration
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
We propose a novel method of network detection that is robust against any complex dependence structure. Our goal is to conduct exploratory network detection, meaning that we attempt to detect a network composed of "connectable" edges that are worth investigating in detail for further modelling or precise network analysis. For reproducible network detection, we pursue high power while controlling the false discovery rate (FDR). In particular, we formalize the problem as a multiple testing problem, and propose p-variables that are used in the Benjamini-Hochberg procedure. We show that the proposed method controls the FDR under arbitrary dependence structure with any sample size, and has asymptotic power one. The validity is also confirmed by simulations and a real data example.
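For readers unfamiliar with the Benjamini-Hochberg procedure mentioned in this abstract, the classical step-up rule can be sketched as follows. This is only the textbook procedure applied to generic p-values; it does not reproduce the paper's p-variable construction, and the function name and example values are assumptions for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices rejected by the classical BH step-up rule."""
    m = len(p_values)
    # Indices of the hypotheses ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up: remember the largest rank whose p-value is under its threshold.
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, alpha=0.05))  # → [0, 1]
```

In a network setting, each index would correspond to one candidate edge, so the returned indices are the edges flagged for further investigation.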
Cross submissions for Tuesday, 28 May 2024 (showing 7 of 7 entries)
- [16] arXiv:2012.02985 (replaced) [pdf, ps, html, other]
Title: Selecting the number of components in PCA via random signflips
Comments: 38 pages, 14 figures
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
Principal component analysis (PCA) is a foundational tool in modern data analysis, and a crucial step in PCA is selecting the number of components to keep. However, classical selection methods (e.g., scree plots, parallel analysis, etc.) lack statistical guarantees in the increasingly common setting of large-dimensional data with heterogeneous noise, i.e., where each entry may have a different noise variance. Moreover, it turns out that these methods, which are highly effective for homogeneous noise, can fail dramatically for data with heterogeneous noise. This paper proposes a new method called signflip parallel analysis (FlipPA) for the setting of approximately symmetric noise: it compares the data singular values to those of "empirical null" matrices generated by flipping the sign of each entry randomly with probability one-half. We develop a rigorous theory for FlipPA, showing that it has nonasymptotic type I error control and that it consistently selects the correct rank for signals rising above the noise floor in the large-dimensional limit (even when the noise is heterogeneous). We also rigorously explain why classical permutation-based parallel analysis degrades under heterogeneous noise. Finally, we illustrate that FlipPA compares favorably to state-of-the-art methods via numerical simulations and an illustration on data coming from astronomy.
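The comparison described in this abstract (data singular values against those of sign-flipped "empirical null" matrices) might be sketched roughly as below. The sequential stopping rule, the quantile threshold, the number of flips, and the synthetic data helper are all assumptions made for illustration; the paper's exact selection rule may differ.

```python
import numpy as np


def flip_pa_rank(X, n_flips=20, alpha=0.05, seed=0):
    """Sketch of a signflip parallel-analysis rank estimate (assumed rule)."""
    rng = np.random.default_rng(seed)
    data_sv = np.linalg.svd(X, compute_uv=False)
    null_sv = np.empty((n_flips, data_sv.size))
    for t in range(n_flips):
        # Flip the sign of each entry independently with probability one-half.
        signs = rng.choice([-1.0, 1.0], size=X.shape)
        null_sv[t] = np.linalg.svd(signs * X, compute_uv=False)
    # Assumed threshold: (1 - alpha) quantile of the k-th null singular value.
    thresholds = np.quantile(null_sv, 1.0 - alpha, axis=0)
    rank = 0
    for value, threshold in zip(data_sv, thresholds):
        if value <= threshold:
            break  # stop at the first component that does not exceed its null
        rank += 1
    return rank


def low_rank_plus_noise(n=200, p=100, strengths=(60.0, 40.0), seed=0):
    """Hypothetical test data: a low-rank signal plus standard Gaussian noise."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((n, len(strengths))))
    V, _ = np.linalg.qr(rng.standard_normal((p, len(strengths))))
    signal = (U * np.array(strengths)) @ V.T
    return signal + rng.standard_normal((n, p))


print(flip_pa_rank(low_rank_plus_noise(seed=0), seed=1))
```

Sign flips destroy the low-rank structure while preserving each entry's magnitude, which is why the flipped matrices can act as a null reference even when noise variances differ across entries.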
- [17] arXiv:2301.02098 (replaced) [pdf, ps, other]
Title: Another look at Stein's method for Studentized nonlinear statistics with an application to U-statistics
Comments: This is an improved version, where the B-E bound for the Studentized U-statistics now shows an explicit dependence on the degree of the kernel. Moreover, we added Liqian Zhang as a co-author who helped to work out this improved bound for Studentized U-statistics
Subjects: Statistics Theory (math.ST)
We take another look at using Stein's method to establish uniform Berry-Esseen bounds for Studentized nonlinear statistics, highlighting variable censoring and an exponential randomized concentration inequality for a sum of censored variables as the essential tools to carry out the arguments involved. As an important application, we prove a uniform Berry-Esseen bound for Studentized U-statistics in a form that exhibits the dependence on the degree of the kernel.
- [18] arXiv:2301.10498 (replaced) [pdf, ps, html, other]
Title: On deviation probabilities in non-parametric regression
Comments: The lower bound has been improved and the paper has been reorganized
Subjects: Statistics Theory (math.ST)
This paper is devoted to the problem of determining the concentration bounds that are achievable in non-parametric regression. We consider the setting where features are supported on a bounded subset of $\mathbb{R}^d$, the regression function is Lipschitz, and the noise is only assumed to have a finite second moment. We first specify the fundamental limits of the problem by establishing a general lower bound on deviation probabilities, and then construct explicit estimators that achieve this bound. These estimators are obtained by applying the median-of-means principle to classical local averaging rules in non-parametric regression, including nearest neighbors and kernel procedures.
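The idea of applying the median-of-means principle to a local averaging rule can be illustrated with a one-dimensional nearest-neighbor sketch. The interleaved block assignment, the parameter defaults, and the toy data are assumptions for illustration, not the estimators constructed in the paper.

```python
from statistics import mean, median


def mom_knn_predict(xs, ys, x0, k=2, n_blocks=3):
    """Median over blocks of per-block k-nearest-neighbor averages at x0."""
    estimates = []
    for b in range(n_blocks):
        # Assumed split: observation i goes to block i % n_blocks.
        block = [
            (abs(x - x0), y)
            for i, (x, y) in enumerate(zip(xs, ys))
            if i % n_blocks == b
        ]
        block.sort(key=lambda pair: pair[0])  # nearest points first
        estimates.append(mean(y for _, y in block[:k]))  # local average
    return median(estimates)  # robust aggregation across blocks


# Toy data: y = 2x on a grid, with one grossly corrupted response at x = 0.5.
xs = [i / 30 for i in range(30)]
ys = [2 * x for x in xs]
ys[15] = 1e6
print(mom_knn_predict(xs, ys, x0=0.5, k=2, n_blocks=3))  # close to 1.0
print(mom_knn_predict(xs, ys, x0=0.5, k=2, n_blocks=1))  # plain k-NN: ruined
```

Because the outlier can contaminate at most one block, the median across blocks ignores it, which is the intuition behind robustness under heavy-tailed noise.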
- [19] arXiv:2305.04199 (replaced) [pdf, ps, html, other]
Title: On statistics which are almost sufficient from the viewpoint of the Fisher metrics
Comments: 8 pages
Subjects: Statistics Theory (math.ST); Differential Geometry (math.DG); Probability (math.PR)
A statistic on a statistical model is sufficient if it has no information loss, namely, the Fisher metric of the induced model coincides with that of the original model due to Kullback and Ay-Jost-Lê-Schwachhöfer. We introduce a quantitatively weak version of sufficient statistics such that the Fisher metric of the induced model is bi-Lipschitz equivalent to that of the original model. We characterize such statistics in terms of the conditional probability or by the existence of a certain decomposition of the density function in a way similar to characterizations of sufficient statistics due to Fisher-Neyman and Ay-Jost-Lê-Schwachhöfer.
- [20] arXiv:2306.16790 (replaced) [pdf, ps, html, other]
Title: Quasi-Likelihood Analysis for Student-Lévy Regression
Subjects: Statistics Theory (math.ST); Applications (stat.AP); Computation (stat.CO)
We consider the quasi-likelihood analysis for a linear regression model driven by a Student-t Lévy process with constant scale and arbitrary degrees of freedom. The model is observed at high frequency over an extending period, under which we can quantify how the sampling frequency affects estimation accuracy. In that setting, joint estimation of trend, scale, and degrees of freedom is a non-trivial problem. The bottleneck is that the Student-t distribution is not closed under convolution, making it difficult to estimate all the parameters fully based on the high-frequency time scale. To efficiently deal with the intricate nature from both theoretical and computational points of view, we propose a two-step quasi-likelihood analysis: first, we make use of the Cauchy quasi-likelihood for estimating the regression-coefficient vector and the scale parameter; then, we construct the sequence of the unit-period cumulative residuals to estimate the remaining degrees of freedom. In particular, using full data in the first step causes a problem stemming from the small-time Cauchy approximation, showing the need for data thinning.
- [21] arXiv:2402.11219 (replaced) [pdf, ps, html, other]
Title: Estimators for multivariate allometric regression model
Comments: 20 pages
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
In a regression model with multiple response variables and multiple explanatory variables, if the difference of the mean vectors of the response variables for different values of explanatory variables is always in the direction of the first principal eigenvector of the covariance matrix of the response variables, then it is called a multivariate allometric regression model. This paper studies the estimation of the first principal eigenvector in the multivariate allometric regression model. A class of estimators that includes conventional estimators is proposed based on weighted sums of the regression sum-of-squares matrix and the residual sum-of-squares matrix. We establish an upper bound of the mean squared error of the estimators contained in this class, and the weight value minimizing the upper bound is derived. Sufficient conditions for the consistency of the estimators are discussed in weak identifiability regimes, under which the difference of the largest and second largest eigenvalues of the covariance matrix decays asymptotically, and in "large $p$, large $n$" regimes, where $p$ is the number of response variables and $n$ is the sample size. Several numerical results are also presented.
- [22] arXiv:2405.09511 (replaced) [pdf, ps, html, other]
Title: Stability via resampling: statistical problems beyond the real line
Subjects: Statistics Theory (math.ST)
Model averaging techniques based on resampling methods (such as bootstrapping or subsampling) have been utilized across many areas of statistics, often with the explicit goal of promoting stability in the resulting output. We provide a general, finite-sample theoretical result guaranteeing the stability of bagging when applied to algorithms that return outputs in a general space, so that the output is not necessarily real-valued -- for example, an algorithm that estimates a vector of weights or a density function. We empirically assess the stability of bagging on synthetic and real-world data for a range of problem settings, including causal inference, nonparametric regression, and Bayesian model selection.
- [23] arXiv:2405.13140 (replaced) [pdf, ps, html, other]
Title: On Convergence of the Alternating Directions SGHMC Algorithm
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Probability (math.PR)
We study convergence rates of Hamiltonian Monte Carlo (HMC) algorithms with leapfrog integration under mild conditions on the stochastic gradient oracle for the target distribution (SGHMC). Our method extends standard HMC by allowing the use of general auxiliary distributions, which is achieved by a novel procedure of Alternating Directions.
The convergence analysis is based on the investigation of the Dirichlet forms associated with the underlying Markov chain driving the algorithms. For this purpose, we provide a detailed analysis of the error of the leapfrog integrator for Hamiltonian motions with both the kinetic and potential energy functions in general form. We characterize the explicit dependence of the convergence rates on key parameters such as the problem dimension, functional properties of both the target and auxiliary distributions, and the quality of the oracle.
- [24] arXiv:2310.08566 (replaced) [pdf, ps, other]
Title: Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Statistics Theory (math.ST); Machine Learning (stat.ML)
Large transformer models pretrained on offline reinforcement learning datasets have demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they can make good decisions when prompted with interaction trajectories from unseen environments. However, when and how transformers can be trained to perform ICRL have not been theoretically well-understood. In particular, it is unclear which reinforcement-learning algorithms transformers can perform in context, and how distribution mismatch in offline training data affects the learned algorithms. This paper provides a theoretical framework that analyzes supervised pretraining for ICRL. This includes two recently proposed training methods -- algorithm distillation and decision-pretrained transformers. First, assuming model realizability, we prove the supervised-pretrained transformer will imitate the conditional expectation of the expert algorithm given the observed trajectory. The generalization error will scale with model capacity and a distribution divergence factor between the expert and offline algorithms. Second, we show transformers with ReLU attention can efficiently approximate near-optimal online reinforcement learning algorithms like LinUCB and Thompson sampling for stochastic linear bandits, and UCB-VI for tabular Markov decision processes. This provides the first quantitative analysis of the ICRL capabilities of transformers pretrained from offline trajectories.
- [25] arXiv:2312.06098 (replaced) [pdf, ps, html, other]
Title: Mixture Matrix-valued Autoregressive Model
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
Time series of matrix-valued data are increasingly available in various areas including economics, finance, social science, etc. These data may shed light on the inter-dynamical relationships between two sets of attributes, for instance countries and economic indices. The matrix autoregressive (MAR) model provides a parsimonious approach for analyzing such data. However, the MAR model, being a linear model with parametric constraints, cannot capture the nonlinear patterns in the data, such as regime shifts in the dynamics. We propose a mixture matrix autoregressive (MMAR) model for analyzing potential regime shifts in the dynamics between two attributes, for instance, due to recession vs. blooming, or quiet period vs. pandemic. We propose an EM algorithm for maximum likelihood estimation. We derive some theoretical properties of the proposed method including consistency and asymptotic distribution, and illustrate its performance via simulations and real applications.
- [26] arXiv:2402.09626 (replaced) [pdf, ps, html, other]
Title: Degrees of the Wasserstein Distance to Small Toric Models
Comments: 22 pages, 6 figures, 3 tables
Subjects: Algebraic Geometry (math.AG); Statistics Theory (math.ST)
The study of the closest point(s) on a statistical model from a given distribution in the probability simplex with respect to a fixed Wasserstein metric gives rise to a polyhedral norm distance optimization problem. There are two components to the complexity of determining the Wasserstein distance from a data point to a model. One is the combinatorial complexity that is governed by the combinatorics of the Lipschitz polytope of the finite metric to be used. Another is the algebraic complexity, which is governed by the polar degrees of the Zariski closure of the model. We find formulas for the polar degrees of rational normal scrolls and graphical models whose underlying graphs are star trees. Also, the polar degrees of the graphical models with four binary random variables where the graphs are a path on four vertices and the four-cycle, as well as for small, no-three-way interaction models, were computed. We investigate the algebraic degree of computing the Wasserstein distance to a small subset of these models. It was observed that this algebraic degree is typically smaller than the corresponding polar degree.
- [27] arXiv:2402.17886 (replaced) [pdf, ps, html, other]
Title: Zeroth-Order Sampling Methods for Non-Log-Concave Distributions: Alleviating Metastability by Denoising Diffusion
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST); Methodology (stat.ME)
This paper considers the problem of sampling from a non-log-concave distribution, based on queries of its unnormalized density. It first describes a framework, Diffusion Monte Carlo (DMC), based on the simulation of a denoising diffusion process with its score function approximated by a generic Monte Carlo estimator. DMC is an oracle-based meta-algorithm, where its oracle is the assumed access to samples that generate a Monte Carlo score estimator. Then we provide an implementation of this oracle, based on rejection sampling, and this turns DMC into a true algorithm, termed Zeroth-Order Diffusion Monte Carlo (ZOD-MC). We provide convergence analyses by first constructing a general framework, i.e. a performance guarantee for DMC, without assuming the target distribution to be log-concave or to satisfy any isoperimetric inequality. Then we prove that ZOD-MC admits an inverse polynomial dependence on the desired sampling accuracy, albeit still suffering from the curse of dimensionality. Consequently, for low dimensional distributions, ZOD-MC is a very efficient sampler, with performance exceeding that of the latest samplers, including the also-denoising-diffusion-based RDMC and RS-DMC. Last, we experimentally demonstrate the insensitivity of ZOD-MC to increasingly higher barriers between modes or discontinuity in non-convex potential.
- [28] arXiv:2405.14686 (replaced) [pdf, ps, html, other]
Title: Efficient Algorithms for the Sensitivities of the Pearson Correlation Coefficient and Its Statistical Significance to Online Data
Comments: Edited typos in: Section 1, paragraph 2; Section 1, paragraph 4; Proof of Lemma 3.2; Numbering in Lemma 3.6; Section 4, paragraph 2; Section 5.1, paragraph 1; Section 5.1, paragraph 2. Added final sentence to Section 5.2, paragraph 4
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
Reliably measuring the collinearity of bivariate data is crucial in statistics, particularly for time-series analysis or ongoing studies in which incoming observations can significantly impact current collinearity estimates. Leveraging identities from Welford's online algorithm for sample variance, we develop a rigorous theoretical framework for analyzing the maximal change to the Pearson correlation coefficient and its p-value that can be induced by additional data. Further, we show that the resulting optimization problems yield elegant closed-form solutions that can be accurately computed by linear- and constant-time algorithms. Our work not only creates new theoretical avenues for robust correlation measures, but also has broad practical implications for disciplines that span econometrics, operations research, clinical trials, climatology, differential privacy, and bioinformatics. Software implementations of our algorithms in Cython-wrapped C are made available at this https URL for reproducibility, practical deployment, and future theoretical development.
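As background for the Welford-style identities this abstract relies on, here is a minimal sketch of a single-pass update for the Pearson correlation coefficient. It shows only the standard online mean, variance, and co-moment recursion; it is not the paper's sensitivity algorithm, and the class and attribute names are assumptions.

```python
import math


class OnlinePearson:
    """Single-pass Pearson correlation using Welford-style updates."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # running sum of squared deviations of x
        self.m2_y = 0.0  # running sum of squared deviations of y
        self.c_xy = 0.0  # running sum of cross deviations

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x  # deviation from the old mean of x
        dy = y - self.mean_y  # deviation from the old mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Each update multiplies an old-mean deviation by a new-mean deviation.
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def correlation(self):
        denom = math.sqrt(self.m2_x * self.m2_y)
        return self.c_xy / denom if denom > 0 else float("nan")


stream = OnlinePearson()
for x, y in zip([1, 2, 3, 4], [1, 3, 2, 4]):
    stream.update(x, y)
print(stream.correlation())  # approximately 0.8
```

Each new observation costs constant time and memory, which is what makes it natural to ask how much one additional data point can move the coefficient.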