Computation
- [1] arXiv:2405.13574 [pdf, ps, html, other]
-
Title: Reinforcement Learning for Adaptive MCMCSubjects: Computation (stat.CO); Machine Learning (cs.LG)
An informal observation, made by several authors, is that the adaptive design of a Markov transition kernel has the flavour of a reinforcement learning task. Yet, to-date it has remained unclear how to actually exploit modern reinforcement learning technologies for adaptive MCMC. The aim of this paper is to set out a general framework, called Reinforcement Learning Metropolis--Hastings, that is theoretically supported and empirically validated. Our principal focus is on learning fast-mixing Metropolis--Hastings transition kernels, which we cast as deterministic policies and optimise via a policy gradient. Control of the learning rate provably ensures conditions for ergodicity are satisfied. The methodology is used to construct a gradient-free sampler that out-performs a popular gradient-free adaptive Metropolis--Hastings algorithm on $\approx 90 \%$ of tasks in the PosteriorDB benchmark.
- [2] arXiv:2405.13821 [pdf, ps, html, other]
-
Title: Normalizing Basis Functions: Approximate Stationary Models for Large Spatial DataSubjects: Computation (stat.CO); Numerical Analysis (math.NA); Applications (stat.AP)
In geostatistics, traditional spatial models often rely on the Gaussian Process (GP) to fit stationary covariances to data. It is well known that this approach becomes computationally infeasible when dealing with large data volumes, necessitating the use of approximate methods. A powerful class of methods approximate the GP as a sum of basis functions with random coefficients. Although this technique offers computational efficiency, it does not inherently guarantee a stationary covariance. To mitigate this issue, the basis functions can be "normalized" to maintain a constant marginal variance, avoiding unwanted artifacts and edge effects. This allows for the fitting of nearly stationary models to large, potentially non-stationary datasets, providing a rigorous base to extend to more complex problems. Unfortunately, the process of normalizing these basis functions is computationally demanding. To address this, we introduce two fast and accurate algorithms to the normalization step, allowing for efficient prediction on fine grids. The practical value of these algorithms is showcased in the context of a spatial analysis on a large dataset, where significant computational speedups are achieved. While implementation and testing are done specifically within the LatticeKrig framework, these algorithms can be adapted to other basis function methods operating on regular grids.
New submissions for Friday, 24 May 2024 (showing 2 of 2 entries )
- [3] arXiv:2405.13149 (cross-list from stat.ML) [pdf, ps, html, other]
-
Title: Gaussian Measures Conditioned on Nonlinear Observations: Consistency, MAP Estimators, and SimulationSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA); Probability (math.PR); Computation (stat.CO)
The article presents a systematic study of the problem of conditioning a Gaussian random variable $\xi$ on nonlinear observations of the form $F \circ \phi(\xi)$ where $\phi: \mathcal{X} \to \mathbb{R}^N$ is a bounded linear operator and $F$ is nonlinear. Such problems arise in the context of Bayesian inference and recent machine learning-inspired PDE solvers. We give a representer theorem for the conditioned random variable $\xi \mid F\circ \phi(\xi)$, stating that it decomposes as the sum of an infinite-dimensional Gaussian (which is identified analytically) as well as a finite-dimensional non-Gaussian measure. We also introduce a novel notion of the mode of a conditional measure by taking the limit of the natural relaxation of the problem, to which we can apply the existing notion of maximum a posteriori estimators of posterior measures. Finally, we introduce a variant of the Laplace approximation for the efficient simulation of the aforementioned conditioned Gaussian random variables towards uncertainty quantification.
- [4] arXiv:2405.13537 (cross-list from stat.ME) [pdf, ps, html, other]
-
Title: Sequential Bayesian inference for stochastic epidemic models of cumulative incidenceComments: 27 pagesSubjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO)
Epidemics are inherently stochastic, and stochastic models provide an appropriate way to describe and analyse such phenomena. Given temporal incidence data consisting of, for example, the number of new infections or removals in a given time window, a continuous-time discrete-valued Markov process provides a natural description of the dynamics of each model component, typically taken to be the number of susceptible, exposed, infected or removed individuals. Fitting the SEIR model to time-course data is a challenging problem due incomplete observations and, consequently, the intractability of the observed data likelihood. Whilst sampling based inference schemes such as Markov chain Monte Carlo are routinely applied, their computational cost typically restricts analysis to data sets of no more than a few thousand infective cases. Instead, we develop a sequential inference scheme that makes use of a computationally cheap approximation of the most natural Markov process model. Crucially, the resulting model allows a tractable conditional parameter posterior which can be summarised in terms of a set of low dimensional statistics. This is used to rejuvenate parameter samples in conjunction with a novel bridge construct for propagating state trajectories conditional on the next observation of cumulative incidence. The resulting inference framework also allows for stochastic infection and reporting rates. We illustrate our approach using synthetic and real data applications.
- [5] arXiv:2405.13655 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: A Deep Learning Approach to Multi-Fiber Parameter Estimation and Uncertainty Quantification in Diffusion MRISubjects: Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM); Applications (stat.AP); Computation (stat.CO)
Diffusion MRI (dMRI) is the primary imaging modality used to study brain microstructure in vivo. Reliable and computationally efficient parameter inference for common dMRI biophysical models is a challenging inverse problem, due to factors such as variable dimensionalities (reflecting the unknown number of distinct white matter fiber populations in a voxel), low signal-to-noise ratios, and non-linear forward models. These challenges have led many existing methods to use biologically implausible simplified models to stabilize estimation, for instance, assuming shared microstructure across all fiber populations within a voxel. In this work, we introduce a novel sequential method for multi-fiber parameter inference that decomposes the task into a series of manageable subproblems. These subproblems are solved using deep neural networks tailored to problem-specific structure and symmetry, and trained via simulation. The resulting inference procedure is largely amortized, enabling scalable parameter estimation and uncertainty quantification across all model parameters. Simulation studies and real imaging data analysis using the Human Connectome Project (HCP) demonstrate the advantages of our method over standard alternatives. In the case of the standard model of diffusion, our results show that under HCP-like acquisition schemes, estimates for extra-cellular parallel diffusivity are highly uncertain, while those for the intra-cellular volume fraction can be estimated with relatively high precision.
- [6] arXiv:2405.13731 (cross-list from stat.ML) [pdf, ps, html, other]
-
Title: Control, Transport and Sampling: Towards Better Loss DesignSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
Leveraging connections between diffusion-based sampling, optimal transport, and optimal stochastic control through their shared links to the Schrödinger bridge problem, we propose novel objective functions that can be used to transport $\nu$ to $\mu$, consequently sample from the target $\mu$, via optimally controlled dynamics. We highlight the importance of the pathwise perspective and the role various optimality conditions on the path measure can play for the design of valid training losses, the careful choice of which offer numerical advantages in practical implementation.
- [7] arXiv:2405.13794 (cross-list from stat.ML) [pdf, ps, html, other]
-
Title: Conditioning diffusion models by explicit forward-backward bridgingComments: 24 pages, 12 figuresSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO); Methodology (stat.ME)
Given an unconditional diffusion model $\pi(x, y)$, using it to perform conditional simulation $\pi(x \mid y)$ is still largely an open question and is typically achieved by learning conditional drifts to the denoising SDE after the fact. In this work, we express conditional simulation as an inference problem on an augmented space corresponding to a partial SDE bridge. This perspective allows us to implement efficient and principled particle Gibbs and pseudo-marginal samplers marginally targeting the conditional distribution $\pi(x \mid y)$. Contrary to existing methodology, our methods do not introduce any additional approximation to the unconditional diffusion model aside from the Monte Carlo error. We showcase the benefits and drawbacks of our approach on a series of synthetic and real data examples.
- [8] arXiv:2405.14048 (cross-list from stat.ME) [pdf, ps, html, other]
-
Title: fsemipar: an R package for SoF semiparametric regressionComments: 41 pages, 7 figures, 11 tablesSubjects: Methodology (stat.ME); Computation (stat.CO)
Functional data analysis has become a tool of interest in applied areas such as economics, medicine, and chemistry. Among the techniques developed in recent literature, functional semiparametric regression stands out for its balance between flexible modelling and output interpretation. Despite the large variety of research papers dealing with scalar-on-function (SoF) semiparametric models, there is a notable gap in software tools for their implementation. This article introduces the R package \texttt{fsemipar}, tailored for these models. \texttt{fsemipar} not only estimates functional single-index models using kernel smoothing techniques but also estimates and selects relevant scalar variables in semi-functional models with multivariate linear components. A standout feature is its ability to identify impact points of a curve on the response, even in models with multiple functional covariates, and to integrate both continuous and pointwise effects of functional predictors within a single model. In addition, it allows the use of location-adaptive estimators based on the $k$-nearest-neighbours approach for all the semiparametric models included. Its flexible interface empowers users to customise a wide range of input parameters and includes the standard S3 methods for prediction, statistical analysis, and estimate visualization (\texttt{predict}, \texttt{summary}, \texttt{print}, and \texttt{plot}), enhancing clear result interpretation. Throughout the article, we illustrate the functionalities and the practicality of \texttt{fsemipar} using two chemometric datasets.
- [9] arXiv:2405.14149 (cross-list from stat.ME) [pdf, ps, html, other]
-
Title: A Direct Importance Sampling-based Framework for Rare Event Uncertainty Quantification in Non-Gaussian SpacesSubjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO)
This work introduces a novel framework for precisely and efficiently estimating rare event probabilities in complex, high-dimensional non-Gaussian spaces, building on our foundational Approximate Sampling Target with Post-processing Adjustment (ASTPA) approach. An unnormalized sampling target is first constructed and sampled, relaxing the optimal importance sampling distribution and appropriately designed for non-Gaussian spaces. Post-sampling, its normalizing constant is estimated using a stable inverse importance sampling procedure, employing an importance sampling density based on the already available samples. The sought probability is then computed based on the estimates evaluated in these two stages. The proposed estimator is theoretically analyzed, proving its unbiasedness and deriving its analytical coefficient of variation. To sample the constructed target, we resort to our developed Quasi-Newton mass preconditioned Hamiltonian MCMC (QNp-HMCMC) and we prove that it converges to the correct stationary target distribution. To avoid the challenging task of tuning the trajectory length in complex spaces, QNp-HMCMC is effectively utilized in this work with a single-step integration. We thus show the equivalence of QNp-HMCMC with single-step implementation to a unique and efficient preconditioned Metropolis-adjusted Langevin algorithm (MALA). An optimization approach is also leveraged to initiate QNp-HMCMC effectively, and the implementation of the developed framework in bounded spaces is eventually discussed. A series of diverse problems involving high dimensionality (several hundred inputs), strong nonlinearity, and non-Gaussianity is presented, showcasing the capabilities and efficiency of the suggested framework and demonstrating its advantages compared to relevant state-of-the-art sampling methods.
- [10] arXiv:2405.14373 (cross-list from math.PR) [pdf, ps, html, other]
-
Title: Skew-symmetric schemes for stochastic differential equations with non-Lipschitz drift: an unadjusted Barker algorithmComments: 43 pages, 3 figures Keywords: Skew-symmetric distributions, Stochastic differential equations, Sampling algorithms, Markov Chain Monte Carlo,Subjects: Probability (math.PR); Numerical Analysis (math.NA); Computation (stat.CO)
We propose a new simple and explicit numerical scheme for time-homogeneous stochastic differential equations. The scheme is based on sampling increments at each time step from a skew-symmetric probability distribution, with the level of skewness determined by the drift and volatility of the underlying process. We show that as the step-size decreases the scheme converges weakly to the diffusion of interest. We then consider the problem of simulating from the limiting distribution of an ergodic diffusion process using the numerical scheme with a fixed step-size. We establish conditions under which the numerical scheme converges to equilibrium at a geometric rate, and quantify the bias between the equilibrium distributions of the scheme and of the true diffusion process. Notably, our results do not require a global Lipschitz assumption on the drift, in contrast to those required for the Euler--Maruyama scheme for long-time simulation at fixed step-sizes. Our weak convergence result relies on an extension of the theory of Milstein \& Tretyakov to stochastic differential equations with non-Lipschitz drift, which could also be of independent interest. We support our theoretical results with numerical simulations.
- [11] arXiv:2405.14408 (cross-list from math.NA) [pdf, ps, html, other]
-
Title: Adaptive tempering schedules with approximative intermediate measures for filtering problemsSubjects: Numerical Analysis (math.NA); Computation (stat.CO)
Data assimilation algorithms integrate prior information from numerical model simulations with observed data. Ensemble-based filters, regarded as state-of-the-art, are widely employed for large-scale estimation tasks in disciplines such as geoscience and meteorology. Despite their inability to produce the true posterior distribution for nonlinear systems, their robustness and capacity for state tracking are noteworthy. In contrast, Particle filters yield the correct distribution in the ensemble limit but require substantially larger ensemble sizes than ensemble-based filters to maintain stability in higher-dimensional spaces. It is essential to transcend traditional Gaussian assumptions to achieve realistic quantification of uncertainties. One approach involves the hybridisation of filters, facilitated by tempering, to harness the complementary strengths of different filters. A new adaptive tempering method is proposed to tune the underlying schedule, aiming to systematically surpass the performance previously achieved. Although promising numerical results for certain filter combinations in toy examples exist in the literature, the tuning of hyperparameters presents a considerable challenge. A deeper understanding of these interactions is crucial for practical applications.
- [12] arXiv:2405.14628 (cross-list from stat.ME) [pdf, ps, html, other]
-
Title: Online robust estimation and bootstrap inference for function-on-scalar regressionSubjects: Methodology (stat.ME); Computation (stat.CO)
We propose a novel and robust online function-on-scalar regression technique via geometric median to learn associations between functional responses and scalar covariates based on massive or streaming datasets. The online estimation procedure, developed using the average stochastic gradient descent algorithm, offers an efficient and cost-effective method for analyzing sequentially augmented datasets, eliminating the need to store large volumes of data in memory. We establish the almost sure consistency, $L_p$ convergence, and asymptotic normality of the online estimator. To enable efficient and fast inference of the parameters of interest, including the derivation of confidence intervals, we also develop an innovative two-step online bootstrap procedure to approximate the limiting error distribution of the robust online estimator. Numerical studies under a variety of scenarios demonstrate the effectiveness and efficiency of the proposed online learning method. A real application analyzing PM$_{2.5}$ air-quality data is also included to exemplify the proposed online approach.
Cross submissions for Friday, 24 May 2024 (showing 10 of 10 entries )
- [13] arXiv:2309.09367 (replaced) [pdf, ps, html, other]
-
Title: ForLion: A New Algorithm for D-optimal Designs under General Parametric Statistical Models with Mixed FactorsComments: 36 pages, 7 tables, 5 figuresSubjects: Computation (stat.CO); Methodology (stat.ME)
In this paper, we address the problem of designing an experimental plan with both discrete and continuous factors under fairly general parametric statistical models. We propose a new algorithm, named ForLion, to search for locally optimal approximate designs under the D-criterion. The algorithm performs an exhaustive search in a design space with mixed factors while keeping high efficiency and reducing the number of distinct experimental settings. Its optimality is guaranteed by the general equivalence theorem. We present the relevant theoretical results for multinomial logit models (MLM) and generalized linear models (GLM), and demonstrate the superiority of our algorithm over state-of-the-art design algorithms using real-life experiments under MLM and GLM. Our simulation studies show that the ForLion algorithm could reduce the number of experimental settings by 25% or improve the relative efficiency of the designs by 17.5% on average. Our algorithm can help the experimenters reduce the time cost, the usage of experimental devices, and thus the total cost of their experiments while preserving high efficiencies of the designs.
- [14] arXiv:2310.09818 (replaced) [pdf, ps, html, other]
-
Title: MCMC for Bayesian nonparametric mixture modeling under differential privacySubjects: Computation (stat.CO); Methodology (stat.ME)
Estimating the probability density of a population while preserving the privacy of individuals in that population is an important and challenging problem that has received considerable attention in recent years. While the previous literature focused on frequentist approaches, in this paper, we propose a Bayesian nonparametric mixture model under differential privacy (DP) and present two Markov chain Monte Carlo (MCMC) algorithms for posterior inference. One is a marginal approach, resembling Neal's algorithm 5 with a pseudo-marginal Metropolis-Hastings move, and the other is a conditional approach. Although our focus is primarily on local DP, we show that our MCMC algorithms can be easily extended to deal with global differential privacy mechanisms. Moreover, for some carefully chosen mechanisms and mixture kernels, we show how auxiliary parameters can be analytically marginalized, allowing standard MCMC algorithms (i.e., non-privatized, such as Neal's Algorithm 2) to be efficiently employed. Our approach is general and applicable to any mixture model and privacy mechanism. In several simulations and a real case study, we discuss the performance of our algorithms and evaluate different privacy mechanisms proposed in the frequentist literature.
- [15] arXiv:2310.19245 (replaced) [pdf, ps, html, other]
-
Title: Efficient Shapley Performance Attribution for Least-Squares RegressionComments: 36 pages, 5 figuresSubjects: Computation (stat.CO)
We consider the performance of a least-squares regression model, as judged by out-of-sample $R^2$. Shapley values give a fair attribution of the performance of a model to its input features, taking into account interdependencies between features. Evaluating the Shapley values exactly requires solving a number of regression problems that is exponential in the number of features, so a Monte Carlo-type approximation is typically used. We focus on the special case of least-squares regression models, where several tricks can be used to compute and evaluate regression models efficiently. These tricks give a substantial speed up, allowing many more Monte Carlo samples to be evaluated, achieving better accuracy. We refer to our method as least-squares Shapley performance attribution (LS-SPA), and describe our open-source implementation.
- [16] arXiv:2311.05025 (replaced) [pdf, ps, html, other]
-
Title: Unbiased Kinetic Langevin Monte Carlo with Inexact GradientsComments: 99 Pages, 13 FiguresSubjects: Computation (stat.CO); Numerical Analysis (math.NA); Methodology (stat.ME); Machine Learning (stat.ML)
We present an unbiased method for Bayesian posterior means based on kinetic Langevin dynamics that combines advanced splitting methods with enhanced gradient approximations. Our approach avoids Metropolis correction by coupling Markov chains at different discretization levels in a multilevel Monte Carlo approach. Theoretical analysis demonstrates that our proposed estimator is unbiased, attains finite variance, and satisfies a central limit theorem. It can achieve accuracy $\epsilon>0$ for estimating expectations of Lipschitz functions in $d$ dimensions with $\mathcal{O}(d^{1/4}\epsilon^{-2})$ expected gradient evaluations, without assuming warm start. We exhibit similar bounds using both approximate and stochastic gradients, and our method's computational cost is shown to scale independently of the size of the dataset. The proposed method is tested using a multinomial regression problem on the MNIST dataset and a Poisson regression model for soccer scores. Experiments indicate that the number of gradient evaluations per effective sample is independent of dimension, even when using inexact gradients. For product distributions, we give dimension-independent variance bounds. Our results demonstrate that the unbiased algorithm we present can be much more efficient than the ``gold-standard" randomized Hamiltonian Monte Carlo.
- [17] arXiv:2302.10684 (replaced) [pdf, ps, html, other]
-
Title: Contraction and Convergence Rates for Discretized Kinetic Langevin DynamicsComments: 34 pages, 1 figureJournal-ref: SIAM Journal on Numerical Analysis, 62(3):1226-1258, 2024Subjects: Numerical Analysis (math.NA); Computation (stat.CO)
We provide a framework to analyze the convergence of discretized kinetic Langevin dynamics for $M$-$\nabla$Lipschitz, $m$-convex potentials. Our approach gives convergence rates of $\mathcal{O}(m/M)$, with explicit stepsize restrictions, which are of the same order as the stability threshold for Gaussian targets and are valid for a large interval of the friction parameter. We apply this methodology to various integration schemes which are popular in the molecular dynamics and machine learning communities. Further, we introduce the property ``$\gamma$-limit convergent" (GLC) to characterize underdamped Langevin schemes that converge to overdamped dynamics in the high-friction limit and which have stepsize restrictions that are independent of the friction parameter; we show that this property is not generic by exhibiting methods from both the class and its complement. Finally, we provide asymptotic bias estimates for the BAOAB scheme, which remain accurate in the high-friction limit by comparison to a modified stochastic dynamics which preserves the invariant measure.
- [18] arXiv:2308.05564 (replaced) [pdf, ps, html, other]
-
Title: Large Skew-t Copula Models and Asymmetric Dependence in Intraday Equity ReturnsSubjects: Econometrics (econ.EM); Machine Learning (cs.LG); Statistical Finance (q-fin.ST); Computation (stat.CO)
Skew-t copula models are attractive for the modeling of financial data because they allow for asymmetric and extreme tail dependence. We show that the copula implicit in the skew-t distribution of Azzalini and Capitanio (2003) allows for a higher level of pairwise asymmetric dependence than two popular alternative skew-t copulas. Estimation of this copula in high dimensions is challenging, and we propose a fast and accurate Bayesian variational inference (VI) approach to do so. The method uses a generative representation of the skew-t distribution to define an augmented posterior that can be approximated accurately. A stochastic gradient ascent algorithm is used to solve the variational optimization. The methodology is used to estimate skew-t factor copula models with up to 15 factors for intraday returns from 2017 to 2021 on 93 U.S. equities. The copula captures substantial heterogeneity in asymmetric dependence over equity pairs, in addition to the variability in pairwise correlations. In a moving window study we show that the asymmetric dependencies also vary over time, and that intraday predictive densities from the skew-t copula are more accurate than those from benchmark copula models. Portfolio selection strategies based on the estimated pairwise asymmetric dependencies improve performance relative to the index.
- [19] arXiv:2309.15600 (replaced) [pdf, ps, html, other]
-
Title: pencal: an R Package for the Dynamic Prediction of Survival with Many Longitudinal PredictorsSubjects: Methodology (stat.ME); Computation (stat.CO)
In survival analysis, longitudinal information on the health status of a patient can be used to dynamically update the predicted probability that a patient will experience an event of interest. Traditional approaches to dynamic prediction such as joint models become computationally unfeasible with more than a handful of longitudinal covariates, warranting the development of methods that can handle a larger number of longitudinal covariates. We introduce the R package pencal, which implements a Penalized Regression Calibration approach that makes it possible to handle many longitudinal covariates as predictors of survival. pencal uses mixed-effects models to summarize the trajectories of the longitudinal covariates up to a prespecified landmark time, and a penalized Cox model to predict survival based on both baseline covariates and summary measures of the longitudinal covariates. This article illustrates the structure of the R package, provides a step by step example showing how to estimate PRC, compute dynamic predictions of survival and validate performance, and shows how parallelization can be used to significantly reduce computing time.