Statistics
- [1] arXiv:2407.13814 [pdf, html, other]
-
Title: Building Population-Informed Priors for Bayesian Inference Using Data-Consistent Stochastic Inversion
Subjects: Methodology (stat.ME)
Bayesian inference provides a powerful tool for leveraging observational data to inform model predictions and uncertainties. However, when such data is limited, Bayesian inference may not adequately constrain uncertainty without the use of highly informative priors. Common approaches for constructing informative priors typically rely on either assumptions or knowledge of the underlying physics, which may not be available in all scenarios. In this work, we consider the scenario where data are available on a population of assets/individuals, which occurs in many problem domains such as biomedical or digital twin applications, and leverage this population-level data to systematically constrain the Bayesian prior and subsequently improve individualized inferences. The approach proposed in this paper is based upon a recently developed technique known as data-consistent inversion (DCI) for constructing a pullback probability measure. Succinctly, we utilize DCI to build population-informed priors for subsequent Bayesian inference on individuals. While the approach is general and applies to nonlinear maps and arbitrary priors, we prove that for linear inverse problems with Gaussian priors, the population-informed prior produces an increase in the information gain as measured by the determinant and trace of the inverse posterior covariance. We also demonstrate that the Kullback-Leibler divergence often improves with high probability. Numerical results, including linear-Gaussian examples and one inspired by digital twins for additively manufactured assets, indicate that there is significant value in using these population-informed priors.
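For orientation, the linear-Gaussian setting mentioned in the abstract can be written out with the standard conjugate update. The symbols below ($A$, $\Gamma$, $m_{\mathrm{pr}}$, $\Sigma_{\mathrm{pr}}$) are assumed notation and do not come from the abstract. The sketch shows only the textbook posterior precision and the two scalar summaries the abstract names, not the paper's own derivation.

```latex
% Assumed notation: A is the linear forward map, \Gamma the noise covariance,
% m_pr and \Sigma_pr the mean and covariance of the Gaussian prior.
y = A x + \varepsilon, \qquad
\varepsilon \sim \mathcal{N}(0, \Gamma), \qquad
x \sim \mathcal{N}(m_{\mathrm{pr}}, \Sigma_{\mathrm{pr}})

% Standard Gaussian conjugate update for the inverse posterior covariance.
\Sigma_{\mathrm{post}}^{-1} = \Sigma_{\mathrm{pr}}^{-1} + A^{\top} \Gamma^{-1} A

% The two information measures named in the abstract.
\det\!\left(\Sigma_{\mathrm{post}}^{-1}\right), \qquad
\operatorname{tr}\!\left(\Sigma_{\mathrm{post}}^{-1}\right)
```

Read this way, the abstract's claim is that replacing the original prior with the population-informed prior increases both scalars.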
- [2] arXiv:2407.13865 [pdf, html, other]
-
Title: Projection-pursuit Bayesian regression for symmetric matrix predictors
Subjects: Methodology (stat.ME)
This paper develops a novel Bayesian approach for nonlinear regression with symmetric matrix predictors, often used to encode connectivity of different nodes. Unlike methods that vectorize matrices as predictors that result in a large number of model parameters and unstable estimation, we propose a Bayesian multi-index regression method, resulting in a projection-pursuit-type estimator that leverages the structure of matrix-valued predictors. We establish the model identifiability conditions and impose a sparsity-inducing prior on the projection directions for sparse sampling to prevent overfitting and enhance interpretability of the parameter estimates. Posterior inference is conducted through Bayesian backfitting. The performance of the proposed method is evaluated through simulation studies and a case study investigating the relationship between brain connectivity features and cognitive scores.
- [3] arXiv:2407.13889 [pdf, html, other]
-
Title: LASPATED: A Library for the Analysis of Spatio-Temporal Discrete Data (User Manual)
Comments: 25 pages, 6 figures
Subjects: Computation (stat.CO)
This is the User Manual of the LASPATED library. The library is available on GitHub (at this https URL) and provides a set of tools to analyze spatio-temporal data. A video tutorial for the library is available on YouTube. The library consists of a Python package for time and space discretizations and two packages (one in Matlab and one in C++) implementing the calibration of the probabilistic models for stochastic spatio-temporal data proposed in the companion paper arXiv:2203.16371v2.
- [4] arXiv:2407.13904 [pdf, other]
-
Title: In defense of MAR over latent ignorability (or latent MAR) for outcome missingness in studying principal causal effects: a causal graph view
Subjects: Methodology (stat.ME)
This paper concerns outcome missingness in principal stratification analysis. We revisit a common assumption known as latent ignorability or latent missing-at-random (LMAR), often considered a relaxation of missing-at-random (MAR). LMAR posits that the outcome is independent of its missingness if one conditions on principal stratum (which is partially unobservable) in addition to observed variables. The literature has focused on methods assuming LMAR (usually supplemented with a more specific assumption about the missingness), without considering the theoretical plausibility and necessity of LMAR. In this paper, we devise a way to represent principal stratum in causal graphs, and use causal graphs to examine this assumption. We find that LMAR is harder to satisfy than MAR, and for the purpose of breaking the dependence between the outcome and its missingness, no benefit is gained from conditioning on principal stratum on top of conditioning on observed variables. This finding has an important implication: MAR should be preferred over LMAR. This is convenient because MAR is easier to handle and (unlike LMAR) if MAR is assumed no additional assumption is needed. We thus turn to focus on the plausibility of MAR and its implications, with a view to facilitate appropriate use of this assumption. We clarify conditions on the causal structure and on auxiliary variables (if available) that need to hold for MAR to hold, and we use MAR to recover effect identification under two dominant identification assumptions (exclusion restriction and principal ignorability). We briefly comment on cases where MAR does not hold. In terms of broader connections, most of the MAR findings are also relevant to classic instrumental variable analysis that targets the local average treatment effect; and the LMAR finding suggests general caution with assumptions that condition on principal stratum.
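The two assumptions compared in the abstract can be stated side by side as conditional independence statements. The notation is assumed for illustration: $Y$ is the outcome, $R$ its missingness indicator, $O$ the observed variables, and $C$ the (partially unobservable) principal stratum.

```latex
% Assumed notation: Y outcome, R missingness indicator,
% O observed variables, C principal stratum.
\text{MAR:} \qquad Y \perp\!\!\!\perp R \mid O

\text{LMAR:} \qquad Y \perp\!\!\!\perp R \mid (O, C)
```

The only difference is the extra conditioning on $C$, which is the step the abstract argues brings no benefit for breaking the dependence between $Y$ and $R$.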
- [5] arXiv:2407.13958 [pdf, other]
-
Title: Flexible max-stable processes for fast and efficient inference
Subjects: Methodology (stat.ME)
Max-stable processes serve as the fundamental distributional family in extreme value theory. However, likelihood-based inference methods for max-stable processes still heavily rely on composite likelihoods, rendering them intractable in high dimensions due to their intractable densities. In this paper, we introduce a fast and efficient inference method for max-stable processes based on their angular densities for a class of max-stable processes whose angular densities do not put mass on the boundary space of the simplex, which can be used to construct r-Pareto processes. We demonstrate the efficiency of the proposed method through two new max-stable processes, the truncated extremal-t process and the skewed Brown-Resnick process. The proposed method is shown to be computationally efficient and can be applied to large datasets. Furthermore, the skewed Brown-Resnick process contains the popular Brown-Resnick model as a special case and possesses nonstationary extremal dependence structures. We showcase the new max-stable processes on simulated and real data.
- [6] arXiv:2407.13970 [pdf, html, other]
-
Title: On the Frequentist Coverage of Bayes Posteriors in Nonlinear Inverse Problems
Comments: 42 pages, 2 figures
Subjects: Statistics Theory (math.ST)
We study the asymptotic frequentist coverage and Gaussian approximation of Bayes posterior credible sets in nonlinear inverse problems when a Gaussian prior is placed on the parameter of the PDE. The aim is to ensure valid frequentist coverage of Bayes credible intervals when estimating continuous linear functionals of the parameter. Our results show that Bayes credible intervals have conservative coverage under certain smoothness assumptions on the parameter and a compatibility condition between the likelihood and the prior, regardless of whether an efficient limit exists and/or a Bernstein-von Mises theorem holds. In the latter case, our results yield a corollary with more relaxed sufficient conditions than previous works. We illustrate the practical utility of the results through the example of estimating the conductivity coefficient of a second-order elliptic PDE, where a near-$N^{-1/2}$ contraction rate and conservative coverage results are obtained for linear functionals that were shown not to be estimable efficiently.
- [7] arXiv:2407.13971 [pdf, html, other]
-
Title: Dimension-reduced Reconstruction Map Learning for Parameter Estimation in Likelihood-Free Inference Problems
Subjects: Methodology (stat.ME); Computation (stat.CO); Machine Learning (stat.ML)
Many application areas rely on models that can be readily simulated but lack a closed-form likelihood, or an accurate approximation under arbitrary parameter values. Existing parameter estimation approaches in this setting are generally approximate. Recent work on using neural network models to reconstruct the mapping from the data space to the parameters from a set of synthetic parameter-data pairs suffers from the curse of dimensionality, resulting in inaccurate estimation as the data size grows. We propose a dimension-reduced approach to likelihood-free estimation which combines the ideas of reconstruction map estimation with dimension-reduction approaches based on subject-specific knowledge. We examine the properties of reconstruction map estimation with and without dimension reduction and explore the trade-off between approximation error due to information loss from reducing the data dimension and approximation error. Numerical examples show that the proposed approach compares favorably with reconstruction map estimation, approximate Bayesian computation, and synthetic likelihood estimation.
- [8] arXiv:2407.13977 [pdf, html, other]
-
Title: A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
Comments: 31 pages, 1 figure, 2 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We present a unified likelihood ratio-based confidence sequence (CS) for any (self-concordant) generalized linear models (GLMs) that is guaranteed to be convex and numerically tight. We show that this is on par or improves upon known CSs for various GLMs, including Gaussian, Bernoulli, and Poisson. In particular, for the first time, our CS for Bernoulli has a poly(S)-free radius where S is the norm of the unknown parameter. Our first technical novelty is its derivation, which utilizes a time-uniform PAC-Bayesian bound with a uniform prior/posterior, despite the latter being a rather unpopular choice for deriving CSs. As a direct application of our new CS, we propose a simple and natural optimistic algorithm called OFUGLB applicable to any generalized linear bandits (GLB; Filippi et al. (2010)). Our analysis shows that the celebrated optimistic approach simultaneously attains state-of-the-art regrets for various self-concordant (not necessarily bounded) GLBs, and even poly(S)-free for bounded GLBs, including logistic bandits. The regret analysis, our second technical novelty, follows from combining our new CS with a new proof technique that completely avoids the previously widely used self-concordant control lemma (Faury et al., 2020, Lemma 9). Finally, we verify numerically that OFUGLB significantly outperforms the prior state-of-the-art (Lee et al., 2024) for logistic bandits.
- [9] arXiv:2407.13980 [pdf, html, other]
-
Title: Byzantine-tolerant distributed learning of finite mixture models
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
This paper proposes two split-and-conquer (SC) learning estimators for finite mixture models that are tolerant to Byzantine failures. In SC learning, individual machines obtain local estimates, which are then transmitted to a central server for aggregation. During this communication, the server may receive malicious or incorrect information from some local machines, a scenario known as Byzantine failures. While SC learning approaches have been devised to mitigate Byzantine failures in statistical models with Euclidean parameters, developing Byzantine-tolerant methods for finite mixture models with non-Euclidean parameters requires a distinct strategy. Our proposed distance-based methods are hyperparameter tuning free, unlike existing methods, and are resilient to Byzantine failures while achieving high statistical efficiency. We validate the effectiveness of our methods both theoretically and empirically via experiments on simulated and real data from machine learning applications for digit recognition. The code for the experiment can be found at this https URL.
- [10] arXiv:2407.14002 [pdf, html, other]
-
Title: Derandomized Truncated D-vine Copula Knockoffs with e-values to control the false discovery rate
Comments: 31 pages, 9 figures, 1 table
Subjects: Methodology (stat.ME)
Model-X knockoffs are a practical methodology for variable selection that stands out from other selection strategies because it allows for control of the false discovery rate (FDR) with finite-sample guarantees. In this article, we propose a Truncated D-vine Copula Knockoffs (TDCK) algorithm for sampling approximate knockoffs from complex multivariate distributions. Our algorithm improves on previous attempts to sample knockoffs in the multivariate setting, with three main contributions: 1) the truncation of the D-vine copula, which reduces the dependence between the original variables and their corresponding knockoffs, improving the statistical power; 2) the employment of a straightforward non-parametric formulation for marginal transformations, eliminating the need for a specific parametric family or a kernel density estimator; 3) the use of the "rvinecopulib" R package, which offers better flexibility than existing methods for fitting vine copula knockoffs. To eliminate the randomness in distinct realizations resulting in different sets of selected variables, we wrap the TDCK method with an existing derandomizing procedure for knockoffs, leading to a Derandomized Truncated D-vine Copula Knockoffs with e-values (DTDCKe) procedure. We demonstrate the robustness of the DTDCKe procedure under various scenarios with extensive simulation studies. We further illustrate its efficacy using a gene expression dataset, showing it achieves a more reliable gene selection than other competing methods, when the findings are compared with those of a meta-analysis. The results indicate that our Truncated D-vine copula approach is robust and has superior power, representing an appealing approach for variable selection in different multivariate applications, particularly in gene expression analysis.
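As a rough illustration of how e-values can remove run-to-run randomness, the sketch below averages per-variable e-values over several knockoff runs and then applies the generic e-BH selection rule. It assumes the e-values have already been computed; the function names `average_e_values` and `e_bh` and the toy numbers are hypothetical, and this is not claimed to be the DTDCKe implementation.

```python
def average_e_values(runs):
    """Average per-variable e-values across independent knockoff runs."""
    m = len(runs[0])
    return [sum(run[j] for run in runs) / len(runs) for j in range(m)]


def e_bh(e_values, alpha):
    """Generic e-BH rule: select the k* variables with the largest e-values,
    where k* is the largest k such that k * e_(k) >= m / alpha."""
    m = len(e_values)
    # Indices sorted so that the largest e-value comes first.
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, idx in enumerate(order, start=1):
        if k * e_values[idx] >= m / alpha:
            k_star = k
    return sorted(order[:k_star])


# Toy example: two runs over five variables (made-up numbers).
runs = [
    [80.0, 0.0, 40.0, 0.0, 10.0],
    [40.0, 2.0, 20.0, 1.0, 14.0],
]
averaged = average_e_values(runs)
selected = e_bh(averaged, alpha=0.1)
print(averaged, selected)
```

Because the selection depends only on the averaged e-values, adding more runs stabilizes the selected set rather than producing a different set each time.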
- [11] arXiv:2407.14003 [pdf, html, other]
-
Title: Time Series Generative Learning with Application to Brain Imaging Analysis
Comments: 45 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Methodology (stat.ME)
This paper focuses on the analysis of sequential image data, particularly brain imaging data such as MRI, fMRI, CT, with the motivation of understanding the brain aging process and neurodegenerative diseases. To achieve this goal, we investigate image generation in a time series context. Specifically, we formulate a min-max problem derived from the $f$-divergence between neighboring pairs to learn a time series generator in a nonparametric manner. The generator enables us to generate future images by transforming prior lag-k observations and a random vector from a reference distribution. With a deep neural network learned generator, we prove that the joint distribution of the generated sequence converges to the latent truth under a Markov and a conditional invariance condition. Furthermore, we extend our generation mechanism to a panel data scenario to accommodate multiple samples. The effectiveness of our mechanism is evaluated by generating real brain MRI sequences from the Alzheimer's Disease Neuroimaging Initiative. These generated image sequences can be used as data augmentation to enhance the performance of further downstream tasks, such as Alzheimer's disease detection.
- [12] arXiv:2407.14022 [pdf, html, other]
-
Title: Causal Inference with Complex Treatments: A Survey
Subjects: Methodology (stat.ME); Machine Learning (cs.LG)
Causal inference plays an important role in explanatory analysis and decision making across various fields like statistics, marketing, health care, and education. Its main task is to estimate treatment effects and make intervention policies. Most previous works focus on the binary treatment setting, in which a unit either adopts a single treatment or does not. However, in practice, the treatment can be much more complex, encompassing multi-valued, continuous, or bundled options. In this paper, we refer to these as complex treatments and systematically and comprehensively review the causal inference methods for addressing them. First, we formally revisit the problem definition, the basic assumptions, and their possible variations under specific conditions. Second, we sequentially review the related methods for multi-valued, continuous, and bundled treatment settings. In each situation, we tentatively divide the methods into two categories: those conforming to the unconfoundedness assumption and those violating it. Subsequently, we discuss the available datasets and open-source codes. Finally, we provide a brief summary of these works and suggest potential directions for future research.
- [13] arXiv:2407.14175 [pdf, html, other]
-
Title: On Policy Evaluation Algorithms in Distributional Reinforcement Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)
We introduce a novel class of algorithms to efficiently approximate the unknown return distributions in policy evaluation problems from distributional reinforcement learning (DRL). The proposed distributional dynamic programming algorithms are suitable for underlying Markov decision processes (MDPs) having an arbitrary probabilistic reward mechanism, including continuous reward distributions with unbounded support being potentially heavy-tailed.
For a plain instance of our proposed class of algorithms we prove error bounds, both within Wasserstein and Kolmogorov-Smirnov distances. Furthermore, for return distributions having probability density functions the algorithms yield approximations for these densities; error bounds are given within supremum norm. We introduce the concept of quantile-spline discretizations to come up with algorithms showing promising results in simulation experiments.
While the performance of our algorithms can rigorously be analysed, they can be seen as universal black box algorithms applicable to a large class of MDPs. We also derive new properties of probability metrics commonly used in DRL on which our quantitative analysis is based.
- [14] arXiv:2407.14194 [pdf, html, other]
-
Title: Enhancing Variable Importance in Random Forests: A Novel Application of Global Sensitivity Analysis
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP); Computation (stat.CO)
The present work provides an application of Global Sensitivity Analysis to supervised machine learning methods such as Random Forests. These methods act as black boxes, selecting features in high-dimensional data sets so as to provide accurate classifiers in terms of prediction when new data are fed into the system. In supervised machine learning, predictors are generally ranked by importance based on their contribution to the final prediction. Global Sensitivity Analysis is primarily used in mathematical modelling to investigate the effect of the uncertainties of the input variables on the output. We apply it here as a novel way to rank the input features by their importance to the explainability of the data generating process, shedding light on how the response is determined by the dependence structure of its predictors. A simulation study shows that our proposal can be used to explore what advances can be achieved either in terms of efficiency, explanatory ability, or simply by way of confirming existing results.
- [15] arXiv:2407.14248 [pdf, html, other]
-
Title: Incertus.jl -- The Julia Lego Blocks for Randomized Clinical Trial Designs
Subjects: Methodology (stat.ME); Computation (stat.CO)
In this paper, we present Incertus.jl, the Julia package that can help the user generate a randomization sequence of a given length for a multi-arm trial with a pre-specified target allocation ratio and assess the operating characteristics of the chosen randomization method through Monte Carlo simulations. The developed package is computationally efficient, and it can be invoked in R. Furthermore, the package is open-ended -- it can flexibly accommodate new randomization procedures and evaluate their statistical properties via simulation. It may also be helpful for validating other randomization methods for which software is not readily available. In summary, Incertus.jl can be used as "Lego Blocks" to construct a fit-for-purpose randomization procedure for a given clinical trial design.
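To make the idea of a randomization sequence with a target allocation ratio concrete, here is a minimal Python sketch of one classical procedure, the permuted block design, together with a simple imbalance metric that could be averaged over Monte Carlo replicates. The choice of procedure and the names `permuted_block_sequence` and `max_imbalance` are assumptions made for illustration; this is not the package's Julia API.

```python
import random
from collections import Counter


def permuted_block_sequence(n, ratio, seed=None):
    """Return n arm labels (0, 1, ...) built from shuffled blocks in which
    arm k appears ratio[k] times, so each full block matches the target."""
    rng = random.Random(seed)
    block = [arm for arm, weight in enumerate(ratio) for _ in range(weight)]
    sequence = []
    while len(sequence) < n:
        shuffled = block[:]
        rng.shuffle(shuffled)
        sequence.extend(shuffled)
    return sequence[:n]


def max_imbalance(sequence, ratio):
    """Largest absolute gap, over all steps and arms, between the observed
    arm count and the count implied by the target allocation ratio."""
    total = sum(ratio)
    counts = [0] * len(ratio)
    worst = 0.0
    for step, arm in enumerate(sequence, start=1):
        counts[arm] += 1
        gap = max(abs(c - step * w / total) for c, w in zip(counts, ratio))
        worst = max(worst, gap)
    return worst


# Three-arm trial with a 2:1:1 target allocation and 12 subjects.
ratio = [2, 1, 1]
seq = permuted_block_sequence(12, ratio, seed=0)
counts = Counter(seq)

# Monte Carlo estimate of the expected maximum imbalance.
sims = [max_imbalance(permuted_block_sequence(12, ratio, seed=s), ratio)
        for s in range(200)]
mean_imbalance = sum(sims) / len(sims)
print(seq, dict(counts), mean_imbalance)
```

Swapping in a different sequence generator while keeping `max_imbalance` fixed is one way to compare the operating characteristics of two randomization methods.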
- [16] arXiv:2407.14311 [pdf, html, other]
-
Title: A Bayesian joint model of multiple longitudinal and categorical outcomes with application to multiple myeloma using permutation-based variable importance
Authors: Danilo Alvares, Jessica K. Barrett, François Mercier, Jochen Schulze, Sean Yiu, Felipe Castro, Spyros Roumpanis, Yajing Zhu
Comments: 26 pages, 5 figures
Subjects: Methodology (stat.ME); Applications (stat.AP)
Joint models have proven to be an effective approach for uncovering potentially hidden connections between various types of outcomes, mainly continuous, time-to-event, and binary. Typically, longitudinal continuous outcomes are characterized by linear mixed-effects models, survival outcomes are described by proportional hazards models, and the link between outcomes is captured by shared random effects. Other modeling variations include generalized linear mixed-effects models for longitudinal data and logistic regression when a binary outcome is present, rather than time until an event of interest. However, in a clinical research setting, one might be interested in modeling the physician's chosen treatment based on the patient's medical history in order to identify prognostic factors. In this situation, there are often multiple treatment options, requiring the use of a multiclass classification approach. Inspired by this context, we develop a Bayesian joint model for longitudinal and categorical data. In particular, our motivation comes from a multiple myeloma study, in which biomarkers display nonlinear trajectories that are well captured through bi-exponential submodels, where patient-level information is shared with the categorical submodel. We also present a variable importance strategy for ranking prognostic factors. We apply our proposal and a competing model to the multiple myeloma data, compare the variable importance and inferential results for both models, and illustrate patient-level interpretations using our joint model.
- [17] arXiv:2407.14349 [pdf, html, other]
-
Title: Measuring and testing tail equivalence
Subjects: Methodology (stat.ME)
We call two copulas tail equivalent if their first-order approximations in the tail coincide. As a special case, a copula is called tail symmetric if it is tail equivalent to the associated survival copula. We propose a novel measure and statistical test for tail equivalence. The proposed measure takes the value of zero if and only if the two copulas share a pair of tail order and tail order parameter in common. Moreover, taking the nature of these tail quantities into account, we design the proposed measure so that it takes a large value when tail orders are different, and a small value when tail order parameters are non-identical. We derive asymptotic properties of the proposed measure, and then propose a novel statistical test for tail equivalence. Performance of the proposed test is demonstrated in a series of simulation studies and empirical analyses of financial stock returns in the periods of the world financial crisis and the COVID-19 recession. Our empirical analysis reveals non-identical tail behaviors in different pairs of stocks, different parts of tails, and the two periods of recessions.
- [18] arXiv:2407.14365 [pdf, html, other]
-
Title: Modified BART for Learning Heterogeneous Effects in Regression Discontinuity Designs
Subjects: Methodology (stat.ME)
This paper introduces BART-RDD, a sum-of-trees regression model built around a novel regression tree prior, which incorporates the special covariate structure of regression discontinuity designs. Specifically, the tree splitting process is constrained to ensure overlap within a narrow band surrounding the running variable cutoff value, where the treatment effect is identified. It is shown that unmodified BART-based models estimate RDD treatment effects poorly, while our modified model accurately recovers treatment effects at the cutoff. Specifically, BART-RDD is perhaps the first RDD method that effectively learns conditional average treatment effects. The new method is investigated in thorough simulation studies as well as an empirical application looking at the effect of academic probation on student performance in subsequent terms (Lindo et al., 2010).
- [19] arXiv:2407.14369 [pdf, html, other]
-
Title: tidychangepoint: a unified framework for analyzing changepoint detection in univariate time series
Subjects: Methodology (stat.ME); Computation (stat.CO)
We present tidychangepoint, a new R package for changepoint detection analysis. tidychangepoint leverages existing packages like changepoint, GA, tsibble, and broom to provide tidyverse-compliant tools for segmenting univariate time series using various changepoint detection algorithms. In addition, tidychangepoint also provides model-fitting procedures for commonly-used parametric models, tools for computing various penalized objective functions, and graphical diagnostic displays. tidychangepoint wraps both deterministic algorithms like PELT, and also flexible, randomized, genetic algorithms that can be used with any compliant model-fitting function and any penalized objective function. By bringing all of these disparate tools together in a cohesive fashion, tidychangepoint facilitates comparative analysis of changepoint detection algorithms and models.
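The notion of a penalized objective function for segmentation can be sketched with exact optimal partitioning, the dynamic program that PELT accelerates by pruning candidate changepoints. The example below is in Python rather than R and assumes a Gaussian mean-shift cost (within-segment sum of squared deviations) with a fixed penalty per segment; the function name `optimal_partitioning` is hypothetical and unrelated to the tidychangepoint API.

```python
def optimal_partitioning(x, penalty):
    """Return the changepoint indices minimizing
    sum(segment costs) + penalty * (number of changepoints),
    where a segment's cost is its sum of squared deviations from its mean.
    A changepoint index i means a new segment starts at x[i]."""
    n = len(x)
    # Prefix sums let us evaluate any segment cost in O(1).
    s1, s2 = [0.0], [0.0]
    for value in x:
        s1.append(s1[-1] + value)
        s2.append(s2[-1] + value * value)

    def cost(i, j):
        """Cost of the segment x[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[j] is the optimal penalized cost of x[:j]; starting at -penalty
    # means the first segment is not charged as a changepoint.
    best = [0.0] * (n + 1)
    best[0] = -penalty
    last = [0] * (n + 1)
    for j in range(1, n + 1):
        best[j], last[j] = min(
            (best[i] + cost(i, j) + penalty, i) for i in range(j)
        )

    # Walk back through the stored segment starts.
    changepoints = []
    j = n
    while j > 0:
        i = last[j]
        if i > 0:
            changepoints.append(i)
        j = i
    return sorted(changepoints)


series = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0]
found = optimal_partitioning(series, penalty=1.0)
print(found)
```

This version costs O(n^2) segment evaluations; raising `penalty` yields fewer changepoints, which is the trade-off the various penalized objectives control.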
New submissions for Monday, 22 July 2024 (showing 19 of 19 entries)
- [20] arXiv:2407.13849 (cross-list from cs.LG) [pdf, html, other]
-
Title: CoxSE: Exploring the Potential of Self-Explaining Neural Networks with Cox Proportional Hazards Model for Survival Analysis
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The Cox Proportional Hazards (CPH) model has long been the preferred survival model for its explainability. However, to increase its predictive power beyond its linear log-risk, it was extended to utilize deep neural networks, sacrificing its explainability. In this work, we explore the potential of self-explaining neural networks (SENN) for survival analysis. We propose a new locally explainable Cox proportional hazards model, named CoxSE, by estimating a locally-linear log-hazard function using the SENN. We also propose a modification of neural additive models (NAM) that hybridizes them with SENN, named CoxSENAM, which enables the control of the stability and consistency of the generated explanations. Several experiments using synthetic and real datasets have been performed comparing with a NAM-based model, a DeepSurv model explained with SHAP, and a linear CPH model. The results show that, unlike the NAM-based model, the SENN-based model can provide more stable and consistent explanations while maintaining the same expressive power as the black-box model. The results also show that, due to their structural design, NAM-based models demonstrated better robustness to non-informative features. Among these models, the hybrid model exhibited the best robustness.
- [21] arXiv:2407.13925 (cross-list from physics.data-an) [pdf, html, other]
-
Title: EggNet: An Evolving Graph-based Graph Attention Network for Particle Track Reconstruction
Comments: 7 pages, 5 figures
Subjects: Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (cs.LG); High Energy Physics - Phenomenology (hep-ph); Machine Learning (stat.ML)
Track reconstruction is a crucial task in particle experiments and is traditionally very computationally expensive due to its combinatorial nature. Recently, graph neural networks (GNNs) have emerged as a promising approach that can improve scalability. Most of these GNN-based methods, including the edge classification (EC) and the object condensation (OC) approach, require an input graph that needs to be constructed beforehand. In this work, we consider a one-shot OC approach that reconstructs particle tracks directly from a set of hits (point cloud) by recursively applying graph attention networks with an evolving graph structure. This approach iteratively updates the graphs and can better facilitate the message passing across each graph. Preliminary studies on the TrackML dataset show better track performance compared to the methods that require a fixed input graph.
- [22] arXiv:2407.13979 (cross-list from cs.LG) [pdf, html, other]
-
Title: Truthfulness of Calibration Measures
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
We initiate the study of the truthfulness of calibration measures in sequential prediction. A calibration measure is said to be truthful if the forecaster (approximately) minimizes the expected penalty by predicting the conditional expectation of the next outcome, given the prior distribution of outcomes. Truthfulness is an important property of calibration measures, ensuring that the forecaster is not incentivized to exploit the system with deliberate poor forecasts. This makes it an essential desideratum for calibration measures, alongside typical requirements, such as soundness and completeness.
We conduct a taxonomy of existing calibration measures and their truthfulness. Perhaps surprisingly, we find that all of them are far from being truthful. That is, under existing calibration measures, there are simple distributions on which a polylogarithmic (or even zero) penalty is achievable, while truthful prediction leads to a polynomial penalty. Our main contribution is the introduction of a new calibration measure termed the Subsampled Smooth Calibration Error (SSCE) under which truthful prediction is optimal up to a constant multiplicative factor.
- [23] arXiv:2407.13995 (cross-list from eess.SP) [pdf, html, other]
-
Title: Track-MDP: Reinforcement Learning for Target Tracking with Controlled Sensing
Subjects: Signal Processing (eess.SP); Machine Learning (stat.ML)
State of the art methods for target tracking with sensor management (or controlled sensing) are model-based and are obtained through solutions to Partially Observable Markov Decision Process (POMDP) formulations. In this paper a Reinforcement Learning (RL) approach to the problem is explored for the setting where the motion model for the object/target to be tracked is unknown to the observer. It is assumed that the target dynamics are stationary in time, the state space and the observation space are discrete, and there is complete observability of the location of the target under certain (a priori unknown) sensor control actions. Then, a novel Markov Decision Process (MDP) rather than POMDP formulation is proposed for the tracking problem with controlled sensing, which is termed as Track-MDP. In contrast to the POMDP formulation, the Track-MDP formulation is amenable to an RL based solution. It is shown that the optimal policy for the Track-MDP formulation, which is approximated through RL, is guaranteed to track all significant target paths with certainty. The Track-MDP method is then compared with the optimal POMDP policy, and it is shown that the infinite horizon tracking reward of the optimal Track-MDP policy is the same as that of the optimal POMDP policy. In simulations it is demonstrated that Track-MDP based RL leads to a policy that can track the target with high accuracy.
- [24] arXiv:2407.14021 (cross-list from eess.AS) [pdf, html, other]
-
Title: GE2E-AC: Generalized End-to-End Loss Training for Accent Classification
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Machine Learning (stat.ML)
Accent classification (AC) is the task of predicting the accent type of an input utterance, and it can be used as a preliminary step toward accented speech recognition and accent conversion. Existing studies have often achieved such classification by training a neural network model to minimize the classification error of the predicted accent label, which can be obtained as a model output. Since this approach optimizes the entire model only from the perspective of classification loss during training, the model might learn to predict the accent type from irrelevant features, such as individual speaker identity, which are not informative at test time. To address this problem, we propose GE2E-AC, in which we train a model to extract an accent embedding (AE) of an input utterance such that the AEs of the same accent class get closer, instead of directly minimizing the classification loss. We experimentally show the effectiveness of the proposed GE2E-AC, compared to the baseline model trained with the conventional cross-entropy-based loss.
- [25] arXiv:2407.14065 (cross-list from cs.LG) [pdf, html, other]
-
Title: MSCT: Addressing Time-Varying Confounding with Marginal Structural Causal Transformer for Counterfactual Post-Crash Traffic Prediction
Comments: 13 pages, 9 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Traffic crashes profoundly impede traffic efficiency and pose economic challenges. Accurate prediction of post-crash traffic status provides essential information for evaluating traffic perturbations and developing effective solutions. Previous studies have established a series of deep learning models to predict post-crash traffic conditions; however, these correlation-based methods cannot accommodate the biases caused by time-varying confounders and the heterogeneous effects of crashes. The post-crash traffic prediction model needs to estimate the counterfactual traffic speed response to hypothetical crashes under various conditions, which demonstrates the necessity of understanding the causal relationship between traffic factors. Therefore, this paper presents the Marginal Structural Causal Transformer (MSCT), a novel deep learning model designed for counterfactual post-crash traffic prediction. To address the issue of time-varying confounding bias, MSCT incorporates a structure inspired by Marginal Structural Models and introduces a balanced loss function to facilitate learning of invariant causal features. The proposed model is treatment-aware, with a specific focus on comprehending and predicting traffic speed under hypothetical crash intervention strategies. In the absence of ground-truth data, a synthetic data generation procedure is proposed to emulate the causal mechanism between traffic speed, crashes, and covariates. The model is validated using both synthetic and real-world data, demonstrating that MSCT outperforms state-of-the-art models in multi-step-ahead prediction performance. This study also systematically analyzes the impact of time-varying confounding bias and dataset distribution on model performance, contributing valuable insights into counterfactual prediction for intelligent transportation systems.
- [26] arXiv:2407.14072 (cross-list from cs.HC) [pdf, html, other]
-
Title: FAVis: Visual Analytics of Factor Analysis for Psychological Research
Comments: 5 pages and 2 figures. To Appear in IEEE VIS 2024
Subjects: Human-Computer Interaction (cs.HC); Applications (stat.AP); Other Statistics (stat.OT)
Psychological research often involves understanding psychological constructs through conducting factor analysis on data collected by a questionnaire, which can comprise hundreds of questions. Without interactive systems for interpreting factor models, researchers are frequently exposed to subjectivity, potentially leading to misinterpretations or overlooked crucial information. This paper introduces FAVis, a novel interactive visualization tool designed to aid researchers in interpreting and evaluating factor analysis results. FAVis enhances the understanding of relationships between variables and factors by supporting multiple views for visualizing factor loadings and correlations, allowing users to analyze information from various perspectives. The primary feature of FAVis is to enable users to set optimal thresholds for factor loadings to balance clarity and information retention. FAVis also allows users to assign tags to variables, enhancing the understanding of factors by linking them to their associated psychological constructs. Our user study demonstrates the utility of FAVis in various tasks.
- [27] arXiv:2407.14074 (cross-list from econ.EM) [pdf, html, other]
-
Title: Regression Adjustment for Estimating Distributional Treatment Effects in Randomized Controlled Trials
Subjects: Econometrics (econ.EM); Statistics Theory (math.ST); Methodology (stat.ME)
In this paper, we address the issue of estimating and inferring the distributional treatment effects in randomized experiments. The distributional treatment effect provides a more comprehensive understanding of treatment effects by characterizing heterogeneous effects across individual units, as opposed to relying solely on the average treatment effect. To enhance the precision of distributional treatment effect estimation, we propose a regression adjustment method that utilizes the distributional regression and pre-treatment information. Our method is designed to be free from restrictive distributional assumptions. We establish theoretical efficiency gains and develop a practical, statistically sound inferential framework. Through extensive simulation studies and empirical applications, we illustrate the substantial advantages of our method, equipping researchers with a powerful tool for capturing the full spectrum of treatment effects in experimental research.
- [28] arXiv:2407.14156 (cross-list from math.OC) [pdf, html, other]
-
Title: Learning Firmly Nonexpansive Operators
Subjects: Optimization and Control (math.OC); Functional Analysis (math.FA); Statistics Theory (math.ST)
This paper proposes a data-driven approach for constructing firmly nonexpansive operators. We demonstrate its applicability in Plug-and-Play methods, where classical algorithms such as forward-backward splitting, Chambolle--Pock primal-dual iteration, Douglas--Rachford iteration or alternating directions method of multipliers (ADMM), are modified by replacing one proximal map by a learned firmly nonexpansive operator. We provide sound mathematical background to the problem of learning such an operator via expected and empirical risk minimization. We prove that, as the number of training points increases, the empirical risk minimization problem converges (in the sense of Gamma-convergence) to the expected risk minimization problem. Further, we derive a solution strategy that ensures firmly nonexpansive and piecewise affine operators within the convex envelope of the training set. We show that this operator converges to the best empirical solution as the number of points in the envelope increases in an appropriate sense. Finally, the experimental section details practical implementations of the method and presents an application in image denoising.
- [29] arXiv:2407.14185 (cross-list from cs.LG) [pdf, html, other]
-
Title: Achieving Well-Informed Decision-Making in Drug Discovery: A Comprehensive Calibration Study using Neural Network-Based Structure-Activity Models
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In the drug discovery process, where experiments can be costly and time-consuming, computational models that predict drug-target interactions are valuable tools to accelerate the development of new therapeutic agents. Estimating the uncertainty inherent in these neural network predictions provides valuable information that facilitates optimal decision-making when risk assessment is crucial. However, such models can be poorly calibrated, which results in unreliable uncertainty estimates that do not reflect the true predictive uncertainty. In this study, we compare different metrics, including accuracy and calibration scores, used for model hyperparameter tuning to investigate which model selection strategy achieves well-calibrated models. Furthermore, we propose to use a computationally efficient Bayesian uncertainty estimation method named Bayesian Linear Probing (BLP), which generates Hamiltonian Monte Carlo (HMC) trajectories to obtain samples for the parameters of a Bayesian Logistic Regression fitted to the hidden layer of the baseline neural network. We report that BLP improves model calibration and achieves the performance of common uncertainty quantification methods by combining the benefits of uncertainty estimation and probability calibration methods. Finally, we show that combining a post hoc calibration method with well-performing uncertainty quantification approaches can boost model accuracy and calibration.
- [30] arXiv:2407.14495 (cross-list from cs.LG) [pdf, html, other]
-
Title: Conformal Thresholded Intervals for Efficient Regression
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
This paper introduces Conformal Thresholded Intervals (CTI), a novel conformal regression method that aims to produce the smallest possible prediction set with guaranteed coverage. Unlike existing methods that rely on a nested conformal framework and full conditional distribution estimation, CTI estimates the conditional probability density for a new response to fall into each interquantile interval using off-the-shelf multi-output quantile regression. CTI constructs prediction sets by thresholding the estimated conditional interquantile intervals based on their length, which is inversely proportional to the estimated probability density. The threshold is determined using a calibration set to ensure marginal coverage. Experimental results demonstrate that CTI achieves optimal performance across various datasets.
Cross submissions for Monday, 22 July 2024 (showing 11 of 11 entries )
- [31] arXiv:2001.03798 (replaced) [pdf, html, other]
-
Title: Bayesian Semi-supervised Multi-category Classification under Nonparanormality
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)
Semi-supervised learning is a model training method that uses both labeled and unlabeled data. This paper proposes a fully Bayes semi-supervised learning algorithm that can be applied to any multi-category classification problem. We assume the labels are missing at random when using unlabeled data in a semi-supervised setting. Suppose we have $K$ classes in the data. We assume that the observations follow $K$ multivariate normal distributions depending on their true class labels after some common unknown transformation is applied to each component of the observation vector. The function is expanded in a B-spline series, and a prior is added to the coefficients. We consider a normal prior on the coefficients and constrain the values to meet the normality and identifiability constraints. The precision matrices of the Gaussian distributions are given a conjugate Wishart prior, while the means are given the improper uniform prior. The resulting posterior is still conditionally conjugate, and the Gibbs sampler aided by a data-augmentation technique can thus be adopted. An extensive simulation study compares the proposed method with several other available methods. The proposed method is also applied to real datasets on diagnosing breast cancer and classification of signals. We conclude that the proposed method has a better prediction accuracy in various cases.
- [32] arXiv:2205.07689 (replaced) [pdf, html, other]
-
Title: From Small Scales to Large Scales: Distance-to-Measure Density based Geometric Analysis of Complex Data
Subjects: Methodology (stat.ME)
How can we tell complex point clouds with different small scale characteristics apart, while disregarding global features? Can we find a suitable transformation of such data in a way that allows us to discriminate between differences in this sense with statistical guarantees? In this paper, we consider the analysis and classification of complex point clouds as they are obtained, e.g., via single molecule localization microscopy. We focus on the task of identifying differences between noisy point clouds based on small scale characteristics, while disregarding large scale information such as overall size. We propose an approach based on a transformation of the data via the so-called Distance-to-Measure (DTM) function, a transformation which is based on the average of nearest neighbor distances. For each data set, we estimate the probability density of average local distances of all data points and use the estimated densities for classification. While the applicability is immediate and the practical performance of the proposed methodology is very good, the theoretical study of the density estimators is quite challenging, as they are based on i.i.d. observations that have been obtained via a complicated transformation. In fact, the transformed data are stochastically dependent in a non-local way that is not captured by commonly considered dependence measures. Nonetheless, we show that the asymptotic behaviour of the density estimator is driven by a kernel density estimator of certain i.i.d. random variables by using theoretical properties of U-statistics, which allows us to handle the dependencies via a Hoeffding decomposition. We show via a numerical study and in an application to simulated single molecule localization microscopy data of chromatin fibers that unsupervised classification tasks based on estimated DTM-densities achieve excellent separation results.
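As a rough illustration of the transformation described in this abstract, the sketch below computes an empirical distance-to-measure value for every point of a cloud. It is only a minimal sketch under assumptions that are not taken from the paper: the function name `empirical_dtm`, the use of the root mean squared distance to the `k` nearest neighbours, and the exclusion of the point itself are all illustrative choices.

```python
import numpy as np

def empirical_dtm(points, k):
    """Root mean squared distance from each point to its k nearest neighbours.

    Hypothetical helper: the exact DTM variant used in the paper may differ
    (for example in whether the point itself counts as a neighbour).
    """
    points = np.asarray(points, dtype=float)
    # Pairwise squared Euclidean distances, shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Assumption: a point is not its own neighbour.
    np.fill_diagonal(sq_dists, np.inf)
    # Keep the k smallest squared distances in each row and average them.
    nearest = np.sort(sq_dists, axis=1)[:, :k]
    return np.sqrt(nearest.mean(axis=1))

cloud = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
dtm_values = empirical_dtm(cloud, k=2)
```

The resulting one-dimensional sample `dtm_values` is the kind of quantity whose density the abstract proposes to estimate and compare across point clouds; isolated points, such as the last one here, receive large values.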
- [33] arXiv:2302.00519 (replaced) [pdf, html, other]
-
Title: Time series on compact spaces, with an application to dynamic modeling of relative abundance data in Ecology
Subjects: Statistics Theory (math.ST); Applications (stat.AP)
Motivated by the dynamic modeling of relative abundance data in ecology, we introduce a general approach to model stationary Markovian or non-Markovian time series on (relatively) compact spaces such as a hypercube, the simplex or a sphere in the Euclidean space. Our approach is based on a general construction of infinite memory models, called chains with complete connections. The two main ingredients involved in our generic construction are a parametric family of probability distributions on the state space and a map from the state space to the parameter space. Our framework encompasses Markovian models, observation-driven models and more general infinite memory models. Simple conditions ensuring the existence and uniqueness of a stationary and ergodic path are given. We then study in more detail statistical inference in two time series models on the simplex, based on either a Dirichlet or a multivariate logistic-normal conditional distribution. The usefulness of our models for analyzing abundance data in ecosystems is also discussed.
- [34] arXiv:2305.14543 (replaced) [pdf, html, other]
-
Title: Deep Functional Factor Models: Forecasting High-Dimensional Functional Time Series via Bayesian Nonparametric Factorization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
This paper introduces the Deep Functional Factor Model (DF2M), a Bayesian nonparametric model designed for analysis of high-dimensional functional time series. DF2M is built upon the Indian Buffet Process and the multi-task Gaussian Process, incorporating a deep kernel function that captures non-Markovian and nonlinear temporal dynamics. Unlike many black-box deep learning models, DF2M offers an explainable approach to utilizing neural networks by constructing a factor model and integrating deep neural networks within the kernel function. Additionally, we develop a computationally efficient variational inference algorithm to infer DF2M. Empirical results from four real-world datasets demonstrate that DF2M provides better explainability and superior predictive accuracy compared to conventional deep learning models for high-dimensional functional time series.
- [35] arXiv:2305.15759 (replaced) [pdf, html, other]
-
Title: Differentially Private Latent Diffusion Models
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Diffusion models (DMs) are one of the most widely used generative models for producing high quality images. However, a flurry of recent papers points out that DMs are among the least private forms of image generators, by extracting a significant number of near-identical replicas of training images from DMs. Existing privacy-enhancing techniques for DMs, unfortunately, do not provide a good privacy-utility tradeoff. In this paper, we aim to improve the current state of DMs with differential privacy (DP) by adopting Latent Diffusion Models (LDMs). LDMs are equipped with powerful pre-trained autoencoders that map the high-dimensional pixels into lower-dimensional latent representations, in which DMs are trained, yielding more efficient and faster training of DMs. Rather than fine-tuning the entire LDM, we fine-tune only the attention modules of LDMs with DP-SGD, reducing the number of trainable parameters by roughly $90\%$ and achieving a better privacy-accuracy trade-off. Our approach allows us to generate realistic, high-dimensional images (256x256) conditioned on text prompts with DP guarantees, which, to the best of our knowledge, has not been attempted before. Our approach provides a promising direction for training more powerful, yet training-efficient differentially private DMs, producing high-quality DP images. Our code is available at https://anonymous.4open.science/r/DP-LDM-4525.
- [36] arXiv:2310.16975 (replaced) [pdf, html, other]
-
Title: Efficient Neural Network Approaches for Conditional Optimal Transport with Applications in Bayesian Inference
Comments: 26 pages, 7 tables, 8 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We present two neural network approaches that approximate the solutions of static and dynamic conditional optimal transport (COT) problems. Both approaches enable conditional sampling and conditional density estimation, which are core tasks in Bayesian inference, particularly in the simulation-based ("likelihood-free") setting. Our methods represent the target conditional distributions as transformations of a tractable reference distribution and, therefore, fall into the framework of measure transport. Although many measure transport approaches model the transformation as COT maps, obtaining the map is computationally challenging, even in moderate dimensions. To improve scalability, our numerical algorithms use neural networks to parameterize COT maps and further exploit the structure of the COT problem. Our static approach approximates the map as the gradient of a partially input-convex neural network. It uses a novel numerical implementation to increase computational efficiency compared to state-of-the-art alternatives. Our dynamic approach approximates the conditional optimal transport via the flow map of a regularized neural ODE; compared to the static approach, it is slower to train but offers more modeling choices and can lead to faster sampling. We demonstrate both algorithms numerically, comparing them with competing state-of-the-art approaches, using benchmark datasets and simulation-based Bayesian inverse problems.
- [37] arXiv:2311.03490 (replaced) [pdf, html, other]
-
Title: Analytics, have some humility: a statistical view of fourth-down decision making
Subjects: Applications (stat.AP)
The standard mathematical approach to fourth-down decision making in American football is to make the decision that maximizes estimated win probability. Win probability estimates arise from machine learning models fit from historical data. These models attempt to capture a nuanced relationship between a noisy binary outcome variable and game-state variables replete with interactions and non-linearities from a finite dataset of just a few thousand games. Thus, it is imperative to knit uncertainty quantification into the fourth-down decision procedure; we do so using bootstrapping. We find that uncertainty in the estimated optimal fourth-down decision is far greater than that currently expressed by sports analysts in popular sports media.
- [38] arXiv:2312.05345 (replaced) [pdf, html, other]
-
Title: Spline-Based Multi-State Models for Analyzing Disease Progression
Comments: Keywords: disease progression; information matrix; multi-state Markov model; penalized log-likelihood; penalized regression spline
Subjects: Methodology (stat.ME); Computation (stat.CO)
Motivated by disease progression-related studies, we propose an estimation method for fitting general non-homogeneous multi-state Markov models. The proposal can handle many types of multi-state processes, with several states and various combinations of observation schemes (e.g., intermittent, exactly observed, censored), and allows for the transition intensities to be flexibly modelled through additive (spline-based) predictors. The algorithm is based on a computationally efficient and stable penalized maximum likelihood estimation approach which exploits the information provided by the analytical Hessian matrix of the model log-likelihood. The proposed modeling framework is employed in case studies that aim at modeling the onset of cardiac allograft vasculopathy, and cognitive decline due to aging, where novel patterns are uncovered. To support applicability and reproducibility, all developed tools are implemented in the R package flexmsm.
- [39] arXiv:2312.07502 (replaced) [pdf, html, other]
-
Title: Posterior Concentration for Gaussian Process Priors under Rescaled and Hierarchical Matérn and Confluent Hypergeometric Covariance Functions
Comments: 38 pages, 7 figures
Subjects: Statistics Theory (math.ST)
In nonparametric Bayesian approaches, Gaussian stochastic processes can serve as priors on real-valued function spaces. Existing literature on the posterior convergence rates under Gaussian process priors shows that it is possible to achieve optimal or near-optimal posterior contraction rates if the smoothness of the Gaussian process matches that of the target function. Among those priors, the Gaussian process with a parametric Matérn covariance function is particularly notable in that its degree of smoothness can be determined by a dedicated smoothness parameter. Ma and Bhadra (2022) recently introduced a new family of covariance functions called the Confluent Hypergeometric (CH) class that simultaneously possesses two parameters: one controls the tail index of the polynomially decaying covariance function, and the other controls the degree of mean-squared smoothness analogous to the Matérn class. In this paper, we show that with a proper choice of rescaling parameters in the Matérn and CH covariance functions, it is possible to obtain the minimax optimal posterior contraction rate for $\eta$-regular functions for the nonparametric regression model with fixed design. Unlike the previous results for unrescaled cases, the smoothness parameter of the covariance function need not equal $\eta$ to achieve the optimal minimax rate, for either rescaled Matérn or rescaled CH covariances, illustrating a key benefit of rescaling. We also consider a fully Bayesian treatment of the rescaling parameters and show that the resulting posterior distributions still contract at the minimax-optimal rate. The resultant hierarchical Bayesian procedure is fully adaptive to the unknown true smoothness.
- [40] arXiv:2312.15566 (replaced) [pdf, html, other]
-
Title: Deep Copula-Based Survival Analysis for Dependent Censoring with Identifiability Guarantees
Comments: To appear in AAAI 2024
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Censoring is the central problem in survival analysis where either the time-to-event (for instance, death), or the time-to-censoring (such as loss of follow-up) is observed for each sample. The majority of existing machine learning-based survival analysis methods assume that survival is conditionally independent of censoring given a set of covariates, an assumption that cannot be verified since only marginal distributions are available from the data. The existence of dependent censoring, along with the inherent bias in current estimators, has been demonstrated in a variety of applications, accentuating the need for a more nuanced approach. However, existing methods that adjust for dependent censoring require practitioners to specify the ground truth copula. This requirement poses a significant challenge for practical applications, as model misspecification can lead to substantial bias. In this work, we propose a flexible deep learning-based survival analysis method that simultaneously accommodates dependent censoring and eliminates the requirement for specifying the ground truth copula. We theoretically prove the identifiability of our model under a broad family of copulas and survival distributions. Experimental results from a wide range of datasets demonstrate that our approach successfully discerns the underlying dependency structure and significantly reduces survival estimation bias when compared to existing methods.
- [41] arXiv:2401.15014 (replaced) [pdf, html, other]
-
Title: A Robust Bayesian Method for Building Polygenic Risk Scores using Projected Summary Statistics and Bridge Prior
Subjects: Methodology (stat.ME)
Polygenic risk scores (PRS) developed from genome-wide association studies (GWAS) are of increasing interest for clinical and research applications. Bayesian methods have been popular for building PRS because of their natural ability to regularize models and incorporate external information. In this article, we present new theoretical results, methods, and extensive numerical studies to advance Bayesian methods for PRS applications. We identify a potential risk, under a common Bayesian PRS framework, of posterior impropriety when integrating the required GWAS summary-statistics and linkage disequilibrium (LD) data from two distinct sources. As a principled remedy to this problem, we propose a projection of the summary statistics data that ensures compatibility between the two sources and in turn a proper behavior of the posterior. We further introduce a new PRS method, with accompanying software package, under the less-explored Bayesian bridge prior to more flexibly model varying sparsity levels in effect size distributions. We extensively benchmark it against alternative Bayesian methods using both synthetic and real datasets, quantifying the impact of both prior specification and LD estimation strategy. Our proposed PRS-Bridge, equipped with the projection technique and flexible prior, demonstrates the most consistent and generally superior performance across a variety of scenarios.
- [42] arXiv:2402.01919 (replaced) [pdf, other]
-
Title: Separation rates for the detection of synchronization of interacting point processes in a mean field frame. Application to neuroscience
Subjects: Statistics Theory (math.ST); Probability (math.PR)
We develop and study a statistical test to detect synchrony in spike trains. Our test is based on the number of coincidences between two trains of spikes. The data are supplied in the form of \(n\) pairs (assumed to be independent) of spike trains. The aim is to assess whether the two trains in a pair are also independent. Our approach is based on previous results of Albert et al. (2015, 2019) and Kim et al. (2022) that we extend to our setting, focusing on the construction of a non-asymptotic criterion ensuring the detection of synchronization in the framework of permutation tests. Our criterion is constructed such that it ensures the control of the Type II error, while the Type I error is controlled by construction. We illustrate our results within two classical models of interacting neurons, the jittering Poisson model and Hawkes processes having \(M\) components interacting in a mean field frame and evolving in stationary regime. For this latter model, we obtain a lower bound of the size \(n\) of the sample necessary to detect the dependency between two neurons.
- [43] arXiv:2402.07613 (replaced) [pdf, html, other]
-
Title: Global optimality under amenable symmetry constraints
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
Consider a convex function that is invariant under a group of transformations. If it has a minimizer, does it also have an invariant minimizer? Variants of this problem appear in nonparametric statistics and in a number of adjacent fields. The answer depends on the choice of function, and on what one may loosely call the geometry of the problem -- the interplay between convexity, the group, and the underlying vector space, which is typically infinite-dimensional. We observe that this geometry is completely encoded in the smallest closed convex invariant subsets of the space, and proceed to study these sets, for groups that are amenable but not necessarily compact. We then apply this toolkit to the invariant optimality problem. It yields new results on invariant kernel mean embeddings and risk-optimal invariant couplings, and clarifies relations between seemingly distinct ideas, such as the summation trick used in machine learning to construct equivariant neural networks and the classic Hunt-Stein theorem of statistics.
- [44] arXiv:2402.10232 (replaced) [pdf, other]
-
Title: Simple, unified analysis of Johnson-Lindenstrauss with applications
Comments: 24 pages, presented at "High-dimensional Learning Dynamics 2024: The Emergence of Structure and Reasoning"
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Probability (math.PR)
We present a simplified and unified analysis of the Johnson-Lindenstrauss (JL) lemma, a cornerstone of dimensionality reduction for managing high-dimensional data. Our approach simplifies understanding and unifies various constructions under the JL framework, including spherical, binary-coin, sparse JL, Gaussian, and sub-Gaussian models. This unification preserves the intrinsic geometry of data, essential for applications from streaming algorithms to reinforcement learning. We provide the first rigorous proof of the spherical construction's effectiveness and introduce a general class of sub-Gaussian constructions within this simplified framework. Central to our contribution is an innovative extension of the Hanson-Wright inequality to high dimensions, complete with explicit constants. By using simple yet powerful probabilistic tools and analytical techniques, such as an enhanced diagonalization process, our analysis solidifies the theoretical foundation of the JL lemma by removing an independence assumption and extends its practical applicability to contemporary algorithms.
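For readers unfamiliar with the lemma, the sketch below illustrates only the classical Gaussian construction mentioned in this abstract: a random matrix with i.i.d. normal entries, scaled by one over the square root of the target dimension, approximately preserves pairwise distances. The helper names (`gaussian_jl_project`, `max_distortion`) and the dimensions are illustrative assumptions, and the code does not reproduce the paper's analysis or its other constructions.

```python
import numpy as np

def gaussian_jl_project(X, k, seed=0):
    """Project the rows of X from d to k dimensions with a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Entries are N(0, 1/k), so squared norms are preserved in expectation.
    A = rng.normal(size=(d, k)) / np.sqrt(k)
    return X @ A

def pairwise_distances(X):
    """Euclidean distance matrix computed from the Gram matrix."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.sqrt(np.maximum(d2, 0.0))

def max_distortion(X, Y):
    """Largest relative change in any pairwise distance after projection."""
    iu = np.triu_indices(X.shape[0], k=1)  # distinct pairs only
    ratio = pairwise_distances(Y)[iu] / pairwise_distances(X)[iu]
    return float(np.max(np.abs(ratio - 1.0)))

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 1000))        # 20 points in 1000 dimensions
Y = gaussian_jl_project(X, k=400)      # the same points in 400 dimensions
distortion = max_distortion(X, Y)
```

Increasing `k` shrinks the distortion at a rate of roughly one over the square root of `k`, which is the trade-off the lemma quantifies.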
- [45] arXiv:2403.04915 (replaced) [pdf, html, other]
-
Title: Bayesian Inference for High-dimensional Time Series by Latent Process Modeling
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
Time series data arising in many applications nowadays are high-dimensional. A large number of parameters describe features of these time series. We propose a novel approach to modeling a high-dimensional time series through several independent univariate time series, which are then orthogonally rotated and sparsely linearly transformed. With this approach, any specified intrinsic relations among component time series given by a graphical structure can be maintained at all time snapshots. We call the resulting process an Orthogonally-rotated Univariate Time series (OUT). Key structural properties of time series such as stationarity and causality can be easily accommodated in the OUT model. For Bayesian inference, we put suitable prior distributions on the spectral densities of the independent latent time series, the orthogonal rotation matrix, and the common precision matrix of the component time series at every time point. A likelihood is constructed using the Whittle approximation for univariate latent time series. An efficient Markov Chain Monte Carlo (MCMC) algorithm is developed for posterior computation. We study the convergence of the pseudo-posterior distribution based on the Whittle likelihood for the model's parameters upon developing a new general posterior convergence theorem for pseudo-posteriors. We find that the posterior contraction rate for independent observations essentially prevails in the OUT model under very mild conditions on the temporal dependence described in terms of the smoothness of the corresponding spectral densities. Through a simulation study, we compare the accuracy of estimating the parameters and identifying the graphical structure with other approaches. We apply the proposed methodology to analyze a dataset on different industrial components of the US gross domestic product between 2010 and 2019 and predict future observations.
- [46] arXiv:2404.01390 (replaced) [pdf, html, other]
-
Title: Convex relaxation for the generalized maximum-entropy sampling problem
Subjects: Statistics Theory (math.ST); Optimization and Control (math.OC)
The generalized maximum-entropy sampling problem (GMESP) is to select an order-$s$ principal submatrix from an order-$n$ covariance matrix, to maximize the product of its $t$ greatest eigenvalues, $0<t\leq s <n$. Introduced more than 25 years ago, GMESP is a natural generalization of two fundamental problems in statistical design theory: (i) maximum-entropy sampling problem (MESP); (ii) binary D-optimality (D-Opt). In the general case, it can be motivated by a selection problem in the context of principal component analysis (PCA).
We introduce the first convex-optimization based relaxation for GMESP, study its behavior, compare it to an earlier spectral bound, and demonstrate its use in a branch-and-bound scheme. We find that such an approach is practical when $s-t$ is very small.
- [47] arXiv:2405.13821 (replaced) [pdf, html, other]
-
Title: Normalizing Basis Functions: Approximate Stationary Models for Large Spatial Data
Comments: Version 2
Subjects: Computation (stat.CO); Numerical Analysis (math.NA); Applications (stat.AP)
In geostatistics, traditional spatial models often rely on the Gaussian Process (GP) to fit stationary covariances to data. It is well known that this approach becomes computationally infeasible when dealing with large data volumes, necessitating the use of approximate methods. A powerful class of methods approximates the GP as a sum of basis functions with random coefficients. Although this technique offers computational efficiency, it does not inherently guarantee a stationary covariance. To mitigate this issue, the basis functions can be "normalized" to maintain a constant marginal variance, avoiding unwanted artifacts and edge effects. This allows for the fitting of nearly stationary models to large, potentially non-stationary datasets, providing a rigorous base to extend to more complex problems. Unfortunately, the process of normalizing these basis functions is computationally demanding. To address this, we introduce two fast and accurate algorithms for the normalization step, allowing for efficient prediction on fine grids. The practical value of these algorithms is showcased in the context of a spatial analysis on a large dataset, where significant computational speedups are achieved. While implementation and testing are done specifically within the LatticeKrig framework, these algorithms can be adapted to other basis function methods operating on regular grids.
- [48] arXiv:2406.16171 (replaced) [pdf, html, other]
-
Title: Exploring the difficulty of estimating win probability: a simulation study
Subjects: Methodology (stat.ME); Applications (stat.AP)
Estimating win probability is one of the classic modeling tasks of sports analytics. Many widely used win probability estimators are statistical win probability models, which fit the relationship between a binary win/loss outcome variable and certain game-state variables using data-driven regression or machine learning approaches. To illustrate just how difficult it is to accurately fit a statistical win probability model from noisy and highly correlated observational data, in this paper we conduct a simulation study. We create a simplified random walk version of football in which true win probability at each game-state is known, and we see how well a model recovers it. We find that the dependence structure of observational play-by-play data substantially inflates the bias and variance of estimators and lowers the effective sample size. This makes it essential to quantify uncertainty in win probability estimates, but typical bootstrapped confidence intervals are too narrow and don't achieve nominal coverage. Hence, we introduce a novel method, the fractional bootstrap, to calibrate these intervals to achieve adequate coverage.
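A toy version of the "known true win probability" setup can be sketched as follows. This is an illustrative assumption, not the paper's simulator: the score differential is modelled as a symmetric ±1 random walk, a tie counts as half a win, and the names `true_win_prob` and `empirical_win_prob` are hypothetical.

```python
import numpy as np
from math import comb

def true_win_prob(diff, steps_left):
    """Exact win probability given the current score differential and remaining steps."""
    total = 0.0
    for k in range(steps_left + 1):
        # k up-steps and (steps_left - k) down-steps remain.
        final = diff + 2 * k - steps_left
        p = comb(steps_left, k) / 2 ** steps_left
        if final > 0:
            total += p
        elif final == 0:
            total += 0.5 * p  # assumption: a tie counts as half a win
    return total

def empirical_win_prob(diff, steps_left, n_games, rng):
    """Monte Carlo estimate of the same quantity from independent simulated games."""
    steps = rng.choice([-1, 1], size=(n_games, steps_left))
    final = diff + steps.sum(axis=1)
    return float(np.mean((final > 0) + 0.5 * (final == 0)))

rng = np.random.default_rng(1)
print(true_win_prob(1, 1))  # 0.75
print(abs(empirical_win_prob(2, 20, 20000, rng) - true_win_prob(2, 20)) < 0.02)
```

Here the simulated games are independent, so the Monte Carlo estimate behaves well. The abstract's point is that real play-by-play observations from the same game are highly dependent, which this simplified sketch deliberately does not reproduce.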
- [49] arXiv:2407.13296 (replaced) [pdf, html, other]
-
Title: Prediction intervals for overdispersed binomial endpoints and their application to historical control data
Subjects: Applications (stat.AP)
In toxicology, the validation of the concurrent control by historical control data (HCD) has become a requirement. This validation is usually done by historical control limits (HCL), which in practice are often displayed graphically in the manner of a Shewhart control chart. In many applications, HCL are applied to dichotomous data, e.g. the number of rats with a tumor vs. the number of rats without a tumor (carcinogenicity studies) or the number of cells with a micronucleus out of a total number of cells. Dichotomous HCD may be overdispersed and can be heavily right- (or left-) skewed, which is usually not taken into account in the practical applications of HCL. To overcome this problem, four different prediction intervals (two frequentist, two Bayesian) that can be applied to such data are proposed. Comprehensive Monte-Carlo simulations assessing the coverage probabilities of seven different methods for HCL calculation reveal that frequentist bootstrap calibrated prediction intervals control the type-1-error best. Heuristics traditionally used in control charts (e.g. the limits in Shewhart np-charts or the mean plus minus 2 SD) as well as the historical range fail to control a pre-specified coverage probability. The application of HCL is demonstrated based on a real life data set containing historical controls from long-term carcinogenicity studies run on behalf of the U.S. National Toxicology Program. The proposed frequentist prediction intervals are publicly available from the R package predint, whereas R code for the computation of the Bayesian prediction intervals is provided via GitHub.
- [50] arXiv:1803.11039 (replaced) [pdf, html, other]
-
Title: Lévy Area Analysis and Parameter Estimation for fOU Processes via Non-Geometric Rough Path Theory
Comments: Published in the journal: Acta Mathematica Scientia, 2024
Subjects: Probability (math.PR); Statistics Theory (math.ST)
This paper addresses the estimation problem of an unknown drift parameter matrix for a fractional Ornstein-Uhlenbeck process in a multi-dimensional setting. To tackle this problem, we propose a novel approach based on rough path theory that allows us to construct pathwise rough path estimators from both continuous and discrete observations of a single path. Our approach is particularly suitable for high-frequency data. To formulate the parameter estimators, we introduce a theory of pathwise Itô integrals with respect to fractional Brownian motion. By establishing the regularity of fractional Ornstein-Uhlenbeck processes and analyzing the long-term behavior of the associated Lévy area processes, we demonstrate that our estimators are strongly consistent and pathwise stable. Our findings offer a new perspective on estimating the drift parameter matrix for fractional Ornstein-Uhlenbeck processes in multi-dimensional settings, and may have practical implications for fields including finance, economics, and engineering.
- [51] arXiv:2207.13665 (replaced) [pdf, html, other]
-
Title: Causal foundations of bias, disparity and fairness
Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI); Applications (stat.AP)
The study of biases, such as gender or racial biases, is an important topic in the social and behavioural sciences. However, the literature does not always clearly define the concept. Definitions of bias are often ambiguous or not provided at all. To study biases in a precise manner, it is important to have a well-defined concept of bias. We propose to define bias as a direct causal effect that is unjustified. We propose to define the closely related concept of disparity as a direct or indirect causal effect that includes a bias. Our proposed definitions can be used to study biases and disparities in a more rigorous and systematic way. We compare our definitions of bias and disparity with various criteria of fairness introduced in the artificial intelligence literature. In addition, we discuss how our definitions relate to discrimination. We illustrate our definitions of bias and disparity in two case studies, focusing on gender bias in science and racial bias in police shootings. Our proposed definitions aim to contribute to a better appreciation of the causal intricacies of studies of biases and disparities. We hope that this will also promote an improved understanding of the policy implications of such studies.
- [52] arXiv:2211.09619 (replaced) [pdf, html, other]
-
Title: Introduction to Online Nonstochastic Control
Comments: Draft; comments/suggestions welcome at nonstochastic.control@gmail.com
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)
This text presents an introduction to an emerging paradigm in control of dynamical systems and differentiable reinforcement learning called online nonstochastic control. The new approach applies techniques from online convex optimization and convex relaxations to obtain new methods with provable guarantees for classical settings in optimal and robust control.
The primary distinction between online nonstochastic control and other frameworks is the objective. In optimal control, robust control, and other control methodologies that assume stochastic noise, the goal is to perform comparably to an offline optimal strategy. In online nonstochastic control, both the cost functions as well as the perturbations from the assumed dynamical model are chosen by an adversary. Thus the optimal policy is not defined a priori. Rather, the target is to attain low regret against the best policy in hindsight from a benchmark class of policies.
This objective suggests the use of the decision making framework of online convex optimization as an algorithmic methodology. The resulting methods are based on iterative mathematical optimization algorithms, and are accompanied by finite-time regret and computational complexity guarantees.
- [53] arXiv:2212.12206 (replaced) [pdf, html, other]
-
Title: Understanding and Improving Transfer Learning of Deep Models via Neural Collapse
Comments: First two authors contributed equally. Accepted at TMLR
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
With the ever-increasing complexity of large-scale pre-trained models coupled with a shortage of labeled data for downstream training, transfer learning has become the primary approach in many fields, including natural language processing, computer vision, and multi-modal learning. Despite recent progress, the fine-tuning process for large-scale pre-trained models in vision still mostly relies on trial and error. This work investigates the relationship between neural collapse (NC) and transfer learning for classification problems. NC is an intriguing yet prevalent phenomenon that has recently been discovered in the final-layer features and linear classifiers of trained neural networks. Specifically, during the terminal phase of training, NC implies that the variability of the features within each class diminishes to zero, while the means of features between classes are maximally and equally distanced. In this work, we examine the NC attributes of pre-trained models on both downstream and source data for transfer learning, and we find a strong correlation between feature collapse and downstream performance. In particular, we discovered a systematic pattern that emerges when linear probing pre-trained models on downstream training data: the more the features of pre-trained models collapse on downstream training data, the higher the transfer accuracy. Additionally, we studied the relationship between NC and transfer accuracy on the source data. Moreover, these findings allow us to develop a principled, parameter-efficient fine-tuning method that employs skip-connection to induce the last-layer feature collapse on downstream data. Our proposed fine-tuning methods deliver good performance while reducing fine-tuning parameters by at least 90% and mitigating overfitting, especially when the downstream data is scarce.
- [54] arXiv:2303.15845 (replaced) [pdf, html, other]
-
Title: Conditional Generative Models are Provably Robust: Pointwise Guarantees for Bayesian Inverse Problems
Comments: Accepted and published in Transactions on Machine Learning Research (07/2023)
Journal-ref: Transactions on Machine Learning Research (TMLR), 2023
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
Conditional generative models have become a very powerful tool to sample from Bayesian inverse problem posteriors. It is well-known in classical Bayesian literature that posterior measures are quite robust with respect to perturbations of both the prior measure and the negative log-likelihood, which includes perturbations of the observations. However, to the best of our knowledge, the robustness of conditional generative models with respect to perturbations of the observations has not been investigated yet. In this paper, we prove for the first time that appropriately learned conditional generative models provide robust results for single observations.
- [55] arXiv:2309.16748 (replaced) [pdf, html, other]
-
Title: Discovering environments with XRM
Comments: Oral at ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Environment annotations are essential for the success of many out-of-distribution (OOD) generalization methods. Unfortunately, these are costly to obtain and often limited by human annotators' biases. To achieve robust generalization, it is essential to develop algorithms for automatic environment discovery within datasets. Current proposals, which divide examples based on their training error, suffer from one fundamental problem. These methods introduce hyper-parameters and early-stopping criteria, which require a validation set with human-annotated environments, the very information subject to discovery. In this paper, we propose Cross-Risk-Minimization (XRM) to address this issue. XRM trains twin networks, each learning from one random half of the training data, while imitating confident held-out mistakes made by its sibling. XRM provides a recipe for hyper-parameter tuning, does not require early-stopping, and can discover environments for all training and validation data. Algorithms built on top of XRM environments achieve oracle worst-group-accuracy, addressing a long-standing challenge in OOD generalization. Code available at this https URL.
- [56] arXiv:2311.04855 (replaced) [pdf, html, other]
-
Title: Algorithms for Non-Negative Matrix Factorization on Noisy Data With Negative Values
Comments: 12 pages, 8 figures. Submitted to IEEE Transactions on Signal Processing. Updated version after reviewer comments, expanding paper with a new section and a new appendix as well as more equations. Algorithm derivation flow was significantly altered to be more tractable. Further minor changes made to flow and to make one plot more color blind friendly
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Signal Processing (eess.SP); Methodology (stat.ME)
Non-negative matrix factorization (NMF) is a dimensionality reduction technique that has shown promise for analyzing noisy data, especially astronomical data. For these datasets, the observed data may contain negative values due to noise even when the true underlying physical signal is strictly positive. Prior NMF work has not treated negative data in a statistically consistent manner, which becomes problematic for low signal-to-noise data with many negative values. In this paper we present two algorithms, Shift-NMF and Nearly-NMF, that can handle both the noisiness of the input data and also any introduced negativity. Both of these algorithms use the negative data space without clipping, and correctly recover non-negative signals without any introduced positive offset that occurs when clipping negative data. We demonstrate this numerically on both simple and more realistic examples, and prove that both algorithms have monotonically decreasing update rules.
- [57] arXiv:2401.04082 (replaced) [pdf, html, other]
-
Title: Improved motif-scaffolding with SE(3) flow matching
Jason Yim, Andrew Campbell, Emile Mathieu, Andrew Y. K. Foong, Michael Gastegger, José Jiménez-Luna, Sarah Lewis, Victor Garcia Satorras, Bastiaan S. Veeling, Frank Noé, Regina Barzilay, Tommi S. Jaakkola
Comments: Preprint. Code: this https URL microsoft/frame-flow
Journal-ref: Transactions on Machine Learning Research 2024
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Machine Learning (stat.ML)
Protein design often begins with knowledge of a desired function from a motif, around which motif-scaffolding aims to construct a functional protein. Recently, generative models have achieved breakthrough success in designing scaffolds for a range of motifs. However, generated scaffolds tend to lack structural diversity, which can hinder success in wet-lab validation. In this work, we extend FrameFlow, an SE(3) flow matching model for protein backbone generation, to perform motif-scaffolding with two complementary approaches. The first is motif amortization, in which FrameFlow is trained with the motif as input using a data augmentation strategy. The second is motif guidance, which performs scaffolding using an estimate of the conditional score from FrameFlow without additional training. On a benchmark of 24 biologically meaningful motifs, we show our method achieves 2.5 times more designable and unique motif-scaffolds compared to the state of the art. Code: this https URL
- [58] arXiv:2402.04298 (replaced) [pdf, html, other]
-
Title: Multi-View Symbolic Regression
Etienne Russeil, Fabrício Olivetti de França, Konstantin Malanchev, Bogdan Burlacu, Emille E. O. Ishida, Marion Leroux, Clément Michelin, Guillaume Moinard, Emmanuel Gangler
Comments: Accepted to GECCO-2024. 11 pages, 5 figures
Subjects: Machine Learning (cs.LG); Instrumentation and Methods for Astrophysics (astro-ph.IM); Applications (stat.AP)
Symbolic regression (SR) searches for analytical expressions representing the relationship between a set of explanatory and response variables. Current SR methods assume a single dataset extracted from a single experiment. Nevertheless, the researcher is frequently confronted with multiple sets of results obtained from experiments conducted with different setups. Traditional SR methods may fail to find the underlying expression since the parameters of each experiment can be different. In this work we present Multi-View Symbolic Regression (MvSR), which takes into account multiple datasets simultaneously, mimicking experimental environments, and outputs a general parametric solution. This approach fits the evaluated expression to each independent dataset and returns a parametric family of functions f(x; theta) simultaneously capable of accurately fitting all datasets. We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from astronomy, chemistry and economy, for which an a priori analytical expression is not available. Results show that MvSR obtains the correct expression more frequently and is robust to hyperparameter changes. In real-world data, it is able to grasp the group behavior, recovering known expressions from the literature as well as promising alternatives, thus enabling the use of SR in a large range of experimental scenarios.
- [59] arXiv:2402.15776 (replaced) [pdf, other]
-
Title: Truly No-Regret Learning in Constrained MDPs
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Constrained Markov decision processes (CMDPs) are a common way to model safety constraints in reinforcement learning. State-of-the-art methods for efficiently solving CMDPs are based on primal-dual algorithms. For these algorithms, all currently known regret bounds allow for error cancellations -- one can compensate for a constraint violation in one round with a strict constraint satisfaction in another. This makes the online learning process unsafe since it only guarantees safety for the final (mixture) policy but not during learning. As Efroni et al. (2020) pointed out, it is an open question whether primal-dual algorithms can provably achieve sublinear regret if we do not allow error cancellations. In this paper, we give the first affirmative answer. We first generalize a result on last-iterate convergence of regularized primal-dual schemes to CMDPs with multiple constraints. Building upon this insight, we propose a model-based primal-dual algorithm to learn in an unknown CMDP. We prove that our algorithm achieves sublinear regret without error cancellations.
- [60] arXiv:2405.05439 (replaced) [pdf, html, other]
-
Title: How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Applications (stat.AP)
With the rise of stochastic generative models in robot policy learning, end-to-end visuomotor policies are increasingly successful at solving complex tasks by learning from human demonstrations. Nevertheless, since real-world evaluation costs afford users only a small number of policy rollouts, it remains a challenge to accurately gauge the performance of such policies. This is exacerbated by distribution shifts causing unpredictable changes in performance during deployment. To rigorously evaluate behavior cloning policies, we present a framework that provides a tight lower-bound on robot performance in an arbitrary environment, using a minimal number of experimental policy rollouts. Notably, by applying the standard stochastic ordering to robot performance distributions, we provide a worst-case bound on the entire distribution of performance (via bounds on the cumulative distribution function) for a given task. We build upon established statistical results to ensure that the bounds hold with a user-specified confidence level and tightness, and are constructed from as few policy rollouts as possible. In experiments we evaluate policies for visuomotor manipulation in both simulation and hardware. Specifically, we (i) empirically validate the guarantees of the bounds in simulated manipulation settings, (ii) find the degree to which a learned policy deployed on hardware generalizes to new real-world environments, and (iii) rigorously compare two policies tested in out-of-distribution settings. Our experimental data, code, and implementation of confidence bounds are open-source.
- [61] arXiv:2405.18518 (replaced) [pdf, html, other]
-
Title: Modeling Long Sequences in Bladder Cancer Recurrence: A Comparative Evaluation of LSTM, Transformer, and Mamba
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
Traditional survival analysis methods often struggle with complex time-dependent data, failing to capture and interpret dynamic characteristics adequately. This study aims to evaluate the performance of three long-sequence models, LSTM, Transformer, and Mamba, in analyzing recurrence event data and integrating them with the Cox proportional hazards model. This study integrates the advantages of deep learning models for handling long-sequence data with the Cox proportional hazards model to enhance the performance in analyzing recurrent events with dynamic time information. Additionally, this study compares the ability of different models to extract and utilize features from time-dependent clinical recurrence data. The LSTM-Cox model outperformed both the Transformer-Cox and Mamba-Cox models in prediction accuracy and model fit, achieving a Concordance index of up to 0.90 on the test set. Significant predictors of bladder cancer recurrence, such as treatment stop time, maximum tumor size at recurrence and recurrence frequency, were identified. The LSTM-Cox model aligned well with clinical outcomes, effectively distinguishing between high-risk and low-risk patient groups. This study demonstrates that the LSTM-Cox model is a robust and efficient method for recurrent data analysis and feature extraction, surpassing newer models like Transformer and Mamba. It offers a practical approach for integrating deep learning technologies into clinical risk prediction systems, thereby improving patient management and treatment outcomes.
- [62] arXiv:2405.20405 (replaced) [pdf, html, other]
-
Title: Private Mean Estimation with Person-Level Differential Privacy
Comments: 72 pages, 3 figures
Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
We study person-level differentially private (DP) mean estimation in the case where each person holds multiple samples. DP here requires the usual notion of distributional stability when $\textit{all}$ of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show that \[n = \tilde \Theta\left(\frac{d}{\alpha^2 m} + \frac{d}{\alpha m^{1/2} \varepsilon} + \frac{d}{\alpha^{k/(k-1)} m \varepsilon} + \frac{d}{\varepsilon}\right)\] people are necessary and sufficient to estimate the mean up to distance $\alpha$ in $\ell_2$-norm under $\varepsilon$-differential privacy (and its common relaxations). In the multivariate setting, we give computationally efficient algorithms under approximate-DP and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP. Our computationally efficient estimators are based on the standard clip-and-noise framework, but the analysis for our setting requires both new algorithmic techniques and new analyses. In particular, our new bounds on the tails of sums of independent, vector-valued, bounded-moments random variables may be of interest.
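The "standard clip-and-noise framework" referred to above can be sketched at the person level as follows. This is a textbook-style illustration under stated assumptions, not the paper's estimator: each person's samples are averaged, the per-person mean is clipped to an L2 ball of radius `radius` (a hypothetical tuning parameter), and Gaussian noise is calibrated with the classical Gaussian-mechanism formula, which is valid for `epsilon` in (0, 1).

```python
import numpy as np

def clip_l2(v, radius):
    """Project each row of v onto the L2 ball of the given radius."""
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return v * scale

def person_level_dp_mean(data, radius, epsilon, delta, rng):
    """Approximate-DP mean estimate from data of shape (n_people, m_samples, d).

    Replacing all samples of one person moves the average of the clipped
    per-person means by at most 2 * radius / n in L2 norm.
    """
    n = data.shape[0]
    person_means = data.mean(axis=1)          # one vector per person
    clipped = clip_l2(person_means, radius)   # bound each person's influence
    sensitivity = 2.0 * radius / n
    # Classical Gaussian mechanism calibration (assumes 0 < epsilon < 1).
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    noise = rng.normal(0.0, sigma, size=clipped.shape[1])
    return clipped.mean(axis=0) + noise

rng = np.random.default_rng(2)
true_mean = np.array([0.5, -0.2, 0.1])
data = true_mean + rng.normal(size=(2000, 5, 3))  # n=2000 people, m=5 samples, d=3
estimate = person_level_dp_mean(data, radius=5.0, epsilon=0.5, delta=1e-5, rng=rng)
print(np.linalg.norm(estimate - true_mean))
```

Clipping at the level of the per-person mean, rather than per sample, is what ties the privacy guarantee to a whole person's data. Choosing the clipping radius well is part of what the analysis in the abstract addresses and is simply assumed here.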
- [63] arXiv:2406.04317 (replaced) [pdf, html, other]
-
Title: Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Bayesian neural networks (BNN) promise to combine the predictive performance of neural networks with principled uncertainty modeling important for safety-critical systems and decision making. However, posterior uncertainty estimates depend on the choice of prior, and finding informative priors in weight-space has proven difficult. This has motivated variational inference (VI) methods that pose priors directly on the function generated by the BNN rather than on weights. In this paper, we address a fundamental issue with such function-space VI approaches pointed out by Burt et al. (2020), who showed that the objective function (ELBO) is negative infinite for most priors of interest. Our solution builds on generalized VI (Knoblauch et al., 2019) with the regularized KL divergence (Quang, 2019) and is, to the best of our knowledge, the first well-defined variational objective for function-space inference in BNNs with Gaussian process (GP) priors. Experiments show that our method incorporates the properties specified by the GP prior on synthetic and small real-world data sets, and provides competitive uncertainty estimates for regression, classification and out-of-distribution detection compared to BNN baselines with both function and weight-space priors.
- [64] arXiv:2407.09375 (replaced) [pdf, other]
-
Title: HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context
Comments: ICML 2024, Next Generation Sequence Modeling Architectures Workshop
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
This work explores the in-context learning capabilities of State Space Models (SSMs) and presents, to the best of our knowledge, the first theoretical explanation of a possible underlying mechanism. We introduce a novel weight construction for SSMs, enabling them to predict the next state of any dynamical system after observing previous states without parameter fine-tuning. This is accomplished by extending the HiPPO framework to demonstrate that continuous SSMs can approximate the derivative of any input signal. Specifically, we find an explicit weight construction for continuous SSMs and provide an asymptotic error bound on the derivative approximation. The discretization of this continuous SSM subsequently yields a discrete SSM that predicts the next state. Finally, we demonstrate the effectiveness of our parameterization empirically. This work should be an initial step toward understanding how sequence models based on SSMs learn in context.