Machine Learning
See recent articles
- [1] arXiv:2407.06321 [pdf, html, other]
-
Title: Open Problem: Tight Bounds for Kernelized Multi-Armed Bandits with Bernoulli RewardsSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We consider Kernelized Bandits (KBs) to optimize a function $f : \mathcal{X} \rightarrow [0,1]$ belonging to the Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}_k$. Mainstream works on kernelized bandits focus on a subgaussian noise model in which observations of the form $f(\mathbf{x}_t)+\epsilon_t$, being $\epsilon_t$ a subgaussian noise, are available (Chowdhury and Gopalan, 2017). Differently, we focus on the case in which we observe realizations $y_t \sim \text{Ber}(f(\mathbf{x}_t))$ sampled from a Bernoulli distribution with parameter $f(\mathbf{x}_t)$. While the Bernoulli model has been investigated successfully in multi-armed bandits (Garivier and Cappé, 2011), logistic bandits (Faury et al., 2022), bandits in metric spaces (Magureanu et al., 2014), it remains an open question whether tight results can be obtained for KBs. This paper aims to draw the attention of the online learning community to this open problem.
- [2] arXiv:2407.06390 [pdf, html, other]
-
Title: JANET: Joint Adaptive predictioN-region Estimation for Time-seriesComments: Alternate Title: Conformalised Joint Prediction Region for Time SeriesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
Conformal prediction provides machine learning models with prediction sets that offer theoretical guarantees, but the underlying assumption of exchangeability limits its applicability to time series data. Furthermore, existing approaches struggle to handle multi-step ahead prediction tasks, where uncertainty estimates across multiple future time points are crucial. We propose JANET (Joint Adaptive predictioN-region Estimation for Time-series), a novel framework for constructing conformal prediction regions that are valid for both univariate and multivariate time series. JANET generalises the inductive conformal framework and efficiently produces joint prediction regions with controlled K-familywise error rates, enabling flexible adaptation to specific application needs. Our empirical evaluation demonstrates JANET's superior performance in multi-step prediction tasks across diverse time series datasets, highlighting its potential for reliable and interpretable uncertainty quantification in sequential data.
New submissions for Wednesday, 10 July 2024 (showing 2 of 2 entries )
- [3] arXiv:2407.06212 (cross-list from cs.LG) [pdf, other]
-
Title: Bias Correction in Machine Learning-based Classification of Rare EventsComments: 2 pages, 1 figure, 1 tableSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Online platform businesses can be identified by using web-scraped texts. This is a classification problem that combines elements of natural language processing and rare event detection. Because online platforms are rare, accurately identifying them with Machine Learning algorithms is challenging. Here, we describe the development of a Machine Learning-based text classification approach that reduces the number of false positives as much as possible. It greatly reduces the bias in the estimates obtained by using calibrated probabilities and ensembles.
- [4] arXiv:2407.06346 (cross-list from cs.LG) [pdf, html, other]
-
Title: High-Dimensional Distributed Sparse Classification with Scalable Communication-Efficient Global UpdatesComments: KDD 2024, Research TrackSubjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
As the size of datasets used in statistical learning continues to grow, distributed training of models has attracted increasing attention. These methods partition the data and exploit parallelism to reduce memory and runtime, but suffer increasingly from communication costs as the data size or the number of iterations grows. Recent work on linear models has shown that a surrogate likelihood can be optimized locally to iteratively improve on an initial solution in a communication-efficient manner. However, existing versions of these methods experience multiple shortcomings as the data size becomes massive, including diverging updates and efficiently handling sparsity. In this work we develop solutions to these problems which enable us to learn a communication-efficient distributed logistic regression model even beyond millions of features. In our experiments we demonstrate a large improvement in accuracy over distributed algorithms with only a few distributed update steps needed, and similar or faster runtimes. Our code is available at \url{this https URL}.
- [5] arXiv:2407.06635 (cross-list from cs.CV) [pdf, html, other]
-
Title: Ensembled Cold-Diffusion Restorations for Unsupervised Anomaly DetectionSergio Naval Marimont, Vasilis Siomos, Matthew Baugh, Christos Tzelepis, Bernhard Kainz, Giacomo TarroniComments: 8 pages, 3 figures. MICCAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Unsupervised Anomaly Detection (UAD) methods aim to identify anomalies in test samples comparing them with a normative distribution learned from a dataset known to be anomaly-free. Approaches based on generative models offer interpretability by generating anomaly-free versions of test images, but are typically unable to identify subtle anomalies. Alternatively, approaches using feature modelling or self-supervised methods, such as the ones relying on synthetically generated anomalies, do not provide out-of-the-box interpretability. In this work, we present a novel method that combines the strengths of both strategies: a generative cold-diffusion pipeline (i.e., a diffusion-like pipeline which uses corruptions not based on noise) that is trained with the objective of turning synthetically-corrupted images back to their normal, original appearance. To support our pipeline we introduce a novel synthetic anomaly generation procedure, called DAG, and a novel anomaly score which ensembles restorations conditioned with different degrees of abnormality. Our method surpasses the prior state-of-the art for unsupervised anomaly detection in three different Brain MRI datasets.
- [6] arXiv:2407.06646 (cross-list from cs.LG) [pdf, html, other]
-
Title: Variational Learning ISTASubjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
Compressed sensing combines the power of convex optimization techniques with a sparsity-inducing prior on the signal space to solve an underdetermined system of equations. For many problems, the sparsifying dictionary is not directly given, nor its existence can be assumed. Besides, the sensing matrix can change across different scenarios. Addressing these issues requires solving a sparse representation learning problem, namely dictionary learning, taking into account the epistemic uncertainty of the learned dictionaries and, finally, jointly learning sparse representations and reconstructions under varying sensing matrix conditions. We address both concerns by proposing a variant of the LISTA architecture. First, we introduce Augmented Dictionary Learning ISTA (A-DLISTA), which incorporates an augmentation module to adapt parameters to the current measurement setup. Then, we propose to learn a distribution over dictionaries via a variational approach, dubbed Variational Learning ISTA (VLISTA). VLISTA exploits A-DLISTA as the likelihood model and approximates a posterior distribution over the dictionaries as part of an unfolded LISTA-based recovery algorithm. As a result, VLISTA provides a probabilistic way to jointly learn the dictionary distribution and the reconstruction algorithm with varying sensing matrices. We provide theoretical and experimental support for our architecture and show that our model learns calibrated uncertainties.
- [7] arXiv:2407.06765 (cross-list from cs.LG) [pdf, html, other]
-
Title: A Generalization Bound for Nearly-Linear NetworksComments: 22 pages, 9 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We consider nonlinear networks as perturbations of linear ones. Based on this approach, we present novel generalization bounds that become non-vacuous for networks that are close to being linear. The main advantage over the previous works which propose non-vacuous generalization bounds is that our bounds are a-priori: performing the actual training is not required for evaluating the bounds. To the best of our knowledge, they are the first non-vacuous generalization bounds for neural nets possessing this property.
- [8] arXiv:2407.06797 (cross-list from cs.LG) [pdf, html, other]
-
Title: ED-VAE: Entropy Decomposition of ELBO in Variational AutoencodersSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Traditional Variational Autoencoders (VAEs) are constrained by the limitations of the Evidence Lower Bound (ELBO) formulation, particularly when utilizing simplistic, non-analytic, or unknown prior distributions. These limitations inhibit the VAE's ability to generate high-quality samples and provide clear, interpretable latent representations. This work introduces the Entropy Decomposed Variational Autoencoder (ED-VAE), a novel re-formulation of the ELBO that explicitly includes entropy and cross-entropy components. This reformulation significantly enhances model flexibility, allowing for the integration of complex and non-standard priors. By providing more detailed control over the encoding and regularization of latent spaces, ED-VAE not only improves interpretability but also effectively captures the complex interactions between latent variables and observed data, thus leading to better generative performance.
- [9] arXiv:2407.06867 (cross-list from stat.ME) [pdf, html, other]
-
Title: Distributionally robust risk evaluation with an isotonic constraintSubjects: Methodology (stat.ME); Machine Learning (stat.ML)
Statistical learning under distribution shift is challenging when neither prior knowledge nor fully accessible data from the target distribution is available. Distributionally robust learning (DRL) aims to control the worst-case statistical performance within an uncertainty set of candidate distributions, but how to properly specify the set remains challenging. To enable distributional robustness without being overly conservative, in this paper, we propose a shape-constrained approach to DRL, which incorporates prior information about the way in which the unknown target distribution differs from its estimate. More specifically, we assume the unknown density ratio between the target distribution and its estimate is isotonic with respect to some partial order. At the population level, we provide a solution to the shape-constrained optimization problem that does not involve the isotonic constraint. At the sample level, we provide consistency results for an empirical estimator of the target in a range of different settings. Empirical studies on both synthetic and real data examples demonstrate the improved accuracy of the proposed shape-constrained approach.
- [10] arXiv:2407.06935 (cross-list from cs.LG) [pdf, html, other]
-
Title: Bayesian Federated Learning with Hamiltonian Monte Carlo: Algorithm and TheorySubjects: Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
This work introduces a novel and efficient Bayesian federated learning algorithm, namely, the Federated Averaging stochastic Hamiltonian Monte Carlo (FA-HMC), for parameter estimation and uncertainty quantification. We establish rigorous convergence guarantees of FA-HMC on non-iid distributed data sets, under the strong convexity and Hessian smoothness assumptions. Our analysis investigates the effects of parameter space dimension, noise on gradients and momentum, and the frequency of communication (between the central node and local nodes) on the convergence and communication costs of FA-HMC. Beyond that, we establish the tightness of our analysis by showing that the convergence rate cannot be improved even for continuous FA-HMC process. Moreover, extensive empirical studies demonstrate that FA-HMC outperforms the existing Federated Averaging-Langevin Monte Carlo (FA-LD) algorithm.
- [11] arXiv:2407.06945 (cross-list from stat.CO) [pdf, html, other]
-
Title: Adaptively Robust and Sparse K-means ClusteringSubjects: Computation (stat.CO); Machine Learning (stat.ML)
While K-means is known to be a standard clustering algorithm, it may be compromised due to the presence of outliers and high-dimensional noisy variables. This paper proposes adaptively robust and sparse K-means clustering (ARSK) to address these practical limitations of the standard K-means algorithm. We introduce a redundant error component for each observation for robustness, and this additional parameter is penalized using a group sparse penalty. To accommodate the impact of high-dimensional noisy variables, the objective function is modified by incorporating weights and implementing a penalty to control the sparsity of the weight vector. The tuning parameters to control the robustness and sparsity are selected by Gap statistics. Through simulation experiments and real data analysis, we demonstrate the superiority of the proposed method to existing algorithms in identifying clusters without outliers and informative variables simultaneously.
Cross submissions for Wednesday, 10 July 2024 (showing 9 of 9 entries )
- [12] arXiv:2001.03798 (replaced) [pdf, html, other]
-
Title: Bayesian Semi-supervised learning under nonparanormalitySubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)
Semi-supervised learning is a model training method that uses both labeled and unlabeled data. This paper proposes a fully Bayes semi-supervised learning algorithm that can be applied to any binary classification problem. We assume the labels are missing at random when using unlabeled data in a semi-supervised setting. We assume that the observations follow two multivariate normal distributions depending on their true class labels after some common unknown transformation is applied to each component of the observation vector. The function is expanded in a B-splines series and a prior is put on the coefficients. We consider a normal prior on the coefficients and constrain the values to meet the requirement for normality and identifiability constraints. The precision matrices of the two Gaussian distributions have a conjugate Wishart prior, while the means have improper uniform priors. The resulting posterior is still conditionally conjugate, and the Gibbs sampler aided by a data augmentation technique can thus be adopted. An extensive simulation study compares the proposed method with several other available methods. The proposed method is also applied to real datasets on diagnosing breast cancer and classification of signals. We conclude that the proposed method has a better prediction accuracy in various cases.
- [13] arXiv:2311.11475 (replaced) [pdf, html, other]
-
Title: Gaussian Interpolation FlowsComments: 51 pages, 6 figures, 1 tableSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Gaussian denoising has emerged as a powerful method for constructing simulation-free continuous normalizing flows for generative modeling. Despite their empirical successes, theoretical properties of these flows and the regularizing effect of Gaussian denoising have remained largely unexplored. In this work, we aim to address this gap by investigating the well-posedness of simulation-free continuous normalizing flows built on Gaussian denoising. Through a unified framework termed Gaussian interpolation flow, we establish the Lipschitz regularity of the flow velocity field, the existence and uniqueness of the flow, and the Lipschitz continuity of the flow map and the time-reversed flow map for several rich classes of target distributions. This analysis also sheds light on the auto-encoding and cycle consistency properties of Gaussian interpolation flows. Additionally, we study the stability of these flows in source distributions and perturbations of the velocity field, using the quadratic Wasserstein distance as a metric. Our findings offer valuable insights into the learning techniques employed in Gaussian interpolation flows for generative modeling, providing a solid theoretical foundation for end-to-end error analyses of learning Gaussian interpolation flows with empirical observations.
- [14] arXiv:2303.02444 (replaced) [pdf, html, other]
-
Title: Calibrating Transformers via Sparse Gaussian ProcessesComments: Published at The Eleventh International Conference on Learning Representations (ICLR 2023). This latest Arxiv version includes a clarification of how ECE/MCE are computed (at page 10)Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Transformer models have achieved profound success in prediction tasks in a wide range of applications in natural language processing, speech recognition and computer vision. Extending Transformer's success to safety-critical domains requires calibrated uncertainty estimation which remains under-explored. To address this, we propose Sparse Gaussian Process attention (SGPA), which performs Bayesian inference directly in the output space of multi-head attention blocks (MHAs) in transformer to calibrate its uncertainty. It replaces the scaled dot-product operation with a valid symmetric kernel and uses sparse Gaussian processes (SGP) techniques to approximate the posterior processes of MHA outputs. Empirically, on a suite of prediction tasks on text, images and graphs, SGPA-based Transformers achieve competitive predictive accuracy, while noticeably improving both in-distribution calibration and out-of-distribution robustness and detection.
- [15] arXiv:2305.11567 (replaced) [pdf, html, other]
-
Title: TSGM: A Flexible Framework for Generative Modeling of Synthetic Time SeriesSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Temporally indexed data are essential in a wide range of fields and of interest to machine learning researchers. Time series data, however, are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations and the application of existing and new data-intensive ML methods. A possible solution to this bottleneck is to generate synthetic data. In this work, we introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series. TSGM includes a broad repertoire of machine learning methods: generative models, probabilistic, and simulator-based approaches. The framework enables users to evaluate the quality of the produced data from different angles: similarity, downstream effectiveness, predictive consistency, diversity, and privacy. The framework is extensible, which allows researchers to rapidly implement their own methods and compare them in a shareable environment. TSGM was tested on open datasets and in production and proved to be beneficial in both cases. Additionally to the library, the project allows users to employ command line interfaces for synthetic data generation which lowers the entry threshold for those without a programming background.
- [16] arXiv:2310.12285 (replaced) [pdf, html, other]
-
Title: Sparse high-dimensional linear mixed modeling with a partitioned empirical Bayes ECM algorithmSubjects: Methodology (stat.ME); Computation (stat.CO); Machine Learning (stat.ML)
High-dimensional longitudinal data is increasingly used in a wide range of scientific studies. To properly account for dependence between longitudinal observations, statistical methods for high-dimensional linear mixed models (LMMs) have been developed. However, few packages implementing these high-dimensional LMMs are available in the statistical software R. Additionally, some packages suffer from scalability issues. This work presents an efficient and accurate Bayesian framework for high-dimensional LMMs. We use empirical Bayes estimators of hyperparameters for increased flexibility and an Expectation-Conditional-Minimization (ECM) algorithm for computationally efficient maximum a posteriori probability (MAP) estimation of parameters. The novelty of the approach lies in its partitioning and parameter expansion as well as its fast and scalable computation. We illustrate Linear Mixed Modeling with PaRtitiOned empirical Bayes ECM (LMM-PROBE) in simulation studies evaluating fixed and random effects estimation along with computation time. A real-world example is provided using data from a study of lupus in children, where we identify genes and clinical factors associated with a new lupus biomarker and predict the biomarker over time. Supplementary materials are available online.
- [17] arXiv:2405.00914 (replaced) [pdf, html, other]
-
Title: Accelerated Fully First-Order Methods for Bilevel and Minimax OptimizationSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
We present in this paper novel accelerated fully first-order methods in \emph{Bilevel Optimization} (BLO). Firstly, for BLO under the assumption that the lower-level functions admit the typical strong convexity assumption, the \emph{(Perturbed) Restarted Accelerated Fully First-order methods for Bilevel Approximation} (\texttt{PRAF${}^2$BA}) algorithm leveraging \emph{fully} first-order oracles is proposed, whereas the algorithm for finding approximate first-order and second-order stationary points with state-of-the-art oracle query complexities in solving complex optimization tasks. Secondly, applying as a special case of BLO the \emph{nonconvex-strongly-convex} (NCSC) minimax optimization, \texttt{PRAF${}^2$BA} rediscovers \emph{perturbed restarted accelerated gradient descent ascent} (\texttt{PRAGDA}) that achieves the state-of-the-art complexity for finding approximate second-order stationary points. Additionally, we investigate the challenge of finding stationary points of the hyper-objective function in BLO when lower-level functions lack the typical strong convexity assumption, where we identify several regularity conditions of the lower-level problems that ensure tractability and present hardness results indicating the intractability of BLO for general convex lower-level functions. Under these regularity conditions we propose the \emph{Inexact Gradient-Free Method} (\texttt{IGFM}), utilizing the \emph{Switching Gradient Method} (\texttt{SGM}) as an efficient sub-routine to find an approximate stationary point of the hyper-objective in polynomial time. Empirical studies for real-world problems are provided to further validate the outperformance of our proposed algorithms.
- [18] arXiv:2405.13712 (replaced) [pdf, html, other]
-
Title: Learning Diffusion Priors from Observations by Expectation MaximizationSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Diffusion models recently proved to be remarkable priors for Bayesian inverse problems. However, training these models typically requires access to large amounts of clean data, which could prove difficult in some settings. In this work, we present a novel method based on the expectation-maximization algorithm for training diffusion models from incomplete and noisy observations only. Unlike previous works, our method leads to proper diffusion models, which is crucial for downstream tasks. As part of our method, we propose and motivate a new posterior sampling scheme for unconditional diffusion models. We present empirical evidence supporting the effectiveness of our method.
- [19] arXiv:2405.14806 (replaced) [pdf, html, other]
-
Title: Lorentz-Equivariant Geometric Algebra Transformers for High-Energy PhysicsComments: 10+12 pages, 5+2 figures, 2 tables, v2: Extend acknowledgements, add link to github repoSubjects: Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (cs.LG); High Energy Physics - Phenomenology (hep-ph); Machine Learning (stat.ML)
Extracting scientific understanding from particle-physics experiments requires solving diverse learning problems with high precision and good data efficiency. We propose the Lorentz Geometric Algebra Transformer (L-GATr), a new multi-purpose architecture for high-energy physics. L-GATr represents high-energy data in a geometric algebra over four-dimensional space-time and is equivariant under Lorentz transformations, the symmetry group of relativistic kinematics. At the same time, the architecture is a Transformer, which makes it versatile and scalable to large systems. L-GATr is first demonstrated on regression and classification tasks from particle physics. We then construct the first Lorentz-equivariant generative model: a continuous normalizing flow based on an L-GATr network, trained with Riemannian flow matching. Across our experiments, L-GATr is on par with or outperforms strong domain-specific baselines.