Data Analysis, Statistics and Probability
Showing new listings for Wednesday, 12 March 2025
- [1] arXiv:2503.07647 (cross-list from cs.LG)
Title: On the Importance of Clearsky Model in Short-Term Solar Radiation Forecasting
Authors: Cyril Voyant, Milan Despotovic, Gilles Notton, Yves-Marie Saint-Drenan, Mohammed Asloune, Luis Garcia-Gutierrez
Comments: 20 pages, 10 Figures and 1 Table
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph); Data Analysis, Statistics and Probability (physics.data-an)
Clearsky models are widely used in solar energy for many applications such as quality control, resource assessment, satellite-based irradiance estimation and forecasting. However, their use in forecasting and nowcasting is associated with a number of challenges. Synchronization errors, reliance on the Clearsky index (the ratio of the global horizontal irradiance to its cloud-free counterpart) and the high sensitivity of the clearsky model to errors in aerosol optical depth at low solar elevation limit their added value in real-time applications. This paper explores the feasibility of short-term forecasting without relying on a clearsky model. We propose a Clearsky-Free forecasting approach using Extreme Learning Machine (ELM) models. ELM learns daily periodicity and local variability directly from raw Global Horizontal Irradiance (GHI) data. It eliminates the need for Clearsky normalization, simplifying the forecasting process and improving scalability. Our approach is a non-linear adaptive statistical method that implicitly learns the irradiance in cloud-free conditions, removing the need for a clear-sky model and the related operational issues. Deterministic and probabilistic results are compared to traditional benchmarks, including ARMA with McClear-generated Clearsky data and quantile regression for probabilistic forecasts. ELM matches or outperforms these methods, providing accurate predictions and robust uncertainty quantification. This approach offers a simple, efficient solution for real-time solar forecasting. By overcoming the limitations of the stationarization process based on the usual multiplicative-scheme Clearsky models, it provides a flexible and reliable framework for modern energy systems.
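As a rough illustration of the idea in this abstract, the sketch below fits a generic Extreme Learning Machine (random, untrained hidden layer plus least-squares output weights) to lagged raw GHI values, with no clear-sky normalization. It is not the authors' implementation: the hidden-layer size, `tanh` activation, ridge term, 24-step lag window, the helper names `fit_elm`, `predict_elm` and `make_lagged`, and the synthetic GHI series are all assumptions made for this example.

```python
import numpy as np

def make_lagged(series, n_lags, horizon):
    """Build (X, y): each row of X holds n_lags past values, y is the value `horizon` steps ahead."""
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + horizon - 1])
    return np.array(X), np.array(y)

def fit_elm(X, y, n_hidden=50, ridge=1e-6, seed=0):
    """Generic ELM: hidden weights are drawn once at random and never trained."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (assumed Gaussian)
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    # Output weights from ridge-regularized least squares (normal equations).
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def predict_elm(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Synthetic hourly "GHI" (assumption): clipped daily sine times random cloud attenuation.
rng = np.random.default_rng(1)
hours = np.arange(24 * 60)
clear = np.clip(np.sin(2 * np.pi * (hours % 24 - 6) / 24), 0, None) * 1000.0
ghi = clear * rng.uniform(0.6, 1.0, size=hours.size)

# Raw GHI (only rescaled to [0, 1]) is fed directly to the model.
X, y = make_lagged(ghi / 1000.0, n_lags=24, horizon=1)
split = len(X) * 3 // 4
model = fit_elm(X[:split], y[:split])
pred = predict_elm(model, X[split:])

rmse_elm = np.sqrt(np.mean((pred - y[split:]) ** 2))
rmse_persistence = np.sqrt(np.mean((X[split:, -1] - y[split:]) ** 2))
print(f"ELM RMSE: {rmse_elm:.3f}  persistence RMSE: {rmse_persistence:.3f}")
```

Because the lag window spans a full day, the model can pick up the daily cycle from the raw series itself, which is the role the clear-sky index plays in the multiplicative scheme. Probabilistic forecasts, as discussed in the abstract, would need an additional step not shown here.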
- [2] arXiv:2503.07736 (cross-list from stat.ML)
Title: Uncertainty quantification and posterior sampling for network reconstruction
Comments: 16 pages, 12 figures. Code available in this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Physics and Society (physics.soc-ph)
Network reconstruction is the task of inferring the unseen interactions between elements of a system, based only on their behavior or dynamics. This inverse problem is in general ill-posed, and admits many solutions for the same observation. Nevertheless, the vast majority of statistical methods proposed for this task, formulated as the inference of a graphical generative model, can only produce a "point estimate," i.e., a single network considered the most likely. In general, this can give only a limited characterization of the reconstruction, since uncertainties and competing answers cannot be conveyed, even when those answers are structurally different yet have comparable probabilities. In this work we present an efficient MCMC algorithm for sampling from posterior distributions of reconstructed networks, which is able to reveal the full population of answers for a given reconstruction problem, weighted according to their plausibilities. Our algorithm is general, since it does not rely on specific properties of particular generative models, and is especially suited for the inference of large and sparse networks, since in this case an iteration can be performed in time $O(N\log^2 N)$ for a network of $N$ nodes, instead of $O(N^2)$, as would be the case for a more naive approach. We demonstrate the suitability of our method in providing uncertainties and consensus of solutions (which provably increases the reconstruction accuracy) in a variety of synthetic and empirical cases.
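To make the contrast between a point estimate and a posterior sample concrete, the sketch below summarizes a handful of sampled networks by their marginal edge probabilities and forms a consensus by keeping edges present in more than half of the samples. The samples are hand-written placeholders, not output of the paper's MCMC algorithm, and thresholding marginals at 0.5 is only one way to build a consensus, assumed here for illustration; `edge_marginals` and `consensus_network` are hypothetical helper names.

```python
from collections import Counter

def edge_marginals(samples):
    """Fraction of posterior samples in which each undirected edge appears."""
    counts = Counter()
    for edges in samples:
        # Normalize (u, v) and (v, u) to one key; the set avoids double counting within a sample.
        counts.update({tuple(sorted(e)) for e in edges})
    return {e: c / len(samples) for e, c in counts.items()}

def consensus_network(marginals, threshold=0.5):
    """Keep edges whose marginal probability exceeds the threshold."""
    return {e for e, p in marginals.items() if p > threshold}

# Hypothetical posterior samples: each is a list of undirected edges on nodes 0..3.
samples = [
    [(0, 1), (1, 2)],
    [(0, 1), (2, 3)],
    [(1, 0), (1, 2)],
    [(0, 1), (1, 2), (2, 3)],
]

marginals = edge_marginals(samples)
print(marginals)                     # (0, 1): 1.0, (1, 2): 0.75, (2, 3): 0.5
print(consensus_network(marginals))  # edges (0, 1) and (1, 2)
```

The marginals carry the uncertainty that a single most-likely network cannot: edge (2, 3) appears in half of the samples and is neither clearly present nor clearly absent.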
Cross submissions (showing 2 of 2 entries)
- [3] arXiv:2407.04465 (replaced)
Title: Learning Patterns from Biological Networks: A Compounded Burr Probability Model
Subjects: Applications (stat.AP); Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an)
Complex biological networks, encompassing metabolic reactions, gene interactions, and protein-protein interactions, often exhibit scale-free characteristics with power-law degree distributions. However, empirical evidence reveals significant deviations from ideal power-law fits, necessitating more flexible and accurate modeling approaches. To address this challenge, we introduce the Compounded Burr (CBurr) distribution, a novel probability model derived from the Burr family and designed to capture the intricate structural properties of biological networks. We rigorously establish its statistical properties, including moment analysis, hazard functions, and tail behavior, and provide a robust parameter estimation framework using the maximum likelihood method. The CBurr distribution is broadly applicable to networks with fat-tailed degree distributions, making it highly relevant for modeling biological, social, and technological networks. To validate its efficacy, we conduct an extensive empirical study on large-scale biological network datasets, demonstrating that CBurr consistently outperforms conventional power-law and alternative heavy-tailed models in fitting the entire range of node degree distributions. Our proposed CBurr probability distribution holds great promise for accurately capturing the complex nature of biological networks and advancing our understanding of their underlying mechanisms.
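The listing does not give the CBurr density itself. For orientation only, the standard Burr Type XII distribution, one well-known member of the Burr family, is written out below with shape parameters $c$ and $k$; whether CBurr is built on this particular member is an assumption, not something stated in the abstract.

```latex
% Standard Burr Type XII distribution, x > 0, shape parameters c > 0, k > 0
F(x; c, k) = 1 - \left(1 + x^{c}\right)^{-k}
\qquad
f(x; c, k) = c\,k\,x^{c-1}\left(1 + x^{c}\right)^{-(k+1)}

% Survival function and its power-law tail
S(x) = 1 - F(x) = \left(1 + x^{c}\right)^{-k} \sim x^{-ck}
\quad \text{as } x \to \infty
```

The survival function decays like a power law with exponent $ck$ at large $x$ while remaining flexible at small $x$, which is the kind of behavior needed to fit a degree distribution over its entire range rather than only its tail.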
- [4] arXiv:2409.17915 (replaced)
Title: N-dimensional maximum-entropy tomography via particle sampling
Comments: 6 pages, 2 figures
Subjects: Accelerator Physics (physics.acc-ph); Data Analysis, Statistics and Probability (physics.data-an)
We propose a modified maximum-entropy (MENT) algorithm for six-dimensional phase space tomography. The algorithm uses particle sampling and low-dimensional density estimation to approximate large sets of high-dimensional integrals in the original MENT formulation. We implement this approach using Markov Chain Monte Carlo (MCMC) sampling techniques and demonstrate convergence of six-dimensional MENT on both synthetic and measured data.
- [5] arXiv:2503.02460 (replaced)
Title: Quantum measurement fitting
Subjects: Quantum Physics (quant-ph); Data Analysis, Statistics and Probability (physics.data-an)
Quantum measurements are not deterministic. For this reason, quantum measurements are repeated for a number of shots on identically prepared systems. The uncertainty in each measurement depends on the number of shots and the expected outcome of the measurement. This information can be used to improve the fitting of models to quantum measurements.
In this paper we analyze ordinary least squares, weighted least squares, and maximum-likelihood estimation. We show that using the information on the quantum measurement uncertainty can lead to improved estimation of system parameters. We also introduce the concept of model violation and demonstrate that it can be a valuable tool to analyze model assumptions and the performance of quantum systems.
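The dependence of the uncertainty on shot count and expected outcome can be illustrated with a two-outcome measurement: an outcome probability $p$ estimated from $N$ shots has binomial variance $p(1-p)/N$. The sketch below compares ordinary and weighted least squares on simulated shot data. The linear model $p(x) = a + bx$, the parameter values, the clipping used to avoid zero variance, and the helper names are assumptions for this example and are not taken from the paper.

```python
import random

def binomial_variance(p_hat, shots):
    """Variance p(1-p)/N of a probability estimated from `shots` repetitions."""
    # Clip away from 0 and 1 so that a point never gets zero variance (assumed regularization).
    p = min(max(p_hat, 1.0 / (shots + 2)), 1.0 - 1.0 / (shots + 2))
    return p * (1.0 - p) / shots

def weighted_line_fit(xs, ys, weights):
    """Closed-form weighted least squares for y = a + b*x; equal weights give OLS."""
    sw = sum(weights)
    sx = sum(w * x for w, x in zip(weights, xs))
    sy = sum(w * y for w, y in zip(weights, ys))
    sxx = sum(w * x * x for w, x in zip(weights, xs))
    sxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    det = sw * sxx - sx * sx
    b = (sw * sxy - sx * sy) / det
    a = (sy - b * sx) / sw
    return a, b

# Simulated experiment (assumption): true outcome probability p(x) = 0.05 + 0.9 x.
rng = random.Random(0)
shots = 200
true_a, true_b = 0.05, 0.9
xs = [i / 10 for i in range(11)]
p_hats = [sum(rng.random() < true_a + true_b * x for _ in range(shots)) / shots
          for x in xs]

ols = weighted_line_fit(xs, p_hats, [1.0] * len(xs))
wls = weighted_line_fit(xs, p_hats,
                        [1.0 / binomial_variance(p, shots) for p in p_hats])
print("OLS (a, b):", ols)
print("WLS (a, b):", wls)
```

Points with probabilities near 0 or 1 have smaller variance and therefore receive larger weights in the weighted fit. Here the weights come from the measured frequencies; computing them from the fitted model and iterating is an alternative not shown.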