Data Analysis, Statistics and Probability
- [1] arXiv:2405.09571 (cross-list from eess.SP) [pdf, ps, html, other]
Title: The Best Radar Ranging Pulse to Resolve Two Reflectors
Comments: 8 pages, 8 figures
Subjects: Signal Processing (eess.SP); Data Analysis, Statistics and Probability (physics.data-an); Optics (physics.optics); Quantum Physics (quant-ph)
Previous work established fundamental bounds on subwavelength resolution for the radar range resolution problem, called superradar [Phys. Rev. Appl. 20, 064046 (2023)]. In this work, we identify the optimal waveforms for resolving the range separation between two reflectors of identical strength. We discuss both the unnormalized optimal waveform and the best square-integrable pulse, along with their variants. Using orthogonal function theory, we give an explicit algorithm to optimize the wave pulse in finite time for the best performance. We also explore range resolution estimation with unnormalized waveforms using multi-parameter methods that independently estimate loss and time of arrival. These results are consistent with the earlier single-parameter approach, which estimated range resolution only, and give deeper insight into the ranging estimation problem. Experimental results are presented using radio pulse reflections inside coaxial cables, showing robust range resolution smaller than a tenth of the inverse bandedge, with uncertainties close to the derived Cramér-Rao bound.
- [2] arXiv:2405.09579 (cross-list from cs.LG) [pdf, ps, html, other]
Title: Scalable Sparse Regression for Model Discovery: The Fast Lane to Insight
Comments: Scripts to reproduce all figures are located at this https URL. Standalone sparse regression scripts can be found at this https URL
Subjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
There exist endless examples of dynamical systems with vast available data and unsatisfying mathematical descriptions. Sparse regression applied to symbolic libraries has quickly emerged as a powerful tool for learning governing equations directly from data; these learned equations balance quantitative accuracy with qualitative simplicity and human interpretability. Here, I present a general-purpose, model-agnostic sparse regression algorithm that extends a recently proposed exhaustive search leveraging iterative Singular Value Decompositions (SVD). This accelerated scheme, Scalable Pruning for Rapid Identification of Null vecTors (SPRINT), uses bisection with analytic bounds to quickly identify optimal rank-1 modifications to null vectors. It is intended to maintain sensitivity to small coefficients and be of reasonable computational cost for large symbolic libraries. A calculation that would take the age of the universe with an exhaustive search can be achieved in a day with SPRINT.
- [3] arXiv:2405.09622 (cross-list from quant-ph) [pdf, ps, html, other]
Title: Holevo Cramér-Rao bound: How close can we get without entangling measurements?
Comments: 22 pages, 9 figures, 10 appendices; presented at AIP Summer Meeting 2023
Subjects: Quantum Physics (quant-ph); Mathematical Physics (math-ph); Data Analysis, Statistics and Probability (physics.data-an)
In multi-parameter quantum metrology, the resource of entanglement can lead to an increase in efficiency of the estimation process. Entanglement can be used in the state preparation stage, or the measurement stage, or both, to harness this advantage; here we focus on the role of entangling measurements. Specifically, entangling or collective measurements over multiple identical copies of a probe state are known to be superior to measuring each probe individually, but the extent of this improvement is an open problem. It is also known that such entangling measurements, though resource-intensive, are required to attain the ultimate limits in multi-parameter quantum metrology and quantum information processing tasks. In this work we investigate the maximum precision improvement that collective quantum measurements can offer over individual measurements for estimating parameters of qudit states, calling this the 'collective quantum enhancement'. We show that, whereas the maximum enhancement can, in principle, be a factor of $n$ for estimating $n$ parameters, this bound is not tight for large $n$. Instead, our results prove that an enhancement linear in the dimension of the qudit is possible using collective measurements, and lead us to conjecture that this is the maximum collective quantum enhancement in any local estimation scenario.
- [4] arXiv:2405.09817 (cross-list from cs.LG) [pdf, ps, html, other]
Title: Active Learning with Fully Bayesian Neural Networks for Discontinuous and Nonstationary Data
Subjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
Active learning optimizes the exploration of large parameter spaces by strategically selecting which experiments or simulations to conduct, thus reducing resource consumption and potentially accelerating scientific discovery. A key component of this approach is a probabilistic surrogate model, typically a Gaussian Process (GP), which approximates an unknown functional relationship between control parameters and a target property. However, conventional GPs often struggle when applied to systems with discontinuities and non-stationarities, prompting the exploration of alternative models. This limitation becomes particularly relevant in physical science problems, which are often characterized by abrupt transitions between different system states and rapid changes in physical property behavior. Fully Bayesian Neural Networks (FBNNs) serve as a promising substitute, treating all neural network weights probabilistically and leveraging advanced Markov Chain Monte Carlo techniques for direct sampling from the posterior distribution. This approach enables FBNNs to provide reliable predictive distributions, crucial for making informed decisions under uncertainty in the active learning setting. Although traditionally considered too computationally expensive for 'big data' applications, many physical sciences problems involve small amounts of data in relatively low-dimensional parameter spaces. Here, we assess the suitability and performance of FBNNs with the No-U-Turn Sampler for active learning tasks in the 'small data' regime, highlighting their potential to enhance predictive accuracy and reliability on test functions relevant to problems in physical sciences.
- [5] arXiv:2405.10235 (cross-list from cs.DB) [pdf, ps, other]
Title: Novel Data Models for Inter-operable LCA Frameworks
Subjects: Databases (cs.DB); Data Analysis, Statistics and Probability (physics.data-an)
Life cycle assessment (LCA) plays a critical role in assessing the environmental impacts of a product, technology, or service throughout its entire life cycle. Nonetheless, many existing LCA tools and methods lack adequate metadata management, which can hinder their further development and wide adoption. In the example of LCA for clean energy technologies, metadata helps monitor data and the environment that holds the integrity of the energy assets and sustainability of the materials sources across their entire value chains. Ontologizing metadata, i.e. a common vocabulary and language to connect multiple data sources, as well as implementing AI-aware data management, can have long-lasting, positive, and accelerating effects along with collecting and utilizing quality data from different sources and across the entire data lifecycle. The integration of ontologies in life cycle assessments has garnered significant attention in recent years. We synthesized the existing literature on ontologies for LCAs, providing insights into this interdisciplinary field's evolution, current state, and future directions. We also proposed the framework for a suitable data model and the workflow thereof to warrant the alignment with existing ontologies, practical frameworks, and industry standards.
Cross submissions for Friday, 17 May 2024 (showing 5 of 5 entries)
- [6] arXiv:2210.14245 (replaced) [pdf, ps, html, other]
Title: CaloFlow for CaloChallenge Dataset 1
Comments: 36 pages, 21 figures, v3: match published version
Journal-ref: SciPost Phys. 16, 126 (2024)
Subjects: Instrumentation and Detectors (physics.ins-det); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); High Energy Physics - Phenomenology (hep-ph); Data Analysis, Statistics and Probability (physics.data-an)
CaloFlow is a new and promising approach to fast calorimeter simulation based on normalizing flows. Applying CaloFlow to the photon and charged pion Geant4 showers of Dataset 1 of the Fast Calorimeter Simulation Challenge 2022, we show how it can produce high-fidelity samples with a sampling time that is several orders of magnitude faster than Geant4. We demonstrate the fidelity of the samples using calorimeter shower images, histograms of high-level features, and aggregate metrics such as a classifier trained to distinguish CaloFlow from Geant4 samples.
- [7] arXiv:2308.11700 (replaced) [pdf, ps, html, other]
Title: Calorimeter shower superresolution
Comments: 16 pages, 13 figures, v3: title changed, matches published version
Journal-ref: Phys. Rev. D 109, 092009 (2024)
Subjects: Instrumentation and Detectors (physics.ins-det); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); High Energy Physics - Phenomenology (hep-ph); Data Analysis, Statistics and Probability (physics.data-an)
Calorimeter shower simulation is a major bottleneck in the Large Hadron Collider computational pipeline. There have been recent efforts to employ deep-generative surrogate models to overcome this challenge. However, many of the best-performing models have training and generation times that do not scale well to high-dimensional calorimeter showers. In this work, we introduce SuperCalo, a flow-based superresolution model, and demonstrate that high-dimensional fine-grained calorimeter showers can be quickly upsampled from coarse-grained showers. This novel approach presents a way to reduce the computational cost, memory requirements, and generation time associated with fast calorimeter simulation models. Additionally, we show that the showers upsampled by SuperCalo possess a high degree of variation. This allows a large number of high-dimensional calorimeter showers to be upsampled from far fewer coarse showers with high fidelity, which results in an additional reduction in generation time.