Electrical Engineering and Systems Science
Showing new listings for Friday, 4 April 2025
- [1] arXiv:2504.01983
Title: Impedance and Stability Targeted Adaptation for Aerial Manipulator with Unknown Coupling Dynamics
Authors: Amitabh Sharma, Saksham Gupta, Shivansh Pratap Singh, Rishabh Dev Yadav, Hongyu Song, Wei Pan, Spandan Roy, Simone Baldi
Comments: Submitted to International Conference on Intelligent Robots and Systems (IROS) 2025. 7 Pages, 9 Figures
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
Stable aerial manipulation during dynamic tasks such as object catching, perching, or contact with rigid surfaces requires compliant behavior, which is often achieved via impedance control. Successful manipulation depends on how effectively the impedance controller can handle the unavoidable coupling forces between the aerial vehicle and the manipulator. However, existing impedance controllers for aerial manipulators either ignore these coupling forces (in partitioned system compliance methods) or require precise knowledge of them (in complete system compliance methods). Unfortunately, such forces are very difficult, if not impossible, to model. To address this long-standing control challenge, we introduce an impedance controller for aerial manipulators that does not rely on a priori knowledge of the system dynamics or of the coupling forces. The controller handles unknown coupling forces, along with parametric uncertainties in the system, via suitably designed adaptive laws. Closed-loop stability is proved analytically, and experimental results in a payload-catching scenario demonstrate significant improvements in overall stability and tracking over state-of-the-art impedance controllers using either partitioned or complete system compliance.
- [2] arXiv:2504.02005
Title: System Identification and Adaptive Input Estimation on the Jaiabot Micro Autonomous Underwater Vehicle
Comments: 9 pages, 8 figures
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
This paper reports an attempt to model the system dynamics and to estimate both the unknown internal control input and the state of a recently developed marine autonomous vehicle, the Jaiabot. Although the Jaiabot has shown promise in many applications, process and sensor noise necessitate state estimation and noise filtering. In this work, we present the first linear dynamical model of Jaiabot surge and heading, derived from real data collected during field testing. An adaptive input estimation algorithm is implemented to accurately estimate the control input and hence the state. For validation, this approach is compared with the classical Kalman filter, highlighting its advantages in handling unknown control inputs.
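The classical Kalman filter used as the comparison baseline can be illustrated on a first-order surge-speed model. This is a minimal sketch, not the identified Jaiabot model: the model form v[k+1] = a*v[k] + b*u[k] + w, the coefficients `a` and `b`, and the noise variances are hypothetical. The sketch also assumes the control input is known, which is exactly the assumption that the adaptive input estimation removes.

```python
import numpy as np

def kalman_filter(y, u, a, b, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a hypothetical surge model
    v[k+1] = a*v[k] + b*u[k] + w[k],  y[k] = v[k] + e[k],
    with process-noise variance q and measurement-noise variance r.
    The input u is assumed known (the classical setting)."""
    x, p = x0, p0
    estimates = []
    for yk, uk in zip(y, u):
        # Measurement update: blend the prediction with the new sample.
        k_gain = p / (p + r)
        x = x + k_gain * (yk - x)
        p = (1.0 - k_gain) * p
        estimates.append(x)
        # Time update: propagate the estimate through the linear model.
        x = a * x + b * uk
        p = a * a * p + q
    return np.array(estimates)

rng = np.random.default_rng(0)
a, b, q, r = 0.9, 0.5, 0.01, 0.25   # hypothetical model and noise values
n = 500
u = np.ones(n)                       # constant thrust command (hypothetical)
v = np.zeros(n)
for k in range(n - 1):
    v[k + 1] = a * v[k] + b * u[k] + rng.normal(0.0, np.sqrt(q))
y = v + rng.normal(0.0, np.sqrt(r), n)  # noisy speed measurements
v_hat = kalman_filter(y, u, a, b, q, r)
```

If the true input differs from `u`, the time update above becomes biased. This is the failure mode that motivates estimating the input alongside the state.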
- [3] arXiv:2504.02007
Title: OccludeNeRF: Geometric-aware 3D Scene Inpainting with Collaborative Score Distillation in NeRF
Comments: CVPR 2025 CV4Metaverse
Subjects: Image and Video Processing (eess.IV)
With Neural Radiance Fields (NeRFs) emerging as a powerful 3D representation, researchers have investigated their various downstream tasks, including inpainting NeRFs with 2D images. Despite successful efforts addressing view consistency and geometry quality, prior methods still struggle with occlusion in NeRF inpainting tasks, where the 2D prior is severely limited in forming a faithful reconstruction of the scene to inpaint.
To address this, we propose a novel approach that enables cross-view information sharing during knowledge distillation from a diffusion model, effectively propagating occluded information across limited views. Additionally, to align the distillation direction across multiple sampled views, we apply a grid-based denoising strategy and incorporate additional rendered views to enhance cross-view consistency. To assess how well our approach handles occlusion, we construct a dataset of challenging scenes with severe occlusion, in addition to using existing datasets. Compared with baseline methods, our method achieves better cross-view consistency and reconstruction faithfulness while preserving high rendering quality and fidelity.
- [4] arXiv:2504.02057
Title: Path planning with moving obstacles using stochastic optimal control
Authors: Seyyed Reza Jafari (1), Anders Hansson (1), Bo Wahlberg (2) ((1) Linköping University, Linköping, Sweden, (2) KTH Royal Institute of Technology, Stockholm, Sweden)
Comments: 6 pages, 6 figures. Submitted to the 64th IEEE Conference on Decision and Control (CDC) 2025
Subjects: Systems and Control (eess.SY)
Finding a collision-free, optimal path for a robot is a persistent challenge, particularly in the presence of moving objects such as humans. In this study, we formulate the problem of finding an optimal path as a stochastic optimal control problem. However, solving this problem is nontrivial. We therefore consider a simplified, more tractable problem, for which we are able to solve the corresponding Bellman equation. However, the solution of the simplified problem does not sufficiently address the original problem of interest. To address the full problem, we propose a numerical procedure in which an optimization problem is solved at each sampling instant, with the solution of the simplified problem integrated into the online formulation as a final-state penalty. We illustrate the efficiency of the proposed method with a numerical example.
- [5] arXiv:2504.02088
Title: Distributed Resource Allocation for Human-Autonomy Teaming under Coupled Constraints
Comments: 8 pages, 4 figures. Submitted to the 2025 IEEE Conference on Decision and Control (CDC)
Subjects: Systems and Control (eess.SY)
This paper studies the optimal resource allocation problem within a multi-agent network composed of both autonomous agents and humans. The main challenge lies in the globally coupled constraints that link the decisions of autonomous agents with those of humans. To address this, we propose a reformulation that transforms these coupled constraints into decoupled local constraints defined over the system's communication graph. Building on this reformulation and incorporating a human response model that captures human-robot interactions while accounting for individual preferences and biases, we develop a fully distributed algorithm. This algorithm guides the states of the autonomous agents to equilibrium points which, when combined with the human responses, yield a globally optimal resource allocation. We provide both theoretical analysis and numerical simulations to validate the effectiveness of the proposed approach.
- [6] arXiv:2504.02093
Title: An Integrated Transportation Network and Power Grid Simulation Approach for Assessing Environmental Impact of Electric Vehicles
Authors: Diana Wallison, Jessica Wert, Farnaz Safdarian, Komal Shetye, Thomas J. Overbye, Jonathan M. Snodgrass, Yanzhi Xu
Comments: This work has been submitted to Nature for possible publication
Subjects: Systems and Control (eess.SY)
This study develops an integrated approach that includes electric vehicle (EV) charging and power generation to assess the complex cross-sector interactions of vehicle electrification and its environmental impact. The charging load from on-road EV operation is derived from a regional-level transportation simulation and a charging behavior simulation, considering different EV penetration levels, congestion levels, and charging strategies. Emissions from electric generating units (EGUs) are estimated from a dispatch study in a power grid simulation that uses the charging load as a major input. A case study of Austin, Texas, is performed to quantify the environmental impact of EV adoption on both on-road and EGU emission sources at the regional level. The results show the range of emission impacts under a combination of factors.
- [7] arXiv:2504.02099
Title: Orthodromic Routing and Forwarding for Large Satellite Constellations
Comments: 6 pages, 7 figures
Subjects: Systems and Control (eess.SY)
Low Earth orbit satellite constellations with intersatellite links (ISLs) are currently being developed and deployed. ISLs make it possible to route across the satellite constellation, rather than using each satellite as a single hop in a bent-pipe configuration. We present a fully distributed solution to routing and forwarding, which we call Orthodromic Routing (OR(r)). OR(r) is built on a foundation of both geographic and link-state routing, creating a hybrid protocol that scales to enormous constellations with excellent failure handling. Our work includes an addressing and forwarding plane for OR(r) that can be implemented in hardware in a highly parallel manner to achieve line rates while requiring only a bounded number of forwarding table entries.
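An orthodrome is a great-circle path, so the geographic half of such a hybrid scheme can be illustrated with greedy forwarding on great-circle distance. This is a toy sketch under stated assumptions, not OR(r) itself. It omits the link-state component, failure handling, and the addressing plane, and the satellite names, coordinates, and adjacency are hypothetical.

```python
import math

def great_circle_rad(lat1, lon1, lat2, lon2):
    """Central angle (radians) between two points given in degrees (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * math.asin(min(1.0, math.sqrt(h)))

def next_hop(current, neighbors, dest, positions):
    """Greedy geographic forwarding: choose the ISL neighbor closest to the
    destination along the great circle. Returns None if no neighbor is closer
    than the current satellite (a dead end a real protocol must handle)."""
    best, best_d = None, great_circle_rad(*positions[current], *positions[dest])
    for nb in neighbors[current]:
        d = great_circle_rad(*positions[nb], *positions[dest])
        if d < best_d:
            best, best_d = nb, d
    return best

# Hypothetical satellites (lat, lon in degrees) and ISL adjacency.
positions = {"A": (0, 0), "B": (0, 20), "C": (20, 0), "D": (0, 40)}
neighbors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
print(next_hop("A", neighbors, "D", positions))  # B
```

Each forwarding decision uses only local neighbor positions and the destination's location. That locality is what lets geographic schemes avoid per-destination tables.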
- [8] arXiv:2504.02129
Title: Towards Enabling Learning for Time-Varying finite horizon Sequential Decision-Making Problems
Subjects: Systems and Control (eess.SY); Combinatorics (math.CO); Optimization and Control (math.OC)
The Parameterized Sequential Decision Making (Para-SDM) framework models a wide array of network design applications spanning supply-chain, transportation, and sensor networks. These problems entail sequential multi-stage optimization characterized by states, control actions, and cost functions that depend on designable parameters. The challenge is to determine the sequential decision policy and the parameters simultaneously so as to minimize the cumulative stagewise cost. Many Para-SDM problems are NP-hard and often necessitate time-varying policies. Existing algorithms for finite-horizon, time-varying Para-SDM problems struggle to scale to large numbers of states. In contrast, the only algorithm addressing infinite-horizon Para-SDM assumes time (stage) invariance and therefore yields stationary policies. It does, however, scale to large time-invariant problems by leveraging deep neural networks to learn optimal stage-invariant state-action value functions. This article proposes a novel approach that reinterprets finite-horizon, time-varying Para-SDM problems as equivalent time-invariant problems through topography lifting. Our method achieves nearly identical results to the time-varying solution while exhibiting improved run times in various simulations, notably in the small cell network problem. This perspective on Para-SDM problems expands the range of addressable problems and holds promise for future scalability through the integration of learning methods.
- [9] arXiv:2504.02134
Title: Robust Channel Estimation for Optical Wireless Communications Using Neural Network
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
Optical Wireless Communication (OWC) has gained significant attention due to its high-speed data transmission and throughput. Optical wireless channels are often assumed to be flat, but we evaluate frequency-selective channels to account for high-data-rate optical wireless links and highly dispersive environments. To address this, this paper presents a robust, low-complexity channel estimation framework that mitigates frequency-selective effects and thereby improves system reliability and performance. The framework contains a neural network that can estimate general optical wireless channels without prior information about the channel environment. Based on this estimate and the corresponding delay spread, one of several candidate offline-trained neural networks is activated to predict the channel. Simulation results demonstrate that the proposed method achieves improved and robust normalized mean square error (NMSE) and bit error rate (BER) performance compared with conventional estimation methods while maintaining computational efficiency. These findings highlight the potential of neural network solutions for enhancing the performance of OWC systems under indoor channel conditions.
- [10] arXiv:2504.02147
Title: Data-Driven Nonconvex Reachability Analysis using Exact Multiplication
Subjects: Systems and Control (eess.SY)
This paper addresses a fundamental challenge in data-driven reachability analysis: accurately representing and propagating non-convex reachable sets. We propose a novel approach that uses constrained polynomial zonotopes to describe reachable sets for unknown LTI systems. Unlike the constrained zonotopes commonly used in the existing literature, constrained polynomial zonotopes are closed under multiplication with constrained matrix zonotopes. We leverage this property to develop an exact multiplication method that preserves the non-convex geometry of reachable sets without resorting to approximations. We demonstrate that our approach provides tighter over-approximations of reachable sets for LTI systems than conventional methods.
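The idea of exact, approximation-free set multiplication is easiest to see in the plain-zonotope case. There, multiplying a zonotope <c, G> by a fixed matrix M gives exactly <Mc, MG>. The sketch below covers only this linear map and an interval hull for ordinary zonotopes. It does not implement constrained polynomial zonotopes or multiplication by matrix zonotopes, and the example matrices are illustrative.

```python
import numpy as np

class Zonotope:
    """Z = {c + G @ xi : xi in [-1, 1]^m} with center c and generator matrix G."""

    def __init__(self, center, generators):
        self.c = np.asarray(center, dtype=float)
        self.G = np.asarray(generators, dtype=float)

    def linear_map(self, M):
        # Exact: M Z = {M c + (M G) xi}, so no over-approximation is introduced.
        M = np.asarray(M, dtype=float)
        return Zonotope(M @ self.c, M @ self.G)

    def interval_hull(self):
        # Tightest axis-aligned box: c +/- row-wise sum of absolute generator entries.
        radius = np.abs(self.G).sum(axis=1)
        return self.c - radius, self.c + radius

Z = Zonotope([1.0, 0.0], [[1.0, 0.5], [0.0, 0.5]])
A = np.array([[0.0, -1.0], [1.0, 0.0]])   # 90-degree rotation (illustrative)
lo, hi = Z.linear_map(A).interval_hull()
print(lo, hi)
```

Propagating sets through many steps this way stays exact for a known matrix. Uncertainty in the system matrix is what forces richer set representations.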
- [11] arXiv:2504.02198
Title: Error Analysis of Sampling Algorithms for Approximating Stochastic Optimal Control
Subjects: Systems and Control (eess.SY); Numerical Analysis (math.NA); Optimization and Control (math.OC)
This paper is concerned with the error analysis of two types of sampling algorithms, namely model predictive path integral (MPPI) and an interacting particle system (IPS) algorithm, that have been proposed in the literature for the numerical approximation of stochastic optimal control. The analysis is presented through the lens of the Gibbs variational principle. For an illustrative example of a single-stage stochastic optimal control problem, analytical expressions for the approximation error and scaling laws with respect to the state dimension and sample size are derived. The analytical results are illustrated with numerical simulations.
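For a single-stage problem, the Gibbs variational principle makes the optimal control density proportional to the sampling prior times exp(-cost/λ). MPPI-style methods approximate its mean by self-normalized importance sampling. The sketch below makes several assumptions: a scalar control, a Gaussian prior N(0, σ²), a quadratic cost (u - 2)², and hypothetical values of λ and σ. It is a generic illustration, not the paper's error analysis. Under these assumptions the exact tilted mean is (4/λ)/(1/σ² + 2/λ), which lets the estimate be checked.

```python
import numpy as np

def mppi_single_stage(cost, n_samples, sigma, lam, rng):
    """Estimate the mean of p*(u) ∝ N(u; 0, sigma^2) * exp(-cost(u) / lam)
    by self-normalized importance sampling from N(0, sigma^2)."""
    u = rng.normal(0.0, sigma, n_samples)
    log_w = -cost(u) / lam
    log_w -= log_w.max()              # subtract the max for numerical stability
    w = np.exp(log_w)
    w /= w.sum()
    estimate = float(np.sum(w * u))
    ess = 1.0 / float(np.sum(w ** 2))  # effective sample size of the weights
    return estimate, ess

def cost(u):
    return (u - 2.0) ** 2             # hypothetical quadratic cost

sigma, lam = 1.0, 0.1                 # hypothetical prior width and temperature
rng = np.random.default_rng(1)
u_hat, ess = mppi_single_stage(cost, 200_000, sigma, lam, rng)
exact = (4.0 / lam) / (1.0 / sigma**2 + 2.0 / lam)   # = 40/21 for these values
print(u_hat, exact, ess)
```

Lowering λ concentrates the weights, which shrinks the effective sample size. This is one intuitive reason why sample size enters error bounds for such estimators.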
- [12] arXiv:2504.02216
Title: Image Coding for Machines via Feature-Preserving Rate-Distortion Optimization
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Many images and videos are primarily processed by computer vision algorithms, involving only occasional human inspection. When this content requires compression before processing, e.g., in distributed applications, coding methods must optimize for both visual quality and downstream task performance. We first show that, given the features obtained from the original and the decoded images, one way to reduce the effect of compression on a task loss is to perform rate-distortion optimization (RDO) using the distance between features as the distortion metric. However, directly optimizing such a rate-distortion trade-off requires an iterative workflow of encoding, decoding, and feature evaluation for each coding parameter, which is computationally impractical. We address this problem by simplifying the RDO formulation so that the distortion term is computable by block-based encoders. We begin by applying a Taylor expansion to the feature extractor, recasting the feature distance as a quadratic metric defined by the Jacobian matrix of the neural network. Then, we replace the linearized metric with a block-wise approximation, which we call input-dependent squared error (IDSE). To reduce computational complexity, we approximate IDSE using Jacobian sketches. The resulting loss can be evaluated block-wise in the transform domain and combined with the sum of squared errors (SSE) to address both visual quality and computer vision performance. Simulations with AVC across multiple feature extractors and downstream neural networks show up to 10% bit-rate savings at the same computer vision accuracy compared with SSE-based RDO, with no decoder complexity overhead and only a 7% increase in encoder complexity.
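The linearization step can be checked numerically. For a small coding error e = x - x̂, the feature distance ||f(x) - f(x̂)||² is close to the quadratic form eᵀJᵀJe. A random sketch SJ with k rows estimates that form without bias at lower cost. Several choices here are assumptions: the feature extractor `f` is a hypothetical two-layer tanh network with random weights, the Gaussian sketch is one common choice, and the block-wise transform-domain evaluation is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_feat = 16, 32, 8
W1 = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)       # hypothetical weights
W2 = rng.normal(size=(d_feat, d_hidden)) / np.sqrt(d_hidden)

def f(x):
    """Hypothetical feature extractor: a two-layer tanh network."""
    return W2 @ np.tanh(W1 @ x)

def jacobian(x):
    """Analytic Jacobian of f: W2 @ diag(1 - tanh(W1 x)^2) @ W1."""
    s = 1.0 - np.tanh(W1 @ x) ** 2
    return W2 @ (s[:, None] * W1)

x = rng.normal(size=d_in)
e = 1e-5 * rng.normal(size=d_in)                 # small "compression error"
x_hat = x - e

exact = float(np.sum((f(x) - f(x_hat)) ** 2))    # true feature distance
J = jacobian(x)
quad = float(e @ J.T @ J @ e)                    # first-order (quadratic-metric) approximation

k = 4                                            # sketch rows, fewer than d_feat
S = rng.normal(size=(k, d_feat)) / np.sqrt(k)    # Gaussian sketch: E[S.T @ S] = I
sketched = float(np.sum((S @ J @ e) ** 2))       # unbiased estimate of quad
print(exact, quad, sketched)
```

The quadratic form depends on x only through J. This is why a metric of this kind can be precomputed per input and reused across candidate coding decisions.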
- [13] arXiv:2504.02222
Title: APSeg: Auto-Prompt Model with Acquired and Injected Knowledge for Nuclear Instance Segmentation and Classification
Comments: 10 pages, 3 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Nuclear instance segmentation and classification provide critical quantitative foundations for digital pathology diagnosis. With the advent of the foundational Segment Anything Model (SAM), the accuracy and efficiency of nuclear segmentation have improved significantly. However, SAM relies heavily on precise prompts, and its class-agnostic design makes its classification results entirely dependent on the provided prompts. We therefore focus on generating prompts with more accurate localization and classification, and propose APSeg, an Auto-Prompt model with acquired and injected knowledge for nuclear instance Segmentation and classification. APSeg incorporates two knowledge-aware modules: (1) the Distribution-Guided Proposal Offset Module (DG-POM), which learns distribution knowledge under density-map guidance, and (2) the Category Knowledge Semantic Injection Module (CK-SIM), which injects morphological knowledge derived from category descriptions. We conducted extensive experiments on the PanNuke and CoNSeP datasets, demonstrating the effectiveness of our approach. The code will be released upon acceptance.
- [14] arXiv:2504.02373
Title: HPGN: Hybrid Priors-Guided Network for Compressed Low-Light Image Enhancement
Comments: 7 pages, 5 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
In practical applications, large volumes of low-light images are generated and require compression for efficient storage and transmission. However, most existing methods either disregard the removal of potential compression artifacts during enhancement or fail to establish a unified framework for jointly enhancing images with varying compression qualities. To solve this problem, we propose the hybrid priors-guided network (HPGN), which enhances compressed low-light images by integrating both compression and illumination priors. Our approach fully utilizes the JPEG quality factor (QF) and the DCT quantization matrix (QM) to guide the design of efficient plug-and-play modules for the joint task. Additionally, we employ a random QF generation strategy during training, enabling a single model to enhance images across different compression levels. Experimental results confirm the superiority of our proposed method.
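The quality factor and the quantization matrix are linked by the encoder's table-scaling rule, so either can be derived from the other for a given base table. The sketch below follows the widely used IJG libjpeg convention for scaling the standard luminance table by QF. Whether HPGN consumes the matrix in exactly this form is an assumption, and other encoders may scale tables differently.

```python
import numpy as np

# Standard JPEG luminance quantization table (ITU-T T.81, Annex K).
BASE_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def quant_matrix(qf, base=BASE_LUMA):
    """Scale the base table by quality factor qf in [1, 100] (IJG convention)."""
    qf = int(np.clip(qf, 1, 100))
    scale = 5000 // qf if qf < 50 else 200 - 2 * qf  # percent scaling factor
    q = (base * scale + 50) // 100                   # rounded integer scaling
    return np.clip(q, 1, 255)                        # baseline JPEG: 8-bit entries

print(quant_matrix(75)[0])
```

Lower QF values produce larger table entries and therefore coarser quantization. This explains why the same enhancement network sees very different artifacts across compression levels.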
- [15] arXiv:2504.02380
Title: Beyond Asymptotics: Targeted exploration with finite-sample guarantees
Subjects: Systems and Control (eess.SY)
In this paper, we introduce a targeted exploration strategy for the non-asymptotic, finite-time case. The proposed strategy is applicable to uncertain linear time-invariant systems subject to sub-Gaussian disturbances. As the main result, the proposed approach provides a priori guarantees that the optimized exploration inputs achieve a desired accuracy of the model parameters. The technical derivation of the strategy (i) leverages existing non-asymptotic identification bounds based on self-normalized martingales, (ii) uses spectral lines to predict the effect of sinusoidal excitation, and (iii) accounts for spectral transient error and parametric uncertainty. A numerical example illustrates how the finite exploration time influences the required exploration energy.
- [16] arXiv:2504.02382
Title: Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge
Authors: Yudi Sang, Yanzhen Liu, Sutuke Yibulayimu, Yunning Wang, Benjamin D. Killeen, Mingxu Liu, Ping-Cheng Ku, Ole Johannsen, Karol Gotkowski, Maximilian Zenk, Klaus Maier-Hein, Fabian Isensee, Peiyan Yue, Yi Wang, Haidong Yu, Zhaohong Pan, Yutong He, Xiaokun Liang, Daiqi Liu, Fuxin Fan, Artur Jurgas, Andrzej Skalski, Yuxi Ma, Jing Yang, Szymon Płotka, Rafał Litka, Gang Zhu, Yingchun Song, Mathias Unberath, Mehran Armand, Dan Ruan, S. Kevin Zhou, Qiyong Cao, Chunpeng Zhao, Xinbao Wu, Yu Wang
Comments: PENGWIN 2024 Challenge Report
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
The segmentation of pelvic fracture fragments in CT and X-ray images is crucial for trauma diagnosis, surgical planning, and intraoperative guidance. However, accurately and efficiently delineating the bone fragments remains a significant challenge due to complex anatomy and imaging limitations. The PENGWIN challenge, organized as a MICCAI 2024 satellite event, aimed to advance automated fracture segmentation by benchmarking state-of-the-art algorithms on these complex tasks. A diverse dataset of 150 CT scans was collected from multiple clinical centers, and a large set of simulated X-ray images was generated using the DeepDRR method. Final submissions from 16 teams worldwide were evaluated under a rigorous multi-metric testing scheme. The top-performing CT algorithm achieved an average fragment-wise intersection over union (IoU) of 0.930, demonstrating satisfactory accuracy. However, in the X-ray task, the best algorithm attained an IoU of 0.774, highlighting the greater challenges posed by overlapping anatomical structures. Beyond the quantitative evaluation, the challenge revealed methodological diversity in algorithm design. Variations in instance representation, such as primary-secondary classification versus boundary-core separation, led to differing segmentation strategies. Despite promising results, the challenge also exposed inherent uncertainties in fragment definition, particularly in cases of incomplete fractures. These findings suggest that interactive segmentation approaches, integrating human decision-making with task-relevant information, may be essential for improving model reliability and clinical applicability.
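The headline metric is a fragment-wise intersection over union. This sketch computes IoU between a predicted and a reference label map for each fragment label, then averages over the fragments present in the reference. Two choices are assumptions, since the challenge's exact matching rules are not given here: label 0 is treated as background, and a fragment missing from the prediction scores 0.

```python
import numpy as np

def fragment_iou(pred, ref):
    """Mean IoU over fragment labels present in ref (label 0 = background, assumed)."""
    ious = []
    for label in np.unique(ref):
        if label == 0:
            continue
        p, r = pred == label, ref == label
        union = np.logical_or(p, r).sum()            # > 0 because r is non-empty
        ious.append(np.logical_and(p, r).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")

ref = np.array([[1, 1, 0],
                [1, 2, 2],
                [0, 2, 2]])
pred = np.array([[1, 1, 0],
                 [2, 2, 2],
                 [0, 2, 2]])
print(fragment_iou(pred, ref))
```

In this example, one pixel of fragment 1 is absorbed by fragment 2. Both fragments lose IoU, which illustrates why fragment boundaries weigh heavily in this metric.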
- [17] arXiv:2504.02408
Title: Translation of Fetal Brain Ultrasound Images into Pseudo-MRI Images using Artificial Intelligence
Comments: 13 pages, 7 figures
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Ultrasound is a widely accessible and cost-effective medical imaging tool commonly used for prenatal evaluation of the fetal brain. However, it has limitations, particularly in the third trimester, when the complexity of the fetal brain requires high image quality for extracting quantitative data. In contrast, magnetic resonance imaging (MRI) offers superior image quality and tissue differentiation but is less available, more expensive, and requires time-consuming acquisition. Thus, transforming ultrasound images into an MRI-mimicking display may be advantageous and allow better presentation of tissue anatomy. To this end, we examined the use of artificial intelligence, implementing a diffusion model, a class of models known for generating high-quality images. The proposed method, termed "Dual Diffusion Imposed Correlation" (DDIC), leverages a diffusion-based translation methodology that assumes a shared latent space between the ultrasound and MRI domains. The model was trained on the "HC18" dataset for ultrasound and on the "CRL fetal brain atlas" and "FeTA" datasets for MRI. The generated pseudo-MRI images provide notable improvements in the visual discrimination of brain tissue, especially in the lateral ventricles and the Sylvian fissure, with enhanced contrast clarity. Improvements were demonstrated in mutual information, peak signal-to-noise ratio, Fréchet Inception Distance, and contrast-to-noise ratio. These evaluations indicate statistically significant superior performance of DDIC compared with other translation methodologies. In addition, a Medical Opinion Test was conducted with five gynecologists, and the results showed display improvement in 81% of the tested images. In conclusion, the presented pseudo-MRI images have the potential to streamline diagnosis and enhance clinical outcomes through improved representation.
- [18] arXiv:2504.02506
Title: Secrecy Performance of a Keyhole-based Multi-user System with Multiple Eavesdroppers
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
This paper investigates the secrecy performance of a keyhole-aided multi-user communication network in the presence of multiple eavesdroppers. Legitimate users and eavesdroppers communicate through the same keyhole. In this context, the secrecy performance of a user scheduling technique is evaluated by deriving an exact closed-form expression for the secrecy outage probability (SOP). Further, a simplified asymptotic SOP expression is derived for the high signal-to-noise ratio (SNR) regime to better understand the impact of system parameters. The effect of the keyhole parameters, the number of users, the number of eavesdroppers, and the threshold secrecy rate on SOP performance is also investigated for the considered system model. In the high-SNR regime, the asymptotic SOP saturates to a constant value that does not depend on the keyhole parameter or on the channel parameter of the source-to-keyhole channel.
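The SOP is the probability that the instantaneous secrecy capacity, Cs = max(log2(1 + γB) - log2(1 + γE), 0), falls below a threshold rate. A Monte Carlo sketch gives intuition for the shared-keyhole saturation. Both links share the source-to-keyhole gain g1, so at high SNR the ratio (1 + ρ·g1·gB)/(1 + ρ·g1·gE) approaches gB/gE and g1 drops out. The following are all assumptions rather than the paper's setup:

- single-antenna links;
- Rayleigh fading (exponential power) on every hop;
- best-user scheduling;
- the strongest eavesdropper as the worst case;
- the parameter values.

This is not the paper's closed-form expression.

```python
import numpy as np

def sop_monte_carlo(snr_db, n_users, n_eves, r_th, n_trials=200_000, seed=0):
    """Monte Carlo SOP for a hypothetical keyhole model in which every link's
    power gain is g1 * g2, with g1 (source-to-keyhole) shared by all links."""
    rng = np.random.default_rng(seed)
    rho = 10.0 ** (snr_db / 10.0)                          # transmit SNR (linear)
    g1 = rng.exponential(1.0, n_trials)                    # shared source-to-keyhole gain
    g_users = rng.exponential(1.0, (n_trials, n_users))    # keyhole-to-user gains
    g_eves = rng.exponential(1.0, (n_trials, n_eves))      # keyhole-to-eavesdropper gains
    snr_b = rho * g1 * g_users.max(axis=1)                 # schedule the strongest user
    snr_e = rho * g1 * g_eves.max(axis=1)                  # strongest eavesdropper (worst case)
    cs = np.maximum(np.log2(1.0 + snr_b) - np.log2(1.0 + snr_e), 0.0)
    return float(np.mean(cs < r_th))

for snr_db in (10, 30, 50):
    print(snr_db, sop_monte_carlo(snr_db, n_users=4, n_eves=2, r_th=0.5))
```

At high SNR, the estimate should level off as `snr_db` grows. This is consistent with the saturation described in the abstract under these simplified assumptions.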
- [19] arXiv:2504.02520
Title: Beyond Traditional Coherence Time: An Electromagnetic Perspective for Mobile Channels
Comments: 5 pages, 5 figures
Subjects: Signal Processing (eess.SP)
Channel coherence time has been widely regarded as a critical parameter in the design of mobile systems. However, a prominent challenge lies in integrating electromagnetic (EM) polarization effects into the derivation of the channel coherence time. In this paper, we develop a framework to analyze the impact of polarization mismatch on the channel coherence time. Specifically, we first establish an EM channel model to capture the essence of EM wave propagation. Based on this model, we then derive the EM temporal correlation function, incorporating the effects of polarization mismatch and beam misalignment. Further, considering the random orientation of the mobile user equipment (UE), we derive a closed-form solution for the EM coherence time in the turning scenario. When the trajectory degenerates into a straight line, we also provide a closed-form lower bound on the EM coherence time. The simulation results validate our theoretical analysis and reveal that neglecting the EM polarization effects leads to overly optimistic estimates of the EM coherence time.
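For contrast with the EM-aware framework, the traditional rule of thumb derives coherence time from the maximum Doppler shift alone, with no polarization term. Under Clarke's isotropic-scattering model, a commonly quoted approximation is T_c ≈ 0.423 / f_D, where 0.423 ≈ sqrt(9 / (16π)) and f_D = v·f_c / c. The sketch evaluates this approximation for hypothetical speed and carrier-frequency values.

```python
def doppler_hz(speed_mps, carrier_hz, c=299_792_458.0):
    """Maximum Doppler shift f_D = v * f_c / c."""
    return speed_mps * carrier_hz / c

def coherence_time_s(speed_mps, carrier_hz):
    """Rule-of-thumb coherence time T_c ≈ 0.423 / f_D (Clarke model, no polarization)."""
    return 0.423 / doppler_hz(speed_mps, carrier_hz)

tc = coherence_time_s(30.0, 3.5e9)   # 30 m/s at 3.5 GHz (hypothetical values)
print(f"{tc * 1e3:.2f} ms")
```

This rule depends only on speed and carrier frequency. Antenna orientation and polarization mismatch have no way to enter it, and that missing dependence is what the EM perspective adds.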
- [20] arXiv:2504.02529
Title: Probabilistic Simulation of Aircraft Descent via a Hybrid Physics-Data Approach
Subjects: Systems and Control (eess.SY)
This paper presents a method for generating probabilistic descent trajectories in simulations of real-world airspace. A dataset of 116,066 trajectories harvested from Mode S radar returns in UK airspace was used to train and test the model. Thirteen aircraft types with varying performance characteristics were investigated. The error in the mean prediction of the time to reach the bottom of descent was found to be a factor of 10 smaller for the proposed method than for the Base of Aircraft Data (BADA) model. Furthermore, the method was capable of generating a range of trajectories similar in distribution to the held-out test dataset. The proposed method is hybrid: aircraft drag and calibrated airspeed functions are generated probabilistically to parameterise the BADA equations, ensuring the physical plausibility of the generated trajectories.
- [21] arXiv:2504.02565
Title: MAD: A Magnitude And Direction Policy Parametrization for Stability Constrained Reinforcement Learning
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
We introduce magnitude and direction (MAD) policies, a policy parameterization for reinforcement learning (RL) that preserves Lp closed-loop stability for nonlinear dynamical systems. Although complete in their ability to describe all stabilizing controllers, methods based on nonlinear Youla and system-level synthesis are significantly affected by the difficulty of parameterizing Lp-stable operators. In contrast, MAD policies introduce explicit feedback on state-dependent features - a key element behind the success of RL pipelines - without compromising closed-loop stability. This is achieved by describing the magnitude of the control input with a disturbance-feedback Lp-stable operator, while selecting its direction based on state-dependent features through a universal function approximator. We further characterize the robust stability properties of MAD policies under model mismatch. Unlike existing disturbance-feedback policy parameterizations, MAD policies introduce state-feedback components compatible with model-free RL pipelines, ensuring closed-loop stability without requiring model information beyond open-loop stability. Numerical experiments show that MAD policies trained with deep deterministic policy gradient (DDPG) methods generalize to unseen scenarios, matching the performance of standard neural network policies while guaranteeing closed-loop stability by design.
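The magnitude/direction split can be sketched as u_t = m_t · d(x_t). Here m_t is produced by a stable operator driven by disturbance estimates, and d(·) is a bounded function of state features. In this toy version, several components are hypothetical stand-ins for the paper's Lp-stable operator and universal function approximator: the first-order filter with pole |a| < 1, the tanh direction map, the random weights, and the given `w_hat` signal. The point is only that bounding each |d_i| by 1 keeps every input component bounded by the magnitude signal, however the direction features behave.

```python
import numpy as np

class MADPolicy:
    """Toy magnitude-and-direction policy u_t = m_t * d(x_t) (hypothetical design)."""

    def __init__(self, n_state, n_input, a=0.8, b=1.0, seed=0):
        if not abs(a) < 1.0:
            raise ValueError("pole |a| must be < 1 for a stable magnitude filter")
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_input, n_state))  # stand-in for a trained approximator
        self.a, self.b = a, b
        self.m = 0.0                                  # magnitude filter state

    def __call__(self, x, w_hat):
        # Magnitude: stable first-order filter driven by the disturbance-estimate norm.
        self.m = self.a * self.m + self.b * float(np.linalg.norm(w_hat))
        # Direction: state-dependent features squashed so that every |d_i| <= 1.
        d = np.tanh(self.W @ x)
        return self.m * d

policy = MADPolicy(n_state=3, n_input=2)
rng = np.random.default_rng(1)
us, ms = [], []
for _ in range(50):
    x = rng.normal(size=3)               # hypothetical state samples
    w_hat = 0.1 * rng.normal(size=3)     # hypothetical disturbance estimates
    us.append(policy(x, w_hat))
    ms.append(policy.m)
```

Training would adjust only the direction weights and stable-operator parameters. The structural bound on the input holds regardless of their values.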
- [22] arXiv:2504.02575
Title: Assessing Geographical and Seasonal Influences on Energy Efficiency of Electric Drayage Trucks
Authors: Ankur Shiledar, Manfredi Villani, Joseph N.E. Lucero, Ruixiao Sun, Vivek A. Sujan, Simona Onori, Giorgio Rizzoni
Subjects: Systems and Control (eess.SY)
The electrification of heavy-duty vehicles is a critical pathway towards improved energy efficiency in the freight sector. Current battery electric truck technology poses several challenges to commercial vehicle operations, such as limited driving range, sensitivity to climate conditions, and long recharging times. Estimating the energy consumption of heavy-duty electric trucks is crucial to assess the feasibility of fleet electrification and its impact on the electric grid. This paper develops a model-based simulation approach to predict and analyze the energy consumption of drayage trucks used in port logistics operations, considering seasonal climate variations and geographical characteristics. The paper includes results for three major container ports in the United States, providing region-specific insights into driving range, payload capacity, and charging infrastructure requirements. These insights can inform decision-makers integrating electric trucks into existing drayage operations and planning investments in electric grid development.
- [23] arXiv:2504.02579
Title: Bridging the Gap between Gaussian Diffusion Models and Universal Quantization for Image Compression
Comments: To appear at CVPR 2025
Subjects: Image and Video Processing (eess.IV)
Generative neural image compression supports data representation at extremely low bitrates, synthesizing details at the client and consistently producing highly realistic images. By leveraging the similarities between quantization error and additive noise, diffusion-based generative image compression codecs can be built using a latent diffusion model to "denoise" the artifacts introduced by quantization. However, we identify three critical gaps in previous approaches following this paradigm (namely, the noise level, noise type, and discretization gaps) that cause the quantized data to fall outside the data distribution known to the diffusion model. In this work, we propose a novel quantization-based forward diffusion process with theoretical foundations that tackles all three of these gaps. We achieve this through universal quantization with a carefully tailored quantization schedule and a diffusion model trained with uniform noise. Compared with previous work, our proposal produces consistently realistic and detailed reconstructions, even at very low bitrates. In this regime, we achieve the best rate-distortion-realism performance, outperforming previous related works.
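Universal (subtractive dithered) quantization is what makes quantization error behave exactly like additive uniform noise. With a dither u ~ U(-Δ/2, Δ/2) shared by encoder and decoder, the reconstruction Q(y + u) - u differs from y by an error that is uniform on [-Δ/2, Δ/2] and independent of y. The sketch checks these properties empirically. The step size and the Laplacian input are illustrative, and the paper's quantization schedule is not reproduced.

```python
import numpy as np

def universal_quantize(y, step, rng):
    """Subtractive dithered quantization with a dither shared by encoder and decoder."""
    u = rng.uniform(-0.5, 0.5, size=np.shape(y)) * step   # shared dither
    q = np.round((y + u) / step) * step                    # value actually coded
    return q - u                                           # decoder subtracts the dither

rng = np.random.default_rng(0)
y = rng.laplace(0.0, 3.0, size=100_000)   # illustrative latent values
step = 1.0
err = universal_quantize(y, step, rng) - y
print(err.min(), err.max(), err.mean(), err.var())   # variance close to step**2 / 12
```

Without the shared dither, the rounding error would be a deterministic function of y. The dither is what turns it into signal-independent uniform noise that a model trained with uniform noise can match.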
- [24] arXiv:2504.02582 [pdf, html, other]
Title: Ambiguity Function Analysis of Affine Frequency Division Multiplexing for Integrated Sensing and Communication
Subjects: Signal Processing (eess.SP)
Affine frequency division multiplexing (AFDM) is a chirp-based multicarrier waveform that was recently proposed for communication over doubly dispersive channels. Given its chirp nature, AFDM is expected to have superior sensing capabilities compared to orthogonal frequency division multiplexing (OFDM) and is thus a promising candidate for integrated sensing and communication (ISAC) applications. In this paper, we derive a closed-form expression for the ambiguity function of AFDM waveforms modulated with $M$-ary quadrature amplitude modulation (QAM) data symbols. We determine the condition on the chirp rate of the AFDM waveform that minimizes the sidelobes in the delay/range domain in the presence of random $M$-ary QAM symbols, thereby improving overall sensing performance. Additionally, we find an approximate statistical distribution for the magnitude of the derived ambiguity function. Simulation results are presented to evaluate the sensing performance of the AFDM waveform for various system parameters and to compare its peak-to-sidelobe ratio (PSLR) and integrated sidelobe ratio (ISLR) with those of OFDM.
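For readers new to AFDM, the modulator is the inverse discrete affine Fourier transform, s[n] = (1/√N) Σ_m x[m] exp(j2π(c1·n² + c2·m² + nm/N)), which reduces to an IDFT when c1 = c2 = 0. The sketch below builds that transform and measures PSLR and ISLR on the zero-Doppler cut via a circular autocorrelation. It is an illustration under assumptions, not the paper's closed-form ambiguity function: the chirp parameters are arbitrary example values (not the optimal chirp-rate condition derived in the paper), and the chirp-periodic prefix is ignored.

```python
import numpy as np


def afdm_modulate(x, c1, c2):
    """Inverse discrete affine Fourier transform (AFDM modulator).

    s[n] = (1/sqrt(N)) * sum_m x[m] * exp(j*2*pi*(c1*n^2 + c2*m^2 + n*m/N))
    """
    N = len(x)
    n = np.arange(N)[:, None]  # time index (rows)
    m = np.arange(N)[None, :]  # chirp-subcarrier index (columns)
    A = np.exp(2j * np.pi * (c1 * n**2 + c2 * m**2 + n * m / N)) / np.sqrt(N)
    return A @ x


def zero_doppler_sidelobes(s):
    """PSLR and ISLR [dB] of the circular autocorrelation of s.

    The zero-lag value is the mainlobe; every other lag counts as a sidelobe.
    """
    N = len(s)
    # r[l] = sum_n s[n] * conj(s[n - l]), indices modulo N
    r = np.array([np.vdot(np.roll(s, l), s) for l in range(N)])
    power = np.abs(r) ** 2
    pslr_db = 10 * np.log10(power[1:].max() / power[0])
    islr_db = 10 * np.log10(power[1:].sum() / power[0])
    return pslr_db, islr_db


# Random 4-QAM symbols; N, c1 and c2 are illustrative only.
rng = np.random.default_rng(1)
N = 64
x = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)
s = afdm_modulate(x, c1=3 / (2 * N), c2=0.0)
pslr_db, islr_db = zero_doppler_sidelobes(s)
```

Because the transform is unitary, the waveform energy equals the symbol energy; repeating the measurement over many random symbol draws is one way to look at the sidelobe statistics the abstract refers to.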
- [25] arXiv:2504.02597 [pdf, html, other]
Title: Regulating Spatial Fairness in a Tripartite Micromobility Sharing System via Reinforcement Learning
Comments: 6 pages, 2 figures, accepted at the 2025 Innovation & Society: Statistics and Data Science for Evaluation and Quality (IES) on February 24th, 2025. arXiv admin note: text overlap with arXiv:2403.15780
Subjects: Systems and Control (eess.SY)
In the growing field of Shared Micromobility Systems, which holds great potential for shaping urban transportation, fairness-oriented approaches remain largely unexplored. This work addresses this gap by investigating the balance between performance optimization and algorithmic fairness in Shared Micromobility Services using Reinforcement Learning. Our methodology achieves equitable outcomes, measured by the Gini index, across central, peripheral, and remote station categories. By strategically rebalancing the vehicle distribution, it maximizes operator performance while upholding fairness principles. The efficacy of our approach is validated through a case study using synthetic data.
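The Gini index used as the fairness measure is the mean absolute difference between all pairs of values, normalized by twice the mean: 0 means perfect equality. A minimal sketch follows; which per-category quantity is fed in (for example a service level for central, peripheral, and remote stations) is an assumption for illustration, and the example values are hypothetical.

```python
def gini(values):
    """Gini index of non-negative values (0 = perfectly equal).

    G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean(x))
    """
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    diff_sum = sum(abs(a - b) for a in values for b in values)
    return diff_sum / (2 * n * n * mean)


# Hypothetical service levels for central, peripheral, remote stations.
g = gini([0.9, 0.6, 0.3])  # about 0.222
```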
- [26] arXiv:2504.02613 [pdf, html, other]
Title: UAV-Assisted 5G Networks: Mobility-Aware 3D Trajectory Optimization and Resource Allocation for Dynamic Environments
Authors: Asad Mahmood, Thang X. Vu, Wali Ullah Khan, Symeon Chatzinotas, Björn Ottersten (Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg)
Subjects: Signal Processing (eess.SP)
This paper proposes a framework for the robust design of UAV-assisted wireless networks that combines 3D trajectory optimization with user mobility prediction to address dynamic resource allocation challenges. We propose a sparse second-order prediction model for real-time user tracking, coupled with heuristic user clustering to balance service quality and computational complexity. The joint optimization problem is formulated to maximize the minimum rate. It is then decomposed into user association, 3D trajectory design, and resource allocation subproblems, which are solved iteratively via successive convex approximation (SCA). Extensive simulations demonstrate: (1) near-optimal performance, with $\epsilon \approx 0.67\%$ deviation from upper-bound solutions; (2) $16\%$ higher minimum rates for distant users compared to non-predictive 3D designs; and (3) $10$–$30\%$ faster outage mitigation than time-division benchmarks. The framework's adaptive speed control enables precise tracking of mobile users while maintaining energy efficiency under constrained flight time. Results demonstrate superior robustness in edge-coverage scenarios, making the framework particularly suitable for 5G/6G networks.
- [27] arXiv:2504.02628 [pdf, html, other]
Title: Towards Computation- and Communication-efficient Computational Pathology
Authors: Chu Han, Bingchao Zhao, Jiatai Lin, Shanshan Lyu, Longfei Wang, Tianpeng Deng, Cheng Lu, Changhong Liang, Hannah Y. Wen, Xiaojing Guo, Zhenwei Shi, Zaiyi Liu
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Despite their impressive performance across a wide range of applications, current computational pathology (CPath) models face significant diagnostic efficiency challenges due to their reliance on high-magnification whole-slide image analysis. This limitation severely compromises their clinical utility, especially in time-sensitive diagnostic scenarios and situations requiring efficient data transfer. To address these issues, we present a novel computation- and communication-efficient framework called Magnification-Aligned Global-Local Transformer (MAGA-GLTrans). Our approach significantly reduces computational time, file transfer requirements, and storage overhead by enabling effective analysis using low-magnification inputs rather than high-magnification ones. The key innovation lies in our proposed magnification alignment (MAGA) mechanism, which employs self-supervised learning to bridge the information gap between low and high magnification levels by aligning their feature representations. Through extensive evaluation across various fundamental CPath tasks, MAGA-GLTrans demonstrates state-of-the-art classification performance while achieving remarkable efficiency gains: up to a 10.7-fold reduction in computational time and over a 20-fold reduction in file transfer and storage requirements. Furthermore, we highlight the versatility of our MAGA framework through two significant extensions: (1) its applicability as a feature extractor to enhance the efficiency of any CPath architecture, and (2) its compatibility with existing foundation models and histopathology-specific encoders, enabling them to process low-magnification inputs with minimal information loss. These advancements position MAGA-GLTrans as a particularly promising solution for time-sensitive applications, especially intraoperative frozen section diagnosis, where both accuracy and efficiency are paramount.
- [28] arXiv:2504.02641 [pdf, html, other]
Title: Utilizing 5G NR SSB Blocks for Passive Detection and Localization of Low-Altitude Drones
Subjects: Signal Processing (eess.SP)
With the exponential growth of the unmanned aerial vehicle (UAV) industry and a broad range of applications expected to appear in the coming years, traditional radar systems are becoming increasingly cumbersome for UAV supervision. Motivated by this emerging challenge, this paper investigates the feasibility of performing this task with integrated sensing and communication (ISAC) systems implemented over current and future wireless networks. We propose a sensing mechanism based on the synchronization signal block (SSB) of the fifth-generation (5G) standard that performs sensing in a passive bistatic setting. Assuming planar arrays at the sensing nodes and following the 5G standard, we consider an SSB transmitted over a grid of time-multiplexed orthogonal beams, some of which point toward a surveillance region where low-altitude drones may be flying. The Cramér-Rao bound (CRB) is derived as the theoretical bound for range and velocity estimation. Our results demonstrate the potential of SSB signals for localizing UAV-like targets at low SNR.
- [29] arXiv:2504.02647 [pdf, html, other]
Title: Adaptive Frequency Enhancement Network for Remote Sensing Image Semantic Segmentation
Comments: Accepted by IEEE TGRS 2025
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Semantic segmentation of high-resolution remote sensing images plays a crucial role in land-use monitoring and urban planning. Recent progress in deep learning-based methods makes it possible to generate satisfactory segmentation results. However, existing methods still face challenges in adapting network parameters to various land cover distributions and in enhancing the interaction between spatial and frequency domain features. To address these challenges, we propose the Adaptive Frequency Enhancement Network (AFENet), which integrates two key components: the Adaptive Frequency and Spatial feature Interaction Module (AFSIM) and the Selective feature Fusion Module (SFM). AFSIM dynamically separates and modulates high- and low-frequency features according to the content of the input image. It adaptively generates two masks to separate high- and low-frequency components, thereby providing optimal details and contextual supplementary information for ground object feature representation. SFM selectively fuses global context and local detailed features to enhance the network's representation capability. Hence, the interactions between frequency and spatial features are further enhanced. Extensive experiments on three publicly available datasets demonstrate that the proposed AFENet outperforms state-of-the-art methods. We also validate the effectiveness of AFSIM and SFM in managing diverse land cover types and complex scenarios. Our code is available at this https URL.
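To make the idea of separating an image into high- and low-frequency components with masks concrete, the sketch below uses a fixed radial mask in the 2D FFT domain. This is a deliberate simplification: AFSIM generates its masks adaptively from the image content with learned layers, whereas here the mask and the `cutoff` value are hypothetical and hand-chosen. The two parts always sum back to the input.

```python
import numpy as np


def split_frequency(img, cutoff):
    """Split a 2D image into low- and high-frequency parts with a radial FFT mask.

    cutoff is a radius in cycles/pixel (0 to about 0.71 for a 2D grid).
    """
    F = np.fft.fft2(img)
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    low_mask = np.sqrt(fx**2 + fy**2) <= cutoff  # symmetric, so outputs are real
    low = np.real(np.fft.ifft2(F * low_mask))
    high = np.real(np.fft.ifft2(F * ~low_mask))
    return low, high


rng = np.random.default_rng(0)
img = rng.random((32, 32))
low, high = split_frequency(img, cutoff=0.1)
```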
- [30] arXiv:2504.02648 [pdf, html, other]
Title: Controlled Social Learning: Altruism vs. Bias
Subjects: Systems and Control (eess.SY); Computer Science and Game Theory (cs.GT); Social and Information Networks (cs.SI)
We introduce a model of sequential social learning in which a planner may pay a cost to adjust the private signal precision of some agents. This framework presents a new optimization problem for social learning that sheds light on practical policy questions, such as how the socially optimal level of ad personalization changes according to current beliefs or how a biased planner might derail social learning. We then characterize the optimal policies of an altruistic planner who maximizes social welfare and a biased planner who seeks to induce a specific action. We demonstrate that even a planner who has the same knowledge as an individual agent, cannot lie or cherry-pick information, and is fully observable can dramatically influence social welfare in both positive and negative directions. An important area for future exploration is how to prevent the negative outcomes and protect against the manipulation of social learning.
- [31] arXiv:2504.02653 [pdf, html, other]
Title: Online and Offline Space-Filling Input Design for Nonlinear System Identification: A Receding Horizon Control-Based Approach
Subjects: Systems and Control (eess.SY)
The effectiveness of data-driven techniques heavily depends on the input signal used to generate the estimation data. However, a significant research gap exists in the field of input design for nonlinear dynamic system identification. In particular, existing methods largely overlook the minimization of the generalization error, i.e., model inaccuracies in regions not covered by the estimation dataset. This work addresses this gap by proposing an input design method that embeds a novel optimality criterion within a receding horizon control (RHC)-based optimization framework. The distance-based optimality criterion induces a space-filling design within a user-defined region of interest in a surrogate model's input space, requiring only minimal prior knowledge. Additionally, the method is applicable both online, where model parameters are continuously updated based on process observations, and offline, where a fixed model is employed. The space-filling performance of the proposed strategy is evaluated on an artificial example and compared to state-of-the-art methods, demonstrating superior efficiency in exploring process operating spaces.
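A common distance-based space-filling criterion is maximin: choose the next point to maximize its distance to the nearest point already in the dataset. The sketch below applies that rule greedily over random candidates in a box-shaped region of interest. It is a generic illustration of a distance-based criterion, not the paper's method, which embeds its criterion in a receding horizon optimization over input sequences subject to the system dynamics; the box bounds, candidate count, and seed are hypothetical.

```python
import numpy as np


def next_maximin_point(existing, candidates):
    """Return the candidate whose nearest existing point is farthest away."""
    # Pairwise distances, shape (n_candidates, n_existing)
    d = np.linalg.norm(candidates[:, None, :] - existing[None, :, :], axis=-1)
    return candidates[np.argmax(d.min(axis=1))]


def greedy_space_filling(start, lower, upper, n_points, n_candidates=2000, seed=0):
    """Greedily add n_points inside the box [lower, upper] using the maximin rule."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    rng = np.random.default_rng(seed)
    pts = np.atleast_2d(np.asarray(start, dtype=float))
    for _ in range(n_points):
        cand = rng.uniform(lower, upper, size=(n_candidates, lower.size))
        pts = np.vstack([pts, next_maximin_point(pts, cand)])
    return pts


# Hypothetical 2D region of interest [-1, 1] x [-1, 1], starting at the origin.
design = greedy_space_filling([0.0, 0.0], [-1.0, -1.0], [1.0, 1.0], n_points=10)
```

In an online setting the "existing" set would grow with each new process observation, which is where the receding-horizon formulation in the paper differs from this static greedy loop.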
- [32] arXiv:2504.02668 [pdf, html, other]
Title: Two-Stage nnU-Net for Automatic Multi-class Bi-Atrial Segmentation from LGE-MRIs
Comments: MBAS Challenge, STACOM, MICCAI 2024
Subjects: Image and Video Processing (eess.IV)
Late gadolinium enhancement magnetic resonance imaging (LGE-MRI) is used to visualise atrial fibrosis and scars, providing important information for personalised atrial fibrillation (AF) treatments. Since manual analysis and delineation of these images can be both labour-intensive and subject to variability, we develop an automatic pipeline to segment the left atrial (LA) cavity, the right atrial (RA) cavity, and the wall of both atria on LGE-MRI. Our method is based on a two-stage nnU-Net architecture, combining 2D and 3D convolutional networks, and incorporates adaptive histogram equalisation to improve tissue contrast in the input images and morphological operations on the output segmentation maps. We achieve Dice similarity coefficients of 0.92 ± 0.03, 0.93 ± 0.03, and 0.71 ± 0.05, and 95% Hausdorff distances of (3.89 ± 6.67) mm, (4.42 ± 1.66) mm, and (3.94 ± 1.83) mm for the LA, RA, and wall, respectively. Accurate delineation of the LA, RA, and myocardial wall is the first step in analysing atrial structure in cardiovascular patients, especially those with AF. This can allow clinicians to provide adequate and personalised treatment plans in a timely manner.
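The two reported metrics can be sketched as follows. Dice is 2|A∩B| / (|A| + |B|) on binary masks. For the 95% Hausdorff distance, conventions differ between toolkits; this sketch assumes the maximum of the two directed 95th-percentile surface distances, and assumes the inputs are boundary point coordinates already scaled to millimetres. The challenge's own evaluation code may differ.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient of two boolean masks: 2*|A and B| / (|A| + |B|)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom


def hd95(points_a, points_b):
    """95th-percentile symmetric Hausdorff distance between two point sets.

    Takes the max of the two directed 95th percentiles (one common convention).
    Points are (n, d) arrays of boundary coordinates in physical units (e.g. mm).
    """
    d = np.linalg.norm(points_a[:, None, :] - points_b[None, :, :], axis=-1)
    return max(np.percentile(d.min(axis=1), 95), np.percentile(d.min(axis=0), 95))


m = np.zeros((4, 4), dtype=bool)
m[1:3, 1:3] = True
shifted = np.zeros_like(m)
shifted[1:3, 2:4] = True
overlap = dice(m, shifted)  # 2 shared pixels out of 4 + 4 -> 0.5
```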
- [33] arXiv:2504.02679 [pdf, html, other]
Title: A Set-Theoretic Robust Control Approach for Linear Quadratic Games with Unknown Counterparts
Comments: Submitted to 64th IEEE Conference on Decision and Control
Subjects: Systems and Control (eess.SY)
Ensuring robust decision-making in multi-agent systems is challenging when agents have distinct, possibly conflicting objectives and lack full knowledge of each other's strategies. This is apparent in safety-critical applications such as human-robot interaction and assisted driving, where uncertainty arises not only from unknown adversary strategies but also from external disturbances. To address this, the paper proposes a robust adaptive control approach based on linear quadratic differential games. Our method allows a controlled agent to iteratively refine its belief about the adversary strategy and disturbances using a set-membership approach, while simultaneously adapting its policy to guarantee robustness against the uncertain adversary policy and improve performance over time. We formally derive theoretical guarantees on the robustness of the proposed control scheme and its convergence to epsilon-Nash strategies. The effectiveness of our approach is demonstrated in a numerical simulation.
- [34] arXiv:2504.02743 [pdf, html, other]
Title: Sequential Binary Hypothesis Testing with Competing Agents under Information Asymmetry
Comments: 8 pages, 4 figures, submitted to IEEE Conference on Decision and Control 2025
Subjects: Systems and Control (eess.SY); Multiagent Systems (cs.MA); Optimization and Control (math.OC)
This paper concerns sequential hypothesis testing in competitive multi-agent systems where agents exchange potentially manipulated information. Specifically, a two-agent scenario is studied where each agent aims to correctly infer the true state of nature while optimizing decision speed and accuracy. At each iteration, agents collect private observations, update their beliefs, and share (possibly corrupted) belief signals with their counterparts before deciding whether to stop and declare a state, or continue gathering more information. The analysis yields three main results: (1) when agents share information strategically, the optimal signaling policy involves equal-probability randomization between truthful and inverted beliefs; (2) agents maximize performance by relying solely on their own observations for belief updating while using received information only to anticipate their counterpart's stopping decision; and (3) the agent that reaches its confidence threshold first causes the other agent to incur a higher conditional probability of error. Numerical simulations further demonstrate that agents with higher KL divergence in their conditional distributions gain a competitive advantage. Furthermore, our results establish that information sharing, despite strategic manipulation, reduces overall system stopping time compared to non-interactive scenarios, which highlights the inherent value of communication even in this competitive setup.
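As a single-agent baseline (without the belief exchange the paper studies), Wald's sequential probability ratio test accumulates log-likelihood ratios and stops at the first threshold crossing. Wald's approximation puts the expected stopping time roughly inversely proportional to the KL divergence between the two observation distributions, which is consistent with the observation that higher-KL agents decide faster. The equal-variance Gaussian model, means, and error targets below are assumptions for illustration.

```python
import math
import random


def kl_gauss(mu_a, mu_b, sigma):
    """KL(N(mu_a, sigma^2) || N(mu_b, sigma^2))."""
    return (mu_a - mu_b) ** 2 / (2 * sigma ** 2)


def sprt(sample, mu0, mu1, sigma, alpha=0.01, beta=0.01, max_steps=10_000):
    """Wald SPRT between H0: N(mu0, sigma^2) and H1: N(mu1, sigma^2).

    `sample` is a zero-argument callable returning one observation.
    Returns (decision, steps); decision is 1 for H1, 0 for H0, None if undecided.
    """
    upper = math.log((1 - beta) / alpha)  # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))  # accept H0 at or below this
    llr = 0.0
    for k in range(1, max_steps + 1):
        x = sample()
        # log f1(x)/f0(x) for equal-variance Gaussians
        llr += ((mu1 - mu0) * x - (mu1 ** 2 - mu0 ** 2) / 2) / sigma ** 2
        if llr >= upper:
            return 1, k
        if llr <= lower:
            return 0, k
    return None, max_steps


# Observations actually drawn under H1 (mean 1.0).
rng = random.Random(0)
decision, steps = sprt(lambda: rng.gauss(1.0, 1.0), mu0=0.0, mu1=1.0, sigma=1.0)
```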
- [35] arXiv:2504.02766 [pdf, html, other]
Title: On Composable and Parametric Uncertainty in Systems Co-Design
Comments: 8 pages, submitted to IEEE Conference on Decision and Control (CDC) 2025
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Optimizing the design of complex systems requires navigating interdependent decisions, heterogeneous components, and multiple objectives. Our monotone theory of co-design offers a compositional framework for addressing this challenge, modeling systems as Design Problems (DPs) that represent trade-offs between functionalities and resources within partially ordered sets. While current approaches model uncertainty using intervals, capturing worst- and best-case bounds, they fail to express probabilistic notions such as risk and confidence. These limitations hinder the applicability of co-design in domains where uncertainty plays a critical role. In this paper, we introduce a unified framework for composable uncertainty in co-design, capturing intervals, distributions, and parametrized models. This extension enables reasoning about risk-performance trade-offs and supports advanced queries such as experiment design, learning, and multi-stage decision making. We demonstrate the expressiveness and utility of the framework via a numerical case study on the uncertainty-aware co-design of task-driven unmanned aerial vehicles (UAVs).
New submissions (showing 35 of 35 entries)
- [36] arXiv:2504.01970 (cross-list from math.OC) [pdf, html, other]
Title: Differentiable Optimization for Deep Learning-Enhanced DC Approximation of AC Optimal Power Flow
Comments: 9 pages, 5 figures
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
The growing scale of power systems and the increasing uncertainty introduced by renewable energy sources necessitate novel optimization techniques that are significantly faster and more accurate than existing methods. The AC Optimal Power Flow (AC-OPF) problem, a core component of power grid optimization, is often approximated using linearized DC Optimal Power Flow (DC-OPF) models for computational tractability, albeit at the cost of suboptimal and inefficient decisions. To address these limitations, we propose a novel deep learning-based framework for network equivalency that enhances DC-OPF to more closely mimic the behavior of AC-OPF. The approach utilizes recent advances in differentiable optimization, incorporating a neural network trained to predict adjusted nodal shunt conductances and branch susceptances in order to account for nonlinear power flow behavior. The model can be trained end-to-end using modern deep learning frameworks by leveraging the implicit function theorem. Results demonstrate the framework's ability to significantly improve prediction accuracy, paving the way for more reliable and efficient power systems.
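In the DC approximation, line flows follow the linear model P = Bθ assembled from branch susceptances, which are among the quantities the paper's network adjusts. The sketch below solves a lossless DC power flow on a hypothetical 3-bus network with bus 0 as the slack bus, to show where (possibly adjusted) susceptances enter. It does not model the paper's shunt conductances, the OPF objective, or the learning pipeline.

```python
import numpy as np


def dc_power_flow(n_bus, lines, p_inj, slack=0):
    """Lossless DC power flow.

    lines: list of (from_bus, to_bus, susceptance_pu)
    p_inj: net injections in p.u. (the slack entry is ignored and recomputed)
    Returns (bus angles [rad], line flows [p.u.], slack injection [p.u.]).
    """
    B = np.zeros((n_bus, n_bus))
    for i, j, b in lines:  # build the nodal susceptance matrix
        B[i, i] += b
        B[j, j] += b
        B[i, j] -= b
        B[j, i] -= b
    keep = [k for k in range(n_bus) if k != slack]
    theta = np.zeros(n_bus)  # slack angle fixed at 0
    theta[keep] = np.linalg.solve(B[np.ix_(keep, keep)], np.asarray(p_inj, float)[keep])
    flows = np.array([b * (theta[i] - theta[j]) for i, j, b in lines])
    slack_injection = B[slack] @ theta
    return theta, flows, slack_injection


# Hypothetical network: generation of 0.5 p.u. at bus 1, load of 1.0 p.u. at bus 2.
lines = [(0, 1, 10.0), (1, 2, 10.0), (0, 2, 10.0)]
theta, flows, p_slack = dc_power_flow(3, lines, [0.0, 0.5, -1.0])
# flows is approximately [0.0, 0.5, 0.5]; the slack bus supplies about 0.5 p.u.
```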
- [37] arXiv:2504.01984 (cross-list from stat.AP) [pdf, html, other]
Title: Stable EEG Source Estimation for Standardized Kalman Filter using Change Rate Tracking
Subjects: Applications (stat.AP); Signal Processing (eess.SP); Numerical Analysis (math.NA)
This article focuses on the measurement and evolution modeling of Standardized Kalman filtering in brain activity estimation when non-invasive electroencephalography (EEG) measurements are used as the data. We propose a new parameter tuning and a model that utilizes the change rate of the brain activity distribution to improve the stability of the otherwise accurate estimation. Namely, we pose a backward differentiation-based measurement model for the change rate, which notably increased the stability of the tracking. Simulated data and data from a real subject were used in the experiments.
- [38] arXiv:2504.01996 (cross-list from cs.RO) [pdf, html, other]
Title: Real-Time Navigation for Autonomous Aerial Vehicles Using Video
Comments: Submitted to Journal of Real-Time Image Processing
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Most applications in autonomous navigation using mounted cameras rely on the construction and processing of geometric 3D point clouds, which is an expensive process. There is, however, a simpler way to make a space quickly navigable: using semantic information (e.g., traffic signs) to guide the agent. Detecting and acting on semantic information involves Computer Vision (CV) algorithms such as object detection, which are themselves demanding for agents such as aerial drones with limited onboard resources. To solve this problem, we introduce a novel Markov Decision Process (MDP) framework to reduce the workload of these CV approaches. We apply our proposed framework to both feature-based and neural-network-based object-detection tasks, using open-loop and closed-loop simulations as well as hardware-in-the-loop emulations. These holistic tests show significant benefits in energy consumption and speed with only a limited loss in accuracy compared to models based on static features and neural networks.
- [39] arXiv:2504.02061 (cross-list from cs.CV) [pdf, html, other]
Title: Aligned Better, Listen Better for Audio-Visual Large Language Models
Authors: Yuxin Guo, Shuailei Ma, Shijie Ma, Xiaoyi Bao, Chen-Wei Xie, Kecheng Zheng, Tingyu Weng, Siyang Sun, Yun Zheng, Wei Zou
Comments: Accepted to ICLR 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Audio is essential for multimodal video understanding. On the one hand, video inherently contains audio, which supplies complementary information to vision. On the other hand, video large language models (Video-LLMs) can encounter many audio-centric settings. However, existing Video-LLMs and Audio-Visual Large Language Models (AV-LLMs) exhibit deficiencies in exploiting audio information, leading to weak understanding and hallucinations. To address these issues, we examine both the model architecture and the dataset. (1) From the architectural perspective, we propose a fine-grained AV-LLM, namely Dolphin. The concurrent alignment of audio and visual modalities in both temporal and spatial dimensions ensures a comprehensive and accurate understanding of videos. Specifically, we devise an audio-visual multi-scale adapter for multi-scale information aggregation, which achieves spatial alignment. For temporal alignment, we propose audio-visual interleaved merging. (2) From the dataset perspective, we curate an audio-visual caption and instruction-tuning dataset, called AVU. It comprises 5.2 million diverse, open-ended data tuples (video, audio, question, answer) and introduces a novel data partitioning strategy. Extensive experiments show that our model not only achieves remarkable performance in audio-visual understanding but also mitigates potential hallucinations.
- [40] arXiv:2504.02084 (cross-list from cs.RO) [pdf, html, other]
Title: Evaluation of Flight Parameters in UAV-based 3D Reconstruction for Rooftop Infrastructure Assessment
Comments: 8 pages, 6 figures, 2 tables
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Rooftop 3D reconstruction using UAV-based photogrammetry offers a promising solution for infrastructure assessment, but existing methods often require high percentages of image overlap and extended flight times to ensure model accuracy when using autonomous flight paths. This study systematically evaluates two key flight parameters, ground sampling distance (GSD) and image overlap, to optimize the 3D reconstruction of complex rooftop infrastructure. Controlled UAV flights were conducted over a multi-segment rooftop at Queen's University using a DJI Phantom 4 Pro V2, with varied GSD and overlap settings. The collected data were processed using Reality Capture software and evaluated against ground truth models generated from UAV-based LiDAR and terrestrial laser scanning (TLS). Experimental results indicate that a GSD range of 0.75-1.26 cm combined with 85% image overlap achieves a high degree of model accuracy while minimizing the number of images collected and the flight time. These findings provide guidance for planning autonomous UAV flight paths for efficient rooftop assessments.
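GSD links flight altitude to camera geometry: GSD [cm/px] = sensor width [mm] × altitude [m] × 100 / (focal length [mm] × image width [px]), and the overlap fraction sets the spacing between exposures. The sketch below uses assumed nominal 1-inch-sensor values (13.2 mm sensor width, 8.8 mm focal length, 5472 × 3648 px) as stand-ins for the Phantom 4 Pro V2 camera; check them against the actual camera before planning a flight. Under those assumptions the reported 0.75-1.26 cm GSD range corresponds to roughly 27 to 46 m above the imaged surface.

```python
def gsd_cm_per_px(altitude_m, sensor_width_mm=13.2, focal_mm=8.8, image_width_px=5472):
    """Ground sampling distance [cm/px] for a nadir image (assumed camera specs)."""
    return sensor_width_mm * altitude_m * 100 / (focal_mm * image_width_px)


def altitude_for_gsd(gsd_cm, sensor_width_mm=13.2, focal_mm=8.8, image_width_px=5472):
    """Altitude [m] above the imaged surface needed for a target GSD [cm/px]."""
    return gsd_cm * focal_mm * image_width_px / (sensor_width_mm * 100)


def photo_spacing_m(gsd_cm, footprint_px, overlap):
    """Distance between exposures (or flight lines) for a fractional overlap."""
    footprint_m = gsd_cm / 100 * footprint_px
    return footprint_m * (1 - overlap)


low_alt = altitude_for_gsd(0.75)    # about 27.4 m
high_alt = altitude_for_gsd(1.26)   # about 46.0 m
# Along-track spacing at 0.75 cm/px, assuming the 3648 px side points along track:
spacing = photo_spacing_m(0.75, 3648, 0.85)  # about 4.1 m
```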
- [41] arXiv:2504.02103 (cross-list from physics.optics) [pdf, other]
Title: OAM-Assisted Self-Healing Is Directional, Proportional and Persistent
Authors: Marek Klemes, Lan Hu, Greg Bowles, Alireza Ghayekhloo, Mohammad Akbari, Soulideth Thirakoune, Michael Schwartzman, Kevin Zhang, Tan Huy Ho, David Wessel, Wen Tong
Comments: 20 pages, including 2 appendices
Subjects: Optics (physics.optics); Signal Processing (eess.SP)
In this paper we demonstrate the postulated mechanism of self-healing specifically due to orbital-angular-momentum (OAM) in radio vortex beams having equal beam-widths. In previous work we experimentally demonstrated self-healing effects in OAM beams at 28 GHz and postulated a theoretical mechanism to account for them. In this work we further characterize the OAM self-healing mechanism theoretically and confirm those characteristics with systematic and controlled experimental measurements on a 28 GHz outdoor link. Specifically, we find that the OAM self-healing mechanism is an additional self-healing mechanism in structured electromagnetic beams which is directional with respect to the displacement of an obstruction relative to the beam axis. We also confirm our previous findings that the amount of OAM self-healing is proportional to the OAM order, and additionally find that it persists beyond the focusing region into the far field. As such, OAM-assisted self-healing brings an advantage over other so-called non-diffracting beams both in terms of the minimum distance for onset of self-healing and the amount of self-healing obtainable. We relate our findings by extending theoretical models in the literature and develop a unifying electromagnetic analysis to account for self-healing of OAM-bearing non-diffracting beams more rigorously.
- [42] arXiv:2504.02114 (cross-list from cs.CR) [pdf, html, other]
Title: On Model Protection in Federated Learning against Eavesdropping Attacks
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)
In this study, we investigate the protection offered by federated learning algorithms against eavesdropping adversaries. In our model, the adversary can intercept model updates transmitted from clients to the server, enabling it to construct its own estimate of the model. Unlike previous research, which predominantly focuses on safeguarding client data, our work shifts attention to protecting the client model itself. Through a theoretical analysis, we examine how various factors, such as the probability of client selection, the structure of local objective functions, global aggregation at the server, and the eavesdropper's capabilities, impact the overall level of protection. We further validate our findings through numerical experiments, assessing the protection by evaluating the model accuracy achieved by the adversary. Finally, we compare our results with methods based on differential privacy, underscoring their limitations in this specific context.
- [43] arXiv:2504.02171 (cross-list from math.OC) [pdf, html, other]
Title: On the threshold of excitable systems: An energy-based perspective
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Adaptation and Self-Organizing Systems (nlin.AO); Neurons and Cognition (q-bio.NC)
A fundamental characteristic of excitable systems is their ability to exhibit distinct subthreshold and suprathreshold behaviors. Precisely quantifying this distinction requires a proper definition of the threshold, which has remained elusive in neurodynamics. In this paper, we introduce a novel, energy-based threshold definition for excitable circuits grounded in dissipativity theory, specifically using the classical concept of required supply. According to our definition, the threshold corresponds to a local maximum of the required supply, clearly separating subthreshold passive responses from suprathreshold regenerative spikes. We illustrate and validate the proposed definition through analytical and numerical studies of three canonical systems: a simple RC circuit, the FitzHugh--Nagumo model, and the biophysically detailed Hodgkin--Huxley model.
- [44] arXiv:2504.02184 (cross-list from cs.RO) [pdf, html, other]
Title: Model Predictive Control with Visibility Graphs for Humanoid Path Planning and Tracking Against Adversarial Opponents
Comments: This is a preprint version. This paper has been accepted to IEEE International Conference on Robotics and Automation (ICRA) 2025. The final published version will be available on IEEE Xplore.
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
In this paper we detail the methods used for obstacle avoidance, path planning, and trajectory tracking that helped us win the adult-sized, autonomous humanoid soccer league in RoboCup 2024. Our team was undefeated for all seated matches and scored 45 goals over 6 games, winning the championship game 6 to 1. During the competition, a major challenge for collision avoidance was the measurement noise arising from bipedal locomotion and a limited field of view (FOV). Furthermore, obstacles would sporadically jump in and out of our planned trajectory, and at times our estimator would place our robot inside a hard constraint. Any planner in this competition must also be computationally efficient enough to re-plan and react in real time. This motivated our approach to trajectory generation and tracking. In many scenarios both long-term and short-term planning are needed. To efficiently find a long-term general path that avoids all obstacles, we developed DAVG (Dynamic Augmented Visibility Graphs). DAVG focuses on essential path planning by setting certain regions to be active based on obstacles and the desired goal pose. By augmenting the states in the graph, turning angles are considered, which is crucial for a large soccer-playing robot, as turning may be more costly. A trajectory is formed by linearly interpolating between discrete points generated by DAVG. A modified version of model predictive control (MPC), called cf-MPC (Collision-Free MPC), is then used to track this trajectory, providing short-term planning. Without having to switch formulations, cf-MPC takes into account the robot dynamics and collision-free constraints, so the control input can transition smoothly, without a hard switch, in cases where noise places the robot inside a constraint boundary. The nonlinear formulation runs at approximately 120 Hz, while the quadratic version achieves around 400 Hz.
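The step of forming a trajectory by linearly interpolating between discrete waypoints can be sketched as resampling the waypoint polyline at a constant speed, producing reference positions that a tracking controller such as MPC could follow. The waypoints, speed, and time step below are hypothetical; the sketch omits DAVG's graph search and turning-angle augmentation as well as the cf-MPC tracking layer, and it assumes no repeated consecutive waypoints.

```python
import numpy as np


def interpolate_path(waypoints, speed, dt):
    """Resample a 2D waypoint polyline at constant speed.

    Returns a (K, 2) array of reference positions spaced speed*dt apart
    along the path (arc length), ending exactly at the final waypoint.
    """
    wp = np.asarray(waypoints, dtype=float)
    seg_len = np.linalg.norm(np.diff(wp, axis=0), axis=1)
    s_knots = np.concatenate([[0.0], np.cumsum(seg_len)])  # arc length at each waypoint
    s = np.arange(0.0, s_knots[-1], speed * dt)
    s = np.append(s, s_knots[-1])                          # include the goal
    x = np.interp(s, s_knots, wp[:, 0])
    y = np.interp(s, s_knots, wp[:, 1])
    return np.column_stack([x, y])


# Hypothetical waypoints around a corner, 0.5 m/s reference speed, 0.1 s step.
ref = interpolate_path([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)], speed=0.5, dt=0.1)
```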
- [45] arXiv:2504.02214 (cross-list from cs.CV) [pdf, html, other]
Title: Geospatial Artificial Intelligence for Satellite-based Flood Extent Mapping: Concepts, Advances, and Future Perspectives
Comments: 10 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Geospatial Artificial Intelligence (GeoAI) for satellite-based flood extent mapping systematically integrates artificial intelligence techniques with satellite data to identify flood events and assess their impacts, for disaster management and spatial decision-making. The primary output often includes flood extent maps, which delineate the affected areas, along with additional analytical outputs such as uncertainty estimation and change detection.
- [46] arXiv:2504.02255 (cross-list from cs.RO) [pdf, html, other]
-
Title: Bipedal Robust Walking on Uneven Footholds: Piecewise Slope LIPM with Discrete Model Predictive Control
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
This study presents an enhanced theoretical formulation for bipedal hierarchical control frameworks under uneven terrain conditions. Specifically, owing to the inherent limitations of the Linear Inverted Pendulum Model (LIPM) in handling terrain elevation variations, we develop a Piecewise Slope LIPM (PS-LIPM). This innovative model enables dynamic adjustment of the Center of Mass (CoM) height to align with topographical undulations during single-step cycles. As a further contribution, we propose a generalized Angular Momentum-based LIPM (G-ALIP) for CoM velocity compensation using Centroidal Angular Momentum (CAM) regulation. Building upon these advancements, we derive the Divergent Component of Motion (DCM) step-to-step dynamics for a Model Predictive Control (MPC) formulation, enabling simultaneous optimization of step position and step duration. A hierarchical control framework integrating MPC with a Whole-Body Controller (WBC) is implemented for bipedal locomotion across uneven stepping stones. The results validate the efficacy of the proposed hierarchical control framework and the theoretical formulation.
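For reference, the standard flat-ground LIPM that this work extends has natural frequency omega = sqrt(g / z0), and its DCM xi = x + xdot / omega evolves as xi_dot = omega * (xi - p) for a fixed foot position p. The sketch below shows only this textbook step-to-step DCM update, not the paper's PS-LIPM or G-ALIP extensions; the function names are hypothetical:

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]

def lipm_omega(com_height):
    """Natural frequency of the (flat-ground) Linear Inverted Pendulum Model."""
    return math.sqrt(G / com_height)

def dcm(x, xdot, omega):
    """Divergent Component of Motion: xi = x + xdot / omega."""
    return x + xdot / omega

def dcm_step_to_step(xi0, foot_pos, step_duration, omega):
    """Propagate the DCM over one step with the foot (CoP) held at foot_pos.

    With xi_dot = omega * (xi - p) and p constant during the step,
    xi(T) = p + (xi(0) - p) * exp(omega * T).
    """
    return foot_pos + (xi0 - foot_pos) * math.exp(omega * step_duration)
```

The exponential term is why step position and step duration both enter the DCM dynamics, which is what allows an MPC to optimize them jointly.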
- [47] arXiv:2504.02302 (cross-list from cs.SD) [pdf, html, other]
-
Title: Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation
Comments: arXiv admin note: text overlap with arXiv:2411.03085
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Speech separation (SS) seeks to disentangle a multi-talker speech mixture into single-talker speech streams. Although SS can be generally achieved using offline methods, such a processing paradigm is not suitable for real-time streaming applications. Causal separation models, which rely only on past and present information, offer a promising solution for real-time streaming. However, these models typically suffer from notable performance degradation due to the absence of future context. In this paper, we introduce a novel frontend that is designed to mitigate the mismatch between training and run-time inference by implicitly incorporating future information into causal models through predictive patterns. The pretrained frontend employs a transformer decoder network with a causal convolutional encoder as the backbone and is pretrained in a self-supervised manner with two innovative pretext tasks: autoregressive hybrid prediction and contextual knowledge distillation. These tasks enable the model to capture predictive patterns directly from mixtures in a self-supervised manner. The pretrained frontend subsequently serves as a feature extractor to generate high-quality predictive patterns. Comprehensive evaluations on synthetic and real-world datasets validated the effectiveness of the proposed pretrained frontend.
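Causal models rely only on past and present samples. A minimal sketch of a causal 1-D convolution, the basic operation of a causal convolutional encoder like the one mentioned above, implemented by left-padding the input (a generic illustration with a hypothetical function name, not the paper's encoder):

```python
import numpy as np

def causal_conv1d(x, kernel):
    """1-D causal convolution: y[t] depends only on x[t - k + 1 .. t].

    The input is left-padded with k - 1 zeros so no future samples are used.
    """
    x = np.asarray(x, dtype=float)
    k = len(kernel)
    padded = np.concatenate((np.zeros(k - 1), x))
    # padded[t : t + k] holds x[t - k + 1 .. t]; reversing it aligns
    # kernel[j] with x[t - j], giving y[t] = sum_j kernel[j] * x[t - j].
    return np.array([
        np.dot(kernel, padded[t : t + k][::-1]) for t in range(len(x))
    ])
```

The result equals the first `len(x)` samples of `np.convolve(x, kernel)`, which is a quick way to check that no future samples leak into the output.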
- [48] arXiv:2504.02352 (cross-list from cs.IT) [pdf, other]
-
Title: Liquid Neural Networks: Next-Generation AI for Telecom from First Principles
Comments: 15 pages, 5 figures. Accepted by ZTE Communications
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Artificial intelligence (AI) has emerged as a transformative technology with immense potential to reshape the next generation of wireless networks. By leveraging advanced algorithms and machine learning techniques, AI offers unprecedented capabilities in optimizing network performance, enhancing data processing efficiency, and enabling smarter decision-making processes. However, existing AI solutions face significant challenges in terms of robustness and interpretability. Specifically, current AI models exhibit substantial performance degradation in dynamic environments with varying data distributions, and the black-box nature of these algorithms raises concerns regarding safety, transparency, and fairness. This presents a major challenge in integrating AI into practical communication systems. Recently, a novel type of neural network, known as liquid neural networks (LNNs), has been designed from first principles to address these issues. In this paper, we explore the potential of LNNs in telecommunications. First, we illustrate the mechanisms of LNNs and highlight their unique advantages over traditional networks. Then we unveil the opportunities that LNNs bring to future wireless networks. Furthermore, we discuss the challenges and design directions for the implementation of LNNs. Finally, we summarize the performance of LNNs in two case studies.
- [49] arXiv:2504.02362 (cross-list from cs.CV) [pdf, html, other]
-
Title: Brightness Perceiving for Recursive Low-Light Image Enhancement
Journal-ref: IEEE Transactions on Artificial Intelligence Vol 5, no. 6, 3034-3045 (2023)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Due to the wide dynamic range of real low-light scenes, captured images exhibit large differences in the degree of contrast degradation and detail blurring, making it difficult for existing end-to-end methods to enhance low-light images to normal exposure. To address this issue, we decompose low-light image enhancement into a recursive enhancement task and propose a brightness-perceiving-based recursive enhancement framework for high dynamic range low-light image enhancement. Specifically, our recursive enhancement framework consists of two parallel sub-networks: the Adaptive Contrast and Texture enhancement network (ACT-Net) and the Brightness Perception network (BP-Net). ACT-Net adaptively enhances image contrast and details under the guidance of a brightness adjustment branch and a gradient adjustment branch, which are designed to perceive the degree of contrast and detail degradation in low-light images. To adaptively enhance images captured under different brightness levels, BP-Net controls the number of recursive enhancement iterations of ACT-Net by exploring the brightness distribution properties of the image. Finally, to coordinate ACT-Net and BP-Net, we design a novel unsupervised training strategy to facilitate the training procedure. To further validate the effectiveness of the proposed method, we construct a new dataset with a broader brightness distribution by mixing three low-light datasets. Compared with eleven existing representative methods, the proposed method achieves new SOTA performance on six reference and no-reference metrics. Specifically, the proposed method improves the PSNR by 0.9 dB compared to the existing SOTA method.
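The headline gain is reported in PSNR, defined as 10 * log10(MAX^2 / MSE). A minimal sketch, assuming 8-bit images (MAX = 255), that also shows what a 0.9 dB improvement means in terms of mean squared error:

```python
import numpy as np

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the MSE, a 0.9 dB gain corresponds to the MSE shrinking by a factor of 10^(-0.09), i.e. roughly 19% lower error.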
- [50] arXiv:2504.02375 (cross-list from math.OC) [pdf, other]
-
Title: A Comparative Study of MINLP and MPVC Formulations for Solving Complex Nonlinear Decision-Making Problems in Aerospace Applications
Comments: Submitted to Optimal Control Applications and Methods (OCAM)
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
High-level decision-making for dynamical systems often involves performance and safety specifications that are activated or deactivated depending on conditions related to the system state and commands. Such decision-making problems can be naturally formulated as optimization problems where these conditional activations are regulated by discrete variables. However, solving these problems can be challenging numerically, even on powerful computing platforms, especially when the dynamics are nonlinear. In this work, we consider decision-making for nonlinear systems where certain constraints, as well as possible terms in the cost function, are activated or deactivated depending on the system state and commands. We show that these problems can be formulated either as mixed-integer nonlinear programs (MINLPs) or as mathematical programs with vanishing constraints (MPVCs), where the former formulation involves discrete decision variables, whereas the latter relies on continuous variables subject to structured nonconvex constraints. We discuss the different solution methods available for both formulations and demonstrate them on optimal trajectory planning problems in various aerospace applications. Finally, we compare the strengths and weaknesses of the MINLP and MPVC approaches through a focused case study on powered descent guidance with divert-feasible regions.
- [51] arXiv:2504.02386 (cross-list from cs.CV) [pdf, html, other]
-
Title: VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
We present VoiceCraft-Dub, a novel approach for automated video dubbing that synthesizes high-quality speech from text and facial cues. This task has broad applications in filmmaking, multimedia creation, and assisting voice-impaired individuals. Building on the success of Neural Codec Language Models (NCLMs) for speech synthesis, our method extends their capabilities by incorporating video features, ensuring that synthesized speech is time-synchronized and expressively aligned with facial movements while preserving natural prosody. To inject visual cues, we design adapters to align facial features with the NCLM token space and introduce audio-visual fusion layers to merge audio-visual information within the NCLM framework. Additionally, we curate CelebV-Dub, a new dataset of expressive, real-world videos specifically designed for automated video dubbing. Extensive experiments show that our model achieves high-quality, intelligible, and natural speech synthesis with accurate lip synchronization, outperforming existing methods in human perception and performing favorably in objective evaluations. We also adapt VoiceCraft-Dub for the video-to-speech task, demonstrating its versatility for various applications.
- [52] arXiv:2504.02398 (cross-list from cs.CL) [pdf, html, other]
-
Title: Scaling Analysis of Interleaved Speech-Text Language Models
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Existing Speech Language Model (SLM) scaling analysis paints a bleak picture: it predicts that SLMs require much more compute and data than text models, leading some to question the feasibility of training high-quality SLMs. However, modern SLMs are often initialised from pre-trained TextLMs using speech-text interleaving to allow knowledge transfer. This raises the question: do interleaved SLMs scale more efficiently than textless SLMs? In this paper we answer with a resounding yes! We conduct a scaling analysis of interleaved SLMs by training several dozen models and analysing the scaling trends. We see that under this setup SLMs scale more efficiently with compute. Additionally, our results indicate that the scaling dynamics are significantly different from those of textless SLMs, suggesting one should allocate notably more of the compute budget to increasing model size rather than training tokens. We also study the role of synthetic data and TextLM model families in unlocking this potential. Results suggest that our scaled-up model achieves performance comparable to leading models on speech semantic metrics while using less compute and data than other approaches. We open source models, samples, and data - this https URL.
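Scaling trends like these are commonly summarized by a power law such as L(C) ~ a * C^(-b) in compute C. A minimal sketch of fitting such a law by least squares in log-log space, assuming a pure power law without an irreducible-loss term (the paper's exact functional form is not stated in the abstract, and the test data are synthetic):

```python
import numpy as np

def fit_power_law(compute, loss):
    """Fit loss ~ a * compute**(-b) by linear least squares in log-log space.

    log(loss) = log(a) - b * log(compute), so the fitted slope is -b.
    Returns (a, b).
    """
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
    return np.exp(intercept), -slope
```

Comparing the fitted exponents of two model families (for example interleaved versus textless SLMs) is one simple way to quantify which one "scales more efficiently with compute".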
- [53] arXiv:2504.02402 (cross-list from cs.SD) [pdf, html, other]
-
Title: EvMic: Event-based Non-contact sound recovery from effective spatial-temporal modeling
Comments: Our project page: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
When sound waves hit an object, they induce vibrations that produce high-frequency and subtle visual changes, which can be used to recover the sound. Earlier studies have always faced trade-offs among sampling rate, bandwidth, field of view, and the simplicity of the optical path. Recent advances in event camera hardware show good potential for visual sound recovery because of their superior ability to capture high-frequency signals. However, existing event-based vibration recovery methods are still sub-optimal for sound recovery. In this work, we propose a novel pipeline for non-contact sound recovery that fully utilizes spatial-temporal information from the event stream. We first generate a large training set using a novel simulation pipeline. Then we design a network that leverages the sparsity of events to capture spatial information and uses Mamba to model long-term temporal information. Lastly, we train a spatial aggregation block to aggregate information from different locations to further improve signal quality. To capture event signals caused by sound waves, we also design an imaging system using a laser matrix to enhance the gradient, and we collect multiple data sequences for testing. Experimental results on synthetic and real-world data demonstrate the effectiveness of our method.
- [54] arXiv:2504.02407 (cross-list from cs.SD) [pdf, html, other]
-
Title: F5R-TTS: Improving Flow Matching based Text-to-Speech with Group Relative Policy Optimization
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
We present F5R-TTS, a novel text-to-speech (TTS) system that integrates Group Relative Policy Optimization (GRPO) into a flow-matching based architecture. By reformulating the deterministic outputs of flow-matching TTS into probabilistic Gaussian distributions, our approach enables seamless integration of reinforcement learning algorithms. During pretraining, we train a probabilistically reformulated flow-matching based model, derived from F5-TTS, on an open-source dataset. In the subsequent reinforcement learning (RL) phase, we employ a GRPO-driven enhancement stage that leverages dual reward metrics: word error rate (WER) computed via automatic speech recognition and speaker similarity (SIM) assessed by verification models. Experimental results on zero-shot voice cloning demonstrate that F5R-TTS achieves significant improvements in both speech intelligibility (a 29.5% relative WER reduction) and speaker similarity (a 4.6% relative SIM score increase) compared to conventional flow-matching based TTS systems. Audio samples are available at this https URL.
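GRPO, as commonly formulated, avoids a learned value function by normalizing each sample's reward against the other samples drawn for the same input. A minimal sketch of that group-relative advantage, assuming one scalar reward per synthesized utterance (how F5R-TTS combines its WER and SIM rewards is not specified in the abstract, so any reward used with this function, such as -WER, is a hypothetical choice):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one group of samples from the same input.

    Each reward is centred on the group mean and scaled by the group
    standard deviation; eps avoids division by zero when all rewards
    in the group are equal (the advantages are then all zero).
    """
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)
```

For example, with hypothetical per-utterance WERs of 0.12, 0.05, 0.30 and 0.08 in one group, a reward of -WER gives the 0.05 utterance the largest positive advantage and the 0.30 utterance the most negative one.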
- [55] arXiv:2504.02420 (cross-list from cs.RO) [pdf, html, other]
-
Title: On learning racing policies with reinforcement learning
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Fully autonomous vehicles promise enhanced safety and efficiency. However, ensuring reliable operation in challenging corner cases requires control algorithms capable of performing at the vehicle limits. We address this requirement by considering the task of autonomous racing and propose solving it by learning a racing policy using Reinforcement Learning (RL). Our approach leverages domain randomization, actuator dynamics modeling, and policy architecture design to enable reliable and safe zero-shot deployment on a real platform. Evaluated on the F1TENTH race car, our RL policy not only surpasses a state-of-the-art Model Predictive Control (MPC), but, to the best of our knowledge, also represents the first instance of an RL policy outperforming expert human drivers in RC racing. This work identifies the key factors driving this performance improvement, providing critical insights for the design of robust RL-based control strategies for autonomous vehicles.
- [56] arXiv:2504.02479 (cross-list from cs.LG) [pdf, html, other]
-
Title: Hierarchical Policy-Gradient Reinforcement Learning for Multi-Agent Shepherding Control of Non-Cohesive Targets
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Systems and Control (eess.SY); Machine Learning (stat.ML)
We propose a decentralized reinforcement learning solution for multi-agent shepherding of non-cohesive targets using policy-gradient methods. Our architecture integrates target-selection with target-driving through Proximal Policy Optimization, overcoming discrete-action constraints of previous Deep Q-Network approaches and enabling smoother agent trajectories. This model-free framework effectively solves the shepherding problem without prior dynamics knowledge. Experiments demonstrate our method's effectiveness and scalability with increased target numbers and limited sensing capabilities.
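The method builds on Proximal Policy Optimization. For reference, a minimal sketch of PPO's standard clipped surrogate objective (generic PPO, not the paper's hierarchical target-selection and target-driving architecture; `clip_eps=0.2` is a common default, assumed here):

```python
import numpy as np

def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities;
    the objective is mean(min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)).
    """
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return float(np.mean(np.minimum(unclipped, clipped)))
```

Clipping the probability ratio to [1 - eps, 1 + eps] removes the incentive to move the policy far from the one that collected the data within a single update, which is what makes PPO practical with continuous actions where Deep Q-Networks would require a discretized action set.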
- [57] arXiv:2504.02494 (cross-list from cs.CV) [pdf, other]
-
Title: Semiconductor Wafer Map Defect Classification with Tiny Vision Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Semiconductor wafer defect classification is critical for ensuring high precision and yield in manufacturing. Traditional CNN-based models often struggle with class imbalance and with recognizing multiple overlapping defect types in wafer maps. To address these challenges, we propose ViT-Tiny, a lightweight Vision Transformer (ViT) framework optimized for wafer defect classification. Trained on the WM-38k dataset, ViT-Tiny outperforms its ViT-Base counterpart and state-of-the-art (SOTA) models, such as MSF-Trans and CNN-based architectures. Through extensive ablation studies, we determine that a patch size of 16 provides optimal performance. ViT-Tiny achieves an F1-score of 98.4%, surpassing MSF-Trans by 2.94% in four-defect classification, improving recall by 2.86% in two-defect classification, and increasing precision by 3.13% in three-defect classification. Additionally, it demonstrates enhanced robustness under limited labeled data conditions, making it a computationally efficient and reliable solution for real-world semiconductor defect detection.
- [58] arXiv:2504.02561 (cross-list from cs.NI) [pdf, html, other]
-
Title: Digital Twins for Internet of Battlespace Things (IoBT) Coalitions
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
This paper presents a new framework for integrating Digital Twins (DTs) within Internet of Battlespace Things (IoBT) coalitions. We introduce a novel three-tier architecture that enables efficient coordination and management of DT models across coalition partners while addressing key challenges in interoperability, security, and resource allocation. The architecture comprises specialized controllers at each tier: Digital Twin Coalition Partner (DTCP) controllers managing individual coalition partners' DT resources, a central Digital Twin Coalition (DTC) controller orchestrating cross-partner coordination, and Digital Twin Coalition Mission (DTCM) controllers handling mission-specific DT interactions. We propose a hybrid approach for DT model placement across edge devices, tactical nodes, and cloud infrastructure, optimizing performance while maintaining security and accessibility. The architecture leverages software-defined networking principles for dynamic resource allocation and slice management, enabling efficient sharing of computational and network resources between DT operations and primary IoBT functions. Our proposed framework aims to provide a robust foundation for deploying and managing Digital Twins in coalition warfare, enhancing situational awareness, decision-making capabilities, and operational effectiveness while ensuring secure and interoperable operations across diverse coalition partners.
- [59] arXiv:2504.02586 (cross-list from cs.SD) [pdf, other]
-
Title: Deep learning for music generation. Four approaches and their comparative evaluation
Journal-ref: U.P.B. Scientific Bulletin, Series C, Vol. 85, Issue 4, 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
This paper introduces four different artificial intelligence algorithms for music generation and aims to compare these methods not only on the aesthetic quality of the generated music but also on their suitability for specific applications. The first set of melodies is produced by a slightly modified visual transformer neural network used as a language model. The second set is generated by combining chat sonification with a classic transformer neural network (a method of music generation presented in previous research), the third by combining Schillinger rhythm theory with a classic transformer neural network, and the fourth using the GPT3 transformer provided by OpenAI. A comparative analysis of the melodies generated by these approaches indicates significant differences between them: regarding aesthetic value, GPT3 produced the most pleasing melodies, and the newly introduced Schillinger method generated better-sounding music than previous sonification methods.
- [60] arXiv:2504.02604 (cross-list from cs.CL) [pdf, html, other]
-
Title: LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Developing Automatic Speech Recognition (ASR) systems for Tunisian Arabic Dialect is challenging due to the dialect's linguistic complexity and the scarcity of annotated speech datasets. To address these challenges, we propose the LinTO audio and textual datasets, comprehensive resources that capture phonological and lexical features of Tunisian Arabic Dialect. These datasets include a variety of texts from numerous sources and real-world audio samples featuring diverse speakers and code-switching between Tunisian Arabic Dialect and English or French. By providing high-quality audio paired with precise transcriptions, the LinTO audio and textual datasets aim to serve as quality material for building and benchmarking ASR systems for the Tunisian Arabic Dialect.
Keywords: Tunisian Arabic Dialect, Speech-to-Text, Low-Resource Languages, Audio Data Augmentation
- [61] arXiv:2504.02607 (cross-list from cs.LG) [pdf, html, other]
-
Title: Learning Geometrically-Informed Lyapunov Functions with Deep Diffeomorphic RBF Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
The practical deployment of learning-based autonomous systems would greatly benefit from tools that flexibly obtain safety guarantees in the form of certificate functions from data. While the geometrical properties of such certificate functions are well understood, synthesizing them using machine learning techniques remains a challenge. To mitigate this issue, we propose a diffeomorphic function learning framework where prior structural knowledge of the desired output is encoded in the geometry of a simple surrogate function, which is subsequently augmented through an expressive, topology-preserving state-space transformation. Thereby, we achieve an indirect function approximation framework that is guaranteed to remain in the desired hypothesis space. To this end, we introduce a novel approach to construct diffeomorphic maps based on RBF networks, which facilitate precise, local transformations around data. Finally, we demonstrate our approach by learning diffeomorphic Lyapunov functions from real-world data and applying our method to different attractor systems.
- [62] arXiv:2504.02627 (cross-list from stat.CO) [pdf, html, other]
-
Title: Incorporating the ChEES Criterion into Sequential Monte Carlo Samplers
Comments: 16 pages, 9 figures
Subjects: Computation (stat.CO); Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
Markov chain Monte Carlo (MCMC) methods are a powerful but computationally expensive way of performing non-parametric Bayesian inference. MCMC proposals that utilise gradients, such as Hamiltonian Monte Carlo (HMC), can better explore the parameter space of interest if the additional hyper-parameters are chosen well. The No-U-Turn Sampler (NUTS) is a variant of HMC that is extremely effective at selecting these hyper-parameters but is slow to run and is not suited to GPU architectures. An alternative to NUTS, Change in the Estimator of the Expected Square HMC (ChEES-HMC), was shown not only to run faster than NUTS on GPU but also to sample from posteriors more efficiently. Sequential Monte Carlo (SMC) samplers are another sampling method, which instead output weighted samples from the posterior. They are very amenable to parallelisation, and therefore to being run on GPUs, while offering more flexibility than MCMC in the choice of proposal. We incorporate ChEES-HMC as a proposal into SMC samplers and demonstrate performance competitive with NUTS, while running faster, on a number of tasks.
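SMC samplers carry weighted particles and resample when the weights degenerate. A minimal sketch of that bookkeeping (normalizing log-weights, computing the effective sample size, and systematic resampling), independent of the ChEES-HMC proposal used in the paper; the function names are illustrative:

```python
import numpy as np

def normalize_log_weights(log_w):
    """Convert unnormalized log-weights to normalized weights stably."""
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())  # subtract the max to avoid overflow
    return w / w.sum()

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights (between 1 and N)."""
    return 1.0 / np.sum(np.square(weights))

def systematic_resample(weights, rng):
    """Return N particle indices drawn by systematic resampling.

    One uniform offset u is shared by the N evenly spaced positions
    (u + k) / N; particle i is selected once for each position that falls
    in its cumulative-weight interval [c_{i-1}, c_i).
    """
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumsum = np.cumsum(weights)
    cumsum[-1] = 1.0  # guard against round-off in the last bin
    return np.searchsorted(cumsum, positions, side="right")
```

A common heuristic, assumed here rather than taken from the paper, is to resample whenever the ESS drops below half the number of particles. Because every step is an array operation over particles, this bookkeeping maps naturally onto GPUs.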
- [63] arXiv:2504.02688 (cross-list from cs.NI) [pdf, html, other]
-
Title: Handover and SINR-Aware Path Optimization in 5G-UAV mmWave Communication using DRL
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Signal Processing (eess.SP)
Path planning and optimization for unmanned aerial vehicle (UAV)-assisted next-generation wireless networks is critical for mobility management and for ensuring UAV safety and ubiquitous connectivity, especially in dense urban environments with street canyons and tall buildings. Traditional statistical and model-based techniques have been successfully used for path optimization in communication networks. However, when dynamic channel propagation characteristics such as line-of-sight (LOS), interference, handover, and signal-to-interference-plus-noise ratio (SINR) are included in path optimization, statistical and model-based path planning solutions become obsolete, since they cannot adapt to dynamic and time-varying wireless channels, especially in the mmWave bands. In this paper, we propose a novel model-free actor-critic deep reinforcement learning (AC-DRL) framework for path optimization in UAV-assisted 5G mmWave wireless networks, which combines four important aspects of UAV communication: flight time, handover, connectivity, and SINR. We train an AC-DRL agent that enables a UAV connected to a gNB to determine the optimal path to a desired destination in the shortest possible time with minimal gNB handover, while maintaining connectivity and the highest possible SINR. We train our model with data from a powerful ray tracing tool called Wireless InSite, which uses 3D images of the propagation environment and provides data that closely resembles the real propagation environment. The simulation results show that our system has superior performance in tracking high SINR compared to other selected RL algorithms.
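SINR is the ratio of the serving-cell signal power to the sum of interference and noise power, computed in linear units. A minimal sketch, assuming received powers given in dBm; the power values in the example are hypothetical, and the paper's reward design is not reproduced:

```python
import math

def dbm_to_mw(p_dbm):
    """Convert power from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)

def sinr_db(signal_dbm, interferers_dbm, noise_dbm):
    """SINR = S / (sum(I) + N), computed in linear units, returned in dB."""
    s = dbm_to_mw(signal_dbm)
    i = sum(dbm_to_mw(p) for p in interferers_dbm)
    n = dbm_to_mw(noise_dbm)
    return 10.0 * math.log10(s / (i + n))
```

For example, a -70 dBm serving signal over a -90 dBm noise floor gives 20 dB; adding a single -90 dBm interferer doubles the denominator and lowers the SINR by about 3 dB. Powers must be summed in linear units, not in dB.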
- [64] arXiv:2504.02697 (cross-list from cs.CV) [pdf, html, other]
-
Title: Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation
Comments: CVPR 2025, project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Atmospheric turbulence is a major source of image degradation in long-range imaging systems. Although numerous deep learning-based turbulence mitigation (TM) methods have been proposed, many are slow, memory-hungry, and do not generalize well. In the spatial domain, methods based on convolutional operators have a limited receptive field, so they cannot handle a large spatial dependency required by turbulence. In the temporal domain, methods relying on self-attention can, in theory, leverage the lucky effects of turbulence, but their quadratic complexity makes it difficult to scale to many frames. Traditional recurrent aggregation methods face parallelization challenges.
In this paper, we present a new TM method based on two concepts: (1) A turbulence mitigation network based on the Selective State Space Model (MambaTM). MambaTM provides a global receptive field in each layer across spatial and temporal dimensions while maintaining linear computational complexity. (2) Learned Latent Phase Distortion (LPD). LPD guides the state space model. Unlike classical Zernike-based representations of phase distortion, the new LPD map uniquely captures the actual effects of turbulence, significantly improving the model's capability to estimate degradation by reducing the ill-posedness. Our proposed method exceeds current state-of-the-art networks on various synthetic and real-world TM benchmarks with significantly faster inference speed. The code is available at this http URL.
- [65] arXiv:2504.02712 (cross-list from cs.IT) [pdf, other]
-
Title: TeleMoM: Consensus-Driven Telecom Intelligence via Mixture of Models
Authors: Xinquan Wang, Fenghao Zhu, Chongwen Huang, Zhaohui Yang, Zhaoyang Zhang, Sami Muhaidat, Chau Yuen, Mérouane Debbah
Comments: 6 pages; submitted to 2025 IEEE VTC Fall
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Large language models (LLMs) face significant challenges in specialized domains like telecommunication (Telecom) due to technical complexity, specialized terminology, and rapidly evolving knowledge. Traditional methods, such as scaling model parameters or retraining on domain-specific corpora, are computationally expensive and yield diminishing returns, while existing approaches like retrieval-augmented generation, mixture of experts, and fine-tuning struggle with accuracy, efficiency, and coordination. To address this issue, we propose Telecom mixture of models (TeleMoM), a consensus-driven ensemble framework that integrates multiple LLMs for enhanced decision-making in Telecom. TeleMoM employs a two-stage process: proponent models generate justified responses, and an adjudicator finalizes decisions, supported by a quality-checking mechanism. This approach leverages the strengths of diverse models to improve accuracy, reduce biases, and handle domain-specific complexities effectively. Evaluation results demonstrate that TeleMoM achieves a 9.7% increase in answer accuracy, highlighting its effectiveness in Telecom applications.
- [66] arXiv:2504.02781 (cross-list from cs.LG) [pdf, html, other]
-
Title: Towards Green AI-Native Networks: Evaluation of Neural Circuit Policy for Estimating Energy Consumption of Base Stations
Comments: 15 pages, 9 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Signal Processing (eess.SP)
Optimization of radio hardware and AI-based network management software yield significant energy savings in radio access networks. The execution of underlying Machine Learning (ML) models, which enable energy savings through recommended actions, may require additional compute and energy, highlighting the opportunity to explore and adopt accurate and energy-efficient ML technologies. This work evaluates the novel use of sparsely structured Neural Circuit Policies (NCPs) in a use case to estimate the energy consumption of base stations. Sparsity in ML models yields reduced memory, computation and energy demand, hence facilitating a low-cost and scalable solution. We also evaluate the generalization capability of NCPs in comparison to traditional and widely used ML models such as Long Short Term Memory (LSTM), via quantifying their sensitivity to varying model hyper-parameters (HPs). NCPs demonstrated a clear reduction in computational overhead and energy consumption. Moreover, results indicated that the NCPs are robust to varying HPs such as number of epochs and neurons in each layer, making them a suitable option to ease model management and to reduce energy consumption in Machine Learning Operations (MLOps) in telecommunications.
- [67] arXiv:2504.02819 (cross-list from cs.CV) [pdf, html, other]
-
Title: GMR-Conv: An Efficient Rotation and Reflection Equivariant Convolution Kernel Using Gaussian Mixture Rings
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
Symmetry, where certain features remain invariant under geometric transformations, can often serve as a powerful prior in designing convolutional neural networks (CNNs). While conventional CNNs inherently support translational equivariance, extending this property to rotation and reflection has proven challenging, often forcing a compromise between equivariance, efficiency, and information loss. In this work, we introduce Gaussian Mixture Ring Convolution (GMR-Conv), an efficient convolution kernel that smooths radial symmetry using a mixture of Gaussian-weighted rings. This design mitigates the discretization errors of circular kernels, thereby preserving robust rotation and reflection equivariance without incurring computational overhead. We further optimize both the space and speed efficiency of GMR-Conv via a novel parameterization and computation strategy, allowing larger kernels at an acceptable cost. Extensive experiments on eight classification datasets and one segmentation dataset demonstrate that GMR-Conv not only matches the performance of conventional CNNs but can also surpass it in applications with orientation-less data. GMR-Conv is also shown to be more robust and efficient than state-of-the-art equivariant learning methods. Our work provides inspiring empirical evidence that carefully applied radial symmetry can alleviate the challenges of information loss, marking a promising advance in equivariant network architectures. The code is available at this https URL.
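A minimal sketch of the idea of building a radially symmetric kernel from Gaussian-weighted rings, assuming K(r) = sum_i w_i * exp(-(r - mu_i)^2 / (2 sigma^2)) with r the distance from the kernel centre. The ring radii, weights, sigma, and the function name are hypothetical, and the paper's actual parameterization and computation strategy may differ:

```python
import numpy as np

def gaussian_ring_kernel(size, ring_radii, ring_weights, sigma):
    """Radially symmetric 2-D kernel built from Gaussian-weighted rings.

    K(x, y) = sum_i w_i * exp(-(r - mu_i)^2 / (2 * sigma^2)),
    where r = sqrt(x^2 + y^2) is measured from the kernel centre.
    Smooth Gaussian rings avoid the hard edges of sharply discretized circles.
    """
    c = (size - 1) / 2.0
    yy, xx = np.mgrid[0:size, 0:size]
    r = np.hypot(xx - c, yy - c)
    kernel = np.zeros((size, size))
    for mu, w in zip(ring_radii, ring_weights):
        kernel += w * np.exp(-((r - mu) ** 2) / (2.0 * sigma ** 2))
    return kernel
```

Because every entry depends only on r, the kernel is unchanged by 90-degree rotations and reflections of the pixel grid, so correlating with it commutes with those transformations of the input; for arbitrary angles the equivariance is only approximate on a discrete grid.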
- [68] arXiv:2504.02823 (cross-list from cs.CV) [pdf, html, other]
-
Title: STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
Authors: Divya Velayudhan, Abdelfatah Ahmed, Mohamad Alansari, Neha Gour, Abderaouf Behouch, Taimur Hassan, Syed Talal Wasim, Nabil Maalej, Muzammal Naseer, Juergen Gall, Mohammed Bennamoun, Ernesto Damiani, Naoufel Werghi
Comments: Accepted at CVPR 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Advancements in Computer-Aided Screening (CAS) systems are essential for improving the detection of security threats in X-ray baggage scans. However, current datasets are limited in representing real-world, sophisticated threats and concealment tactics, and existing approaches are constrained by a closed-set paradigm with predefined labels. To address these challenges, we introduce STCray, the first multimodal X-ray baggage security dataset, comprising 46,642 image-caption paired scans across 21 threat categories, generated using an X-ray scanner for airport security. STCray is meticulously developed with our specialized protocol that ensures domain-aware, coherent captions, which yield multi-modal instruction-following data for X-ray baggage security. This allows us to train a domain-aware visual AI assistant named STING-BEE that supports a range of vision-language tasks, including scene comprehension, referring threat localization, visual grounding, and visual question answering (VQA), establishing novel baselines for multi-modal learning in X-ray baggage security. Further, STING-BEE shows state-of-the-art generalization in cross-domain settings. Code, data, and models are available at this https URL.
Cross submissions (showing 33 of 33 entries)
- [69] arXiv:2012.12689 (replaced) [pdf, html, other]
-
Title: The Less Intelligent the Elements, the More Intelligent the Whole. Or, Possibly Not?
Comments: 30 pages, 3 figures, 3 tables
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Adaptation and Self-Organizing Systems (nlin.AO)
We approach the debate on how "intelligent" artificial agents should be by endowing the preys and predators of the Lotka-Volterra model with behavioural algorithms characterized by different levels of sophistication. We find that by endowing both preys and predators with the capability of making predictions based on linear extrapolation, a novel sort of dynamic equilibrium appears, where both species co-exist while both populations grow indefinitely. While we confirm that, in general, simple agents favour the emergence of complex collective behaviour, we also suggest that the capability of individuals to take first-order derivatives of one another's behaviour may allow the collective computation of derivatives of any order.
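For orientation, the classical continuous Lotka-Volterra model that the agent-based study builds on, together with a first-order (linear extrapolation) predictor, can be sketched as below. This is an assumption-laden illustration: the parameter values are hypothetical, and the paper's agent behaviours are not reproduced. The quantity H is conserved along exact trajectories of the classical model, which makes it a convenient check on the integrator.

```python
import math

def lv_rhs(x, y, alpha, beta, delta, gamma):
    # classical predator-prey equations: prey x, predator y
    return alpha * x - beta * x * y, delta * x * y - gamma * y

def rk4_step(x, y, dt, p):
    k1 = lv_rhs(x, y, *p)
    k2 = lv_rhs(x + dt / 2 * k1[0], y + dt / 2 * k1[1], *p)
    k3 = lv_rhs(x + dt / 2 * k2[0], y + dt / 2 * k2[1], *p)
    k4 = lv_rhs(x + dt * k3[0], y + dt * k3[1], *p)
    x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x, y

def invariant(x, y, alpha, beta, delta, gamma):
    # H = delta*x - gamma*ln x + beta*y - alpha*ln y is constant on exact trajectories
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)

def simulate(x0, y0, dt, steps, p):
    traj = [(x0, y0)]
    for _ in range(steps):
        traj.append(rk4_step(*traj[-1], dt, p))
    return traj

def extrapolate(prev, curr):
    # first-order (linear) prediction of the next value from the last two observations
    return curr + (curr - prev)
```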
- [70] arXiv:2311.02003 (replaced) [pdf, html, other]
-
Title: Efficient Model-Based Deep Learning via Network Pruning and Fine-Tuning
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Model-based deep learning (MBDL) is a powerful methodology for designing deep models to solve imaging inverse problems. MBDL networks can be seen as iterative algorithms that estimate the desired image using a physical measurement model and a learned image prior specified using a convolutional neural network (CNN). The iterative nature of MBDL networks increases the test-time computational complexity, which limits their applicability in certain large-scale applications. Here we make two contributions to address this issue: First, we show how structured pruning can be adopted to reduce the number of parameters in MBDL networks. Second, we present three methods to fine-tune the pruned MBDL networks to mitigate potential performance loss. Each fine-tuning strategy has a unique benefit that depends on the presence of a pre-trained model and a high-quality ground truth. We show that our pruning and fine-tuning approach can accelerate image reconstruction using popular deep equilibrium learning (DEQ) and deep unfolding (DU) methods by 50% and 32%, respectively, with nearly no performance loss. This work thus offers a step forward for solving inverse problems by showing the potential of pruning to improve the scalability of MBDL. Code is available at this https URL.
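Structured pruning removes whole channels rather than individual weights, so the pruned layer stays dense and actually runs faster. A minimal sketch, assuming an L1-norm ranking of output filters (a common criterion, not necessarily the one used in the paper); the filter values are hypothetical.

```python
def l1_norm(filt):
    return sum(abs(w) for w in filt)

def prune_channels(filters, keep_ratio):
    """Keep the output channels (filters) with the largest L1 norms.

    Returns the kept filters and their original indices; the next layer must
    drop the matching input channels to stay shape-consistent.
    """
    n_keep = max(1, round(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]), reverse=True)
    kept = sorted(ranked[:n_keep])  # preserve the original channel order
    return [filters[i] for i in kept], kept

# four hypothetical flattened filters
filters = [[0.1, -0.1], [1.0, 0.5], [0.0, 0.05], [-2.0, 0.1]]
pruned, kept = prune_channels(filters, 0.5)
```

After pruning, a fine-tuning pass (one of the strategies the abstract mentions) would normally recover any lost accuracy.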
- [71] arXiv:2311.08548 (replaced) [pdf, html, other]
-
Title: Topology of surface electromyogram signals: hand gesture decoding on Riemannian manifolds
Journal-ref: 2024 J. Neural Eng. 21 036047
Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Quantitative Methods (q-bio.QM)
Objective. In this article, we present data and methods for decoding hand gestures using surface electromyogram (EMG) signals. EMG-based upper limb interfaces are valuable for amputee rehabilitation, artificial supernumerary limb augmentation, gestural control of computers, and virtual and augmented reality applications. Approach. To achieve this, we collect EMG signals from the upper limb using surface electrodes placed at key muscle sites involved in hand movements. Additionally, we design and evaluate efficient models for decoding EMG signals. Main results. Our findings reveal that the manifold of symmetric positive definite (SPD) matrices serves as an effective embedding space for EMG signals. Moreover, for the first time, we quantify the distribution shift of these signals across individuals. Significance. Overall, our approach demonstrates significant potential for developing efficient and interpretable methods for decoding EMG signals. This is particularly important as we move toward the broader adoption of EMG-based wrist interfaces.
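Embedding a multichannel EMG window as an SPD matrix usually means taking its channel covariance. A minimal sketch, assuming a small diagonal loading to guarantee strict positive definiteness and the log-Euclidean distance for comparing matrices; the paper's decoders may use a different Riemannian metric, and the signals here are synthetic.

```python
import numpy as np

def spd_covariance(emg, shrinkage=1e-3):
    """emg: array of shape (channels, samples). Returns a channels x channels SPD matrix."""
    x = emg - emg.mean(axis=1, keepdims=True)
    cov = x @ x.T / (x.shape[1] - 1)
    # diagonal loading keeps the matrix strictly positive definite
    n = cov.shape[0]
    return cov + shrinkage * np.trace(cov) / n * np.eye(n)

def spd_log(c):
    # matrix logarithm of an SPD matrix via its eigendecomposition
    w, v = np.linalg.eigh(c)
    return (v * np.log(w)) @ v.T

def log_euclidean_distance(a, b):
    # distance between SPD matrices under the log-Euclidean metric
    return np.linalg.norm(spd_log(a) - spd_log(b), ord="fro")
```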
- [72] arXiv:2408.10522 (replaced) [pdf, html, other]
-
Title: On the Security of Directional Modulation via Time Modulated Arrays Using OFDM Waveforms
Comments: IEEE Trans. on Wireless Communications, minor revision
Subjects: Signal Processing (eess.SP)
Time-modulated arrays (TMAs) transmitting information-bearing orthogonal frequency division multiplexing (OFDM) signals can achieve directional modulation. By turning its antennas on and off in a periodic fashion, the TMA can be configured to transmit the OFDM signal undistorted in the direction of a legitimate receiver and scrambled everywhere else. This capability has been proposed as a means of securing the transmitted information from unauthorized users. In this paper, we investigate how secure the TMA OFDM system is by looking at the transmitted signal from an eavesdropper's point of view. We demonstrate that the symbols observed by the eavesdropper across the OFDM subcarriers are linear combinations of the source symbols, with mixing coefficients that are unknown to the eavesdropper. We propose the use of independent component analysis (ICA) theory to obtain the mixing matrix and provide methods to resolve the column permutation and scaling ambiguities, which are inherent in the ICA problem, by leveraging the structure of the mixing matrix and assuming knowledge of the characteristics of the TMA OFDM system. In general, resolving the ambiguities and recovering the symbols requires long data records. Specifically, for constant-modulus symbols, we propose a modified ICA approach, namely constant-modulus ICA (CMICA), that provides a good estimate of the mixing matrix using a small number of received samples. We also propose countermeasures which the TMA could undertake in order to defend the scrambling. Simulation results are presented to demonstrate the effectiveness, efficiency and robustness of our scrambling-defying and scrambling-defending schemes.
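Why ICA leaves permutation and scaling ambiguities can be seen directly from the observation model x = A s: for any permutation P and invertible diagonal D, the pair A' = A P D and s' = D^{-1} P^T s produces exactly the same observations. The sketch below assumes a hypothetical 2x2 mixing matrix and QPSK symbols; with unit-modulus entries in D, the recovered symbols are still constant-modulus, so a constant-modulus criterion can fix the magnitude of the scaling but not its phase.

```python
import cmath
import math

def mix(a, s):
    # observation model seen by the eavesdropper: x = A s
    return [sum(a[i][j] * s[j] for j in range(len(s))) for i in range(len(a))]

def ambiguous_pair(a, s, perm, phases):
    """Return A' = A P D and s' = D^{-1} P^T s for a permutation `perm`
    and a unit-modulus diagonal D = diag(phases)."""
    n = len(s)
    a_prime = [[a[i][perm[j]] * phases[j] for j in range(n)] for i in range(len(a))]
    s_prime = [s[perm[j]] / phases[j] for j in range(n)]
    return a_prime, s_prime

# hypothetical mixing matrix and constant-modulus (QPSK) source symbols
A = [[1.0 + 0.5j, 0.3 - 0.2j], [-0.4 + 0.1j, 0.8 + 0.0j]]
S = [(1 + 1j) / math.sqrt(2), (1 - 1j) / math.sqrt(2)]
A2, S2 = ambiguous_pair(A, S, [1, 0], [cmath.exp(0.7j), cmath.exp(-1.9j)])
```

Resolving these ambiguities therefore needs side information, which is what the abstract attributes to the known structure of the TMA mixing matrix.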
- [73] arXiv:2409.02832 (replaced) [pdf, html, other]
-
Title: Diffraction Aided Wireless Positioning
Journal-ref: IEEE Transactions on Wireless Communications, 2025
Subjects: Signal Processing (eess.SP)
Wireless positioning in Non-Line-of-Sight (NLoS) scenarios presents significant challenges due to multipath effects that lead to biased measurements and reduced positioning accuracy. This paper revisits electromagnetic field theory related to diffraction in the context of wireless positioning and proposes a novel positioning technique that greatly improves accuracy in NLoS environments dominated by diffraction. The method is applied to a critical public safety use case: precisely locating at-risk individuals within buildings, with a particular focus on improving 3D positioning and z-axis accuracy. By leveraging the Geometrical Theory of Diffraction (GTD), the approach introduces an innovative NLoS path length model and a new NLoS positioning technique. Using Fisher information analysis, we establish the conditions required for 3D positioning and derive lower bounds on positioning performance for both 3D and z-axis estimates for the proposed NLoS positioning technique. Additionally, we propose an algorithmic implementation of the proposed NLoS positioning method using non-linear least squares estimation, which we term D-NLS. The positioning performance of our proposed NLoS positioning technique is validated using extensive ray-tracing simulations. The numerical results highlight the superiority of our approach in outdoor-to-indoor environments: it directly estimates NLoS path lengths and delivers significant performance enhancements over existing methods for both 3D and z-axis positioning scenarios.
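Non-linear least squares positioning from path lengths can be sketched with a plain Gauss-Newton iteration. This is a deliberately simplified, hypothetical model, not the paper's GTD path model or its D-NLS algorithm: each measured NLoS length is assumed to be the distance from the target to a known diffracting edge point plus a known edge-to-anchor distance, in 2D rather than 3D.

```python
import math

def nlos_lengths(x, edges, offsets):
    # hypothetical path model: target -> known diffracting edge -> anchor
    return [math.dist(x, e) + c for e, c in zip(edges, offsets)]

def gauss_newton(measured, edges, offsets, x0, iters=20):
    x = list(x0)
    for _ in range(iters):
        jtj = [[0.0, 0.0], [0.0, 0.0]]
        jtr = [0.0, 0.0]
        for e, c, m in zip(edges, offsets, measured):
            d = math.dist(x, e)
            r = d + c - m                                   # residual
            g = [(x[0] - e[0]) / d, (x[1] - e[1]) / d]      # gradient of |x - e|
            for i in range(2):
                jtr[i] += g[i] * r
                for j in range(2):
                    jtj[i][j] += g[i] * g[j]
        # solve the 2x2 normal equations (J^T J) dx = -J^T r
        det = jtj[0][0] * jtj[1][1] - jtj[0][1] * jtj[1][0]
        dx0 = (-jtr[0] * jtj[1][1] + jtr[1] * jtj[0][1]) / det
        dx1 = (-jtr[1] * jtj[0][0] + jtr[0] * jtj[1][0]) / det
        x = [x[0] + dx0, x[1] + dx1]
    return x

edges = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]   # hypothetical edge points
offsets = [5.0, 7.0, 3.0]                         # hypothetical edge-to-anchor lengths
measured = nlos_lengths((3.0, 4.0), edges, offsets)
estimate = gauss_newton(measured, edges, offsets, (5.0, 5.0))
```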
- [74] arXiv:2409.09092 (replaced) [pdf, html, other]
-
Title: Harnessing On-Machine Metrology Data for Prints with a Surrogate Model for Laser Powder Directed Energy Deposition
Comments: 19 pages, 9 figures
Subjects: Systems and Control (eess.SY); Materials Science (cond-mat.mtrl-sci)
In this study, we leverage the massive amount of multi-modal on-machine metrology data generated from Laser Powder Directed Energy Deposition (LP-DED) to construct a comprehensive surrogate model of the 3D printing process. By employing Dynamic Mode Decomposition with Control (DMDc), a data-driven technique, we capture the complex physics inherent in this extensive dataset. This physics-based surrogate model emphasizes thermodynamically significant quantities, enabling us to accurately predict key process outcomes. The model ingests 21 process parameters, including laser power, scan rate, and position, while providing outputs such as melt pool temperature, melt pool size, and other essential observables. Furthermore, it incorporates uncertainty quantification to provide bounds on these predictions, enhancing reliability and confidence in the results. We then deploy the surrogate model on a new, unseen part and monitor the printing process as validation of the method. Our experimental results demonstrate that the predictions align with actual measurements with high accuracy, confirming the effectiveness of our approach. This methodology not only facilitates real-time predictions but also operates at process-relevant speeds, establishing a basis for implementing feedback control in LP-DED.
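At its core, Dynamic Mode Decomposition with Control fits a linear model x_{k+1} ≈ A x_k + B u_k to snapshot data by least squares on the stacked matrix [X; U]. A minimal sketch of that step, assuming exact (untruncated) DMDc on synthetic data from a hypothetical 2-state, 1-input system; the paper's 21-parameter input set, any SVD truncation, and its uncertainty quantification are not reproduced.

```python
import numpy as np

def dmdc(x, u):
    """x: states of shape (n, m+1); u: inputs of shape (q, m).
    Returns (A, B) from the least-squares fit x[:, 1:] ≈ A x[:, :-1] + B u."""
    n = x.shape[0]
    omega = np.vstack([x[:, :-1], u])                 # stacked state/input snapshots
    g, *_ = np.linalg.lstsq(omega.T, x[:, 1:].T, rcond=None)
    g = g.T                                           # g = [A B]
    return g[:, :n], g[:, n:]

# synthetic data from a hypothetical linear system
rng = np.random.default_rng(1)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])
u = rng.standard_normal((1, 50))
x = np.zeros((2, 51))
x[:, 0] = rng.standard_normal(2)
for k in range(50):
    x[:, k + 1] = A_true @ x[:, k] + B_true @ u[:, k]
A_est, B_est = dmdc(x, u)
```

The least-squares fit only recovers (A, B) uniquely when the inputs excite the system enough for [X; U] to have full row rank, which is why random excitation is used here.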
- [75] arXiv:2409.18257 (replaced) [pdf, html, other]
-
Title: Developing a Dual-Stage Vision Transformer Model for Lung Disease Classification
Comments: 3 pages, 2 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Lung diseases have become a prevalent problem throughout the United States, affecting over 34 million people. Accurate and timely diagnosis of the different types of lung diseases is critical, and Artificial Intelligence (AI) methods could speed up this process. In this research, a dual-stage vision transformer is built by integrating a Vision Transformer (ViT) and a Swin Transformer to classify 14 different lung diseases from patients' X-ray scans. After data preprocessing and training, the proposed model achieved a label-level accuracy of 92.06% on an unseen test subset of the dataset. The model showed promise for accurately classifying lung diseases and diagnosing patients who suffer from them.
- [76] arXiv:2409.19939 (replaced) [pdf, html, other]
-
Title: A database of upper limb surface electromyogram signals from demographically diverse individuals
Authors: Harshavardhana T. Gowda, Neha Kaul, Carlos Carrasco, Marcus A. Battraw, Safa Amer, Saniya Kotwal, Selena Lam, Zachary McNaughton, Ferdous Rahimi, Sana Shehabi, Jonathon S. Schofield, Lee M. Miller
Journal-ref: Sci Data 12, 517 (2025)
Subjects: Signal Processing (eess.SP)
Upper limb based neuromuscular interfaces aim to provide a seamless way for humans to interact with technology. Among noninvasive interfaces, surface electromyogram (EMG) signals hold significant promise. However, their sensitivity to physiological and anatomical factors remains poorly understood, raising questions about how these factors influence gesture decoding across individuals or groups. To facilitate the study of signal distribution shifts across individuals or groups of individuals, we present a dataset of upper limb EMG signals and physiological measures from 91 demographically diverse adults. Participants were selected to represent a range of ages (18 to 92 years) and body mass indices (healthy, overweight, and obese). The dataset also includes measures such as skin hydration and elasticity, which may affect EMG signals. This dataset provides a basis to study demographic confounds in EMG signals and serves as a benchmark to test the development of fair and unbiased algorithms that enable accurate hand gesture decoding across demographically diverse subjects. Additionally, we validate the quality of the collected data using state-of-the-art gesture decoding techniques.
- [77] arXiv:2411.19258 (replaced) [pdf, html, other]
-
Title: L4acados: Learning-based models for acados, applied to Gaussian process-based predictive control
Authors: Amon Lahr, Joshua Näf, Kim P. Wabersich, Jonathan Frey, Pascal Siehl, Andrea Carron, Moritz Diehl, Melanie N. Zeilinger
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Incorporating learning-based models, such as artificial neural networks or Gaussian processes, into model predictive control (MPC) strategies can significantly improve control performance and online adaptation capabilities for real-world applications. Still, enabling state-of-the-art implementations of learning-based models for MPC is complicated by the challenge of interfacing machine learning frameworks with real-time optimal control software. This work aims at filling this gap by incorporating external sensitivities in sequential quadratic programming solvers for nonlinear optimal control. To this end, we provide L4acados, a general framework for incorporating Python-based residual models in the real-time optimal control software acados. By computing external sensitivities via a user-defined Python module, L4acados enables the implementation of MPC controllers with learning-based residual models in acados, while supporting parallelization of sensitivity computations when preparing the quadratic subproblems. We demonstrate significant speed-ups and superior scaling properties of L4acados compared to available software using a neural-network-based control example. Last, we provide an efficient and modular real-time implementation of Gaussian process-based MPC using L4acados, which is applied to two hardware examples: autonomous miniature racing, as well as motion control of a full-scale autonomous vehicle for an ISO lane change maneuver.
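The notion of "external sensitivities" can be illustrated generically: the linearization needed by an SQP solver is the analytic Jacobian of a nominal model plus a Jacobian of the learned residual computed outside the solver, and the per-stage computations are independent, so they can be parallelized. This sketch does not use the acados or L4acados APIs; every function name here is hypothetical, the residual is a stand-in for a learned model, and its Jacobian is taken by central finite differences.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def jacobian_fd(f, z, h=1e-6):
    """Central-difference Jacobian of f: R^n -> R^m at z (lists of floats)."""
    m = len(f(z))
    cols = []
    for j in range(len(z)):
        zp, zm = list(z), list(z)
        zp[j] += h
        zm[j] -= h
        fp, fm = f(zp), f(zm)
        cols.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    return [[cols[j][i] for j in range(len(z))] for i in range(m)]

def nominal_jacobian(z):
    # hypothetical nominal dynamics x_next = x + 0.1 u has a constant Jacobian
    return [[1.0, 0.1]]

def residual(z):
    # stand-in for a learned residual model (e.g. a neural network or GP mean)
    x, _ = z
    return [0.05 * math.sin(x)]

def linearize(z):
    # total sensitivity = analytic nominal part + externally computed residual part
    jn, jr = nominal_jacobian(z), jacobian_fd(residual, z)
    return [[a + b for a, b in zip(rn, rr)] for rn, rr in zip(jn, jr)]

def linearize_all(stages, workers=4):
    # stage-wise sensitivities are independent, so they can be computed in parallel
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(linearize, stages))
```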
- [78] arXiv:2412.04255 (replaced) [pdf, other]
-
Title: Model-Agnostic Meta-Learning for Fault Diagnosis of Induction Motors in Data-Scarce Environments with Varying Operating Conditions and Electric Drive Noise
Authors: Ali Pourghoraba, MohammadSadegh KhajueeZadeh, Ali Amini, Abolfazl Vahedi, Gholam Reza Agah, Akbar Rahideh
Subjects: Systems and Control (eess.SY)
Reliable mechanical fault detection with limited data is crucial for the effective operation of induction machines, particularly given the real-world challenges present in industrial datasets, such as significant imbalances between healthy and faulty samples and the scarcity of data representing faulty conditions. This research introduces an innovative meta-learning approach to address these issues, focusing on mechanical fault detection in induction motors across diverse operating conditions while mitigating the adverse effects of drive noise in scenarios with limited data. The process of identifying faults under varying operating conditions is framed as a few-shot classification challenge and approached through a model-agnostic meta-learning strategy. Specifically, this approach begins with training a meta-learner across multiple interconnected fault-diagnosis tasks conducted under different operating conditions. In this stage, cross-entropy is utilized to optimize parameters and develop a robust representation of the tasks. Subsequently, the parameters of the meta-learner are fine-tuned for new tasks, enabling rapid adaptation using only a small number of samples. This method achieves excellent accuracy in fault detection across various conditions, even when data availability is restricted. The findings indicate that the proposed model outperforms other sophisticated techniques, providing enhanced generalization and quicker adaptation. The accuracy of fault diagnosis reaches a minimum of 99%, underscoring the model's effectiveness for reliable fault identification.
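The two MAML stages described above (an inner adaptation step per task, then an outer update of the shared initialization through that step) can be shown on a toy problem. This sketch swaps in a hypothetical scalar regression task family y = a x with a linear model y_hat = w x and mean-squared error, instead of the paper's cross-entropy fault classifier, so that the meta-gradient can be written exactly by the chain rule.

```python
import random

def loss_and_grad(w, a, xs):
    # model y_hat = w * x on task y = a * x: MSE = (w - a)^2 * mean(x^2)
    m = sum(x * x for x in xs) / len(xs)
    return (w - a) ** 2 * m, 2 * (w - a) * m, m

def sample_task(rng, n=10):
    a = rng.uniform(1.0, 3.0)  # hypothetical task parameter
    support = [rng.uniform(-1, 1) for _ in range(n)]
    query = [rng.uniform(-1, 1) for _ in range(n)]
    return a, support, query

def meta_train(steps=500, batch=10, alpha=0.1, beta=0.05, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for _ in range(batch):
            a, support, query = sample_task(rng)
            _, g_s, m_s = loss_and_grad(w, a, support)
            w_task = w - alpha * g_s                  # inner-loop adaptation on support set
            _, g_q, _ = loss_and_grad(w_task, a, query)
            meta_grad += g_q * (1 - 2 * alpha * m_s)  # exact chain rule through inner step
        w -= beta * meta_grad / batch                 # outer (meta) update
    return w
```

For a new task, one inner step from the meta-learned w then gives the rapid few-sample adaptation the abstract refers to.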
- [79] arXiv:2501.07715 (replaced) [pdf, html, other]
-
Title: Analyzing the Role of the DSO in Electricity Trading of VPPs via a Stackelberg Game Model
Comments: Accepted by 16th IEEE PowerTech conference in Kiel. 6 pages
Subjects: Systems and Control (eess.SY)
The increasing penetration of distributed energy resources has sparked interest in their participation in power markets. Here, we consider two settings where Virtual Power Plants (VPPs) with some flexible resources participate in electricity trading, either directly in the wholesale electricity market or through the Distribution System Operator (DSO), who acts as the transaction organizer. To study the role of the DSO as a stakeholder, a Stackelberg game is represented via a bi-level model: the DSO maximizes profits at the upper level, while the VPPs minimize operating costs at the lower level. To solve this problem, the Karush-Kuhn-Tucker conditions of the lower level are derived to obtain a single-level problem. The results show that the DSO's role as an intermediary agent decreases the operating costs of the VPPs by organizing lower-level trading, while making a profit for itself. However, this seemingly win-win result comes at the cost of lost wholesale market interests, which implies that stakeholders need to abide by regulatory constraints in the electricity market.
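The KKT-based reduction of a bi-level problem to a single level can be seen on a toy model. All of the following is a hypothetical illustration, not the paper's formulation: a single VPP must meet demand d, generates g at cost (a/2) g^2 and buys the rest from the DSO at price p; the DSO buys at wholesale price pi and resells. The follower's KKT conditions give g = p / a (projected onto [0, d]); substituting this into the leader's profit (p - pi)(d - p/a) yields a single-level concave problem with optimum p* = (pi + a d) / 2.

```python
def vpp_response(price, a, demand):
    """Lower level: minimize (a/2) g^2 + price * (demand - g) subject to 0 <= g <= demand.
    Stationarity a*g - price = 0, with the bound multipliers, gives a projection."""
    g = min(max(price / a, 0.0), demand)
    return g, demand - g  # own generation, energy bought from the DSO

def dso_profit(price, wholesale, a, demand):
    # Upper level: DSO profit after substituting the follower's KKT solution
    _, q = vpp_response(price, a, demand)
    return (price - wholesale) * q

def dso_optimal_price(wholesale, a, demand):
    # closed form of the single-level problem, valid when 0 <= wholesale <= a * demand
    return (wholesale + a * demand) / 2
```

With hypothetical values pi = 4, a = 2 and d = 10, the closed form gives p* = 12, at which the VPP generates 6, buys 4, and the DSO earns 32.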
- [80] arXiv:2501.07830 (replaced) [pdf, html, other]
-
Title: Deep Learning Waveform Channel Modeling for Wideband Optical Fiber Transmission: Model Comparisons, Challenges and Potential Solutions
Authors: Minghui Shi, Hang Yang, Zekun Niu, Chuyan Zeng, Junzhe Xiao, Yunfan Zhang, Mingzhe Chen, Weisheng Hu, Lilin Yi
Subjects: Signal Processing (eess.SP)
Fast and accurate waveform simulation is critical for understanding fiber channel characteristics, developing digital signal processing (DSP) technologies, optimizing optical network configurations, and advancing optical fiber transmission systems towards wideband operation. Deep learning (DL) has emerged as a powerful tool for waveform modeling, offering high accuracy and low complexity compared to the traditional split-step Fourier method (SSFM), owing to its strong nonlinear fitting capabilities and efficient parallel computation. However, most DL methods are designed for few-channel and low-rate WDM systems, leaving their scalability to wideband systems uncertain. Moreover, the lack of a standardized accuracy evaluation method and the inconsistency between waveform errors and transmission performance errors hinder fair comparisons of various DL schemes. In this paper, we introduce a DSP-assisted accuracy evaluation method integrated with nonlinear DSP, providing a fair benchmark for evaluating the accuracy of DL models. Using this method, we conduct a comprehensive comparison of DL schemes, ranging from simple configurations to more complex wideband setups. The feature decoupled distributed (FDD) method combined with bidirectional long short-term memory (FDD-BiLSTM) achieves better performance than the other DL schemes. Furthermore, in scenarios with more channels and higher symbol rates, the performance advantage of FDD-BiLSTM grows further. However, as the number of channels and the symbol rate increase, the performance of FDD-BiLSTM still gradually deteriorates. We analyze these challenges from three perspectives: the more intricate linear and nonlinear effects, the higher sampling rate required for SSFM. To address these challenges, we discuss potential solutions from two aspects: incorporating more prior physical knowledge and optimizing the structure of DL models.
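For reference, the split-step Fourier method that DL waveform models are compared against alternates linear (dispersion, loss) steps applied in the frequency domain with nonlinear (Kerr) steps applied in the time domain. A minimal sketch of the symmetric variant for the single-channel, single-polarization nonlinear Schrödinger equation dA/dz = -(alpha/2) A - i (beta2/2) d^2A/dT^2 + i gamma |A|^2 A; the normalized units and the soliton test case are assumptions for illustration, far simpler than a wideband WDM link.

```python
import numpy as np

def ssfm(a0, dt, dz, n_steps, beta2, gamma, alpha=0.0):
    """Symmetric split-step Fourier solution of the scalar NLSE."""
    w = 2 * np.pi * np.fft.fftfreq(a0.size, d=dt)               # angular frequency grid
    half_linear = np.exp((0.5j * beta2 * w**2 - alpha / 2) * dz / 2)
    a = a0.astype(complex)
    for _ in range(n_steps):
        a = np.fft.ifft(half_linear * np.fft.fft(a))            # half linear step
        a = a * np.exp(1j * gamma * np.abs(a) ** 2 * dz)        # full nonlinear step
        a = np.fft.ifft(half_linear * np.fft.fft(a))            # half linear step
    return a

# a fundamental soliton (beta2 = -1, gamma = 1, sech pulse) should keep its shape
t = np.linspace(-20, 20, 1024, endpoint=False)
a0 = 1 / np.cosh(t)
a1 = ssfm(a0, t[1] - t[0], 0.01, 100, beta2=-1.0, gamma=1.0)
```

The step size dz trades accuracy for run time, and for wideband signals the time grid must be sampled finely enough to cover the whole optical bandwidth, which is the cost the abstract alludes to.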
- [81] arXiv:2502.05510 (replaced) [pdf, html, other]
-
Title: Data-Driven Neural Certificate Synthesis
Comments: 18 pages, submitted to Automatica
Subjects: Systems and Control (eess.SY)
We investigate the problem of verifying different properties of discrete-time dynamical systems, namely reachability, safety and reach-while-avoid. To achieve this, we adopt a data-driven perspective: using past system trajectories as data, we aim to learn a specific function, termed a certificate, for each property we wish to verify. The certificate construction problem is treated as a safety-informed neural network training process, where we use a neural network to learn the parameterization of each certificate, while the loss function we seek to minimize is designed to encompass the conditions on the certificate that encode satisfaction of the associated property. Besides learning a certificate, we probabilistically quantify its generalization properties, namely how likely it is for a certificate to be valid (and hence for the associated property to be satisfied) on a new system trajectory not included in the training data set. We view this problem through the lens of probably approximately correct (PAC) learning under the notion of compression, and use recent advancements of the so-called scenario approach to obtain scalable generalization bounds on the learned certificates. To achieve this, we design a novel algorithm that minimizes the loss function, and hence constructs a certificate, while at the same time determining a quantity termed compression, which is instrumental in obtaining meaningful probabilistic guarantees. This process is novel per se and provides a constructive mechanism for compression set calculation, thus opening the road for its use in more general non-convex optimization problems. We verify the efficacy of our methodology on several numerical case studies, and compare it (both theoretically and numerically) with closely related results on data-driven property verification.
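How a compression size turns into a probabilistic guarantee can be illustrated with the classical sample-compression bound: if a certificate is reconstructed from k of N sampled trajectories and is valid on the remaining ones, then with confidence at least 1 - beta its violation probability is at most the smallest eps satisfying C(N, k) (1 - eps)^(N - k) <= beta. This is an assumption for illustration; the scenario-approach bounds the paper relies on are tighter and are not reproduced here.

```python
from math import comb

def compression_epsilon(n_samples, k, beta):
    """Smallest eps with comb(N, k) * (1 - eps)^(N - k) <= beta (requires k < N)."""
    return 1.0 - (beta / comb(n_samples, k)) ** (1.0 / (n_samples - k))

# hypothetical numbers: 1000 trajectories, compression of size 5, confidence 1 - 1e-3
eps = compression_epsilon(1000, 5, 1e-3)
```

The bound grows with k, which is why an algorithm that keeps the compression set small yields meaningful guarantees.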
- [82] arXiv:2502.09654 (replaced) [pdf, html, other]
-
Title: Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Remote sensing image super-resolution (SR) aims to reconstruct high-resolution remote sensing images from low-resolution inputs, thereby addressing limitations imposed by sensors and imaging conditions. However, the inherent characteristics of remote sensing images, including diverse ground object types and complex details, pose significant challenges to achieving high-quality reconstruction. Existing methods typically employ a uniform structure to process various types of ground objects without distinction, making it difficult to adapt to the complex characteristics of remote sensing images. To address this issue, we introduce a Mixture of Experts (MoE) model and design a set of heterogeneous experts. These experts are organized into multiple expert groups, where experts within each group are homogeneous while being heterogeneous across groups. This design ensures that specialized activation parameters can be employed to handle the diverse and intricate details of ground objects effectively. To better accommodate the heterogeneous experts, we propose a multi-level feature aggregation strategy to guide the routing process. Additionally, we develop a dual-routing mechanism to adaptively select the optimal expert for each pixel. Experiments conducted on the UCMerced and AID datasets demonstrate that our proposed method achieves superior SR reconstruction accuracy compared to state-of-the-art methods. The code will be available at this https URL.
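Per-pixel routing in a mixture of experts can be sketched with generic top-k softmax gating: a router scores every expert from the pixel's feature, the top-scoring experts are selected, and their outputs are combined with renormalized gate values. This is a hypothetical, simplified illustration; the paper's multi-level feature aggregation and dual-routing mechanism are not reproduced, and the router weights and experts below are made up.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def route_pixel(feature, router_weights, experts, top_k=1):
    """Score all experts, keep the top_k, and return the gated sum of their outputs."""
    logits = [sum(w * f for w, f in zip(row, feature)) for row in router_weights]
    gates = softmax(logits)
    chosen = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)[:top_k]
    norm = sum(gates[i] for i in chosen)
    return sum(gates[i] / norm * experts[i](feature) for i in chosen), chosen

# hypothetical experts and router for a 2-dimensional pixel feature
experts = [lambda f: f[0] + f[1], lambda f: 2.0 * f[0], lambda f: -f[1]]
router = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
out, chosen = route_pixel([1.0, 0.5], router, experts, top_k=1)
```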
- [83] arXiv:2503.05022 (replaced) [pdf, other]
-
Title: Lessons learned from field demonstrations of model predictive control and reinforcement learning for residential and commercial HVAC: A review
Authors: Arash J. Khabbazi, Elias N. Pergantis, Levi D. Reyes Premer, Panagiotis Papageorgiou, Alex H. Lee, James E. Braun, Gregor P. Henze, Kevin J. Kircher
Subjects: Systems and Control (eess.SY)
A large body of simulation research suggests that model predictive control (MPC) and reinforcement learning (RL) for heating, ventilation, and air-conditioning (HVAC) in residential and commercial buildings could reduce energy costs, pollutant emissions, and strain on power grids. Despite this potential, neither MPC nor RL has seen widespread industry adoption. Field demonstrations could accelerate MPC and RL adoption by providing real-world data that support the business case for deployment. Here we review 24 papers that document field demonstrations of MPC and RL in residential buildings and 80 in commercial buildings. After presenting demographic information, such as experiment scopes, locations, and durations, this paper analyzes experiment protocols and their influence on performance estimates. We find that 71% of the reviewed field demonstrations use experiment protocols that may lead to unreliable performance estimates. Over the remaining 29% that we view as reliable, the weighted-average cost savings, weighted by experiment duration, are 16% in residential buildings and 13% in commercial buildings. While these savings are potentially attractive, making the business case for MPC and RL also requires characterizing the costs of deployment, operation, and maintenance. Only 13 of the 104 reviewed papers report these costs or discuss related challenges. Based on these observations, we recommend directions for future field research, including: improving experiment protocols; reporting deployment, operation, and maintenance costs; designing algorithms and instrumentation to reduce these costs; controlling HVAC equipment alongside other distributed energy resources; and pursuing emerging objectives such as peak shaving, arbitraging wholesale energy prices, and providing power grid reliability services.
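The duration-weighted average used to summarize savings gives each field demonstration a weight proportional to how long it ran. A minimal sketch with hypothetical numbers (not data from the review):

```python
def duration_weighted_mean(savings, durations):
    # each experiment's savings counts in proportion to its duration
    total = sum(durations)
    return sum(s * d for s, d in zip(savings, durations)) / total

# hypothetical: a 30-day test with 10% savings and a 90-day test with 20% savings
mean_savings = duration_weighted_mean([0.10, 0.20], [30, 90])
```

Here the longer test dominates, giving 17.5% rather than the unweighted 15%.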
- [84] arXiv:2503.10419 (replaced) [pdf, html, other]
-
Title: A nonlinear real time capable motion cueing algorithm based on deep reinforcement learning
Authors: Hendrik Scheidel, Camilo Gonzalez, Houshyar Asadi, Tobias Bellmann, Andreas Seefried, Shady Mohamed, Saeid Nahavandi
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
In motion simulation, motion cueing algorithms (MCAs) are used for trajectory planning of the motion simulator platform (MSP), where workspace limitations prevent direct reproduction of reference trajectories. Strategies such as motion washout, which return the platform to its center, are crucial in these settings. For serial robotic MSPs with highly nonlinear workspaces, it is essential to make efficient use of the MSP's kinematic and dynamic capabilities. Traditional approaches, including classical washout filtering and linear model predictive control, fail to consider platform-specific, nonlinear properties, while nonlinear model predictive control, though comprehensive, imposes high computational demands that hinder real-time, pilot-in-the-loop application without further simplification. To overcome these limitations, we introduce a novel approach using deep reinforcement learning (DRL) for motion cueing, demonstrated here for the first time in a 6-degree-of-freedom (DOF) setting with full consideration of the MSP's kinematic nonlinearities. Previous work by the authors successfully demonstrated the application of DRL to a simplified 2-DOF setup, which did not consider kinematic or dynamic constraints. This approach has been extended to all 6 DOF by incorporating a complete kinematic model of the MSP into the algorithm, a crucial step for enabling its application on a real motion simulator. The training of the DRL-MCA is based on Proximal Policy Optimization in an actor-critic implementation combined with automated hyperparameter optimization. After detailing the necessary training framework and the algorithm itself, we provide a comprehensive validation, demonstrating that the DRL-MCA achieves competitive performance against established algorithms. Moreover, it generates feasible trajectories by respecting all system constraints and meets all real-time requirements with low...
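Classical washout filtering, one of the baselines named above, passes acceleration onsets to the platform but high-pass filters away sustained accelerations so the platform drifts back to center. A minimal sketch, assuming a single first-order high-pass filter H(s) = tau s / (tau s + 1) discretized with backward Euler; practical classical washout filters use higher-order filters and tilt coordination, and the time constant here is hypothetical.

```python
def washout_highpass(accel, dt, tau):
    """First-order high-pass filter tau*y' + y = tau*u', backward-Euler discretization.
    A sustained acceleration step produces an onset cue that decays back to zero."""
    c = tau / (tau + dt)
    out, prev_u, prev_y = [], 0.0, 0.0
    for u in accel:
        y = c * (prev_y + u - prev_u)
        out.append(y)
        prev_u, prev_y = u, y
    return out

# a hypothetical 10 s sustained unit acceleration sampled at 100 Hz
cue = washout_highpass([1.0] * 1000, dt=0.01, tau=1.0)
```

The onset cue here is almost the full unit step, and by the end of the 10 s it has decayed to nearly zero: the "washout" that frees workspace but ignores the platform's actual nonlinear limits, which is the gap the DRL approach targets.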
- [85] arXiv:2503.19949 (replaced) [pdf, html, other]
-
Title: Automated Video-EEG Analysis in Epilepsy Studies: Advances and Challenges
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)
Epilepsy is typically diagnosed through electroencephalography (EEG) and long-term video-EEG (vEEG) monitoring. The manual analysis of vEEG recordings is time-consuming, necessitating automated tools for seizure detection. Recent advancements in machine learning have shown promise in real-time seizure detection and prediction using EEG and video data. However, the diversity of seizure symptoms, markup ambiguities, and the limited availability of multimodal datasets hinder progress. This paper reviews the latest developments in automated video-EEG analysis and discusses the integration of multimodal data. We also propose a novel pipeline for treatment effect estimation from vEEG data using concept-based learning, offering a pathway for future research in this domain.
- [86] arXiv:2504.00276 (replaced) [pdf, html, other]
-
Title: On-the-fly Surrogation for Complex Nonlinear Dynamics
Comments: Preprint submitted to the 2025 64th IEEE Conference on Decision and Control (CDC)
Subjects: Systems and Control (eess.SY)
High-fidelity models are essential for accurately capturing nonlinear system dynamics. However, simulation of these models is often computationally too expensive and, due to their complexity, they are not directly suitable for analysis, control design or real-time applications. Surrogate modelling techniques seek to construct simplified representations of these systems with minimal complexity, while retaining adequate information on the dynamics for the simulation, analysis or synthesis objective at hand. Despite the widespread availability of system linearizations and the growing computational potential of autograd methods, there is no established approach that systematically exploits them to capture the underlying global nonlinear dynamics. This work proposes a novel surrogate modelling approach that can efficiently build a global representation of the dynamics on-the-fly from local system linearizations without ever explicitly computing a model. Using radial basis function interpolation and the second fundamental theorem of calculus, the surrogate model is only computed when it is evaluated, enabling rapid computation for simulation and analysis and seamless incorporation of new linearization data. The efficiency and modelling capabilities of the method are demonstrated on simulation examples.
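The combination of RBF interpolation and the fundamental theorem of calculus can be sketched in one dimension: interpolate the Jacobians available at sample points, then evaluate f(x) ≈ f(x0) + ∫_0^1 J(x0 + s (x - x0)) (x - x0) ds by quadrature only when the surrogate is queried. This is our reading of the idea under simplifying assumptions (scalar state, Gaussian RBF with a hypothetical shape parameter, trapezoidal rule, f = sin as the "unknown" function), not the authors' implementation.

```python
import numpy as np

def fit_rbf(centers, values, eps):
    """Gaussian RBF interpolant through (centers, values); returns a callable."""
    gram = np.exp(-(eps * (centers[:, None] - centers[None, :])) ** 2)
    weights = np.linalg.solve(gram, values)

    def interp(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        return np.exp(-(eps * (x[:, None] - centers[None, :])) ** 2) @ weights

    return interp

def surrogate(x, x0, f_x0, jac, n_quad=201):
    """f(x) ≈ f(x0) + ∫_0^1 J(x0 + s (x - x0)) (x - x0) ds via the trapezoidal rule."""
    s = np.linspace(0.0, 1.0, n_quad)
    g = jac(x0 + s * (x - x0)) * (x - x0)
    ds = s[1] - s[0]
    return f_x0 + ds * (g[0] / 2 + g[1:-1].sum() + g[-1] / 2)

# local linearizations of f = sin are its derivatives cos at the sample points
centers = np.linspace(0.0, 2.0, 9)
jac = fit_rbf(centers, np.cos(centers), eps=2.0)
approx = surrogate(1.3, 0.0, 0.0, jac)
```

New linearization data only changes the interpolation step, which is why the approach can incorporate it without refitting a global model.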
- [87] arXiv:2304.05597 (replaced) [pdf, html, other]
-
Title: On Some Geometric Behavior of Value Iteration on the Orthant: Switching System Perspective
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
The primary goal of this paper is to offer additional insights into value iteration through the lens of switching system models from the control community. These models establish a connection between value iteration and switching system theory and reveal additional geometric behaviors of value iteration in solving discounted Markov decision problems. Specifically, the main contributions of this paper are twofold: 1) we provide a switching system model of value iteration and, based on it, offer a different proof of the contraction property of value iteration; 2) building on these insights, we prove new geometric behaviors of value iteration when the initial iterate lies in a special region. We anticipate that the proposed perspective could be a useful tool applicable in various settings, and further development of these methods could be a valuable avenue for future research.
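As a minimal illustration of the switching-system view (a generic sketch, not the paper's construction), assume a finite MDP given by rewards R[a][s] and row-stochastic transition matrices P[a]. Each Bellman update V ↦ max_a (R_a + γ P_a V) applies, in every state, the affine map of the greedy action, so the tuple of greedy actions acts as a switching signal σ_k and V_{k+1} = R_{σ_k} + γ P_{σ_k} V_k. The function name and the toy MDP used in the check are illustrative.

```python
def value_iteration_switching(P, R, gamma, V0, iters):
    """Run value iteration and record the switching signal (greedy action per state).

    P[a][s][t]: transition probability s -> t under action a; R[a][s]: reward.
    Returns a list of (V_k, sigma_k) pairs, where V_k = R_sigma + gamma * P_sigma V_{k-1}.
    """
    n_actions, n_states = len(P), len(V0)
    V = list(V0)
    history = []
    for _ in range(iters):
        # Q[a][s] = R[a][s] + gamma * sum_t P[a][s][t] * V[t]
        Q = [[R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
              for s in range(n_states)] for a in range(n_actions)]
        # Switching signal: the greedy action in each state selects the active affine mode.
        sigma = tuple(max(range(n_actions), key=lambda a: Q[a][s]) for s in range(n_states))
        V = [Q[sigma[s]][s] for s in range(n_states)]
        history.append((V, sigma))
    return history
```

Recording σ_k alongside V_k makes it possible to check, step by step, that each iterate is produced by one affine mode and that the sup-norm distance to the fixed point shrinks by at least the factor γ.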
- [88] arXiv:2404.01551 (replaced) [pdf, html, other]
Title: Safety-Aware Multi-Agent Learning for Dynamic Network Bridging
Comments: 8 pages, 18 equations, 4 figures, 1 algorithm, and 1 table
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Addressing complex cooperative tasks in safety-critical environments poses significant challenges for multi-agent systems, especially under partial observability. We focus on a dynamic network bridging task, in which agents must learn to maintain a communication path between two moving targets. To ensure safety during training and deployment, we integrate a control-theoretic safety filter that enforces collision avoidance through local setpoint updates. We develop and evaluate safety-informed message passing for multi-agent reinforcement learning, showing that encoding safety filter activations as edge-level features improves coordination. The results suggest that local safety enforcement and decentralized learning can be effectively combined in distributed multi-agent tasks.
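The abstract does not specify the filter, so the following is a hypothetical sketch of the two ingredients it names, assuming planar agents and a minimum separation `d_min`: a local filter that pushes an agent's setpoint radially out to `d_min` from any neighbour it would come too close to, and an edge-feature map that marks, for each directed agent pair, whether the filter fired. A real control-theoretic filter (e.g. one based on control barrier functions) would be more involved; all function names are illustrative.

```python
import math


def safety_filter(positions, setpoints, d_min):
    """Hypothetical single-pass filter: if agent i's setpoint lies closer than d_min
    to agent j's current position, move it radially out to distance d_min.
    Returns the filtered setpoints and a dict of (i, j) pairs where the filter fired.
    Corrections are applied sequentially, so a later push can undo an earlier one."""
    n = len(positions)
    safe = [list(p) for p in setpoints]
    activations = {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = safe[i][0] - positions[j][0]
            dy = safe[i][1] - positions[j][1]
            dist = math.hypot(dx, dy)
            if dist < d_min:
                if dist == 0.0:
                    dx, dy, dist = 1.0, 0.0, 1.0  # arbitrary push direction for coincident points
                scale = d_min / dist
                safe[i] = [positions[j][0] + dx * scale, positions[j][1] + dy * scale]
                activations[(i, j)] = 1.0
    return safe, activations


def edge_features(n, activations):
    """Directed edge features for message passing: 1.0 where the filter acted, else 0.0."""
    return {(i, j): activations.get((i, j), 0.0)
            for i in range(n) for j in range(n) if i != j}
```

The point of the edge features is that a message-passing policy can see not only where neighbours are but also where safety intervened, which is the kind of signal the abstract reports as improving coordination.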
- [89] arXiv:2408.04936 (replaced) [pdf, html, other]
Title: Hybrid lunar ISRU plant: a comparative analysis with carbothermal reduction and water extraction
Authors: Kosuke Ikeya, Francisco J. Guerrero-Gonzalez, Luca Kiewiet, Michel-Alexandre Cardin, Jan Cilliers, Stanley Starr, Kathryn Hadler
Comments: 29 pages, 22 figures, 8 tables, accepted by Acta Astronautica
Subjects: Chemical Physics (physics.chem-ph); Systems and Control (eess.SY)
To establish a self-sustained human presence in space and to explore deeper into the solar system, extensive research has been conducted on In-Situ Resource Utilization (ISRU) systems. Past studies have proposed and investigated many technologies for producing oxygen from regolith, such as carbothermal reduction and water extraction from icy regolith, for use in astronaut life support and as propellant for space systems. However, determining the most promising technology remains challenging due to uncertainties in the lunar environment and processing methods. To better understand the lunar environment and ISRU operations, it is crucial to gather more information. Motivated by this need, this paper proposes a new ISRU plant architecture that integrates carbothermal reduction of dry regolith and water extraction from icy regolith. Two hybrid plant architectures integrating both technologies, (1) in parallel and (2) in series, are examined. The former involves mining and processing in both a Permanently Shadowed Region (PSR) and a peak of eternal light in parallel, while the latter mines solely in a PSR. In the series hybrid architecture, the dry regolith tailings from water extraction are further processed by carbothermal reduction. This paper conducts a comparative analysis of the landed mass and required power of each plant architecture using subsystem-level models. Furthermore, based on uncertain parameters such as the resource content of regolith, the potential performance range of each plant is estimated through Monte Carlo simulations. The results indicate that the series hybrid architecture is advantageous in terms of regolith excavation rate, while its mass cost appears to be the highest among the studied architectures.
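The Monte Carlo step can be illustrated with a deliberately simplified, hypothetical model that is not one of the paper's subsystem models: for a water-extraction path alone, the regolith that must be excavated to meet an oxygen target is the target divided by the oxygen recoverable per kilogram of regolith. The sketch assumes electrolysis yields 32/36 kg of O2 per kg of H2O and treats the water mass fraction and extraction efficiency as uniformly distributed unknowns; any numbers passed in are placeholders, not lunar data.

```python
import random
import statistics

O2_PER_H2O = 32.0 / 36.0  # kg of O2 per kg of H2O electrolysed (2 H2O -> 2 H2 + O2)


def excavation_rate(o2_target, water_fraction, extraction_eff):
    """Regolith mass to excavate per unit time for a given O2 production target.
    Hypothetical single-path model: water extraction followed by electrolysis only."""
    o2_per_kg_regolith = water_fraction * extraction_eff * O2_PER_H2O
    return o2_target / o2_per_kg_regolith


def monte_carlo(o2_target, water_range, eff_range, n=10_000, seed=0):
    """Sample uncertain inputs uniformly and return (5th percentile, median, 95th percentile)."""
    rng = random.Random(seed)
    samples = [excavation_rate(o2_target, rng.uniform(*water_range), rng.uniform(*eff_range))
               for _ in range(n)]
    cuts = statistics.quantiles(samples, n=20)  # 19 cut points at 5 %, 10 %, ..., 95 %
    return cuts[0], statistics.median(samples), cuts[-1]
```

For example, `monte_carlo(1000.0, (0.01, 0.10), (0.5, 0.9))` returns a 5th-percentile, median, and 95th-percentile excavation rate for a placeholder oxygen target, which is the kind of performance range the abstract describes.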
- [90] arXiv:2409.14489 (replaced) [pdf, html, other]
Title: A New Twist on Low-Complexity Digital Backpropagation
Comments: The manuscript has been accepted for publication in the Journal of Lightwave Technology in February 2025. With respect to the previous version, we corrected a typo in Figures 9a, 9b, and 10
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
This work proposes a novel low-complexity digital backpropagation (DBP) method, with the goal of optimizing the trade-off between backpropagation accuracy and complexity. The method combines a split-step Fourier method (SSFM)-like structure with a simplified logarithmic perturbation method to obtain high accuracy with a small number of DBP steps. Subband processing and asymmetric steps with an optimized splitting ratio are also employed to further reduce the number of steps required to achieve a prescribed performance. The first part of the manuscript is dedicated to the derivation of a simplified logarithmic-perturbation model for signal propagation in an optical fiber, which underpins the development of the proposed coupled-band enhanced split-step Fourier method (CB-ESSFM) and the analytical calculation of the model coefficients. Next, the manuscript presents a DSP algorithm for implementing DBP based on a discrete-time version of the model and an overlap-and-save processing strategy. Practical approaches for optimizing the coefficients used in the algorithm and the splitting ratio of the asymmetric steps are also discussed. A detailed analysis of the computational complexity is presented. Finally, the performance and complexity of the proposed DBP method are investigated through simulations. In a five-channel, 100 GHz-spaced wavelength division multiplexing system over a 15x80 km single-mode fiber link, the proposed CB-ESSFM achieves a gain of about 1 dB over simple dispersion compensation with only 15 steps (corresponding to 681 real multiplications per 2D symbol), an improvement of 0.9 dB over conventional SSFM and of almost 0.4 dB over our previously proposed ESSFM. Significant gains are also obtained at lower complexity. A similar analysis for longer links confirms the good performance of the proposed method.
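For orientation, the following sketches conventional SSFM-based DBP, the baseline the abstract compares against, not the proposed CB-ESSFM. It assumes a single-polarization, lossless nonlinear Schrödinger equation dA/dz = -i(β2/2) d²A/dt² + iγ|A|²A in normalized units and symmetric steps (half dispersion, full nonlinearity, half dispersion); DBP runs the same propagator with β2 and γ negated. Loss, subband processing, and asymmetric steps are omitted, and the function names are illustrative.

```python
import numpy as np


def ssfm(field, dt, length, n_steps, beta2, gamma):
    """Symmetric split-step Fourier solution of the lossless scalar NLSE
    dA/dz = -i (beta2 / 2) d^2A/dt^2 + i gamma |A|^2 A  (normalized units assumed)."""
    omega = 2.0 * np.pi * np.fft.fftfreq(field.size, d=dt)  # angular frequency grid
    h = length / n_steps
    # With NumPy's FFT sign convention, d^2/dt^2 -> -omega^2, so the linear step is exp(i beta2 omega^2 dz / 2).
    half_disp = np.exp(1j * (beta2 / 2.0) * omega**2 * (h / 2.0))
    a = np.asarray(field, dtype=complex)
    for _ in range(n_steps):
        a = np.fft.ifft(half_disp * np.fft.fft(a))       # half dispersion step
        a = a * np.exp(1j * gamma * np.abs(a) ** 2 * h)   # full nonlinear (Kerr) step
        a = np.fft.ifft(half_disp * np.fft.fft(a))       # half dispersion step
    return a


def digital_backpropagation(received, dt, length, n_steps, beta2, gamma):
    """Conventional SSFM-based DBP: propagate through a virtual fiber with negated beta2 and gamma."""
    return ssfm(received, dt, length, n_steps, -beta2, -gamma)
```

In this idealized noise-free, lossless setting, backpropagating with the same step count undoes the forward propagation up to rounding error. In practice DBP must use far fewer steps than an accurate forward model, and that accuracy-complexity trade-off is what the method described above targets.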
- [91] arXiv:2409.15720 (replaced) [pdf, html, other]
Title: Optimization of partially isolated quantum harmonic oscillator memory systems by mean square decoherence time criteria
Comments: 9 pages, 3 figures, submitted to ANZCC 2025, the first line of the proof of Lemma 1 on page 4 has been corrected
Subjects: Quantum Physics (quant-ph); Systems and Control (eess.SY); Optimization and Control (math.OC)
This paper is concerned with open quantum harmonic oscillators with position-momentum system variables, whose internal dynamics and interaction with the environment are governed by linear quantum stochastic differential equations. A recently proposed approach treats such systems as Heisenberg-picture quantum memories, exploiting their ability to approximately retain initial conditions over a decoherence horizon. Using the quantum memory decoherence time, defined previously in terms of a fidelity threshold on a weighted mean-square deviation of the system variables from their initial values, we apply this approach to a partially isolated subsystem of the oscillator, which is not directly affected by the external fields. The partial isolation leads to an appropriate system decomposition and a qualitatively different short-horizon asymptotic behaviour of the deviation, which yields a longer decoherence time in the high-fidelity limit. The resulting approximate maximization of the decoherence time over the energy parameters, aimed at improving quantum memory performance, is discussed for a coherent feedback interconnection of such systems.
- [92] arXiv:2410.04133 (replaced) [pdf, html, other]
Title: An Electrocardiogram Foundation Model Built on over 10 Million Recordings with External Evaluation across Multiple Domains
Authors: Jun Li, Aaron Aguirre, Junior Moura, Che Liu, Lanhai Zhong, Chenxi Sun, Gari Clifford, Brandon Westover, Shenda Hong
Comments: Code: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Artificial intelligence (AI) has demonstrated significant potential in ECG analysis and cardiovascular disease assessment. Recently, foundation models have played a remarkable role in advancing medical AI. An ECG foundation model could substantially advance AI-ECG research. However, building such a model faces several challenges, including insufficient database sample sizes and inadequate generalization across multiple domains. Additionally, there is a notable performance gap between single-lead and multi-lead ECG analyses. We introduce an ECG Foundation Model (ECGFounder), a general-purpose model that leverages real-world ECG annotations from cardiology experts to broaden the diagnostic capabilities of ECG analysis. ECGFounder was trained on over 10 million ECGs with 150 label categories from the Harvard-Emory ECG Database, enabling comprehensive cardiovascular disease diagnosis through ECG analysis. The model is designed to be both an effective out-of-the-box solution and fine-tunable for downstream tasks, maximizing usability. Importantly, we extended its application to lower-rank ECGs, and to arbitrary single-lead ECGs in particular. ECGFounder can support various downstream tasks in mobile monitoring scenarios. Experimental results demonstrate that ECGFounder achieves expert-level performance on internal validation sets, with AUROC exceeding 0.95 for eighty diagnoses. It also shows strong classification performance and generalization across various diagnoses on external validation sets. When fine-tuned, ECGFounder outperforms baseline models in demographic analysis, clinical event detection, and cross-modality cardiac rhythm diagnosis. The trained model and data will be publicly released upon publication through this http URL. Our code is available at this https URL
- [93] arXiv:2410.15316 (replaced) [pdf, html, other]
Title: Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Large Language Models (LLMs) have revolutionized natural language processing, but their application to speech-based tasks remains challenging due to the complexities of integrating audio and text modalities. This paper introduces Ichigo, a mixed-modal model that seamlessly processes interleaved sequences of speech and text. Using a tokenized early-fusion approach, Ichigo quantizes speech into discrete tokens and employs a uniform transformer-based architecture for both speech and text modalities. This method enables joint reasoning and generation across modalities without the need for separate adapters. We present a comprehensive training methodology, including pre-training on multilingual speech recognition datasets and fine-tuning on a curated instruction dataset. Ichigo demonstrates state-of-the-art performance on speech question-answering benchmarks, outperforming existing open-source speech language models and achieving results comparable to cascaded systems. Notably, Ichigo exhibits a latency of just 111 ms to first token generation, significantly lower than that of current models. Our approach not only advances the field of multimodal AI but also provides a framework for smaller research teams to contribute effectively to open-source speech-language models.
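A minimal sketch of tokenized early fusion, assuming a precomputed codebook of speech feature vectors and a text tokenizer whose IDs occupy 0 to text_vocab_size - 1: each speech frame is mapped to its nearest codebook entry, the resulting codes are shifted past the text vocabulary and wrapped in hypothetical start/end marker IDs, and text and speech tokens are interleaved into one sequence that a single transformer can embed. Ichigo's actual quantizer, vocabulary layout, and special tokens are not described in the abstract, so all names and IDs here are illustrative.

```python
def quantize_frames(frames, codebook):
    """Map each speech feature frame to the index of its nearest codebook vector (squared L2)."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))
            for frame in frames]


def build_mixed_sequence(segments, text_vocab_size, sound_start_id, sound_end_id):
    """Interleave text and speech into one token sequence.

    segments: list of ("text", [text token ids]) or ("speech", [codebook indices]).
    Speech codes are offset by text_vocab_size so one embedding table covers both modalities;
    the marker IDs are assumed to lie outside both ranges.
    """
    sequence = []
    for kind, ids in segments:
        if kind == "text":
            sequence.extend(ids)
        elif kind == "speech":
            sequence.append(sound_start_id)
            sequence.extend(text_vocab_size + code for code in ids)
            sequence.append(sound_end_id)
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return sequence
```

Because both modalities share one token space, no separate audio adapter is needed in this sketch: the transformer simply sees a longer vocabulary.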
- [94] arXiv:2501.00398 (replaced) [pdf, html, other]
Title: TSPE: Task-Specific Prompt Ensemble for Improved Zero-Shot Audio Classification
Comments: Accepted to SALMA Workshop ICASSP 2025
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Audio-language models (ALMs) excel in zero-shot audio classification, a task in which models classify previously unseen audio clips at test time by leveraging descriptive natural language prompts. We introduce TSPE (Task-Specific Prompt Ensemble), a simple, training-free hard prompting method that boosts ALMs' zero-shot performance by customizing prompts for diverse audio classification tasks. Rather than using generic template-based prompts like "Sound of a car", we generate context-rich prompts, such as "Sound of a car coming from a tunnel". Specifically, we leverage label information to identify suitable sound attributes, such as "loud" and "feeble", and appropriate sound sources, such as "tunnel" and "street", and incorporate this information into the prompts used by ALMs for audio classification. Further, to enhance audio-text alignment, we ensemble the TSPE-generated task-specific prompts. When evaluated on 12 diverse audio classification datasets, TSPE improves performance across ALMs, with absolute improvements of 1.23% to 16.36% over vanilla zero-shot evaluation.
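The prompt-ensemble step can be sketched as follows, assuming the CLIP-style scoring rule commonly used with ALMs: each class's text embedding is the renormalized mean of the L2-normalized embeddings of its prompts, and a clip is assigned to the class with the highest cosine similarity. `embed_text` stands in for whatever text encoder the ALM provides, and the prompt templates are hypothetical; TSPE's own selection of attributes and sources is not reproduced.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def build_prompts(label, attributes, sources):
    """Generic prompt plus attribute- and source-specific variants (templates assumed)."""
    prompts = [f"Sound of a {label}"]
    prompts += [f"Sound of a {attr} {label}" for attr in attributes]
    prompts += [f"Sound of a {label} coming from a {src}" for src in sources]
    return prompts


def class_embedding(prompts, embed_text):
    """Prompt ensemble: mean of L2-normalized prompt embeddings, renormalized."""
    embs = [normalize(embed_text(p)) for p in prompts]
    mean = [sum(col) / len(embs) for col in zip(*embs)]
    return normalize(mean)


def zero_shot_classify(audio_emb, label_prompts, embed_text):
    """Return the best label and the cosine score of every label."""
    a = normalize(audio_emb)
    scores = {label: sum(x * y for x, y in zip(a, class_embedding(p, embed_text)))
              for label, p in label_prompts.items()}
    return max(scores, key=scores.get), scores
```

Averaging in the normalized embedding space keeps any single prompt from dominating, which is the usual motivation for ensembling prompts rather than picking one.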
- [95] arXiv:2501.07534 (replaced) [pdf, html, other]
Title: Investigating Map-Based Path Loss Models: A Study of Feature Representations in Convolutional Neural Networks
Comments: 4 pages, 2 figures, 4 tables
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Path loss prediction is a beneficial tool for efficient use of the radio frequency spectrum. Building on prior research on high-resolution map-based path loss models, this paper studies convolutional neural network input representations in more detail. We investigate different methods of representing scalar features in convolutional neural networks. Specifically, we compare using frequency and distance as input channels to convolutional layers or as scalar inputs to regression layers. We assess model performance using three different feature configurations and find that representing scalar features as image channels results in the strongest generalization.
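The two representations being compared can be sketched as follows, assuming a single-channel (H, W) map patch and scalar features such as frequency and distance. Either each scalar is broadcast to a constant (H, W) plane and stacked with the map before the convolutional layers, or it is appended to the flattened CNN features that feed the regression layers. Function names are hypothetical, and any normalization of the scalars is left out.

```python
import numpy as np


def scalars_as_channels(map_patch, scalars):
    """Stack an (H, W) map with each scalar broadcast to a constant (H, W) plane -> (1 + k, H, W)."""
    h, w = map_patch.shape
    planes = [map_patch.astype(np.float32)]
    planes += [np.full((h, w), s, dtype=np.float32) for s in scalars]
    return np.stack(planes, axis=0)


def scalars_as_regression_inputs(cnn_features, scalars):
    """Alternative: append the scalars to the flattened CNN features before the dense regression head."""
    return np.concatenate([cnn_features.ravel(), np.asarray(scalars, dtype=np.float32)])
```

In the channel form the convolutional filters can combine the scalar with local map structure from the first layer onward, whereas in the regression-input form the scalar only meets the map after feature extraction.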
- [96] arXiv:2503.02647 (replaced) [pdf, other]
Title: A Framework for Uplink ISAC Receiver Designs: Performance Analysis and Algorithm Development
Comments: 13 pages, 9 figures, submitted to an IEEE journal for possible publication
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Uplink integrated sensing and communication (ISAC) systems have recently emerged as a promising research direction, enabling simultaneous uplink signal detection and target sensing. In this paper, we propose the flexible projection (FP)-type receiver, which unifies the projection-type receiver and the successive interference cancellation (SIC)-type receiver by using a flexible trade-off factor to adapt to dynamically changing uplink ISAC scenarios. The FP-type receiver addresses the joint signal detection and target response estimation problem through two coordinated phases: 1) communication signal detection using a reconstructed signal whose composition is controlled by the trade-off factor, followed by 2) target response estimation performed by subtracting the detected communication signal from the received signal. With adjustable trade-off factors, the FP-type receiver can balance the enhancement of the signal-to-interference-plus-noise ratio (SINR) against the reduction of correlation in the reconstructed signal for communication signal detection. The pairwise error probabilities (PEPs) are analyzed for both the maximum likelihood (ML) and the zero-forcing (ZF) detectors, revealing that the optimal trade-off factor should be determined based on the adopted detection algorithm and the relative power of the sensing and communication (S&C) signals. A homotopy optimization framework is first applied to the FP-type receiver with a fixed trade-off factor. This framework is then extended to develop the dynamic FP (DFP)-type receiver, which iteratively adjusts the trade-off factor for improved algorithm performance and environmental adaptability. Subsequently, two extensions are explored to further enhance the receiver's performance: a parallel DFP (PDFP)-type receiver and a block-structured receiver design. Finally, the effectiveness of the proposed receiver designs is verified via simulations.
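To make the two-phase structure concrete, here is a hypothetical sketch. The abstract does not give the reconstruction rule, so we assume a model y = H_c x + S t + noise with a known sensing basis S, and parametrize the reconstructed signal as (I - λP_S)y, where P_S projects onto the column space of S. With λ = 1 the sensing component is removed before detection (projection-like); with λ = 0 detection runs on the raw signal and the sensing component is left as interference (SIC-like). Phase 1 uses ZF detection followed by slicing to a given constellation; phase 2 estimates t by least squares after subtracting H_c x_hat. This interpolation is our assumption and not necessarily the authors' FP formulation.

```python
import numpy as np


def fp_type_receiver(y, Hc, S, lam, constellation):
    """Hypothetical two-phase receiver: lam=1 -> projection-like, lam=0 -> SIC-like.

    y: received vector (n,), Hc: communication channel (n, k), S: sensing basis (n, p),
    constellation: 1-D array of candidate symbols used for slicing.
    """
    n = y.shape[0]
    P_s = S @ np.linalg.pinv(S)          # orthogonal projector onto the sensing subspace
    F = np.eye(n) - lam * P_s            # trade-off filter that builds the reconstructed signal
    # Phase 1: ZF detection on the reconstructed signal, then nearest-symbol slicing.
    x_soft = np.linalg.pinv(F @ Hc) @ (F @ y)
    idx = np.argmin(np.abs(x_soft[:, None] - constellation[None, :]), axis=1)
    x_hat = constellation[idx]
    # Phase 2: subtract the detected communication signal, then least-squares target response.
    t_hat = np.linalg.pinv(S) @ (y - Hc @ x_hat)
    return x_hat, t_hat
```

Intermediate values of `lam` keep part of the sensing component in the detection input, which is one simple way to picture the SINR-versus-correlation balance the abstract describes.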
- [97] arXiv:2503.06790 (replaced) [pdf, html, other]
Title: GenDR: Lightning Generative Detail Restorator
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Recent research applying text-to-image (T2I) diffusion models to real-world super-resolution (SR) has achieved remarkable success. However, fundamental misalignments between T2I and SR targets result in a dilemma between inference speed and detail fidelity. Specifically, T2I tasks prioritize multi-step inversion to synthesize coherent outputs aligned with textual prompts and shrink the latent space to reduce generation complexity. In contrast, SR tasks preserve most of the information from the low-resolution input while restoring only high-frequency details, and thus require a sufficiently large latent space and fewer inference steps. To bridge the gap, we present GenDR, a one-step diffusion model for generative detail restoration, distilled from a tailored diffusion model with a larger latent space. Concretely, we train a new SD2.1-VAE16 (0.9B) via representation alignment to expand the latent space without enlarging the model size. For step distillation, we propose consistent score identity distillation (CiD), which incorporates an SR task-specific loss into score distillation to leverage more SR priors and align the training target. Furthermore, we extend CiD with adversarial learning and representation alignment (CiDA) to enhance perceptual quality and accelerate training. We also streamline the pipeline for more efficient inference. Experimental results demonstrate that GenDR achieves state-of-the-art performance in both quantitative metrics and visual fidelity.