Electrical Engineering and Systems Science

New submissions

New submissions for Mon, 6 May 24

[1]  arXiv:2405.01552 [pdf, other]
Title: Enhancing 3T Retinotopic Maps Using Diffeomorphic Registration
Comments: 5 pages, 1 figure, 2 tables, 2024 IEEE International Symposium on Biomedical Imaging
Subjects: Image and Video Processing (eess.IV)

Retinotopic mapping aims to uncover the relationship between visual stimuli on the retina and neural responses on the visual cortical surface. This study advances retinotopic mapping by applying diffeomorphic registration to the 3T NYU retinotopy dataset, encompassing analyze-PRF and mrVista data. Diffeomorphic Registration for Retinotopic Maps (DRRM) quantifies the diffeomorphic condition, ensuring accurate alignment of retinotopic maps without topological violations. Leveraging the Beltrami coefficient and topological condition, DRRM significantly enhances retinotopic map accuracy. Evaluation against existing methods demonstrates DRRM's superiority on various datasets, including 3T and 7T retinotopy data. The application of diffeomorphic registration improves the interpretability of low-quality retinotopic maps, holding promise for clinical applications.

[2]  arXiv:2405.01600 [pdf, other]
Title: Deep Learning Descriptor Hybridization with Feature Reduction for Accurate Cervical Cancer Colposcopy Image Classification
Comments: 7 Pages double column, 5 figures, and 5 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Cervical cancer stands as a predominant cause of female mortality, underscoring the need for regular screenings to enable early diagnosis and preemptive treatment of pre-cancerous conditions. The transformation zone in the cervix, where cellular differentiation occurs, plays a critical role in the detection of abnormalities. Colposcopy has emerged as a pivotal tool in cervical cancer prevention since it provides a meticulous examination of cervical abnormalities. However, challenges in visual evaluation necessitate the development of Computer Aided Diagnosis (CAD) systems.
We propose a novel CAD system that combines the strengths of several deep-learning descriptors (ResNet50, ResNet101, and ResNet152) with feature normalization (min-max) and a feature reduction technique (LDA). Combining different descriptors ensures that all features (low-level, such as edges and colour, and high-level, such as shape and texture) are captured; feature normalization prevents biased learning; and feature reduction avoids overfitting. We conduct experiments on the IARC dataset provided by WHO. The dataset is initially segmented and balanced. Our approach achieves exceptional performance in the range of 97%-100% for both the normal-abnormal and the type classification. A competing approach for type classification on the same dataset achieved 81%-91% performance.

[3]  arXiv:2405.01644 [pdf, ps, other]
Title: A Classification-Based Adaptive Segmentation Pipeline: Feasibility Study Using Polycystic Liver Disease and Metastases from Colorectal Cancer CT Images
Comments: J Digit Imaging. Inform. med. (2024)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)

Automated segmentation tools often encounter accuracy and adaptability issues when applied to images of different pathology. The purpose of this study is to explore the feasibility of building a workflow to efficiently route images to specifically trained segmentation models. By implementing a deep learning classifier to automatically classify the images and route them to appropriate segmentation models, we hope that our workflow can segment the images with different pathology accurately. The data we used in this study are 350 CT images from patients affected by polycystic liver disease and 350 CT images from patients presenting with liver metastases from colorectal cancer. All images had the liver manually segmented by trained imaging analysts. Our proposed adaptive segmentation workflow achieved a statistically significant improvement for the task of total liver segmentation compared to the generic single segmentation model (non-parametric Wilcoxon signed rank test, n=100, p-value << 0.001). This approach is applicable in a wide range of scenarios and should prove useful in clinical implementations of segmentation pipelines.

[4]  arXiv:2405.01658 [pdf, other]
Title: MMIST-ccRCC: A Real World Medical Dataset for the Development of Multi-Modal Systems
Comments: Accepted in DCA in MI Workshop@CVPR2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

The acquisition of different data modalities can enhance our knowledge and understanding of various diseases, paving the way for more personalized healthcare. Thus, medicine is progressively moving towards the generation of massive amounts of multi-modal data (e.g., molecular, radiology, and histopathology). While this may seem like an ideal environment to capitalize on data-centric machine learning approaches, most methods still focus on exploring a single modality or a pair of modalities due to a variety of reasons: i) lack of ready-to-use curated datasets; ii) difficulty in identifying the best multi-modal fusion strategy; and iii) missing modalities across patients. In this paper we introduce a real world multi-modal dataset called MMIST-ccRCC that comprises 2 radiology modalities (CT and MRI), histopathology, genomics, and clinical data from 618 patients with clear cell renal cell carcinoma (ccRCC). We provide single and multi-modal (early and late fusion) benchmarks in the task of 12-month survival prediction in the challenging scenario of one or more missing modalities for each patient, with missing rates that range from 26% for genomics data to more than 90% for MRI. We show that even with such severe missing rates the fusion of modalities leads to improvements in the survival forecasting. Additionally, incorporating a strategy to generate the latent representations of the missing modalities given the available ones further improves the performance, highlighting a potential complementarity across modalities. Our dataset and code are available here: https://multi-modal-ist.github.io/datasets/ccRCC

[5]  arXiv:2405.01681 [pdf, other]
Title: Accounting for the Effects of Probabilistic Uncertainty During Fast Charging of Lithium-ion Batteries
Comments: 6 pages, 5 figures, accepted for ACC 2024
Subjects: Systems and Control (eess.SY)

Batteries are nonlinear dynamical systems that can be modeled by Porous Electrode Theory (PET) models. The aim of optimal fast charging is to reduce the charging time while keeping battery degradation low. Most past studies assume that the ambient temperature is a fixed known value and that all PET model parameters are perfectly known. In real battery operation, however, the ambient temperature and the model parameters are uncertain. To ensure that operational constraints are satisfied at all times in the context of model-based optimal control, uncertainty quantification is required. Here, we analyze optimal fast charging for modest uncertainty in the ambient temperature and 23 model parameters. Uncertainty quantification of the battery model is carried out using non-intrusive polynomial chaos expansion and the results are verified with Monte Carlo simulations. The method is investigated for a constant current-constant voltage charging strategy for a battery for which the strategy is known to be standard for fast charging subject to operating below maximum current and charging constraints. Our results demonstrate that uncertainty in ambient temperature results in violations of constraints on the voltage and temperature. Our results identify a subset of key parameters that contribute to fast charging among the overall uncertain parameters. Additionally, it is shown that the constraints represented by voltage, temperature, and lithium-plating overpotential are violated due to uncertainties in the ambient temperature and parameters. The C-rate and charge constraints are then adjusted so that the probability of violating the degradation acceleration condition is below a pre-specified value. This work demonstrates a computationally efficient approach for determining fast-charging protocols that take probabilistic uncertainties into account.

[6]  arXiv:2405.01692 [pdf, other]
Title: Multi-Layer Network Formation through HAPS Base Station and Transmissive RIS-Equipped UAV
Subjects: Signal Processing (eess.SP)

To bolster future wireless networks, there has been a great deal of interest in non-terrestrial networks, especially aerial platform stations including the high altitude platform station (HAPS) and uncrewed aerial vehicles (UAVs). These platforms can integrate advanced technologies such as reconfigurable intelligent surfaces (RIS) and non-orthogonal multiple access (NOMA). In this regard, this paper proposes a multi-layer network architecture to improve the performance of conventional HAPS super-macro base station (HAPS-SMBS)-assisted UAVs. The architecture includes a HAPS-SMBS, UAVs equipped with active transmissive RIS, and ground Internet of things devices. We also consider multiple-input single-output (MISO) technology, by employing multiple antennas at the HAPS-SMBS and a single antenna at the Internet of things devices. Additionally, we consider NOMA as the multiple access technology as well as the existence of hardware impairments as a practical limitation. In particular, we compare the proposed system model with three different scenarios: HAPS-SMBS-assisted UAVs that are equipped with active transmissive RIS and supported by a single-input single-output system, HAPS-SMBS-assisted UAVs that are equipped with amplify-and-forward relaying, and HAPS-SMBS-assisted UAVs equipped with passive transmissive RIS. Sum rate and energy efficiency are used as performance metrics, and the findings demonstrate that, in comparison to all benchmarks, the proposed system yields higher performance gain. Moreover, the hardware impairment limits the system performance at high transmit power levels.

[7]  arXiv:2405.01725 [pdf, other]
Title: Development of Skip Connection in Deep Neural Networks for Computer Vision and Medical Image Analysis: A Survey
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deep learning has made significant progress in computer vision, specifically in image classification, object detection, and semantic segmentation. The skip connection has played an essential role in the architecture of deep neural networks, enabling easier optimization through residual learning during the training stage and improving accuracy during testing. Many neural networks have inherited the idea of residual learning with skip connections for various tasks, and it has been the standard choice for designing neural networks. This survey provides a comprehensive summary and outlook on the development of skip connections in deep neural networks. The short history of skip connections is outlined, and the development of residual learning in deep neural networks is surveyed. The effectiveness of skip connections in the training and testing stages is summarized, and future directions for using skip connections in residual learning are discussed. Finally, we summarize seminal papers, source code, models, and datasets that utilize skip connections in computer vision, including image classification, object detection, semantic segmentation, and image reconstruction. We hope this survey will inspire peer researchers in the community to further develop skip connections in various forms and tasks, as well as the theory of residual learning in deep neural networks. The project page can be found at https://github.com/apple1986/Residual_Learning_For_Images
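
As a rough illustration of the residual-learning idea summarized in this abstract, the sketch below computes y = x + F(x) for a single block in plain Python. The function names (`relu`, `linear`, `residual_block`) and the choice F(x) = ReLU(Wx + b) are assumptions made for this example and are not taken from the survey or its project page.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights: Matrix, bias: Vector, v: Vector) -> Vector:
    """Affine map W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def residual_block(v: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Hypothetical residual block: y = x + F(x), with F(x) = ReLU(W x + b).

    The skip connection is the `x +` term: the input bypasses F unchanged.
    """
    fx = relu(linear(weights, bias, v))
    return [x + f for x, f in zip(v, fx)]


x = [1.0, -2.0]

# With all-zero weights and bias, F(x) = 0, so the block reduces to the identity.
identity_out = residual_block(x, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])

# With an identity weight matrix and bias 0.5, F(x) = ReLU([1.5, -1.5]) = [1.5, 0.0].
shifted_out = residual_block(x, [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
```

When F outputs zero, the block passes its input through unchanged, which is one common intuition for why stacking such blocks is easier to optimize than stacking plain layers.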

[8]  arXiv:2405.01726 [pdf, ps, other]
Title: SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Denoising hyperspectral images (HSIs) is a crucial preprocessing procedure due to the noise originating from intra-imaging mechanisms and environmental factors. Utilizing domain-specific knowledge of HSIs, such as spectral correlation, spatial self-similarity, and spatial-spectral correlation, is essential for deep learning-based denoising. Existing methods are often constrained by running time, space complexity, and computational complexity, employing strategies that explore these priors separately. While the strategies can avoid some redundant information, considering that hyperspectral images are 3-D images with strong spatial continuity and spectral correlation, this kind of strategy inevitably overlooks subtle long-range spatial-spectral information that positively impacts image restoration. This paper proposes a Spatial-Spectral Selective State Space Model-based U-shaped network, termed Spatial-Spectral U-Mamba (SSUMamba), for hyperspectral image denoising. We can obtain complete global spatial-spectral correlation within a module thanks to the linear space complexity in State Space Model (SSM) computations. We introduce an Alternating Scan (SSAS) strategy for HSI data, which helps model the information flow in multiple directions in 3-D HSIs. Experimental results demonstrate that our method outperforms several compared methods. The source code will be available at https://github.com/lronkitty/SSUMamba.

[9]  arXiv:2405.01730 [pdf, other]
Title: Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a Conditional Diffusion Model
Comments: Accepted by Speaker Odyssey 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Expressive voice conversion (VC) conducts speaker identity conversion for emotional speakers by jointly converting speaker identity and emotional style. Emotional style modeling for arbitrary speakers in expressive VC has not been extensively explored. Previous approaches have relied on vocoders for speech reconstruction, which makes speech quality heavily dependent on the performance of vocoders. A major challenge of expressive VC lies in emotion prosody modeling. To address these challenges, this paper proposes a fully end-to-end expressive VC framework based on a conditional denoising diffusion probabilistic model (DDPM). We utilize speech units derived from self-supervised speech models as content conditioning, along with deep features extracted from speech emotion recognition and speaker verification systems to model emotional style and speaker identity. Objective and subjective evaluations show the effectiveness of our framework. Code and samples are publicly available.

[10]  arXiv:2405.01750 [pdf, other]
Title: PointCompress3D -- A Point Cloud Compression Framework for Roadside LiDARs in Intelligent Transportation Systems
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

In the context of Intelligent Transportation Systems (ITS), efficient data compression is crucial for managing large-scale point cloud data acquired by roadside LiDAR sensors. The demand for efficient storage, streaming, and real-time object detection capabilities for point cloud data is substantial. This work introduces PointCompress3D, a novel point cloud compression framework tailored specifically for roadside LiDARs. Our framework addresses the challenges of compressing high-resolution point clouds while maintaining accuracy and compatibility with roadside LiDAR sensors. We adapt, extend, integrate, and evaluate three cutting-edge compression methods using our real-world-based TUMTraf dataset family. We achieve a frame rate of 10 FPS while keeping compression sizes below 105 Kb, a reduction of 50 times, and maintaining object detection performance on par with the original data. In extensive experiments and ablation studies, we finally achieved a PSNR d2 of 94.46 and a BPP of 6.54 on our dataset. Future work includes the deployment on the live system. The code is available on our project website: https://pointcompress3d.github.io.

[11]  arXiv:2405.01753 [pdf, other]
Title: A Feedback Linearized Model Predictive Control Strategy for Input-Constrained Self-Driving Cars
Comments: Preprint of a manuscript currently under review for TCTS
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

This paper proposes a novel real-time affordable solution to the trajectory tracking control problem for self-driving cars subject to longitudinal and steering angular velocity constraints. To this end, we develop a dual-mode Model Predictive Control (MPC) solution starting from an input-output feedback linearized description of the vehicle kinematics. First, we derive the state-dependent input constraints acting on the linearized model and characterize their worst-case time-invariant inner approximation. Then, a dual-mode MPC is derived that is real-time affordable and ensures, by design, constraint fulfillment, recursive feasibility, and uniform ultimate boundedness of the tracking error in an ad-hoc built robust control invariant region. The approach's effectiveness and performance are validated via laboratory experiments on a Quanser Qcar. The obtained results show that the proposed solution is computationally affordable and has tracking capabilities that outperform two alternative control schemes.

[12]  arXiv:2405.01816 [pdf, other]
Title: The Integrated Sensing and Communication Revolution for 6G: Vision, Techniques, and Applications
Subjects: Signal Processing (eess.SP)

Future wireless networks will integrate sensing, learning and communication to provide new services beyond communication and to become more resilient. Sensors at the network infrastructure, sensors on the user equipment, and the sensing capability of the communication signal itself provide a new source of data that connects the physical and radio frequency environments. A wireless network that harnesses all these sensing data can not only enable additional sensing services, but also become more resilient to channel-dependent effects like blockage and better support adaptation in dynamic environments as networks reconfigure. In this paper, we provide a vision for integrated sensing and communication (ISAC) networks and an overview of how signal processing, optimization and machine learning techniques can be leveraged to make them a reality in the context of 6G. We also include some examples of the performance of several of these strategies when evaluated using a simulation framework based on a combination of ray tracing measurements and mathematical models that mix the digital and physical worlds.

[13]  arXiv:2405.01822 [pdf, other]
Title: Report on the AAPM Grand Challenge on deep generative modeling for learning medical image statistics
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)

The findings of the 2023 AAPM Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics are reported in this Special Report. The goal of this challenge was to promote the development of deep generative models (DGMs) for medical imaging and to emphasize the need for their domain-relevant assessment via the analysis of relevant image statistics. As part of this Grand Challenge, a training dataset was developed based on 3D anthropomorphic breast phantoms from the VICTRE virtual imaging toolbox. A two-stage evaluation procedure was developed, consisting of a preliminary check for memorization and image quality (based on the Fréchet Inception distance (FID)), and a second stage evaluating the reproducibility of image statistics corresponding to domain-relevant radiomic features. A summary measure was employed to rank the submissions. Additional analyses of submissions were performed to assess DGM performance specific to individual feature families, and to identify various artifacts. 58 submissions from 12 unique users were received for this Challenge. The top-ranked submission employed a conditional latent diffusion model, whereas the joint runners-up employed a generative adversarial network, followed by another network for image superresolution. We observed that the overall ranking of the top 9 submissions according to our evaluation method (i) did not match the FID-based ranking, and (ii) differed with respect to individual feature families. Another important finding from our additional analyses was that different DGMs demonstrated similar kinds of artifacts. This Grand Challenge highlighted the need for domain-specific evaluation to further DGM design as well as deployment. It also demonstrated that the specification of a DGM may differ depending on its intended use.

[14]  arXiv:2405.01889 [pdf, ps, other]
Title: Reinforcement Learning control strategies for Electric Vehicles and Renewable energy sources Virtual Power Plants
Comments: DAI-Labor of Technische Universität Berlin Master thesis
Subjects: Systems and Control (eess.SY)

The increasing demand for direct electric energy in the grid is also tied to the increasing use of Electric Vehicles (EVs) in cities, which will eventually replace combustion engine vehicles entirely. Nevertheless, the large amount of energy stored in EV batteries is not always used, and it can constitute a virtual power plant on its own. Bidirectional EVs equipped with batteries connected to the grid can therefore charge or discharge energy depending on public needs, shifting energy to where and when it is needed. EVs employed as mobile storage devices can add resilience and supply/demand balance benefits to specific loads, in many cases as part of a Microgrid (MG). Depending on the direction of the energy transfer, EVs can provide backup power to households through vehicle-to-house (V2H) charging, or store unused renewable power through renewable-to-vehicle (RE2V) charging. V2H and RE2V solutions can complement renewable power sources like solar photovoltaic (PV) panels and wind turbines (WT), which fluctuate over time, increasing self-consumption and autarky. The concept of distributed energy resources (DERs) is becoming increasingly prevalent and requires new solutions for the integration of multiple complementary resources with variable supply over time. The development of these ideas is coupled with the growth of new AI techniques that will potentially be the managing core of such systems. Machine learning techniques can model the energy grid environment in such a flexible way that constant optimization is possible. This working principle introduces the wider concept of an interconnected, shared, decentralized grid of energy. This research on Reinforcement Learning control strategies for Electric Vehicles and Renewable energy sources Virtual Power Plants focuses on providing solutions for such energy supply optimization models.

[15]  arXiv:2405.01916 [pdf, other]
Title: Multi-objective Optimal Trade-off Between V2G Activities and Battery Degradation in Electric Mobility-as-a-Service Systems
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

This paper presents optimization models for electric Mobility-as-a-Service systems, whereby electric vehicles not only provide on-demand mobility, but also perform charging and Vehicle-to-Grid (V2G) operations to enhance the fleet operator's profitability. Specifically, we formulate the optimal fleet operation problem as a mixed-integer linear program, with an objective combining operational costs and revenues generated from servicing requests and grid electricity sales. Our cost function explicitly captures battery price and degradation, reflecting their impact on the fleet's total cost of ownership due to additional charging and discharging activities. Simulation results for Eindhoven, The Netherlands, show that integrating V2G activities does not compromise the number of travel requests being served. Moreover, we emphasize the significance of accounting for battery degradation, as the costs associated with it can potentially outweigh the revenues stemming from V2G operations.

[16]  arXiv:2405.01928 [pdf, ps, other]
Title: Enhancing NLoS RIS-Aided Localization with Optimization and Machine Learning
Comments: 6 pages, 13 figures
Subjects: Signal Processing (eess.SP)

This paper introduces two machine learning optimization algorithms to significantly enhance position estimation in Reconfigurable Intelligent Surface (RIS) aided localization for mobile user equipment in Non-Line-of-Sight conditions. Leveraging the strengths of these algorithms, we present two methods capable of achieving extremely high accuracy, reaching sub-centimeter or even sub-millimeter levels at 3.5 GHz. The simulation results highlight the potential of these approaches, showing significant improvements in indoor mobile localization. The demonstrated precision and reliability of the proposed methods offer new opportunities for practical applications in real-world scenarios, particularly in Non-Line-of-Sight indoor localization. By evaluating four optimization techniques, we determine that a combination of a Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) results in localization errors under 30 cm in 90 % of the cases, and under 5 mm for close to 85 % of cases when considering a simulated room of 10 m by 10 m where two of the walls are equipped with RIS tiles.

[17]  arXiv:2405.01931 [pdf, other]
Title: RF Chain-Free mmWave Transmission: Modeling and Experimental Verification
Comments: Accepted in PIMRC 2024, Copyright IEEE
Subjects: Signal Processing (eess.SP)

The utilization of millimeter wave frequency bands is expected to become prevalent in future communication systems. However, generating and transmitting communication signals over these frequencies is not as straightforward as in sub-6 GHz frequencies due to complex transceiver structures. As an alternative to conventional transmitter architectures, this paper investigates the implementation of time-modulated arrays to effectively modulate and transmit high-quality communication signals at millimeter wave frequencies. By exploiting the array structures and analog beamformers, which are the fundamental components of millimeter wave transmitters, secure and low-cost transmission can be achieved. However, harmonics of theoretically infinite bandwidth arise as a fundamental problem in this approach. Thus, this paper presents a frequency analysis tool for time-modulated arrays with hardware impairments and shows how controlling the sampling period can reduce the harmonics. Furthermore, the derived results are experimentally verified at 25 GHz with two important remarks. First, the phase error of received signals can be reduced by 32% using the proposed architecture. Second, the harmonics can be significantly suppressed by the correct choice of sampling period for the given hardware.

[18]  arXiv:2405.01961 [pdf, other]
Title: Rescale-Invariant Federated Reinforcement Learning for Resource Allocation in V2X Networks
Subjects: Signal Processing (eess.SP)

Federated Reinforcement Learning (FRL) offers a promising solution to various practical challenges in resource allocation for vehicle-to-everything (V2X) networks. However, the data discrepancy among individual agents can significantly degrade the performance of FRL-based algorithms. To address this limitation, we exploit the node-wise invariance property of ReLU-activated neural networks, with the aim of reducing data discrepancy to improve learning performance. Based on this property, we introduce a backward rescale-invariant operation to develop a rescale-invariant FRL algorithm. Simulation results demonstrate that the proposed algorithm notably enhances both convergence speed and convergent performance.
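
The node-wise invariance mentioned in this abstract is commonly understood as the positive-scaling property of ReLU units: since ReLU(c z) = c ReLU(z) for c > 0, multiplying a hidden node's incoming weights and bias by c and dividing its outgoing weights by c leaves the network output unchanged. The sketch below checks this on a tiny two-layer network. It assumes this reading of the property, and `forward` and `rescale_node` are hypothetical names, not the paper's backward rescale-invariant operation.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def forward(w1: Matrix, b1: Vector, w2: Matrix, x: Vector) -> Vector:
    """Two-layer network: y = W2 * ReLU(W1 x + b1)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def rescale_node(w1: Matrix, b1: Vector, w2: Matrix,
                 node: int, c: float) -> Tuple[Matrix, Vector, Matrix]:
    """Scale hidden node `node` by c > 0 on the way in and by 1/c on the way out."""
    if c <= 0:
        raise ValueError("the scaling factor must be positive")
    new_w1 = [[w * c for w in row] if j == node else list(row)
              for j, row in enumerate(w1)]
    new_b1 = [b * c if j == node else b for j, b in enumerate(b1)]
    new_w2 = [[w / c if j == node else w for j, w in enumerate(row)]
              for row in w2]
    return new_w1, new_b1, new_w2


# Illustrative weights only; they do not come from the paper.
w1 = [[0.5, -1.0], [2.0, 0.25]]
b1 = [0.1, -0.3]
w2 = [[1.5, -0.5]]
x = [1.0, 2.0]

y_original = forward(w1, b1, w2, x)
y_rescaled = forward(*rescale_node(w1, b1, w2, node=1, c=4.0), x)
```

Two parameter sets that differ only by such rescalings represent the same function, which is presumably why naive averaging of agents' weights can be misleading, although the abstract does not spell out this argument.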

[19]  arXiv:2405.01965 [pdf, ps, other]
Title: A Deep Learning Approach in RIS-based Indoor Localization
Comments: 6 pages, 7 figures
Subjects: Signal Processing (eess.SP)

In the domain of RIS-based indoor localization, our work introduces two distinct approaches to address real-world challenges. The first method is based on deep learning, employing a Long Short-Term Memory (LSTM) network. The second, a novel LSTM-PSO hybrid, strategically takes advantage of deep learning and optimization techniques. Our simulations encompass practical scenarios, including variations in RIS placement and the intricate dynamics of multipath effects, all in Non-Line-of-Sight conditions. Our methods can achieve very high reliability, obtaining centimeter-level accuracy for the 98th percentile (worst case) in a different set of conditions, including the presence of the multipath effect. Furthermore, our hybrid approach showcases remarkable resolution, achieving sub-millimeter-level accuracy in numerous scenarios.

[20]  arXiv:2405.01967 [pdf, other]
Title: Real-time multichannel deep speech enhancement in hearing aids: Comparing monaural and binaural processing in complex acoustic scenarios
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Deep learning has the potential to enhance speech signals and increase their intelligibility for users of hearing aids. Deep models suited for real-world application should feature a low computational complexity and low processing delay of only a few milliseconds. In this paper, we explore deep speech enhancement that matches these requirements and contrast monaural and binaural processing algorithms in two complex acoustic scenes. Both algorithms are evaluated with objective metrics and in experiments with hearing-impaired listeners performing a speech-in-noise test. Results are compared to two traditional enhancement strategies, i.e., adaptive differential microphone processing and binaural beamforming. While in diffuse noise, all algorithms perform similarly, the binaural deep learning approach performs best in the presence of spatial interferers. Through a post-analysis, this can be attributed to improvements at low SNRs and to precise spatial filtering.

[21]  arXiv:2405.02007 [pdf, other]
Title: Analysing PolSAR data from vegetation by using the subaperture decomposition approach
Subjects: Signal Processing (eess.SP)

A common assumption in radar remote sensing studies for vegetation is that radar returns originate from a target made up of a set of uniformly distributed isotropic scatterers. Nonetheless, several studies in the literature have noted that orientation effects and heterogeneities have a noticeable impact on backscattering signatures according to the specific vegetation type and sensor frequency. In this paper we have employed the subaperture decomposition technique (i.e. a time-frequency analysis) and the 3-D Barakat degree of polarisation to assess the variation of the volume backscattering power as a function of the azimuth look angle. Three different datasets, i.e. multi-frequency indoor acquisitions over short vegetation samples, and P-band airborne data and L-band satellite data over boreal and tropical forest, respectively, have been employed in this study. We have argued that although depolarising effects may only be sensed through a small portion of the synthetic aperture, they can lead to overestimated retrievals of the volume scattering for the full resolution image. This has direct implications for the existing model-based and model-free polarimetric SAR decompositions.

[22]  arXiv:2405.02030 [pdf, other]
Title: Obstacle Avoidance of Autonomous Vehicles: An LPVMPC with Scheduling Trust Region
Subjects: Systems and Control (eess.SY)

Reference tracking and obstacle avoidance rank among the most challenging aspects of autonomous driving. This paper proposes control designs for solving reference tracking problems in autonomous driving tasks while considering static obstacles. We suggest a model predictive control (MPC) strategy that evades the computational burden of nonlinear nonconvex optimization methods by equivalently embedding the nonlinear model in a linear parameter-varying (LPV) formulation using the so-called scheduling parameter. This allows optimal and fast solutions of the underlying convex optimization scheme as a quadratic program (QP) at the expense of losing some performance due to the uncertainty of the future scheduling trajectory over the MPC horizon. Also, to ensure that the modeling error due to the application of the scheduling parameter predictions does not become significant, we propose the concept of a scheduling trust region by enforcing further soft constraints on the states and inputs. A consequence of using the new constraints in the MPC is that we construct a region in which the scheduling parameter updates in two consecutive time instants are trusted for computing the system matrices, and therefore, the feasibility of the MPC optimization problem is retained. We test the method in different scenarios and compare the results to standard LPVMPC as well as nonlinear MPC (NMPC) schemes.

[23]  arXiv:2405.02085 [pdf, other]
Title: AFDM Chirp-Permutation-Index Modulation with Quantum-Accelerated Codebook Design
Subjects: Signal Processing (eess.SP)

We describe a novel index modulation (IM) scheme exploiting a unique feature of the recently proposed affine frequency division multiplexing (AFDM) in doubly-dispersive (DD) channels. Dubbed AFDM chirp-permutation-index modulation (CPIM), the proposed method encodes additional information via the permutation of the discrete affine Fourier Transform (DAFT) chirp sequence, without any sacrifice of the various beneficial properties of the AFDM waveform in DD channels. The effectiveness of the proposed method is validated via simulation results leveraging a novel reduced-complexity minimum mean-squared-error (MMSE)-based maximum-likelihood (ML) detector, highlighting the gains over the classical AFDM. As part of the work, two interesting problems related to optimizing AFDM-CPIM are identified: the optimal codebook design problem, over a discrete solution space of dimension $\binom{N!}{K}$, where $N$ is the number of subcarriers and $K$ is the number of codewords; and the ML detection problem, whose solution space is of dimension $KM^N$, where $M$ is the constellation size. In order to alleviate the computational complexity of these problems and enable large-scale variations of AFDM-CPIM, the two problems are reformulated as a higher-order binary optimization problem and mapped to the well-known quantum Grover adaptive search (GAS) algorithm for their solution.
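
To give a feel for how quickly the two search spaces quoted above grow, the following minimal sketch evaluates $\binom{N!}{K}$ and $KM^N$ for small parameters. The function names and the example values of $N$, $K$ and $M$ are illustrative assumptions, not taken from the paper.

```python
import math


def codebook_design_space(n_subcarriers: int, n_codewords: int) -> int:
    """Number of ways to pick K distinct chirp permutations out of N!."""
    return math.comb(math.factorial(n_subcarriers), n_codewords)


def ml_detection_space(n_subcarriers: int, n_codewords: int,
                       constellation_size: int) -> int:
    """Number of (codeword, symbol-vector) hypotheses: K * M^N."""
    return n_codewords * constellation_size ** n_subcarriers


if __name__ == "__main__":
    # Hypothetical example values, chosen only for illustration.
    k, m = 4, 4
    for n in (4, 8, 16):
        design = codebook_design_space(n, k)
        detect = ml_detection_space(n, k, m)
        print(f"N={n}: codebook space has {len(str(design))} digits, "
              f"ML space = {detect}")
```

Even for modest $N$ the codebook design space is far too large for exhaustive search, which is the stated motivation for the quantum-accelerated formulation.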

[24]  arXiv:2405.02101 [pdf, other]
Title: Discrete Aware Matrix Completion via Convexized $\ell_0$-Norm Approximation
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

We consider a novel algorithm for the completion of partially observed low-rank matrices in a structured setting where each entry can be chosen from a finite discrete alphabet set, such as in common recommender systems. The proposed low-rank matrix completion (MC) method is an improved variation of the state-of-the-art (SotA) discrete-aware matrix completion method which we previously proposed, in which discreteness is enforced by an $\ell_0$-norm regularizer that is not replaced with the $\ell_1$-norm, but instead approximated by a continuous and differentiable function normalized via fractional programming (FP) under a proximal gradient (PG) framework. Simulation results demonstrate the superior performance of the new method compared to the SotA techniques as well as the earlier $\ell_1$-norm-based discrete-aware matrix completion approach.

[25]  arXiv:2405.02109 [pdf, ps, other]
Title: Three-Dimensional Amyloid-Beta PET Synthesis from Structural MRI with Conditional Generative Adversarial Networks
Comments: Abstract Submitted and Presented at the 2024 International Society of Magnetic Resonance in Medicine. Singapore, Singapore, May 4-9. Abstract Number 2239
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Motivation: Alzheimer's Disease hallmarks include amyloid-beta deposits and brain atrophy, detectable via PET and MRI scans, respectively. PET is expensive, invasive and exposes patients to ionizing radiation. MRI is cheaper, non-invasive, and free from ionizing radiation but limited to measuring brain atrophy.
Goal: To develop a 3D image translation model that synthesizes amyloid-beta PET images from T1-weighted MRI, exploiting the known relationship between amyloid-beta and brain atrophy.
Approach: The model was trained on 616 PET/MRI pairs and validated with 264 pairs.
Results: The model synthesized amyloid-beta PET images from T1-weighted MRI with a high degree of similarity, as shown by high SSIM and PSNR values (SSIM > 0.95 and PSNR = 28).
Impact: Our model proves the feasibility of synthesizing amyloid-beta PET images from structural MRI ones, significantly enhancing accessibility for large-cohort studies and early dementia detection, while also reducing cost, invasiveness, and radiation exposure.
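
As a reminder of what the PSNR figure above measures, here is a minimal sketch of the standard peak signal-to-noise ratio between a reference image and a synthesized one. It works on flat lists of intensities; the function name and the choice of `max_value` are assumptions for illustration and say nothing about how the authors computed their metrics.

```python
import math


def psnr(reference, synthesized, max_value):
    """Standard PSNR in dB: 10 * log10(max_value^2 / MSE)."""
    if len(reference) != len(synthesized) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - s) ** 2 for r, s in zip(reference, synthesized)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Toy 4-voxel example with an assumed 8-bit intensity range.
    print(round(psnr([0, 0, 0, 0], [1, 1, 1, 1], 255), 2))
```

Higher values mean the synthesized image is closer to the reference; SSIM, the other metric quoted, additionally compares local structure and is not sketched here.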

[26]  arXiv:2405.02124 [pdf, other]
Title: TIPAA-SSL: Text Independent Phone-to-Audio Alignment based on Self-Supervised Learning and Knowledge Transfer
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

In this paper, we present a novel approach for text independent phone-to-audio alignment based on phoneme recognition, representation learning and knowledge transfer. Our method leverages a self-supervised model (wav2vec2) fine-tuned for phoneme recognition using a Connectionist Temporal Classification (CTC) loss, a dimension reduction model and a frame-level phoneme classifier trained thanks to forced-alignment labels (using Montreal Forced Aligner) to produce multi-lingual phonetic representations, thus requiring minimal additional training. We evaluate our model using synthetic native data from the TIMIT dataset and the SCRIBE dataset for American and British English, respectively. Our proposed model outperforms the state-of-the-art (charsiu) in statistical metrics and has applications in language learning and speech processing systems. We leave experiments on other languages for future work but the design of the system makes it easily adaptable to other languages.

[27]  arXiv:2405.02126 [pdf, other]
Title: Multipath-based SLAM with Cooperation and Map Fusion
Comments: 8 pages. arXiv admin note: text overlap with arXiv:2211.09241
Subjects: Signal Processing (eess.SP)

Multipath-based simultaneous localization and mapping (MP-SLAM) is a promising approach in wireless networks for obtaining position information of transmitters and receivers as well as information on the propagation environment. MP-SLAM models specular reflections of radio frequency (RF) signals at flat surfaces as virtual anchors (VAs), the mirror images of base stations (BSs). Conventional methods for MP-SLAM consider a single mobile terminal (MT) which has to be localized. The availability of additional MTs paves the way for utilizing additional information in the scenario. Specifically, enabling MTs to exchange information allows for data fusion over different observations of VAs made by different MTs. Furthermore, cooperative localization becomes possible in addition to multipath-based localization. Utilizing this additional information enables more robust mapping and higher localization accuracy.
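
The virtual-anchor idea can be illustrated with elementary geometry: a VA is the mirror image of a BS across a flat reflecting surface. The sketch below assumes the surface is given by a point on it and a normal vector; the function name and the example coordinates are hypothetical and only illustrate the mirroring step, not the SLAM algorithm itself.

```python
import math


def virtual_anchor(bs, wall_point, wall_normal):
    """Mirror the BS position across the plane through wall_point
    with normal wall_normal (any dimension, any nonzero normal)."""
    norm = math.sqrt(sum(c * c for c in wall_normal))
    if norm == 0:
        raise ValueError("wall normal must be nonzero")
    n = [c / norm for c in wall_normal]
    # Signed distance from the BS to the wall along the unit normal.
    d = sum((b - q) * c for b, q, c in zip(bs, wall_point, n))
    return tuple(b - 2.0 * d * c for b, c in zip(bs, n))


if __name__ == "__main__":
    # Hypothetical 2D scene: wall along the y-axis (x = 0).
    bs = (1.0, 2.0)
    va = virtual_anchor(bs, (0.0, 0.0), (1.0, 0.0))
    mt = (3.0, 5.0)
    # The straight-line distance VA -> MT equals the length of the
    # specular path BS -> wall -> MT.
    print(va, math.dist(va, mt))
```

Because the reflected path length equals the straight-line distance from the VA, several MTs observing the same wall observe the same VA, which is what makes the data fusion described above possible.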

[28]  arXiv:2405.02131 [pdf, other]
Title: Physics-informed generative neural networks for RF propagation prediction with application to indoor body perception
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

Electromagnetic (EM) body models designed to predict Radio-Frequency (RF) propagation are time-consuming, which prevents their adoption in strict real-time computational imaging problems, such as human body localization and sensing. Physics-informed Generative Neural Network (GNN) models have recently been proposed to reproduce EM effects, namely to simulate or reconstruct missing data or samples by incorporating relevant EM principles and constraints. The paper discusses a Variational Auto-Encoder (VAE) model which is trained to reproduce the effects of human motions on the EM field and to incorporate EM body diffraction principles. The proposed physics-informed generative neural network models are verified against both classical diffraction-based EM tools and full-wave EM body simulations.

[29]  arXiv:2405.02146 [pdf, other]
Title: A Spiking Neural Network Decoder for Implantable Brain Machine Interfaces and its Sparsity-aware Deployment on RISC-V Microcontrollers
Subjects: Signal Processing (eess.SP)

Implantable Brain-machine interfaces (BMIs) are promising for motor rehabilitation and mobility augmentation, and they demand accurate and energy-efficient algorithms. In this paper, we propose a novel spiking neural network (SNN) decoder for regression tasks for implantable BMIs. The SNN is trained with enhanced spatio-temporal backpropagation to fully leverage its capability to handle temporal problems. The proposed SNN decoder outperforms the state-of-the-art Kalman filter and artificial neural network (ANN) decoders in offline finger velocity decoding tasks. The decoder is deployed on a RISC-V-based hardware platform and optimized to exploit sparsity. The proposed implementation has an average power consumption of 0.50 mW in a duty-cycled mode. When conducting continuous inference without duty-cycling, it achieves an energy efficiency of 1.88 uJ per inference, which is 5.5X less than the baseline ANN. Additionally, the average decoding latency is 0.12 ms for each inference, which is 5.7X faster than the ANN implementation.

[30]  arXiv:2405.02184 [pdf, other]
Title: Hybrid Lyapunov-based feedback stabilization of bipedal locomotion based on reference spreading
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS)

We propose a hybrid formulation of the linear inverted pendulum model for bipedal locomotion, where the foot switches are triggered based on the center of mass position, removing the need for pre-defined footstep timings. Using a concept similar to reference spreading, we define nontrivial tracking error coordinates induced by our hybrid model. These coordinates enjoy desirable linear flow dynamics and rather elegant jump dynamics perturbed by a suitable extended class ${\mathcal K}_\infty$ function of the position error. We stabilize this hybrid error dynamics using a saturated feedback controller, selecting its gains by solving a convex optimization problem. We prove local asymptotic stability of the tracking error and provide a certified estimate of the basin of attraction, comparing it with a numerical estimate obtained from the integration of the closed-loop dynamics. Simulations on a full-body model of a real robot show the practical applicability of the proposed framework and its advantages with respect to a standard model predictive control formulation.
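
For readers unfamiliar with the linear inverted pendulum (LIP), the following sketch shows the general flavor of a hybrid model in which the foot switch is triggered by the center-of-mass position rather than by a clock. It uses the textbook LIP dynamics $\ddot{x} = \omega^2 (x - p)$ in closed form. The guard (`switch_offset`), the reset (`step_length`), and all numeric values are assumptions made for illustration; the abstract does not give the authors' exact guard, jump map, or controller.

```python
import math


def lip_flow(x, v, p, omega, t):
    """Closed-form solution of x'' = omega^2 (x - p) after time t,
    starting from CoM position x and velocity v with foot at p."""
    e = x - p
    c, s = math.cosh(omega * t), math.sinh(omega * t)
    return p + e * c + (v / omega) * s, omega * e * s + v * c


def simulate(x, v, p, omega, switch_offset, step_length, dt, n_steps):
    """Flow the LIP; move the foot forward whenever the CoM is at
    least switch_offset ahead of it (a position-based trigger)."""
    feet = [p]
    for _ in range(n_steps):
        x, v = lip_flow(x, v, p, omega, dt)
        if x - p >= switch_offset:  # guard: depends on position, not time
            p += step_length        # assumed reset: fixed step length
            feet.append(p)
    return x, v, feet


if __name__ == "__main__":
    # Hypothetical parameters, uncontrolled open-loop walk for 1 s.
    _, _, feet = simulate(0.0, 0.5, 0.0, 3.0, 0.1, 0.25, 0.01, 100)
    print(feet)
```

In this open-loop sketch nothing keeps the walk going once the pendulum loses energy, which is the kind of problem a stabilizing feedback on the tracking error is meant to address.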

[31]  arXiv:2405.02208 [pdf, other]
Title: Reference-Free Image Quality Metric for Degradation and Reconstruction Artifacts
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Image Quality Assessment (IQA) is essential in various Computer Vision tasks such as image deblurring and super-resolution. However, most IQA methods require reference images, which are not always available. While there are some reference-free IQA metrics, they have limitations in simulating human perception and discerning subtle image quality variations. We hypothesize that the JPEG quality factor is representative of image quality, and that a well-trained neural network can learn to accurately evaluate image quality without requiring a clean reference, as it can recognize image degradation artifacts based on prior knowledge. Thus, we developed a reference-free quality evaluation network, dubbed "Quality Factor (QF) Predictor". Our QF Predictor is a lightweight, fully convolutional network comprising seven layers. The model is trained in a self-supervised manner: it receives a JPEG-compressed image patch with a random QF as input and is trained to accurately predict the corresponding QF. We demonstrate the versatility of the model by applying it to various tasks. First, our QF Predictor can generalize to measure the severity of various image artifacts, such as Gaussian blur and Gaussian noise. Second, we show that the QF Predictor can be trained to predict the undersampling rate of images reconstructed from Magnetic Resonance Imaging (MRI) data.

Cross-lists for Mon, 6 May 24

[32]  arXiv:2405.01558 (cross-list from cs.CV) [pdf, other]
Title: Configurable Learned Holography
Comments: 14 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Optics (physics.optics)

In the pursuit of advancing holographic display technology, we face a unique yet persistent roadblock: the inflexibility of learned holography in adapting to various hardware configurations.
This is due to the variances in the complex optical components and system settings in existing holographic displays.
Although the emerging learned approaches have enabled rapid and high-quality hologram generation, any alteration in display hardware still requires a retraining of the model.
Our work introduces a configurable learned model that interactively computes 3D holograms from RGB-only 2D images for a variety of holographic displays.
The model can be conditioned to predefined hardware parameters of existing holographic displays such as working wavelengths, pixel pitch, propagation distance, and peak brightness without having to retrain.
In addition, our model accommodates various hologram types, including conventional single-color and emerging multi-color holograms that simultaneously use multiple color primaries in holographic displays.
Notably, we enabled our hologram computations to rely on identifying the correlation between depth estimation and 3D hologram synthesis tasks within the learning domain for the first time in the literature.
We employ knowledge distillation via a student-teacher learning strategy to streamline our model for interactive performance.
Our model achieves up to a 2x speed improvement compared to state-of-the-art models while consistently generating high-quality 3D holograms with different hardware configurations.

[33]  arXiv:2405.01584 (cross-list from cs.CL) [pdf, other]
Title: Lightweight Conceptual Dictionary Learning for Text Classification Using Information Compression
Comments: 12 pages, TKDE format
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Signal Processing (eess.SP)

We propose a novel, lightweight supervised dictionary learning framework for text classification based on data compression and representation. This two-phase algorithm initially employs the Lempel-Ziv-Welch (LZW) algorithm to construct a dictionary from text datasets, focusing on the conceptual significance of dictionary elements. Subsequently, dictionaries are refined considering label data, optimizing dictionary atoms to enhance discriminative power based on mutual information and class distribution. This process generates discriminative numerical representations, facilitating the training of simple classifiers such as SVMs and neural networks. We evaluate our algorithm's information-theoretic performance using information bottleneck principles and introduce the information plane area rank (IPAR) as a novel metric to quantify the information-theoretic performance. Tested on six benchmark text datasets, our algorithm competes closely with top models, especially in limited-vocabulary contexts, using significantly fewer parameters. Our algorithm closely matches top-performing models, deviating by only about 2% on limited-vocabulary datasets, using just 10% of their parameters. However, it falls short on diverse-vocabulary datasets, likely due to the LZW algorithm's constraints with low-repetition data. This contrast highlights its efficiency and limitations across different dataset types.
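
The first phase mentioned above relies on LZW dictionary construction. The sketch below is the classic character-level LZW encoder, which grows a dictionary of repeated substrings while scanning the text. It is only meant to show how such a dictionary arises; the paper's own tokenization, its notion of conceptual significance, and the label-based refinement phase are not reproduced here, and the function name is an assumption.

```python
def lzw_dictionary(text):
    """Classic LZW pass over a string.

    Returns (dictionary, codes): the dictionary maps each learned
    substring to an integer code, and codes is the encoded output.
    """
    # Seed the dictionary with the distinct single characters.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    codes = []
    w = ""
    for ch in text:
        candidate = w + ch
        if candidate in dictionary:
            w = candidate  # keep extending the current match
        else:
            codes.append(dictionary[w])
            dictionary[candidate] = len(dictionary)  # learn a new entry
            w = ch
    if w:
        codes.append(dictionary[w])
    return dictionary, codes


if __name__ == "__main__":
    d, c = lzw_dictionary("abababab")
    print(list(d))  # learned substrings, in insertion order
    print(c)
```

Highly repetitive input yields long dictionary entries quickly, whereas text with little repetition yields mostly short ones, which is consistent with the limitation on diverse-vocabulary datasets noted in the abstract.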

[34]  arXiv:2405.01591 (cross-list from cs.CL) [pdf, other]
Title: Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model
Comments: Under review
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

Recent advancements in Large Multimodal Models (LMMs) have attracted interest in their generalization capability with only a few samples in the prompt. This progress is particularly relevant to the medical domain, where the quality and sensitivity of data pose unique challenges for model training and application. However, the dependency on high-quality data for effective in-context learning raises questions about the feasibility of these models when encountering the inevitable variations and errors inherent in real-world medical data. In this paper, we introduce MID-M, a novel framework that leverages the in-context learning capabilities of a general-domain Large Language Model (LLM) to process multimodal data via image descriptions. MID-M achieves comparable or superior performance to task-specific fine-tuned LMMs and other general-domain ones, without extensive domain-specific training or pre-training on multimodal data, and with significantly fewer parameters. This highlights the potential of leveraging general-domain LLMs for domain-specific tasks and offers a sustainable and cost-effective alternative to traditional LMM developments. Moreover, the robustness of MID-M against data quality issues demonstrates its practical utility in real-world medical domain applications.

[35]  arXiv:2405.01690 (cross-list from cs.NI) [pdf, other]
Title: Addressing the Load Estimation Problem: Cell Switching in HAPS-Assisted Sustainable 6G Networks
Comments: arXiv admin note: substantial text overlap with arXiv:2402.04386
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

This study aims to introduce and address the problem of traffic load estimation in the cell switching concept within the evolving landscape of vertical heterogeneous networks (vHetNets). The problem is that the practice of cell switching faces a significant challenge due to the lack of accurate data on the traffic load of sleeping small base stations (SBSs). This problem makes the majority of the studies in the literature, particularly those employing load-dependent approaches, impractical due to their basic assumption of perfect knowledge of the traffic loads of sleeping SBSs for the next time slot. Rather than developing another advanced cell switching algorithm, this study investigates the impacts of estimation errors and explores possible solutions through established methodologies in a novel vHetNet environment that includes the integration of a high altitude platform (HAPS) as a super macro base station (SMBS) into the terrestrial network. In other words, this study adopts a more foundational perspective, focusing on eliminating a significant obstacle for the application of advanced cell switching algorithms. To this end, we explore the potential of three distinct spatial interpolation-based estimation schemes: random neighboring selection, distance-based selection, and clustering-based selection. Utilizing a real dataset for empirical validations, we evaluate the efficacy of our proposed traffic load estimation schemes. Our results demonstrate that the multi-level clustering (MLC) algorithm performs exceptionally well, with an insignificant difference (i.e., 0.8%) observed between its estimated and actual network power consumption, highlighting its potential to significantly improve energy efficiency in vHetNets.

[36]  arXiv:2405.01758 (cross-list from cs.RO) [pdf, other]
Title: CGD: Constraint-Guided Diffusion Policies for UAV Trajectory Planning
Comments: 8 pages, 3 figures
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)

Traditional optimization-based planners, while effective, suffer from high computational costs, resulting in slow trajectory generation. A successful strategy to reduce computation time involves using Imitation Learning (IL) to develop fast neural network (NN) policies from those planners, which are treated as expert demonstrators. Although the resulting NN policies are effective at quickly generating trajectories similar to those from the expert, (1) their output does not explicitly account for dynamic feasibility, and (2) the policies do not accommodate changes in the constraints different from those used during training.
To overcome these limitations, we propose Constraint-Guided Diffusion (CGD), a novel IL-based approach to trajectory planning. CGD leverages a hybrid learning/online optimization scheme that combines diffusion policies with a surrogate efficient optimization problem, enabling the generation of collision-free, dynamically feasible trajectories. The key ideas of CGD include dividing the original challenging optimization problem solved by the expert into two more manageable sub-problems: (a) efficiently finding collision-free paths, and (b) determining a dynamically-feasible time-parametrization for those paths to obtain a trajectory. Compared to conventional neural network architectures, we demonstrate through numerical evaluations significant improvements in performance and dynamic feasibility under scenarios with new constraints never encountered during training.

[37]  arXiv:2405.01785 (cross-list from cs.IT) [pdf, other]
Title: Towards Green Communication: Soft Decoding Scheme for OOK Signals in Zero-Energy Devices
Comments: Accepted in IEEE International Communications Conference (ICC) workshop, Denver, Jun 2024
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The boom of the Internet of Things (IoT) is expected to provide more intelligent and reliable communication services for higher network coverage, massive connectivity, and low-cost solutions for 6G services. However, frequent charging and battery replacement of these massive IoT devices bring a series of challenges. Zero-energy devices, which rely on energy-harvesting technologies and can operate without battery replacement or charging, play a pivotal role in facilitating the massive use of IoT devices. In order to enable reliable communications of such low-power devices, Manchester-coded on-off keying (OOK) modulation and non-coherent detection are attractive techniques due to their energy efficiency, robustness in noisy environments, and simplicity in receiver design. Moreover, to extend their communication range, employing channel coding along with enhanced detection schemes is crucial. In this paper, a novel soft-decision decoder is designed for OOK-based low-power receivers to enhance their detection performance. In addition, exact closed-form expressions and two simplified approximations are derived for the log-likelihood ratio (LLR), an essential metric for soft decoding. Numerical results demonstrate the significant coverage gain achieved through soft decoding for convolutional codes.
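
To make the terms concrete, here is a minimal sketch of Manchester-coded OOK with a non-coherent energy detector. Each bit is sent as two half-symbol chips, and the receiver compares the energies measured in the two halves. The mapping 1 -> (on, off), 0 -> (off, on) is an assumed convention, and the energy difference used as a soft value is a simple heuristic stand-in: it is not the exact LLR or either of the approximations derived in the paper.

```python
def manchester_encode(bits):
    """Map each bit to two OOK chips: 1 -> (on, off), 0 -> (off, on).
    The polarity is an assumed convention."""
    chips = []
    for b in bits:
        chips.extend((1, 0) if b else (0, 1))
    return chips


def soft_metrics(energies):
    """Heuristic soft value per bit: energy of the first half-symbol
    minus energy of the second. Positive favors bit 1."""
    if len(energies) % 2:
        raise ValueError("expected two energy samples per bit")
    return [energies[i] - energies[i + 1] for i in range(0, len(energies), 2)]


def hard_decisions(metrics):
    """Threshold the soft values at zero."""
    return [1 if m > 0 else 0 for m in metrics]


if __name__ == "__main__":
    bits = [1, 0, 1, 1]
    # Noiseless channel: received energy equals the transmitted chip.
    energies = [float(c) for c in manchester_encode(bits)]
    soft = soft_metrics(energies)
    print(soft, hard_decisions(soft))
```

A soft-decision channel decoder would consume the per-bit soft values directly instead of the thresholded bits, which is where a coverage gain of the kind reported above would come from.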

[38]  arXiv:2405.01792 (cross-list from cs.RO) [pdf, other]
Title: Learning Robust Autonomous Navigation and Locomotion for Wheeled-Legged Robots
Journal-ref: Science Robotics, 2024, Vol 9, Issue 89
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)

Autonomous wheeled-legged robots have the potential to transform logistics systems, improving operational efficiency and adaptability in urban environments. Navigating urban environments, however, poses unique challenges for robots, necessitating innovative solutions for locomotion and navigation. These challenges include the need for adaptive locomotion across varied terrains and the ability to navigate efficiently around complex dynamic obstacles. This work introduces a fully integrated system comprising adaptive locomotion control, mobility-aware local navigation planning, and large-scale path planning within the city. Using model-free reinforcement learning (RL) techniques and privileged learning, we develop a versatile locomotion controller. This controller achieves efficient and robust locomotion over various rough terrains, facilitated by smooth transitions between walking and driving modes. It is tightly integrated with a learned navigation controller through a hierarchical RL framework, enabling effective navigation through challenging terrain and various obstacles at high speed. Our controllers are integrated into a large-scale urban navigation system and validated by autonomous, kilometer-scale navigation missions conducted in Zurich, Switzerland, and Seville, Spain. These missions demonstrate the system's robustness and adaptability, underscoring the importance of integrated control systems in achieving seamless navigation in complex environments. Our findings support the feasibility of wheeled-legged robots and hierarchical RL for autonomous navigation, with implications for last-mile delivery and beyond.

[39]  arXiv:2405.01794 (cross-list from cs.RO) [pdf, ps, other]
Title: New design of smooth PSO-IPF navigator with kinematic constraints
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Robotic applications across industries demand advanced navigation for safe and smooth movement. Smooth path planning is crucial for mobile robots to ensure stable and efficient navigation, as it minimizes jerky movements and enhances overall performance. Achieving this requires smooth, collision-free paths. Particle Swarm Optimization (PSO) and Potential Field (PF) are notable path-planning techniques; however, they may struggle to produce smooth paths due to their inherent algorithms, potentially leading to suboptimal robot motion and increased energy consumption. In addition, while PSO efficiently explores solution spaces, it generates long paths and has limited global search. In contrast, PF methods offer concise paths but struggle with distant targets or obstacles. To address this, we propose Smoothed Particle Swarm Optimization with Improved Potential Field (SPSO-IPF), which combines both approaches and is capable of generating a smooth and safe path. Our research demonstrates SPSO-IPF's superiority, proving its effectiveness in static and dynamic environments compared to a pure PSO or a pure PF approach.

[40]  arXiv:2405.01815 (cross-list from cs.SD) [pdf, other]
Title: Toward end-to-end interpretable convolutional neural networks for waveform signals
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

This paper introduces a novel convolutional neural network (CNN) framework tailored for end-to-end audio deep learning models, presenting advancements in efficiency and explainability. In benchmarking experiments on three standard speech emotion recognition datasets with five-fold cross-validation, our framework outperforms Mel spectrogram features by up to seven percent. It can potentially replace the Mel-Frequency Cepstral Coefficients (MFCC) while remaining lightweight. Furthermore, we demonstrate the efficiency and interpretability of the front-end layer using the PhysioNet Heart Sound Database, illustrating its ability to handle and capture intricate long waveform patterns. Our contributions offer a portable solution for building efficient and interpretable models for raw waveform data.

[41]  arXiv:2405.01882 (cross-list from cs.RO) [pdf, other]
Title: Millimeter Wave Radar-based Human Activity Recognition for Healthcare Monitoring Robot
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

Healthcare monitoring is crucial, especially for the daily care of elderly individuals living alone. It can detect dangerous occurrences, such as falls, and provide timely alerts to save lives. Non-invasive millimeter wave (mmWave) radar-based healthcare monitoring systems using advanced human activity recognition (HAR) models have recently gained significant attention. However, they encounter challenges in handling sparse point clouds, achieving real-time continuous classification, and coping with limited monitoring ranges when statically mounted. To overcome these limitations, we propose RobHAR, a movable robot-mounted mmWave radar system with lightweight deep neural networks for real-time monitoring of human activities. Specifically, we first propose a sparse point cloud-based global embedding to learn the features of point clouds using the light-PointNet (LPN) backbone. Then, we learn the temporal pattern with a bidirectional lightweight LSTM model (BiLiLSTM). In addition, we implement a transition optimization strategy, integrating the Hidden Markov Model (HMM) with Connectionist Temporal Classification (CTC) to improve the accuracy and robustness of the continuous HAR. Our experiments on three datasets indicate that our method significantly outperforms the previous studies in both discrete and continuous HAR tasks. Finally, we deploy our system on a movable robot-mounted edge computing platform, achieving flexible healthcare monitoring in real-world scenarios.

[42]  arXiv:2405.01919 (cross-list from cs.IT) [pdf, ps, other]
Title: Channel Orthogonalization in Panel-Based LIS
Comments: 6 pages, 3 figures. This work has been submitted to the IEEE for possible publication, copyright information may be affected upon publication
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Large intelligent surface (LIS) has gained momentum as a potential 6G-enabling technology that expands the benefits of massive multiple-input multiple-output (MIMO). On the other hand, orthogonal space-division multiplexing (OSDM) may give a promising direction for efficient exploitation of the spatial resources, analogous to what is achieved with orthogonal frequency-division multiplexing (OFDM) in the frequency domain. To this end, we study how to enforce channel orthogonality in a panel-based LIS scenario. Our proposed method consists of having a subset of active LIS-panels coherently serving a set of users, and another subset of LIS-panels operating in semi-passive mode by implementing a receive and re-transmit (RRTx) process. This results in an inter-symbol interference (ISI) channel, where we characterize the semi-passive processing required to achieve simultaneous orthogonality in time and space. We then employ the remaining degrees of freedom (DoFs) from the orthogonality constraint to minimize the semi-passive processing power, where we derive a closed-form global minimizer, allowing for efficient implementation of the proposed scheme.

[43]  arXiv:2405.01979 (cross-list from cs.IT) [pdf, other]
Title: Graph Neural Network based Active and Passive Beamforming for Distributed STAR-RIS-Assisted Multi-User MISO Systems
Comments: 13 pages, 7 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This paper investigates a joint active and passive beamforming design for distributed simultaneous transmitting and reflecting (STAR) reconfigurable intelligent surface (RIS) assisted multi-user (MU) multiple-input single-output (MISO) systems, where the energy splitting (ES) mode is considered for the STAR-RIS. We aim to design the active beamforming vectors at the base station (BS) and the passive beamforming at the STAR-RIS to maximize the user sum rate under transmit power constraints. The formulated problem is non-convex, and its global optimum is nontrivial to obtain due to the coupling between active beamforming vectors and STAR-RIS phase shifts. To efficiently solve the problem, we propose a novel graph neural network (GNN)-based framework. Specifically, we first model the interactions among users and network entities using a heterogeneous graph representation. A heterogeneous graph neural network (HGNN) implementation is then introduced to directly optimize beamforming vectors and STAR-RIS coefficients with the system objective. Numerical results show that the proposed approach yields efficient performance compared to the previous benchmarks. Furthermore, the proposed GNN is scalable with various system configurations.

[44]  arXiv:2405.01988 (cross-list from cs.SD) [pdf, other]
Title: Joint sentiment analysis of lyrics and audio in music
Comments: published at DAGA 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Sentiment or mood can express itself on various levels in music. In automatic analysis, the actual audio data is usually analyzed, but the lyrics can also play a crucial role in the perception of moods. We first evaluate various models for sentiment analysis based on lyrics and audio separately. The corresponding approaches already show satisfactory results, but they also exhibit weaknesses, the causes of which we examine in more detail. Furthermore, different approaches to combining the audio and lyrics results are proposed and evaluated. Considering both modalities generally leads to improved performance. We investigate misclassifications and (also intentional) contradictions between audio and lyrics sentiment more closely, and identify possible causes. Finally, we address fundamental problems in this research area, such as high subjectivity, lack of data, and inconsistency in emotion taxonomies.

[45]  arXiv:2405.02034 (cross-list from math.OC) [pdf, other]
Title: Multi-Agent Coverage Control on Surfaces Using Conformal Mapping
Authors: Chao Zhai, Yuming Wu
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

Real-time environmental monitoring using a multi-agent system (MAS) has long been a focal point of cooperative control. It is still a challenging task to provide cost-effective services for potential emergencies in surface environments. This paper explores the transformation of a general surface into a two-dimensional (2D) disk through the construction of a conformal mapping. Multiple agents are strategically deployed within the mapped convex disk, followed by mapping back to the original surface environment. This approach circumvents the difficulties and intricacies of path planning. Technical analysis encompasses the design of distributed control laws and the method to eliminate distortions introduced by the mapping. Moreover, the developed coverage algorithm is applied to a scenario of monitoring surface deformation. Finally, the effectiveness of the proposed algorithm is validated through numerical simulations.

[46]  arXiv:2405.02044 (cross-list from cs.LG) [pdf, other]
Title: Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY); Optimization and Control (math.OC)

Robust Reinforcement Learning (RRL) is a promising Reinforcement Learning (RL) paradigm aimed at training models that are robust to uncertainty or disturbances, making them more efficient for real-world applications. Following this paradigm, uncertainty or disturbances are interpreted as actions of a second adversarial agent, and thus, the problem is reduced to seeking agents' policies that are robust to any opponent's actions. This paper is the first to propose considering the RRL problems within the positional differential game theory, which helps us to obtain theoretically justified intuition to develop a centralized Q-learning approach. Namely, we prove that under Isaacs's condition (sufficiently general for real-world dynamical systems), the same Q-function can be utilized as an approximate solution of both minimax and maximin Bellman equations. Based on these results, we present the Isaacs Deep Q-Network algorithms and demonstrate their superiority compared to other baseline RRL and Multi-Agent RL algorithms in various environments.

[47]  arXiv:2405.02066 (cross-list from cs.CV) [pdf, other]
Title: WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

The advances in the Neural Radiance Fields (NeRF) research offer extensive applications in diverse domains, but protecting their copyrights has not yet been researched in depth. Recently, NeRF watermarking has been considered one of the pivotal solutions for safely deploying NeRF-based 3D representations. However, existing methods are designed to apply only to implicit or explicit NeRF representations. In this work, we introduce an innovative watermarking method that can be employed in both representations of NeRF. This is achieved by fine-tuning NeRF to embed binary messages in the rendering process. In detail, we propose utilizing the discrete wavelet transform in the NeRF space for watermarking. Furthermore, we adopt a deferred back-propagation technique and introduce a combination with the patch-wise loss to improve rendering quality and bit accuracy with minimum trade-offs. We evaluate our method in three different aspects: capacity, invisibility, and robustness of the embedded watermarks in the 2D-rendered images. Our method achieves state-of-the-art performance with faster training speed over the compared state-of-the-art methods.

[48]  arXiv:2405.02119 (cross-list from cs.SD) [pdf, other]
Title: Can We Identify Unknown Audio Recording Environments in Forensic Scenarios?
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Audio recordings may provide important evidence in criminal investigations. One such case is the forensic association of the recorded audio to the recording location. For example, a voice message may be the only investigative cue to narrow down the candidate sites for a crime. Up to now, several works provide tools for closed-set recording environment classification under relatively clean recording conditions. However, in forensic investigations, the candidate locations are case-specific. Thus, closed-set tools are not applicable without retraining on a sufficient amount of training samples for each case and respective candidate set. In addition, a forensic tool has to deal with audio material from uncontrolled sources with variable properties and quality.
In this work, we therefore attempt a major step towards practical forensic application scenarios. We propose a representation learning framework called EnvId, short for environment identification. EnvId avoids case-specific retraining. Instead, it is the first tool for robust few-shot classification of unseen environment locations. We demonstrate that EnvId can handle forensically challenging material. It provides good quality predictions even under unseen signal degradations, environment characteristics or recording position mismatches.
Our code and datasets will be made publicly available upon acceptance.

[49]  arXiv:2405.02132 (cross-list from cs.SD) [pdf, other]
Title: Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

Large Language Models have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition is becoming a mainstream paradigm. Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset. Specifically, our research aims to evaluate the impact of various configurations of speech encoders, LLMs, and projector modules in the context of the speech foundation encoder-LLM ASR paradigm. Furthermore, we introduce a three-stage training approach, expressly developed to enhance the model's ability to align auditory and textual information. The implementation of this approach, alongside the strategic integration of ASR components, enabled us to achieve the SOTA performance on the AISHELL-1, Test_Net, and Test_Meeting test sets. Our analysis presents an empirical foundation for future research in LLM-based ASR systems and offers insights into optimizing performance using Chinese datasets. We will publicly release all scripts used for data preparation, training, inference, and scoring, as well as pretrained models and training logs to promote reproducible research.

[50]  arXiv:2405.02151 (cross-list from cs.SD) [pdf, other]
Title: GMP-ATL: Gender-augmented Multi-scale Pseudo-label Enhanced Adaptive Transfer Learning for Speech Emotion Recognition via HuBERT
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

The continuous evolution of pre-trained speech models has greatly advanced Speech Emotion Recognition (SER). However, there is still potential for enhancement in the performance of these methods. In this paper, we present GMP-ATL (Gender-augmented Multi-scale Pseudo-label Adaptive Transfer Learning), a novel HuBERT-based adaptive transfer learning framework for SER. Specifically, GMP-ATL initially employs the pre-trained HuBERT, implementing multi-task learning and multi-scale k-means clustering to acquire frame-level gender-augmented multi-scale pseudo-labels. Then, to fully leverage both obtained frame-level and utterance-level emotion labels, we incorporate model retraining and fine-tuning methods to further optimize GMP-ATL. Experiments on IEMOCAP show that our GMP-ATL achieves superior recognition performance, with a WAR of 80.0% and a UAR of 82.0%, surpassing state-of-the-art unimodal SER methods, while also yielding comparable results with multimodal SER approaches.

[51]  arXiv:2405.02179 (cross-list from cs.SD) [pdf, other]
Title: Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)

Generalization is a main issue for current audio deepfake detectors, which struggle to provide reliable results on out-of-distribution data. Given the speed at which more and more accurate synthesis methods are developed, it is very important to design techniques that also work well on data they were not trained for. In this paper we study the potential of large-scale pre-trained models for audio deepfake detection, with special focus on generalization ability. To this end, the detection problem is reformulated in a speaker verification framework and fake audios are exposed by the mismatch between the voice sample under test and the voice of the claimed identity. With this paradigm, no fake speech sample is necessary in training, cutting off any link with the generation method at the root, and ensuring full generalization ability. Features are extracted by general-purpose large pre-trained models, with no need for training or fine-tuning on specific fake detection or speaker verification datasets. At detection time only a limited set of voice fragments of the identity under test is required. Experiments on several datasets widespread in the community show that detectors based on pre-trained models achieve excellent performance and show strong generalization ability, rivaling supervised methods on in-distribution data and largely outperforming them on out-of-distribution data.

[52]  arXiv:2405.02180 (cross-list from cs.LG) [pdf, other]
Title: A Flow-Based Model for Conditional and Probabilistic Electricity Consumption Profile Generation and Prediction
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

Residential Load Profile (RLP) generation and prediction are critical for the operation and planning of distribution networks, particularly as diverse low-carbon technologies are increasingly integrated. This paper introduces a novel flow-based generative model, termed Full Convolutional Profile Flow (FCPFlow), which is uniquely designed for both conditional and unconditional RLP generation, and for probabilistic load forecasting. By introducing two new layers--the invertible linear layer and the invertible normalization layer--the proposed FCPFlow architecture shows three main advantages compared to traditional statistical and contemporary deep generative models: 1) it is well-suited for RLP generation under continuous conditions, such as varying weather and annual electricity consumption, 2) it shows superior scalability in different datasets compared to traditional statistical models, and 3) it also demonstrates better modeling capabilities in capturing the complex correlation of RLPs compared with deep generative models.

[53]  arXiv:2405.02191 (cross-list from cs.CV) [pdf, ps, other]
Title: Non-Destructive Peat Analysis using Hyperspectral Imaging and Machine Learning
Comments: 4 pages,4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Peat, a crucial component in whisky production, imparts distinctive and irreplaceable flavours to the final product. However, the extraction of peat disrupts ancient ecosystems and releases significant amounts of carbon, contributing to climate change. This paper aims to address this issue by conducting a feasibility study on enhancing peat use efficiency in whisky manufacturing through non-destructive analysis using hyperspectral imaging. Results show that short-wave infrared (SWIR) data is more effective for analyzing peat samples and predicting total phenol levels, with accuracies up to 99.81%.

[54]  arXiv:2405.02198 (cross-list from cs.RO) [pdf, other]
Title: The Cambridge RoboMaster: An Agile Multi-Robot Research Platform
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA); Systems and Control (eess.SY)

Compact robotic platforms with powerful compute and actuation capabilities are key enablers for practical, real-world deployments of multi-agent research. This article introduces a tightly integrated hardware, control, and simulation software stack on a fleet of holonomic ground robot platforms designed with this motivation. Our robots, a fleet of customised DJI Robomaster S1 vehicles, offer a balance between small robots that do not possess sufficient compute or actuation capabilities and larger robots that are unsuitable for indoor multi-robot tests. They run a modular ROS2-based optimal estimation and control stack for full onboard autonomy, contain ad-hoc peer-to-peer communication infrastructure, and can zero-shot run multi-agent reinforcement learning (MARL) policies trained in our vectorized multi-agent simulation framework. We present an in-depth review of other platforms currently available, showcase new experimental validation of our system's capabilities, and introduce case studies that highlight the versatility and reliability of our system as a testbed for a wide range of research demonstrations. Our system as well as supplementary material is available online: https://proroklab.github.io/cambridge-robomaster

Replacements for Mon, 6 May 24

[55]  arXiv:2107.12416 (replaced) [pdf, other]
Title: Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent
Comments: The arxiv version contains proofs of Lemma 3 and Lemma 5, which are missing in the published version
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC)
[56]  arXiv:2207.14682 (replaced) [pdf, other]
Title: Towards Unconstrained Audio Splicing Detection and Localization with Neural Networks
Comments: Published at MMFORWILD 2022, ICPR Workshops - Code: this https URL . International Conference on Pattern Recognition. Cham: Springer Nature Switzerland, 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[57]  arXiv:2307.05641 (replaced) [pdf, other]
Title: Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets
Comments: published at Interspeech 2023 - Code: this https URL
Journal-ref: Proc. INTERSPEECH 2023, 5057-5061 (2023)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[58]  arXiv:2307.12512 (replaced) [pdf, other]
Title: XRLoc: Accurate UWB Localization to Realize XR Deployments
Comments: This paper is accepted by ACM SenSys 2023. The published version is this https URL in ACM Digital Library
Journal-ref: Proceedings of ACM Conference on Embedded Networked Sensor Systems (ACM SenSys'23), pp.459-473, 2023
Subjects: Human-Computer Interaction (cs.HC); Networking and Internet Architecture (cs.NI); Robotics (cs.RO); Signal Processing (eess.SP)
[59]  arXiv:2307.15184 (replaced) [pdf, other]
Title: Sparsity aware coding for single photon sensitive vision using Selective Sensing
Subjects: Image and Video Processing (eess.IV)
[60]  arXiv:2308.09110 (replaced) [pdf, other]
Title: JPEG Quantized Coefficient Recovery via DCT Domain Spatial-Frequential Transformer
Comments: 15 pages, 9 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[61]  arXiv:2310.09653 (replaced) [pdf, other]
Title: SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Comments: Accepted at ICML 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[62]  arXiv:2310.11760 (replaced) [pdf, ps, other]
Title: Performance Investigation of an Optimal Control Strategy for Zero-Emission Operations of Shipboard Microgrids
Comments: Submitted to SPEEDAM 2024
Subjects: Systems and Control (eess.SY)
[63]  arXiv:2312.01441 (replaced) [pdf, other]
Title: Koopman-based feedback design with stability guarantees
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[64]  arXiv:2312.05176 (replaced) [pdf, other]
Title: MRI Scan Synthesis Methods based on Clustering and Pix2Pix
Comments: Accepted at AIME 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[65]  arXiv:2312.12267 (replaced) [pdf, other]
Title: Optimal Power Flow Pursuit via Feedback-based Safe Gradient Flow
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[66]  arXiv:2312.13859 (replaced) [pdf, other]
Title: Nonlinear Functional Estimation: Functional Detectability and Full Information Estimation
Comments: 15 pages, 3 figures
Subjects: Systems and Control (eess.SY)
[67]  arXiv:2312.17279 (replaced) [pdf, other]
Title: Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
Comments: Shorter version accepted to ICASSP 2024
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[68]  arXiv:2401.10990 (replaced) [pdf, other]
Title: A Nonlinear Observer Design for the Discrete-time Systems: Exploiting Matrix-Multiplier-based LMI Approach
Authors: Shivaraj Mohite
Subjects: Systems and Control (eess.SY)
[69]  arXiv:2402.04074 (replaced) [pdf, other]
Title: Mean-Square Stability and Stabilizability for LTI and Stochastic Systems Connected in Feedback
Subjects: Systems and Control (eess.SY)
[70]  arXiv:2402.08289 (replaced) [pdf, ps, other]
Title: Why Studying Cut-ins? Comparing Cut-ins and Other Lane Changes Based on Naturalistic Driving Data
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
[71]  arXiv:2402.15942 (replaced) [pdf, other]
Title: Minimum energy density steering of linear systems with Gromov-Wasserstein terminal cost
Comments: 7 pages
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[72]  arXiv:2402.18554 (replaced) [pdf, other]
Title: Extended Kalman filter -- Koopman operator for tractable stochastic optimal control
Comments: 6 pages
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
[73]  arXiv:2403.12619 (replaced) [pdf, other]
Title: Detection of Malicious Agents in Social Learning
Subjects: Social and Information Networks (cs.SI); Multiagent Systems (cs.MA); Signal Processing (eess.SP)
[74]  arXiv:2403.17701 (replaced) [src]
Title: Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation
Comments: Experimental method encountered errors, undergoing experiment again
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[75]  arXiv:2405.01220 (replaced) [pdf, other]
Title: Misspecification of Multiple Scattering in Scalar Wave Fields and its Impact in Ultrasound Tomography
Comments: 17 pages, 7 figures
Subjects: Signal Processing (eess.SP); Medical Physics (physics.med-ph)